A Standalone Machine Learning Platform
ANVIL is a standalone software solution, uniquely crafted to bring the power of machine learning into the hands of audio enthusiasts and creators. It operates as a pre-packaged Python environment, boasting a user-friendly interface that simplifies the complex processes of audio data management and model training. With ANVIL, users can effortlessly create audio datasets and develop models capable of generating audio. These models are also compatible with MACE, further enhancing their utility.
At the heart of ANVIL’s design philosophy is its portability and adaptability. As a portable machine learning environment, ANVIL is poised for rapid evolution, continuously expanding its features to meet the growing demands of the audio creation world in the age of AI.
As of this stage, ANVIL is in a very early state, focused on the basics of implementing a usable machine learning environment. UI, terminology, and processes are subject to change as we test and iterate on the design.
A Commitment to Local Processing: Empowering Users with Direct Model Training
In a deliberate move away from cloud-based services, ANVIL is engineered to function locally on users’ machines. The vision of ANVIL is to empower users by enabling them to train their own models directly. This approach not only avoids the need for subscription fees associated with cloud computing but also aligns with our belief in harnessing the growing capabilities of average CPUs and GPUs. As hardware technology advances, we are confident that local machines will increasingly be able to handle the computational demands of sophisticated audio model training, making ANVIL an even more powerful tool in the hands of our users.
Requirements
Navigating the Technical Landscape: Understanding ANVIL’s Requirements
As we strive to make ANVIL a versatile and powerful tool in the realm of audio machine learning, it’s important for users to understand the current technical requirements and our ongoing efforts towards broader compatibility.
Windows and NVIDIA Graphics Card: The Current Optimal Setup
At present, ANVIL is tailored for Windows operating systems. The software leverages the Pytorch machine learning environment, which requires CUDA support. Consequently, an NVIDIA graphics card is recommended for optimal performance. This setup ensures that users can fully utilize ANVIL’s capabilities for efficient and effective audio model training.
CPU Mode: Ensuring Universal Compatibility
Recognizing the diverse hardware configurations of our users, ANVIL also includes a CPU mode. While this mode ensures universal compatibility, it’s important to note that CPU processing is significantly slower compared to using a graphics card. We are committed to adapting ANVIL to various chip architectures and keeping pace with the evolving landscape of machine learning applications.
Mac Version: Harnessing the Power of Silicon Chips
A Mac version of ANVIL is currently in development. The aim is to harness the capabilities of Apple’s silicon chips (M1, M2, and M3), which show promise in machine learning tasks when coupled with Metal Performance Shaders (MPS). This development marks a step towards expanding ANVIL’s reach and versatility.
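For the curious, the backend fallback described in the sections above maps onto a few lines of PyTorch. This is only an illustrative sketch of CUDA → MPS → CPU selection, not ANVIL's actual detection code:

```python
import torch

def pick_device() -> torch.device:
    """Illustrative backend selection; ANVIL's internal logic may differ."""
    if torch.cuda.is_available():           # NVIDIA GPU with a CUDA-enabled build
        return torch.device("cuda")
    if torch.backends.mps.is_available():   # Apple silicon (M1/M2/M3) via Metal
        return torch.device("mps")
    return torch.device("cpu")              # universal, but significantly slower

device = pick_device()
print(f"Training would run on: {device}")
```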
AMD Graphics Cards on Windows: A Future Possibility
For users with AMD graphics cards on Windows, we acknowledge the current limitation due to Pytorch’s lack of support in this setup. However, we are closely monitoring the progress of ROCm support for Windows. We will attempt to integrate this feature into ANVIL as soon as it becomes feasible, aiming to open up more possibilities for our users.
Linux Users: A Potential Future Release
Lastly, for Linux enthusiasts, it’s worth noting that Linux currently supports ROCm/Pytorch. This compatibility opens up the potential for a Linux version of ANVIL, which we may pursue as an additional offering in the future.
Installation
Effortless Installation: Setting Up ANVIL for Your System
Setting up ANVIL is a breeze. Simply extract it to a single directory, and you’re ready to go. ANVIL’s self-contained nature means everything required to run the software is within the main folder. This portability even allows you to carry ANVIL on a thumb drive, enabling model training across different devices. Datasets and models are stored within this directory by default, but if you wish, you may set the paths to these anywhere on your system or on alternate drives within ANVIL’s settings.
Use "ANVIL - Launcher.exe" within the folder to launch ANVIL.
Matching ANVIL with Your NVIDIA Graphics Card
To harness the full potential of your NVIDIA graphics card, ANVIL requires the appropriate CUDA Toolkit/Runtime drivers. The good news is that most newer drivers include this feature. If ANVIL does not detect the correct CUDA version on your machine, you may need to download the corresponding CUDA Toolkit.
Understanding that different graphics card models align with specific CUDA versions, we offer three variants of ANVIL, each optimized for a range of CUDA releases:
| NVIDIA GPU Series | Compatible ANVIL Version | Notes |
| --- | --- | --- |
| Older Series (e.g., GTX 10xx) | ANVIL – CUDA 10.2 | For legacy support |
| Mid-Range Series (e.g., RTX 20xx) | ANVIL – CUDA 11.7 | Balanced performance |
| Latest Series (e.g., RTX 30xx) | ANVIL – CUDA 12.1 | Optimal for new GPUs |
*For now, we highly recommend researching your GPU model to find the appropriate CUDA version. ANVIL may include auto-detection in future installers.
If you do not have CUDA installed on your system, here is a link to their official archive:
CUDA Toolkit Archive – https://developer.nvidia.com/cuda-toolkit-archive
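If you are unsure which CUDA runtime your setup provides, a quick check from any PyTorch-enabled Python environment (an illustration, not an ANVIL command) looks like this:

```python
import torch

print("PyTorch:", torch.__version__)
print("Compiled against CUDA:", torch.version.cuda)  # e.g. "11.7"; None on CPU builds
if torch.cuda.is_available():
    print("GPU detected:", torch.cuda.get_device_name(0))
else:
    print("No CUDA device detected -- check your driver and toolkit versions")
```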
CPU-Only Version: Universality in Mind
For those without a compatible graphics card, we offer a lightweight CPU-only version of ANVIL. While this version operates at a slower pace, it ensures that everyone has access to the power of audio model training, regardless of their hardware setup.
Training models with ANVIL
ANVIL revolutionizes the process of audio model training by making it intuitive and accessible, regardless of your technical background. From creating and managing audio datasets to fine-tuning hyperparameters and initiating model training, ANVIL guides you through each step with ease. This section provides a comprehensive walkthrough, helping you import audio files to create a dataset, transform that dataset into a trainable model, begin the training process, and monitor its progress.
It’s important to emphasize the flexibility ANVIL offers. At any point during the training process, you have the ability to pause and export the model.
1) Create a Dataset
- Click on the “Datasets” tab
- Click the “+” icon below the empty “New Dataset” container
2) Add samples and metadata
- Click “Input Directory” to choose an input directory of .wav files.
- Leave “Output Directory” at its default unless you want the dataset stored elsewhere
- Adjust the Train/Valid ratio (the default of 80% is recommended); a conceptual sketch of this split appears after the walkthrough
- Converting to 16-bit results in faster training
- Click “Save Dataset”
3) Create a new model from the dataset
4) Set the Hyperparameters
- Name the model
- Leave parameters at their defaults for a well-balanced start
- Advanced users can adjust the hyperparameters (guide provided at the bottom of this page)
- “Output Sample Length” determines the length of audio generated by the model, in number of samples. Currently MACE 1.1.6 only supports 16384 (1 second), but future versions will support longer audio generation
- “Sampling Rate” determines the sampling rate of the training audio. Lower rates sometimes produce more interesting sounds, or better suit the acoustic properties of the input data
- “Number of Iterations” sets the target number of iterations to train the model for
- Click “Save For Later” to store the configuration, or “Run Training” to begin training
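As referenced in step 2, the dataset step conceptually boils down to shuffling your .wav files into train/validation folders and optionally converting them to 16-bit PCM. Here is a rough sketch of that logic; the soundfile library, paths, and function names are assumptions for illustration, not ANVIL’s actual implementation:

```python
import random
from pathlib import Path

import soundfile as sf  # pip install soundfile

def build_dataset(input_dir: str, output_dir: str, train_ratio: float = 0.8) -> None:
    """Illustrative 80/20 train/valid split with 16-bit conversion."""
    wavs = sorted(Path(input_dir).glob("*.wav"))
    random.shuffle(wavs)
    split = int(len(wavs) * train_ratio)
    for subset, files in (("train", wavs[:split]), ("valid", wavs[split:])):
        dest = Path(output_dir) / subset
        dest.mkdir(parents=True, exist_ok=True)
        for wav in files:
            data, sr = sf.read(wav)
            # 16-bit PCM is smaller on disk and faster to load while training
            sf.write(dest / wav.name, data, sr, subtype="PCM_16")

build_dataset("my_samples", "my_dataset")
```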
Generating Audio with Exported Models
Exporting a model in ANVIL is akin to capturing a moment in the model’s learning journey, effectively ‘freezing’ it at its current stage of progress. This process creates a ‘snapshot’ of the model at the specific iteration you choose to pause and export.
Step 1: Export to Freeze Your Progress
- What Happens During Export?: When you export a model, ANVIL saves this snapshot in a dedicated folder named after the iteration number. The folder contains two essential files: a `.pt` file (the model itself) and a `.json` file (metadata about the model).
- Why Export?: Exporting is crucial, as it not only marks the progress of your model but also unlocks its potential to generate audio. These exported files are what MACE uses to produce audio based on the trained model.
- Exporting a model also allows you to view a Loss Visualization graph (added to the actions menu)
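To see what an export actually contains, you can inspect the two files from Python. The file and folder names here are hypothetical (“1000” stands for the iteration number the folder is named after), and whether the `.pt` file loads as a full module or a state dict depends on how ANVIL serializes it:

```python
import json
import torch

with open("1000/MyModel_params.json") as f:
    params = json.load(f)          # metadata about the model
print(params)

# May be a complete module or a state_dict for a matching architecture,
# depending on how the snapshot was saved.
checkpoint = torch.load("1000/MyModel.pt", map_location="cpu")
print(type(checkpoint))
```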
Step 2: Generate Audio Using Exported Models
- Generate Audio option: Once you’ve exported a model at least once, ANVIL’s actions menu is updated to include the “Generate Audio” option. This feature utilizes the exported model for audio generation. Clicking it will take you to the Audio Generation dialog.
Step 3: Generating Audio Parameters
- “Number of Samples”: The number of samples you wish to generate. Go ahead and try thousands at once! Audio generation should be very quick on most machines.
- “Iteration”: Shows how many times you have exported the model throughout the training process, and lets you go “back in training time” to a specific earlier snapshot if desired.
- “Output Directory”: This is where ANVIL will generate the .wav files from the model. Upon clicking “Generate Audio”, a short loading dialog will appear, followed by Explorer opening the location of the .wav files.
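Generation is fast because producing N samples from a GAN generator is typically a single batched forward pass. The sketch below illustrates the idea with a tiny stand-in generator; every name and value here is an assumption for illustration, not ANVIL’s API:

```python
from pathlib import Path

import soundfile as sf
import torch
import torch.nn as nn

latent_dim, sample_rate, out_len = 100, 16000, 16384  # assumed values

# Stand-in generator so the sketch runs; the real architecture differs.
generator = nn.Sequential(nn.Linear(latent_dim, out_len), nn.Tanh())

num_samples = 8
with torch.no_grad():
    z = torch.randn(num_samples, latent_dim)   # noise latent vectors
    audio = generator(z).cpu().numpy()         # shape: (num_samples, out_len)

Path("out").mkdir(exist_ok=True)
for i, clip in enumerate(audio):
    sf.write(f"out/gen_{i:04d}.wav", clip, sample_rate)
```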
Experimentation
Exporting and generating audio with different iterations of your model allows you to explore the nuances of machine learning in audio generation. Each exported model can yield unique audio outputs, offering a hands-on experience of how different training stages influence the final result. This process not only provides tangible outputs but also deepens your understanding of the intricate dance of training machine learning models and creative sound design.
Using Exported Models within MACE
If you wish to use an exported model within MACE, simply move the `.pt` and `.json` metadata files from an exported iteration into a folder named whatever you choose, and place that folder in the location where the “Early Access Library” folder has been installed on your machine.
The structure of your folder should appear as:
/Model Name/
    ModelName.pt
    ModelName_params.json
    icon.png (optional)
The default location of the Early Access Library for MACE is installed at:
"C:\Users\yourusername\Documents\Tensorpunk\MACE\models\Early Access Library"
Icon File Customization
If you wish to have a custom icon appear in MACE, simply place a 249×85 `.png` image in the same folder as the `.pt` file. Users are encouraged to be creative; however, templates of the official look and feel will also be provided in the near future.
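Any image editor works for this. As a quick programmatic starting point, a blank 249×85 placeholder can be generated with Pillow (an illustration, not a requirement):

```python
from PIL import Image, ImageDraw

icon = Image.new("RGB", (249, 85), color=(25, 25, 25))
ImageDraw.Draw(icon).text((10, 38), "My Model", fill=(235, 235, 235))
icon.save("icon.png")
```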
Hyperparameters Guide
Hyperparameters in ANVIL play a crucial role in optimizing your model training process. Understanding each parameter helps you customize your model’s training to achieve the best performance. Here’s a guide to the key hyperparameters in ANVIL, along with descriptions and tips for tweaking them:
- Number of Iterations
- Description: Specifies the number of training iterations.
- Tips: A higher number of iterations might lead to a better-performing model but could also result in overfitting. Start with a moderate number and adjust based on the model’s performance. You may pause and export the model for generation at any time; this parameter mostly sets a target so machine learning can work toward that goal in the background.
- Noise Latent Dim
- Description: Size of the noise input for the generator.
- Tips: Adjusting this can affect the diversity and quality of generated audio. Experiment with different sizes to find the optimal balance for your dataset.
- G Learning Rate
- Description: The step size at which the generator model learns.
- Tips: A smaller value means a slower learning rate, which can be more precise but slower. Consider starting with the default and adjust based on how quickly your model is learning.
- D Learning Rate
- Description: The step size at which the discriminator model learns.
- Tips: Can be set larger than the generator’s learning rate as per the Two Time-Scale Update Rule (TTUR). This helps in stabilizing the training of the discriminator against the generator.
- Batch Size
- Description: The number of samples processed in one iteration.
- Tips: A larger batch size requires more memory but can result in faster training. However, too large a size might impact the model’s ability to generalize.
- Generator Batch Size Factor
- Description: The factor by which the batch size for the generator is increased.
- Tips: Useful when the generator is more complex than the discriminator. Adjusting this can help balance the training between the two.
- Beta1 and Beta2 (Momentum Terms)
- Description: Momentum terms for optimization.
- Tips: Beta1 and Beta2 control the moving averages of the gradients and squared gradients. Adjusting these can affect the convergence speed and stability of the training.
- N Critic
- Description: The number of times the discriminator is trained per generator training iteration.
- Tips: A higher number of critic iterations can lead to more stable training but can also result in slower training.
- P Coeff (Gradient Penalty Term)
- Description: The coefficient for the gradient penalty term.
- Tips: A higher value can lead to more stable training but can also result in slower training. Adjust based on the stability of your training process.
- Additional Settings
- Normalize Audio: Normalizes the output audio for consistent volume levels.
- Validate: Uses a validation set during training to monitor model performance.
- Use Batchnorm: Applies batch normalization to the input audio, which can aid in model training stability.
- Decay LR: Implements a decay in the learning rate over time to fine-tune the training in later stages.
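To make these parameters concrete, the sketch below shows how such settings typically wire into a GAN training loop in PyTorch: TTUR learning rates, Adam momentum terms, N Critic discriminator steps, and a gradient penalty coefficient, in the style of WGAN-GP. The stand-in networks and random “audio” are there so the example runs; this is the standard recipe these terms suggest, not ANVIL’s actual internals:

```python
import torch
import torch.nn as nn

# Hyperparameters (names mirror the guide above); values are illustrative.
noise_latent_dim = 100
g_learning_rate  = 1e-4        # generator step size
d_learning_rate  = 4e-4        # TTUR: discriminator may learn faster
beta1, beta2     = 0.5, 0.9    # Adam momentum terms
batch_size       = 16
n_critic         = 5           # discriminator updates per generator update
p_coeff          = 10.0        # gradient penalty coefficient
num_iterations   = 10
sample_length    = 1024        # toy stand-in for "Output Sample Length"

# Tiny stand-in networks so the sketch runs end to end.
G = nn.Sequential(nn.Linear(noise_latent_dim, sample_length), nn.Tanh())
D = nn.Sequential(nn.Linear(sample_length, 1))

opt_G = torch.optim.Adam(G.parameters(), lr=g_learning_rate, betas=(beta1, beta2))
opt_D = torch.optim.Adam(D.parameters(), lr=d_learning_rate, betas=(beta1, beta2))

def gradient_penalty(real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    # Penalize the critic's gradient norm on interpolated samples (WGAN-GP).
    eps = torch.rand(real.size(0), 1)
    mix = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grad = torch.autograd.grad(D(mix).sum(), mix, create_graph=True)[0]
    return ((grad.norm(2, dim=1) - 1) ** 2).mean()

for it in range(num_iterations):
    for _ in range(n_critic):                              # N Critic loop
        real = torch.randn(batch_size, sample_length)      # placeholder batch
        fake = G(torch.randn(batch_size, noise_latent_dim)).detach()
        d_loss = D(fake).mean() - D(real).mean() + p_coeff * gradient_penalty(real, fake)
        opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    fake = G(torch.randn(batch_size, noise_latent_dim))    # one generator update
    g_loss = -D(fake).mean()
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```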
The Future of ANVIL
At its core, ANVIL is about experimentation and discovery, offering a gateway for those new to machine learning to immerse themselves in this fascinating field without the need for in-depth coding knowledge. More ML-based techniques could be added to the platform over time, offering further options to explore new sonic territories.
A Platform for Learning and Growth
ANVIL stands as more than just a tool; it’s a learning platform. It encourages users to experiment with different hyperparameters, datasets, and models, turning the complex world of machine learning into an accessible and educational experience. This hands-on approach provides invaluable insights into how machine learning works, particularly in the realm of audio.
Community Collaboration: Sharing Knowledge
We invite our users to join our Discord community to exchange ideas, tips, and tricks about hyperparameters, share models you’re training or have trained, and even tweak them based on collective insights.