Kozmik Kandi

Cosmos Overview

Sebastian — Tue, 05 Apr 2022 18:33:33 +0000

If you haven’t heard yet, the music software company Waves has recently released a free sample finder called Cosmos. This new piece of software utilizes AI and multiple tag and search functions to help you filter through your sample library and includes a Waves Audio Factory Pack which contains 2,512 different samples.

I should note that COSMOS is a standalone desktop application, not a plugin, but you can still integrate it with your DAW because you can drag and drop samples out of it. To use COSMOS, you’ll have to download it from the Waves website and install and activate it via the Waves Central app. You also have the option of logging into your Waves account within COSMOS.

MAIN INTERFACE

At first glance, you’ll see that COSMOS consists of a search section and tag menus on top, a sample folder manager on the left, and a sample browser. The browser lets you audition each sample and shows its waveform, name, descriptive tags, length, key, and BPM. From the browser you can also favorite a sample, add it to a collection, or see its location in the COSMOS view.

LOCAL FOLDERS PANEL

On the left-hand side of the screen, you’ll see the Local Folders panel where you can browse through, manage, and add sample folders. Waves conveniently provides you with the Waves Audio Factory Pack containing 2,512 free samples to get you started.

To add a sample folder, click on the “+” icon. When you add a new folder, COSMOS scans each sample in the folder, extracts data, and creates relevant search tags for each sample. Scanning isn’t recommended while you’re running a CPU-heavy session.

In the sidebar on the left, you’ll notice an icon to the right of the folder icon. Clicking it takes you to your collections. Waves describes collections as “… a virtual folder that helps you organize samples that you have located.” You can create a new collections folder by clicking the “+” icon in the sidebar.

SAMPLE SEARCH

COSMOS provides you with a few different options to search and filter through samples.

Search: Use the search bar to find samples by name or other descriptive information

Loop vs One Shot: Another search option is the ability to choose whether you’re looking for a loop or one shot. This might seem like a basic filter option but it’s a great way to refine your search.

Tags: Use tags to further filter your search and get more specific results. Tags describe different attributes of a sample file. As you select tags, the tags suggestion bar will change its results by adding, removing, or changing tags. There is a “Clear” button to remove all the tags you selected and restart your search.

You can use all of these search options in conjunction with each other. For example, I can use the search bar and tags at the same time to filter through samples.

TAG MENUS

Below the tags section there are three dropdown menus: Instrument, Tags, and Key & BPM.

Instrument: COSMOS uses machine learning to figure out what instrument or family of instrument was played in a sample.

Tags: This shows all available tags based on your previous searches and filtering.

Key & BPM: Clicking this brings up a small interface that allows you to change BPM and Key. There’s a slider that lets you pick a range to search around for the BPM, two buttons that allow you to search for double and half-time samples, and a keyboard to filter samples by key with the options to choose major, minor, or relative key.
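To make the BPM part of this filter a bit more concrete, here is a minimal sketch of how a tempo range with half-time and double-time options could work. It is not how COSMOS is implemented; the `Sample` class, the function names, and the example library are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    # Hypothetical record: just a name and a detected tempo.
    name: str
    bpm: float


def matches_bpm(sample_bpm, target_bpm, tolerance=0.0,
                include_half=False, include_double=False):
    """Return True if sample_bpm is within tolerance of the target tempo,
    or of its half/double-time equivalents when those are enabled."""
    candidates = [target_bpm]
    if include_half:
        candidates.append(target_bpm / 2)
    if include_double:
        candidates.append(target_bpm * 2)
    return any(abs(sample_bpm - c) <= tolerance for c in candidates)


def filter_by_bpm(samples, target_bpm, **options):
    """Keep only the samples whose tempo passes matches_bpm."""
    return [s for s in samples if matches_bpm(s.bpm, target_bpm, **options)]


# Made-up library for the example.
library = [
    Sample("house_loop", 124.0),
    Sample("halftime_drums", 62.0),
    Sample("dnb_break", 174.0),
    Sample("fast_hats", 248.0),
]

hits = filter_by_bpm(library, 124, tolerance=2,
                     include_half=True, include_double=True)
print([s.name for s in hits])
```

With both options on, a 124 BPM search also picks up the 62 and 248 BPM loops, which is why the double and half-time buttons are handy when a loop was tagged at a different feel than the one you are working in.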

If COSMOS didn’t correctly categorize your imported samples, then there is an option to modify tags and attributes. Click the three dots on the far right side of the sample in the sample browser section or right-click on the sample’s name or waveform. This will show a drop-down menu with the option to “Edit Tags” or “Edit Attributes” as well as other information about the sample.

COSMOS VIEW

You have the option to see samples visually in a cosmos by clicking on the orbital icon.

In this view, COSMOS analyzes your samples and groups and colors them based on how similar they are to each other. The closer two samples are, the more similar they sound. You can change how the samples are arranged by selecting one of the options in the map selection drop-down. Each map arranges the samples along the horizontal axis according to the property you’ve chosen. For example, in the “Space” map view, dry samples appear on the left and wet samples on the right. The “Brightness” map view arranges the darkest samples on the left and the brightest on the right, and so on.
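Waves doesn’t spell out how these maps are computed, so the following is only a rough sketch of the idea behind a left-to-right “Brightness” ordering. It assumes brightness can be approximated by the spectral centroid (the magnitude-weighted average frequency), which is a common proxy but my own assumption, and it uses three synthetic sine tones in place of real samples.

```python
import cmath
import math


def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency of the signal, using a naive DFT."""
    n = len(signal)
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):
        bin_value = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitude = abs(bin_value)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total else 0.0


def sine(freq, sample_rate=8000, n=256):
    """Generate a short test tone to stand in for a real sample."""
    return [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(n)]


# Hypothetical "samples": one dark, one mid, one bright.
samples = {
    "bright_hat": sine(3000),
    "dark_sub": sine(125),
    "mid_pluck": sine(1000),
}

# Darkest on the left, brightest on the right.
left_to_right = sorted(
    samples, key=lambda name: spectral_centroid(samples[name], 8000)
)
print(left_to_right)
```

A real implementation would use an FFT library and many more features than a single number, but the principle is the same: measure a property of each sample, then use it as the horizontal position.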

There is a gradient switch that changes the color scheme of the samples to suit the map you selected. When the switch is deactivated, the samples will be shown in their original colors, which are based on the instrument’s family.

If you click on a sample in COSMOS view, it will appear at the top right of the screen. Press the drop-down arrow above to expand it and view adjacent samples. You can use your keyboard up and down arrows to browse through these samples. The selected sample can also be dragged and dropped outside of COSMOS, or added to collections and favorites.

On the bottom left of the screen you can see a history bar. Click on the left and right arrows to browse between samples you’ve auditioned. Keyboard arrow keys work here as well.

If you would like to download your free copy of COSMOS click here.

Music Rebalance vs RipX Deep Remix

Sebastian — Sat, 12 Feb 2022 20:26:15 +0000

As computers become faster and machine learning algorithms become more sophisticated, we can accomplish increasingly complex tasks with better performance. One of these tasks in the sound world is audio source separation, the process of separating a full mix into instrument categories. These categories tend to be vocals, bass, percussion, and other. There are a number of products on the market that attempt to do this task, with iZotope’s Music Rebalance module in RX being the most widely known in my estimation. However, iZotope isn’t the only player in town when it comes to integrating machine-learning algorithms in its software. A London-based company called Hit’n’Mix offers a product called RipX Deep Remix which also performs audio source separation.

In this article, we will take a look at both products and compare their features and most importantly their performance in audio source separation.

Features

Music Rebalance is a module that is included in the Standard and Advanced versions of iZotope RX. It separates a song into four categories (voice, bass, percussion, and other), and the volume for each category can be adjusted as well as muted or soloed.

There are three options for quality: Good, Better, and Best. Good provides faster than real-time separation at the cost of quality, while Better and Best return higher quality results at the cost of slower rendering times.

Next to the Quality setting is the separation slider, which allows you to decide the amount of isolation between the four categories. Moving the slider to the left reduces artifacts in the separated audio but lets in more bleed from other categories, while moving the slider to the right provides you with more isolation at the cost of introducing more artifacts in the processed audio.

After you have adjusted the settings to your liking, you can select either Render or Separate. Render replaces the audio, while Separate creates individual audio files for each category.
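Once a mix has been split into stems, the volume, mute, and solo controls boil down to summing the stems back together with a gain on each one. Here is a toy sketch of that remix step, assuming the four stems already exist as equal-length lists of sample values. It is not iZotope’s processing, and the function name and the tiny two-sample “stems” are invented for the example.

```python
def rebalance(stems, gains_db, muted=()):
    """Sum equal-length stems after applying a per-stem gain in decibels."""
    length = len(next(iter(stems.values())))
    mix = [0.0] * length
    for name, audio in stems.items():
        if name in muted:
            continue  # a muted stem contributes nothing to the mix
        gain = 10 ** (gains_db.get(name, 0.0) / 20)  # dB -> linear amplitude
        for i, value in enumerate(audio):
            mix[i] += gain * value
    return mix


# Made-up two-sample stems standing in for separated audio.
stems = {
    "voice": [0.5, 0.5],
    "bass": [0.2, -0.2],
    "percussion": [0.1, 0.0],
    "other": [0.0, 0.1],
}

# Pull the vocal down 6 dB and mute the percussion.
mix = rebalance(stems, {"voice": -6.0}, muted={"percussion"})
print(mix)
```

Soloing a stem is the same operation with every other stem muted.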

RipX Deep Remix is similar to Music Rebalance but with more features and flexibility. After the audio is analyzed, the resulting stems are listed in the bottom right side of the screen, and just like Music Rebalance, can be muted, soloed, and have their volumes changed.

The main difference is the area of the UI that shows the notes of the stems laid out on a piano roll. Here, we can adjust the pitch of notes by moving them vertically and their timing by moving them horizontally, similar to Melodyne for those familiar with it. Effects including reverb, delay, panning, vibrato, and filtering can be applied to the individual notes of the instruments. We can even export the notes of the stems to midi.

Algorithm Comparison

As we can see, RipX Deep Remix has all the same features as Music Rebalance and more. However, what matters most is how well the separation algorithms of both programs perform.

To compare the algorithms, I chose the song Warrior by Andy Caldwell. I thought the song had a great mix and the instrumentation was relatively simple. Therefore, both programs would have a fair shot at showing how well they work. As for the settings I used for each program, I had the Quality option in Music Rebalance set to Best and the separation slider at 100% to achieve maximum separation. As for RipX, there were no parameters available to tweak.

Reference Mix

Andy Caldwell - Warrior

Vocals

When it comes to the vocals, both programs have some bleeding. In the RipX version, there appears to be some bleed from the drums, while Music Rebalance has it from the guitar and synth instruments. There are also noticeable phasing and gurgling artifacts in the Music Rebalance version.

Vocals - RipX

Vocals - Music Rebalance

Bass

For the bass, both Music Rebalance and RipX do a good job isolating the sound. There is very little bleed in the Music Rebalance example and virtually none in the RipX one. With that said, RipX does a better job at retaining higher frequencies that are part of the bass sound, making it sound a bit brighter and more plucky.

Bass - RipX

Bass - Music Rebalance

Percussion

As for the drums, Music Rebalance seems to capture more of the reverb compared to RipX, which seems drier. There are also more midrange frequencies present in the Music Rebalance file. However, RipX gives us less bleed and fewer phasing artifacts. Additionally, the RipX version has much clearer transients, unlike Music Rebalance, where the transients are blurry and seem to be preceded by a swooping sound.

Percussion - RipX

Percussion - Music Rebalance

Other instruments

Finally, when it comes to other instruments, both programs offer good isolation. RipX returns very low-level drum bleed while Music Rebalance yields virtually none. However, Music Rebalance has audible phasing artifacts and fewer high frequencies compared to RipX.

Other Instruments - RipX

Other Instruments - Music Rebalance

CONCLUSION

Both iZotope’s Music Rebalance module and Hit’n’Mix’s RipX Deep Remix software provide decent audio separation results. Sure, both programs produce artifacts and are not perfect, but I expect the algorithms to improve over time, possibly to a point where we have virtually no bleed or artifacts, resulting in stems that are indistinguishable from the original.

With that said, I believe that RipX Deep Remix is a more convincing product than Music Rebalance. The separation algorithm, in my opinion, performs the separation task better, and the software can do more things like change the pitch and timing of the notes. With regard to price, RipX can be purchased for just $99 while Music Rebalance costs $399 (RX 9 Standard). Granted, you are getting more tools by purchasing RX 9, but it’s not ideal if you are solely interested in acquiring Music Rebalance. However, you can get RX 9 along with many other products at iZotope for $199, if you are okay with a yearly subscription.

Creating a Song with NSynth

Sebastian — Sun, 24 Oct 2021 23:09:51 +0000

This blog post is a follow-up to our latest YouTube video, “NSynth Ableton Live Instrument – Overview and Song,” where we gave an overview of what NSynth is and showed a song that we made using the Ableton Live Instrument version of it.

ABOUT NSYNTH

NSynth is described as a neural synthesizer that uses “deep neural networks to generate sounds at the level of individual samples.” The NSynth Max for Live device features an X-Y grid with the ability to morph between instruments, snap to grid and zoom functions, start time, playback speed, and control over attack, decay, sustain, and release. You can load one of the pre-made grids or create a grid of your own which will require some extra knowledge and work.

THE TRACK

Drums

For the drums, I knew I would have to resample and edit a bit to get a desirable sound, so I tweaked some of the pre-made NSynth grids until a couple of sounds with a nice transient were made. The sounds were resampled and used with Ableton’s sampler, Simpler. A lot of resonant low-pass and high-pass filtering was used to get the kicks, claps, and hi-hats to stand out a bit more and remove unneeded frequencies. I then experimented a bit by transposing each sound in the drum kit to taste to see which samples benefitted from it. Envelope filtering was then used to get the samples to sound more drum-like by shortening the sample length and creating more of an attack to make kicks sound more like kicks and claps sound more like claps. After a base was established with the resampled NSynth sounds, layering of additional sounds from my sample library helped make things pop even more. Finally, the drums were processed through parallel compression, parallel saturation, and parallel EQ to add more overall depth and focus.

Bass

For the bass, some time was spent experimenting and tweaking the different grids in NSynth until I got something I liked. The sound was recorded and sampled into Ableton’s Simpler so that I could further control and manipulate it. Once in Simpler, I used high-pass filtering and envelope shaping to taste to see what sounded good. Once a desirable result was achieved, I then applied EQ for further shaping, compression to control the sound, chorus to widen and add depth, parallel saturation to stand out in the mix, side-chain compression to give it movement and groove, and an additional compressor to taste.

Synth

The same process was used for the synths, tweaking and resampling one of the pre-made NSynth grids until a desirable result was created. Ableton’s Simpler was only used to play the samples to save resources, and no editing was done with it. The main synth that you hear in the first part of the verse has a wide array of processing applied. A comb filter was experimented with to shape the sound into a more gritty electronic timbre, followed by compression for control, chorus for width and depth, and a large reverb to make it spacey. The synth that comes in during the second part of the chorus only has a large reverb applied to give it space.

Pad

To create the pads, I ended up using Xfer Records Serum instead of NSynth. Two different presets were chosen to use as a base for each pad. The pad in the first chorus is being processed with EQ to make the highs pop and remove some low end, chorus for width and depth, large reverb for space, compression for control, and sidechain compression to add some groove. The pad that comes in for the bridge and final chorus is being run through the same chain with additional saturation applied to add grit and a sort of lo-fi timbre.

FX

The FX consist of NSynth samples and samples from my library. I layered an impact with some of the claps and I also added reversed sounds every 8 bars in the verse and every 4 bars in the chorus with a few placed in other sections. The reversed sounds have changes in transposition and volume automation to make them fit the track better time-wise but no other processing applied. For the impact, only transposition adjustments and volume automation were used to control the timbre and decay of the sample.

CONCLUSION

NSynth is an interesting tool that I recommend checking out. I think it has potential for some unique sounds and I’m intrigued to hear what can be made from self-made grids. Resampling goes a long way with the sounds you generate. I definitely see it as another tool to extend your palette of sounds in your production.

5 Beats with Magenta Studio

Sebastian — Thu, 02 Sep 2021 20:33:58 +0000

In this article, we go over Magenta Studio and some music examples we’ve created with it. As per the Magenta site, “Magenta Studio is a collection of music plugins built on Magenta’s open-source tools and models. They use cutting-edge machine learning techniques for music generation”. Magenta Studio offers 5 unique modules that generate midi using machine learning. If you are a Max for Live user, you can use these plugins in Ableton Live, or you can use them as standalone applications.

CONTINUE PLUGIN

Continue Beat

The Continue plugin takes a midi input (drums or melody) and elaborates on that midi input for up to 32 measures. Once you choose your midi input, you have 3 parameters to tweak (these parameters repeat on most of the modules). Variation determines how many different midi clips you’d like to generate, each one being unique. Length determines how many bars your midi clip extends for. I would describe temperature as how simple or complex the generated midi comes out, with lower values being simpler and higher values being more complex.
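For anyone curious what temperature usually means under the hood, here is a small sketch. In most generative models, the scores for each possible next event are divided by the temperature before one is picked at random, so low values stick to the most likely option and high values spread the choices out. This is the general technique rather than Magenta’s actual code, and the scores below are invented.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng):
    """Draw an index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Hypothetical scores for four candidate next notes; index 0 is the "safe" choice.
logits = [4.0, 2.0, 1.0, 0.5]
rng = random.Random(0)

for temperature in (0.2, 1.0, 2.0):
    picks = [sample_with_temperature(logits, temperature, rng) for _ in range(1000)]
    share = picks.count(0) / len(picks)
    print(f"temperature {temperature}: top choice picked {share:.0%} of the time")
```

The lower the temperature, the more often the safe choice wins, which lines up with lower values sounding simpler and higher values sounding more adventurous.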

For the Continue beat I created a 1 bar long bass lick to be used as the midi input for the plugin. I generated 8 variations, extending the midi 2 bars, and filtered through them to select which one sounded best. That new selection was then used as the midi input for the plugin, and 4 more variations were generated, extending it 4 more bars. One of those newly generated variations was chosen and extended even further. I continued this process one more time and generated my final batch, creating 4 variations extending the midi clip to be a total of 9 bars. A final midi clip was chosen to be used in the track.

For the drums, I used the same process: an initial midi clip was created and used as an input for the plugin. A few different midi clips had to be created from scratch to use as inputs for the plugin, but some usable midi was generated nonetheless.

From there, I laid out the drum and bass midi clips in Ableton’s arrangement view and got into production.

Drums

For the drums, I tweaked Ableton’s 808 Core Kit using filters, saturation, and compression on some of the individual drum sounds and layered it with another drum rack with samples of my own to give it a unique punchy sound. In a separate audio track, a foley clap sample is playing on every other snare hit.

Bass

For the bass, Novation’s Bass Station VST was used and processed through two parallel processing chains. One chain has a chorus effect, EQ, and compression, and the second chain has saturation. Both chains are layered over a clean chain. After the parallel processing, the whole thing is run through EQ, saturation, and 2 different compressors.

Melody

For the melody, Massive’s default preset was used with an automated low pass filter for the swelling effect followed by chorus, compression, reverb, more compression, and saturation.

FX

A reverse cymbal was placed in the beginning, a reversed noise sound was placed before one of the claps, and a little electronic beep was added into the mix. I find these little touches added some interest to the track.

GENERATE

Generate Beat

The Generate plugin generates a 4-bar midi clip with no midi input required. As per Magenta’s description, “Generate uses a Variational Autoencoder that has been trained on millions of melodies and rhythms to learn a summarized representation of musical qualities. Generate chooses a random combination of these summarized qualities and decodes it back to MIDI to produce a new musical clip.” For parameters, you have control over how many unique variations it generates and the temperature of those midi clips.

For the Generate beat, 8 midi variations were generated for the drums. After choosing a midi clip for the drums, some melody midi clip variations were generated, and a clip was chosen. The midi clips were laid out in Ableton’s arrangement view and an 8-bar loop was created.

Drums

For the drums, a drum rack was created from scratch with my own samples. Some envelope shaping, filtering, and layering was applied to the individual samples, followed by compression and saturation being applied to the whole drum rack. Some of the midi had to be duplicated to trigger the layered sounds in the drum rack.

Keys

The keys are Ableton’s Instrument Rack Preset “Det Keys” being run through a chorus effect, EQ, another EQ, and a reverb send.

Bass

The bass is Novation’s Bass Station VST being run through two parallel chains. One chain is a compressor and chorus effect. The second chain is a saturation plugin. The whole effect rack is being run through an EQ, compressor, another EQ for some final shaping, and a reverb send.

Pad

The pad is Massive’s “Dandelion Chorus” preset being processed through an EQ, compressor, chorus effect, bandpass EQ, and reverb.

FX

For effects, a single guiro strike was processed through a comb filter, chorus, and EQ. An ambient clap on the first snare hit every 4 bars goes into a compressor, EQ, chorus, reverb, saturation, and low pass filter.

INTERPOLATE

Interpolate Beat

The Interpolate plugin takes two different midi clips as inputs and generates up to 16 hybrid variations of the two midi clips. Magenta goes on to describe, “Interpolate also uses a Variational Autoencoder (VAE) similar to Generate. One way to think of the VAE is as a mapping from MIDI to a compressed space in which similar musical patterns are clustered together. Each of your input patterns is represented by a position on this map. Interpolate draws a line between these positions and returns clips along this line.” For parameters, there is “steps” which controls how many unique variations are generated and “temperature”.
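The “draws a line between these positions” part of that description can be sketched in a few lines. The real plugin first encodes each MIDI clip into the VAE’s compressed space and decodes every point back into MIDI. Neither step is shown here, and the 3-dimensional positions are made up. The sketch only shows the straight-line step in between.

```python
def interpolate(start, end, steps):
    """Return `steps` evenly spaced points on the straight line from start to end."""
    if steps < 2:
        raise ValueError("steps must be at least 2 to include both endpoints")
    points = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the first clip, 1.0 at the second
        points.append([(1 - t) * a + t * b for a, b in zip(start, end)])
    return points


# Hypothetical 3-dimensional positions for two input clips.
clip_a = [0.0, 1.0, -2.0]
clip_b = [4.0, 1.0, 2.0]

for point in interpolate(clip_a, clip_b, steps=5):
    print(point)
```

Each point would then be decoded into its own clip, so the early steps resemble the first input and the later steps resemble the second.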

For the Interpolate beat, 4 different midi drum patterns were created: a trap genre pattern, house, another house variation, and a drum and bass pattern. 32 different variations were generated using different combinations of the 4 beats as inputs for the plugin. I then went on to filter through the clips to see what worked. For the bass, 3 different basslines were created and combinations of those were used as inputs for the plugin to generate 8 variations. The final midi clip selections were put into Ableton’s arrangement view to create an 8-bar loop.

Drums

For the drums, Ableton’s SwanSong Kit was layered with my own drum rack. Some filtering, envelope shaping, and compression was used on the individual drum sounds with some layers added to the kick, claps, and hi-hats. My custom drum rack is being processed through a compressor and saturation plugin. The chain for the noisy toms is saturation, multiband compression, low pass filter, and EQ.

Bass

For the bass, Ableton’s “Bass Floor Bounce” instrument was run through EQ, compression, chorus, and a final compressor for the mix.

Synth

For the synth part, Massive’s 1991 preset was run through chorus, compression, reverb, saturation, some clean up EQ, mix EQ, and mix compression.

FX

A reversed open hi-hat with a comb filter, chorus, and compression was used.

GROOVE

Groove Beat

The Groove plugin takes a midi drum pattern and humanizes it. The clip it generates has variations in velocity and timing, giving it the feel of a “live” drummer. Magenta recorded 15 hours of drummers playing on midi drum kits and used this data to train a neural network to predict the unquantized beats as the output.
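To show what “variations in velocity and timing” looks like in data terms, here is a deliberately simple stand-in that nudges each note by a small random amount. To be clear, this is not what Groove does: Groove predicts the offsets with its trained neural network, whereas this sketch just rolls dice. The note format and parameter names are my own.

```python
import random


def humanize(notes, timing_jitter=0.02, velocity_jitter=10, seed=None):
    """Nudge each (start_beat, velocity) pair by a small random amount.

    A random stand-in for illustration only; Groove predicts these offsets
    with a trained neural network instead of drawing them at random.
    """
    rng = random.Random(seed)
    result = []
    for start, velocity in notes:
        new_start = max(0.0, start + rng.uniform(-timing_jitter, timing_jitter))
        new_velocity = velocity + rng.randint(-velocity_jitter, velocity_jitter)
        result.append((new_start, min(127, max(1, new_velocity))))
    return result


# A quantized hi-hat line: eighth notes at a fixed velocity.
quantized = [(i * 0.5, 100) for i in range(8)]
for start, velocity in humanize(quantized, seed=42):
    print(f"{start:.3f}  {velocity}")
```

Random jitter like this tends to sound sloppy rather than human, which is exactly why a model trained on real drummers is the more interesting approach.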

For the Groove beat, a midi clip of a drum and bass pattern was used as an input for the plugin. I ended up using the first generated clip because I liked it so much. The midi was then sliced up and rearranged in Ableton’s session view to create the foundation of the beat.

Drums

For the drums, I created my own drum rack layering kick, snare, clap, and hi-hat samples. Some filtering, envelope shaping, and compression was used on the individual drum samples. The whole rack is being run through a compressor, parallel saturation chain, EQ, and mix EQ. A lot of the midi was duplicated to trigger the drum layer samples.

Synth Bass

For the synth bass, I used Novation’s Bass Station plugin being run through a compressor, parallel saturation chain, more saturation, chorus, and EQ.

Synth Lead

For the synth lead, Ableton’s “Flute Bounty” instrument was used and processed through a chorus effect, EQ, saturation, compression, bandpass EQ, and reverb.

FX

A clap sample was layered in on the first beat every 4 bars. To make the clap sample more dramatic, chorus, saturation, compression, bandpass EQ, and reverb were used.

DRUMIFY

Drumify Beat

The Drumify module takes a midi input and generates a groove based on the rhythm of that input. It can be used to create an accompanying rhythm for a bassline, melody, or create a drum track from a tapped rhythm. You can tweak temperature on the module.

For the beat, a simple bass line was used as the midi input for the module. 8 different variations were generated, and one was chosen. I built the track off that midi clip.

Drums

A drum rack was created from scratch with my own samples. Claps and snares were layered in and some envelope shaping, filtering, and compression were applied to the individual drum elements, with compression being applied to the whole drum rack.

Synth Lead

For the synth lead, Novation’s Bass Station plugin was processed with saturation, chorus, and reverb.

Bass

For the bass, Kontakt’s Scarbee Bass was processed with EQ, compression, a sub filter plugin, parallel saturation, and a chorus plugin.

CONCLUSION

It was a fun time using the Magenta Studio plugins to create these tracks! I’m incredibly satisfied with the results. Obviously, the production plays a huge role in the feel of the track but a lot of the midi that was generated really created some interesting ideas. These ideas could definitely be elaborated on to create full songs. It’s a useful tool and interesting to use for getting ideas, but it’s ultimately up to you on how creative you can get with the midi clips it generates.

XO Drum Sampler Review

Sebastian — Tue, 01 Jun 2021 20:50:26 +0000

Recently I released a video about Atlas, a drum sampler plugin that organizes and categorizes drum samples using artificial intelligence, helping producers create their drum sounds faster.

Another plugin that aims to do the same is XO by XLN Audio. Developed by the creators of the well-known Addictive Drums software, XO is also a drum sampler that utilizes artificial intelligence to let artists quickly generate their drum sounds.

So let’s see what XO is all about.

SPACE

When you load XO, you are greeted with a welcome screen that briefly describes and visually points out the main areas of the plugin. This is helpful for new users and can be turned off if you already know your way around.

The XO SPACE page is where we see our samples mapped out as dots, with each color representing different categories. Not only are the sounds categorized, but similar sounds are placed near each other. XLN doesn’t mention how they determine this, but I noticed that samples that I would consider similar were near each other.

SAMPLE SLOTS

On the left side of the plugin are eight sample slots that represent your drum kit. This is where drum samples from the Space are loaded.

Each slot provides basic controls like gain, mute, and solo, and you can also go to the previous or next similar sample. When a sample in the Space is selected, the Space load buttons will appear next to the slots and can be clicked to load the sample to that slot. You can also hot-swap samples by clicking on the sample dots so that they are immediately swapped out when clicking on sounds in Space or in the Similarity List.

Below the sample slots is the kit similarity feature which is a quick way to change up your kit. It replaces each sound in the kit with the next most similar sound. When you have a kit that’s in the ballpark of what you want and want to explore other similar-sounding kits, this is a helpful feature to use.

SEARCH & FILTER

The Search & Filter panel lets you find samples based on various criteria.

You can search for sounds based on their names and their file path. They can be searched by category so that the Space only shows kick and snare samples for instance. By clicking favorites, only samples that have been favorited show in the Space and you can hide/show samples based on the folders they are from. You can also filter sounds based on their main frequency, their length, and by drumminess or how percussive they are. A sample of a guitar for instance would rank low on drumminess.

SAMPLE FOLDERS

XO comes with thousands of hand-picked drum samples which sound well produced. You can add your own sounds by clicking on the samples folder icon and dragging your folders in or by clicking the Add New Folder button to browse for the folder on your computer. If you add or remove samples from any of your folders, XO will let you know immediately so that you can perform a fresh rescan and keep the Space updated.

EDIT

The edit section is where we can tweak the samples in our kit, create drum patterns, and add effects and processing to the entire kit.

Sequencer

XO’s sequencer allows you to sequence grooves for each sample as well as adjust other parameters. The velocity of a step can be adjusted by selecting it and dragging your mouse up or down. The timing/groove can be adjusted globally for the entire sequence or individually for each sequence lane. At the bottom is the Accentuator which creates dynamic variation by increasing and decreasing the velocity of steps at various intervals (1/2 notes, 1/4 notes, etc.). This is a useful way to add a more human feel to your grooves.
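XLN doesn’t document exactly how the Accentuator computes its changes, so this is just a guess at the general idea: push the velocity up on steps that land on the chosen interval and pull it down everywhere else. The function and its parameters are hypothetical.

```python
def accentuate(velocities, interval, amount):
    """Raise every `interval`-th step by `amount` and lower the rest, clamped to 1..127."""
    result = []
    for step, velocity in enumerate(velocities):
        delta = amount if step % interval == 0 else -amount
        result.append(min(127, max(1, velocity + delta)))
    return result


# Sixteen 16th-note hi-hats at a flat velocity, accented on every quarter note.
flat = [90] * 16
print(accentuate(flat, interval=4, amount=15))
```

Even this crude version turns a machine-gun hi-hat line into something with a pulse, which is the effect the Accentuator is going for.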

Beat and Sample Combiner

When you open Beat Combiner and/or Sample Combiner, you enter what is called Playground mode. In Playground mode, you can experiment with different drum patterns and samples, and revert back to the state you were in before entering it. Beat Combiner is where you can select different drum patterns for each sample lane. XO provides us with three basic patterns and seven extracted patterns that are based on the original pattern. Sample Combiner lets you audition samples for your kit. The sounds shown are supposed to be similar to those of the original kit and can be changed individually, at the same time, or randomly by clicking the dice icon. You can also use the Live filter so that only the samples that meet certain criteria are populated in Sample Combiner.

Preset Browser

XO comes with over 200 beat presets (in the non-lite version) covering genres from old-school hip hop to techno. A beat preset consists of a drum kit and pattern and is grouped into playlists. The same beat pattern can be played while trying out similar kits by clicking on the circles in the bottom right area. I have been pleased with the beat presets and think it’s a great way to generate kits and drum patterns.

Export WAV & MIDI

XO gives you flexibility when exporting WAV and MIDI files. You can export samples individually or all at once (raw or processed), and your entire drum sequence as WAV or MIDI, or the individual beat stems as WAV. Processed samples, stems, and the beat must be rendered to WAV before they can be exported. The rendered files can then be exported to a folder of your choice on your computer or directly into your DAW.

Pricing

You are presented with two options when purchasing XO:

XO lite is the less expensive option at $119.95 and it comes with 2000+ samples, 100+ beat presets, and 2 supported file formats (WAV, AIFF).
XO is the more expensive version at $179.95 and offers 8000+ samples, 200+ beat presets, 6 supported file formats (WAV, AIFF, FLAC, MP3, OGG, WMA), more export features such as exporting your beat and stems to WAV, and some additional miscellaneous features like a live file system monitor, which watches your added Sample Folders and immediately tells you if you’ve added or removed samples so you know it’s time for a fresh rescan.

You can see the full feature comparison here.

Conclusion

I believe XO is a great plugin that is worth checking out. Not only does it solve the issue of having your drum samples scattered in different places, but it’s also a very useful tool for creating drum grooves. I especially love how you can add different flavors of swing and enhance the dynamics of your beat with Accentuator. I personally have a challenging time developing my drum patterns, so I find those features very helpful.

A Visual Approach to Studying Sound

Sebastian — Mon, 05 Apr 2021 15:38:16 +0000

If you watched my video on Voxengo SPAN Plus, then you are probably here because you want to know more about how I use Span Plus along with tools like iZotope RX 8’s Music Rebalance module to understand sounds better. Well, you have come to the right place!

While I would argue that judging sounds with our ears and less with our eyes generally leads to better results, there is still a vast amount of valuable insight we can uncover from visual audio tools like spectrum analyzers and waveform editors. With this knowledge, we can make better decisions whether we are recording, mixing, or mastering. So in this blog, I will share my process for using this visual approach to learn more about sounds.

1. GATHER THE SOUNDS

The first step is to determine what kind of sound I want to analyze, which depends on my goal. Perhaps I want a better understanding of mixing dubstep snares. Once I know the type of sound I want to analyze, I then have to go out and get it. The following are the sources I typically go to.

Sample Packs

Sample packs are a great source of information because you have direct access to individual sounds. However, one issue with websites selling sample packs is that they often provide a limited view of what sounds the sample packs contain. This does not mean you have to avoid them, but it is a drawback to keep in mind. Fortunately, there are several websites that allow you to view all the sounds of a sample pack. Websites like this include:

One important point to consider is how to distinguish sounds that are worth analyzing from those that are not. After all, if I already knew how to distinguish good dubstep snares from bad ones, then this analysis would probably be unnecessary. In a sense, this situation is sort of a catch-22. However, our musical instincts will become wiser over time as our ears develop. In the meantime, we can use things like ratings and reviews (if provided) to estimate the quality of a sample pack. Also, a different source such as a professionally mixed song may be a better option, which I will touch on below.

Remix Competitions

Remix competitions are arguably a more useful source of information than sample packs. Not only do you have access to the individual tracks of a song, but you can also see the relative level differences and spectral relationships between them, which is very useful. Skio Music is one of my favorite places for finding stems. The nice thing about Skio is that it’s free to create an account and download the instrument tracks. Plus the songs are well-produced and some have even been mixed by prominent mixing engineers like Tony Maserati. One downside of Skio is that you are unable to download stems from past competitions. Another is that the diversity of music you can learn from is limited, since the songs tend to be pop. With that said, it’s still a great resource to get sounds from.

Professionally Mixed Music

While remix competitions are great, you will most likely not find the stems to the song you want. Sometimes you can get lucky and find them via a Google search, but this will often not be the case. Do not worry though because we have some options here!

Depending on the sounds you want to analyze, you may be able to get away with simply sampling that sound right from the song. But what if the sounds we want are masked by other instruments? This is where the wonders of machine learning come in. We can use iZotope RX 8’s Music Rebalance module (available in the standard and advanced versions) to separate any song into four parts: vocal, bass, percussion, and other.

To do this, open RX 8 in standalone mode and import the song of interest. Then select a portion of the song that contains the desired sound. After, open the Music Rebalance module and set the quality to best and the separation slider to 100% to achieve the most isolation possible. Then click separate and Music Rebalance will split the song into the four categories. One drawback of Music Rebalance is that the separation quality can vary greatly depending on the song. In some cases, there may be more bleed from other instruments or artifacts like phasing or transient smearing. Therefore, it is a trial-and-error process. I personally like to stick to songs where Music Rebalance performs the separation well.

Also, I recommend using songs that are in a lossless format like WAV or FLAC. However, lossy formats like MP3 and AAC are perfectly fine as well, especially given how efficient the algorithms they use have become.

Plugins

Plugins that contain samples are another useful source for sounds. However, they have the same issue as sample packs in that you still need the ability to identify well-produced sounds. But that shouldn’t keep you from using them because that ability will develop over time.

Other

In addition to the sources mentioned here, there are others that can be helpful for finding sounds. One that may surprise you is Patreon. As a result of concerts and tours being canceled due to COVID-19, many artists have had to find additional ways to generate income and Patreon has been one of them. This is exactly what happened to electronic duo KOAN Sound. On their Patreon page, they have made sample packs and stems to some of their songs available to donating members.

2. CREATE THE SNAPSHOTS

Once I have the sounds I want to analyze, I create snapshots that will be used for analysis. My tools of choice for capturing these snapshots are Voxengo SPAN Plus, which captures the frequency profile, and RX 8’s spectrogram/waveform display, which captures the amplitude profile over time.

Let’s start with Voxengo SPAN Plus. Before you begin, I recommend opening the Spectrum Mode Editor window by clicking on the cog symbol in the upper right area and setting the block size to 4096 samples and Range Lo to -115 dB. Increasing the block size provides more detail with lower frequencies and decreasing Range Lo reveals frequencies that would have otherwise been cut off from the spectrum analyzer display.
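To see why a larger block size helps with low frequencies, here is a small sketch of the general FFT relationship between block size and frequency resolution. It assumes a 44.1 kHz sample rate and says nothing about SPAN’s internal windowing or overlap settings, so treat it as a rule of thumb and not a description of the plugin itself.

```python
def bin_resolution_hz(sample_rate_hz, block_size):
    """Spacing between adjacent FFT bins, in Hz, for a given block size."""
    return sample_rate_hz / block_size


# Assumed sample rate of 44.1 kHz; swap in your project's rate.
for block_size in (1024, 2048, 4096, 8192):
    resolution = bin_resolution_hz(44100, block_size)
    print(f"{block_size:>5} samples -> {resolution:.2f} Hz per bin")
```

At 4096 samples each bin is roughly 10.8 Hz wide, compared with about 43 Hz at 1024 samples, which makes it much easier to tell neighboring low notes apart on the display.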

Moving forward, I load the sound (say a kick drum) into my DAW, FL Studio. Then I open an instance of SPAN Plus on the master channel and click on the Static button to open the Static Spectrums Editor window. Next, I loop the sound and then click Take to capture the frequency profile. For percussive sounds like kicks and snares, I click the Take button after the sound plays and the peak meter is falling. If clicked too early, especially with kick samples, you won’t capture the fundamental frequency.

To illustrate this point, I took two screenshots of the same kick. The top one was taken after the kick sample played and the peak meter was decreasing, while the bottom one was taken before the kick sample finished.

Now that I have a spectrum snapshot, the next thing I do is adjust the gain or height slider. I set the snapshot height so that either the fundamental or the tallest frequencies of the sound are at approximately -36 dBFS. The point of having this reference level is so that you become accustomed to seeing things in a certain way which makes it easier to compare sounds. So with the snapshot all set, I click save and meaningfully name the sound. To open a snapshot simply click on Load and select the snapshot file.

While Voxengo SPAN gives us information about the frequency distribution at a particular point in time, RX 8’s waveform display shows us information about the change in amplitude over time. I like using iZotope RX 8 for generating waveform graphs, but you can use whichever tool is available to you. If you are using FL Studio, for example, Edison is a great choice and it comes with the DAW.

After opening RX 8, I import and peak normalize the sound to 0 dBFS using the Normalize module. It’s important to peak normalize all of your samples to the same level because setting a reference level makes for effective comparisons. I use 0 dBFS because the waveform then uses the full height of the waveform display. Once the sound has been peak normalized, I go to File > Export Screenshot and set the dimensions of the screenshot. I use 1874 x 926 because that’s the largest size RX allows me to export at. Whatever dimensions you choose, be sure to stick with them so that there is visual consistency.
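If you are curious what peak normalization does under the hood, here is a minimal sketch in plain Python. It is not RX 8’s implementation; the function names and the tiny list of sample values are made up for illustration, and samples are assumed to be floats where 1.0 is full scale (0 dBFS).

```python
import math


def peak_normalize(samples, target_dbfs=0.0):
    """Scale samples so the loudest one sits at target_dbfs (1.0 = 0 dBFS)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = 10 ** (target_dbfs / 20) / peak
    return [s * gain for s in samples]


def peak_dbfs(samples):
    """Peak level of the samples in dBFS."""
    return 20 * math.log10(max(abs(s) for s in samples))


# Hypothetical quiet sample whose peak is at 0.5 (about -6 dBFS).
quiet_snare = [0.0, 0.25, -0.5, 0.125, -0.0625]
normalized = peak_normalize(quiet_snare)
print(peak_dbfs(quiet_snare), peak_dbfs(normalized))
```

Every sample is multiplied by the same gain, so the shape of the waveform is untouched; only its height changes, which is exactly what makes side-by-side screenshots comparable.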

3. UNCOVER THE PATTERNS

The last step is to analyze and take notes on the snapshots I took. Let’s take this snare drum sample for example.

My notes on this spectrum snapshot would include things like:

Fundamental frequency is 147 Hz which is note D3.
Top end peaks between 3,000 – 6,000 Hz and is about 6 dB lower than the fundamental frequency.
Medium steepness roll-off begins at 6,000 Hz and at 20 kHz is at about the same amplitude as the midrange dip.
Sub frequencies are present and about 33 dB lower than the fundamental.
Overall shape is smiley faced, with its lowest area between 600 – 1,000 Hz.
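For anyone wondering how a frequency like 147 Hz maps to a note name, here is a small sketch of the standard conversion. It assumes equal temperament with A4 = 440 Hz and scientific pitch notation (middle C is C4); some DAWs number octaves differently, so the octave digit you see in your piano roll may not match.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def frequency_to_note(freq_hz, a4_hz=440.0):
    """Return (note name, cents offset) for the nearest equal-tempered note."""
    midi = 69 + 12 * math.log2(freq_hz / a4_hz)  # MIDI note 69 is A4
    nearest = round(midi)
    cents = 100 * (midi - nearest)
    name = NOTE_NAMES[nearest % 12]
    octave = nearest // 12 - 1  # MIDI note 60 is C4 in this convention
    return f"{name}{octave}", cents


note, cents = frequency_to_note(147.0)
print(note, f"{cents:+.1f} cents")
```

Running this on 147 Hz gives D3, only a couple of cents sharp of the exact pitch, which matches the note in my snapshot notes.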

Here’s a snapshot of the waveform display of that same snare.

My notes here would include:

The snare lasts for about 270 milliseconds.
The beginning of the snare is mostly flat for about 30 milliseconds and then gradually declines for about 40 milliseconds.
There is a sharp 6 dB decrease at about 60 milliseconds.
From 60 milliseconds onward there is a medium decrease in amplitude.
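Since the waveform display shows linear amplitude while my notes are in decibels, it helps to know how the two relate. This short sketch uses the standard amplitude formula; the function name is just for illustration.

```python
def db_to_amplitude_ratio(db_change):
    """Convert a level change in dB to a linear amplitude ratio."""
    return 10 ** (db_change / 20)


ratio = db_to_amplitude_ratio(-6)
print(round(ratio, 3))  # 0.501
```

In other words, a 6 dB drop means the waveform is roughly half as tall, which is why that dip at 60 milliseconds is so easy to spot.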

After taking notes on a number of sounds, I do a meta-analysis on what I wrote and attempt to identify patterns between the snapshots. Not only do I try to find similarities between the sounds, but also their differences. Outliers or sounds that visually differ significantly from other samples should not be ignored but investigated because they can provide significant insight.

CONCLUSION

While I don’t suggest replacing your ears with tools like spectrum analyzers and waveform editors, they serve as a tremendous learning tool. As you take notes and discover patterns in the sounds you analyze, you can apply this knowledge to producing, mixing, mastering, etc. The patterns themselves don’t dictate whether you should or shouldn’t abide by them; they merely reveal what things are like. It’s up to you to decide what to do with that information.

I believe that taking this analytical approach is highly valuable and imagine there are others who would benefit from it. I think it’s a great starting point and as you go through the process you will probably come up with your own exciting flavor of it.

Creating a Beat with Artificial Intelligence

Sebastian — Thu, 04 Feb 2021 15:46:32 +0000

Recently, I provided a video overview of AWS DeepComposer, an Amazon Web Service designed to teach developers AI and machine learning in an engaging way. One thing AWS DeepComposer does is take a melody you have written (or an uploaded MIDI file) and generate four accompaniment tracks. As a follow-up to the video, I decided to create a short beat using AWS DeepComposer in the process. I thought it would be a great way to give some insight into what you could get out of this service.

I will show you how I went from this simple melody:

Input Melody

…to this beat:

DeepComposer Beat

Generating the Composition

To begin, I set the generative AI technique to Generative Adversarial Networks since that technique generates accompaniment tracks. For the generative algorithm, I kept it on MuseGAN since AWS DeepComposer does not come with any U-Net models, and I didn’t feel like training a model. As for the specific MuseGAN model that would generate the inferences, I chose Rock.

With the model parameters set, I recorded my input melody. This part of the process is trial and error, as it takes some time to understand how different models react to the input melodies you feed them. By around the 10th melody, I generated accompaniment tracks that, while they sounded rough, had potential. I also ended up using the Pop model instead of the Rock one.

The melody itself is quite simple (as you heard above) and was performed on a laptop keyboard. Additionally, I turned on rhythm assist which quantized the notes. Ideally, this feature is supposed to return better results from the models.
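For those unfamiliar with quantization, here is a generic sketch of the idea: each note onset is snapped to the nearest point on a rhythmic grid. I don’t know how AWS DeepComposer implements rhythm assist internally, so the grid size (sixteenth notes, 0.25 beats) and the onset times below are assumptions made purely for illustration.

```python
def quantize(onsets_in_beats, grid=0.25):
    """Snap each note onset to the nearest multiple of the grid size."""
    return [round(onset / grid) * grid for onset in onsets_in_beats]


# Hypothetical, slightly sloppy onsets as played on a laptop keyboard.
played = [0.02, 0.48, 1.03, 1.55, 2.26]
print(quantize(played))  # [0.0, 0.5, 1.0, 1.5, 2.25]
```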

Here is the composition that the Pop model generated:

The Composition

Editing the Notes

With the composition generated, I downloaded it as a MIDI file and imported it into FL Studio (my primary DAW).

I began by editing the MIDI notes of the tracks, starting with the guitar track. I polished up the track by removing notes I didn’t like. Here is the before and after:

Guitar Before

Guitar After

The next track I tackled was the string track. Like the guitar track, most of the MIDI editing involved removing notes. However, I did add a few notes and changed some of the chords slightly to harmonize better with the guitar track. Here is the before and after:

Strings Before

Strings After

The next sound I tackled was the drums. I knew right away this was not going to fly and replaced it with a hip-hop drum loop that I downloaded from Noiiz.com. Here is the before and after:

Drums Before

Drums After

The next track I touched up on was the bass. I removed extra notes and fixed the rhythm so that it grooved better with the other instruments. Here is the before and after:

As for the input melody itself, I decided not to incorporate it.

So here is the beat so far:

DeepComposer Beat (not mixed)

Polishing the Tracks with Post-Processing

With the tracks edited, the next step was to mix them.

For the guitar track, I actually split it into two guitars so I could pan them to the sides. The guitar on the left plays higher notes, while the one on the right plays lower ones. Both guitars had a lot of build-up in the low-mid range, so I carved that out a good amount. The left guitar was quite piercing in the upper-mids (~2 kHz), so I reduced that as well. I removed a couple of harsh resonances around 1 kHz and used multi-band compression to make the guitars more dynamically consistent. The guitars also had quite a bit of aliasing in the high end, so I used a low-pass filter to reduce it. Finally, I used a reverb plugin called RAUM by Native Instruments to put the guitars in a space.

For the string track, I removed low and high frequencies since I wanted a warmer sound. I reduced some resonant frequencies in the midrange that were annoying and low-mids that made the strings sound muddy. Also, I used iZotope’s Ozone Imager plugin to give the strings a bit more width since they were quite narrow to begin with.

For the drum loop, I reduced boominess from the kick drum and scooped the midrange a little bit. I also boosted around 10 kHz to brighten up the top end and used Neutron 3’s Transient Shaper in multiband mode to reduce the sustain of the kick drum so it would be tighter.

As for the bass track, I reduced the low-mid frequencies to make room for the kick drum. While mixing, I decided to add a sine wave that played the same notes as the bass to reinforce it. In doing that, I used a low cut to remove the fundamental frequency of the bass since the sine wave was replacing it. To make the bass sustain longer, I used compression with a decent amount of gain reduction.

Conclusion

So that is how I made a beat with the help of AWS DeepComposer. It was a fun and stimulating experience since it was my first time creating music using artificial intelligence. As we saw, AWS DeepComposer was not able to create the beat for me and have it ready for Spotify. The tracks required a decent amount of work. However, I’m sure you could train models that generate high-quality output and require little compositional editing.

With that said, hopefully you found this blog insightful. Perhaps it even piqued your interest in seeing what you can create with the assistance of AWS DeepComposer. Sometime in the future, I will have to try this again using my own trained models. It would be interesting to see what kind of results I would achieve with them.

Thank you for reading and take care!

5 Trends in Audio Source Separation

Sebastian — Sun, 22 Nov 2020 15:50:28 +0000

On September 28th and 29th, the Audio Engineering Society (AES) held its first machine-learning-focused event, called “Applications of Machine Learning in Audio”. The virtual event covered topics including automatic mixing, machine learning for audio visualization, and sourcing audio data for machine learning. It also featured a presentation on five current trends in audio source separation given by Fabian-Robert Stöter, an Audio-AI researcher at Inria (National Institute for Research in Digital Science and Technology), and Stefan Uhlich, who works in Sony’s R&D center in Stuttgart, Germany.

Below are the trends Stöter and Uhlich discussed in their presentation.

1. Moving from Supervised Separation to Universal Separation

The first trend introduced was the movement from supervised separation to universal separation. Uhlich explained, “In supervised separation, we have acquired some prior knowledge about the separation problem that we want to solve. For example, we know a priori the number of sources that are there in the mixture, and typically also the number of sources is quite small, so either two or four. And we also know what type of sources we want to separate.” He then explained universal separation saying, “On the other end of the scale, we have really universal separation where we have an unknown number of sources. And even more important, also we deal with an unknown source type in the mixture. So we really don’t know a priori what type of sources are there.”

Uhlich said that universal sound separation has yet to be solved, but lately, there has been a lot of progress. As an example, he talked about how it is now possible to perform audio source separation for an unknown number of sources. Uhlich described the approach stating, “The idea is to recursively split off one source from the mixture using a one-and-rest speech separation system [1]. The network has two outputs, s of t and r of t, where s of t contains the estimate of one sound source, in this case here for one speaker. And r of t contains the remaining speakers that are left. And the idea is now to iteratively apply this one-and-rest speech separation system to separate the mixture until basically, we have really split the mixture into all these speakers, into all the sources.” Another example Uhlich gave was the ability to perform source separation for many classes [2]. He explained, “So first the sound event detector is trained, and this sound event detector is then used to identify anchor segments which can be used for training. And using this approach, basically we can train a system on 527 AudioSet classes, which is really amazing.”
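To make the recursive idea more concrete, here is a toy sketch of the control flow only. The separator below is a made-up stand-in for the trained network in [1]: it pretends each list position belongs to a different speaker and simply peels off the loudest one. The stopping rule (stop when the residual energy is negligible) and the names energy_floor and max_sources are my own assumptions and may differ from how the paper decides when to stop.

```python
def energy(signal):
    """Sum of squared sample values."""
    return sum(x * x for x in signal)


def toy_one_and_rest(mixture):
    """Hypothetical stand-in for the network: returns (s, r), where s holds
    the single loudest component and r holds everything that is left."""
    loudest = max(range(len(mixture)), key=lambda i: abs(mixture[i]))
    source = [x if i == loudest else 0.0 for i, x in enumerate(mixture)]
    rest = [0.0 if i == loudest else x for i, x in enumerate(mixture)]
    return source, rest


def recursive_separation(mixture, one_and_rest, energy_floor=1e-6, max_sources=10):
    """Repeatedly split one source off the residual until nothing is left."""
    sources = []
    residual = mixture
    while energy(residual) > energy_floor and len(sources) < max_sources:
        source, residual = one_and_rest(residual)
        sources.append(source)
    return sources


# Toy "mixture" with three active components; the count is not given upfront.
mixture = [0.0, 3.0, 0.0, 1.5, 0.5]
sources = recursive_separation(mixture, toy_one_and_rest)
print(len(sources))  # 3
```

The point is that the number of sources never has to be specified; it falls out of how many times the loop runs before the residual is empty.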

2. Training with Imperfect Data

The second trend Uhlich discussed was the ability to deal with imperfect training data. He explained that imperfect could mean several things. In one situation, it could mean the data is weakly labeled. “A way to deal with weakly labeled data is to use sound event classifiers,” said Uhlich. He pointed to research work by Mitsubishi [3] saying, “They showed basically how to train a separator when we have only maybe frame-level information, frame-level labels, or maybe only clip level labels available.”

In another situation, imperfect could mean the training data consists solely of unknown mixtures of sources. In a Google research paper titled “Unsupervised sound separation using mixtures of mixtures,” Uhlich said they “showed with their mixture invariant training that it’s still possible to train a separation network on such data.” He explained that Google researchers extended their permutation invariant training to their mixture invariant training [4]. “Now that we only have mixtures available, we don’t have to solve a permutation problem here, but we have to really solve a mixing problem. We have to decide for every estimated source whether it goes with the first mixture or with the second mixture, and this is solved by finding a binary mixing matrix (a) that you can see here. And this mixture invariant training is really very interesting, and by this, it’s possible to train on training data which only contains mixtures of sources.”
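Here is a small sketch of the “mixing problem” Uhlich describes: for every estimated source we decide whether it belongs to the first or second mixture, and keep the assignment that best rebuilds both. This brute-force search is only an illustration. Using mean squared error as the loss is my simplification and not necessarily the loss used in [4], and the function names and toy signals are invented.

```python
from itertools import product


def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def best_mixing_assignment(mix1, mix2, estimates):
    """Try every way of sending each estimated source to mixture 0 or 1
    and return (loss, assignment) for the best reconstruction."""
    n = len(mix1)
    best = None
    for assignment in product((0, 1), repeat=len(estimates)):
        remix = [[0.0] * n, [0.0] * n]
        for source, which in zip(estimates, assignment):
            remix[which] = [r + s for r, s in zip(remix[which], source)]
        loss = mse(remix[0], mix1) + mse(remix[1], mix2)
        if best is None or loss < best[0]:
            best = (loss, assignment)
    return best


# Toy case: three estimated sources; mixture 1 holds the first and third,
# mixture 2 holds the second.
estimates = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
mix1 = [1.0, 0.0, 3.0]
mix2 = [0.0, 2.0, 0.0]
loss, assignment = best_mixing_assignment(mix1, mix2, estimates)
print(assignment, loss)  # (0, 1, 0) 0.0
```

Each assignment corresponds to one candidate binary mixing matrix, with a single 1 per source marking which mixture it goes to. No individual ground-truth sources are needed, only the two mixtures.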

In a third situation, “Imperfect could mean that we want to train speech enhancement systems on noisy speech,” according to Uhlich. He said it would be ideal to have access to large datasets of high-quality and multi-language speech to train such systems. However, collecting this data is challenging and not straightforward. “What people often use are speech data sets that are created for automatic speech recognition, for ASR. And the problem is that this data set often still contains noise, and maybe even on purpose contains noise because we want to have our ASR system to be robust for noise (for typical noise) that can be there. And one example is the Mozilla Common Voice data set, which is very nice because it’s multilingual, has many speakers, and it’s really also a large amount of samples. But there are typical noises like microphone switch-on sounds, or even sample dropping that we actually want to remove by speech enhancement.”

So if we don’t have access to clean speech recordings, how can we train speech enhancement systems? Uhlich said, “There is some work that tackles this problem with an approach that is known from image denoising; it’s called the Noise2Noise approach. And there’s another approach where a kind of bootstrapping is used, where we have access at the beginning to a smaller clean speech data set. And this can be used to bootstrap the training on the larger noisy speech training data.” However, despite these methods, the problem remains largely unsolved.

3. Perceptual Loss Functions for Speech Enhancement

The third trend presented was the use of perceptual loss functions for speech enhancement. Uhlich said that “Perceptual measures are commonly used for speech enhancement.” Examples he gave were PESQ (the perceptual evaluation of speech quality), the composite measures CSIG (a signal distortion measurement), CBAK (a background noise distortion measurement), and COVL (an overall quality measurement), and STOI, which has been shown to correlate well with speech intelligibility. He said these perceptual measures are used as additional loss terms to improve speech enhancement systems’ perceived quality. He added that while the use of perceptual measures is commonplace for speech, it is uncommon for music.

Uhlich also mentioned a method for measuring the perceptual quality of audio source separation called PEASS (perceptual evaluation of audio source separation). He said, “Quite recently by the University of Surrey, it was checked whether this PEASS score still correlates well with human perception. And also they showed that the APS, the artifact-related perceptual score, showed a good correlation with human perception.”

Additionally, he introduced an approach from MIT and Spotify where instead of comparing the ground-truth signal and their estimate, they compared the low-bitrate codec version of the ground truth and the separation. He explained the reasoning was that the low-bitrate MP3 codec would have removed everything that is not important for our perception. Therefore, it should be more meaningful to compare our separation to that. Uhlich concluded, “In general, as you can also see from PEASS, there’s really the need for more work on perceptual measures and also loss functions for music. And currently, there’s really not the tools, or at least they are not used by researchers for doing a perceptual evaluation.”

4. Multi-Task Training

The fourth trend addressed was the use of multi-task training for speech enhancement, in other words, combining source separation with other tasks such as classification. Stöter said, “A long-standing problem and question we have in audio separation and many related tasks which is… well given we have a perfect separation system, would we then be able to get the better classification system or a better recognition system for speech for example.” He pointed to an event that DCASE (Detection and Classification of Acoustic Scenes and Events) has in this area, saying, “There’s a recent evaluation campaign for the 2020 DCASE where for the first time they ask for the task of sound event detection and to have a separate evaluation for a source separation system and a sound event detection system.”

Uhlich continued by explaining the benefit of multi-task training. “It’s beneficial to train at the same time also the mask for the noise component and to have really a loss function which is containing not only the error that we do in the speech prediction but also the error we contain in the predicting the noise mask.” He added, “This additional task acts as a regularizer, and we can observe that the speech mask generalizes better.” Uhlich referred to research done by Sony, where a trained system that predicted speech and noise masks at the same time performed better than the system that only predicted the speech mask.
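As a rough illustration of what such a combined loss could look like, here is a minimal sketch that adds the noise-mask error to the speech-mask error. The use of mean squared error, the noise_weight parameter, and the example mask values are all my own assumptions; the presentation did not specify the exact loss that Sony used.

```python
def mse(predicted, target):
    """Mean squared error between two equal-length masks."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def multitask_mask_loss(speech_pred, speech_target,
                        noise_pred, noise_target, noise_weight=1.0):
    """Speech-mask error plus a weighted noise-mask error (hypothetical form)."""
    speech_loss = mse(speech_pred, speech_target)
    noise_loss = mse(noise_pred, noise_target)
    return speech_loss + noise_weight * noise_loss


# Made-up mask values for three time-frequency bins.
speech_pred, speech_target = [0.9, 0.2, 0.7], [1.0, 0.0, 0.5]
noise_pred, noise_target = [0.2, 0.7, 0.4], [0.0, 1.0, 0.5]

loss = multitask_mask_loss(speech_pred, speech_target, noise_pred, noise_target)
print(round(loss, 4))
```

Setting noise_weight to 0 recovers a speech-only loss, so the extra term is easy to switch on and off when comparing the two setups.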

5. From Research to Deployment

The final trend Stöter and Uhlich talked about was the transition from research to deployment. “This is because of two reasons that we can see this commercial use of audio separation,” according to Uhlich. “The first one is that the quality really reached a level that we couldn’t think of five years ago. And also, the computational resources that we have available on edge devices like smartphones are sufficient for doing real-time inference.” One example he gave was the integration of real-time speech enhancement into products like Google Meet and NVIDIA RTX Voice. “You can find many demos on YouTube where people are trying with vacuum cleaners and all this stuff that you can think of how good the speech enhancement system works, and it’s really quite fun to watch those videos.” He also highlighted how this technology had become more vital, saying, “Such real-time speech enhancement has really also gained more importance now that there are much more video conferences because of COVID-19 and people having to work from home.”

Another example Uhlich provided was karaoke, which is featured in apps like Spotify and Line Music. “From Spotify, there’s the sing-along which allows you to turn down the volume of the vocals, and that basically, you can sing yourself to the song. And another one is from the Line Music app, which also realizes the karaoke feature by using source separation from Sony.” Additional examples mentioned were the intelligent wind filter in the Sony Xperia 1 mark II smartphone, and the Dolby Atmos versions of Lawrence of Arabia and Gandhi, which used Sony’s AI separation to upmix the films from stereo to 5.1, or from 5.1 to Dolby Atmos.

You can get access to the full presentation, which covers things like the history of audio source separation as well as tools and datasets via Stöter’s website here.

Sources:
[1] Takahashi, Naoya, et al. “Recursive speech separation for unknown number of speakers.” Interspeech 2019.
[2] Kong, Qiuqiang, et al. “Source separation with weakly labelled data: An approach to computational auditory scene analysis.” ICASSP 2020.
[3] Pishdadian, Fatemeh, et al. “Finding strength in weakness: Learning to separate sounds with weak supervision.” TASLP 2020.
[4] Wisdom, Scott, et al. “Unsupervised sound separation using mixtures of mixtures.” arXiv preprint arXiv:2006.12701 (2020).