Friday Facts #407 - Automating a soundtrack

Posted by Albert, Donion on 2024-04-19

Today we continue our musical journey.


Last week we presented a general approach to the Factorio Space Age music (FFF-406). We also mentioned that we have some new techniques to not only cover these 5 hours of music, but to also surpass them.

This automatic way of making music is something that I was experimenting with a long time ago, before Factorio.
I've played a lot with random melodies on top of random bass sections, with random rhythmic bases, all programmed in ActionScript (yes, pretty old). The results were quite intense, but never good enough to consider them finished tracks.

When the 5-hour soundtrack project for Factorio Space Age started, I immediately thought of those old experiments. Now, with Petr composing and Donion programming, things looked different. I just dared to go this way. Now I'm convinced that this was a good decision.

Variable music tracks (Donion)

These tracks play out differently each time they are selected; they are a kind of procedurally generated music. But we don't want to go too crazy with the randomization: a variable track is more like a set of variations of a single track (without the need to record them all). These tracks take the place of the interludes which play between the main tracks (unless you go rooting around in the hidden settings). The goal is to provide some variety in the music after tens or hundreds of hours spent in game. Regular music is still the main focus and the large majority of the soundtrack.

Variable music tracks are defined in the prototypes, fully available to modders. These are the components used to define a variable track:

  • Samples
  • These are the smallest building blocks. They are individual pieces of music which get played according to other rules. Samples are played after each other so when one sample finishes the playback seamlessly continues with the next sample.

  • Layers
  • Samples are grouped into layers. Layers dictate how individual samples are composed together. It could be as simple as selecting samples randomly or shuffling all available samples so each plays exactly once, or it can be more complicated, with samples being selected based on which sample is currently playing in a different layer. A layer can also contain sublayers where samples overlap in a specific way. Further variations within a layer can be made using a number of properties, defining a delayed start, the number of repetitions and pauses between repetitions for shuffled layers, offsets of overlapping sublayers, etc. These properties come either from the layers themselves or from the current state of the track.

    Layers and their samples are played aligned to the smallest time unit each track defines for itself, creating a sort of time grid.

    The way layers are composed is the main source of variation.

  • Sections
  • Sections are collections of layers. There can be one section or multiple of them in a track. Which section is used is determined by the track state. Additionally, sections can overlap. When there is only one section it can overlap itself. Lastly, a section can contain an intermezzo which is played as a normal piece of music, providing an option to compose a hybrid track: part variable, part static.

  • States
  • States and transitions between states are the high level way to define how a variable track is composed. They select which section should play and whether it should overlap the previous one and they define a number of layer properties which are applied to the current section's layers.

    Transitions between states can be based on elapsed time or they can be tied to a specific layer finishing. Multiple possible next states can be defined with different weights, so some transitions will be more likely than others. Next state candidates can also have additional conditions defined, which have to be met for the state to be considered for transition. For example, a transition can be set to only happen if a specific sample is playing in a specific layer at the time of next state selection.
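
To make the transition rules a bit more concrete, here is a minimal Python sketch of weighted next-state selection with conditions. It is only an illustration, not the actual prototype format or engine code: the names `Candidate`, `State` and `pick_next_state`, the weights and the sample name are all made up.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical names throughout; this is not the real prototype definition.


@dataclass
class Candidate:
    target: str    # name of the state to transition to
    weight: float  # higher weight means a more likely transition
    # Optional condition on what is playing right now (layer name -> sample name).
    condition: Optional[Callable[[Dict[str, str]], bool]] = None


@dataclass
class State:
    name: str
    candidates: List[Candidate] = field(default_factory=list)


def pick_next_state(state: State, playing: Dict[str, str],
                    rng: random.Random) -> Optional[str]:
    """Choose the next state by weight among candidates whose condition holds."""
    eligible = [c for c in state.candidates
                if c.condition is None or c.condition(playing)]
    if not eligible:
        return None  # nothing to transition to
    weights = [c.weight for c in eligible]
    return rng.choices(eligible, weights=weights, k=1)[0].target


# "finish" is only considered while sample_3 plays in the melody layer.
begin = State("begin", [
    Candidate("inter", weight=3.0),
    Candidate("finish", weight=1.0,
              condition=lambda playing: playing.get("melody") == "sample_3"),
])
```

Calling `pick_next_state(begin, {"melody": "sample_1"}, random.Random())` can only ever return `"inter"` here, because the condition on the `"finish"` candidate is not met.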

Now that we know what variable tracks are made of, let's look at a couple of examples of how a track gets composed.

Please understand these examples are a tech demo, still in progress. Some details may change. The music itself is not representative of what it will be in the game on release; these samples are quite old and made for illustration purposes!

The first example is a track containing three sections. Each section has three layers (a bass layer made out of two sublayers, a middle layer also made out of two sublayers, and a melody layer) as well as an intermezzo. Transitions between states are based on the melody layer finishing.

This is how an instance of this track might sound:

Recorded in-game using WIP version.

The images are only an illustration and do not correspond to the recording. Here is an actual timeline:

  1. 0:01: The track starts in the begin state, section 2 is selected randomly.
  2. 0:01: Bass layer starts.
  3. 0:12: Middle layer starts.
  4. 0:17: Melody layer starts.
  5. 0:44: Melody layer finishes its first repetition, queuing a pause before the second repetition.
  6. 0:51: Melody layer starts its second repetition.
  7. 1:19: Melody layer finishes its second and final repetition, queuing a pause before finishing playing.
  8. 1:23: Melody layer finishes playing, triggering a transition to the inter state. A section different from the previous one is selected randomly, section 0 in this case.
  9. 1:23: Bass layer and middle layer continue playing, now with samples from section 0.
  10. 1:23: Melody layer starts playing again.
  11. 1:42: Melody layer finishes playing its first and only repetition, this time without a pause after, triggering a transition to the continue state immediately.
  12. 1:42: The section which was selected in the begin state is used again.
  13. 1:42: The melody layer uses the same sample shuffling as in the begin state.
  14. 2:09: Melody layer finishes playing its first and only repetition, again without a pause after, triggering a transition to the finish state immediately.
  15. 2:09: Section 1 is selected as the only one still unused.
  16. 2:35: Melody layer finishes its first repetition and continues with a second one without a pause.
  17. 3:03: Melody layer finishes its second and final repetition.

This second example shows a track with only one section, but it overlaps itself when transitioning between states at fixed time intervals. The section has three layers. There is also a chance to play an intermezzo if a specific sample played in the first layer.

And this is how this one might sound:

Recorded in-game using WIP version.

Technical challenges (Donion)

As it turns out, when it comes to music, timing and transitioning things correctly is important. I know, I was shocked too.

Queuing samples

Samples need to be played one after another without any gaps in order to maintain the track's overall tempo and to avoid audio artifacts, as someone from our forums recently found out the hard way. The music player updates sixty times per second, the same as the rest of the game logic. Simply checking whether the current sample has finished playing in order to start the next one is not enough, as there could be a gap of up to 16.67ms (1s / 60) between them, destroying the tempo. Taking the checks outside of the regular update logic into a separate thread, or using callbacks for when a sample finishes playing, wouldn't work either because of how audio data are mixed together by the SDL_Mixer library (version 2.0.4) we're using.

With our current settings audio is mixed in chunks of 512 samples (these are the audio signal samples, not the music samples) which with the sampling frequency of 44.1kHz makes a mixing interval of roughly 11.6ms. Even if we detected the exact moment when a sample finishes playing, we wouldn't be able to start playing the next sample right at that moment. There would be a gap up to 11.6ms long again. What we really need is a way to queue our music samples.
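
As a quick sanity check of these numbers, the following Python snippet recomputes both intervals from the values quoted in the text (60 updates per second, 512-sample chunks, 44.1kHz). The function names are just for illustration.

```python
SAMPLE_RATE_HZ = 44_100       # sampling frequency from the text
CHUNK_FRAMES = 512            # audio samples mixed per chunk
GAME_UPDATES_PER_SECOND = 60  # music player update rate


def mixing_interval_ms(chunk_frames: int, sample_rate_hz: int) -> float:
    """Duration of one mixed chunk in milliseconds."""
    return 1000.0 * chunk_frames / sample_rate_hz


def update_interval_ms(updates_per_second: int) -> float:
    """Time between two game updates in milliseconds."""
    return 1000.0 / updates_per_second


def frames_per_update(sample_rate_hz: int, updates_per_second: int) -> float:
    """How many audio samples pass during a single game update."""
    return sample_rate_hz / updates_per_second


print(round(mixing_interval_ms(CHUNK_FRAMES, SAMPLE_RATE_HZ), 1))       # 11.6
print(round(update_interval_ms(GAME_UPDATES_PER_SECOND), 2))            # 16.67
print(frames_per_update(SAMPLE_RATE_HZ, GAME_UPDATES_PER_SECOND))       # 735.0
```

In other words, a worst-case gap of one game update is about 735 audio samples of silence, and even reacting at chunk granularity could still leave up to 512.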

The SDL_Mixer library doesn't provide such functionality, so I needed to build it myself on top of SDL_Mixer, with some modifications to SDL_Mixer itself. This is not the first time I needed to add a feature to the audio backend, so, undeterred, I had a queueing system working fairly quickly. Now the music player can queue samples in its leisurely 16.67ms windows and a separate feeder thread takes care of stitching the samples together correctly, while SDL_Mixer doesn't even know it happened.
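
The general idea of such a queue can be sketched in a few lines of Python. This is not the actual implementation (which lives in C++ on top of a modified SDL_Mixer); the `SampleQueue` class and its methods are hypothetical. The point it shows is that the consumer always receives a full chunk, and when one music sample ends in the middle of a chunk, the rest of the chunk is filled from the next queued sample.

```python
import threading
from collections import deque


class SampleQueue:
    """Hypothetical gapless queue: the game thread enqueues, the mixer reads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = deque()  # music samples waiting to be played
        self._current = []       # music sample being played right now
        self._pos = 0            # read position inside the current sample

    def enqueue(self, frames):
        """Called from the slow (game update) side, well ahead of time."""
        with self._lock:
            self._pending.append(list(frames))

    def read(self, n):
        """Return exactly n audio samples, crossing sample boundaries if needed."""
        out = []
        with self._lock:
            while len(out) < n:
                if self._pos >= len(self._current):
                    if not self._pending:
                        break  # nothing queued: the rest will be silence
                    self._current = self._pending.popleft()
                    self._pos = 0
                    continue
                take = min(n - len(out), len(self._current) - self._pos)
                out.extend(self._current[self._pos:self._pos + take])
                self._pos += take
        out.extend([0.0] * (n - len(out)))  # pad with silence
        return out


queue = SampleQueue()
queue.enqueue([1.0, 1.0, 1.0])
queue.enqueue([2.0, 2.0, 2.0, 2.0])
print(queue.read(5))  # [1.0, 1.0, 1.0, 2.0, 2.0]
```

The first music sample ends after three values, and the second one starts within the same chunk with no gap in between.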

Transitions between samples

As you can see in the pictures above, the same sample can be played with different lengths. For instance in the first example in picture 5, sample number 3 (yellow) is first played for three units of the grid and then it is played for four units. Unless we want to have variants of the same sample saved in many lengths (we don't), we often need to cut a sample short before playing the next sample. When you do that, you can end up with unpleasant audio artifacts or clicks. A similar thing can happen when you're changing playback position in a music or video player, if you want to try it yourself. What happens is that there is a big jump in the audio signal's level and it sounds like a click or pop. The signal becomes discontinuous if we want to use big words.
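
The size of such a jump is easy to show with a synthetic signal. The Python sketch below uses a plain 100Hz sine tone as a stand-in for real music (an assumption made purely for illustration) and compares the largest step between neighbouring audio samples in an uncut tone with the step created by cutting the tone at its peak and starting another one right after.

```python
import math

SAMPLE_RATE_HZ = 44_100


def sine(freq_hz, n_frames, amplitude=1.0):
    """Generate a sine tone starting at zero level."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ)
            for i in range(n_frames)]


def largest_step(signal):
    """Largest difference between two neighbouring audio samples."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))


tone = sine(100.0, 4410)

# Uncut tone: neighbouring samples differ only slightly.
smooth_step = largest_step(tone)

# Cut the tone near its peak (about a quarter period in) and start a new
# tone, which begins at zero level: the signal jumps by almost the full amplitude.
cut = tone[:111] + sine(100.0, 1000)
cut_step = largest_step(cut)

print(f"uncut: {smooth_step:.4f}, cut: {cut_step:.4f}")
```

The step at the cut is dozens of times larger than anything in the uncut tone, and that sudden jump is what you hear as a click.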

You can hear the clipping artifacts in this recording which was taken using an older version (you might need to turn your volume up). It is clear that something has to be done about this.

The way to solve this is to fade samples out and/or in over a short period of time, let's say 10ms, as they come one after another. That way the transition is nice and smooth (continuous). You might find some audio processing applications do this sort of thing automatically by default. Sounds straightforward, and SDL_Mixer provides a way to fade samples in and out, so what's the problem? The built-in fading functionality of SDL_Mixer calculates a single target volume for the entire chunk being mixed in the 11.6ms interval. A 10ms fade fits entirely into one mixing interval, so we end up with just one volume level and there is no fading at all. Not to mention that it's impossible to time the fade-out accurately.

Luckily SDL_Mixer provides support for attaching filters (effects) to samples. In the filter we can do whatever we want with the audio data. Writing our own fader with sample level precision (again, those are the audio signal samples) as a filter is trivial. Add an option to delay the fade to the exact moment we need it and voilà, no more annoying clicks.
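
A fader of this kind could look roughly like the Python sketch below. It is an assumption-laden illustration rather than the real filter: the `DelayedFadeOut` class is hypothetical, it uses a simple linear ramp, and it works on plain lists instead of raw audio buffers. What matters is that the gain is computed per audio sample and that the filter keeps counting across chunks, so the fade can start at an exact position and span a chunk boundary.

```python
class DelayedFadeOut:
    """Hypothetical per-sample fade-out, applied to chunks as they are mixed."""

    def __init__(self, delay_frames, fade_frames):
        self.delay_frames = delay_frames  # audio samples to pass through untouched
        self.fade_frames = fade_frames    # length of the linear fade
        self._seen = 0                    # audio samples processed so far

    def process(self, chunk):
        out = []
        for value in chunk:
            into_fade = self._seen - self.delay_frames
            if into_fade < 0:
                gain = 1.0                                   # fade not started yet
            elif into_fade < self.fade_frames:
                gain = 1.0 - into_fade / self.fade_frames    # ramp down
            else:
                gain = 0.0                                   # fully faded out
            out.append(value * gain)
            self._seen += 1
        return out


fade = DelayedFadeOut(delay_frames=3, fade_frames=4)
print(fade.process([1.0] * 4))  # [1.0, 1.0, 1.0, 1.0]
print(fade.process([1.0] * 4))  # [0.75, 0.5, 0.25, 0.0]
```

At 44.1kHz a 10ms fade would be 441 audio samples, so in practice `fade_frames` would be in that range rather than the tiny value used here.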

Aligning layers

Timing is also important when it comes to all the layers and sublayers playing together. They need to be aligned (synchronized) into a time grid defined by the smallest unit of time for a given track. The first example uses a time unit of ~286ms (12,600 samples).
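
The grid arithmetic is simple enough to write down. In this small Python sketch the 12,600-sample unit and the 44.1kHz sampling frequency come from the text, while the helper names are made up for illustration.

```python
SAMPLE_RATE_HZ = 44_100
GRID_UNIT_FRAMES = 12_600  # smallest time unit of the first example track


def grid_unit_ms(unit_frames, sample_rate_hz):
    """Length of one grid unit in milliseconds."""
    return 1000.0 * unit_frames / sample_rate_hz


def next_grid_boundary(frame, unit_frames=GRID_UNIT_FRAMES):
    """First grid boundary at or after the given audio sample position."""
    return -(-frame // unit_frames) * unit_frames  # ceiling division


def units_to_frames(units, unit_frames=GRID_UNIT_FRAMES):
    """Length in audio samples of something lasting a whole number of units."""
    return units * unit_frames


print(round(grid_unit_ms(GRID_UNIT_FRAMES, SAMPLE_RATE_HZ)))  # 286
print(next_grid_boundary(12_601))                             # 25200
print(units_to_frames(3))                                     # 37800
```

Working in whole audio samples rather than milliseconds avoids rounding drift: every layer that starts and stops on a multiple of the unit stays exactly on the grid.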

When SDL's audio thread is mixing, it needs to lock the audio device to avoid race conditions with other threads (the rest of the game). Only one thread should change the active audio resources at any given time. For the same reason, when we want to start playing a sample, the audio device is locked.

Even if some layer of a variable track doesn't start playing until some time after the track itself has started, we can't delay starting the layer, because we have no way of starting it at an exact time. There is no 'start playing in x milliseconds' functionality, and even if there were, there would still be the problem of mixing in chunks that I mentioned already. So we need to start all layers at the same time to keep them aligned; the inactive ones play silence.

The audio thread can jump in and lock the audio device at any time. If we have the bad luck of it happening at the moment when we have started only the first two layers out of five, for example, the remaining three layers would be started ~11.6ms later, out of alignment. SDL_Mixer doesn't provide explicit locking functions. SDL itself does provide them; however, SDL_Mixer tries to acquire the lock for most operations, so some little tweaks are needed anyway, but it's not a difficult thing to add.

Similar problems need to be addressed when a layer is not playing anything in the middle of a track. It can't be just stopped because we couldn't restart it at the precise moment we need it again. It is left playing silence, aligned to the time grid.
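
The two ideas from this section, starting every layer under a single lock and letting inactive layers play silence, can be sketched as follows. This is a simplified Python model, not the engine code: the `Layer` and `Mixer` classes are hypothetical, and a plain `threading.Lock` stands in for the audio device lock.

```python
import threading


class Layer:
    """Hypothetical layer: silent until its delay has passed, silent after it ends."""

    def __init__(self, start_delay_frames, frames):
        self.start_delay_frames = start_delay_frames
        self.frames = frames

    def read(self, position, n):
        out = []
        for i in range(n):
            index = position + i - self.start_delay_frames
            inside = 0 <= index < len(self.frames)
            out.append(self.frames[index] if inside else 0.0)  # silence outside
        return out


class Mixer:
    def __init__(self):
        self.device_lock = threading.Lock()  # stand-in for the audio device lock
        self.layers = []                     # (origin position, layer) pairs
        self.position = 0                    # audio samples mixed so far

    def start_layers(self, layers):
        # One lock for the whole batch: the mixing thread cannot slip in
        # between two layers, so all of them share the same origin.
        with self.device_lock:
            for layer in layers:
                self.layers.append((self.position, layer))

    def mix_chunk(self, n):
        with self.device_lock:
            out = [0.0] * n
            for origin, layer in self.layers:
                chunk = layer.read(self.position - origin, n)
                for i, value in enumerate(chunk):
                    out[i] += value
            self.position += n
            return out


mixer = Mixer()
bass = Layer(start_delay_frames=0, frames=[1.0] * 4)
melody = Layer(start_delay_frames=2, frames=[10.0, 10.0])
mixer.start_layers([bass, melody])
print(mixer.mix_chunk(4))  # [1.0, 1.0, 11.0, 11.0]
```

The melody layer is technically playing from the very first audio sample, it just outputs silence for its first two positions, which keeps it aligned with the bass layer without any precise scheduling.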

Frequently asked questions from last week (Donion)

  • Which music plays when remotely viewing different surfaces and when switching surfaces?
  • Each track is tied either to one planet or to space platforms. The surface you are currently looking at (either in regular mode when controlling your character or in remote view) is used to select the appropriate music track. When switching between surfaces, again either with the character or with remote view, the music track is switched. The progress of the current track for each surface is remembered so when you switch back to a surface that had music interrupted by a surface switch, the track will be resumed from the remembered place instead of restarting a new track. This way you have the immediate feedback of switching surfaces including its music without constantly stopping and restarting it.

    Currently the switch between different surfaces' tracks happens immediately. However this might change and some kind of a transition, for example a short fade-in/fade-out, might be added, depending on testing and feedback.

  • Could the music be made more dynamic and react to the game context?
  • In order to not go into technical details last week, we weren't entirely accurate about the music player not knowing anything about the game state. Clearly it knows something if it can react to which surface is being viewed. However, this is the only variable from game state the music player takes into account.

    The engine having these limitations doesn't mean it would be impossible to pass more information along into the music player; that's the easy part. But the music itself would have to be composed with such a system in mind from the beginning, years ago. We would need to have all these conditions well defined in advance so Petr could compose in a way that would all work together. Those are the difficult parts. I'm not saying it's impossible to do, but we went a different route.

    Long story short, if more dynamic music hasn't happened by now, it most likely won't happen.

  • Can we have fine control over music?
  • There are no plans to add complex controls like custom playlists or per surface settings. A simple mod can be created to achieve that if someone really wants that feature.

    Simple controls are a different story. For 2.0 we have added options to bind a key to skip the currently playing track, go back to the previously played track and to pause/resume the music.

  • Will the music play all the time?
  • No, same as it is right now in 1.1, there will be randomized pauses between tracks. At most we would just tweak the pause duration to fit better with the lengths of Space Age tracks, subject to feedback from testing.

  • Will the Space Age soundtrack be sold separately?
  • Yes, a digital version of the soundtrack will be available for purchase, same as the base game's soundtrack.

As always, instantiate a variant of your thoughts at the usual places.