SDL Programming in Linux: Spicing up with Sound

Jalesh Jain

A game without audio is like a buffet without spice. A game can be played without sound, but it will fail to provide an immersive environment. Before SDL, sound effects were either very complex to implement or very limited in output. SDL changed that with its core and extended libraries. The core library provides the functionality to work with WAV files. With the extended libraries, sound formats such as MIDI and MPEG-1 audio can be integrated into the gaming environment. In the first section I will discuss the core audio library. The second section will cover the timer APIs. The last section will use the APIs introduced in the first section to create an application that can be extended for future projects.

Playing the Sound, the SDL Way:

Sound is one of the subsystems of SDL. Unlike most other subsystems, however, sound must not only be initialized but also opened, much as a video mode has to be set up before anything can be drawn. Even then, nothing is heard until the playing routines are used. In essence there are three steps to using sound within an application. They are:

1. Initializing audio
2. Opening the audio
3. Playing the sound

It is in the second step that the format, sample rate and so on come into the picture. The details of each step follow:

1. Initializing audio:

The first step in using audio in an application is initializing the audio subsystem. This is done by passing the flag that refers to the audio subsystem, SDL_INIT_AUDIO, to SDL_Init(). To put it in code:

    SDL_Init(SDL_INIT_AUDIO);

This is no different from the initialization of any other subsystem.

2. Opening the audio:

To open anything, be it a file or a socket, certain data such as the file name and mode has to be passed to the environment. Opening the audio is no different. The data to be passed includes the frequency, format and so on. SDL_OpenAudio() is used to provide these to the environment. It takes two parameters, both pointers to the SDL_AudioSpec structure: the first describes the desired format and the second receives the format actually obtained. The members of this structure are:

a. freq:
It is an integer giving the sample rate of the sound to be played, measured in samples per second. The common values are 11025, 22050 and 44100. The higher the value, the more faithfully the sound is reproduced.

b. format:
The format member describes the size and type of each sample being sent. Its data type is Uint16. The common value is AUDIO_S16. Other acceptable values include AUDIO_U16 and AUDIO_U8. The U stands for unsigned, the S stands for signed, and the number gives the bits in each sample. Hence the value AUDIO_S16 represents a sample having 16 bits which are signed.

c. channels:
The number of separate channels to be used is given by this member. A value of 1 indicates mono, i.e. a single channel, and a value of 2 indicates stereo.

d. samples:
This refers to the size of the audio buffer in samples.

e. callback:
It takes a pointer to the function that will be called to fill the audio buffer. The function receives the user data pointer, the stream (the buffer to be filled) and the length of the stream in bytes as parameters.

Apart from these, the other members are an unsigned 8-bit integer (Uint8) representing the silence value, a Uint32 representing the size of the buffer in bytes, and a void pointer to the user data.
In code it would be:

    SDL_AudioSpec wanted;
    void fill_audio(void *udata, Uint8 *stream, int len);

    /* Set the audio format */
    wanted.freq = 22050;
    wanted.format = AUDIO_S16;
    wanted.channels = 2;    /* 1 = mono, 2 = stereo */
    wanted.samples = 1024;  /* Good low-latency value for callback */
    wanted.callback = fill_audio;
    wanted.userdata = NULL;

where the frequency is 22050, the format is a 16-bit signed integer, the channel layout is stereo, the size of the audio buffer in samples is 1024 and the callback function is fill_audio. There is no user data to be passed. Next is playing the audio.
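The arithmetic behind these fields can be checked with a small stand-alone sketch. It does not call SDL: the struct spec_sketch, the helper functions and the FMT_* constants are hypothetical stand-ins, and the constants are assumed to follow the bit layout SDL 1.2 uses for its AUDIO_* values (low byte = bits per sample, top bit = signed). The sketch derives the buffer size in bytes, which is what SDL reports in the size member, and the time one buffer lasts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for AUDIO_U8 and AUDIO_S16 (assumed SDL 1.2 bit layout). */
#define FMT_U8   0x0008u  /* 8 bits, unsigned */
#define FMT_S16  0x8010u  /* 16 bits, signed, little-endian */

/* Hypothetical mirror of the SDL_AudioSpec fields discussed above. */
typedef struct {
    int      freq;      /* samples per second */
    uint16_t format;    /* one of the FMT_* values */
    uint8_t  channels;  /* 1 = mono, 2 = stereo */
    uint16_t samples;   /* buffer size in samples (per channel) */
} spec_sketch;

/* Bits per sample are stored in the low byte of the format value. */
static int format_bits(uint16_t format)
{
    return format & 0x00FF;
}

/* The top bit of the format value marks signed samples. */
static int format_is_signed(uint16_t format)
{
    return (format & 0x8000u) != 0;
}

/* Buffer size in bytes: samples x channels x bytes per sample. */
static uint32_t buffer_bytes(const spec_sketch *s)
{
    return (uint32_t)s->samples * s->channels
         * (uint32_t)(format_bits(s->format) / 8);
}

/* How long one full buffer lasts, in milliseconds. */
static double buffer_millis(const spec_sketch *s)
{
    assert(s->freq > 0);
    return 1000.0 * s->samples / s->freq;
}
```

For the values used above (22050 Hz, 16-bit signed, stereo, 1024 samples) this gives a 4096-byte buffer that lasts roughly 46 ms, which is why a smaller samples value means lower latency.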

3. Playing the Audio:

Playing the audio means not only filling the buffer with the required data but also loading the audio file to be played. The functions required are the callback function and the file playing functions. The file playing functions include:

a. SDL_LoadWAV:
It loads a WAV file and returns the given SDL_AudioSpec filled in with the format of the file. The first parameter is the name of the WAV file. The second is a pointer to an SDL_AudioSpec. If the call succeeds, the third parameter points to a malloc’d buffer containing the audio data and the last parameter holds the length of that buffer in bytes. In code it would be:

    SDL_AudioSpec wave;
    Uint8 *data;
    Uint32 dlen;
    char *file;    /* name of the wav file, set by the caller */
    SDL_LoadWAV(file, &wave, &data, &dlen);

The above code loads the file named by file into data, sets its specification in wave and stores the length of the buffer in dlen.
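Once the specification and the length are known, the playing time of the clip follows from them. A minimal sketch, assuming the caller passes in the frequency, the channel count and the bit size taken from the loaded specification; audio_millis is a hypothetical helper, not an SDL function.

```c
#include <assert.h>
#include <stdint.h>

/* Playing time, in milliseconds, of dlen bytes of raw audio data. */
static uint32_t audio_millis(uint32_t dlen, int freq, int channels,
                             int bits_per_sample)
{
    assert(freq > 0 && channels > 0 && bits_per_sample >= 8);

    /* One second of audio occupies freq x channels x bytes-per-sample bytes. */
    uint64_t bytes_per_second = (uint64_t)freq * (uint64_t)channels
                              * (uint64_t)(bits_per_sample / 8);

    return (uint32_t)(((uint64_t)dlen * 1000u) / bytes_per_second);
}
```

For example, 88200 bytes of 16-bit stereo audio at 22050 Hz amount to exactly one second.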

b. SDL_BuildAudioCVT:
Before the data can be used it usually has to be converted, for which the SDL_AudioCVT structure is used. This structure must be initialized first, and the function that does so is SDL_BuildAudioCVT. Its parameters are a pointer to the SDL_AudioCVT structure, followed by the format (Uint16), channels (Uint8) and rate (int) of the source, and then the format (Uint16), channels (Uint8) and rate (int) of the destination. In code:

    SDL_BuildAudioCVT(&cvt, wave.format, wave.channels, wave.freq,
                      AUDIO_S16, 2, 22050);

where cvt is the SDL_AudioCVT structure; wave.format, wave.channels and wave.freq are the format, channels and frequency of the source; and AUDIO_S16, 2 and 22050 are the format, channels and frequency of the destination. A detailed discussion of SDL_AudioCVT is beyond the scope of this article. I will be discussing it in the near future.
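To get a feel for what a format conversion involves, and why the converted data may need a bigger buffer than the source (SDL reports the growth factor in the len_mult member of SDL_AudioCVT), here is a hand-written sketch of one specific case: unsigned 8-bit mono to signed 16-bit stereo. This is not SDL's converter and it does no resampling; the function and constant names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Growth factor for this particular conversion: 1 byte in, 4 bytes out. */
enum { U8_MONO_TO_S16_STEREO_MULT = 4 };

/* Converts n unsigned 8-bit mono samples into signed 16-bit stereo.
   dst must have room for 2 * n int16_t values.
   Returns the size of the output in bytes. */
static size_t u8_mono_to_s16_stereo(const uint8_t *src, size_t n, int16_t *dst)
{
    assert(src != NULL && dst != NULL);

    for (size_t i = 0; i < n; ++i) {
        /* 128 is silence for unsigned 8-bit audio: recentre, then widen. */
        int16_t s = (int16_t)((src[i] - 128) * 256);
        dst[2 * i]     = s;  /* left channel */
        dst[2 * i + 1] = s;  /* right channel */
    }
    return n * 2 * sizeof(int16_t);
}
```

Every input byte becomes two 16-bit values, so the output is four times the size of the input. A conversion that works in place therefore needs a buffer allocated at the larger size before it starts.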

c. SDL_ConvertAudio:
This function converts audio from one format to another. It takes only one parameter, a previously initialized SDL_AudioCVT, and converts the data pointed to by the buf member of that structure. To understand it fully, let's have a look at a detailed piece of code. The comments are self-explanatory:

SDL_AudioSpec *desired, *obtained;
SDL_AudioSpec wav_spec;
SDL_AudioCVT  wav_cvt;
Uint32 wav_len;
Uint8 *wav_buf;
int ret;

/* Allocate the audio specs */
desired=(SDL_AudioSpec *)malloc(sizeof(SDL_AudioSpec));
obtained=(SDL_AudioSpec *)malloc(sizeof(SDL_AudioSpec));

/* Set desired format (same values as the earlier example) */
desired->freq=22050;
desired->format=AUDIO_S16;
desired->channels=2;
desired->samples=1024;
desired->callback=fill_audio;
desired->userdata=NULL;

/* Open the audio device */
if ( SDL_OpenAudio(desired, obtained) < 0 ){
  fprintf(stderr, "Couldn't open audio: %s\n", SDL_GetError());
  exit(-1);
}

/* desired is no longer needed */
free(desired);

/* Load the test.wav */
if( SDL_LoadWAV("test.wav", &wav_spec, &wav_buf, &wav_len) == NULL ){
  fprintf(stderr, "Could not open test.wav: %s\n", SDL_GetError());
  SDL_CloseAudio();
  free(obtained);
  exit(-1);
}

/* Build AudioCVT */
ret = SDL_BuildAudioCVT(&wav_cvt,
                        wav_spec.format, wav_spec.channels, wav_spec.freq,
                        obtained->format, obtained->channels, obtained->freq);

/* Check that the converter was built */
if( ret == -1 ){
  fprintf(stderr, "Couldn't build converter!\n");
  SDL_CloseAudio();
  free(obtained);
  SDL_FreeWAV(wav_buf);
  exit(-1);
}

/* Setup for conversion */
wav_cvt.buf=(Uint8 *)malloc(wav_len*wav_cvt.len_mult);
wav_cvt.len=wav_len;
memcpy(wav_cvt.buf, wav_buf, wav_len);

/* We can delete the original WAV data now (SDL_FreeWAV is coming up next) */
SDL_FreeWAV(wav_buf);

/* And now we're ready to convert */
SDL_ConvertAudio(&wav_cvt);

/* do whatever */

d. SDL_FreeWAV:
Once the conversion buffer has been set up, the data loaded from the file is no longer required and has to be released. The converted audio is available to the application through the buf member of SDL_AudioCVT. To release the memory occupied by the loaded data, SDL_FreeWAV has to be used.
In code:

    SDL_FreeWAV(wav_buf);

That covers the functions. The next section shows how to use them to play sound.

Playing the Sound in the Real World:

Till now I have shown you code snippets. Now it's time for a full-fledged application. So here goes.

First the includes:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "SDL.h"
#include "SDL_audio.h"

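Then comes the main and opening the audio:

int main(int argc, char *argv[])
{
    extern void mixaudio(void *unused, Uint8 *stream, int len);
    SDL_AudioSpec fmt;

    /* Initialize the audio subsystem */
    if ( SDL_Init(SDL_INIT_AUDIO) < 0 ) {
        fprintf(stderr, "Unable to initialize SDL: %s\n", SDL_GetError());
        return 1;
    }

    /* Set 16-bit stereo audio at 22Khz */
    fmt.freq = 22050;
    fmt.format = AUDIO_S16;
    fmt.channels = 2;
    fmt.samples = 512;        /* A good value for games */
    fmt.callback = mixaudio;
    fmt.userdata = NULL;

    /* Open the audio device and start playing sound! */
    if ( SDL_OpenAudio(&fmt, NULL) < 0 ) {
        fprintf(stderr, "Unable to open audio: %s\n", SDL_GetError());
        SDL_Quit();
        return 1;
    }
    SDL_PauseAudio(0);   /* unpause so that the callback starts running */

    /* other functions, such as the mixing and playing functions, can be called here */
    SDL_CloseAudio();    /* stops the callback and closes the audio device */
    SDL_Quit();
    return 0;
}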

The next part is playing the wav file. For this we need a structure that keeps track of the current sound data, position and length. Following is the structure:

#define NUM_SOUNDS 2
struct sample {
    Uint8 *data;
    Uint32 dpos;
    Uint32 dlen;
} sounds[NUM_SOUNDS];

The next part is playing the file. The function goes thus:

void PlaySound(char *file)
{
    int index;
    SDL_AudioSpec wave;
    Uint8 *data;
    Uint32 dlen;
    SDL_AudioCVT cvt;

    /* Look for an empty (or finished) sound slot */
    for ( index=0; index<NUM_SOUNDS; ++index ) {
        if ( sounds[index].dpos == sounds[index].dlen ) {
            break;
        }
    }
    if ( index == NUM_SOUNDS )
        return;    /* every slot is still playing */

    /* Load the sound file and convert it to 16-bit stereo at 22kHz */
    if ( SDL_LoadWAV(file, &wave, &data, &dlen) == NULL ) {
        fprintf(stderr, "Couldn't load %s: %s\n", file, SDL_GetError());
        return;
    }
    SDL_BuildAudioCVT(&cvt, wave.format, wave.channels, wave.freq,
                            AUDIO_S16,   2,             22050);
    cvt.buf = malloc(dlen*cvt.len_mult);
    memcpy(cvt.buf, data, dlen);
    cvt.len = dlen;
    SDL_ConvertAudio(&cvt);
    SDL_FreeWAV(data);

    /* Put the sound data in the slot (it starts playing immediately) */
    if ( sounds[index].data ) {
        free(sounds[index].data);
    }
    SDL_LockAudio();      /* keep the callback out while the slot changes */
    sounds[index].data = cvt.buf;
    sounds[index].dlen = cvt.len_cvt;
    sounds[index].dpos = 0;
    SDL_UnlockAudio();
}
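The slot search at the top of PlaySound can be tried on its own, without SDL. The sketch below is an illustration only: struct sample_sketch mirrors the structure shown earlier using <stdint.h> types, and find_free_slot is a hypothetical helper that returns -1 where PlaySound simply returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the sample structure, with <stdint.h> types instead of SDL's. */
struct sample_sketch {
    uint8_t  *data;
    uint32_t  dpos;  /* current play position in bytes */
    uint32_t  dlen;  /* total length in bytes */
};

/* Returns the index of the first empty or finished slot,
   or -1 when every slot is still playing. */
static int find_free_slot(const struct sample_sketch *slots, int count)
{
    assert(slots != NULL && count >= 0);

    for (int i = 0; i < count; ++i) {
        /* A slot is free when its position has reached its length;
           a zero-initialized slot (0 == 0) therefore counts as free. */
        if (slots[i].dpos == slots[i].dlen) {
            return i;
        }
    }
    return -1;
}
```

Because a slot whose position equals its length counts as free, a statically allocated array needs no explicit set-up, and a sound that has finished playing releases its slot automatically.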

Lastly we have to define the callback function for the SDL_AudioSpec which is:

void mixaudio(void *unused, Uint8 *stream, int len)
{
    int i;
    Uint32 amount;

    for ( i=0; i<NUM_SOUNDS; ++i ) {
        /* Mix whatever is left of this sound, up to the size of the stream */
        amount = (sounds[i].dlen-sounds[i].dpos);
        if ( amount > (Uint32)len ) {
            amount = len;
        }
        SDL_MixAudio(stream, &sounds[i].data[sounds[i].dpos], amount, SDL_MIX_MAXVOLUME);
        sounds[i].dpos += amount;
    }
}
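SDL_MixAudio is not examined in detail in this part, but the idea behind mixing can be sketched without SDL: scale the source samples by the volume, add them to what is already in the stream, and clamp the sum so that loud passages do not wrap around. The following is a rough illustration for signed 16-bit samples only, not SDL's implementation; mix_s16 is a hypothetical function and MIX_MAXVOLUME_SKETCH is assumed to match SDL_MIX_MAXVOLUME (128).

```c
#include <assert.h>
#include <stdint.h>

#define MIX_MAXVOLUME_SKETCH 128  /* assumed to match SDL_MIX_MAXVOLUME */

/* Adds n signed 16-bit samples from src into dst at the given volume
   (0..128), clamping each sum to the int16_t range instead of wrapping. */
static void mix_s16(int16_t *dst, const int16_t *src, uint32_t n, int volume)
{
    assert(volume >= 0 && volume <= MIX_MAXVOLUME_SKETCH);

    for (uint32_t i = 0; i < n; ++i) {
        int32_t scaled = ((int32_t)src[i] * volume) / MIX_MAXVOLUME_SKETCH;
        int32_t sum    = (int32_t)dst[i] + scaled;

        if (sum > INT16_MAX) sum = INT16_MAX;
        if (sum < INT16_MIN) sum = INT16_MIN;
        dst[i] = (int16_t)sum;
    }
}
```

Because every sound is added into the same stream, several sounds can play at once, and clamping keeps the result within the 16-bit range.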

That brings us to the end of this part. I have left several aspects unexplained, because explaining them as stand-alone functions wouldn’t do any good. They have to be understood in the context of rendering and scenes. SDL_MixAudio for mixing and the APIs for timers, threading, networking and CD-ROM access are among such functions. From the next part I will be moving towards rendering using OpenGL with SDL as the base framework. In rendering and animation, the real utility of the above-mentioned APIs will be revealed. So till next time.