SoundManager 2: Technical Notes

Requirements + Specifications

What SM2 needs, and how it works.

Prerequisites (client)

  • HTML5 Audio() support and/or Flash plugin, version 8 or higher
  • Supported web browser + platform

Tested Browsers and Platforms

  • Firefox (all versions), Windows/Mac
  • Safari 1.3+ (Mac) / All Windows versions
  • Mobile Webkit: iOS 4.0 devices, iPad 3.2 (original iPad iOS release) and newer
  • Android (2.3+, confirmed on 2.3.3)
  • Google Chrome (all versions/OSes)
  • Internet Explorer 5.0+, Windows
  • Opera 9.10 (slightly buggy, 9.5+ ideal), Windows/Mac
  • Netscape 8.0+, Windows/Mac
  • Firefox 1.5+, Linux (Flash 9 beta)

In the absence of native HTML5 audio support, JavaScript-to-Flash communication is used to provide Flash-based audio playback. JS-to-Flash communication is possible through Flash 8's ExternalInterface feature, which uses a standard browser plugin architecture implemented by each browser manufacturer (see NPAPI).

For further JS/flash reference, see Adobe's Flash 8 documentation, under the "ExternalInterface support" page which details supported browsers.

Of course, not all possible combinations of browser/OS have been tested. Most modern browsers and devices should work reasonably well.
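
As a rough illustration of the HTML5-first, Flash-fallback approach described above, the following sketch checks for native MP3 support and otherwise falls back to Flash. The helper names canPlayNatively and choosePlaybackPath are hypothetical and are not part of SM2; only canPlayType() is a standard HTML5 media method. SM2 performs its own, more thorough detection internally.

```javascript
// Hypothetical helper (not part of SM2): returns true when the given
// Audio constructor reports that it may be able to play the MIME type.
function canPlayNatively(AudioCtor, mimeType) {
  if (typeof AudioCtor !== 'function') {
    return false; // no HTML5 Audio() available at all
  }
  try {
    var result = new AudioCtor().canPlayType(mimeType);
    // canPlayType() returns '', 'maybe' or 'probably'
    return result === 'maybe' || result === 'probably';
  } catch (e) {
    return false;
  }
}

// Hypothetical helper: pick a playback path from detected capabilities.
function choosePlaybackPath(AudioCtor, flashMajorVersion) {
  if (canPlayNatively(AudioCtor, 'audio/mpeg')) {
    return 'html5';
  }
  if (flashMajorVersion >= 8) {
    return 'flash'; // ExternalInterface requires Flash 8 or higher
  }
  return 'unsupported';
}

// Browser usage (guarded so the snippet is harmless outside a browser).
// The Flash version is passed as 0 here because detecting it is out of scope.
if (typeof Audio !== 'undefined') {
  console.log(choosePlaybackPath(Audio, 0));
}
```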

Caveats + Limitations / FAQ

Supported sound formats (MP3 via Flash 8 and MP4/M4A/AAC via Flash 9 "MovieStar", with caveats)

SM2 uses Flash's native Sound object for loading and managing sound, so it is subject to the same limitations as Flash 8. Perhaps by design, the Flash 8 Sound object only supports MP3 files through the loadSound() ActionScript method. Because of this limitation, SM2 cannot load other sound formats, including audio-only SWF files. Refer to the Flash 8 documentation for details.

MP3 Format Caveats

Some very low and very high bitrate MP3s, and Variable Bitrate (VBR) MP3s, may play either too quickly or too slowly (see "the chipmunk problem"). If you encounter this issue, try re-encoding at a different bitrate (between 64 kbps and 192 kbps, for example). Using Constant Bitrate (CBR) encoding may also alleviate this problem.

It has been suggested that sample rates other than 22 or 44 kHz can also contribute to this issue. 44 kHz (more precisely, 44.1 kHz) is the standard CD-spec sample rate, and is recommended for "hi-fi" recordings.

Looping

Perhaps due to the way Flash dynamically loads and decodes MP3 data, seamless looping doesn't seem to be fully implemented. Loops have a noticeable gap between the finish and start. This has been an issue since the original version of SoundManager. Rather than ship a broken feature, the functionality has been omitted until a solid workaround is found.

Flash 8 limitations with multiShot (overlaying/"chorus") effects

Regarding "layering" sounds (calling play() on a sound multiple times): Even though a multi-shot option can be specified, it does not work with Flash 8; a single instance of a sound can only have one timeline. The current behaviour is that when multiShot is specified and play() is called on a currently-playing sound, it will restart from the beginning without an overlay.

However, the API does provide some creative ways (onfinish for looping, multiple sound objects for multi-shot layering) of working around these Flash limitations.

It should be noted that sounds can loop seamlessly and be layered when linked and exported to SWF from within the Flash IDE, but SoundManager does not support SWF-based audio.
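
The two workarounds mentioned above (onfinish for looping, multiple sound objects for layering) could be sketched roughly as follows. The helper names loopSound and createLayeredSound, and the ID naming scheme, are assumptions made for illustration; sm stands in for the soundManager object, and sound for an object returned by createSound(). A small gap between repeats should still be expected.

```javascript
// Loop a sound a fixed number of times by restarting it from its own
// onfinish callback. `sound` is assumed to be an SM2 sound object.
function loopSound(sound, timesRemaining) {
  if (timesRemaining <= 0) {
    return;
  }
  sound.play({
    onfinish: function () {
      loopSound(sound, timesRemaining - 1);
    }
  });
}

// Layering workaround: create several sound objects for the same URL and
// rotate through them, so that each play() call gets its own timeline.
// `sm` is assumed to be the soundManager object; the id scheme is made up.
function createLayeredSound(sm, baseId, url, copies) {
  var pool = [];
  for (var i = 0; i < copies; i++) {
    pool.push(sm.createSound({ id: baseId + '-' + i, url: url }));
  }
  var next = 0;
  return {
    play: function () {
      var sound = pool[next];
      next = (next + 1) % pool.length;
      sound.play();
      return sound;
    }
  };
}
```

The pool size is a trade-off: each copy is a separate sound object that loads the same URL, so more copies allow more overlap at the cost of more memory.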

Flash 9 multiShot capabilities

The Flash 9-based version of SoundManager2 can successfully layer sounds via "multiShot", truly playing a single sound multiple times on top of itself. However the API will only call certain timing-related methods such as whileplaying() for the first play() "instance" of the sound, to avoid confusion. By contrast, simpler methods such as onfinish() will be called multiple times, one for each instance of play().
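
A minimal sketch of the behaviour described above, assuming the Flash 9 version is in use: because onfinish() is called once per play() instance, a counter can be used to detect when every overlapping instance has finished. The helper name playOverlapping and its parameters are hypothetical; multiShot and onfinish are the SM2 options named in the text, and sm stands in for the soundManager object.

```javascript
// Hypothetical helper: play one multiShot sound several times on top of
// itself and call onAllFinished once every instance has reported onfinish.
function playOverlapping(sm, id, url, times, onAllFinished) {
  var finished = 0;
  var sound = sm.createSound({
    id: id,
    url: url,
    multiShot: true,
    // Called once for each play() instance, per the notes above.
    onfinish: function () {
      finished++;
      if (finished === times) {
        onAllFinished();
      }
    }
  });
  for (var i = 0; i < times; i++) {
    sound.play();
  }
  return sound;
}
```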

ID3 Parsing

ID3 data can differ in formatting and version, and as a result may be parsed oddly by Flash. Values may sometimes be repeated across different fields.

ID3 info seems to fail to load for iTunes 7-edited files, perhaps due to the format or inclusion of album artwork (image data.)

Performance Notes: Caching + RAM Observations

Flash appears to use the browser cache (presumably that of the OS's native browser, or the closest equivalent), so the browser's cache size and other settings may affect Flash's cache behaviour. It is safe to assume a 100 MB MP3 will probably not be cached, for example, but a 16 MB one most likely will be.

MP3s appear to be loaded and stored in RAM while loading over HTTP, so memory use needs to be considered for both large MP3s and streaming radio-type applications.

Timing/Latency (JS + Flash, ExternalInterface-related)

JavaScript-to-Flash communication is not instantaneous on slower systems, but can be much better on more modern systems. Latency (timing lag/delays) can be noted in some cases from function call to sound execution. It is possible some performance analysis can help to speed up this area for timing-critical applications involving animation etc., but this area has not been thoroughly investigated yet. Brad Neuberg has some notes on speeding up ExternalInterface which may be relevant.

Flash-to-OS/hardware latency (where flash reports progress, but no sound is heard for a number of milliseconds) may also be an unfortunate reality of Flash-based audio, varying between platform and OS version etc.

Additionally, MP3 files may contain audible gaps at the beginning or end by default when encoded, even if the source (e.g., WAVE) file did not. Using optional "nogap" encoding options with programs such as LAME may help to remedy this.

Finally, the useHighPerformance option may help with JS/flash lag. Using this option causes the flash movie to be placed with position:fixed on-screen at all times (though in a small, hidden box) and has been shown to notably improve performance on Mac OS X. As well, flashPollingInterval will use a lower timer value, making polling calls run as quickly as reasonably possible and increasing the frequency of calls to whileplaying(), whileloading() and other time-related events.

Use these options with caution, as overly-aggressive intervals may hinder performance if event callbacks become too frequent.
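
As a sketch of applying these options with some caution, the helper below enables useHighPerformance and clamps the requested flashPollingInterval to a floor. The helper name performanceOptions and the MIN_POLL_MS value are assumptions chosen for this example, not SM2 recommendations.

```javascript
// Arbitrary floor (in msec) chosen for this sketch; not an SM2 value.
var MIN_POLL_MS = 20;

// Hypothetical helper: build setup options for timing-sensitive use while
// refusing overly aggressive polling intervals.
function performanceOptions(requestedPollMs) {
  return {
    useHighPerformance: true,
    flashPollingInterval: Math.max(MIN_POLL_MS, requestedPollMs)
  };
}

// Apply the options only when SM2 is actually present on the page.
if (typeof soundManager !== 'undefined') {
  soundManager.setup(performanceOptions(50));
}
```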

Serving to HTML5 + Flash Clients

A few notes on HTTP response headers, configuration and so forth.

Tips for serving audio to HTML5 + Flash Clients

HTTP response headers from your server are important. Below is a brief list of recommended practices for serving audio content to clients using HTML5, and/or Flash.

HTML5

  • Likes Content-Length HTTP response headers (can affect duration and playback events if missing.)
  • Arbitrary seeking, dynamic loading/buffering works when the server supports Byte Serving via byte ranges, also known as HTTP partials / range requests. Without this support, behaviour falls back to a Flash-like single-connection, sequential/progressive-style download.
  • Likes a proper MIME type response from the server, e.g., audio/mpeg for MP3 content.

Flash

  • Likes Content-Length HTTP response headers. When missing, duration may be unknown and certain events like whileloading() / whileplaying() may not fire.
  • Does not care about MIME type or partials in server responses.

In summary...

Always serve a proper Content-Length HTTP response header. For HTML5, review that HTTP partials / range requests are enabled and that you are serving the correct MIME type in your response as well.

Never apply gzip or mod_deflate compression to binary assets. It causes playback problems, costs CPU and in some cases, may even increase the transfer size.
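
The recommendations above can be turned into a rough checklist. The following sketch inspects a set of response headers and returns warnings; the helper name checkAudioHeaders is hypothetical, and it assumes the headers have already been collected into a plain object with lower-cased names. Note that a server may honour range requests without advertising Accept-Ranges, so that warning is only a hint.

```javascript
// Hypothetical helper: inspect response headers for an audio asset and
// return a list of warnings based on the recommendations above.
// `headers` is a plain object keyed by lower-cased header name.
function checkAudioHeaders(headers) {
  var warnings = [];
  if (!/^\d+$/.test(String(headers['content-length'] || ''))) {
    warnings.push('Missing or invalid Content-Length');
  }
  if (!/^audio\//i.test(headers['content-type'] || '')) {
    warnings.push('Content-Type is not an audio/* MIME type');
  }
  if (!/\bbytes\b/i.test(headers['accept-ranges'] || '')) {
    warnings.push('Range requests (Accept-Ranges: bytes) not advertised');
  }
  if (/gzip|deflate/i.test(headers['content-encoding'] || '')) {
    warnings.push('Compression applied to a binary asset');
  }
  return warnings;
}
```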

How Clients Download Audio

Progressive, sequential HTTP vs. HTTP 206 partials / range requests.

How HTML5 + Flash Clients Download Audio

In terms of HTTP, assets are typically requested and downloaded in sequential byte order over a single connection. Flash typically works this way, but HTML5 clients can behave in a way that is closer to streaming, using multiple requests for pieces of data instead.

Browsers can request audio from servers either sequentially or using Byte Serving (AKA "partials"), depending on client and server capabilities. The key difference is that with byte serving, clients request data in a "streaming" fashion and can buffer at will, allowing arbitrary seeking, pausing and requesting of audio data in pieces, similar to how video playback typically works on YouTube, for example. Thus, preloading and onload() are less meaningful when byte serving is involved.

Traditional, "progressive" download (single request)

The typical HTTP file download can be described as follows:

  • Single HTTP request, one TCP connection
  • All bytes are sent sequentially "over the wire"
  • Network connection is closed when the download completes (or fails)
  • SoundManager 2 fires relevant sound onload() callback when the connection closes and the sound is deemed to be valid

In terms of HTTP traffic, the sequence is something like this (excluding some headers for brevity):

  • Client request
    • GET some.mp3 HTTP/1.1
  • Server response
    • HTTP/1.1 200 OK
    • Content-Length: 158958
    • Connection: close
    • Content-type: audio/mpeg

Progressive downloading behaviour in SoundManager 2

This method is typically used by Flash when requesting audio via standard HTTP, and HTML5 clients in the event that Byte Serving (HTTP 206/Partial Content) is not implemented or cannot be negotiated.

  • Playback begins after a small amount of buffering, once the download has started.
  • Once started, the download progress continues until all bytes are received. Stopping or pausing playback does not cancel the download.
  • During load, an SM2 sound object's whileloading() event will fire at a regular interval, with the bytesLoaded, bytesTotal and duration properties updating as the file progresses. Because Flash only reflects the "amount of duration loaded", durationEstimate is provided as a means of reflecting the total duration before load has completed (at which point duration is complete and accurate).
  • During load, the user can only seek within the amount of data (duration) downloaded.
  • onload() fires when the connection is closed, and all bytes have been received.
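
The properties listed above could be combined into a simple progress summary, for example from inside a whileloading() handler. This is a minimal sketch: the helper names describeLoadProgress and clampSeek are hypothetical, and sound is assumed to expose the bytesLoaded, bytesTotal, duration and durationEstimate properties named in the text, with durations in milliseconds.

```javascript
// Summarise load progress for a sound during a progressive download.
function describeLoadProgress(sound) {
  var fraction = sound.bytesTotal > 0 ? sound.bytesLoaded / sound.bytesTotal : 0;
  return {
    percentLoaded: Math.round(fraction * 100),
    // Seeking is only possible within the duration loaded so far.
    seekableUntil: sound.duration,
    // Until loading completes, durationEstimate is the best guess at the
    // full length; afterwards, duration itself is accurate.
    totalDuration: fraction >= 1 ? sound.duration : sound.durationEstimate
  };
}

// Hypothetical helper: clamp a requested seek position to what has loaded.
function clampSeek(sound, positionMs) {
  return Math.max(0, Math.min(positionMs, sound.duration));
}
```

Inside a whileloading() callback, where SM2 invokes the handler with the sound object as this, a call such as describeLoadProgress(this) would be one way to drive a progress bar.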

Byte Serving (partials / byte range requests, "streaming"-style delivery)

HTML5 clients will send a Range header in the HTTP request for an audio asset, indicating what piece of the file to download. This also signals their capability for Byte Serving.

If the server supports partials, it will reply with an HTTP/1.1 206 Partial Content status, a Content-Range header indicating the bytes it is going to send, and typically an Accept-Ranges: bytes header advertising range support.

It appears that servers will return the whole range in the first response unless interrupted (and when a client requests a range of "0-" as in this case), but the initial connection may be dropped by the client if it wishes to stop "buffering" at any point, or if the user tries to seek to a new position in the audio that has not yet buffered.

  • Client request
    • GET some.mp3 HTTP/1.1
    • Range: bytes=0-
  • Server response
    • HTTP/1.1 206 Partial Content
    • Accept-ranges: bytes
    • Content-length: 4237566
    • Content-Range: bytes 0-4237565/4237566
  • The first response includes all bytes in this case, but a client may drop this connection and make a new range request, if needed; for example, the user may seek to a new position in the file where data is not yet available, and the client will make a new request to buffer data beginning at the appropriate offset.

HTTP/1.1 206 Partial Content request/response example

  • Client request
    • GET some.ogg HTTP/1.1
    • Range: bytes=5210604-5275910
    • If-Range: "508107-4d8a0b4e90d26"
  • Server response
    • HTTP/1.1 206 Partial Content
    • Accept-Ranges: bytes
    • Content-length: 65307
    • Content-Range: bytes 5210604-5275910/5275911
    • Content-Type: audio/ogg
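
To make the relationship between Content-Range and Content-Length in the example above concrete, the following sketch parses a Content-Range value and computes the number of bytes it covers (last minus first, plus one). The helper name parseContentRange is hypothetical, and it handles only the plain "bytes first-last/total" form, not the variants that use an asterisk.

```javascript
// Parse a Content-Range value such as "bytes 5210604-5275910/5275911".
// Returns null when the value is not in the "bytes first-last/total" form
// or when the numbers are inconsistent.
function parseContentRange(value) {
  var match = /^bytes (\d+)-(\d+)\/(\d+)$/.exec(value);
  if (!match) {
    return null;
  }
  var first = Number(match[1]);
  var last = Number(match[2]);
  var total = Number(match[3]);
  if (first > last || last >= total) {
    return null;
  }
  return {
    first: first,
    last: last,
    total: total,
    // Byte positions are inclusive, so the body length is last - first + 1.
    length: last - first + 1,
    isWholeFile: first === 0 && last === total - 1
  };
}

var range = parseContentRange('bytes 5210604-5275910/5275911');
console.log(range.length); // 65307, matching the Content-length above
```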

Byte Serving behaviour in SoundManager 2

Byte Serving is automatically negotiated between client and server, and offers a number of advantages over traditional downloads. Most notably, Byte Serving is closer to a "streaming" technology and enables clients to seek, buffer and resume playback at arbitrary positions within a file once the duration is known. Larger files benefit most from this technique, since they can be handled in smaller chunks vs. being held entirely in memory a la Flash.

While SoundManager 2 uses the same events for HTML5 and Flash regardless of transport mechanism, there are some notable differences in SoundManager 2's behaviour when using HTML5 and Byte Serving.

Most importantly, the concept of "loaded" (i.e., preloading data and waiting for onload()) with partials is irrelevant due to the way clients are able to pause, cancel and resume requests.

  • Playback begins after a small amount of buffering, once the download has started.
  • The HTML5-native canplay() event fires when playback is ready to begin. This will cause SoundManager 2 to fire a sound's onload() callback. At this point, the sound's duration should be known, but the SM2-provided bytesLoaded and bytesTotal sound properties may not be known and thus are 0/0 during load time, and 1/1 if still undefined at load time.
  • SoundManager 2 attempts to fire whileloading() events as the HTML5-native progress() event fires, but keep in mind that any "bytes loaded" data may refer to blocks (i.e., buffered sequences of data) that are non-sequential and thus bytesLoaded and bytesTotal should not be relied on as an indicator of "total load progress".
  • HTML5 clients may use their own heuristic to determine how much to buffer, if and when to pause, cancel or resume requesting data.
  • Clients may request metadata from the end of a file in some cases, as with OGG formats, in order to determine information like the total duration of the sound.
  • During load (and once the duration is known), the user can seek to any position within the file.
  • A client may drop an open request in order to request a new byte range - for example, to buffer and resume playback if the user jumps to half-way through the file where that data has not been downloaded yet.
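
Because buffered data may be non-sequential, as noted above, one alternative to bytesLoaded and bytesTotal is to inspect the buffered time ranges of the native HTML5 audio element. The sketch below works on any TimeRanges-like object (with length, start(i) and end(i)); the helper names bufferedSeconds and isBuffered are hypothetical, and this operates on the standard HTML5 media API rather than on an SM2 API.

```javascript
// Total seconds buffered across all (possibly non-contiguous) ranges.
// `buffered` is a TimeRanges-like object: { length, start(i), end(i) }.
function bufferedSeconds(buffered) {
  var total = 0;
  for (var i = 0; i < buffered.length; i++) {
    total += buffered.end(i) - buffered.start(i);
  }
  return total;
}

// True when the given time (in seconds) falls inside a buffered range,
// i.e. seeking there should not require a new range request.
function isBuffered(buffered, time) {
  for (var i = 0; i < buffered.length; i++) {
    if (time >= buffered.start(i) && time <= buffered.end(i)) {
      return true;
    }
  }
  return false;
}
```

In a browser, these could be called with the buffered property of an audio element, for example bufferedSeconds(audioElement.buffered), where audioElement is assumed to be an HTML5 audio element.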

Mobile Device Limitations

Known restrictions, quirks and annoyances relating to HTML5 + mobile Webkit (iOS / Android) software.

Mobile Device Restrictions, Quirks and Issues

Mobile devices tend to be somewhat limited in terms of battery life, network connection and other resources. Furthermore, they are unlike larger (laptop/desktop) devices in that audio is often not a "shareable" resource, and only one application may be using the sound hardware at any given time. Thus, mobile devices must be treated slightly differently in terms of playing audio via JavaScript.

iOS (mobile Safari/Webkit)

  • Sound play() calls are blocked by the OS unless in direct response to a user event like touch or click. ("Auto-play" attempts will be blocked, and the sound may fire a suspend event in this case.) For the curious, setTimeout() calls will also result in playback being blocked.
  • Sound load() and load-related methods may also have some limitations, similar to play().
  • Only one sound may be actively playing at a time. If another sound (or application) is playing audio, it will be stopped by the OS in order to play the new sound.
  • Chained playback (sequential / playlist-style behaviour) works when using the onfinish event handler. Otherwise, blocking occurs.
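
Taken together, the points above suggest starting playback from a user event and chaining subsequent sounds via onfinish. A minimal sketch, assuming sounds is an array of SM2 sound objects and button is a DOM element; the helper names playInSequence and startOnTap are hypothetical.

```javascript
// Play a list of sound objects one after another. Each subsequent play()
// is issued from the previous sound's onfinish handler, which is the
// chaining approach described above.
function playInSequence(sounds, index) {
  index = index || 0;
  if (index >= sounds.length) {
    return;
  }
  sounds[index].play({
    onfinish: function () {
      playInSequence(sounds, index + 1);
    }
  });
}

// Start the sequence in direct response to a user event, since play()
// calls outside of a touch or click handler may be blocked on iOS.
function startOnTap(button, sounds) {
  button.addEventListener('click', function () {
    playInSequence(sounds, 0);
  });
}
```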

Possible Alternatives

The Webkit Audio API (possibly a future standard as the Web Audio API) allows for low-level access, manipulation and control of audio from JavaScript. This API is mostly separate from the HTML5 audio API, but will ultimately allow for better fine-grained control and mixing of sound.

Once playback is allowed, the Webkit Audio API should be able to get around many of the current HTML5 audio limitations present on iOS including multiple sound playback, volume and pan control, and dynamic filtering / processing effects. However, the API is not a standard and is not consistently supported in Webkit, let alone other browsers.

SM2 does not presently use the Webkit Audio API. It may be experimentally or more formally added at some point in the future, when the API is more universally-supported.

Debug + Console Output

Console-style messaging, useful for troubleshooting start-up and runtime issues.

Live Debug Output

With debug mode enabled via soundManager.setup({debugMode: true}), SM2 can write helpful troubleshooting information to JavaScript console.log()-style interfaces. Additionally, output can be written to an optional DIV element with the ID of soundmanager-debug.

soundManager.setup({consoleOnly: true}) can be applied to disable HTML output (using console.log()-only methods) as well.

Additionally, debugging within the Flash portion of SM2 is also available and set using soundManager.setup({debugFlash: true}). Debug messages are written to the flash movie itself.
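
The three debug-related options above could be grouped as in the following sketch. The helper name debugOptions and its settings fields (enabled, useHtmlOutput, debugFlashMovie) are hypothetical; only the option names debugMode, consoleOnly and debugFlash come from SM2.

```javascript
// Hypothetical helper: build SM2 debug options from a small settings object.
function debugOptions(settings) {
  var enabled = Boolean(settings && settings.enabled);
  return {
    debugMode: enabled,
    // Skip the HTML debug element and log to the console only,
    // unless HTML output was explicitly requested.
    consoleOnly: enabled && !settings.useHtmlOutput,
    // Also write messages inside the Flash movie itself.
    debugFlash: enabled && Boolean(settings.debugFlashMovie)
  };
}

// Apply the options only when SM2 is actually present on the page.
if (typeof soundManager !== 'undefined') {
  soundManager.setup(debugOptions({ enabled: true }));
}
```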

For info on SoundManager 2 loading/initialization failures and how to fix them, see troubleshooting.
