Gregory Hildstrom




j2kaudio

Overview

j2kaudio is a library designed to leverage the JPEG 2000 2-dimensional wavelet image compression standard to compress audio data. There are several distinct advantages to this approach over many existing lossless and lossy compression/encoding methods.
j2kaudio uses the JasPer implementation, version 1.900.1, of JPEG 2000. libjasper.tar.gz is a minimal JasPer 1.900.1 library with a simplified Makefile and everything but JPEG 2000 removed.

j2kaudio.tar.gz is the source code, version 200707311200.

File Format

j2a files store the JPEG 2000 compressed audio data. The audio is divided into many smaller frames; each frame is an independent jp2 image file. Here is an example layout of a 3-frame j2a audio file:
float fps
unsigned long bytes
jp2 image data frame 0
unsigned long bytes
jp2 image data frame 1
unsigned long bytes
jp2 image data frame 2
EOF
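
The layout above can be sketched as a small reader and writer. This is only an illustration, not the j2kaudio code: it assumes that `float` is a 4-byte little-endian value and that `unsigned long` is 4 bytes, as on the 32-bit x86 Linux test machine, and the names `write_j2a` and `read_j2a` are hypothetical.

```python
import io
import struct

# Assumed on-disk types (not confirmed by the source): a 4-byte little-endian
# float for fps and a 4-byte little-endian unsigned long per frame length.
FPS_FMT = "<f"
LEN_FMT = "<L"


def write_j2a(stream, fps, frames):
    """Write fps, then each frame as a length prefix followed by its bytes."""
    stream.write(struct.pack(FPS_FMT, fps))
    for frame in frames:
        stream.write(struct.pack(LEN_FMT, len(frame)))
        stream.write(frame)


def read_j2a(stream):
    """Return (fps, list of frame byte strings), reading until EOF."""
    header = stream.read(struct.calcsize(FPS_FMT))
    if len(header) != struct.calcsize(FPS_FMT):
        raise ValueError("missing fps header")
    (fps,) = struct.unpack(FPS_FMT, header)
    frames = []
    while True:
        prefix = stream.read(struct.calcsize(LEN_FMT))
        if not prefix:  # clean EOF after the last frame
            break
        if len(prefix) != struct.calcsize(LEN_FMT):
            raise ValueError("truncated frame length")
        (size,) = struct.unpack(LEN_FMT, prefix)
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated frame data")
        frames.append(data)
    return fps, frames
```

Because every frame carries its own length prefix, a player could skip or stream frames without decoding the jp2 data first. On a platform where `unsigned long` is 8 bytes, the length format would have to change accordingly.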

Data Organization

Each j2a frame is an individual jp2 JPEG 2000 image whose height in pixels is the number of channels and whose width in pixels is the number of samples per channel in the frame. The image height for stereo audio data is 2 pixels. The image width for 44100Hz audio at 1 FPS would be 44100 pixels. The image width for 44100Hz audio at 10 FPS would be 4410 pixels. For CD-quality stereo audio data, each jp2 image has one 16-bit grayscale component. 8-bit audio is stored as one 8-bit grayscale component, and 32-bit float audio is stored as three grayscale components: 1-bit sign, 8-bit exponent, and 23-bit mantissa. An example image/frame layout that shows the audio channels, scans, and samples is shown below.
                       Scan
                 0    1    2    3    ...
Channel      0  s0   s1   s2   s3   s...
             1  s0   s1   s2   s3   s...
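
The three-component storage of 32-bit float audio mentioned above matches the bit fields of an IEEE 754 single-precision value. The sketch below shows one way to split a sample into those fields and join it again. The exact mapping used inside j2kaudio is not described here, so treat this as an assumption; the function names are hypothetical.

```python
import struct


def split_float32(value):
    """Split an IEEE 754 single into (sign, exponent, mantissa) bit fields."""
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF       # 23 bits
    return sign, exponent, mantissa


def join_float32(sign, exponent, mantissa):
    """Rebuild the float from the three fields produced by split_float32."""
    bits = (sign << 31) | (exponent << 23) | mantissa
    (value,) = struct.unpack("<f", struct.pack("<I", bits))
    return value


print(split_float32(1.0))   # (0, 127, 0)
print(split_float32(-0.5))  # (1, 126, 0)
```

Each field would then become one pixel value in its own grayscale component, at the same row and column as the sample it came from.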
Here are some notes from the code:
j2kaudio extension and format: .j2a
sample rate: 44100 default for testing
number of audio channels: image height: 2 default for testing
suggested channel layout for multimedia apps:
	row 0: left (mono)
	row 1: right (stereo)
	row 2: subwoofer (2.1)
	row 3: center (3.1)
	row 4: surround left (5.1)
	row 5: surround right (5.1)
	row 6: back (6.1) or rear left (7.1)
	row 7: rear right (7.1)
frames per second: 1 default for testing
sample rate / frames per second: image width: default 44100 / 1 = 44100
bits per sample: 16 default 1-component grayscale images for 16-bit audio
The image frame layout makes adding audio channels a trivial task. Additional channels, from mono to 5.1 to 24 tracks, just make the images taller.
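
As a rough illustration of the notes above, the following sketch computes the image size for one frame and rearranges interleaved PCM samples (the order in which WAV files store them) into one row per channel. The function names are hypothetical, and an integer frames-per-second value is assumed for simplicity.

```python
def frame_geometry(sample_rate, fps, channels):
    """Return (width, height) in pixels for one frame image."""
    if sample_rate % fps != 0:
        raise ValueError("sample rate must be a whole multiple of fps")
    return sample_rate // fps, channels


def frame_rows(interleaved, channels):
    """Turn [L0, R0, L1, R1, ...] into one list of samples per channel row."""
    if len(interleaved) % channels != 0:
        raise ValueError("sample count must be a multiple of the channel count")
    return [interleaved[ch::channels] for ch in range(channels)]


print(frame_geometry(44100, 1, 2))   # (44100, 2)
print(frame_geometry(44100, 10, 2))  # (4410, 2)
print(frame_rows([1, -1, 2, -2, 3, -3], 2))  # [[1, 2, 3], [-1, -2, -3]]
```

Adding a channel only adds a row, which is why the image grows taller while its width stays tied to the sample rate and frame rate.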

Compression

The pdf documentation included with the JasPer source code is an excellent reference on JPEG 2000 image compression. Lossless compression, which can recreate the original waveform data for each track exactly, requires omitting the encoding rate option entirely; an encoding rate of 1 was originally expected to be lossless but is not (see the update under Lossless Performance). An encoding rate of less than 1 is lossy compression, which means the original waveform data cannot be recreated exactly; information has been lost, just as with MP3. One interesting aspect of 2-dimensional audio compression is that similarities between channels enable higher compression. This is similar to MP3 joint stereo, which also exploits similarities between channels to increase compression, but it applies to any number of channels.
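
To relate an encoding rate to the bitrates familiar from other audio encoders, the arithmetic below may help. It assumes that the rate is the target compressed size as a fraction of the uncompressed size, and it ignores the small per-frame overhead of the j2a container. The function names are hypothetical.

```python
def target_bitrate_kbps(rate, sample_rate=44100, channels=2, bits_per_sample=16):
    """Approximate bitrate in kbit/s for a given encoding rate."""
    uncompressed_bps = sample_rate * channels * bits_per_sample
    return rate * uncompressed_bps / 1000.0


def rate_for_bitrate(kbps, sample_rate=44100, channels=2, bits_per_sample=16):
    """Encoding rate that would approximate a target bitrate in kbit/s."""
    uncompressed_bps = sample_rate * channels * bits_per_sample
    return kbps * 1000.0 / uncompressed_bps


# CD-quality stereo is 44100 * 2 * 16 = 1411200 bit/s uncompressed.
print(target_bitrate_kbps(1.0))           # 1411.2
print(round(rate_for_bitrate(128), 4))    # 0.0907
```

Under this assumption, a rate of about 0.09 would be the closest equivalent to a 128 kbit/s encode of CD-quality stereo audio.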

Example Program

The program j2kaudio is an example program that converts files from wav to j2a and j2a to wav. The j2kaudio example program loads, compresses or decompresses, and writes one frame of the input file to the output file in a loop, which is ideal for streaming and low memory footprint. The tiny non-streaming open-source wav file I/O code I use just happens to read/write to one giant buffer before doing any disk I/O, which I will fix soon. The example program also truncates the input wav file to the nearest whole frame, so that all frames are the same size. In my test of tk10.wav, this behavior only truncated 0.053s or about 9kB of uncompressed CD-quality data. Future work may allow the last frame to be a special case for a fraction of a second, but I have not worked on that yet.
encoding usage: j2kaudio encodingrate wavfilein.wav j2afileout.j2a (lossless or lossy)
encoding usage: j2kaudio wavfilein.wav j2afileout.j2a (lossless)
encoding usage: j2kaudio wavfilein.wav (lossless)
decoding usage: j2kaudio j2afilein.j2a wavfileout.wav
decoding usage: j2kaudio j2afilein.j2a
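
The whole-frame truncation described above can be checked with a little arithmetic. This sketch is not taken from the j2kaudio source: it assumes that partial frames are simply dropped and that tk10.wav has a canonical 44-byte WAV header, and the function name is hypothetical.

```python
def whole_frame_truncation(data_bytes, sample_rate=44100, fps=1,
                           channels=2, bytes_per_sample=2):
    """Return (frames kept, samples dropped, bytes dropped, seconds dropped)."""
    block = channels * bytes_per_sample       # bytes per multichannel sample
    samples = data_bytes // block             # samples per channel in the file
    samples_per_frame = sample_rate // fps
    frames = samples // samples_per_frame     # whole frames only
    dropped = samples - frames * samples_per_frame
    return frames, dropped, dropped * block, dropped / sample_rate


# tk10.wav is 40757852 bytes; a 44-byte header is assumed here.
frames, dropped, dropped_bytes, dropped_seconds = whole_frame_truncation(40757852 - 44)
print(frames, dropped, dropped_bytes, round(dropped_seconds, 3))
# 231 2352 9408 0.053
```

With these assumptions the result agrees with the 0.053s, or about 9kB, quoted for tk10.wav at 1 frame per second.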

Building

  1. cd ~ (go to a working folder like your home directory or desktop)
  2. wget http://geocities.com/hildstrom/projects/j2kaudio/libjasper.tar.gz
  3. wget http://geocities.com/hildstrom/projects/j2kaudio/j2kaudio.tar.gz
  4. tar xzf libjasper.tar.gz
  5. cd libjasper
  6. make
  7. cd ..
  8. tar xzf j2kaudio.tar.gz
  9. cd j2kaudio/src
  10. make

Lossless Performance

Update: I discovered that passing rate=1 to the JasPer JPEG 2000 encoder does not result in lossless compression as I had thought; some information was lost in my initial testing! For lossless compression, the j2kaudio code must not pass the rate= option to the JasPer JPEG 2000 encoder at all. I discovered this undocumented behavior while working on 32-bit support and checking I/O sample by sample. j2kaudio now seems to have the worst performance of any lossless compressor tested. Its compression performance is similar to shorten, but it is 10 times slower. =:) Just go straight to the tk10.wav results; I did not bother to retest the other wav files. Lossy performance was not affected by this bug.

The lossless compression, obviously, sounds identical to the original. For the 16-bit lossless compression comparison I tested against FLAC, LPAC, Shorten, Monkey's Audio, OptimFrog, and LA. Three 16-bit test files were chosen: sinetest.wav, sweeptest.wav, and tk10.wav. For the 32-bit lossless comparison I tested against wavpack, tar gzip, and tar bzip2. Three 32-bit test files were chosen: 32b.wav, tk10-32b.wav, and 32bWN.wav. The test computer is a 3.0GHz P4 with 1GB RAM, running Fedora Core 6 Linux with a 32-bit kernel. Red cell text indicates an error in initial testing, green cell text indicates the best performance, and orange indicates an updated result. The tk10.wav file was ripped from an audio CD using CDex, which produced identical output to cdda2wav.
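
A sample-by-sample check like the one mentioned above can be sketched with Python's standard `wave` module. This is not the check used during testing, only an illustration; the function name is hypothetical, and the `wave` module reads integer PCM files only, so it would not open the 32-bit float test files.

```python
import wave


def first_difference(path_a, path_b, chunk_frames=65536):
    """Return None if both WAV files hold identical audio, otherwise the
    index of the first multichannel sample at which they differ."""
    with wave.open(path_a, "rb") as a, wave.open(path_b, "rb") as b:
        fmt_a = (a.getnchannels(), a.getsampwidth(), a.getframerate())
        fmt_b = (b.getnchannels(), b.getsampwidth(), b.getframerate())
        if fmt_a != fmt_b:
            raise ValueError("channel count, sample width, or rate differ")
        frame_size = a.getnchannels() * a.getsampwidth()
        offset = 0
        while True:
            da = a.readframes(chunk_frames)
            db = b.readframes(chunk_frames)
            if da != db:
                shared = min(len(da), len(db))
                for i in range(0, shared, frame_size):
                    if da[i:i + frame_size] != db[i:i + frame_size]:
                        return offset + i // frame_size
                return offset + shared // frame_size  # one file ends early
            if not da:  # both files ended at the same point
                return None
            offset += len(da) // frame_size
```

Because the example program drops a trailing partial frame, a decoded file can be shorter than the original. In that case the function reports the position where the shorter file ends, which should be a whole multiple of the frame length if the codec itself is lossless.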

sinetest.wav 44.1kHz/16b/2ch
10sec 60Hz sine wave
1764044B
encoder                 version       compressed size (B)  encoding time (s)  compressed size (%)  decoding time (s)
j2kaudio                200707231230  73489                0.4                4.1                  0.2
flac                    1.1.2         344877               1.1                19.5                 0.1
lpac                    1.40 3.08     266520               0.5                15.1                 0.2
shorten                 3.6.1         427629               0.8                24.2                 0.1
monkey's audio (mac)    3.99          322948               3.1                18.3                 3.3
optimfrog (ofr)         4.52          204369               0.7                11.5                 0.4
la                      0.4           260779               2.4                14.8                 2.1

sweeptest.wav 44.1kHz/16b/2ch
60sec 20Hz-20kHz sweep
10999852B
encoder                 version       compressed size (B)  encoding time (s)  compressed size (%)  decoding time (s)
j2kaudio                200707231230  3642175              6.1                33.1                 3.1
flac                    1.1.2         4738626              6.9                43.0                 0.3
lpac                    1.40 3.08     4184373              4.5                38.0                 1.8
shorten                 3.6.1         6805833              0.5                61.9                 0.3
monkey's audio (mac)    3.99          4725656              21.7               43.0                 22.6
optimfrog (ofr)         4.52          3947514              4.0                35.8                 2.8
la                      0.4           4813421              16.6               43.8                 14.5

tk10.wav 44.1kHz/16b/2ch
230sec Twista Kamikaze Track 10 So Sexy
40757852B
encoder                 version       compressed size (B)  encoding time (s)  compressed size (%)  decoding time (s)
j2kaudio rate=1         200707231230  23024802             32.4               56.5                 17.2
j2kaudio rate= omitted  200707311200  31216712             23.0               76.6                 21.3
flac                    1.1.2         28600315             26.1               70.2                 1.4
lpac                    1.40 3.08     28110613             16.9               69.0                 6.8
shorten                 3.6.1         30651397             2.2                75.2                 1.3
monkey's audio (mac)    3.99          27257032             81.4               66.9                 83.7
optimfrog (ofr)         4.52          27016926             13.9               66.3                 10.2
la                      0.4           26710351             62.0               65.5                 53.8
tar czf (gzip)          1.15.1        39589055             4.33               97.1                 0.76
tar cjf (bz2)           1.15.1        39409349             19.53              96.7                 10.50

32b.wav 44.1kHz/32b/2ch
60sec L ch 20-20kHz sine wave
R ch 20k-20Hz square wave
21168088B
encoder                 version       compressed size (B)  encoding time (s)  compressed size (%)  decoding time (s)
j2kaudio                200708011100  17792661             21.33              84.05                18.3
wavpack                 4.41.0        16256172             2.23               76.79                1.16
tar czf (gzip)          1.15.1        11466309             5.69               54.17                0.4
tar cjf (bz2)           1.15.1        10257607             7.47               48.46                4.28

tk10-32b.wav 44.1kHz/32b/2ch
230sec L ch 32-bit tk10.wav mixed to mono
R ch 32-bit tk10.wav mixed to mono reversed
81515704B
encoder                 version       compressed size (B)  encoding time (s)  compressed size (%)  decoding time (s)
j2kaudio                200708011100  54756045             78.59              67.17                68.26
wavpack                 4.41.0        31659262             7.54               38.84                4.12
tar czf (gzip)          1.15.1        57086563             33.80              70.03                2.35
tar cjf (bz2)           1.15.1        45290223             30.19              55.56                16.48

32bWN.wav 44.1kHz/32b/10ch
60sec 10 channels of white noise
105840152B
encoder                 version       compressed size (B)  encoding time (s)  compressed size (%)  decoding time (s)
j2kaudio                200708011100  96382920             101.14             91.06                94.98
wavpack                 4.41.0        87122474             12.04              82.32                6.80
tar czf (gzip)          1.15.1        97676453             13.96              92.29                1.92
tar cjf (bz2)           1.15.1        98686951             50.47              93.24                27.01

Lossy Performance

For the lossy compression comparison I tested against LAME, Ogg Vorbis, and AAC (faac/faad). The lossy quality of j2kaudio does not sound as good as mp3, ogg, or aac at a similar bitrate. The main audible compression artifact is high-frequency noise introduced when compressing high-frequency sounds such as cymbals, hi-hats, triangles, or hiss. The traditional lossy audio encoders lose this high-frequency data in a smooth and mostly unobtrusive way, although the loss is still audible. Even though j2kaudio produces some high-frequency noise, it recreates the original wave with less error than the other lossy encoding methods, which I calculated in 2003. A lowpass filter that lowers the upper frequency limit from 20kHz to 15kHz or so should reduce the compression noise, but this approach has not been implemented yet.
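
The error measure behind the 2003 comparison mentioned above is not stated. One common choice is the root-mean-square (RMS) difference between the original and decoded samples, sketched below as an assumption with a hypothetical function name.

```python
import math


def rms_error(original, decoded):
    """RMS difference over the samples the two sequences have in common."""
    n = min(len(original), len(decoded))
    if n == 0:
        raise ValueError("no samples to compare")
    # zip stops at the shorter sequence, so exactly n pairs are summed
    total = sum((o - d) ** 2 for o, d in zip(original, decoded))
    return math.sqrt(total / n)


print(rms_error([0, 0, 0, 0], [3, -3, 3, -3]))  # 3.0
print(rms_error([1, 2, 3], [1, 2, 3]))          # 0.0
```

A comparison of this kind is only meaningful if the two signals are time-aligned first, since MP3 encoders and decoders typically add a short delay at the start of the decoded output. A low waveform error also does not imply better perceived quality, which is consistent with the listening ranks in the tables below.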

sinetest.wav 44.1kHz/16b/2ch
10sec 60Hz sine wave
1764044B
encoder                 version       compressed size (B)  encoding time (s)  compressed size (%)  decoding time (s)  audio quality rank (1 = best)
j2kaudio                200707231230  73489                0.4                4.1                  0.2                4
mp3 (lame)              3.97          161331               0.7                9.1                  0.1                3
ogg (oggenc)            1.0.2         163720               1.6                9.3                  0.1                2
aac (faac)              1.25          162875               0.7                9.2                  0.1                1

sweeptest.wav 44.1kHz/16b/2ch
60sec 20Hz-20kHz sweep
10999852B
encoder                 version       compressed size (B)  encoding time (s)  compressed size (%)  decoding time (s)  audio quality rank (1 = best)
j2kaudio                200707231230  918209               5.8                8.3                  1.6                4
mp3 (lame)              3.97          998921               4.6                9.1                  0.9                3
ogg (oggenc)            1.0.2         1015369              11.8               9.2                  0.5                2
aac (faac)              1.25          1009872              4.7                9.2                  0.5                1

tk10.wav 44.1kHz/16b/2ch
230sec Twista Kamikaze Track 10 So Sexy
40757852B
encoder                 version       compressed size (B)  encoding time (s)  compressed size (%)  decoding time (s)  audio quality rank (1 = best)
j2kaudio                200707231230  3461847              28.6               8.5                  6.0                4
mp3 (lame)              3.97          3698101              20.0               9.1                  3.1                3
ogg (oggenc)            1.0.2         3757244              52.6               9.2                  2.0                2
aac (faac)              1.25          3734175              14.2               9.2                  2.4                1