
j2kaudio JPEG 2000 wavelet audio compression

Source Code (2003)
Report: JPEG 2000 Audio (2003)

Some notes from the development code:
j2kaudio was first operational on July 01 2003
j2kaudio was written by Gregory Alan Hildstrom

I am not aware of any patented ideas present in this code. All algorithm ideas for audio storage in the image file are my own. Please notify me immediately if I am in violation of anything so that I can remove it from the web. Email me with questions or concerns. Yes, I know that my code is not the cleanest. Yes, I know that there are different and better ways of doing things. Please redirect flames to /dev/null.

The code to read and write wav files, which is included with the j2kaudio source code, was written by Dr. Fred DePiero of CalPoly State University and is open source, but not in the public domain.

j2kaudio uses the Jasper implementation of the JPEG 2000 standard, which is covered by its own open source license.

j2kaudio is a program designed to leverage the wavelet compression of the JPEG 2000 standard to compress audio data. The current design converts wav files into .jp2 JPEG 2000 image files. These image files are mostly compatible with .j2k files; I believe that the JPEG 2000 standard specifies several different image and stream formats that may vary slightly between JPEG 2000 implementations. j2kaudio currently uses the Jasper JPEG 2000 implementation, and I am very impressed with the Jasper developers' work.

Thank you to the Jasper developers.

Also, thank you to Dr. Fred DePiero; I am using his code to read and write wav files. The original goal was to create a compression method for audio data that can achieve high compression and/or high quality.

j2kaudio is fairly robust and has several encoding methods to choose from. The first working encoding idea was tested with a 16-bit, 44,100 Hz, 2-channel wav file (standard CD-quality audio) created with cdda2wav. The first encoding method created a 2-component, 16-bit grayscale image that was 44,100 pixels wide by NumberOfSeconds pixels high. This design is good because all of the information about the wav file is contained in the image. The number of audio channels is the number of image components, the sample rate is the width of the image, and the number of seconds of audio is the height of the image. The number of samples per channel is the height multiplied by the width of the image.

Other methods have also been tried. One used a single image component and interleaved the audio channels as alternating rows of pixels; another stored the individual audio channels in separate image files. Both of these methods require information that is not stored in the image, which is not necessarily a problem.
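A minimal C++ sketch of the first encoding method's layout is shown here. It assumes the wav samples have already been decoded into one channel-interleaved vector of 16-bit signed integers (the way they are stored in a 16-bit PCM wav file), and it zero-pads a trailing partial second, which the description above does not specify. The names wavToPlanes and Plane are hypothetical and are not taken from the j2kaudio source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout helper: one image component per audio channel,
// image width = sample rate, image height = number of seconds.
// Component c, row r, column x holds sample (r * width + x) of channel c.
using Plane = std::vector<std::vector<std::int16_t>>;  // [row][column]

std::vector<Plane> wavToPlanes(const std::vector<std::int16_t>& interleaved,
                               std::size_t channels, std::size_t sampleRate)
{
    assert(channels > 0 && sampleRate > 0);
    std::size_t frames = interleaved.size() / channels;          // samples per channel
    std::size_t height = (frames + sampleRate - 1) / sampleRate; // seconds, rounded up
    // Assumption: a trailing partial second is filled with zeros (silence).
    std::vector<Plane> planes(channels,
        Plane(height, std::vector<std::int16_t>(sampleRate, 0)));
    for (std::size_t n = 0; n < frames; ++n)
        for (std::size_t c = 0; c < channels; ++c)
            planes[c][n / sampleRate][n % sampleRate] = interleaved[n * channels + c];
    return planes;
}
```

Each Plane would then become one 16-bit signed grayscale image component; handing the planes to Jasper is not shown here.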

One great advantage of using the JPEG 2000 standard is that it allows both lossy and lossless adaptive wavelet compression. This means that j2kaudio can perform lossy and lossless audio compression with the same code and the same algorithms. A good portion of the j2kaudio source code is matrix manipulation to get the binary audio data into a highly compressible and manageable two-dimensional or three-dimensional matrix. I tried to use as much of the Standard Template Library (STL) as possible. Using the STL helps reduce code size and eases dynamic memory management; it made my life easier.

The encodingrate input to j2kaudio is exactly the same as the encoding rate that must be supplied to any other JPEG 2000 encoding engine. It specifies the target size of the compressed data as a fraction of the uncompressed size. An encodingrate of 1 will perform lossless compression; it will try not to lose any information. An encodingrate of 0.5 targets half of the original size while retaining as much of the quality as possible with the current algorithms. An encodingrate of 0.090726 achieves almost the same size reduction as mp3 compression at a bitrate of 128 kbit/s.
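The 0.090726 figure can be sanity-checked with a little arithmetic. CD audio is 44,100 × 16 × 2 = 1,411,200 bit/s, so a 128 kbit/s mp3 keeps about 128,000 / 1,411,200 ≈ 0.0907 of the original size, close to the value quoted above. The compression ratio is simply the reciprocal of the rate, so 0.1 is roughly 10:1 and 0.01 is roughly 100:1. This sketch assumes 128 kbit/s means 128,000 bit/s and ignores header and container overhead; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Uncompressed PCM bit rate in bits per second.
double pcmBitRate(std::uint32_t sampleRate, std::uint32_t bitsPerSample,
                  std::uint32_t channels)
{
    return static_cast<double>(sampleRate) * bitsPerSample * channels;
}

// Encoding rate that targets the same output size as a fixed-bit-rate codec
// (assumption: container and header overhead are ignored).
double encodingRateFor(double targetBitsPerSecond, std::uint32_t sampleRate,
                       std::uint32_t bitsPerSample, std::uint32_t channels)
{
    double pcm = pcmBitRate(sampleRate, bitsPerSample, channels);
    assert(pcm > 0.0 && targetBitsPerSecond > 0.0);
    return targetBitsPerSecond / pcm;
}

// Compression ratio implied by an encoding rate (0.1 -> 10:1, 0.01 -> 100:1).
double compressionRatio(double encodingRate)
{
    assert(encodingRate > 0.0);
    return 1.0 / encodingRate;
}
```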

Unfortunately, j2kaudio does not currently sound as good as mp3 at moderate to heavy compression, although it does a more accurate job of recreating the original wav file. That accuracy does not matter, though, if you can hear the difference with j2kaudio but not with mp3. On the other hand, j2kaudio can perform lossless compression, which mp3 cannot, meaning that the recreated wav file is identical to the original.

j2kaudio can also perform some extreme compression, such as 100:1 (an encodingrate of 0.01). The audio is recognizable, but the quality really suffers at the edges of the image. Sound recreated from the edges of the image cannot be compressed as accurately as data in the middle of the image, which leads to audible artifacts that occur every ImageWidth samples.
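To see where those edge artifacts should fall, the hypothetical helper below lists the per-channel sample indices that land on the left or right edge of an image laid out row by row with ImageWidth samples per row. It only models the left and right edges, not the top and bottom rows, and it is an illustration of the layout rather than code from j2kaudio.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sample indices (per channel) that sit on the left or right image edge when
// samples are laid out row by row in an image `width` pixels wide.
std::vector<std::size_t> edgeSampleIndices(std::size_t numSamples, std::size_t width)
{
    assert(width > 0);
    std::vector<std::size_t> edges;
    for (std::size_t rowStart = 0; rowStart < numSamples; rowStart += width) {
        edges.push_back(rowStart);                  // first column of this row
        std::size_t rowEnd = rowStart + width - 1;  // last column of this row
        if (rowEnd < numSamples && rowEnd != rowStart)
            edges.push_back(rowEnd);
    }
    return edges;
}
```

With the first encoding method the width equals the sample rate, so each pair of indices k·width − 1 and k·width straddles a whole-second boundary. For example, edgeSampleIndices(10, 4) returns 0, 3, 4, 7, 8.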

This program is a ridiculous memory hog. It holds the entire wav file in memory at least twice at the same time, so it is a good idea to have several hundred megabytes of free RAM to keep from swapping. j2kaudio should run OK with small wav files and 512 MB of RAM. I would recommend running it with 1 GB or more for large wav files.
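A rough way to size the RAM requirement is to multiply out the decoded audio by the number of copies held at once. This is a back-of-the-envelope sketch, not a measurement of j2kaudio: bytesPerStoredSample is an assumption, because the in-memory sample type (and any extra buffers Jasper allocates) can make the real footprint considerably larger. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Rough lower bound on RAM for holding `copies` copies of decoded audio.
// Assumption: every copy stores each sample in `bytesPerStoredSample` bytes.
std::uint64_t estimateAudioBytes(std::uint64_t seconds, std::uint64_t sampleRate,
                                 std::uint64_t channels,
                                 std::uint64_t bytesPerStoredSample,
                                 std::uint64_t copies)
{
    assert(bytesPerStoredSample > 0 && copies > 0);
    return seconds * sampleRate * channels * bytesPerStoredSample * copies;
}
```

For one minute of CD-quality stereo held twice at 2 bytes per sample, estimateAudioBytes(60, 44100, 2, 2, 2) gives 21,168,000 bytes (about 21 MB); a four-minute track held twice at a hypothetical 8 bytes per sample gives 338,688,000 bytes (about 339 MB).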

There is currently no media player that will play these experimental .jp2 files. The process works like this:

j2kaudio 0 audio_in.wav image_out.jp2 0.1

This creates an image file at roughly 10:1 compression.

j2kaudio 1 image_out.jp2 audio_recreated.wav

This recreates a wav file from a properly created .jp2 image file. You can then play audio_in.wav and audio_recreated.wav to hear the original and the compressed versions.
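Listening is the real test, but a numeric comparison can confirm that a lossless round trip really is bit-identical and can put a number on lossy accuracy. This C++ sketch assumes both wav files have already been decoded into int16_t vectors with the same length and channel interleaving (for example with the wav reading code mentioned above). compareSamples and RoundTripStats are hypothetical names, and signal-to-noise ratio is a generic measure that says nothing about whether a difference is audible.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

struct RoundTripStats {
    bool identical;   // true only if every sample matches exactly
    int maxAbsError;  // largest per-sample difference
    double snrDb;     // signal-to-noise ratio in dB (infinity if identical)
};

// Compare original and recreated samples (assumed same length and interleaving).
RoundTripStats compareSamples(const std::vector<std::int16_t>& original,
                              const std::vector<std::int16_t>& recreated)
{
    assert(original.size() == recreated.size());
    double signalEnergy = 0.0, noiseEnergy = 0.0;
    int maxErr = 0;
    for (std::size_t i = 0; i < original.size(); ++i) {
        int diff = static_cast<int>(recreated[i]) - static_cast<int>(original[i]);
        maxErr = std::max(maxErr, std::abs(diff));
        signalEnergy += static_cast<double>(original[i]) * original[i];
        noiseEnergy += static_cast<double>(diff) * diff;
    }
    double snr = (noiseEnergy == 0.0)
                     ? std::numeric_limits<double>::infinity()
                     : 10.0 * std::log10(signalEnergy / noiseEnergy);
    return {maxErr == 0, maxErr, snr};
}
```

A lossless run should report identical == true; lossy runs can be compared by their maximum error and SNR.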

During development I have been using Jasper version jasper-1.700.2.

Jasper is currently available from http://www.ece.uvic.ca/~mdadams/jasper/

You can also find the Jasper page by performing a Google search for "jasper jpeg 2000".