MP3 and what it means to Net users?

The MP3 format is all the rage today for downloading audio files over the Internet free of cost. Here is an insight into the technology involved and the benefits it offers.

The Internet has permeated every aspect of life and has introduced a fundamental shift in information distribution, the postal system and commerce. The music industry has not been left unscathed by the World Wide Web. The combination of 'Moving Picture Experts Group Audio Layer-3' (MP3) and the Internet has revolutionized the century-old music industry.

MP3 is a highly compressed, open file format, and has served primarily for the storing and transporting of music over the Internet. Using the MP3 format, one minute of CD-quality music can be compressed into about one megabyte of space. This has significantly improved the transport speed for music over the Internet.

MP3 is the first file format that categorizes music by identification fields, i.e. tags such as music title, author, genre and year. Thus, using the MP3 format and a PC, one can easily create a personal jukebox.

Using a PC and the necessary software, the music stored on a CD can be extracted, i.e. ripped, and converted into an MP3 format file. This file can be stored on a PC or a CD. Using a PC or a portable MP3 player, one can play back the music.

The MP3 format enables one to convert a PC into a broadcast server over the Internet. Such a broadcast server is popularly known as 'SHOUTcast'. Thus an author can broadcast his music online, and anyone can listen to this broadcast over the Internet.

MP3 format is an open standard. It allows the development of freeware, shareware and commercial software.

However, the flip side is that this open standard also facilitates the piracy of copyrighted music.

Speech/audio basics
Speech: Speech is probably the most natural form of communication. For coding purposes, speech is classified as narrow-band and wide-band speech.

Narrow-band speech: Such a signal has a frequency range of 200 Hz to 3200 Hz. A telephony signal is an example of a narrow-band signal. High-quality speech transmission and storage requires:
Sampling rate : 8 kHz
Bits/sample : 16 bits
Uncompressed bitrate : 128 Kbps

The uncompressed bitrate of 128 Kbps is twice the bitrate used in ordinary telephony.

Wide-band speech: Such a signal has a frequency range of 50 Hz to 7000 Hz. The coding parameters are:
Sampling rate : 16 kHz
Bits/sample : 16 bits
Uncompressed bitrate : 256 Kbps

Wide-band signals are used for conferencing and broadcast applications.

CD audio: It has the full audio frequency range of 20 Hz to 20,000 Hz. The coding parameters are:
Sampling rate : 44.1 kHz
Bits/sample : 16 bits
Uncompressed bitrate (2-channel stereo) : 1.4 Mbps
Lossy compression: Speech and audio compression deploy 'lossy' techniques. In such methods, there is some loss of accuracy but the compression ratio is very high. The ratios are:
Narrow-band : 30 : 1
Wide-band : 15 : 1
CD audio : 24 : 1
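
The uncompressed bitrates quoted above follow from sampling rate × bits per sample × number of channels, and dividing by a compression ratio gives a rough compressed bitrate. The sketch below only reproduces that arithmetic; the function names are illustrative, and 1 kbit is assumed to mean 1000 bits.

```python
def uncompressed_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Raw PCM bitrate in kilobits per second (assuming 1 kbit = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compressed_kbps(raw_kbps, ratio):
    """Bitrate that remains after a compression ratio of ratio:1."""
    return raw_kbps / ratio


# (sampling rate in Hz, bits/sample, channels, compression ratio) as listed in the text
SIGNALS = {
    "narrow-band": (8_000, 16, 1, 30),
    "wide-band": (16_000, 16, 1, 15),
    "CD audio": (44_100, 16, 2, 24),
}

if __name__ == "__main__":
    for name, (rate, bits, channels, ratio) in SIGNALS.items():
        raw = uncompressed_kbps(rate, bits, channels)
        small = compressed_kbps(raw, ratio)
        print(f"{name}: {raw:.1f} kbps raw, about {small:.1f} kbps at {ratio}:1")
```

For CD audio this gives 1411.2 kbps, which the text rounds to 1.4 Mbps.
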
Monophonic channel: Single audio channel.
Dual-monophonic channel: It has two independent audio channels.
Stereo mode: It shares bits between two audio channels but does not use joint-stereo coding.
Joint-stereo coding: It takes advantage of either the correlations between stereo channels or the phase difference between channels or both.
Source coding: A speech signal contains redundancy. Source coding removes redundancy in a signal by estimating a model of the source, i.e. the vocal-tract system. In source coding, signal bandwidth is limited. Here, an attempt is made to improve a quality metric such as 'Signal to Noise Ratio' (SNR).
Perceptual audio coding: A perceptual coder uses a model of human perceptual apparatus (ear) to remove the parts of the signal that the human ear cannot perceive. A perceptual coder discards inaudible features of sound signal. It is a lossy compression technique.

The imperceptible information removed by the perceptual coder is called 'irrelevancy'. In practice, a perceptual coder has poor SNR but has better subjective quality.

Masking: It is a perceptual property of the human auditory system. It has been observed that a louder sound masks or hides a weaker signal in its neighborhood. This neighborhood may be in time or frequency space. Thus a strong tone at 700 Hz may mask a weaker signal within ± 70 Hz.
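
As a rough illustration of frequency masking, the toy check below treats a weaker tone as masked when it lies within a fixed distance of a stronger one. It is a deliberately simplified sketch: the fixed ± 70 Hz window is taken from the example in the text, the function and parameter names are hypothetical, and real psychoacoustic models use masking curves that depend on level and frequency.

```python
def is_masked(masker_hz, masker_db, probe_hz, probe_db, neighborhood_hz=70.0):
    """Toy rule: a probe tone is masked if it is weaker than the masker
    and lies within neighborhood_hz of the masker's frequency."""
    close_in_frequency = abs(probe_hz - masker_hz) <= neighborhood_hz
    weaker = probe_db < masker_db
    return close_in_frequency and weaker


if __name__ == "__main__":
    # A strong 700 Hz tone at 60 dB against weaker tones nearby and further away.
    print(is_masked(700, 60, 750, 40))  # inside the 70 Hz neighborhood
    print(is_masked(700, 60, 800, 40))  # outside the neighborhood
```
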

CD quality audio compression standard: MPEG
MPEG is a working group of the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC). It develops international standards for compression, decompression, processing and coded representation of moving pictures, audio and their combination. So far, MPEG has produced MPEG-1, MPEG-2 and MPEG-4 Version 1, and it is currently working on MPEG-4 Version 2 and MPEG-7.

MPEG-1: This standard addresses the compression of synchronized video and audio at a total bitrate of about 1.5 Mbps. The quality is comparable with that of a VHS cassette. Before MPEG, compressing around 166 Mbps of digital video and around 1.5 Mbps of digital stereo sound onto a CD looked a difficult task.

MPEG-1 decoder (see Figure-1)
The video compression scheme builds on H.261. This standard integrates field interpolation into the motion-compensated prediction scheme. The audio compression deploys a cost-effective sub-band coding scheme. This enables virtual transparency of audio at a bitrate as low as 128 Kbps. Here the audio channel is sampled at 48 kHz. MPEG-1 was the first signal processing standard developed using the 'C' programming language.

MPEG-1 contributed significantly to defusing the highly politicized issue of television standards. MPEG recognized that what matters in a television signal is not the number of lines or the number of fields/sec but the bandwidth of the signal in the analog domain and the number of pixels in the digital domain.

MPEG-1 provided the first concrete opportunity for the microelectronics industry to invest in digital video/audio technology. MPEG-1 also produced a standard software implementation of the MPEG-1 coder/decoder.

The audio part of MPEG-1 has become the key component for 'radio broadcasting at CD quality', offered by digital audio broadcasting.

MPEG/Audio Compression
MPEG/Audio is a generic audio compression standard. This coder deploys perceptual coding. Much of the compression results from the removal of perceptually irrelevant parts of the audio signal.

Features of the MPEG/Audio are:

  • The audio sampling rate can be 32, 44.1 or 48 kHz.
  • The compressed bit stream can have one of several predefined fixed bitrates ranging from 32 to 224 Kbps.

Depending on the audio sampling rate, this translates to compression factors ranging from 2.7 to 24.

MPEG/Audio offers a choice of three independent layers of compression. This provides a wide range of trade-offs between codec complexity and compressed audio quality.

  • Layer-1 compression is the simplest. It best suits bitrates above 128 Kbps. For example, the Philips 'Digital Compact Cassette' used layer-1 compression at 192 Kbps.

  • Layer-2 has medium complexity and targets bitrates around 128 Kbps. Possible applications for layer-2 are the coding of audio for 'digital audio broadcasting' (DAB), the storage of synchronized video and audio sequences on CD-ROM, and the full-motion extension of CD-Interactive and video CD.

  • Layer-3 is the most complex but provides the best quality for bitrates around 64 Kbps. This layer is used for audio transmission over ISDN. The MPEG/Audio layer-3 standard is popularly known as the MP3 standard. MP3 is the first-ever, and a better, standard format for storing, organizing and playing music with a personal computer.

MP3 Coder (see Figure-2)

A coder comprises a bank of filters, a perceptual model, an adaptive quantizer and other blocks.

The features include:
The input audio stream passes through a filter bank that divides it into multiple frequency sub-bands. In practice, the filter bank comprises 32 equal-width sub-band filters of 750 Hz bandwidth each.
The sub-band filter is realized by digital signal processing techniques, i.e. the modified discrete cosine transform. A sub-band filter accepts 32 input samples in the time domain and, after processing, produces 32 output samples in the frequency domain. These output samples are further sub-sampled. Finally, for every 32 input samples to a sub-band filter, it gives one output sample.
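
The numbers in the filter-bank description can be checked with a little arithmetic. The sketch below assumes the 750 Hz figure corresponds to the 48 kHz sampling rate mentioned earlier (24 kHz of usable bandwidth split into 32 bands). It only computes band widths, band indices and decimated sample counts; it does not implement the filter bank itself, and the function names are illustrative.

```python
NUM_SUBBANDS = 32  # number of equal-width sub-bands, as stated in the text


def subband_width_hz(sample_rate_hz):
    """Width of each sub-band: half the sampling rate split into 32 bands."""
    return (sample_rate_hz / 2) / NUM_SUBBANDS


def subband_index(freq_hz, sample_rate_hz):
    """Index (0..31) of the sub-band that a given frequency falls into."""
    width = subband_width_hz(sample_rate_hz)
    return min(int(freq_hz // width), NUM_SUBBANDS - 1)


def samples_per_subband(num_input_samples):
    """After sub-sampling, each sub-band keeps one sample per 32 input samples."""
    return num_input_samples // NUM_SUBBANDS


if __name__ == "__main__":
    print(subband_width_hz(48_000))     # band width at 48 kHz sampling
    print(subband_index(700, 48_000))   # the 700 Hz tone from the masking example
    print(samples_per_subband(320))     # output samples per band for 320 inputs
```

Because each of the 32 bands keeps one sample per 32 inputs, the total number of samples leaving the filter bank equals the number entering it.
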

The filter bank has a provision of changing resolution of the signal width in time or frequency domain. Thus, it is possible to switch filter bank from high frequency/low-time resolution to low-frequency/high-time resolution. The input audio stream also passes through the perceptual (psychoacoustic) model. The model determines the ratio of signal energy to masking threshold for each sub-band.

The output from the filter bank can be quantized based on perceptual threshold. The quantizer block uses the signal-to-mask threshold ratio to implicitly allot and separately control the available number of bits to different sub-band filter outputs.
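
One simple way to picture this allocation is a greedy loop that keeps giving the next bit to whichever sub-band currently has the most audible quantization noise. The sketch below is an illustration under stated assumptions, not the allocation procedure of the standard: it assumes each extra quantizer bit lowers noise by about 6.02 dB, and the function name, the per-band cap and the example ratios are hypothetical.

```python
DB_PER_BIT = 6.02  # assumed noise reduction per extra quantizer bit, in dB


def allocate_bits(smr_db, bit_pool, max_bits_per_band=15):
    """Greedily hand out bit_pool bits, one at a time, to the sub-band whose
    noise-to-mask ratio (signal-to-mask ratio minus the SNR gained so far)
    is currently the highest."""
    if not smr_db:
        return []
    bits = [0] * len(smr_db)
    for _ in range(bit_pool):
        # Noise-to-mask ratio per band; bands at the cap are excluded.
        nmr = [
            smr - DB_PER_BIT * b if b < max_bits_per_band else float("-inf")
            for smr, b in zip(smr_db, bits)
        ]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] == float("-inf"):
            break  # every band is already at its cap
        bits[worst] += 1
    return bits


if __name__ == "__main__":
    # Signal-to-mask ratios in dB for three hypothetical sub-bands.
    print(allocate_bits([20.0, 5.0, -3.0], bit_pool=4))
```

Bands whose signal sits furthest above the masking threshold receive the most bits, so quantization noise is pushed toward or below that threshold wherever the bit budget allows.
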

Thus, the audibility of quantization noise in each sub-band is minimized. Finally, to get better compression, the quantized samples are encoded using variable-length Huffman codes. Here, the most frequently occurring quantized levels are allotted fewer bits than less frequently occurring quantized levels.
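
The idea of variable-length coding can be shown with a small, generic Huffman coder. This is a textbook construction built from the observed frequencies of some made-up quantized levels; it is an assumption-laden sketch and does not use the predefined code tables of the MP3 standard.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing their codes with 0 and 1.
        n0, _, codes0 = heapq.heappop(heap)
        n1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (n0 + n1, next(tiebreak), merged))
    return heap[0][2]


if __name__ == "__main__":
    # Hypothetical quantized levels: 0 is by far the most common value.
    levels = [0] * 8 + [1] * 4 + [-1] * 3 + [2] * 1
    code = huffman_code(levels)
    encoded = "".join(code[v] for v in levels)
    print(code)
    print(len(encoded), "bits instead of", 2 * len(levels), "with fixed 2-bit codes")
```
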

The bit stream formatter packs the coded samples along with a header, side information and a 'Cyclic Redundancy Check' (CRC) into a coded stream. Here, any other ancillary data not pertaining to the audio signal can also be multiplexed.
The decoder deciphers this bit stream, restores the quantized sub-band values and finally reconstructs the audio samples. MP3 is a non-real-time compression technique. Thus, downloading and playing are separated.
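
The formatter and decoder roles can be sketched with a toy frame layout. Everything about this layout is an assumption made for illustration: the marker value, the field order and the use of CRC-32 from Python's `zlib` module are stand-ins, and none of it matches the real MP3 frame format.

```python
import struct
import zlib

MAGIC = 0xFFFB  # placeholder frame marker for this toy layout (assumption)
HEADER = struct.Struct(">HHHH")  # marker, side-info length, sample length, ancillary length
CRC = struct.Struct(">I")        # CRC-32 used here as a stand-in check value


def format_frame(side_info, coded_samples, ancillary=b""):
    """Pack header, CRC, side information, coded samples and ancillary data."""
    header = HEADER.pack(MAGIC, len(side_info), len(coded_samples), len(ancillary))
    body = side_info + coded_samples + ancillary
    return header + CRC.pack(zlib.crc32(header + body)) + body


def parse_frame(frame):
    """Decoder side: verify the CRC and split the frame back into its parts."""
    magic, n_side, n_samples, n_anc = HEADER.unpack_from(frame, 0)
    (crc,) = CRC.unpack_from(frame, HEADER.size)
    body = frame[HEADER.size + CRC.size:]
    if magic != MAGIC or len(body) != n_side + n_samples + n_anc:
        raise ValueError("malformed frame")
    if zlib.crc32(frame[:HEADER.size] + body) != crc:
        raise ValueError("CRC mismatch")
    return body[:n_side], body[n_side:n_side + n_samples], body[n_side + n_samples:]


if __name__ == "__main__":
    frame = format_frame(b"\x01\x02", b"huffman-coded-bits", b"lyrics")
    print(parse_frame(frame))
```

A frame damaged in transit fails the CRC comparison, which is how the decoder can tell that the data should not be trusted.
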

Logistics for MP3
To record and play MP3 files, the following are required:

  • A Pentium 133 machine is the minimum requirement. Some applications demand Pentium 166 MMX chips. In the Mac world, a G3 at 233 MHz or above is needed.
  • RAM size should be a minimum of 32 MB. However, 64 MB will deliver better performance.
  • For storing music on a PC, a CD or a DVD drive is also required. Any CD speed above 2x will work. A SCSI drive minimizes the load on the processor.
  • Quality of sound depends on the sound card. Necessary selection parameters are the operating system, PCI/ISA bus, 'Musical Instrument Digital Interface' (MIDI) support and the number of voices, i.e. tracks, supported.
  • Speakers: these are available in a wide range of sizes, prices and capabilities.
  • Plenty of storage will be a necessity. Normally, each minute of music will require 1 MB of space. Thus, each gigabyte of space is capable of storing 250 songs of four minutes' duration each.
  • Choose Internet connectivity that is as fast as possible. Available choices are: 56K modem, dual-line modem (2x56 Kbps), ISDN connection, DSL etc.
  • A ripper to record MP3 files from existing sources and a player to play them back are necessary.
  • Portable MP3 players.
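
The storage and connectivity figures in this list can be turned into quick estimates. The sketch below assumes 1 MB means one million bytes and 1 GB means 1000 MB, that a link runs at its full nominal speed (real modems rarely do), and that 128 Kbps is the encoding bitrate behind the "1 MB per minute" rule; the function names are illustrative.

```python
def mb_per_minute(bitrate_kbps):
    """Storage needed for one minute of audio at the given bitrate."""
    return bitrate_kbps * 1000 * 60 / 8 / 1_000_000


def songs_per_gigabyte(song_minutes, mb_per_min=1.0, mb_per_gb=1000):
    """Whole songs that fit in one gigabyte using the 1 MB/minute rule of thumb."""
    return int(mb_per_gb // (song_minutes * mb_per_min))


def download_seconds(size_mb, link_kbps):
    """Ideal transfer time for a file over a link of the given nominal speed."""
    return size_mb * 8 * 1000 / link_kbps


if __name__ == "__main__":
    print(f"{mb_per_minute(128):.2f} MB per minute at 128 Kbps")
    print(songs_per_gigabyte(4), "four-minute songs per gigabyte")
    for name, speed in [("56K modem", 56), ("dual-line modem", 112)]:
        minutes = download_seconds(4, speed) / 60
        print(f"{name}: about {minutes:.1f} minutes for a 4 MB song")
```
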

Functions performed by MP3

Downloading MP3 Music
MP3 has massive advantages over conventional methods of distributing and listening to music. There are many popular MP3 sites (such as www.mp3.com) from where one can download the music free or for a fee. MP3 compliant music can be distributed via e-mail also.

The process of extracting music stored on a CD and converting it into MP3 format is known as ripping. With ripping software, the contents of a CD can be copied to a file on a PC. However, ensure that you do not distribute copies. This is illegal.

Personal digital Jukebox
MP3 format has tags that enable identifying and categorizing a file based on author, year, song title, and genre etc. Thus, using identification fields, one can easily organize the music files in a customized manner i.e. personal jukebox. Those customized files can be stored on a CD also. Using a portable MP3 player, one can enjoy music away from home, for hours.
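
The article does not name a tag format, so the sketch below assumes the widely used ID3v1 layout: a 128-byte block at the end of the file that starts with "TAG" and holds fixed-width title, artist, album, year, comment and genre fields. The helper names and sample values are hypothetical.

```python
import struct

# ID3v1 layout: "TAG", title(30), artist(30), album(30), year(4), comment(30), genre(1)
ID3V1 = struct.Struct("<3s30s30s30s4s30sB")


def build_tag(title, artist, album, year, genre_id=255, comment=""):
    """Create a 128-byte ID3v1 block from text fields (padded with NUL bytes)."""
    def field(text, size):
        return text.encode("latin-1")[:size].ljust(size, b"\x00")
    return ID3V1.pack(b"TAG", field(title, 30), field(artist, 30),
                      field(album, 30), field(year, 4), field(comment, 30), genre_id)


def read_tag(data):
    """Return the tag fields from the last 128 bytes of a file, or None if absent."""
    if len(data) < ID3V1.size or data[-128:-125] != b"TAG":
        return None
    _, title, artist, album, year, comment, genre = ID3V1.unpack(data[-128:])

    def text(raw):
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {"title": text(title), "artist": text(artist), "album": text(album),
            "year": text(year), "comment": text(comment), "genre_id": genre}


if __name__ == "__main__":
    fake_file = b"...audio frames..." + build_tag("Song", "Artist", "Album", "2001", 17)
    print(read_tag(fake_file))
```

A jukebox program could read these fields from every file and then sort or group the collection by artist, year or genre.
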

SHOUTcast
It is a streaming technology for delivering and listening to compressed audio over the Internet. However, music storage is not possible. Currently, thousands of streaming radio stations exist all over the Internet. For tuning in to a station, the 'MP3 spy' software is required. This is like a smart radio tuner for the web, which searches out signals and lists them. SHOUTcast technology enables an author to broadcast his music over the Internet from his server. This is the most economical and the quickest way to reach the audience.

Portable MP3 player
There are many companies marketing solid-state portable MP3 players. Their intention is to make the MP3 format a consumer standard. Portable MP3 players have no internal moving parts. Thus the music does not skip or change pitch when the device is jostled or dropped. This is a major benefit over CD or cassette-based products. However, the standard memory size is limited to 64 MB. Thus, about one hour of music can be stored.

Popular MP3 players are the Audiovox MP-1000, Casio's MP3 wrist audio player, the i2Go eGo, Samsung's Yepp, RCA's Lyra player and Sony's Memory Stick Walkman. These portable MP3 players have prices varying from US$200 to US$500.

Related music compression formats
Proprietary compression formats
Besides MP3, other music compression techniques such as VQF, Liquid Audio's eponymous technology, General Audio Coder, RealNetworks' Real G2 and Microsoft Windows Media are also competing with their proprietary standards. However, these proprietary compression formats include security hooks that determine both the allowable number of copies one can make and the type of devices one can use for playing back the songs.

Secure Digital Music Initiative (SDMI)

SDMI is the recording industry's late response to the unexpected rise in popularity of MP3. Its participants include all the major hardware and software companies vying for position in digital music.

The basic idea behind SDMI is the ability to retain control over defining the rules of purchase. The aim is far more than eliminating piracy. The group is working on copyright protection, copyright management and royalty tracking issues. The group is planning a secure way to download the music and prevent free downloading.

Within a year, SDMI will require MP3 players to include a screening technology that blocks a user from playing pirated copies. In the long term, music distribution would make use of a new delivery channel and make Net releases promotional copies.

MPEG-2 is supposed to provide standards for interactive TV using telecommunication. This standard has overcome the limitation of analog input and output of MPEG-1 and provides standards for digital input and output.

The highlights of MPEG-2 are:
Part 1 supports adaptation of the bit stream to the physical layer, multi-program transport, stream identification, encryption and copyright identification.

Part 2 covers the best video-coding standard to efficiently encode interlaced video.

Part 3 provides a standard to encode multi-channel audio. It is also backward compatible with MPEG-1.

Recently, the 'Advanced Audio Coding' (AAC) standard has become available as Part 7. AAC is not constrained by backward compatibility with MPEG-1 and therefore provides the highest quality. AAC is bound to become a dominant format for music distribution over the web.

MPEG-4 Audio
It is much more complex and robust. It features sub-layers for handling high- and low-bitrate speech, general audio, synthesized audio and text-to-speech conversion. It also includes a format for combining any of these sound elements. The standard is designed to handle diverse applications ranging from Internet streaming to wireless devices to speech-to-text conversion.

Currently, the annual market size of music distribution is more than US$40 billion. This huge market is controlled by the 'Big 5': Sony, EMI, Warner, BMG and X. All of them are locked in legal battles over copyright issues with companies distributing MP3 music over the Net. Music available for download on the Net includes both legal and pirated copies.

Some of the important settlements are:
MP3.com maintains a database of over 40,000 albums. This database when combined with its software, allows users to store music and then access it via the computer. MP3.com was in a legal battle with Sony on copyright issues. Now, MP3.com has agreed to pay royalty to Sony.

Napster software allows users logged on to the Internet to see music files stored by others on their computers. Using the software, users can search for and download songs from different computers. Artists, labels or publishers have not licensed the use of the vast majority of these songs.

Napster is facing a legal battle with Bertelsmann. As part of a settlement, both have joined hands to develop secure technology that ensures that a fee is paid when a song is downloaded. Thus, Napster is trying to create a new membership-based service that pays artists, music publishers and record companies.

To overcome piracy, Napster will record an MD5 hash of every MP3 music file stored on the server. MD5 is a sort of unique fingerprint that can be used to track recordings, spot illegal downloads and remove pilfered files. However, there is a caveat. Due to differences in the recording software and the degree of compression, different digital recordings of the same song can produce different fingerprints. While that makes tracking difficult, the technology would detect most violations.
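
The caveat about fingerprints is easy to see with Python's standard `hashlib` module. The byte strings below are made-up stand-ins for two MP3 files of the same song produced by different encoders; any difference in the bytes, however small, yields an unrelated MD5 value.

```python
import hashlib


def fingerprint(data):
    """Return the MD5 digest of a file's bytes as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()


if __name__ == "__main__":
    # Hypothetical file contents: same song, different encoder and bitrate.
    rip_a = b"same song, encoder A, 128 kbps"
    rip_b = b"same song, encoder B, 160 kbps"
    print(fingerprint(rip_a))
    print(fingerprint(rip_b))
    print("identical copies match:", fingerprint(rip_a) == fingerprint(bytes(rip_a)))
    print("different rips match:  ", fingerprint(rip_a) == fingerprint(rip_b))
```

Exact copies of one file are therefore easy to track, while a fresh rip of the same CD looks like a different recording to this kind of check.
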

The popularity of the MP3 format has created a new consumer electronics industry of portable digital music players. The consumer demand for MP3 players is surging exponentially. This will make it increasingly difficult to supplant MP3 with a proprietary format.

In the long run, to prevent piracy, it is necessary to have a commerce system that provides users rich music content that is portable, accessible and affordable. NM

A. K. Vanwasi, GM (R&D), ITI LTD. Naini, Allahabad, can be reached at vanwasi_nni @ itiltd.co.in.

Copyright 2001: Indian Express Group (Mumbai, India). All rights reserved throughout the world. This entire site is compiled in Mumbai by The Business Publications Division of the Indian Express Group of Newspapers. Site managed by BPD