MP3
and what it means to Net users?
The MP3 format is all the rage today for downloading
audio files over the Internet, often free of cost.
Here is an insight into the technology involved and
the benefits it offers.
The Internet has permeated every aspect of living
and has likewise introduced a fundamental shift
in the areas of information distribution, the postal
system and commerce. The music industry has not
been left unscathed by the World Wide Web. The combination
of 'Moving Picture Experts Group, Audio Layer-3' (MP3)
and the Internet has revolutionized the century-old
music industry.
MP3
is a highly compressed, open file format, and has
served primarily for the storing and transporting
of music over the Internet. Using the MP3 format,
one minute of CD-quality music can be compressed into
about one megabyte of space. This has significantly
improved the transport speed for music over the Internet.
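The one-megabyte-per-minute figure can be checked with a little arithmetic. The sketch below assumes the CD audio parameters given later in this article (44.1 kHz, 16 bits, stereo) and an MP3 bitrate of 128 kbps; the 128 kbps value is an assumption chosen for illustration, and the function name is hypothetical.

```python
def megabytes_per_minute(bitrate_kbps: float) -> float:
    """Storage for one minute of audio at the given bitrate, in MB (10^6 bytes)."""
    bits = bitrate_kbps * 1000 * 60
    return bits / 8 / 1_000_000

cd_kbps = 44.1 * 16 * 2   # uncompressed CD audio, about 1411 kbps
mp3_kbps = 128            # assumed MP3 bitrate (not stated in the text)

print(round(megabytes_per_minute(cd_kbps), 2))   # 10.58
print(round(megabytes_per_minute(mp3_kbps), 2))  # 0.96
```

Under these assumptions a minute of music shrinks from roughly 10.6 MB to just under 1 MB.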
MP3 is the first file format that categorizes music
by identification fields, i.e. tags such as song
title, author, genre and year. Thus, using the
MP3 format and the PC, one can easily create a personal
jukebox.
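As an illustration of such identification fields, the sketch below reads an ID3v1 tag, the simple 128-byte block that many MP3 files carry at the very end of the file. The article does not name the tag format, so treating it as ID3v1 is an assumption; the helper names and the synthetic demo data are hypothetical.

```python
def read_id3v1(data: bytes):
    """Return the ID3v1 fields from the last 128 bytes of an MP3 file, or None."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fields are fixed-width and padded with zero bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "genre_id": tag[127],   # numeric index into the ID3v1 genre list
    }

def field(value: str, width: int) -> bytes:
    """Pad a string to a fixed-width tag field."""
    return value.encode("latin-1").ljust(width, b"\x00")

# Synthetic data so the sketch runs without a real file (made-up values).
demo = (b"\xff" * 16 + b"TAG" + field("My Song", 30) + field("Some Artist", 30)
        + field("Demo Album", 30) + field("1999", 4) + field("", 30) + bytes([17]))

print(read_id3v1(demo))
```

A jukebox program could sort or filter a music collection on the dictionary this function returns.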
Using the PC and the necessary software, the music stored
on a CD can be extracted, i.e. ripped, and converted
into an MP3 file. This file can be stored on a PC or CD.
Using a PC or a portable MP3 player, one can play back
the music.
The MP3 format enables one to convert a PC into a broadcast
server over the Internet. Such a broadcast server
is popularly known as 'SHOUTcast'. Thus an author
can broadcast his music online, and anyone can listen
to this broadcast over the Internet.
MP3
format is an open standard. It allows the development
of freeware, shareware and commercial software.
However, the flip side is that this open standard also
facilitates the piracy of copyrighted music.
Speech/audio basics
Speech: Speech is probably the most natural form
of communication. For coding purposes, speech is
classified as narrow-band and wide-band speech.
Narrow-band speech: Such a signal has a frequency range
of 200 Hz to 3200 Hz. A telephony signal is an example
of a narrow-band signal. High-quality speech transmission
and storage requires:
Sampling rate : 8 kHz
Bits/sample : 16 bits
Uncompressed bitrate : 128 kbps
The uncompressed bitrate of 128 kbps is twice the bitrate
used in ordinary telephony.
Wide-band speech: Such a signal has a frequency range
of 50 Hz to 7000 Hz. The coding parameters are:
Sampling rate : 16 kHz
Bits/sample : 16 bits
Uncompressed bitrate : 256 kbps
A wide-band signal is used for conferencing and broadcast
applications.
CD audio: It has the full audio frequency range
of 20 Hz to 20,000 Hz. The coding parameters are:
Sampling rate : 44.1 kHz
Bits/sample : 16 bits
Uncompressed bitrate (2-channel stereo) : 1.4 Mbps
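The uncompressed bitrates quoted for the three signal classes all follow from one product: sampling rate times bits per sample times number of channels. A minimal sketch, with a hypothetical function name:

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> float:
    """Uncompressed (PCM) bitrate in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000

print(pcm_bitrate_kbps(8_000, 16))       # 128.0   narrow-band speech
print(pcm_bitrate_kbps(16_000, 16))      # 256.0   wide-band speech
print(pcm_bitrate_kbps(44_100, 16, 2))   # 1411.2  CD audio, about 1.4 Mbps
```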
Lossy compression: Speech and audio compression
deploy 'lossy' techniques. In such methods, there
is some loss of accuracy but the compression ratio is
very high. Typical compression ratios are:
Narrow-band : 30 : 1
Wide-band : 15 : 1
CD audio : 24 : 1
Monophonic channel: Single audio channel.
Dual-monophonic channel: It has two independent
audio channels.
Stereo mode: It shares bits between two audio channels
but does not use joint-stereo coding.
Joint-stereo coding: It takes advantage of either
the correlations between stereo channels or the
phase difference between channels or both.
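One common way to exploit the correlation between stereo channels is mid/side coding: the coder stores the average of the two channels and their difference instead of left and right. The sketch below is only an illustration of that idea with made-up sample values; it does not reproduce the exact scaling or switching logic of any particular encoder.

```python
def to_mid_side(left, right):
    """Convert left/right samples to mid (average) and side (half-difference)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def from_mid_side(mid, side):
    """Invert to_mid_side and recover the left/right samples."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# Two strongly correlated channels (made-up sample values).
left = [100, 120, 90, 60]
right = [98, 121, 92, 58]

mid, side = to_mid_side(left, right)
print(side)   # [1.0, -0.5, -1.0, 1.0]
```

When the channels are similar, the side signal stays close to zero and needs far fewer bits than a full second channel.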
Source coding: A speech signal contains redundancy.
Source coding removes redundancy in a signal
by estimating a model of the source, i.e. the vocal-tract
system. In source coding, signal bandwidth is limited.
Here, an attempt is made to improve a quality metric
such as 'Signal to Noise Ratio' (SNR).
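SNR compares the energy of the original signal with the energy of the error introduced by coding, expressed in decibels. The sketch below computes it for a test tone; the crude rounding step is only a stand-in for a real codec and is an assumption made for illustration.

```python
import math

def snr_db(original, decoded):
    """Signal-to-noise ratio in dB between an original and a decoded signal."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise == 0:
        return math.inf
    return 10 * math.log10(signal / noise)

# A 440 Hz tone sampled at 8 kHz.
original = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(800)]

# Coarse rounding as a stand-in for a lossy coder (assumption).
decoded = [round(x * 127) / 127 for x in original]

print(round(snr_db(original, decoded), 1))
```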
Perceptual audio coding: A perceptual coder uses
a model of the human perceptual apparatus (the ear) to
remove the parts of the signal that the human ear cannot
perceive. A perceptual coder discards inaudible
features of the sound signal. It is a lossy compression
technique.
The
imperceptible information removed by the perceptual
coder is called 'irrelevancy'. In practice, a perceptual
coder has poor SNR but has better subjective quality.
Masking: It is a perceptual property of the human
auditory system. It has been observed that a louder
sound masks, or hides, a weaker signal in its neighborhood.
This neighborhood may be in time or in frequency.
Thus a strong tone at 700 Hz may mask a weaker
signal within ±70 Hz.
CD quality audio compression standard: MPEG
MPEG is a working group of the International Organization
for Standardization/International Electrotechnical
Commission (ISO/IEC). It develops international standards
for compression, decompression, processing and coded
representation of moving pictures, audio and their
combination. So far, MPEG has produced MPEG-1, MPEG-2
and MPEG-4 Version 1, and is currently working on MPEG-4
Version 2 and MPEG-7.
MPEG-1:
This standard addresses the compression of synchronized
video and audio at a total bitrate of about 1.5
Mbps. The quality is comparable with that of a VHS
cassette. Before MPEG, compressing around 166 Mbps
of digital video and around 1.5 Mbps of digital
stereo sound onto a CD looked like a difficult task.
MPEG-1 decoder (see Figure-1)
The video compression scheme complies with H.261.
This standard integrates field interpolation into
the motion-compensated prediction scheme. The audio
compression deploys a cost-effective sub-band coding
scheme. This enables virtual transparency of audio
at a bitrate as low as 128 kbps. Here the audio
channel is sampled at 48 kHz. MPEG-1 was the first
signal processing standard developed using the 'C'
programming language.
MPEG-1 contributed significantly to defusing the highly
politicized issue of television standards. MPEG
recognized that what matters in a television signal is
not the number of lines or the number of fields/sec but
the bandwidth of the signal in the analog domain and the
number of pixels in the digital domain.
MPEG-1 provided the first concrete opportunity for the
microelectronics industry to invest in digital video/audio
technology. MPEG-1 also standardized a software
implementation of its coder/decoder.
The
audio part of MPEG-1 has become the key component
for 'radio broadcasting at CD quality', offered
by digital audio broadcasting.
MPEG/Audio Compression
MPEG/Audio is a generic audio compression standard.
This coder deploys perceptual coding. Much of the
compression results from the removal of perceptually
irrelevant parts of the audio signal.
Features of MPEG/Audio are:
- The audio sampling rate can be 32, 44.1 or 48 kHz.
- The compressed bit stream can have one of several
predefined fixed bitrates ranging from 32 to 224 kbps.
Depending on the audio sampling rate, this translates to
compression factors ranging from 2.7 to 24.
MPEG/Audio offers a choice of three independent
layers of compression. This provides a wide range
of trade-offs between codec complexity and compressed
audio quality.
- Layer-1 compression is the simplest. It best suits
bitrates above 128 kbps. For example, the Philips 'Digital
Compact Cassette' used layer-1 compression at 192 kbps.
- Layer-2 has medium complexity and targets bitrates
around 128 kbps. Possible applications for layer-2
are the coding of audio for 'digital audio broadcasting'
(DAB), the storage of synchronized video and audio
sequences on CD-ROM, and the full-motion extension
of CD-interactive and video CD.
- Layer-3 is the most complex but provides the best
quality for bitrates around 64 kbps. This layer
is used for audio transmission over ISDN. The MPEG/Audio
layer-3 standard is popularly known as the MP3 standard.
MP3 is the first, and a better, standard format for
storing, organizing and playing music with a personal
computer.
MP3 Coder (see Figure-2)
A coder comprises a bank of filters, a perceptual model,
an adaptive quantizer etc.
The features include:
The input audio stream passes through a filter bank
that divides it into multiple frequency sub-bands.
In practice, the filter bank comprises 32 equal-width
sub-band filters of 750 Hz bandwidth each (at a 48 kHz
sampling rate).
The filter bank is realized by digital signal
processing techniques: a polyphase filter bank, followed
in layer-3 by a modified discrete cosine transform (MDCT)
on each sub-band for finer frequency resolution. The
filter bank accepts 32 input samples in the time domain
and, after processing, produces 32 output samples, one
per sub-band. Because each sub-band output is sub-sampled
in this way, every 32 input samples yield exactly
one output sample in each sub-band.
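The 750 Hz figure corresponds to a 48 kHz sampling rate: the audible band up to half the sampling rate is split into 32 equal parts. The sketch below works this out and shows which sub-band a given frequency falls into; the function names are hypothetical, and the 384-sample example is an arbitrary multiple of 32.

```python
NUM_SUBBANDS = 32

def subband_width_hz(sample_rate_hz: float) -> float:
    """Width of each of the 32 equal sub-bands covering 0 Hz to half the sampling rate."""
    return (sample_rate_hz / 2) / NUM_SUBBANDS

def subband_index(freq_hz: float, sample_rate_hz: float) -> int:
    """Index (0..31) of the sub-band that contains freq_hz."""
    index = int(freq_hz // subband_width_hz(sample_rate_hz))
    return min(index, NUM_SUBBANDS - 1)

def samples_per_subband(num_input_samples: int) -> int:
    """After sub-sampling by 32, each sub-band keeps 1 sample per 32 inputs."""
    return num_input_samples // NUM_SUBBANDS

print(subband_width_hz(48_000))        # 750.0
print(subband_index(700, 48_000))      # 0
print(subband_index(10_000, 48_000))   # 13
print(samples_per_subband(384))        # 12
```

The total number of samples leaving the filter bank therefore equals the number entering it: 32 sub-bands, each carrying one thirty-second of the samples.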
The filter bank has a provision for changing the
resolution of the signal in the time or frequency domain.
Thus, it is possible to switch the filter bank from
high-frequency/low-time resolution to low-frequency/high-time
resolution. The input audio stream also passes through
the perceptual (psychoacoustic) model. The model
determines the ratio of signal energy to masking
threshold for each sub-band.
The output from the filter bank is quantized based
on the perceptual threshold. The quantizer block uses
the signal-to-mask ratio to implicitly allot, and
separately control, the available number of bits among
the different sub-band filter outputs.
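The idea of steering bits by the signal-to-mask ratio can be sketched with a simple greedy loop: repeatedly give one more bit to the sub-band whose quantization noise is furthest above its masking threshold. This is a simplified illustration, not the actual layer-3 rate control. The rule of thumb that each bit lowers noise by about 6 dB, the `max_bits` cap, and the sample ratios are all assumptions.

```python
def allocate_bits(smr_db, total_bits, db_per_bit=6.02, max_bits=15):
    """Greedy bit allocation driven by signal-to-mask ratios (in dB)."""
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio: how far quantization noise sits above the mask.
        nmr = [smr - db_per_bit * b for smr, b in zip(smr_db, bits)]
        candidates = [i for i in range(len(bits)) if bits[i] < max_bits]
        if not candidates:
            break
        worst = max(candidates, key=lambda i: nmr[i])
        if nmr[worst] <= 0:
            break   # noise is already below the mask in every sub-band
        bits[worst] += 1
    return bits

# Made-up signal-to-mask ratios for four sub-bands.
smr = [20.0, 5.0, -3.0, 12.0]
print(allocate_bits(smr, total_bits=8))   # [4, 1, 0, 2]
```

The third sub-band gets no bits at all because its signal already lies below the masking threshold, and the loop stops one bit early once all noise is masked.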
Thus, the audibility of quantization noise in each
sub-band is minimized. Finally, to get better compression,
the quantized samples are encoded using variable-length
Huffman codes. Here, the most frequently occurring
quantized levels are allotted fewer bits than
less frequently occurring quantized levels.
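The principle behind Huffman coding can be shown in a few lines. MP3 itself relies on predefined code tables rather than building one per file, so the sketch below, which derives a table from the data, is only an illustration of why frequent levels end up with short codes. The quantized levels are made up.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code in which frequent symbols get shorter bit strings."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Made-up quantized levels; zero is by far the most common.
levels = [0, 0, 0, 0, 0, 0, 1, 1, 1, -1, -1, 2]
code = huffman_code(levels)
encoded = "".join(code[v] for v in levels)

print(code)
print(len(encoded), "bits")   # 21 bits, versus 24 with a fixed 2-bit code
```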
The bit stream formatter formats the coded samples along
with the header, side information and 'Cyclic Redundancy
Check' (CRC) into a coded stream. Here, any other
ancillary data not pertaining to the audio signal can
also be multiplexed.
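To make the header concrete, the sketch below decodes the main fields of the 4-byte header that starts each MPEG-1 layer-3 frame. The field layout and lookup tables come from the MPEG-1 audio frame format rather than from this article, only a subset of fields is decoded, and the example bytes are a typical header chosen for illustration.

```python
# Lookup tables for MPEG-1 Layer 3 (index values taken from the header bits).
BITRATES_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112,
                 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES_HZ = [44100, 48000, 32000, None]
CHANNEL_MODES = ["stereo", "joint stereo", "dual channel", "mono"]

def parse_header(header: bytes) -> dict:
    """Decode selected fields of an MPEG-1 Layer 3 frame header."""
    if len(header) < 4:
        raise ValueError("need 4 bytes")
    word = int.from_bytes(header[:4], "big")
    if word >> 21 != 0x7FF:
        raise ValueError("no frame sync")
    version = (word >> 19) & 0b11
    layer = (word >> 17) & 0b11
    if version != 0b11 or layer != 0b01:
        raise ValueError("not MPEG-1 Layer 3")
    return {
        "crc_protected": ((word >> 16) & 1) == 0,
        "bitrate_kbps": BITRATES_KBPS[(word >> 12) & 0xF],
        "sample_rate_hz": SAMPLE_RATES_HZ[(word >> 10) & 0b11],
        "padding": bool((word >> 9) & 1),
        "channel_mode": CHANNEL_MODES[(word >> 6) & 0b11],
    }

print(parse_header(bytes([0xFF, 0xFB, 0x90, 0x64])))
```

For the example bytes this reports a 128 kbps, 44.1 kHz, joint-stereo frame without CRC protection.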
The decoder deciphers this bit stream, restores
the quantized sub-band values and finally reconstructs
the audio samples. MP3 is a non-real-time compression
technique. Thus, downloading and playing are separated.
Logistics for MP3
To record and play MP3 files, the following are required:
- A Pentium 133 machine is the minimum necessity. Some
applications demand Pentium 166 MMX chips. For the Mac
world, a G3 at 233 MHz or above.
- RAM size should be a minimum of 32 MB. However, 64
MB will deliver better performance.
- For storing music on a PC, a CD or DVD drive is
also required. Any CD speed above 2x will work.
A SCSI drive minimizes the load on the processor.
- Quality of sound depends on the sound card. Necessary
selection parameters are the operating system, PCI/ISA bus,
'Musical Instrument Digital Interface' (MIDI) support
and the number of voices, i.e. tracks, supported.
- Speakers. These are available in a wide range of sizes,
prices and capabilities.
- Plenty of storage will be a necessity. Normally,
each minute of music will require 1 MB of space. Thus, each
gigabyte of space is capable of storing 250 songs
of four minutes' duration each.
- Choose Internet connectivity that is as fast as possible.
Available choices are: a 56K modem, a dual-line modem
(2 x 56 kbps), an ISDN connection, DSL etc.
- A ripper to record MP3 files from existing sources
and a player to play them back are necessary.
- Portable MP3 players.
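The storage and connectivity figures in the list above can be checked with a little arithmetic. The sketch below assumes 1 MB per minute of music, 1000 MB per gigabyte, and a modem that actually delivers its full nominal rate, which real lines rarely do; the function names are hypothetical.

```python
def songs_per_gigabyte(minutes_per_song=4, mb_per_minute=1, mb_per_gb=1000):
    """Number of songs that fit in one gigabyte."""
    return mb_per_gb // (minutes_per_song * mb_per_minute)

def download_seconds(size_mb, link_kbps):
    """Idealized download time: file size in kilobits divided by link rate."""
    return size_mb * 8 * 1000 / link_kbps

print(songs_per_gigabyte())                      # 250
print(round(download_seconds(4, 56) / 60, 1))    # 9.5 minutes on a 56K modem
print(round(download_seconds(4, 112) / 60, 1))   # 4.8 minutes on a dual-line modem
```

Even in this best case a single four-minute song takes longer to download over a 56K modem than it takes to play.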
Functions performed by MP3
Downloading MP3 Music
MP3 has massive advantages over conventional methods
of distributing and listening to music. There are
many popular MP3 sites (such as www.mp3.com) from
where one can download music free or for a fee.
MP3 music can also be distributed via e-mail.
Ripping
The process of extracting music stored on a CD and
converting it into MP3 format is known as ripping.
With ripping software, the contents of a CD
can be copied to a file on the PC. However, ensure
that you do not distribute copies. This is illegal.
Personal digital jukebox
The MP3 format has tags that enable identifying and
categorizing a file based on author, year, song
title, genre etc. Thus, using the identification
fields, one can easily organize the music files
in a customized manner, i.e. as a personal jukebox. These
customized files can also be stored on a CD. Using
a portable MP3 player, one can enjoy music away
from home, for hours.
SHOUTcast
It is a streaming technology for delivering and
listening to compressed audio over the Internet. However,
the streamed music cannot be stored. Currently, thousands
of streaming radio stations exist all over the Internet.
For tuning in to a station, the 'MP3 spy' software
is required. This is like a smart radio tuner for
the web, which searches out signals and lists them.
SHOUTcast technology enables an author to broadcast
his music over the Internet from his server. This
is the most economical and the quickest way to reach
the audience.
Portable MP3 player
There are many companies marketing solid-state portable
MP3 players. Their intention is to make the MP3 format
a consumer standard. Portable MP3 players have no
internal moving parts. Thus the music does not
skip or change pitch when the device is jostled
or dropped. This is a major benefit over CD or cassette-based
products. However, the standard memory size is limited
to 64 MB. Thus, about one hour of music can be stored.
Popular MP3 players are the Audiovox MP-1000, Casio's MP3
wrist audio player, the I2Go eGo, Samsung's Yepp, RCA's Lyra
player and Sony's Memory Stick Walkman. These portable
MP3 players have prices varying from US$200 to US$500.
Related music compression formats
Proprietary compression formats
Besides MP3, other music compression techniques
such as VQF, Liquid Audio's eponymous technology,
General Audio Coder, RealNetworks' Real G2 and
Microsoft Windows Media are also competing with their
proprietary standards. However, these proprietary
compression formats include security hooks that
determine both the allowable number of copies one
can make and the type of devices one can use for
playing back the songs.
Secure Digital Music Initiative (SDMI)
SDMI
is the recording industry's late response to the
unexpected rise in popularity of MP3. Its participants
include all the major hardware and software companies
vying for position in digital music.
The
basic idea behind SDMI is the ability to retain
control over defining the rules of purchase. The
aim is far more than eliminating piracy. The group
is working on copyright protection, copyright management
and royalty tracking issues. The group is planning
a secure way to download the music and prevent free
downloading.
Within a year, SDMI will require MP3 players to include
a screening technology that blocks a user from playing
pirated copies. In the long term, music distribution
would make use of a new delivery channel and make
Net releases promotional copies.
MPEG-2
MPEG-2 is supposed to provide standards for interactive
TV using telecommunication. This standard has overcome
the limitation of analog input and output of MPEG-1
and provides standards for digital input and output.
The highlights of MPEG-2 are:
Part 1 supports adaptation of the bit stream to the
physical layer, multi-program transport, stream
identification, encryption and copyright identification.
Part 2 covers the video-coding standard, which
efficiently encodes interlaced video.
Part 3 provides the standard to encode multi-channel
audio. It is also backward compatible with MPEG-1.
Recently, the 'Advanced Audio Coding' (AAC) standard has
become available as Part 7. AAC is not constrained by
backward compatibility with MPEG-1 and therefore provides
the highest quality. AAC is bound to become
a dominant format for music distribution over the
web.
MPEG-4 Audio
It is much more complex and robust. It features
sub-layers for handling high- and low- bitrate speech,
general audio, synthesized audio and text-to-speech
conversion. It also includes a format for combining
any combination of these sound elements. The standard
is designed to handle diverse applications ranging
from the Internet streaming to wireless devices
to speech-to-text conversion.
Currently, the annual market size of music distribution is
more than US$40 billion. This huge market is controlled
by the 'Big 5': Sony, EMI, Warner, BMG and X. All
of them are locked in legal battles with companies
distributing MP3 music over the Net on copyright
issues. Net music available for download includes both
legal and pirated material.
Some of the important settlements are:
MP3.com maintains a database of over 40,000 albums.
This database, when combined with its software, allows
users to store music and then access it via the
computer. MP3.com was in a legal battle with Sony
on copyright issues. Now, MP3.com has agreed to
pay royalties to Sony.
Napster software allows users logged on to the Internet
to see music files stored by others on their computers.
Using the software, users can search and download
songs from different computers. Artists, labels
or publishers have not licensed the use of the vast
majority of these songs.
Napster
is facing a legal battle with Bertelsmann. As part
of a settlement, both have joined hands to develop
secure technology that ensures that a fee is paid
when a song is downloaded. Thus, Napster is trying
to create a new membership-based service that pays
artists, music publishers and record companies.
To overcome piracy, Napster will record an MD5 hash of
every MP3 music file stored on the server. An MD5 hash is
a sort of unique fingerprint that can be used to
track recordings, spot illegal downloads and remove
pilfered files. However, there is a caveat. Due
to differences in the recording software and the
degree of compression, different digital recordings
of the same song can produce different fingerprints.
While that makes tracking difficult, the technology
would detect most violations.
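The caveat is easy to demonstrate with Python's standard `hashlib` module. In the sketch below, two short byte strings stand in for two different encodings of the same song. They are made-up placeholders rather than real MP3 data, and the variable and function names are hypothetical.

```python
import hashlib

def md5_fingerprint(data: bytes) -> str:
    """Hex MD5 digest of a file's contents."""
    return hashlib.md5(data).hexdigest()

# Stand-ins for two rips of the same song (made-up bytes, not real MP3 data).
rip_at_128kbps = b"\xff\xfb\x90\x64" + b"song-data-encoder-A"
rip_at_160kbps = b"\xff\xfb\xa0\x64" + b"song-data-encoder-B"

# A hypothetical list of fingerprints already known to the service.
known_files = {md5_fingerprint(rip_at_128kbps)}

print(md5_fingerprint(rip_at_128kbps) in known_files)   # True
print(md5_fingerprint(rip_at_160kbps) in known_files)   # False
```

An exact copy of a known file is caught at once, but a second rip of the same song differs in its bytes and so slips past an exact-match check.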
Conclusion
The popularity of the MP3 format has created a new consumer
electronics industry of portable digital music players.
Consumer demand for MP3 players is surging exponentially.
This will make it increasingly difficult to supplant
MP3 with a proprietary format.
In
the long run, to prevent piracy, it is necessary
to have a commerce system that provides users rich
music content that is portable, accessible and affordable.
NM
A. K. Vanwasi, GM (R&D), ITI LTD. Naini, Allahabad,
can be reached at vanwasi_nni @ itiltd.co.in.