Information Theory
The era
we are living in is sometimes called the age of information. But what is
information, and how much of it is in any message? Let's look at two situations
to determine their information content. Suppose you planned to play tennis with
a friend at a nearby park but a heavy rain prevents you from leaving the house.
Then the telephone rings and your friend tells you the game is off because it
is raining. This message holds no information because you already know it is
raining. Suppose you planned to play, however, at a park by your friend's house
several miles away, the sky was overcast, and the weatherman the night before
had said that the chance of rain in the morning was high. Then your friend
calls and says the game is off because it is raining there. This message
contains information because prior to it you were not certain whether or not it
was raining at the park. Information, therefore, is anything that resolves
uncertainty.
Information theory measures information. It also investigates the
efficient use of information media. A system of information communication is
composed of the following components. The information source produces the
message, or information, to be transmitted. The encoder, or transmitter,
processes the message and changes it to a signal suitable for the channel. The
channel is the medium over which the signal is sent. Random disturbances, or
noise, on the channel cause the received signal to be somewhat different from
the transmitted one. Noise imposes serious limitations on the amount of
information that can be transmitted over a channel. The decoder, or receiver,
processes the received signal to recreate the original message.
The
block diagram illustrates these components by using the key events of Paul
Revere's famed midnight ride in 1775 to warn colonial patriots northwest of
Boston of an impending British raid. The information source was the British
army, and the source output was the information whether the British troops
would attack the Massachusetts towns of Lexington and Concord by land or by
sea. The encoder/transmitter was the sexton of the North Church who was to
place one lantern in the belfry to indicate attack by land or two lanterns for
attack by sea. The channel was the line of sight from the belfry to the place
where Paul Revere, the decoder/receiver, was waiting. Anything that obstructed
Revere's view, such as wind-tossed tree branches, constituted noise. The
communication was only a selection between two alternatives--attack by land or
attack by sea. As long as prior agreement existed between encoder and decoder
(the sexton and Revere) on the alternatives and their proper signals, this
communication system could have sent a selection from any set of two
alternatives. For example, were they infantry or artillery troops? A more
complex system would have been needed to communicate the number of troops in
the raid because of the larger number of possible alternatives.
Binary Digits and Information Transmission
In
teletype communication the source is the person operating a teletypewriter,
which in turn is the encoder. The machine is connected by telephone wire (the
channel) to a teleprinter (the decoder) in a distant city. The person sends his
message by pressing down on the teletypewriter keys. Those keys are a set of
alternatives. The pressing of each key triggers transmission of a sequence of
binary digits that represent the letter on that key. A binary digit is either a
0 or a 1 (see Numeration Systems and Numbers). Binary digits underlie
the language of computers (see Computer). From the teletypewriter
pulsating information encoded in binary digits is sent over commercial
telephone lines. At the other end, the sequence of binary digits is decoded
back to the appropriate letters. For information theory purposes, the length of
the binary sequence needed for encoding individual letters is more important
than how the teletypewriter actually encodes the information. Only two binary
digits are needed to represent four alternatives--00, 01, 10, and 11.
Similarly, three binary digits can represent eight alternatives--000, 001, 010,
011, 100, 101, 110, and 111. As a general principle, each increase of one in
the sequence length doubles the number of alternatives (2 binary digits--4 alternatives,
3--8, 4--16, and so on). In mathematical shorthand, we can say a sequence of n
binary digits can represent 2n alternatives.
Logarithms, especially base-two logarithms (log2), are used
in the computation of information content. The base-two logarithm of a number
is defined as the power to which two must be raised in order to get that
number; hence, for any n, log2(2n) = n. As a result, if a
teletypewriter has m keys on its keyboard, any individual key must activate a
sequence of binary digits that is at least log2m long. For example,
if a keyboard holds 32 keys, a sequence of five binary digits can represent any
given key (00000 for A, 00001 for B, and so on) because the base-two logarithm
of 32 is 5.00000.
Human
communication is still another example of information transmission. A person
with an idea is the source, and the idea is the source output. The idea is
encoded into words and written. Another person decodes the words when reading
them and tries to reconstruct the writer's idea. Each written word was selected
from a set of alternatives, that is, all the words of the language. Information
theory, however, has not shed too much light on this and other kinds of
biological communication because living communication systems are not well enough
understood for the application of the mathematical concepts of the theory. But
the basic notion of information as a transmission of binary digits that
represents a choice from a set of alternatives should prove helpful in future
research into biological information systems. (See also Brain and Spinal
Cord; Nervous System.)
Entropy--The Measure of Information
Entropy
is the amount of information in a source. The entropy of any source is the
fewest number of bits (from binary digits) able to represent the
source in a message. In addition, every channel has a capacity--the maximum
number of bits it can reliably carry. Imagine that a source produces
information with an entropy of H bits per second over a channel with a capacity
of C bits per second. If H is less than C, the source output can be reliably
encoded, sent over the channel, and decoded at the destination. If H is greater
than C, the source output cannot be reliably transmitted to the
destination.
The
simplest source is that which has m equally possible alternatives, and m is a
power of two. The entropy of such a source is mathematically defined as log2m.
The entropy of the information transmitted to Paul Revere was one bit (log22
= 1.00000).
It
becomes more complicated, however, when the source's alternatives are not
equally likely. For example, suppose that the alternatives for the source are
the letters A, B, C, and D. Suppose also that letter A occurs with 1/2 probability (likely every two times), letter B with 1/4 probability (likely every four times), and letters C and D with 1/8 probability each (likely every eight times). Two codes can be devised
for this source. Code 1 requires the familiar two binary digits to represent
the source output. In Code 2, however, A is represented by a single binary
digit, but C and D are represented by three of them. Since A occurs much more
frequently than C or D, the average number of bits needed is less than two (1 3/4 in this case).
What is
the entropy of this example? It, too, is 1 3/4 bits, as we will learn from a formula later. To learn the entropy of a
source in which the letters have unequal probabilities, we must first find the
amount of information, or uncertainty, for each letter. If a letter has p
probability, then its information is log2 (1/p). In the prior
example, letter A has one bit of information [log2 (1/ 1/2) = log22 = 1], letter B has two bits [log2 (1/ 1/4) = log24 = 2], and letters C and D have three bits each [log
(1/ 1/8) = log28 = 3]. Hence, improbable
events hold more information than probable ones. This mathematically confirms
that we learn a great deal from rare and unusual events but nothing from
something that is certain to happen (log21 = 0).
The
entropy, or average uncertainty, of a source with successive but independent
letters is the mean value of their information content. Thus, if the letters of
the source are A1, A2, . . . , Am, and their
probabilities are p(A1), p(A2), . . . , p(Am),
then the source's entropy is:
This formula always leads to an entropy between
zero and log2m. The lower limit arises when one letter is certain
(has probability 1) and the others have zero probability. The upper limit
arises when the letters are equally probable, each having probability 1/m.
Using this formula for letters A, B, C, and D in the previous example, we
find:
Generally, the strategy for choosing a good code
is to make each code word's length equal the information in the corresponding
source letter. Code 2, shown before, is an example of this strategy.
Some of
the ideas of entropy and source encoding operate in a game called 20 Questions.
One player (the source) thinks of an object, and the other players (the
destination) try to identify it by asking up to 20 questions answerable by
"yes" or "no." The object of this game is to ask questions
that will yield the maximum of one bit of information, on the average, about
the object. This means asking questions that split the remaining possibilities
into two equally probable sets. If previous questions have established the
object is a man, then the question "Is he alive?" would be more
informative than "Is he Abraham Lincoln?" By using this strategy, a
player can identify one object from 220 possibilities in only 20
questions.
Often,
successive letters from a source are not statistically independent. Then the
entropy formula is not completely valid. For example, in written English,
"q" is always followed by "u," and "h" is
preceded by "t" far more often than the relative frequency of
"t" warrants. Such letter dependence reduces the entropy value in the
formula. But we may still estimate the entropy with the formula by using it on
a sequence of source letters and then by dividing the answer by the number of
letters in the sequence to get the entropy per letter. As the sequence gets
longer, this estimate approaches the true entropy. In written English, the
decrease in entropy caused by letter dependence is striking. To convince
yourself, have a friend write an unfinished sentence and then you try to guess
the next letter of the last but uncompleted word. The entropy of written
English is estimated to be one bit per letter rather than the 4.7 bits expected
from a source of 26 independent and equally likely letters.
Channels and Noise
Whether
the channel of an information system is a wire, a radio wavelength, a laser
beam, or some other medium, the output signal of the system is never quite the
same as the input signal because of noise intruding on the channel. Noise can
be caused by heat, atmospheric disturbances, or cross talk, which is a form of
interference from other channels. Static in a radio, snow on a television
screen, and background hum in a telephone are familiar examples of noise. To
send information over radio, television, and telephone channels, a modem system
(digital data modulator and demodulator) must be used. At one end
of the channel the modulator converts the sequence of symbols making up the
information into a signal suitable for transmission over the channel. At the
other end the demodulator changes the received signal back into the original
sequence of symbols. Noise, however, interferes by causing occasional errors at
the receiving end of the channel.
Just as
information theory uses a set of symbols with probable occurrences to describe
a source, so too can a channel be described by sets of input and output symbols
and a set of noise-effect probabilities. Two types of channels will be
discussed.
A binary
erasure channel has two input symbols--0 and 1. Either of them can be sent.
With a given probability p, the noise erases the transmitted symbol. When this
happens, the symbol e, standing for erasure, appears at the output
instead of a 0 or a 1. With probability 1 - p, however, the noise does not
erase the transmitted symbol and a 0 or a 1 is correctly produced at the
output.
Suppose
that the channel input is connected with a source that produces 0s and 1s, each
with a 1/2 probability (0s likely half the time, and 1s
likely half the time). Before any symbol is received at the output there is one
bit of entropy (uncertainty) about which symbol will come. After reception of a
0 or a 1 there is no uncertainty about which was sent, and, therefore, one bit
of information was transmitted. If the erasure symbol e appears, however,
there is still one bit of uncertainty about what was sent; as a result, no
information was transmitted. Since p probability will produce e, the
average information that gets through is mathematically said to be 1 - p.
A binary
symmetric channel also has two input symbols. In this channel, however, noise
with p probability converts 0s to 1s and the reverse, instead of erasing them.
Assuming the same source as the other channel, there is one bit of uncertainty
about which symbol will be sent prior to reception. Even after the reception of
a 0, for instance, there is still some uncertainty about what was sent. The
input could have been a 0 with probability 1 - p or it could have been a 1 with
probability p.
Complex
coding and decoding strategies are needed for transmission of reliable data
when many errors occur on the channel. Most of these strategies rely on some
form of parity check. A parity check on a set of binary digits is simply a
digit chosen to be a 1 if the sum of the binary digits is odd and a 0 if it is
even. If a sequence of binary source digits is sent and followed by a parity
check, any erased digit can be recomputed from the check. In more advanced
uses, each source digit is surveyed by many parity checks. Coding and decoding
techniques like this have been a great aid in the transmission of high-quality
information from a variety of deep-space probes back to Earth. (See also
Computer; Numeration Systems and Numbers; Space Travel; Telemetry.)
This article was contributed by Robert G.
Gallager, Professor of Electrical Engineering, Massachusetts Institute of
Technology.
Hashim Ibrahim Filali
1.
Comdata Observe (1-2), 1987H, 1988G -
1408H, 1409H
2.
Comdata Coverage (1), 1988G - 1408H,1409H
3.
Comdata Events (Information System), 1988G,
1989G - 1408H, 1409H 1410H
4.
Catalogue 1996G by I.S. SDM; 1996g (Charts)
5.
Education Activity and View Coverage;
1996g (Charts)
6.
Regular Project..; 1996g (Charts)
7.
Challenge Task I (Business General
Basics); 1996g (Charts)
8.
Challenge Task II (Understanding Data
Processing); 1996g (Charts)
9.
Normal View and Check Coverage; 1996g (Charts)
10. Introduction to
Information Technology ( Real Life Business); 1996g
11. Business Concept and
View Points (Part I); 1996g
12. Business Concept and
View Points (Part II); 1996g
13. Marketing Strategy for
Success; 1996g
14. Basic Rules For
Information Management; 1996g
15. Organization
Management and Administration Coverage; 1996g (CT
iii)
16. An Entrance to Next
Century; 1996g (CT
vi)
17. Survival in Business
by an Easy Procedures; 1996g (CT x)
18. Monitoring Project
Planning and Facilities Update; 1996g (CT xx)
19. Project’s Activities
and Related Tasks; 1996g
20. Join the Competition
and Win the Challenge; 1996g (CT xxv)
21. Directions of
Management and Processing; 1996g (CT xxx)
22. Productiveties
Improvement and Getting Update; 1996g (CT xxxv)
23. Culture Effect in
Marketing Business; 1996g (CT xxxx)
24. Effecient Methods of
Management Administration; 1997g (CT xxxxx)
25. Creating Procedures to
Get Best Project Processing; 1997g (CT 100)
26. Meet the Changing
Demands in the Market; 1997g (Acceptance Package to the Customer) (CT
200)
27. Windows to the
business in the Market; 1997g (CT 222)
28. Sort of Existing
Business - 0X 1997g (Chart) (CT
999)
29. Access All the
Authorized Channels by an Ease
30. A littel Moment in
Managment (Information Technology Systems)
1997G
31. Major and Minor
Activities Coverage 1997G (CT 2000)
32. Way of Organizing the
work (Information Technology Systems)
1997G
33. Dealing Right to get
your Rights (I.T.) Industrial Engineering)
1998G
34. Simple Ways to
Project Activities 1998G
35. Packaging Systems and
Quality Packing (I.T. Industrial Engineering) 1998G
36. Academic and Non
Academic Business (Industrial Engineering)
1998G
37. SDM-IE Newsletters –01- 1998G-1999G