Physically this corresponds to adding the noises or signals represented by the
original ensembles of functions. The following result is derived in Appendix 6.

Theorem 15: Let the average power of two ensembles be $N_1$ and $N_2$ and let their entropy powers be $\bar{N}_1$ and $\bar{N}_2$. Then the entropy power of the sum, $\bar{N}_3$, is bounded by

$$\bar{N}_1 + \bar{N}_2 \le \bar{N}_3 \le N_1 + N_2.$$
White Gaussian noise has the peculiar property that it can absorb any other
noise or signal ensemble which may be added to it with a resultant entropy power
approximately equal to the sum of the white noise power and the signal power
(measured from the average signal value, which is normally zero), provided the
signal power is small, in a certain sense, compared to noise. Consider the function
space associated with these ensembles having n dimensions. The white noise corresponds
to the spherical Gaussian distribution in this space. The signal ensemble corresponds
to another probability distribution, not necessarily Gaussian or spherical.
Let the second moments of this distribution about its center of gravity be $a_{ij}$. That is, if $p(x_1,\ldots,x_n)$ is the density distribution function

$$a_{ij} = \int\cdots\int p\,(x_i - \alpha_i)(x_j - \alpha_j)\,dx_1\cdots dx_n$$

where the $\alpha_i$ are the coordinates of the center of gravity. Now $a_{ij}$ is a positive definite quadratic form, and we can rotate our coordinate system to align it with the principal directions of this form. $a_{ij}$ is then reduced to diagonal form $b_{ii}$. We require that each $b_{ii}$ be small compared to N, the squared radius
of the spherical distribution. In this case the convolution of the noise and signal produces approximately a Gaussian distribution whose corresponding quadratic form is

$$N + b_{ii},$$

and the entropy power of this distribution is approximately

$$\left[\prod_i (N + b_{ii})\right]^{1/n} \doteq N + \frac{1}{n}\sum_i b_{ii}.$$

The last term is the signal power, while the first is the noise power.
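A minimal numerical sketch of Theorem 15 and of this absorption property (Python with NumPy; the one-dimensional densities and the numbers are illustrative assumptions, not taken from the text):

```python
import numpy as np

def diff_entropy(p, x):
    """Differential entropy (nats) of a density sampled on the grid x."""
    p = np.where(p > 1e-300, p, 1e-300)     # avoid log(0)
    return -np.trapz(p * np.log(p), x)

def entropy_power(H):
    """1-D entropy power corresponding to differential entropy H (nats)."""
    return np.exp(2.0 * H) / (2.0 * np.pi * np.e)

# White Gaussian "noise" of power N and a uniform "signal" of power Q = a**2 / 3
N, a = 1.0, 0.6
x = np.linspace(-12, 12, 20001)
dx = x[1] - x[0]
gauss = np.exp(-x**2 / (2 * N)) / np.sqrt(2 * np.pi * N)
unif = np.where(np.abs(x) <= a, 1.0 / (2 * a), 0.0)

# Density of the sum of the two independent variables = convolution of densities
p_sum = np.convolve(gauss, unif, mode="same") * dx

N1_bar = N                                   # Gaussian: entropy power = average power
N2_bar = entropy_power(diff_entropy(unif, x))
N3_bar = entropy_power(diff_entropy(p_sum, x))
Q = a**2 / 3.0                               # average power of the uniform signal

print(f"entropy power of the sum     : {N3_bar:.4f}")
print(f"lower bound  N1_bar + N2_bar : {N1_bar + N2_bar:.4f}")
print(f"upper bound  N1 + N2         : {N + Q:.4f}")
```

With the signal power small compared to N, the printed entropy power of the sum lies between the two bounds and is close to the upper one, N + Q.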
PART IV: THE CONTINUOUS CHANNEL

24. THE CAPACITY OF A CONTINUOUS CHANNEL

In a continuous
channel the input or transmitted signals will be continuous functions of time
f (t) belonging to a certain set, and the output or received signals will be
perturbed versions of these. We will consider only the case where both transmitted
and received signals are limited to a certain band W. They can then be specified,
for a time T, by 2TW numbers, and their statistical structure by finite dimensional
distribution functions. Thus the statistics of the transmitted signal will be determined by

$$P(x_1,\ldots,x_n)$$

and those of the noise by the conditional probability distribution

$$P_{x_1,\ldots,x_n}(y_1,\ldots,y_n).$$

The rate of transmission of information for a continuous channel is defined in a way analogous to that for a discrete channel, namely

$$R = H(x) - H_y(x)$$

where $H(x)$ is the entropy of the input and $H_y(x)$ the equivocation.
The channel capacity C is defined as the maximum of R when we vary the input
over all possible ensembles. This means that in a finite dimensional approximation
we must vary $P(x) = P(x_1,\ldots,x_n)$ and maximize

$$\iint P(x)\,P_x(y)\log\frac{P_x(y)}{P(y)}\,dx\,dy.$$

It is obvious in this form that R and C are independent of the coordinate system, since the numerator and denominator in $\log\frac{P_x(y)}{P(y)}$ will be multiplied by the same factors when x and y are transformed in any one-to-one way. This integral expression for C is more general than $H(x) - H_y(x)$.
Properly interpreted (see Appendix 7) it will always exist while $H(x) - H_y(x)$ may assume an indeterminate form $\infty - \infty$ in some cases. This occurs, for example, if x is limited to a surface of fewer dimensions than n in its n dimensional approximation. If the logarithmic base used in computing $H(x)$ and $H_y(x)$
is two then C is the maximum number of binary digits that can be sent per second
over the channel with arbitrarily small equivocation, just as in the discrete
case. This can be seen physically by dividing the space of signals into a large
number of small cells, sufficiently small so that the probability density Px(y)
of signal x being perturbed to point y is substantially constant over a cell
(either of x or y). If the cells are considered as distinct points the situation
is essentially the same as a discrete channel and the proofs used there will
apply. But it is clear physically that this quantizing of the volume into individual
points cannot in any practical situation alter the final answer significantly,
provided the regions are sufficiently small. Thus the capacity will be the limit
of the capacities for the discrete subdivisions and this is just the continuous
capacity defined above. On the mathematical side it can be shown first (see
Appendix 7) that if u is the message, x is the signal, y is the received signal (perturbed by noise) and v is the recovered message, then

$$R(x,y) \ge R(u,v)$$

regardless of what operations are performed on u to obtain x or on y to obtain
v. Thus no matter how we encode the binary digits to obtain the signal, or how
we decode the received signal to recover the message, the discrete rate for
the binary digits does not exceed the channel capacity we have defined. On the
other hand, it is possible under very general conditions to find a coding system
for transmitting binary digits at the rate C with as small an equivocation or
frequency of errors as desired. This is true, for example, if, when we take
a finite dimensional approximating space for the signal functions, P(x,y) is
continuous in both x and y except at a set of points of probability zero. An
important special case occurs when the noise is added to the signal and is independent
of it (in the probability sense). Then $P_x(y)$ is a function only of the difference $n = y - x$,

$$P_x(y) = Q(y - x),$$

and we can assign a definite entropy to the noise (independent of the statistics of the signal), namely the entropy of the distribution Q(n). This entropy will
be denoted by H(n).

Theorem 16: If the signal and noise are independent and the received signal is the sum of the transmitted signal and the noise then the rate of transmission is

$$R = H(y) - H(n),$$

i.e., the entropy of the received signal less the entropy of the noise. The channel capacity is

$$C = \max_{P(x)}\bigl(H(y) - H(n)\bigr).$$

We have, since $y = x + n$:

$$H(x,y) = H(x,n).$$

Expanding the left side and using the fact that x and n are independent

$$H(y) + H_y(x) = H(x) + H(n).$$

Hence

$$R = H(x) - H_y(x) = H(y) - H(n).$$
Since H(n) is independent of P(x), maximizing R requires maximizing H(y), the
entropy of the received signal. If there are certain constraints on the ensemble
of transmitted signals, the entropy of the received signal must be maximized
subject to these constraints.
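The identity R = H(y) − H(n) can be checked numerically in a toy discrete setting (a sketch assuming NumPy; the modular additive channel below is a stand-in for the continuous case and is not part of the text):

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution p."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

m = 8
rng = np.random.default_rng(1)
px = rng.dirichlet(np.ones(m))          # arbitrary input distribution P(x)
pn = rng.dirichlet(np.ones(m))          # arbitrary independent noise Q(n)

# Joint distribution of (x, y) with y = (x + n) mod m
pxy = np.zeros((m, m))
for x in range(m):
    for n in range(m):
        pxy[x, (x + n) % m] += px[x] * pn[n]
py = pxy.sum(axis=0)

H_y, H_n = entropy(py), entropy(pn)
H_x, H_joint = entropy(px), entropy(pxy.ravel())
equivocation = H_joint - H_y            # H_y(x)

print("H(y) - H(n)   :", H_y - H_n)
print("H(x) - H_y(x) :", H_x - equivocation)   # the two expressions agree
```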
25. CHANNEL CAPACITY WITH AN AVERAGE POWER LIMITATION

A simple application of Theorem 16 is the case when the noise is a white thermal
noise and the transmitted signals are limited to a certain average power P.
Then the received signals have an average power P+N where N is the average noise
power. The maximum entropy for the received signals occurs when they also form
a white noise ensemble since this is the greatest possible entropy for a power
P +N and can be obtained by a suitable choice of transmitted signals, namely
if they form a white noise ensemble of power P. The entropy (per second) of the received ensemble is then

$$W\log 2\pi e(P+N),$$

and the noise entropy is

$$W\log 2\pi e N.$$

The channel capacity is

$$C = W\log 2\pi e(P+N) - W\log 2\pi e N = W\log\frac{P+N}{N}.$$

Summarizing we have the following:

Theorem 17: The capacity of a channel of band W perturbed by white thermal noise power N when the average transmitter power is limited to P is given by

$$C = W\log\frac{P+N}{N}.$$

This means that by sufficiently involved encoding systems we can transmit binary digits at the rate $W\log_2\frac{P+N}{N}$ bits per second, with
arbitrarily small frequency of errors. It is not possible to transmit at a higher
rate by any encoding system without a definite positive frequency of errors.
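For concreteness, a direct evaluation of the Theorem 17 formula (Python; the bandwidth and powers below are illustrative values only):

```python
import math

def capacity_white_noise(W, P, N):
    """Capacity in bits per second of a band-W channel with white noise power N
    and average transmitter power P (Theorem 17, logarithm taken to base 2)."""
    return W * math.log2((P + N) / N)

# e.g. a 3000-cycle band with a 20 dB signal-to-noise ratio (illustrative values)
print(capacity_white_noise(W=3000, P=100.0, N=1.0))   # roughly 19,975 bits per second
```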
To approximate this limiting rate of transmission the transmitted signals must
approximate, in statistical properties, a white noise.[6] A system which approaches the ideal rate may be described as follows: Let $M = 2^s$ samples of white noise be constructed each of duration T. These are
assigned binary numbers from 0 to M - 1. At the transmitter the message sequences
are broken up into groups of s and for each group the corresponding noise sample
is transmitted as the signal. At the receiver the M samples are known and the
actual received signal (perturbed by noise) is compared with each of them. The
sample which has the least R.M.S. discrepancy from the received signal is chosen
as the transmitted signal and the corresponding binary number reconstructed.
This process amounts to choosing the most probable (a posteriori) signal. The
number M of noise samples used will depend on the tolerable frequency ε of errors, but for almost all selections of samples we have

$$\lim_{\epsilon\to 0}\;\lim_{T\to\infty}\frac{\log M(\epsilon,T)}{T} = W\log\frac{P+N}{N},$$

so that no matter how small ε is chosen, we can, by taking T sufficiently large, transmit as near as we wish to $TW\log\frac{P+N}{N}$ binary digits in the time T.

[6] This and other properties of the white noise case are discussed from the geometrical point of view in "Communication in the Presence of Noise," loc. cit.
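A sketch of this signaling scheme in discrete form (Python with NumPy; sample counts and powers are assumed for illustration, with discrete samples standing in for the band-limited functions):

```python
import numpy as np

rng = np.random.default_rng(2)
s, samples = 8, 64                  # s bits per block; 2TW samples per codeword
M = 2 ** s
P, N = 1.0, 0.1                     # signal power and noise power per sample

codebook = rng.normal(0.0, np.sqrt(P), size=(M, samples))   # white-noise codewords

def transmit(index):
    return codebook[index] + rng.normal(0.0, np.sqrt(N), size=samples)

def decode(received):
    # choose the codeword with least R.M.S. discrepancy (most probable a posteriori here)
    return int(np.argmin(np.sum((codebook - received) ** 2, axis=1)))

trials = 2000
sent = rng.integers(0, M, size=trials)
errors = sum(decode(transmit(i)) != i for i in sent)
print("block error frequency:", errors / trials)
```

With these values the rate is well below capacity, so the observed error frequency should be essentially zero.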
Formulas similar to $C = W\log\frac{P+N}{N}$ for the white noise case have been developed independently by several other writers, although with somewhat different interpretations. We may mention the work of N. Wiener,[7] W. G. Tuller,[8] and H. Sullivan in this connection. In the case of an arbitrary perturbing noise
(not necessarily white thermal noise) it does not appear that the maximizing
problem involved in determining the channel capacity C can be solved explicitly.
However, upper and lower bounds can be set for C in terms of the average noise
power N and the noise entropy power $N_1$. These bounds are sufficiently close together in most practical cases to furnish a satisfactory solution to the problem.

Theorem 18: The capacity of a channel of band W perturbed by an arbitrary noise is bounded by the inequalities

$$W\log\frac{P+N_1}{N_1} \le C \le W\log\frac{P+N}{N_1}$$

where

P = average transmitter power
N = average noise power
$N_1$ = entropy power of the noise.

Here again the average power of the perturbed
signals will be P + N. The maximum entropy for this power would occur if the
received signal were white noise and would be $W\log 2\pi e(P+N)$. It may not be possible to achieve this; i.e., there may not be any ensemble of transmitted signals which, added to the perturbing noise, produce a white thermal noise at the receiver, but at least this sets an upper bound to H(y). We have, therefore,

$$C = \max\bigl(H(y) - H(n)\bigr) \le W\log 2\pi e(P+N) - W\log 2\pi e N_1 = W\log\frac{P+N}{N_1}.$$

This is the upper limit given in the theorem. The lower limit can be obtained
by considering the rate if we make the transmitted signal a white noise, of
power P. In this case the entropy power of the received signal must be at least as great as that of a white noise of power $P+N_1$, since we have shown in a previous theorem that the entropy power of the sum of two ensembles is greater than or equal to the sum of the individual entropy powers. Hence

$$H(y) \ge W\log 2\pi e(P+N_1)$$

and

$$C \ge W\log 2\pi e(P+N_1) - W\log 2\pi e N_1 = W\log\frac{P+N_1}{N_1}.$$

[7] Cybernetics, loc. cit.
[8] "Theoretical Limitations on the Rate of Transmission of Information," Proceedings of the Institute of Radio Engineers, v. 37, No. 5, May, 1949, pp. 468-78.
As P increases, the upper and lower bounds approach each other, so we have
as an asymptotic rate

$$W\log\frac{P+N}{N_1}.$$

If the noise is itself white, $N = N_1$ and the result reduces to the formula proved previously:

$$C = W\log\left(1 + \frac{P}{N}\right).$$
If the noise is Gaussian but with a spectrum which is not necessarily flat, $N_1$ is the geometric mean of the noise power over the various frequencies in the band W. Thus

$$N_1 = \exp\left(\frac{1}{W}\int_W \log N(f)\,df\right)$$

where N(f) is the noise power at frequency f.
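A numerical illustration of these quantities (Python with NumPy; the noise spectrum N(f) below is an assumed shape, not one from the text), computing $N_1$ as the geometric mean of N(f) and the two bounds of Theorem 18 in bits per second:

```python
import numpy as np

W, P = 1000.0, 5.0
f = np.linspace(0.0, W, 10001)
N_f = 1.0 + 0.8 * np.sin(np.pi * f / W) ** 2      # assumed (non-flat) noise spectrum

N = np.trapz(N_f, f) / W                          # average noise power
N1 = np.exp(np.trapz(np.log(N_f), f) / W)         # entropy power = geometric mean

lower = W * np.log2((P + N1) / N1)
upper = W * np.log2((P + N) / N1)
print(f"N = {N:.3f}, N1 = {N1:.3f}")
print(f"{lower:.1f} <= C <= {upper:.1f} bits per second")
```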
Theorem 19: If we set the capacity for a given transmitter power P equal to

$$C = W\log\frac{P+N-\eta}{N_1},$$

then η is monotonic decreasing as P increases and approaches 0 as a limit.
Suppose that for a given power $P_1$ the channel capacity is

$$W\log\frac{P_1+N-\eta_1}{N_1}.$$

This means that the best signal distribution, say p(x), when added to the noise distribution q(x), gives a received distribution r(y) whose entropy power is $(P_1+N-\eta_1)$. Let us increase the power to $P_1 + \Delta P$ by adding a white noise of power $\Delta P$ to the signal. The entropy of the received signal is now at least

$$H(y) = W\log 2\pi e(P_1+N-\eta_1+\Delta P)$$

by application of the theorem on the minimum entropy power of a sum. Hence, since we can attain the H indicated, the entropy of the maximizing distribution must be at least as great and η must be monotonic decreasing. To show that $\eta \to 0$ as $P \to \infty$, consider a signal which is white noise with a large P. Whatever the perturbing noise, the received signal will be approximately a white noise, if P is sufficiently large, in the sense of having an entropy power approaching P+N.

26. THE CHANNEL CAPACITY WITH A PEAK POWER LIMITATION

In some applications
the transmitter is limited not by the average power output but by the peak
instantaneous power. The problem of calculating the channel capacity is then
that of maximizing (by variation of the ensemble of transmitted symbols) H(y) − H(n) subject to the constraint that all the functions f(t) in the ensemble be less than or equal to $\sqrt{S}$, say, for all t. A constraint of this type does not work out as well mathematically as the average power limitation. The most we have obtained for this case is a lower bound valid for all $\frac{S}{N}$, an "asymptotic" upper bound (valid for large $\frac{S}{N}$) and an asymptotic value of C for $\frac{S}{N}$ small.
Theorem 20: The channel capacity C for a band W perturbed by white thermal noise of power N is bounded by

$$C \ge W\log\frac{\frac{2}{\pi e^3}S+N}{N},$$

where S is the peak allowed transmitter power. For sufficiently large $\frac{S}{N}$

$$C \le W\log\frac{\frac{2}{\pi e}S+N}{N}\,(1+\epsilon),$$

where ε is arbitrarily small. As $\frac{S}{N}\to 0$ (and provided the band W starts at 0)

$$\frac{C}{W\log\left(1+\frac{S}{N}\right)} \to 1.$$
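The three expressions of Theorem 20, as reconstructed above, can be tabulated against S/N (a Python sketch with illustrative values; note the upper bound is asserted only for large S/N):

```python
import numpy as np

W = 1.0                                      # capacities below scale linearly with W
snr = np.array([1.0, 10.0, 100.0, 1000.0])   # S/N: peak signal power over noise power

lower = W * np.log2((2.0 / (np.pi * np.e ** 3)) * snr + 1.0)   # valid for all S/N
upper = W * np.log2((2.0 / (np.pi * np.e)) * snr + 1.0)        # asymptotic, large S/N only
small = W * np.log2(1.0 + snr)                                 # limiting form as S/N -> 0

for s, lo, hi, ap in zip(snr, lower, upper, small):
    print(f"S/N = {s:7.1f}   lower {lo:7.3f}   upper {hi:7.3f}   W*log2(1+S/N) {ap:7.3f}")
```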
We wish to maximize the entropy of the received signal. If $\frac{S}{N}$ is large this will occur very nearly when we maximize the entropy of the transmitted ensemble.
The asymptotic upper bound is obtained by relaxing the conditions on the ensemble.
Let us suppose that the power is limited to S not at every instant of time,
but only at the sample points. The maximum entropy of the transmitted ensemble
under these weakened conditions is certainly greater than or equal to that
under the original conditions. This altered problem can be solved easily.
The maximum entropy occurs if the different samples are independent and have a distribution function which is constant from $-\sqrt{S}$ to $+\sqrt{S}$. The entropy can be calculated as

$$W\log 4S.$$

The received signal will then have an entropy less than

$$W\log(4S + 2\pi e N)(1+\epsilon),$$

where $\epsilon\to 0$ as $\frac{S}{N}\to\infty$, and the channel capacity is obtained by subtracting the entropy of the white noise, $W\log 2\pi e N$:

$$C \le W\log(4S+2\pi eN)(1+\epsilon) - W\log 2\pi e N = W\log\frac{\frac{2}{\pi e}S+N}{N}\,(1+\epsilon).$$

This is the desired upper bound to the channel capacity. To obtain a lower
bound consider the same ensemble of functions. Let these functions be passed
through an ideal filter with a triangular transfer characteristic. The gain
is to be unity at frequency 0 and decline linearly down to gain 0 at frequency
W. We first show that the output functions of the filter have a peak power limitation S at all times (not just the sample points). First we note that a pulse

$$\frac{\sin 2\pi Wt}{2\pi Wt}$$

going into the filter produces

$$\frac{1}{2}\,\frac{\sin^2 \pi Wt}{(\pi Wt)^2}$$

in the output. This function is never negative. The input function (in the general case) can be thought of as the sum of a series of shifted functions

$$a\,\frac{\sin 2\pi Wt}{2\pi Wt}$$

where a, the amplitude of the sample, is not greater than $\sqrt{S}$. Hence the output is the sum of shifted functions of the non-negative form above with the same coefficients. These functions being non-negative, the greatest positive value for any t is obtained when all the coefficients a have their maximum positive values, i.e., $\sqrt{S}$. In this case the input function was a constant of amplitude $\sqrt{S}$ and since the filter has unit gain for D.C., the output is the same. Hence the output ensemble has a peak power S. The entropy of the output ensemble
can be calculated from that of the input ensemble by using the theorem dealing
with such a situation. The output entropy is equal to the input entropy plus the geometrical mean gain of the filter:

$$\int_0^W \log G^2\,df = \int_0^W \log\left(\frac{W-f}{W}\right)^2 df = -2W\log e.$$

Hence the output entropy is

$$W\log 4S - 2W\log e = W\log\frac{4S}{e^2}$$

and the channel capacity is greater than

$$W\log\frac{\frac{2}{\pi e^3}S+N}{N}.$$
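A quick numerical check of the gain integral used in this step (Python with NumPy; natural logarithms, so the result −2W corresponds to the −2W log e above):

```python
import numpy as np

W = 3.0
f = np.linspace(0.0, W, 200001)[:-1]          # stop just short of f = W (log singularity)
value = np.trapz(np.log(((W - f) / W) ** 2), f)
print(value, "  vs  -2W =", -2 * W)           # integrable singularity: value is close to -2W
```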
We now wish to show that, for small $\frac{S}{N}$ (peak signal power over average white noise power), the channel capacity is approximately

$$C = W\log\left(1+\frac{S}{N}\right).$$

Since the peak limitation implies an average power limitation of at most S, the capacity cannot exceed $W\log\left(1+\frac{S}{N}\right)$. Therefore, if we can find an ensemble of functions such that they correspond to a rate nearly $W\log\left(1+\frac{S}{N}\right)$ and are limited to band W and peak S the result
will be proved. Consider the ensemble of functions of the following type.
A series of t samples have the same value, either $+\sqrt{S}$ or $-\sqrt{S}$, then the next t samples have the same value, etc. The value for a series is chosen at random, probability $\frac{1}{2}$ for $+\sqrt{S}$ and $\frac{1}{2}$ for $-\sqrt{S}$. If this ensemble be passed through a filter with triangular gain characteristic (unit gain at D.C.), the output is peak limited to $\pm\sqrt{S}$. Furthermore the average power is nearly S and can be made to approach this by taking t sufficiently large. The entropy of the sum of this and the thermal noise can be found by
applying the theorem on the sum of a noise and a small signal. This theorem
will apply if the ratio of signal power to noise power is sufficiently small. This can be ensured by taking $\frac{S}{N}$ small enough (after t is chosen). The entropy power will be S+N to as close an approximation as desired, and hence the rate of transmission as near as we wish to

$$W\log\frac{S+N}{N}.$$
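A numerical sketch of this construction (Python with NumPy; the block length, number of blocks and the fine time grid are assumed for illustration). Each sample pulse is replaced by its filtered, non-negative form $\frac{1}{2}\frac{\sin^2\pi Wt}{(\pi Wt)^2}$, as derived above, and the peak and average power of the output are checked:

```python
import numpy as np

rng = np.random.default_rng(0)
W, S = 1.0, 1.0
block_len, n_blocks = 25, 20                 # "t" samples per series, number of series

# Blocks of block_len samples, each block constant at +sqrt(S) or -sqrt(S)
signs = rng.choice([-1.0, 1.0], size=n_blocks)
a = np.repeat(signs, block_len) * np.sqrt(S)         # sample amplitudes
t_k = np.arange(a.size) / (2 * W)                    # sample instants, spacing 1/(2W)

# After the triangular filter each sample pulse becomes 0.5*sinc(W*t)**2,
# which is never negative (np.sinc(x) = sin(pi*x)/(pi*x)).
tau = np.linspace(t_k[0], t_k[-1], 5001)             # fine time grid
pulses = 0.5 * np.sinc(W * (tau[:, None] - t_k[None, :])) ** 2
output = pulses @ a

print("peak |output| :", np.abs(output).max(), " (never exceeds sqrt(S) =", np.sqrt(S), ")")
print("average power :", np.mean(output ** 2), " (approaches S as the blocks grow longer)")
```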
PART V: THE RATE FOR A CONTINUOUS SOURCE

27. FIDELITY EVALUATION FUNCTIONS

In the case of a discrete source of information we were able to determine
In the case of a discrete source of information we were able to determine
a definite rate of generating information, namely the entropy of the underlying
stochastic process. With a continuous source the situation is considerably
more involved. In the first place a continuously variable quantity can assume
an infinite number of values and requires, therefore, an infinite number of
binary digits for exact specification. This means that to transmit the output
of a continuous source with exact recovery at the receiving point requires,
in general, a channel of infinite capacity (in bits per second). Since,
ordinarily, channels have a certain amount of noise, and therefore a finite
capacity, exact transmission is impossible. This, however, evades the real
issue. Practically, we are not interested in exact transmission when we have
a continuous source, but only in transmission to within a certain tolerance.
The question is, can we assign a definite rate to a continuous source when
we require only a certain fidelity of recovery, measured in a suitable way.
Of course, as the fidelity requirements are increased the rate will increase.
It will be shown that we can, in very general cases, define such a rate, having
the property that it is possible, by properly encoding the information, to
transmit it over a channel whose capacity is equal to the rate in question,
and satisfy the fidelity requirements. A channel of smaller capacity is insufficient.
It is first necessary to give a general mathematical formulation of the idea
of fidelity of transmission. Consider the set of messages of a long duration,
say T seconds. The source is described by giving the probability density,
in the associated space, that the source will select the message in question
P(x). A given communication system is described (from the external point of
view) by giving the conditional probability Px(y) that if message
x is produced by the source the recovered message at the receiving point will
be y. The system as a whole (including source and transmission system) is
described by the probability function P(x, y) of having message x and final
output y. If this function is known, the complete characteristics of the system
from the point of view of fidelity are known. Any evaluation of fidelity must
correspond mathematically to an operation applied to P(x,y). This operation
must at least have the properties of a simple ordering of systems; i.e., it
must be possible to say of two systems represented by $P_1(x,y)$ and $P_2(x,y)$ that, according to our fidelity criterion, either (1) the first has higher
fidelity, (2) the second has higher fidelity, or (3) they have equal fidelity.
This means that a criterion of fidelity can be represented by a numerically valued function

$$v\bigl(P(x,y)\bigr)$$

whose argument ranges over possible probability functions P(x,y). We will now show that under very general and reasonable assumptions the function $v(P(x,y))$ can be written in a seemingly much more specialized form, namely as an average of a function p(x,y) over the set of possible values of x and y:

$$v\bigl(P(x,y)\bigr) = \iint P(x,y)\,p(x,y)\,dx\,dy.$$
To obtain this we need only assume (1) that the source and system are ergodic
so that a very long sample will be, with probability nearly 1, typical of
the ensemble, and (2) that the evaluation is "reasonable" in the sense that
it is possible, by observing a typical input and output xl and yl, to form
a tentative evaluation on the basis of these samples; and if these samples
are increased in duration the tentative evaluation will, with probability
1, approach the exact evaluation based on a full knowledge of P(x,y). Let the tentative evaluation be p(x,y). Then the function p(x,y) approaches (as $T\to\infty$) a constant for almost all (x,y) which are in the high probability region corresponding to the system:

$$p(x,y) \to v\bigl(P(x,y)\bigr)$$

and we may also write

$$p(x,y) \to \iint P(x,y)\,p(x,y)\,dx\,dy$$

since

$$\iint P(x,y)\,dx\,dy = 1.$$

This establishes the desired result. The function p(x,y) has the general
nature of a "distance" between x and y.[9] It measures how undesirable it is (according to our fidelity criterion) to receive y when x is transmitted. The general result given above can be restated as follows: Any reasonable evaluation can be represented as an average of a distance function over the set of messages and recovered messages x and y weighted according to the probability P(x,y) of getting the pair in question, provided the duration T of the messages
be taken sufficiently large.

[9] It is not a "metric" in the strict sense, however, since in general it does not satisfy either p(x,y) = p(y,x) or p(x,y) + p(y,z) ≥ p(x,z).

The following are simple examples of evaluation functions:

1. R.M.S. criterion.

$$v = \overline{\bigl(x(t)-y(t)\bigr)^2}$$

In this very commonly used measure of fidelity the distance function p(x,y) is (apart from a constant factor) the square of the ordinary Euclidean distance between the points x and y in the associated function space.
2. Frequency weighted R.M.S. criterion. More generally one can apply different weights to the different frequency components before using an R.M.S. measure of fidelity. This is equivalent to passing the difference x(t) − y(t) through a shaping filter and then determining the average power in the output. Thus let

$$e(t) = x(t) - y(t)$$

and

$$f(t) = \int_{-\infty}^{\infty} e(\tau)\,k(t-\tau)\,d\tau;$$

then

$$v = \overline{f(t)^2}.$$

A numerical sketch of this weighting is given after these examples.
3. Absolute error criterion.

$$v = \overline{\bigl|\,x(t)-y(t)\,\bigr|}$$
4. The structure of the ear and brain determine implicitly an evaluation,
or rather a number of evaluations, appropriate in the case of speech or music
transmission. There is, for example, an "intelligibility" criterion in which
p(x,y) is equal to the relative frequency of incorrectly interpreted words
when message x(t) is received as y(t). Although we cannot give an explicit
representation of p(x,y) in these cases it could, in principle, be determined
by sufficient experimentation. Some of its properties follow from well-known
experimental results in hearing, e.g., the ear is relatively insensitive to
phase and the sensitivity to amplitude and frequency is roughly logarithmic.
5. The discrete case can be considered as a specialization in which we have
tacitly assumed an evaluation based on the frequency of errors. The function
p(x,y) is then defined as the number of symbols in the sequence y differing
from the corresponding symbols in x divided by the total number of symbols
in x.
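A sketch of the frequency-weighted R.M.S. evaluation of example 2 above (Python with NumPy; the signals and the shaping kernel k are assumed for illustration):

```python
import numpy as np

def weighted_rms_fidelity(x, y, k, dt):
    """Average power of the error x - y after passing through a shaping filter k."""
    e = x - y
    f = np.convolve(e, k, mode="same") * dt     # f(t) = integral of e(tau) k(t - tau) dtau
    return np.mean(f ** 2)

dt = 1e-3
t = np.arange(0.0, 1.0, dt)
x = np.sin(2 * np.pi * 5 * t)                   # original message (illustrative)
y = x + 0.1 * np.random.default_rng(3).normal(size=t.size)   # recovered message

k = np.exp(-t / 0.01) / 0.01                    # assumed low-pass weighting kernel
print("plain R.M.S. evaluation   :", np.mean((x - y) ** 2))
print("frequency-weighted R.M.S. :", weighted_rms_fidelity(x, y, k, dt))
```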
28. THE RATE FOR A SOURCE RELATIVE TO A FIDELITY EVALUATION

We are now in a position to define a rate of generating information for a continuous
source. We are given P(x) for the source and an evaluation v determined by
a distance function p(x,y) which will be assumed continuous in both x and
y. With a particular system P(x,y) the quality is measured by

$$v = \iint p(x,y)\,P(x,y)\,dx\,dy.$$

Furthermore the rate of flow of binary digits corresponding to P(x,y) is

$$R = \iint P(x,y)\log\frac{P(x,y)}{P(x)P(y)}\,dx\,dy.$$

We define the rate $R_1$ of generating information for a given quality $v_1$ of reproduction to be the minimum of R when we keep v fixed at $v_1$ and vary $P_x(y)$. That is:

$$R_1 = \min_{P_x(y)} \iint P(x,y)\log\frac{P(x,y)}{P(x)P(y)}\,dx\,dy$$

subject to the constraint:

$$v_1 = \iint P(x,y)\,p(x,y)\,dx\,dy.$$
This means that we consider, in effect, all the communication systems that
might be used and that transmit with the required fidelity. The rate of transmission
in bits per second is calculated for each one and we choose that having the
least rate. This latter rate is the rate we assign the source for the fidelity
in question. The justification of this definition lies in the following result:
Theorem 21: If a source has a rate R1 for a valuation v1 it is possible
to encode the output of the source and transmit it over a channel of capacity
C with fidelity as near v1 as desired provided R1 < C. This is not possible
if $R_1 > C$. The last statement in the theorem follows immediately from
the definition of R1 and previous results. If it were not true we could transmit
more than C bits per second over a channel of capacity C. The first part of
the theorem is proved by a method analogous to that used for Theorem 11. We
may, in the first place, divide the (x,y) space into a large number of small
cells and represent the situation as a discrete case. This will not change
the evaluation function by more than an arbitrarily small amount (when the
cells are very small) because of the continuity assumed for p(x,y). Suppose
that $P_1(x,y)$ is the particular system which minimizes the rate and gives $R_1$. We choose from the high probability y's a set at random containing $2^{(R_1+\epsilon)T}$ members where $\epsilon\to 0$ as $T\to\infty$. With large T each chosen point will be connected
by a high probability line (as in Fig. 10) to a set of x's. A calculation
similar to that used in proving Theorem 11 shows that with large T almost
all x's are covered by the fans from the chosen y points for almost all choices
of the y's. The communication system to be used operates as follows: The selected
points are assigned binary numbers. When a message x is originated it will
(with probability approaching 1 as $T\to\infty$) lie within at least one of the
fans. The corresponding binary number is transmitted (or one of them chosen
arbitrarily if there are several) over the channel by suitable coding means
to give a small probability of error. Since $R_1 < C$ this is possible. At
the receiving point the corresponding y is reconstructed and used as the recovered
message. The evaluation $v_1'$ for this system can be made arbitrarily close to $v_1$ by taking T sufficiently large. This is due to the fact that for each long
sample of message x(t) and recovered message y(t) the evaluation approaches
v1 (with probability 1). It is interesting to note that, in this system, the
noise in the recovered message is actually produced by a kind of general quantizing
at the transmitter and not produced by the noise in the channel. It is more
or less analogous to the quantizing noise in PCM.

29. THE CALCULATION OF RATES
The definition of the rate is similar in many respects to the definition of
channel capacity. In the former

$$R = \min_{P_x(y)} \iint P(x,y)\log\frac{P(x,y)}{P(x)P(y)}\,dx\,dy$$

with P(x) and $v_1 = \iint P(x,y)\,p(x,y)\,dx\,dy$ fixed. In the latter

$$C = \max_{P(x)} \iint P(x,y)\log\frac{P(x,y)}{P(x)P(y)}\,dx\,dy$$

with $P_x(y)$ fixed and possibly one or more other constraints (e.g., an average power limitation) of the form $K = \iint P(x,y)\,\lambda(x,y)\,dx\,dy$. A partial solution of the general maximizing problem for determining the rate of a source can be given. Using Lagrange's method we consider

$$\iint\left[P(x,y)\log\frac{P(x,y)}{P(x)P(y)} + \mu\,P(x,y)\,p(x,y) + \nu(x)\,P(x,y)\right]dx\,dy.$$
The variational equation (when we take the first variation on P(x,y)) leads to

$$P_y(x) = B(x)\,e^{-\lambda\,p(x,y)}$$

where λ is determined to give the required fidelity and B(x) is chosen to satisfy

$$\int B(x)\,e^{-\lambda\,p(x,y)}\,dx = 1.$$
This shows that, with best encoding, the conditional probability of a certain cause for various received y, $P_y(x)$, will decline exponentially with the distance function p(x,y) between the x and y in question. In the special case where the distance function p(x,y) depends only on the (vector) difference between x and y,

$$p(x,y) = p(x-y),$$

we have

$$\int B(x)\,e^{-\lambda\,p(x-y)}\,dx = 1.$$

Hence B(x) is constant, say α, and

$$P_y(x) = \alpha\,e^{-\lambda\,p(x-y)}.$$
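As a concrete instance of this formal solution (a worked special case added for illustration), take the squared-error distance $p(x,y) = (x-y)^2$ in one dimension. The condition on B(x) gives $B(x) = \sqrt{\lambda/\pi}$, a constant, so that

$$P_y(x) = \sqrt{\frac{\lambda}{\pi}}\;e^{-\lambda(x-y)^2};$$

with best encoding the source value x is thus distributed about the reproduced value y as a Gaussian of variance $1/(2\lambda)$, and λ is fixed by the allowed mean square error $N = 1/(2\lambda)$.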
Unfortunately these formal solutions are difficult to evaluate in particular
cases and seem to be of little value. In fact, the actual calculation of rates
has been carried out in only a few very simple cases. If the distance function
p(x, y) is the mean square discrepancy between x and y and the message ensemble
is white noise, the rate can be determined. In that case we have

$$R = \min\bigl[H(x) - H_y(x)\bigr] = H(x) - \max H_y(x)$$

with $N = \overline{(x-y)^2}$. But the $\max H_y(x)$ occurs when $y - x$ is a white noise, and is equal to $W_1\log 2\pi e N$ where $W_1$ is the bandwidth of the message ensemble. Therefore

$$R = W_1\log 2\pi e Q - W_1\log 2\pi e N = W_1\log\frac{Q}{N},$$
where Q is the average message power. This proves the following:

Theorem 22: The rate for a white noise source of power Q and band $W_1$ relative to an R.M.S. measure of fidelity is

$$R = W_1\log\frac{Q}{N}$$

where N is the allowed mean square error between original and recovered messages.
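A direct evaluation of this formula (Python; the band, source power and tolerance are illustrative assumptions):

```python
import math

def rate_white_noise_source(W1, Q, N):
    """Rate in bits per second for a band-W1 white noise source of power Q
    reproduced within mean square error N (Theorem 22, base-2 logarithm)."""
    return W1 * math.log2(Q / N)

print(rate_white_noise_source(W1=5000, Q=10.0, N=0.1))   # roughly 33,219 bits per second
```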
More generally with any message source we can obtain inequalities bounding the rate relative to a mean square error criterion.

Theorem 23: The rate for any source of band $W_1$ is bounded by

$$W_1\log\frac{Q_1}{N} \le R \le W_1\log\frac{Q}{N}$$

where Q is the average power of the source, $Q_1$ its entropy power and N the allowed mean square error. The lower bound follows from the fact that the $\max H_y(x)$ for a given $\overline{(x-y)^2} = N$ occurs in the white noise case. The upper bound results if we place points (used in the proof of Theorem 21) not in the best way but at random in a sphere of radius $\sqrt{Q-N}$.

ACKNOWLEDGMENTS

The writer is indebted to his colleagues at the Laboratories,
particularly to Dr. H. W. Bode, Dr. J. R. Pierce, Dr. B. McMillan, and Dr.
B. M. Oliver for many helpful suggestions and criticisms during the course
of this work. Credit should also be given to Professor N. Wiener, whose elegant
solution of the problems of filtering and prediction of stationary ensembles
has considerably influenced the writer's thinking in this field.

APPENDIX 5

Let $S_1$ be any measurable subset of the g ensemble, and $S_2$ the subset of
the f ensemble which gives $S_1$ under the operation T. Then $S_1 = TS_2$. Let W be the operator which shifts all functions in a set by the time λ. Then

$$W S_1 = W T S_2 = T W S_2$$

since T is invariant and therefore commutes with W. Hence if m[S] is the probability measure of the set S,

$$m\bigl[W S_1\bigr] = m\bigl[T W S_2\bigr] = m\bigl[W S_2\bigr] = m\bigl[S_2\bigr] = m\bigl[S_1\bigr]$$

where the second equality is by definition of measure in the g space, the third since the f ensemble is stationary, and the last by definition of g measure again. To prove that the ergodic property is preserved under invariant operations, let $S_1$ be a subset of the g ensemble which is invariant under W, and let $S_2$ be the set of all functions f which transform into $S_1$. Then

$$W S_1 = W T S_2 = T W S_2 = S_1,$$

so that $W S_2$ is contained in $S_2$. Since

$$m\bigl[W S_2\bigr] = m\bigl[S_2\bigr],$$

this implies

$$W S_2 = S_2$$

(apart from a set of measure zero); thus $S_2$ is invariant, and since the f ensemble is ergodic $m[S_2]$ must be 0 or 1. By the equality of measures shown above, $m[S_1] = m[S_2]$ is 0 or 1, so the g ensemble is ergodic.
APPENDIX 6

The upper bound, $\bar{N}_3 \le N_1 + N_2$, is due to the fact that the maximum possible entropy for a power $N_1+N_2$ occurs when we have a white noise of this power. In this case the entropy power is $N_1+N_2$. To obtain the lower bound, suppose we have two distributions in n dimensions $p(x_i)$ and $q(x_i)$ with entropy powers $\bar{N}_1$ and $\bar{N}_2$. What form should p and q have to minimize the entropy power $\bar{N}_3$ of their convolution $r(x_i)$:

$$r(x_i) = \int p(y_i)\,q(x_i - y_i)\,dy_i\,?$$
The entropy $H_3$ of r is to be minimized while the entropies $H_1$ of p and $H_2$ of q are held fixed; with Lagrange multipliers λ and μ we consider $H_3 - \lambda H_1 - \mu H_2$. Now suppose $p(x_i)$ and $q(x_i)$ are normal with quadratic forms $A_{ij}$ and $B_{ij}$. Then $r(x_i)$ will also be normal with quadratic form $C_{ij}$. If the inverses of these forms are $a_{ij}$, $b_{ij}$, $c_{ij}$ then

$$c_{ij} = a_{ij} + b_{ij}.$$

We wish to show that these functions satisfy the minimizing conditions if and only if $a_{ij} = K\,b_{ij}$ and thus give the minimum $H_3$ under the constraints. First we have

$$H_3 = \frac{n}{2}\log 2\pi e\,|c_{ij}|^{1/n}.$$

This should equal the corresponding expression obtained from the minimizing conditions, and the calculation shows that this is possible only when $a_{ij} = K\,b_{ij}$; the minimum entropy power of the sum is then $\bar{N}_3 = \bar{N}_1 + \bar{N}_2$.
APPENDIX 7

The following will indicate a more general and more rigorous approach to the central definitions of communication theory. Consider a probability measure space whose elements are ordered pairs (x,y). The variables x, y are to be identified as the possible transmitted and received signals of some long duration T. Let us call the set of all points whose x belongs to a subset $S_1$ of x points the strip over $S_1$, and similarly the set whose y belong to $S_2$ the strip over $S_2$. We divide x and y into a collection of non-overlapping measurable subsets $X_i$ and $Y_j$ and approximate to the rate of transmission R by

$$R_1 = \frac{1}{T}\sum_{i,j} P(X_i, Y_j)\log\frac{P(X_i,Y_j)}{P(X_i)\,P(Y_j)}$$

where $P(X_i)$ is the probability measure of the strip over $X_i$, $P(Y_j)$ that of the strip over $Y_j$, and $P(X_i,Y_j)$ that of their intersection. A further subdivision can never decrease $R_1$: when a cell is split, the term it contributed to the sum is replaced by terms whose total is at least as great,
and consequently the sum is increased. Thus the various possible subdivisions
form a directed set, with R monotonic increasing with refinement of the subdivision.
We may define R unambiguously as the least upper bound for $R_1$ and write it

$$R = \frac{1}{T}\iint P(x,y)\log\frac{P(x,y)}{P(x)P(y)}\,dx\,dy.$$
This integral, understood in the above sense, includes both the continuous
and discrete cases and of course many others which cannot be represented in
either form. It is trivial in this formulation that if x and u are in one-to-one
correspondence, the rate from u to y is equal to that from x to y. If v is
any function of y (not necessarily with an inverse) then the rate from x to
y is greater than or equal to that from x to v since, in the calculation of
the approximations, the subdivisions of y are essentially a finer subdivision
of those for v. More generally if y and v are related not functionally but
statistically, i.e., we have a probability measure space (y,v), then $R(x,v) \le R(x,y)$. This means that any operation applied to the received signal,
even though it involves statistical elements, does not increase R. Another
notion which should be defined precisely in an abstract formulation of the
theory is that of "dimension rate," that is the average number of dimensions
required per second to specify a member of an ensemble. In the band limited
case 2W numbers per second are sufficient. A general definition can be framed
as follows. Let $f_\alpha(t)$ be an ensemble of functions and let $\rho_T[f_\alpha(t), f_\beta(t)]$ be a metric measuring the "distance" between $f_\alpha$ and $f_\beta$ over the time T (for example the R.M.S. discrepancy over this interval). Let $N(\epsilon,\delta,T)$ be the least number of elements f which can be chosen such that all elements of the ensemble, apart from a set of measure δ, are within the distance ε of at least one of those chosen. The dimension rate of the ensemble is then defined as the triple limit

$$\lambda = \lim_{\delta\to 0}\;\lim_{\epsilon\to 0}\;\lim_{T\to\infty}\frac{\log N(\epsilon,\delta,T)}{T\log\frac{1}{\epsilon}}.$$

This is a generalization of the measure type definitions of dimension in topology, and agrees with the intuitive dimension rate for simple ensembles where the desired result is obvious.