Digital/Analog Voice Demo

This is a comparative demonstration of some digital and analog voice transmission alternatives for the power-limited amateur satellite channel.

These demonstrations all use the same 9 second test file of me calling CQ, recorded at 8000 samples per second with 16 bits per sample. All audio processing was in 16-bit linear format before conversion to 8-bit mu-law for this web page.

For reference, here is the original recording converted directly to mu-law without any other processing.
64kb/s mu-law PCM, no added noise

Now let's run this signal through a simulated SSB transmitter. First we simulate a typical bandpass filter. Here I've used a 256-point finite impulse response (FIR) digital filter with a nominal passband from 300 Hz to 2.7 kHz and a passband gain of 0 dB.
Bandpass filtered voice, no noise
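
As a rough illustration of that filtering step, here is a minimal Python sketch of a 256-tap bandpass FIR filter. The windowed-sinc design with a Hamming window is an assumption (the design method of the original filter is not stated), and the function names are made up for this example.

```python
import cmath
import math

def design_bandpass(num_taps=256, f_lo=300.0, f_hi=2700.0, fs=8000.0):
    """Windowed-sinc bandpass FIR. The Hamming-window design is an assumption."""
    center = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - center
        # Ideal bandpass = ideal lowpass at f_hi minus ideal lowpass at f_lo.
        if t == 0:
            ideal = 2.0 * (f_hi - f_lo) / fs
        else:
            ideal = (math.sin(2 * math.pi * f_hi * t / fs)
                     - math.sin(2 * math.pi * f_lo * t / fs)) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def gain_at(taps, freq, fs=8000.0):
    """Magnitude of the filter's frequency response at freq (Hz)."""
    w = 2 * math.pi * freq / fs
    return abs(sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(taps)))

def fir_filter(taps, samples):
    """Direct-form convolution; output has the same length as the input."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if i - k < 0:
                break
            acc += h * samples[i - k]
        out.append(acc)
    return out

taps = design_bandpass()
for f in (100, 1500, 3500):
    print(f, "Hz gain:", round(gain_at(taps, f), 4))
```

The gain should come out close to 1 (0 dB) in the middle of the passband and very small well outside it.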

Now let's add some channel noise. This was done by generating Gaussian-distributed random numbers, running them through the same bandpass filter, and adding the filtered noise to the already filtered voice. The average signal-to-noise ratio is 10.27 dB, as determined by separately summing the squares of the filtered signal and filtered noise samples over the entire 9 second period and computing their ratio.
Bandpass filtered simulated SSB signal, S/N=10.27dB
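
The S/N measurement just described can be sketched in a few lines of Python. This is only an illustration: the 1 kHz tone stands in for the filtered voice, the bandpass filtering of the noise is left out, and all names are hypothetical.

```python
import math
import random

def power_sum(samples):
    """Sum of squared samples (proportional to total energy)."""
    return sum(x * x for x in samples)

def snr_db(signal, noise):
    """Average S/N in dB: ratio of the summed squares over the whole file."""
    return 10.0 * math.log10(power_sum(signal) / power_sum(noise))

def scale_noise_to_snr(signal, noise, target_db):
    """Scale the noise so that snr_db(signal, scaled_noise) equals target_db."""
    current = snr_db(signal, noise)
    gain = 10.0 ** ((current - target_db) / 20.0)
    return [gain * n for n in noise]

rng = random.Random(1)
# Stand-in for the filtered voice: one second of a 1 kHz tone at 8000 samples/s.
voice = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
noise = scale_noise_to_snr(voice, noise, 10.27)
noisy = [v + n for v, n in zip(voice, noise)]
print("S/N =", round(snr_db(voice, noise), 2), "dB")
```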

Let's Go Digital

Now the question arises: how fast could we send digital data (e.g., digitized voice) using the same average transmitter power, without worrying about bandwidth? Well, to do that we first need to know the ratio of the average signal power S to the noise spectral density, N0. This is 10.27dB (the S/N in the filter bandwidth) plus the filter bandwidth expressed in dB relative to 1Hz.

We could assume the filter bandwidth is just 2700 - 300 = 2400 Hz, or 33.8 dBHz, and not be far off. But just to be sure I measured the noise bandwidth of the actual filter by running Gaussian noise through it, measuring the ratio of output to input power, and multiplying by one half the sampling rate. This gave a figure of 2441.5 Hz (33.87 dBHz), which is pretty close to 2400 Hz; the slight difference is due to the filter not having "brick wall" skirts (no real filter does).
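
For an FIR filter, the same noise bandwidth can also be computed directly from the taps. When white noise goes in, the expected ratio of output to input power equals the sum of the squared tap values. The sketch below assumes a real filter with unity passband gain; the function names are illustrative.

```python
import math

def noise_bandwidth_hz(taps, fs=8000.0):
    """Noise-equivalent bandwidth of a real FIR filter with unity passband gain.

    For white noise input, output power / input power is sum(h*h), so the
    measurement described above converges to this value.
    """
    return (fs / 2.0) * sum(h * h for h in taps)

def db_hz(bandwidth_hz):
    """Bandwidth expressed in dB relative to 1 Hz."""
    return 10.0 * math.log10(bandwidth_hz)

# Sanity checks with trivial filters (not the filter used for the demo):
print(noise_bandwidth_hz([1.0]))        # all-pass: the full 0..fs/2 band
print(noise_bandwidth_hz([0.5, 0.5]))   # two-point average: half of it
print(round(db_hz(2400.0), 2), round(db_hz(2441.5), 2))
```

The two dBHz conversions agree with the figures quoted above to within rounding.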

So now we can compute the average S/N0 ratio for the noisy SSB signal: 10.27dB (in 2441 Hz BW) + 33.87 dBHz BW = 44.14 dBHz. In other words, with the same total energy we used in our 9 second speech sample we could have sent for 9 seconds a carrier with a 44.14 dB S/N ratio as measured in a 1 Hz receiver bandwidth.

Now let's assume we have a digital modem and FEC technique that needs an Eb/N0 (energy per bit to noise spectral density ratio) of 3dB. This can be achieved with ideal BPSK or QPSK modulation and rate 1/2 constraint length 32 convolutional encoding with sequential decoding. The data rate we can achieve with this scheme is therefore 44.14 dBHz - 3dB = 41.14 dBbps, or 13 kb/s.
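
The arithmetic in the last few paragraphs can be written out as a small Python sketch. The function names are made up for illustration; the inputs are the figures from the text.

```python
import math

def s_over_n0_dbhz(snr_db, noise_bw_hz):
    """S/N0 in dBHz from the S/N measured in a given noise bandwidth."""
    return snr_db + 10.0 * math.log10(noise_bw_hz)

def max_bit_rate(s_n0_dbhz, ebn0_db):
    """Bit rate (b/s) supportable when the modem and FEC need ebn0_db."""
    return 10.0 ** ((s_n0_dbhz - ebn0_db) / 10.0)

s_n0 = s_over_n0_dbhz(10.27, 2441.5)   # S/N and noise bandwidth from the text
rate = max_bit_rate(s_n0, 3.0)         # Eb/N0 requirement of 3 dB
print(f"S/N0 = {s_n0:.2f} dBHz, rate = {rate:.0f} b/s")
```

The result lands at roughly 13 kb/s, as stated above; small differences in the last digits come from rounding the intermediate dB values.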

Voice Coding (Vocoding)

For the past 35 years, standard telephone company practice for digital voice has been logarithmic A/D conversion, sampling at 8 kHz with 8 bits per sample. That's 64 kb/s. But PCM is not particularly efficient by modern standards. If you can exploit what you know about how humans generate and hear speech, you can compress it to much lower data rates using algorithms variously known as speech coders, voice coders or vocoders. These lower rates are the key to power-efficient digital speech transmission.

Vocoders represent lossy compression. That is, the bits coming out of the decoder are not identical to those going into the encoder. But hopefully the result at least sounds much like the original.

Vocoders generally work by modeling the human vocal tract as an excitation source (the larynx or "vocal cords") followed by a series of acoustic filters formed by the vocal tract (throat, mouth, sinuses, etc). Some of these filters vary slowly with time as the tongue, teeth, lips, etc move. The muscles that shape speech move much more slowly than the speech waveform itself varies; that's the key to the vocoder's ability to reduce the required data rate.

The voice decoder (decompressor) models the human vocal tract as a set of digital filters whose parameters are encoded in the compressed speech. These filters are driven ("excited") by a signal that represents the vibration of the original speaker's vocal cords. The big differences among vocoders in data rate, CPU requirements and voice quality generally come from different approaches to encoding the excitation, not from the filters that follow.
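
To make the "filter parameters" idea concrete, here is a minimal Python sketch of how an encoder might estimate them for one frame using linear prediction (the autocorrelation method with the Levinson-Durbin recursion). The test signal, the model order of 2 and the function names are assumptions for illustration; real coders use something like 8 to 10 coefficients per short frame plus quantization, which is not shown.

```python
import random

def autocorrelation(frame, max_lag):
    """r[lag] = sum over n of frame[n] * frame[n - lag], for lag 0..max_lag."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] such that x[n] is approximated by
    sum(a[k] * x[n-k]); also returns the remaining prediction-error energy."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        k_i = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k_i
        for k in range(1, i):
            new_a[k] = a[k] - k_i * a[i - k]
        a = new_a
        err *= (1.0 - k_i * k_i)
    return a[1:], err

# Synthetic "vocal tract": noise excitation through a known 2-pole filter.
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(5000):
    x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 1.0))

coeffs, residual_energy = levinson_durbin(autocorrelation(x, 2), 2)
print("estimated coefficients:", [round(c, 2) for c in coeffs])
```

The estimates should come out close to the 1.3 and -0.6 used to generate the signal, which is the sense in which a handful of slowly changing numbers can describe the filtering.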

13kb/s GSM 06.10 RPE-LTP

A simple approach to the excitation problem is to just send the audio signal that remains after you strip off the filtering, known as the residual. This is the basis of the GSM 06.10 RPE-LTP vocoder (the name stands for regular pulse excitation with long-term prediction).
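
Here is a minimal Python sketch of what "stripping off the filtering" means. It assumes the predictor coefficients are already known (the values used are hypothetical) and omits the quantization, long-term prediction and pulse selection that the real GSM coder applies to the residual.

```python
import math

def analysis_filter(signal, coeffs):
    """Remove the predictable part: e[n] = x[n] - sum(a[k] * x[n-k])."""
    residual = []
    for n in range(len(signal)):
        pred = sum(a * signal[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(signal[n] - pred)
    return residual

def synthesis_filter(excitation, coeffs):
    """Rebuild the signal: y[n] = e[n] + sum(a[k] * y[n-k])."""
    out = []
    for n in range(len(excitation)):
        pred = sum(a * out[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(excitation[n] + pred)
    return out

coeffs = [1.3, -0.6]  # hypothetical predictor coefficients
signal = [math.sin(2 * math.pi * 200 * n / 8000)
          + 0.5 * math.sin(2 * math.pi * 450 * n / 8000) for n in range(400)]

residual = analysis_filter(signal, coeffs)
rebuilt = synthesis_filter(residual, coeffs)
max_err = max(abs(a - b) for a, b in zip(signal, rebuilt))
print("max reconstruction error:", max_err)
```

With an unquantized residual the decoder gets the input back essentially exactly. The coding loss in a real vocoder comes from how coarsely the residual and the coefficients are represented.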

Due largely to the number of bits needed to encode the residual signal, the GSM vocoder requires 13kb/s for its encoded data stream. This is exactly the capacity we computed earlier for our simulated SSB signal. (This is not a coincidence; I picked the SSB S/N to achieve this result.)

So here is the original audio signal, encoded and decoded using the GSM vocoder operating at a constant 13kb/s data rate and without any data errors. I've also included another link to the simulated SSB signal you already heard to make A/B comparison a little easier.
GSM vocoder at 13kb/s
Power-equivalent SSB (S/N=10.27dB)

The "cost" in total transponder energy to send this digital signal is exactly the same as for the analog (SSB) case. Some vocoder artifacts are noticeable, but the overall voice quality is clearly much better than the SSB signal.

4.8kb/s FED-STD 1016 CELP

The GSM vocoder is readily available and runs in better than real time on a 486DX2-66, but the data rate is rather high. Lower data rate vocoders are available, and they translate directly into additional transponder power savings. Since the lion's share of the bits from the GSM coder represent the residual, we need to find a more efficient way to represent it. One common way is to use a codebook of waveforms already known to the sender and receiver. Now we can send an index into this codebook instead of sending the actual waveform.
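
The codebook idea can be sketched as a simple search for the entry (and gain) that best matches a target excitation segment. This is a simplified illustration: the codebook size, segment length and names are hypothetical, and a real CELP encoder compares candidates after running them through the synthesis filter rather than matching the raw waveform directly.

```python
import random

def best_codebook_entry(target, codebook):
    """Return (index, gain) minimizing sum((target - gain * entry) ** 2)."""
    best_index, best_gain, best_error = None, 0.0, float("inf")
    for index, entry in enumerate(codebook):
        energy = sum(c * c for c in entry)
        if energy == 0.0:
            continue
        # Optimal gain for this entry in the least-squares sense.
        gain = sum(t * c for t, c in zip(target, entry)) / energy
        error = sum((t - gain * c) ** 2 for t, c in zip(target, entry))
        if error < best_error:
            best_index, best_gain, best_error = index, gain, error
    return best_index, best_gain

rng = random.Random(2)
# Hypothetical codebook: 128 random waveforms of 40 samples each.
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(128)]
target = [0.5 * c for c in codebook[37]]

index, gain = best_codebook_entry(target, codebook)
print("send index", index, "and gain", round(gain, 3))
```

Only the index and gain need to be transmitted, since the decoder holds the same codebook. In this sketch that is 7 bits of index for 40 samples of excitation.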

Here is our test file encoded in FED-STD-1016 Codebook Excited Linear Prediction (CELP), decoded and converted to mu-law. The encoded data rate is 4800 bps.
FED-STD-1016 CELP at 4.8kb/s
Power-equivalent SSB (S/N=5.94dB)

Not bad, eh? The vocoder artifacts are again noticeable, but remember that 4800 bps is 4.33 dB down from 13kb/s. So the "power equivalent" SSB signal has an average S/N of 10.27 - 4.33 = 5.94 dB.
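
The "power equivalent" S/N calculation is the same for any vocoder rate, so it can be captured in a short Python sketch. The constant and function names are illustrative; the reference values are the 13kb/s and 10.27 dB figures from the text.

```python
import math

REFERENCE_RATE = 13000.0   # b/s supported by the reference SSB signal
REFERENCE_SNR_DB = 10.27   # S/N of that SSB signal

def power_equivalent_snr_db(bit_rate):
    """S/N of an SSB signal using the same power as a bit_rate b/s signal."""
    return REFERENCE_SNR_DB - 10.0 * math.log10(REFERENCE_RATE / bit_rate)

for rate in (13000.0, 4800.0):
    print(f"{rate:.0f} b/s -> {power_equivalent_snr_db(rate):.2f} dB")
```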

Unfortunately, due to the way it repeatedly searches its codebooks when encoding, CELP is much more computationally intensive than GSM 06.10. Using the default options, this 9 second test file took 96.9 seconds to encode and decode on a 486DX4-100. That's less than one tenth of real-time speed, although admittedly I have not tried to optimize this code in any way. CELP probably needs a fairly hefty DSP chip to run in real time.

Qualcomm QCELP (IS-96a)

Qualcomm developed its own "QCELP" vocoder for its Code Division Multiple Access (aka spread spectrum) digital cellular telephone system. QCELP is based on CELP, but with the ability to run at variable rates. When you stop talking, the vocoder and modem "idle" at a low data rate, saving power. There are four data rates in QCELP: full, half, quarter and eighth. Full rate is about 8kb/s, which would be "power equivalent" to a SSB S/N of 8.16dB. At idle (eighth rate, about 1kb/s) QCELP drops another 9dB, but then again a SSB transmitter produces no power at all in this case.

The benefits of a variable rate vocoder are substantial in a full duplex cellular system since a typical speaker talks only about 40% of the time. The benefits in the half-duplex push-to-talk environment typical in ham radio are less clear.

Here is our test file encoded and decoded in IS-96A QCELP. The encoder produced a total of 450 frames. 410 were full rate, 15 at half rate, 10 at quarter rate, and 15 at eighth rate for an average data rate of about 7.5 kb/s, 93.75% of full rate.
IS-96a vocoder, average 7.5kb/s
Power-equivalent SSB (S/N=7.88dB)
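
The average-rate figure follows directly from the frame counts. Here is a small Python sketch of the calculation; the names are illustrative, and the 8kb/s full rate is the approximate value quoted above.

```python
FULL_RATE = 8000.0  # b/s, approximate full rate

# Frames produced at each fraction of full rate for the test file.
frame_counts = {1.0: 410, 0.5: 15, 0.25: 10, 0.125: 15}

def average_rate_fraction(counts):
    """Average data rate as a fraction of full rate, weighted by frame count."""
    total_frames = sum(counts.values())
    return sum(fraction * n for fraction, n in counts.items()) / total_frames

fraction = average_rate_fraction(frame_counts)
print(f"{sum(frame_counts.values())} frames, "
      f"{100 * fraction:.2f}% of full rate, {fraction * FULL_RATE:.0f} b/s")
```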

13kb/s Qualcomm QCELP

We have a new version of QCELP with a full (peak) rate of 13kb/s. It has better voice quality than the IS-96A coder even when the average data rates are constrained to be the same. As I understand it, a higher peak data rate permits faster response to speech transients, which improves quality without raising the average rate.

What makes the QCELP coders particularly interesting for our purposes is that maximizing the capacity of a spread spectrum cellular telephone system involves minimizing the average data rate. The peak data rate is much less important, especially since you have many users sharing the channel at once; the "law of large numbers" comes to our aid.

The exact same thing is true for amateur satellite use if you consider spacecraft energy and not peak transponder output power to be the critical resource. Lots of users share the transponder at the same time, and it's the sum total power that matters. So vocoders that work well for CDMA cellular ought to work equally well on a linear amateur satellite transponder even when spread spectrum is not used.
13kb/s QCELP vocoder, average 7.5kb/s
Power-equivalent SSB (S/N=7.88dB)

2.4kb/s FED-STD-1015 LPC-10

One of the earliest vocoders still in use is LPC, Linear Predictive Coding. This particular version of LPC runs at only 2400bps.
LPC-10 vocoder at 2.4kb/s
Power-equivalent SSB (S/N=2.95dB)

The quality of LPC is significantly lower than CELP because it does not attempt to encode the "excitation" or residual with high accuracy. The encoder simply passes pitch information along to the decoder, which regenerates the excitation from a pulse stream. The effect is much like a speech synthesizer (which is essentially an LPC decoder) or a person using an artificial larynx. Nevertheless, the speech is generally intelligible, and it does run at a pretty low data rate. And unlike CELP, it runs in better than real time on a 486, taking only 50% of the CPU on a 486DX4-100.

Caveats

Bandwidth

The benefits of digital voice are not free. The main "gotcha" is RF bandwidth. A 13kb/s modem that needs an Eb/N0 of only 3dB cannot possibly fit in only 2.4 kHz. Uncoded QPSK requires a bare minimum of 1/2 Hz of bandwidth for every channel bit per second, and realizable modems need more than this. And the rate 1/2 FEC scheme that lowers the Eb/N0 to 3dB will double the required bandwidth. So a fair bandwidth estimate would be 20-25 kHz, roughly ten times that needed for SSB. But the case can certainly be made that on an amateur satellite, power is a much more valuable commodity than RF bandwidth. Indeed, given the "use it or lose it" nature of amateur allocations, a strong case can be made to use as much bandwidth as possible.
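
A back-of-the-envelope version of this bandwidth estimate, as a Python sketch. It takes the 1/2 Hz figure as applying to each channel bit (binary symbol) per second after FEC. The `excess` factor standing in for "realizable modems need more" is an assumption, and the values tried for it are only examples.

```python
def min_qpsk_bandwidth_hz(info_bit_rate, code_rate=0.5):
    """Bare-minimum QPSK bandwidth for a coded signal."""
    channel_bit_rate = info_bit_rate / code_rate   # rate 1/2 FEC doubles it
    return 0.5 * channel_bit_rate                  # 1/2 Hz per channel bit/s

def practical_bandwidth_hz(info_bit_rate, code_rate=0.5, excess=0.5):
    """Minimum bandwidth widened by a fractional excess (assumed value)."""
    return min_qpsk_bandwidth_hz(info_bit_rate, code_rate) * (1.0 + excess)

print("minimum:", min_qpsk_bandwidth_hz(13000.0), "Hz")
for excess in (0.5, 0.9):   # hypothetical excess-bandwidth factors
    print("excess", excess, "->", practical_bandwidth_hz(13000.0, excess=excess), "Hz")
```

Excess factors in this range give figures comparable to the 20-25 kHz estimate above.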

Threshold Effects

Decrease the power of a SSB signal by 1 dB and the result is simply a 1 dB reduction in audio S/N. Most might not even notice. (Even the differences between the simulated SSB signals given here are a little subtle to me.) But a 1dB change in S/N to an RF demodulator and FEC decoder can make the difference between perfect operation and no operation. Being inherently nonlinear, digital modulation has a threshold effect much like that of wideband FM, only more pronounced. The stronger (i.e., more efficient) the modulation and coding, the sharper the threshold. Digital modulation can adapt to changing conditions by adjusting power, transmission speed and coding rate, but mechanisms to do this have to be built.
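
To give a feel for how steeply a digital link responds to signal level, here is a Python sketch of the textbook bit error rate for ideal uncoded BPSK or QPSK in Gaussian noise. This is not the coded system described earlier. It is only meant to show the shape of the curve, and a strongly coded link falls off more sharply still.

```python
import math

def bpsk_ber(ebn0_db):
    """Bit error rate of ideal uncoded BPSK/QPSK on a Gaussian-noise channel."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 0.5 * math.erfc(math.sqrt(ebn0))

for ebn0_db in (6, 7, 8, 9, 10):
    print(f"Eb/N0 = {ebn0_db:2d} dB  BER = {bpsk_ber(ebn0_db):.2e}")
```

Even without coding, the last 1 dB step in this table changes the error rate by close to an order of magnitude, whereas the same 1 dB on SSB is barely audible.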

On the other hand, increasing the power of a digital signal above that required to demodulate without errors yields no additional improvement in speech quality. This avoids the incentive inherent in SSB to run excessive power to make one's signal sound better. Furthermore, transmitter power could be continuously and automatically adjusted to exactly that required, eliminating the "laziness" factor as well.

Duty Cycle

The results given above depend somewhat on the characteristics of the audio test file I used. In particular, with a constant rate vocoder the duty cycle of the speech influences the results. When you stop talking into a SSB transmitter, it stops producing significant power even if you keep the microphone button pushed. A constant rate vocoder/modem, though, would keep on consuming the same power as when you're talking. The Qualcomm QCELP vocoder would alleviate this problem, but as stated above this may not be as much of an advantage in a PTT environment as it is in a full duplex cellular system. Note that for our voice sample the average rate was about 94% of full rate, because there were no significant pauses within the sample.

Back to Phil Karn's Amateur Digital Communications Page

Last updated: 8/11/95