At bit rates of around 16 kbits/s and lower, the quality of waveform codecs falls rapidly, as can be seen in Figure 5. Thus at these rates hybrid codecs, especially CELP codecs and their derivatives, tend to be used. However, because most of these codecs determine their short term filter coefficients by forward adaption, they tend to have high delays. The delay of a speech codec is defined as the time from when a speech sample arrives at the input of its encoder to when the corresponding sample is produced at the output of its decoder, assuming the bit stream from the encoder is fed directly to the decoder. For a typical hybrid speech codec this delay is of the order of 50 to 100 ms, and such a high delay can cause problems. Therefore, in 1988 the CCITT released a set of requirements for a new 16 kbits/s standard. The chief requirements were that the codec should have speech quality comparable to the G721 32 kbits/s ADPCM codec, both in error free conditions and over noisy channels, and that it should have a delay of less than 5 ms and ideally less than 2 ms.
All the CCITT requirements were met by a backward adaptive CELP codec developed at AT&T Bell Labs, which was standardised in 1992 as G728. This codec uses backward adaption to calculate the short term filter coefficients: rather than buffering 20 ms or so of the input speech to calculate them, it finds them from the past reconstructed speech, which is available at both the encoder and the decoder. As a result the codec can use a much shorter frame length than traditional CELP codecs; G728 uses a frame length of only 5 samples, giving it a total delay of less than 2 ms. A high order (p=50) short term predictor is used, which eliminates the need for a long term predictor. At the 8 kHz sampling rate of telephone speech, 16 kbits/s corresponds to 2 bits per sample, so ten bits are available for each five sample vector, and all of them are used to represent the fixed codebook excitation. Of these ten bits, seven transmit the fixed codebook index and the other three represent the excitation gain. Backward gain adaption aids the quantization of the excitation gain, and at the decoder a postfilter improves the perceptual quality of the reconstructed speech. The result is a 16 kbits/s codec with a delay of less than 2 ms, speech quality equal to or better than G721, and good robustness to channel errors.
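The core idea of backward adaption, deriving the short term predictor from samples that have already been reconstructed, can be sketched in C. This is a minimal illustration under stated assumptions, not the G728 algorithm: it assumes a plain rectangular analysis window and the standard autocorrelation plus Levinson-Durbin solution, whereas G728 specifies its own windowing and coefficient update schedule. The function names (`autocorrelate`, `levinson_durbin`, `backward_lpc`, `predict_next`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound on predictor order; G728 itself uses p = 50. */
#define MAX_ORDER 50

/* Autocorrelation r[0..order] of past reconstructed speech x[0..n-1].
   Assumption: a plain rectangular window, chosen here for simplicity. */
static void autocorrelate(const double *x, size_t n, int order, double *r)
{
    for (int k = 0; k <= order; k++) {
        double sum = 0.0;
        for (size_t i = (size_t)k; i < n; i++)
            sum += x[i] * x[i - (size_t)k];
        r[k] = sum;
    }
}

/* Levinson-Durbin recursion. Fills a[1..order] so that the predicted
   sample is sum_{j=1}^{order} a[j] * x[n-j]; a[0] is unused (set to 0).
   Returns the final prediction error energy. */
static double levinson_durbin(const double *r, int order, double *a)
{
    double prev[MAX_ORDER + 1];
    double err = r[0];

    assert(order >= 1 && order <= MAX_ORDER);
    for (int j = 0; j <= order; j++)
        a[j] = 0.0;
    if (err <= 0.0)              /* silent history: keep a zero predictor */
        return 0.0;

    for (int i = 1; i <= order; i++) {
        double acc = r[i];
        for (int j = 1; j < i; j++)
            acc -= a[j] * r[i - j];
        double k = acc / err;    /* reflection coefficient for stage i */

        for (int j = 1; j < i; j++)
            prev[j] = a[j];
        for (int j = 1; j < i; j++)
            a[j] = prev[j] - k * prev[i - j];
        a[i] = k;

        err *= 1.0 - k * k;
        if (err <= 0.0)          /* numerically singular: stop early */
            break;
    }
    return err;
}

/* Backward adaption: only already-decoded samples are used, so an encoder
   and a decoder running this on the same history get the same coefficients
   and nothing needs to be transmitted for the filter. */
static void backward_lpc(const double *history, size_t n, int order, double *a)
{
    double r[MAX_ORDER + 1];
    autocorrelate(history, n, order, r);
    levinson_durbin(r, order, a);
}

/* Short term prediction of the next sample from the most recent history. */
static double predict_next(const double *history, size_t n, int order,
                           const double *a)
{
    double pred = 0.0;
    for (int j = 1; j <= order && (size_t)j <= n; j++)
        pred += a[j] * history[n - (size_t)j];
    return pred;
}
```

For example, calling `backward_lpc` on a history that decays as 0.9 to the power n with order 10 yields a first coefficient close to 0.9 and the remaining coefficients close to zero. Because the input to `backward_lpc` is only reconstructed speech, none of the ten bits per vector has to be spent on filter coefficients, which is what lets G728 use such a short frame.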
C code implementing the G728 encoder and decoder is available for FTP here. This code was written by Alex Zatsman of Analog Devices and revised by Mike Concannon of Columbia University, NY. Many thanks to them.