Joint Algorithm/Hardware Design

Requirements for 5G

The next generation of wireless communication promises a better user experience and new applications. To deliver on this promise, 5G will require much higher throughputs, so that the wireless communication behaves like a much wider pipe that can deliver billions of bits per second. This will enable video streams to be opportunistically buffered when the channel conditions are good, so that the playback doesn’t stall when they become poor. 5G will also require much lower latencies, so that the wireless communication behaves like a much shorter pipe that delivers the bits with less than a millisecond of delay. This will enable user inputs on a handset to control graphics, artificial intelligence and physics processing that is performed on powerful computers in the cloud, with the results delivered to the handset’s display without the user noticing any delay.

Candidates for 5G

Many candidate technologies have been proposed for 5G:

Millimeter wave = more spectrum
Small cells = more base stations
Massive MIMO = more antennas
Coordinated multipoint = more cooperation
Cognitive radio = more intelligence
New waveforms = more efficiency

All of these imply more signal processing.

Signal processing is the bottleneck of 5G

State-of-the-art wireless communication signal processing works well in 4G, where its throughput and latency are comparable to those of the 4G transmission itself. However, unless new advances are achieved in wireless communication signal processing, it will become a bottleneck in 5G. It is no use having very high throughput transmission if the corresponding processing cannot be completed with an equally high throughput: the overall achievable throughput is limited to that of the weakest link in the chain. Likewise, it is no use having very low latency transmission if the corresponding processing cannot be completed with a very low latency: the overall latency is given by the sum of the latencies in the chain.
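The two rules above (throughput set by the slowest link, latency given by the sum over all links) can be illustrated with a minimal sketch. The stage names and figures below are hypothetical values chosen only for illustration, not measurements of any real system.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One link in the chain (hypothetical illustration, not measured data)."""
    name: str
    throughput_gbps: float  # sustained throughput of this stage in Gbit/s
    latency_ms: float       # delay added by this stage in milliseconds


def chain_throughput(stages):
    """The overall throughput is limited by the slowest stage."""
    return min(s.throughput_gbps for s in stages)


def chain_latency(stages):
    """The overall latency is the sum of the per-stage latencies."""
    return sum(s.latency_ms for s in stages)


def bottleneck(stages):
    """Return the stage that limits the overall throughput."""
    return min(stages, key=lambda s: s.throughput_gbps)


# Assumed example: fast transmission paired with slower processing.
chain = [
    Stage("transmission", throughput_gbps=10.0, latency_ms=0.25),
    Stage("signal processing", throughput_gbps=1.0, latency_ms=0.5),
]

print("overall throughput (Gbit/s):", chain_throughput(chain))
print("overall latency (ms):", chain_latency(chain))
print("bottleneck:", bottleneck(chain).name)
```

In this assumed example, the fast transmission gains nothing in throughput, because the slower processing stage sets the overall rate, while both stages contribute to the overall latency.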

The conventional approach

The conventional approach to signal processing research is preventing the required advances. Traditionally, some research groups have worked on signal processing algorithms, while others have worked on signal processing implementation. The first type of research group would typically use simulations to characterise the performance, bandwidth, throughput, latency and energy of transmissions using their proposed signal processing algorithms. They would also characterise the complexity of the signal processing algorithms. Later, and often separately, the second type of research group would build hardware implementations of the signal processing algorithms and characterise their hardware resources, flexibility, throughput, latency and energy. They would often find that these characteristics do not have direct one-to-one relationships with the complexity of the algorithms. Owing to this, some “low complexity” algorithms can be very challenging to implement. This disconnect between the signal processing design and the circuit design can lead to mismatches between the throughputs, latencies and energy consumptions of the transmissions and the processing, causing bottlenecks of the type discussed above.

The Southampton Wireless approach

The Southampton Wireless research group takes a different approach to the design of signal processing algorithms and their implementation: we design them jointly, taking the hardware characteristics into account right from the start of the algorithm design. This approach allows us to design the algorithm so that it is easy to implement, and to design the implementation so that it realises the full potential of the algorithm. We jointly optimise the throughputs, latencies and energy consumptions of the transmission and the processing, so that neither imposes a bottleneck on the other. With this approach, we don’t have to worry about “complexity”: our focus is on producing something that works well not only in theory, but also in practice. This is the approach that we have adopted in our spin-out company AccelerComm, which is commercialising our research on the joint design of signal processing algorithms and their implementation.

R. G. Maunder, T. Chen, Aid for Start-Ups, Innovate UK.

R. G. Maunder, B. M. Al-Hashimi, Highly-parallel algorithms and architectures for high-throughput wireless receivers, EPSRC EP/L010550/1.

R. G. Maunder, B. M. Al-Hashimi, Channel Decoder Architectures for Energy-Constrained Wireless Communication Systems: Holistic Approach, EPSRC EP/J015520/1.

Papers:

Here are some links to our recent algorithms and their implementations on various hardware platforms, including Field Programmable Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), Graphics Processing Unit (GPU) and Network on Chip (NoC):

M. F. Brejza, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, “A Flexible Iterative Receiver Architecture for Wireless Sensor Networks: A Joint Source and Channel Coding Design Example,” IET Wireless Sensor Systems (Awaiting Publication). [Online]. Available: http://eprints.soton.ac.uk/402699/

R. Al-Dujaily, A. Li, R. G. Maunder, T. Mak, B. M. Al-Hashimi, and L. Hanzo, “A scalable turbo decoding algorithm for high-throughput network-on-chip implementation,” IEEE Access (Awaiting Publication). [Online]. Available: http://eprints.soton.ac.uk/402785/

M. F. Brejza, T. Wang, W. Zhang, D. Al-Khalili, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, “Exponential Golomb and Rice Error Correction codes for generalized near-capacity joint source and channel coding,” IEEE Access (Awaiting Publication). [Online]. Available: http://eprints.soton.ac.uk/397286/

A. Li, P. Hailes, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, “1.5 Gbit/s FPGA Implementation of a Fully-Parallel Turbo Decoder Designed for Mission-Critical Machine-Type Communication Applications,” IEEE Access, vol. 4, pp. 5452–5473, Aug. 2016. [Online]. Available: http://eprints.soton.ac.uk/399185/

A. Li, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, “Implementation of a fully-parallel turbo decoder on a general-purpose graphics processing unit,” IEEE Access, vol. 4, pp. 5624–5639, June 2016. [Online]. Available: http://eprints.soton.ac.uk/397525/

P. Hailes, L. Xu, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, “A survey of FPGA-based LDPC decoders,” IEEE Commun. Surveys Tuts., vol. 18, no. 2, pp. 1098–1122, Apr. 2016. [Online]. Available: http://dx.doi.org/10.5258/SOTON/384946

X. Zuo, I. Perez-Andrade, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, “Improving the tolerance of stochastic LDPC decoders to overclocking-induced timing errors: A tutorial and design example,” IEEE Access, vol. 4, pp. 1607–1629, Apr. 2016. [Online]. Available: http://eprints.soton.ac.uk/386027/

I. Perez-Andrade, S. Zhong, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, “Stochastic computing improves the timing-error tolerance and latency of turbo decoders: Design guidelines and trade-offs,” IEEE Access, vol. 4, pp. 1008–1038, Feb. 2016. [Online]. Available: http://eprints.soton.ac.uk/386516/

A. Li, L. Xiang, T. Chen, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, “VLSI implementation of fully-parallel LTE turbo decoders,” IEEE Access, vol. 4, pp. 323–346, Jan. 2016. [Online]. Available: http://eprints.soton.ac.uk/386016/

M. F. Brejza, L. Li, R. G. Maunder, B. M. Al-Hashimi, C. Berrou, and L. Hanzo, “20 years of turbo coding and energy-aware design guidelines for energy-constrained wireless applications,” IEEE Commun. Surveys Tuts., vol. 18, no. 1, pp. 8–28, Jan. 2016. [Online]. Available: http://eprints.soton.ac.uk/378161/

R. G. Maunder, “A fully-parallel turbo decoding algorithm,” IEEE Trans. Commun., vol. 63, no. 8, pp. 2762–2775, Aug. 2015. [Online]. Available: http://eprints.soton.ac.uk/368984/

M. F. Brejza, W. Zhang, R. G. Maunder, B. M. Al-Hashimi, and L. Hanzo, “Adaptive iterative detection for expediting the convergence of an iterative JSCC, and demodulator,” in Proc. IEEE Int. Conf. Commun., London, UK, Jun. 2015. [Online]. Available: http://eprints.soton.ac.uk/375712/