# A DMT Modem Prototype for Broadband PLC

J.L. Carmona, F.J. Cañete, J.A. Cortés and L. Díez

Departamento de Ingeniería de Comunicaciones, Universidad de Málaga (Spain)

Abstract: In this paper, the design and performance of a modem prototype for high bit-rate indoor Power-Line Communications (PLC) are presented. The system architecture is based on two Field Programmable Gate Arrays (FPGAs) and a Digital Signal Processor (DSP), in which the transmitter and receiver algorithms have been implemented. The modulation employed is Discrete MultiTone (DMT) with bit-loading and adaptive equalization to cope with channel selectivity in both frequency and time. The structure of the hardware and software is explained, and the results obtained after testing the modem in some indoor PLC channels are discussed.

#### I. INTRODUCTION

Besides considerable noise and distortion, indoor power-line communication channels show a response with a twofold variation with time. On the one hand, they exhibit long-term variations due to connections and disconnections of electrical devices, although these changes occur on a time scale of minutes. On the other hand, a short-term variation appears because the behaviour of some electrical devices depends on the mains voltage (230 V and 50 Hz in Europe). This dependency leads to modelling PLC channels as linear periodically time-varying (LPTV) systems with cyclostationary noise components, both synchronous with the mains cycle [1].

The most widespread proposals to characterize this kind of channel are based on linear time-invariant (LTI) models [3, 4]. Under this constraint, the transmission techniques usually employed are not designed to compensate for the short-term variation of the channel response, and this fact limits their performance. On the other hand, the adoption of an LPTV channel model [2] opens a new scenario in which fast adaptation techniques can be used. These techniques, however, need to be tested and debugged in order to achieve optimum performance.
In this paper, a modem prototype based on DMT modulation is presented. The elements of the DMT transmitter and receiver sides, the hard core of the prototype, are considered the front-end of the system, which is connected to a soft core that can be easily reprogrammed to test different transmission techniques in real time. Broadband modem prototypes presented so far have not made use of fast adaptive techniques, since they are designed assuming an LTI channel model (see e.g. [5]). In contrast, the prototype described here introduces a frequency equalizer with fast adaptation to compensate for the short-term variation observed in actual channels.

The structure of the paper is as follows. In section II, the architecture of the system is presented. Section III describes the hardware platform that provides the physical support of the prototype. The transmitter and receiver sides are described in sections IV and V, respectively, while in section VI some tests are explained and their results are discussed. Finally, section VII summarizes the key points of the paper.

This work has been supported in part by the Spanish Ministry of Educación y Ciencia under Project No TIC2003-06842.

#### II. ARCHITECTURE OF THE SYSTEM

The operating principle is based on a closed loop in which the transmitter side injects a pattern signal, formed by continuous DMT symbols, at the transmission point of the channel, and the receiver side recovers them at the other end. A diagram of the DMT prototype setup is shown in Fig. 1. The prototype is supported by a board hosted in a personal computer (PC), with which it communicates via the peripheral component interconnect (PCI) bus to save the receiver side results. The transmitter and receiver sides are connected to the grid through coupling circuits (CC), whose fundamental purpose is to protect the board from the mains and to suppress spectrum replicas.

Fig. 1. Diagram of the DMT prototype setup.
Since the transmitter and receiver are integrated on the same board, synchronization problems are easily avoided by using a common clock source. Consequently, the results do not include the degradation due to imperfect synchronization.

##### II-A. DMT Specifications

Table I gathers the main system parameters.

Fig. 2. Hardware platform of the prototype.

| Parameter | Value |
|-----------------------------------|------------------------------------|
| Sampling period $(T_S)$ | $0.02\,\mu s$ |
| Number of subcarriers $(N)$ | 780 |
| DFT size $(N_{DFT})$ | 2048 |
| Cyclic prefix duration $(T_{CP})$ | $[20 \cdot T_S, 512 \cdot T_S]$ |
| DFT block duration $(T_{DFT})$ | $40.96\,\mu s = 2048 \cdot T_S$ |
| DMT symbol duration | $[41.36\,\mu s, 51.2\,\mu s] = [2068 \cdot T_S, 2560 \cdot T_S]$ |

TABLE I DMT SYSTEM PARAMETERS.

The transmission scheme is based on the use of the Discrete Fourier Transform (DFT), and its inverse, to perform modulation and demodulation. The number of possible subcarriers in baseband is $N_{DFT}/2$, where $N_{DFT}$ is the DFT size. However, since the pass band of the coupling circuits extends from 1 MHz to 20 MHz, only $N < N_{DFT}/2$ subcarriers are used.

An increase in the number of subcarriers reduces distortion and hence improves the bit rate. However, the improvement obtained using more than 1024 subcarriers does not justify the considerable rise in implementation complexity [6]. Moreover, if the DMT symbol is too long, it may exceed the channel coherence time. As the system is a modem prototype, the length of the cyclic prefix (CP) can be chosen between 20 and 512 samples, which allows the spectral efficiency ($\eta = T_{DFT}/(T_{DFT}+T_{CP})$) to be set in the range from 99 % to 80 %.
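The symbol durations and efficiency limits quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch using only the values of Table I; the function name `dmt_timing` and the intermediate CP length of 300 samples are chosen here for illustration.

```python
T_S = 0.02e-6   # sampling period in seconds (Table I)
N_DFT = 2048    # DFT size (Table I)

def dmt_timing(cp_samples):
    """Return (DMT symbol duration in seconds, spectral efficiency) for a CP length in samples."""
    t_dft = N_DFT * T_S          # DFT block duration
    t_cp = cp_samples * T_S      # cyclic prefix duration
    return t_dft + t_cp, t_dft / (t_dft + t_cp)

# Shortest CP, an illustrative intermediate value, and the longest CP.
for cp in (20, 300, 512):
    t_sym, eta = dmt_timing(cp)
    print(f"CP = {cp:3d} samples: symbol = {t_sym * 1e6:.2f} us, efficiency = {eta:.1%}")
```

The two extreme CP lengths reproduce the symbol durations of Table I and the 99 % to 80 % efficiency range.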
This wide range of CP lengths allows an optimal value to be selected for each channel, although a value of around 300 samples would be an adequate trade-off between efficiency and signal-to-distortion ratio for most channels [6].

#### III. HARDWARE PLATFORM

A high-speed DMT system requires a great computational capacity to process the algorithms needed to modulate and demodulate in real time. In order to satisfy this demand, two Field Programmable Gate Arrays (FPGAs) are provided to serve as the front end of the system. These devices, with their inherent parallelism, are best suited for continuous bit-by-bit or word-by-word processing with a high throughput. Fig. 2 depicts the hardware platform of the prototype, pointing out the components of each part.

However, conditional algorithms with many branches are more difficult to program onto an FPGA, which complicates the design of adaptive techniques. Furthermore, it is necessary to save several results from the receiver side in order to evaluate the overall performance of the system. To this end, a continuous data flow must be sent via the PCI bus to the PC that hosts the prototype. Even though the peak rate of this bus is 133 Mbytes/s, there are intervals in which the rate is null. To avoid data loss, it is necessary to add a memory buffer at the prototype to absorb these gaps.

For this purpose, a Digital Signal Processor (DSP) has been chosen to work in tandem with an FPGA for the implementation of the receiver side. The FPGA module provides a great computational capacity, while the DSP module is more flexible to program and has better memory management, with large external memories and direct memory access (DMA) devices. In addition, the DSP serves as the interface to load the initial configuration of the system.

##### III-A. ADC and DAC

The AD9767 from Analog Devices is a high-speed Digital to Analog Converter (DAC), which includes two high-quality 14-bit cores and supports sampling rates up to 125 Msamples/s, while the AD9432, also from Analog Devices, is a 12-bit Analog to Digital Converter (ADC) able to operate with a sampling clock of up to 105 MHz. These devices are designed to achieve a flat dynamic range and offer a spurious-free dynamic range (SFDR) of about 80 dB, where SFDR is defined as the ratio between the root mean square (RMS) amplitude of the signal and that of the peak spurious spectral component, including harmonics. Both converters run at a sampling frequency $(f_s)$ of 50 MHz, derived from a crystal oscillator, which ensures the synchronous operation of the transmitter and receiver sides.

##### III-B. FPGA Modules

The most computationally demanding algorithms of the prototype front-end, modulation and demodulation, are carried out by two Xilinx Virtex II FPGAs of 1 Mgate each. The resources of this kind of device, placed in a regular array, can be arranged in the following groups.

- Configurable Logic Blocks (CLB), which provide functional elements for combinatorial and synchronous logic, including basic storage elements.
- RAM blocks, consisting of 18 Kbit storage elements of configurable Dual-Port RAM.
- 18-bit x 18-bit dedicated multipliers.
- Digital Clock Manager (DCM) blocks for clock distribution delay compensation.

Table II shows the amounts of these blocks in Virtex II:

TABLE II RESOURCES OF VIRTEX II.

##### III-C. DSP Module

The TMS320C6201 is a fixed-point processor that gives the system a great degree of flexibility, since it is easily reprogrammed. In addition, this device is provided with 16 Mbytes of external Synchronous Dynamic RAM (SDRAM) for buffering data to the PC. Four DMA devices work in parallel with the CPU and perform memory operations without spending CPU processing cycles.

##### III-D. Data Connections

The receiver side is composed of an FPGA module and a DSP module.
While the FPGA module supports the heaviest computation tasks of the receiver, the DSP finishes the processing (equalization and decision) and sends the results to the PC. As shown in Fig. 2, a 32-bit bus interconnects both modules. This bus works at 100 MHz in synchronous mode, so a sustained transfer rate of $400 \cdot 10^6$ bytes/s (400 Mbytes/s) is achieved. A FIFO of 512 positions is inserted at each end to give some flexibility to the DSP interrupt service system. In contrast, the PCI bus operates in an asynchronous mode, i.e. data transmissions are allowed only during some intervals. To solve this issue, along with a FIFO similar to the one employed in the previous case, another, larger buffer (a circular buffer of 16 Mbytes) is allocated in the DSP external memory.

#### IV. TRANSMITTER SIDE

In Fig. 3, a block diagram of the transmitter side, implemented on an FPGA, is shown. The design is governed by the DAC clock, so the whole system runs sample by sample at 50 MHz, giving an output rate of 50 Msamples/s.

Fig. 3. Transmitter block diagram on FPGA.

Initially, the DSP module configures, via a serial bus, some control signals (marked on the diagram in *italics*) that determine the behaviour of the system. More specifically, these are *Scale\_Factor*, needed to compute the Inverse Fast Fourier Transform (IFFT) algorithm, and *CP\_length*, which will be explained below.

An important feature of the DMT prototype is the bit-loading function, i.e. the constellation size of each carrier is set up by the bit loader module. This module consists of a memory of $N_{DFT}/2$ positions, which is read incrementally each cycle, returning the constellation size assigned to the corresponding carrier. The range of the constellations (l) is from 0 (unused carrier) to 10 bits (constellation of $2^{10}$ points). Once the bit load is determined, a randomizer bank generates a pseudo-random word, which is mapped onto the respective constellation.
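As a rough illustration of the bit loader and randomizer just described, the sketch below sweeps a bit-loading table and draws an l-bit pseudo-random word for every loaded carrier. The single 16-bit LFSR, its taps, the seed and the example table are assumptions made for illustration and are not taken from the prototype; the mapping of each word onto a constellation point is omitted.

```python
def lfsr16_step(state):
    """Advance a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) by one bit."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def random_words(bit_table, seed=0xACE1):
    """Yield (carrier index, bits l, l-bit pseudo-random word) for each loaded carrier."""
    state = seed  # hypothetical non-zero seed
    for k, l in enumerate(bit_table):
        if l == 0:
            continue  # unused carrier: nothing is transmitted on it
        word = 0
        for _ in range(l):
            word = (word << 1) | (state & 1)  # take the LFSR output bit
            state = lfsr16_step(state)
        yield k, l, word

# Hypothetical bit-loading table for six carriers (0 to 10 bits each).
bit_table = [0, 2, 4, 10, 0, 6]
words = list(random_words(bit_table))
for k, l, word in words:
    print(f"carrier {k}: {l} bits, word = {word:0{l}b}")
```

Carriers loaded with 0 bits are skipped, and every generated word fits in the number of bits assigned to its carrier.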
The randomizer bank is implemented by 10 elemental linear feedback shift registers (LFSR), which are enabled according to the bit load. The constellations have been defined so that they all have the same mean energy.

Since DMT is a baseband modulation, the Hermitian generator block builds the Hermitian-symmetric spectrum from the subcarrier values, so that a real signal is obtained at the output of the IFFT block. The IFFT block uses one radix-4 butterfly processing engine that returns one output sample per cycle. This component is based on 16-bit fixed-point arithmetic and scales the intermediate results according to the *Scale\_Factor* signal, in order to accommodate the dynamic range. The final modulated signal is obtained when the CP, whose length is determined by the *CP\_Length* signal, is inserted at the beginning of the symbol.

The maximum frequency supported by the design is 113.5 MHz. The utilization of Virtex II resources is shown in Table III.

| CLBs | RAM blocks | 18x18 Multipliers | DCMs |
|------|------------|-------------------|------|
| 79 % | 75 % | 45 % | 12 % |

TABLE III UTILIZATION OF VIRTEX II RESOURCES IN THE TRANSMITTER SIDE.

#### V. RECEIVER SIDE

A block diagram of the FPGA design of the receiver side is shown in Fig. 4. The core of this part is the Fast Fourier Transform (FFT) block, which, like the IFFT block of the transmitter, is based on one radix-4 butterfly processing engine with 16-bit fixed-point arithmetic.

Fig. 4. Receiver block diagram on FPGA.

The real and imaginary outputs of the FFT block are packed into a single 32-bit word, to make efficient use of the width of the data bus, and are then written into the FIFO subsystem. This subsystem serves as a buffer that feeds the bus communicating with the DSP module. However, the heavy computational load of the algorithms in the DSP module makes it necessary to use a number of carriers smaller than N.
To solve this issue, the DSP module initially sets up the *Top\_carrier* and *Down\_carrier* signals, which fix the range of carriers to be processed.

The DSP program is based on a multi-task framework, as shown in Fig. 5. The highest priority task is the reading operation from the FIFO system, in order to avoid data loss. Secondly, the data must be processed (this task contains the algorithms for fast adaptive equalization and decision) and, finally, the results are sent to the PC.

Fig. 5. Flow diagram of the DSP framework.

In this case, the utilization of Virtex II resources is shown in Table IV, and the maximum frequency of the design is also 113.5 MHz.

| CLBs | RAM blocks | 18x18 Multipliers | DCMs |
|------|------------|-------------------|------|
| 82 % | 65 % | 45 % | 12 % |

TABLE IV UTILIZATION OF VIRTEX II RESOURCES IN THE RECEIVER SIDE.

#### VI. EXPERIMENTAL RESULTS

Several tests have been performed to transmit information over some actual indoor PLC channels using the modem prototype. As mentioned, synchronization has been guaranteed by using a common clock signal for the transmitter and receiver. The number of carriers managed by the modem has been reduced to N=200, and hence the used bandwidth is about 4.8 MHz.

The receiver structure includes (running on the DSP) an adaptive equalizer in the frequency domain, which tries to compensate for the cyclic variation of the channel. As is well known, one of the advantages of DMT modulation over frequency-selective channels is that the resulting sub-channels exhibit different but flat attenuations [7], which can be removed by means of a scalar Frequency Equalizer (FEQ). In our proposal, as depicted in Fig. 6, an adaptive algorithm tries to compensate for the complex response of each subchannel [8] and, if the step-size is appropriate, also to follow its short-term variations.

Fig. 6. Diagram of equalizer and detector.
To compute the FEQ coefficients, an implementation of the Least-Mean-Square (LMS) algorithm [9] has been chosen. It is described by (1),

$$FEQ_{n+1}^{k} = FEQ_{n}^{k} + \frac{\mu \cdot e_{n}^{k} \cdot \widetilde{S_{n}^{k}}^{*}}{\left|Y_{n}^{k}\right|^{2}} \tag{1}$$

where the updated value of the FEQ coefficient for sub-carrier k in DMT symbol n+1 is calculated from its previous value and from the decision error in DMT symbol n, which is defined by (2).

$$e_n^k = \widetilde{S_n^k} - \widehat{S_n^k} \tag{2}$$

The $\mu$ coefficient controls the equalizer rate of adaptation and, consequently, both its speed of convergence and its ability to track the channel variations. This algorithm uses the instantaneous power of the demodulated signal $Y_n^k$ as an estimate of its mean value, which is an acceptable simplification unless the Signal to Noise Ratio (SNR) is under 10 dB [10].

Transmission starts with an initial phase in which a preamble of QPSK symbols modulates all the N subcarriers. During this phase, which lasts several mains cycles (of 20 ms each in Europe), the FEQ is trained and the Signal to Noise and Distortion Ratio (SNDR) of each sub-carrier (which includes the distortion that appears when the channel variation cannot be compensated by the FEQ) is estimated. For the test, a CP of 300 samples was chosen, and hence the DMT symbol duration was 2348 samples and there are 425 symbols in a mains cycle. Based on the estimated SNDR, and taking into account a guard margin of 3 dB, the bit load of each sub-carrier is calculated for a Bit Error Rate (BER) target of $10^{-5}$. After this phase, the normal transmission mode begins, bearing random information, and the decision error is obtained directly from the detector output (see Fig. 6).

For the chosen link, a measurement of the LPTV channel response was carried out just before the modem transmission started, using the procedure described in [2]. The purpose was to get an idea of the magnitude of the time variations that the modem has to cope with.
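To illustrate how a one-tap adaptive FEQ can follow a cyclic gain variation of this kind, the following minimal sketch simulates a single QPSK sub-carrier. It uses the textbook normalized LMS update, in which the decision error is correlated with the conjugate of the received sample $Y$ and divided by its instantaneous power; this signal assignment is an assumption and may differ in detail from the notation of (1). The channel gain, ripple depth, noise level and function names are hypothetical values chosen for illustration only.

```python
import numpy as np

def qpsk_decide(z):
    """Nearest unit-energy QPSK point (works on scalars and arrays)."""
    return (np.sign(np.real(z)) + 1j * np.sign(np.imag(z))) / np.sqrt(2)

def nlms_feq(y, mu, feq0):
    """Decision-directed one-tap NLMS equalizer for one sub-carrier.

    y    : complex FFT outputs of the sub-carrier, one per DMT symbol
    mu   : step size (mu = 0 freezes the coefficient)
    feq0 : initial coefficient, e.g. obtained from a training preamble
    """
    feq = complex(feq0)
    out = np.empty(len(y), dtype=complex)
    for n, yn in enumerate(y):
        s_eq = feq * yn                  # equalized sample
        err = qpsk_decide(s_eq) - s_eq   # decision error
        # Textbook NLMS step, normalized by the instantaneous power |Y|^2.
        feq += mu * err * np.conj(yn) / abs(yn) ** 2
        out[n] = s_eq
    return out

rng = np.random.default_rng(0)
symbols_per_cycle = 425                  # DMT symbols per mains cycle, as quoted in the text
n = np.arange(4 * symbols_per_cycle)     # four mains cycles
s = qpsk_decide(rng.standard_normal(n.size) + 1j * rng.standard_normal(n.size))
ripple = 0.05                            # assumed relative gain excursion (hypothetical)
h = 0.5 * np.exp(0.7j) * (1 + ripple * np.sin(2 * np.pi * n / symbols_per_cycle))
noise = 0.01 * (rng.standard_normal(n.size) + 1j * rng.standard_normal(n.size))
y = h * s + noise

mse_fixed = np.mean(np.abs(nlms_feq(y, 0.0, 1 / h[0]) - s) ** 2)
mse_adaptive = np.mean(np.abs(nlms_feq(y, 0.1, 1 / h[0]) - s) ** 2)
print(f"fixed FEQ MSE: {mse_fixed:.2e}, adaptive FEQ MSE: {mse_adaptive:.2e}")
```

Under these assumptions the adaptive coefficient leaves a smaller residual error than a coefficient frozen after training, because the frozen one cannot follow the periodic gain ripple.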
The variation of the amplitude frequency response along the mains cycle was around 5 % for the sub-carriers used. The variation observed at the first sub-carrier is shown in Fig. 7, and its shape is representative of the whole band (centred on 16.5 MHz). Such a 5 % variation would not be a serious problem if small constellations were used, i.e. BPSK or 4-QAM, but it constitutes a strong degradation for larger bit loads. To reinforce this idea, the bit load arranged for the different sub-carriers is presented in Fig. 8 for two levels of transmitted Power Spectral Density (PSD): -20 dBm/kHz and -30 dBm/kHz (flat in the whole band). Only constellations with an even number of bits per carrier were used.

Fig. 7. Channel attenuation excursion along a mains cycle for the first sub-carrier.

Fig. 8. Bit load for the different sub-carriers.

Several transmissions were carried out on the same channel to study the system performance. Different values of the $\mu$ coefficient were selected to analyze the receiver ability to compensate for the short-term channel variations. The results of these tests are shown in Fig. 9, in terms of measured BER and achieved bit-rate for the two values of PSD and $\mu \in \{0.05, 0.1, 0.15\}$.

Fig. 9. BER (blue solid line) and bit-rate (red dashed line) measured for different configurations of transmission parameters.

It is observed that, for larger values of $\mu$ (faster adaptation), higher bit-rates are achieved, because the channel is better compensated and so the SNDR and the bit load increase. The higher bit load also raises the BER, due to the denser constellations used, but the BER stays below the target of $10^{-5}$ except for $\mu=0.15$, which makes $\mu=0.1$ the best choice. The dependency of the bit-rate on the $\mu$ value is weaker when the PSD is -30 dBm/kHz, because channel noise dominates the SNDR over the distortion due to slow tracking of the channel variation.

#### VII. CONCLUSIONS

In this paper, a DMT modem prototype for broadband PLC has been presented.
It has been implemented using a flexible hardware platform with FPGAs and a DSP. Variable bit loading of sub-carriers is performed to match the frequency selectivity of the SNR in PLC channels. Moreover, adaptation techniques to compensate for the short-term cyclic channel variation have been designed and their performance has been analyzed. The achieved bit-rate of this system is about 40 Mb/s for the limited bandwidth used of 4.8 MHz, which leads to an efficiency of about 8 b/s/Hz. The bandwidth could be extended in the future by using more powerful computing hardware, for instance, by implementing all the data processing algorithms on FPGAs.

#### REFERENCES

- [1] F.J. Cañete, J.A. Cortés, L. Díez, J.T. Entrambasaguas and J.L. Carmona, "Fundamental of the cyclic short-time variation of indoor power-line channels response", *International Symposium on Power-Line Communications and its Applications*, ISPLC 2005.
- [2] J.A. Cortés, F.J. Cañete, L. Díez and J.T. Entrambasaguas, "Characterization of the cyclic short-time variation of indoor power-line channels response", *International Symposium on Power-Line Communications and its Applications*, ISPLC 2005.
- [3] H. Philipps, "Modelling of power line communications channels", *International Symposium on Power-Line Communications and its Applications*, ISPLC 1999.
- [4] M. Zimmermann and K. Dostert, "A multipath model for the powerline channel", *IEEE Trans. on Communications*, pp. 553-559, Feb 2002.
- [5] S. Gault, P. Ciblat and W. Hachem, "An OFDMA Based Modem for PowerLine Communications over the Low Voltage Distribution Network", *International Symposium on Power-Line Communications and its Applications*, ISPLC 2005.
- [6] F.J. Cañete, J.A. Cortés, L. Díez and J.T. Entrambasaguas, "Modeling and Evaluation of the Indoor Power Line Transmission Medium", *IEEE Communications Magazine*, vol. 41, pp. 41-47, April 2003.
- [7] J.M. Cioffi, *A Multicarrier Primer*, Technical Report, Stanford University, http://www-isl.stanford.edu/people/cioffi/pdf/multicarrier.pdf.
- [8] J.S. Chow, J.C. Tu and J.M. Cioffi, "A discrete multitone transceiver system for HDSL applications", *IEEE Journal on Selected Areas in Communications*, vol. 9, issue 6, pp. 895-908, Aug. 1991.
- [9] S. Haykin, *Adaptive Filter Theory*, Prentice Hall, 1991.
- [10] R.A. Garcia Garaluz, "Estudio de algoritmos de procesado de señal aplicados a la transmisión sobre red eléctrica", Master Thesis, University of Malaga, Feb 2004.