Multi-Processor System-on-Chip 2. Liliana Andrade
in the GFDM kernel. The graph points out where quantization effects act as a noise source on the data during computation. Quantization noise is added because the “relative” data value is quantized to a finite number of n + 1 bits (n fraction bits, 1 sign bit). The reference, as well as other parts of the test bed, is coded in double-complex precision (64-bit real, 64-bit imaginary); hence, type casting also takes place at the kernel interfaces.
GFDM parameters are set to K = 512 and M = 7. These parameters can be extrapolated from section 1.2.2: K can be aligned with frequency (IDFT bin size for subcarriers) and M with time (symbols in a slot), for a representative 4G/5G compliant case. The error vector magnitude (EVM) budget is taken from the standard specification (3GPP 2018b, 2019a), out of which 1% can be used for digital baseband (DBB) processing according to the engineering rule of thumb (Rohde & Schwarz et al. 2013). In these cases, the channel model is omitted. We measure the achieved EVM indirectly, by measuring signal and noise power and computing EVM from them, as shown in equation [1.2],
where P_noise is the average error power of the data-carrying subcarriers and P_signal is the average power of the data-carrying subcarriers. The average power of the data-carrying subcarriers is invariant to the QAM modulation; hence, the measured EVM is the same for different QAM modulations, although the available budget is not. Measurement results are shown in Figure 1.12 in EVM_dB notation, with the budget constraints overlaid on the results.
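The indirect measurement described above can be sketched in a few lines. This is a minimal illustration, not the authors' test bed: it assumes the common power-ratio definition EVM = sqrt(P_noise / P_signal), with EVM_dB = 10 · log10(P_noise / P_signal), and the function names `average_power`, `evm_from_powers` and `measure_evm` are hypothetical.

```python
import math


def average_power(samples):
    """Average power of a sequence of complex samples."""
    return sum(abs(s) ** 2 for s in samples) / len(samples)


def evm_from_powers(p_signal, p_noise):
    """Return (EVM as a ratio, EVM in dB) from average signal and noise power.

    Assumes the power-ratio definition EVM = sqrt(P_noise / P_signal).
    """
    ratio = math.sqrt(p_noise / p_signal)
    return ratio, 20.0 * math.log10(ratio)


def measure_evm(reference, measured):
    """Compare the kernel output against a high-precision reference.

    The error vector is the sample-wise difference between the two.
    """
    p_signal = average_power(reference)
    p_noise = average_power([m - r for m, r in zip(measured, reference)])
    return evm_from_powers(p_signal, p_noise)


if __name__ == "__main__":
    # Hypothetical data: four QPSK-like points with a small constant error.
    reference = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]
    measured = [r + 0.01 for r in reference]
    evm_ratio, evm_db = measure_evm(reference, measured)
    print(f"EVM = {100 * evm_ratio:.3f} %, EVM_dB = {evm_db:.2f} dB")
```

Because only the ratio of the two average powers enters the result, scaling both reference and measured data by the same factor leaves the EVM unchanged.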
First, as expected from the theoretical bound for uniform quantization (Proakis and Manolakis 1996), the EVM decreases by about 6 dB per additional bit of data bit-length in the constant region, where the accumulator bit-length is sufficiently high. Second, when the accumulator bit-length is insufficient, we observe a signal quality degradation reaching the tangent (see equation [1.3]).
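The 6 dB per bit trend can be reproduced with a small experiment that quantizes only the data, not the accumulator. This is a sketch under stated assumptions: the input samples are drawn uniformly from [-0.9, 0.9] per component (an arbitrary choice, not the GFDM test data), rounding is to the nearest level with saturation, and the names `quantize` and `quantization_evm_db` are hypothetical.

```python
import math
import random


def quantize(x, frac_bits):
    """Round x to a signed fixed-point grid with frac_bits fraction bits
    plus one sign bit, saturating to the representable range [-1, 1)."""
    scale = 1 << frac_bits
    q = round(x * scale)
    q = max(-scale, min(scale - 1, q))
    return q / scale


def quantization_evm_db(frac_bits, n_samples=20000, seed=0):
    """EVM in dB caused purely by quantizing complex samples."""
    rng = random.Random(seed)
    p_signal = 0.0
    p_noise = 0.0
    for _ in range(n_samples):
        x = complex(rng.uniform(-0.9, 0.9), rng.uniform(-0.9, 0.9))
        xq = complex(quantize(x.real, frac_bits), quantize(x.imag, frac_bits))
        p_signal += abs(x) ** 2
        p_noise += abs(xq - x) ** 2
    return 10.0 * math.log10(p_noise / p_signal)


if __name__ == "__main__":
    previous = None
    for bits in range(7, 16):
        evm_db = quantization_evm_db(bits)
        step = "" if previous is None else f" (change: {evm_db - previous:+.2f} dB)"
        print(f"{bits:2d} fraction bits: EVM_dB = {evm_db:7.2f}{step}")
        previous = evm_db
```

Each extra fraction bit halves the quantization step and therefore quarters the noise power, which corresponds to 20 · log10(2) ≈ 6.02 dB.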
Third, the transition region is centered around the point ACCbits = databits. It extends ⌈log2 M⌉ bits away from this center point, where ⌈·⌉ denotes rounding up to the next integer (for example, ⌈3.14⌉ = 4). M corresponds to the number of MACs per accumulator in the algorithm shown in Figure 1.9, which explains the need for extra bits logarithmically proportional to the number of MACs. Hence, the following conclusions can be drawn:
– a data type with a bit-length of 16 is sufficient for all QAM modulations required by the standard, with a leftover budget ranging from 25.27 dB for QPSK to 8.37 dB for 1024-QAM. This leftover budget can be used in other DBB transmitter processing blocks;
– at least ⌈log2 M⌉ accumulator guard bits are needed to avoid the signal degradation of the EVM transition region.
Since many SotA general-purpose DSPs today have an accumulator complex bit-length of 40 and a data word length of 16, we conclude that implementing GFDM on a SotA DSP core does not degrade signal quality. Other data types from the blue triangle (Figure 1.12) may be used as well to further optimize area and power, for example in an ASIP, while still meeting the requirements for numerical precision. For generality, in the next section, we keep ACCbits and databits as parameters.
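The guard-bit rule can be checked with a short calculation. The sketch below reads the conclusions above as "the accumulator is outside the transition region when ACCbits ≥ databits + ⌈log2 M⌉"; that inequality and the function names are our interpretation for illustration, not a formula stated in the text.

```python
def guard_bits(macs_per_accumulator):
    """Ceiling of log2(M), computed with integers to avoid float rounding."""
    if macs_per_accumulator < 1:
        raise ValueError("need at least one MAC per accumulator")
    return (macs_per_accumulator - 1).bit_length()


def accumulator_is_sufficient(acc_bits, data_bits, macs_per_accumulator):
    """True if the accumulator lies above the EVM transition region
    (assumed condition: acc_bits >= data_bits + ceil(log2(M)))."""
    return acc_bits >= data_bits + guard_bits(macs_per_accumulator)


if __name__ == "__main__":
    m = 7  # MACs per accumulator, as in the GFDM setup with M = 7
    print("guard bits:", guard_bits(m))
    print("16-bit data, 40-bit accumulator:", accumulator_is_sufficient(40, 16, m))
    print("16-bit data, 18-bit accumulator:", accumulator_is_sufficient(18, 16, m))
```

With M = 7, three guard bits are needed, so a 16-bit data word calls for at least 19 accumulator bits under this reading, which a 40-bit accumulator satisfies with ample headroom.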
Figure 1.12. GFDM EVM for varied data and ACC complex bit-lengths compared to adjusted 3GPP EVM DBB requirements (3GPP 2018b, 2019a)
1.5. Implementation
Implementing algorithms on wide vector processors, such as the vDSP that we used, introduces a series of considerations that need to be taken into account. The design solution space grows even further for algorithms with multiple loops: we need to choose, for example, which loop to vectorize and which loop order to use, and both choices notably affect the kernel’s overall performance and requirements. These considerations add yet another layer of complexity and deserve a chapter of their own. Here, we single out the most important elements in an abridged manner to reach the next set of HW requirements: how does the 6G candidate waveform kernel under corner workloads map onto SotA vDSPs? How much of the vDSP core cycle budget does it require? Is it practical to run the kernel on the vDSP, provided that the current vDSP load is sufficiently low?
A simplified block diagram of the vDSP is illustrated in Figure 1.13 (Damjancevic et al. 2019). Before we dive deeper into the implementation, let us use Table 1.1 to identify the set of GFDM kernel parameters for which we need to profile the vDSP. As mentioned in section 1.4, the subcarriers (RE/(Symbol · BW · layer · TTI)) use an IDFT bin that is slightly larger, i.e. rounded up to the nearest power of 2, which, in turn, sets the value for K, as in K = bin size. We choose M to be the number of symbols per slot, i.e. half of the TTI. Other values for M and K are possible as long as they overlap and fit in the TTI in both dimensions. An overview is presented in Table 1.2. Note that for the high-end use cases, the kernel needs to be executed several times to meet the required data rate. Furthermore, the deadline for all cases scales down by half, since M ≙ TTI/2 [OFDM symbols].
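The parameter derivation for a single-kernel use case can be sketched as follows. The helper names are hypothetical, and the example only covers the low-end LTE legacy case (72 subcarriers, 1000 μs TTI, M = 7); high-end cases that need several kernel executions are not modeled here.

```python
def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()


def kernel_parameters(subcarriers, tti_us, symbols_per_slot=7):
    """Derive K, M and the kernel deadline for a single-kernel use case.

    K is the IDFT bin size (subcarrier count rounded up to a power of two),
    M is the number of symbols per slot, and the deadline is half the TTI
    because one kernel covers half of the TTI.
    """
    return {
        "K": next_power_of_two(subcarriers),
        "M": symbols_per_slot,
        "deadline_us": tti_us / 2,
    }


if __name__ == "__main__":
    print(kernel_parameters(subcarriers=72, tti_us=1000))
```

For the low-end case this yields K = 128 and a 500 μs deadline.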
Figure 1.13. vDSP simplified HW block diagram
Table 1.2. Kernel parameters for corner use cases
Use Case | Throughput [RE/(Symbol · BW · layer · TTI)] | TTI [μs] | K [#] | M [#] | Kernels req. [#] | Deadline [μs] |
low-end LTE legacy | 72 | 1000 | 128 | 7 | 1 | 500 |
high-end FR2 4 ×CA, µ = 3, 400MHz | 3,168 | 125 |