Multi-Processor System-on-Chip 2. Liliana Andrade

Multi-Processor System-on-Chip 2 - Liliana Andrade


Скачать книгу
to workload is key to determining whether or not the vDSP has enough available compute resources to execute the kernel. This section covers the GFDM algorithm and its processing graph extrapolated pseudo-code.

      GFDM processing in the literature is divided into three categories:

      1 1) frequency-domain processing (Michailow et al. 2014);

      2 2) time-domain processing (Farhang et al. 2016; Matthé et al. 2016);

      3 3) Radix-2 cascade processing (Nimr et al. 2018).

      We assume (2) time-domain processing for implementation on the vDSP, primarily due to its flexibility and low compute complexity per data point on small workloads.

      1.3.1. Equation

. The resulting GFDM signal x [n] for n = lK + i; l = 0, 1, …, M − 1; i = 0, 1, …, K − 1 is shown in [1.1]

      where l = 0, 1 ,…, M − 1; n = 0, 1,…,K − 1 and g [.]mod N is the impulse response of the K-times over-sampled pulse shaping filter with circular symmetry. Reordering the sums in equation [1.1] leads to a K-point IDFT{d [m]} of the mth data symbol vector d [m] = [d0 [m], d1 [m] ,…, dK−1 [m]]T and subsequent filtering of IDFT outputs D [m] = [D0 [m], D1 [m],…, DK–1 [m]]T. IDFT is a well-researched topic, hence the novelty of interest in our work is the filtering of D [m].

      1.3.2. Dataflow processing graph and matrix representation

      The associated dataflow processing graph is depicted in Figure 1.7, with the notation foo[bar] abbreviated as foobar.

      Each multiply-accumulate (MAC) in Figure 1.7 operates on input/output vector length of K elements that results in the total M 2 K scalar MAC operations, 2M 2 K loads and MK stores. This is the theoretical minimum number of operations. The MAC result is stored in an indexed accumulator register (ACC) and is used in the next MAC of the same index. For readers not familiar with dataflow diagrams, a matrix representation of GFDM filtering is shown in Figure 1.8, and the experience drawn from matrix multiplication implementations can also be used here.

      Figure 1.7. GFDM processing dataflow diagram

      1.3.3. Pseudo-code

      Observing only the equation, a naive programmer could see the modulo load operation on g [.] and get stuck or try to write % for modulo in their C code, hoping that the compiler could deal with it. Fortunately for us, we have drawn the processing graph and can infer the processing required and how to circumvent the modulo operation altogether.

      Figure 1.8. Visualization of time-domain GFDM filtering

      Figure 1.9. GFDM pseudo-code

      It is often taken for granted that we can take a state-of-the-art (SotA) processor “off-the-shelf” and program the kernel straight without thinking much about data degradation due to quantization noise. Yet even an error as small as a percentile off the true value, compounded over all the succeeding processing steps of the system, may result in faulty data for the user application. Therefore, the standardization body defines the error budget (3GPP 2018b, 2019a), which can be distributed among different processing steps.

      Figure 1.10. Precision test bed set up

      Of interest is determining the necessary number of bits to represent input and output data, as well as the necessary bit-length size of the processor accumulators where intermediate results are stored. Input and output share the same data type (bit-length) since that simplifies memory access, while, for accumulators, we assume a different data type. We denote the bit-lengths for these hypothetical data types as databits and ACCbits for input/output and accumulators, respectively. Assuming a Q0.n: normalized to 1 signed integer notation, where n is the number of fractional bits, we consider the following test bed setup, illustrated in Figure 1.10. A random bit stream is mapped to complex QAM symbols. Next, QAM symbols are mapped to dk in equation [1.1], a copy of dk is kept as reference. QAM symbols are GFDM-modulated using IDFT and filtered as per the algorithm shown in Figure 1.9 and afterwards demodulated. Demodulation is done by repeating the algorithm shown in

      Figure 1.9 with a new set of coefficients that invert the first filtering, followed by DFT. Finally, after successful demodulation, the demodulated stream is compared with the reference and the error vector magnitude is measured.

      Figure 1.11. Varied precision quantization of GFDM


Скачать книгу