Multi-Processor System-on-Chip 1. Liliana Andrade
2.3. The MPPA3 many-core processor
2.3.1. Global architecture
The MPPA3 processor architecture (Figure 2.7) applies the defining principles of many-core architectures: processing elements (SCs on a GPGPU) are grouped with a multi-banked local memory and a slice of the memory hierarchy into compute units (SMs on a GPGPU), which share a global interconnect and access to external memory. Compared with the GPGPU architecture, the distinguishing features of the MPPA many-core architecture are that its processing elements are fully software-programmable cores, and that each compute unit is provided with an RDMA engine.
Structuring the MPPA3 architecture as a collection of compute units, each comparable to an embedded multi-core processor, is the main feature that allows application partitions operating at different levels of functional safety and cyber-security to be consolidated on a single processor. This in turn requires global interconnects that support partition isolation. Experience with previous MPPA processors showed that chip-global interconnects implemented as a "network-on-chip" (NoC) can be specialized for two different purposes: generalizing busses, or integrating macro-networks (Table 2.2).
Figure 2.7. Overview of the MPPA3 processor
Table 2.2. Types of network-on-chip interconnects
Generalized busses | Integrated macro-network |
Connectionless | Connection-oriented |
Address-based transactions | Stream-based transactions |
Flit-level flow control | [End-to-end flow control] |
Implicit packet routing | Explicit packet routing |
Inside coherent address space | Across address spaces (RDMA) |
Coherency protocol messages | Message multicasting |
Reliable communication | [Packet loss or reordering] |
QoS by priority and aging | QoS by traffic shaping |
Coordination with the DDR controller | Termination of macro-networks |
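Table 2.2 contrasts QoS by priority and aging on generalized busses with QoS by traffic shaping on integrated macro-networks. One common way to shape traffic is a token bucket; the Python sketch below illustrates that generic technique only. The class `TokenBucket` and its rate and burst parameters are hypothetical, and nothing here claims that the MPPA3 NoC uses exactly this scheme.

```python
from typing import List


class TokenBucket:
    """Generic token-bucket shaper (illustrative model, not the MPPA3 design).

    Tokens accrue at `rate` flits per cycle, saturating at `burst`.
    A packet of `size` flits may be injected only if at least `size`
    tokens are available; injecting it consumes those tokens.
    """

    def __init__(self, rate: float, burst: float) -> None:
        self.rate = rate
        self.burst = burst
        self.tokens = burst  # the bucket starts full

    def try_send(self, size: int) -> bool:
        """Consume `size` tokens and return True if the packet may go this cycle."""
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False

    def tick(self) -> None:
        """Advance one cycle: add `rate` tokens, capped at the burst size."""
        self.tokens = min(self.burst, self.tokens + self.rate)


# A flow that offers a 2-flit packet every cycle, shaped to 0.5 flit/cycle
# with a burst allowance of 4 flits (hypothetical parameters).
shaper = TokenBucket(rate=0.5, burst=4)
sent: List[bool] = []
for _ in range(12):
    sent.append(shaper.try_send(2))
    shaper.tick()

print("".join("S" if s else "." for s in sent))  # SS..S...S...
```

Once the initial burst has drained the bucket, the flow settles at one 2-flit packet every four cycles, which is its configured 0.5 flit/cycle, however aggressively it offers traffic. Priority and aging, by contrast, decide the order in which competing requests are served but do not by themselves cap the rate of any single flow.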
Accordingly, the MPPA3 processor is fitted with two global interconnects, respectively identified as "RDMA NoC" and "AXI Fabric" (Figure 2.8). The RDMA NoC is a wormhole-switched network-on-chip designed to terminate two 100 Gbps Ethernet controllers and to carry the remote DMA operations found in supercomputer interconnects or in communication libraries such as SHMEM (Hascoët et al. 2017). The AXI Fabric is a crossbar of busses with round-robin arbiters, which connects the compute clusters, the external DDR memory controllers, the PCIe controllers and other I/O controllers. The main I/O interfaces of the MPPA3 processor are a PCI Express subsystem with 16 Gen1/Gen2/Gen3/Gen4 lanes, for a peak throughput of 32 GB/s full-duplex, and an Ethernet subsystem composed of two controllers of four lanes each, for a total peak throughput of 200 Gbps full-duplex. Additional I/O interfaces are provided by four CAN 2.0A/2.0B/FD controllers and two USB 2.0 OTG ULPI controllers.
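The peak I/O figures quoted above can be checked with a short calculation. This sketch assumes standard PCIe Gen4 signalling (16 GT/s per lane with 128b/130b line coding), which is the generation needed to reach the peak, and ignores protocol overheads such as packet headers; the constant names are illustrative.

```python
# Worked check of the peak I/O figures quoted for the MPPA3.
# Assumption: standard PCIe Gen4 signalling (16 GT/s per lane, 128b/130b
# line coding); protocol overheads (TLP headers, DLLPs) are ignored.

PCIE_GEN4_GTPS_PER_LANE = 16e9        # transfers per second per lane
PCIE_ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line coding
PCIE_LANES = 16

# Bits per second across all lanes, divided by 8 to get bytes per second.
pcie_bytes_per_s = (PCIE_GEN4_GTPS_PER_LANE * PCIE_ENCODING_EFFICIENCY
                    * PCIE_LANES / 8)  # per direction
print(f"PCIe Gen4 x16: {pcie_bytes_per_s / 1e9:.1f} GB/s per direction")

# Ethernet: two controllers of four lanes each, 100 Gbps per controller.
ETH_CONTROLLERS = 2
ETH_LANES_PER_CONTROLLER = 4
ETH_GBPS_PER_CONTROLLER = 100

eth_total_gbps = ETH_CONTROLLERS * ETH_GBPS_PER_CONTROLLER
eth_gbps_per_lane = ETH_GBPS_PER_CONTROLLER / ETH_LANES_PER_CONTROLLER
print(f"Ethernet: {eth_total_gbps} Gbps total, {eth_gbps_per_lane:.0f} Gbps per lane")
```

This prints about 31.5 GB/s for PCIe, which rounds to the 32 GB/s quoted and is available in each direction simultaneously, and 200 Gbps of Ethernet carried as 25 Gbps per lane.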
Figure 2.8. Global interconnects of the MPPA3 processor
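The AXI Fabric resolves contention with round-robin arbiters. The Python sketch below models the generic round-robin policy only: the class `RoundRobinArbiter` and its request-vector interface are hypothetical and do not describe the MPPA3 hardware. After each grant, priority passes to the requester that follows the one just served, so no requester that keeps its request asserted can be starved.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Generic round-robin arbiter (illustrative model, not the MPPA3 RTL).

    After each grant, the search for the next winner starts just after the
    previously granted requester, so every persistent requester is served
    within n arbitration rounds.
    """

    def __init__(self, n: int) -> None:
        if n <= 0:
            raise ValueError("arbiter needs at least one requester")
        self.n = n
        self.last = n - 1  # so that requester 0 has priority on the first round

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the index of the granted requester, or None if nobody requests."""
        if len(requests) != self.n:
            raise ValueError("request vector has the wrong width")
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if requests[candidate]:
                self.last = candidate
                return candidate
        return None  # no request: the priority pointer is left unchanged


# Example: four initiators, with initiators 0, 2 and 3 requesting every cycle.
arb = RoundRobinArbiter(4)
grants = [arb.grant([True, False, True, True]) for _ in range(6)]
print(grants)  # [0, 2, 3, 0, 2, 3]
```

In a crossbar, an arbiter of this kind typically sits at each target port and chooses among the initiators currently requesting that target.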
Based on this global architecture, the consolidation of functions operating at different levels of functional safety and cyber-security is supported by two mechanisms:
– Cores and other bus initiators have their addresses translated from virtual to machine addresses by memory management units (MMUs). These MMUs implement a two-stage translation: first from virtual to physical addresses, as directed by the operating system or the execution environment; then from physical to machine addresses, under the control of a partition monitor running at the hypervisor privilege level. This first mechanism meets the requirements for isolating safety-critical application partitions in multi-core processors (CAST 2016).
– Memory protection units (MPUs) are provided on the AXI Fabric targets to filter transactions based on their machine addresses. Similarly, selected NoC router links can be disabled. This second mechanism is configured at boot time, and its settings cannot be changed afterwards without resetting the processor. Its purpose is to partition the processor and its peripherals into physically isolated domains, as in the unmanned aerial vehicle applications discussed in section 2.2.
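The interplay between two-stage MMU translation and MPU filtering can be modelled in a few lines. The following Python sketch is a simplified illustration that assumes a single 4 KiB page size, flat dictionaries standing in for page tables, and one allowed machine-address window per partition at the MPU. The names `translate`, `mpu_allows` and the table contents are hypothetical and do not reflect the actual MPPA3 page-table formats or MPU programming interface.

```python
from typing import Dict, List, Tuple

PAGE_SIZE = 4096  # assumption: a single 4 KiB page size for both stages


class TranslationFault(Exception):
    """Raised when an address has no mapping in one of the two stages."""


def translate(vaddr: int,
              stage1: Dict[int, int],
              stage2: Dict[int, int]) -> int:
    """Two-stage translation: virtual -> physical -> machine.

    stage1 maps virtual page numbers to physical page numbers (set up by the
    OS of a partition); stage2 maps physical page numbers to machine page
    numbers (set up by the partition monitor at hypervisor level).
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in stage1:
        raise TranslationFault(f"no stage-1 mapping for {vaddr:#x}")
    ppn = stage1[vpn]
    if ppn not in stage2:
        raise TranslationFault(f"no stage-2 mapping for physical page {ppn:#x}")
    return stage2[ppn] * PAGE_SIZE + offset


def mpu_allows(maddr: int, windows: List[Tuple[int, int]]) -> bool:
    """MPU check at an AXI target: accept only machine addresses inside one
    of the [base, limit) windows granted to the initiator's partition."""
    return any(base <= maddr < limit for base, limit in windows)


# Hypothetical partition: virtual page 0x10 -> physical page 0x2 -> machine page 0x80.
stage1 = {0x10: 0x2}
stage2 = {0x2: 0x80}
mpu_windows = [(0x80 * PAGE_SIZE, 0x81 * PAGE_SIZE)]

maddr = translate(0x10 * PAGE_SIZE + 0x34, stage1, stage2)
print(hex(maddr), mpu_allows(maddr, mpu_windows))  # 0x80034 True
```

In this model, a guest operating system can only edit `stage1`, so even a corrupted stage-1 table cannot reach a machine page that the partition monitor has not mapped in `stage2`; the MPU then provides an independent check on the target side.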
2.3.2. Compute cluster
The compute unit of the MPPA3 processor, called the compute cluster, is structured around a local interconnect (Figure 2.9); it comprises a secure zone and a non-secure zone. The secure zone contains a security and safety management core (RM), a 256 KB dedicated memory bank and a cryptographic accelerator. The RM core of each compute cluster is also connected to