Multi-Processor System-on-Chip 1. Liliana Andrade

the total number of cycles on a single RISC-V core is 1.5 × 8.2 = 12.3 Mcycles.

      Table 1.4. Performance data for the CIFAR-10 CNN graph (all values in Mcycles; a dash marks a cell with no reported value)

#      Layer type         ARC EM9D    Processor A    Processor B (RISC-V ISA)
0      Permute            0.01        -              -
1      Convolution        1.63        6.78           -
2      Max Pooling        0.14        0.34           -
3      Convolution        3.46        9.25           -
4      Avg Pooling        0.09        0.09           -
5      Convolution        1.76        4.88           -
6      Avg Pooling        0.07        0.04           -
7      Fully-connected    0.03        0.02           -
8      Fully-connected    0.001       -              -
       Total              7.2         21.4           12.3

      From Table 1.4, we conclude that the ARC EM9D processor spends about 3x fewer cycles than processor A (7.2 versus 21.4 Mcycles) and 1.7x fewer cycles than the RISC-V core, processor B (7.2 versus 12.3 Mcycles), for the same machine learning inference task, without using any specific accelerators. This cycle efficiency allows the ARC EM9D processor to be clocked at a low frequency, which helps to save power in a smart IoT edge device.
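      The arithmetic behind these ratios can be checked with a short sketch. It is an illustration only: the per-layer values are copied from Table 1.4, assuming that the single values listed for layers 0 and 8 belong to the ARC EM9D column. The helper `min_clock_mhz` and the rate of 10 inferences per second are hypothetical and do not come from the text; they only show how a lower cycle count translates into a lower required clock frequency.

```python
# Per-layer cycle counts in Mcycles, copied from Table 1.4.
# Each row is (layer type, ARC EM9D, processor A).
# None marks a layer with no reported value; placing the lone values of
# layers 0 and 8 in the ARC EM9D column is an assumption.
LAYERS = [
    ("Permute",         0.01,  None),
    ("Convolution",     1.63,  6.78),
    ("Max Pooling",     0.14,  0.34),
    ("Convolution",     3.46,  9.25),
    ("Avg Pooling",     0.09,  0.09),
    ("Convolution",     1.76,  4.88),
    ("Avg Pooling",     0.07,  0.04),
    ("Fully-connected", 0.03,  0.02),
    ("Fully-connected", 0.001, None),
]


def column_total(index):
    """Sum one processor column, skipping layers without a value."""
    return sum(row[index] for row in LAYERS if row[index] is not None)


em9d_total = column_total(1)     # ARC EM9D
proc_a_total = column_total(2)   # processor A
proc_b_total = 12.3              # processor B: only the total is reported

# How many times more cycles each processor needs than the ARC EM9D.
ratio_a = proc_a_total / em9d_total
ratio_b = proc_b_total / em9d_total


def min_clock_mhz(mcycles_per_inference, inferences_per_second):
    """Lowest clock (MHz) that sustains the given inference rate.

    Hypothetical helper: Mcycles per inference times inferences per
    second gives Mcycles per second, which is MHz.
    """
    return mcycles_per_inference * inferences_per_second


if __name__ == "__main__":
    print(f"ARC EM9D total:    {em9d_total:.1f} Mcycles")
    print(f"Processor A total: {proc_a_total:.1f} Mcycles")
    print(f"A / EM9D ratio:    {ratio_a:.1f}x")
    print(f"B / EM9D ratio:    {ratio_b:.1f}x")
    # Assumed rate of 10 inferences per second, for illustration only.
    for name, total in (("ARC EM9D", 7.2), ("Processor A", 21.4), ("Processor B", 12.3)):
        print(f"{name}: at least {min_clock_mhz(total, 10):.0f} MHz")
```

      Under the assumed rate, the required clock frequency scales directly with the cycle totals, so the 3x and 1.7x cycle ratios carry over unchanged to the minimum clock frequency.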


