International Graduate Research Symposium - IGRS’2022, İstanbul, Türkiye, 1 - 3 June 2022, pp. 358
With the latest advances in embedded systems, artificial intelligence applications on edge devices have increased. In older systems, data was collected from edge devices and decision making was performed on servers; consequently, low network speed or network outages limited system performance. However, smarter applications can now be developed and run directly on modern embedded systems. System-on-Chip (SoC) architectures, which combine a CPU and a Field Programmable Gate Array (FPGA) on a single chip, offer low power consumption while running Convolutional Neural Networks (CNNs). In this paper, we modified, trained and deployed the TinyYolov3 architecture for face
detection using Brevitas and the FINN framework on the PYNQ-Z2, a low-cost development board built around a Xilinx Zynq-7020 SoC. With Brevitas, which provides quantized versions of the convolution, fully-connected and activation layers of a CNN in PyTorch, we created modified versions of TinyYolov3 with various integer bit precisions for weights (W) and activations (A): 2W4A, 3W5A, 4W2A, 4W4A, 6W4A and 8W3A. We then trained them in quantized form on the WiderFace dataset. To
reduce power consumption and increase speed, we optimized the logic resource allocation and stored the weights and activations in the FPGA's on-chip memory. Additionally, we replaced the Sigmoid activation function of the last layer with a rescaled HardTanh. To run the trained backbone CNN on the
FPGA, we synthesized it with Vitis HLS and Vivado using the FINN-HLS library, which contains C++ layer definitions for the created model. In addition, we utilized the CPU of the SoC for preprocessing, postprocessing and TCP/IP streaming of the results in a multithreaded approach to increase throughput. As a result, with 4W4A bit precision we observed 18 Frames Per Second (FPS)
throughput, 2.4 W total power consumption on the PYNQ-Z2, 70% utilization of the FPGA resources and a 3% Mean Average Precision (mAP) drop in accuracy compared to the non-quantized version of the model.
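The abstract does not spell out the quantization scheme behind precisions such as 4W4A, but they correspond to uniform integer quantization of the kind Brevitas applies during quantization-aware training. As a plain-Python illustration only (not the Brevitas API), a symmetric k-bit weight quantizer can be sketched as follows; the function name and rounding details are our own assumptions:

```python
def quantize_symmetric(values, bit_width):
    """Simulate symmetric uniform quantization to signed `bit_width`-bit
    integers, analogous to a weight precision such as 4W."""
    qmax = 2 ** (bit_width - 1) - 1              # e.g. 7 for 4-bit signed
    scale = (max(abs(v) for v in values) / qmax) or 1.0
    # round to the nearest integer level and clamp to the representable range
    ints = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    # dequantize back to floats, as used when simulating quantized inference
    return [i * scale for i in ints], ints

# Example: 4-bit weights keep only 16 distinct levels.
dequantized, levels = quantize_symmetric([0.5, -1.0, 0.25], 4)
```

Storing such few-bit integers is what lets the weights and activations fit in the FPGA's on-chip memory.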
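The last layer's Sigmoid is replaced by a rescaled HardTanh, but the exact rescaling is not given in the abstract. One plausible form, assumed here, shifts and scales HardTanh's [-1, 1] output onto the [0, 1] range that Sigmoid produced:

```python
def hardtanh(x):
    """Piecewise-linear clamp to [-1, 1]; cheap to realize in FPGA logic."""
    return max(-1.0, min(1.0, x))

def rescaled_hardtanh(x):
    """Map HardTanh's [-1, 1] output onto [0, 1], the range Sigmoid produced,
    so downstream postprocessing can stay unchanged (assumed rescaling)."""
    return 0.5 * (hardtanh(x) + 1.0)
```

Unlike Sigmoid, this activation needs no exponential, only a comparison and a multiply-add, which suits HLS synthesis.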