Modelling and Analysis of FPGA-based MPSoC System with Multiple DNN Accelerators
Deep Neural Networks (DNNs) have been widely applied in many fields for decades, and a standard method for deploying them on embedded systems involves using accelerators. However, due to the resource constraints of embedded systems, improving energy and computing efficiency is one of the research challenges in this domain. DNN model optimization and Neural Architecture Search (NAS) are commonly used to improve the efficiency of DNN models running on an embedded system. However, because the system’s runtime workloads vary in practice, further improving computing efficiency requires real-time hardware and software design space exploration to keep the system in its optimal state at runtime. This paper presents a comprehensive modelling and analysis approach for the performance data (e.g., latency, energy consumption, and accuracy) collected from an AMD-Xilinx heterogeneous MPSoC platform equipped with multiple DNN accelerators. The results demonstrate that accuracy loss, hardware performance, and model size are significantly correlated. Furthermore, an appropriate hardware and software configuration can be obtained by specifying constraints at runtime.
Bayesian Optimization for Efficient Heterogeneous MPSoC based DNN Accelerator Runtime Tuning
With the explosive growth of Internet of Things (IoT) devices and applications, deploying Deep Neural Networks (DNNs) on resource-constrained embedded edge devices has become a popular research trend. Because such systems have limited resources, they must optimise resource utilisation to meet performance requirements. However, in scenarios where the DNN application and workloads change dynamically, offline system optimisation techniques cannot achieve optimal runtime performance in practical environments. Hence, in this PhD project, we propose a Bayesian Optimisation (BO)-based runtime tuning scheme for improving the energy efficiency of a heterogeneous MPSoC-based DNN accelerator in the context of DNN applications. By seeking suitable hardware configurations of the accelerator for dynamic DNN inference workloads ranging from 200 M to 600 M FLOPs (floating-point operations) at runtime, the recommended configuration saves, on average, up to 15.33% of the energy consumed under a random configuration setting.
Application Level Resource Scheduling for Deep Learning Acceleration on MPSoC
Deep Neural Networks (DNNs) have been widely used in many applications, such as self-driving cars, natural language processing (NLP), image classification, and visual object recognition. The Field-Programmable Gate Array (FPGA) based Multiprocessor System on a Chip (MPSoC) has recently become one of the popular choices for deploying DNN models. However, the limited resource capacity of an MPSoC poses a challenge for practical implementation. Recent studies have revealed the trade-off between the “resources consumed” and the “performance achieved”. Taking a cue from these findings, this paper addresses the problem of efficiently implementing deep learning on a resource-constrained MPSoC, where each deep learning network runs at one of several service levels based on resource usage (a higher service level implies higher performance with increased resource consumption). To this end, we propose a heuristic-based strategy, Application Wise Level Selector (AWLS), for selecting service levels to maximize the overall performance subject to a given resource bound. AWLS can achieve higher performance within a constrained resource budget under various simulation scenarios. Further, we verify the proposed strategy using an AMD-Xilinx Zynq UltraScale+ XCZU9EG SoC. Using a framework designed to deploy multiple DNNs on multiple DPUs (Deep Learning Processor Units), we show that the algorithm reaches an optimal solution, which obtains the highest performance (Frames Per Second) using the same resource budget.
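The abstract does not give the details of AWLS, so the following is only a minimal sketch of one generic greedy heuristic for the same kind of problem: choosing one service level per application to raise total performance under a resource budget. The function name `select_levels`, the `(resource, performance)` tuple layout, and the "best gain per added resource" rule are all assumptions made for illustration, not the authors' algorithm.

```python
def select_levels(apps, budget):
    """Greedy service-level selection (illustrative, not the published AWLS).

    apps: one list per application of (resource, performance) tuples,
          ordered from the lowest to the highest service level.
    budget: total resource available.
    Returns the chosen level index per application, or None when even the
    lowest levels do not fit in the budget.
    """
    levels = [0] * len(apps)
    used = sum(app[0][0] for app in apps)
    if used > budget:
        return None
    while True:
        best = None  # (gain per resource, app index, extra resource)
        for i, app in enumerate(apps):
            nxt = levels[i] + 1
            if nxt >= len(app):
                continue  # already at the highest level
            d_res = app[nxt][0] - app[levels[i]][0]
            d_perf = app[nxt][1] - app[levels[i]][1]
            if d_res <= 0 or used + d_res > budget:
                continue  # upgrade is free-of-cost (ignored here) or does not fit
            ratio = d_perf / d_res
            if best is None or ratio > best[0]:
                best = (ratio, i, d_res)
        if best is None:
            return levels  # no further upgrade fits
        _, i, d_res = best
        levels[i] += 1
        used += d_res


# Hypothetical numbers: two applications, resource budget of 5 units.
apps = [[(1, 10), (2, 25), (4, 30)], [(1, 8), (3, 20)]]
print(select_levels(apps, 5))
```

A greedy rule of this kind is cheap to evaluate at runtime but is not guaranteed to be optimal in general; the abstract's optimality claim refers to the authors' own algorithm and experiments.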
Association between fat and fat-free body mass indices on shock attenuation during running.
High amplitudes of shock during running have been thought to be associated with an increased injury risk. This study aimed to quantify the association between body composition, quantified by dual-energy X-ray absorptiometry (DEXA), and shock attenuation across the time and frequency domains. Twenty-four active adults participated. A DEXA scan was performed to quantify the fat and fat-free mass of the whole body, trunk, dominant leg, and viscera. Linear accelerations at the tibia, pelvis, and head were collected whilst participants ran on a treadmill at a fixed dimensionless speed of 1.00 Fr. Shock attenuation indices in the time and frequency domains (lower frequencies: 3-8 Hz; higher frequencies: 9-20 Hz) were calculated. Pearson correlation analysis was performed for all combinations of DEXA and attenuation indices. Regularised regression was performed to predict shock attenuation indices using DEXA variables. A greater power attenuation between the head and pelvis within the higher frequency range was associated with a greater trunk fat-free mass (r = 0.411, p = 0.046), leg fat-free mass (r = 0.524, p = 0.009), and whole-body fat-free mass (r = 0.480, p = 0.018). For power attenuation of the high-frequency component between the pelvis and head, the strongest predictor was visceral fat mass (β = 48.79). Passive and active tissues could represent important anatomical factors aiding in shock attenuation during running. Depending on the type and location of these masses, an increase in mass may benefit injury risk reduction. Our findings could also have implications for injury risk during weight loss programs.
NIRVANA: Non-Invasive Real-Time VulnerAbility ANAlysis for RISC-V Processor
Embedded systems are increasingly susceptible to attack from malicious software, posing a significant threat to critical infrastructure and data. Evidence shows that the nature of such attacks is often unknown. In this manuscript, we propose a novel abnormal behaviour monitoring and detection system based on a self-supervised, hardware-implemented Self-Organizing Map (SOM) algorithm, which continuously monitors the execution status of an embedded program and the behaviour of the platform as a whole. Our design offers low resource utilization, high speed, effectiveness, and broad compatibility, making it suitable for real-time detection of malicious behaviour in resource-constrained embedded systems. Experimental trials were conducted on the Piccolo RISC-V processor prototyped on an FPGA, achieving 96% accuracy in detecting malicious programs at the cost of a marginal 10% increase in resource consumption compared with the unmodified processor.
Monocular 3D Human Pose Markerless Systems for Gait Assessment
Gait analysis plays an important role in the fields of healthcare and sports sciences. Conventional gait analysis relies on costly equipment such as optical motion capture cameras and wearable sensors, some of which require trained assessors for data collection and processing. With the recent developments in computer vision and deep neural networks, using monocular RGB cameras for 3D human pose estimation has shown tremendous promise as a cost-effective and efficient solution for clinical gait analysis. In this paper, a markerless human pose technique is developed using motion captured by a consumer monocular camera (800 × 600 pixels and 30 FPS) for clinical gait analysis. The experimental results show that, on the MoVi dataset, the proposed post-processing algorithm significantly improved the prediction performance of the original human pose detection model (BlazePose), measured against the gold-standard gait signals, by 10.7%. In addition, the predicted T2 score has an excellent correlation with the ground truth (r = 0.99, regression line y = 0.94x + 0.01), which supports our approach as a potential alternative to the conventional marker-based solution for assisting clinical gait assessment.
SIRT1 mediated gastric cancer progression under glucose deprivation through the FoxO1-Rab7-autophagy axis
Purpose: Silent mating type information regulator 2 homolog 1 (SIRT1) and autophagy have a two-way action (promoting cell death or survival) on the progression and treatment of gastric cancer (GC) under different conditions or environments. This study aimed to investigate the effects and underlying mechanism of SIRT1 on autophagy and the malignant biological behavior of GC cells under conditions of glucose deprivation (GD). Materials and methods: Human immortalized gastric mucosal cell GES-1 and GC cell lines SGC-7901, BGC-823, MKN-45 and MKN-28 were utilized. A sugar-free or low-sugar (glucose concentration, 2.5 mmol/L) DMEM medium was used to simulate GD. Additionally, CCK8, colony formation, scratches, transwell, siRNA interference, mRFP-GFP-LC3 adenovirus infection, flow cytometry and western blot assays were performed to investigate the role of SIRT1 in autophagy and malignant biological behaviors (proliferation, migration, invasion, apoptosis and cell cycle) of GC under GD and the underlying mechanism. Results: SGC-7901 cells had the longest tolerance time to GD culture conditions, which had the highest expression of SIRT1 protein and the level of basal autophagy. With the extension of GD time, the autophagy activity in SGC-7901 cells also increased. Under GD conditions, we found a close relationship between SIRT1, FoxO1 and Rab7 in SGC-7901 cells. SIRT1 regulated the activity of FoxO1 and upregulated the expression of Rab7 through deacetylation, which ultimately affected autophagy in GC cells. In addition, changing the expression of FoxO1 provided feedback on the expression of SIRT1 in the cell. Reducing SIRT1, FoxO1 or Rab7 expression significantly inhibited the autophagy levels of GC cells under GD conditions, decreased the tolerance of GC cells to GD, enhanced the inhibition of GD in GC cell proliferation, migration and invasion and increased apoptosis induced by GD. Conclusion: The SIRT1-FoxO1-Rab7 pathway is crucial for the autophagy and malignant biological behaviors of GC cells under GD conditions, which could be a new target for the treatment of GC.
Distributed Joint Source-Channel Coding in Wireless Sensor Networks
Given that sensors are energy-limited and given the wireless channel conditions in wireless sensor networks, there is an urgent need for a low-complexity coding method with a high compression ratio and noise resistance. This paper reviews the progress made in distributed joint source-channel coding, which can address this issue. The main existing deployments of distributed joint source-channel coding, from theory to practice, are introduced for independent channels, multiple access channels, and broadcast channels, respectively. We also present a practical scheme for compressing multiple correlated sources over independent channels. The simulation results demonstrate the desired efficiency.
Expanding Window Compressed Sensing for Non-Uniform Compressible Signals
Many practical compressible signals, such as image signals or the networked data in wireless sensor networks, have a non-uniform support distribution in their sparse representation domain. Utilizing this prior information, this paper proposes a novel compressed sensing (CS) scheme with unequal protection capability by introducing a windowing strategy called expanding window compressed sensing (EW-CS). According to the importance of its different parts, the signal is divided into several nested subsets, i.e., the expanding windows. Each window generates its own measurements using a random sensing matrix. The more significant elements are contained in more windows, so they are captured by more measurements. This design gives the EW-CS scheme a more convenient implementation and better overall recovery quality for non-uniform compressible signals than ordinary CS schemes. These advantages are theoretically analyzed and experimentally confirmed. Moreover, the EW-CS scheme is applied to the compressed acquisition of image signals and networked data, where it also outperforms ordinary CS and the existing unequal protection CS schemes.
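The windowing idea can be illustrated with a minimal sketch, assuming the signal is already ordered from most to least important so that nested windows are simple prefixes. The function names, the Gaussian sensing rows, and the window sizes below are illustrative assumptions rather than the paper's construction, and recovery is not shown.

```python
import random


def ew_measure(signal, boundaries, counts, seed=0):
    """Take counts[k] random measurements over the k-th nested window.

    boundaries: increasing window end indices; window k covers [0, boundaries[k]).
    The last boundary is assumed to equal len(signal).
    Returns (rows, y): the sensing rows and the measurement values.
    """
    n = len(signal)
    assert boundaries == sorted(boundaries) and boundaries[-1] == n
    rng = random.Random(seed)
    rows, y = [], []
    for end, m in zip(boundaries, counts):
        for _ in range(m):
            # A sensing row is nonzero only inside its window.
            row = [rng.gauss(0.0, 1.0) if i < end else 0.0 for i in range(n)]
            rows.append(row)
            y.append(sum(a * x for a, x in zip(row, signal)))
    return rows, y


def coverage(n, boundaries, counts):
    """Number of measurements whose window contains each signal element."""
    return [sum(m for end, m in zip(boundaries, counts) if i < end)
            for i in range(n)]


# Hypothetical setup: 8 elements, three nested windows with 3, 2 and 1 measurements.
rows, y = ew_measure([1.0] * 8, [2, 5, 8], [3, 2, 1])
print(coverage(8, [2, 5, 8], [3, 2, 1]))
```

With these assumed sizes, the two most important elements fall inside all three windows and so contribute to every measurement, while the last three elements are seen only by the measurement taken over the widest window.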