15 research outputs found
Behind the Last Line of Defense -- Surviving SoC Faults and Intrusions
Today, leveraging the enormous modular power, diversity and flexibility of manycore systems-on-a-chip (SoCs) requires careful orchestration of complex resources, a task left to low-level software, e.g., hypervisors. In current architectures, this software forms a single point of failure and a worthwhile target for attacks: once compromised, adversaries gain access to all information and full control over the platform and the environment it controls. This paper proposes Midir, an enhanced manycore architecture, effecting a paradigm shift from SoCs to distributed SoCs. Midir changes the way platform resources are controlled, by retrofitting tile-based fault containment through well-known mechanisms, while securing low-overhead quorum-based consensus on all critical operations, in particular privilege management and, thus, management of containment domains. Allowing versatile redundancy management, Midir promotes resilience for all software levels, including at low level. We explain this architecture, its associated algorithms and hardware mechanisms and show, for the example of a Byzantine fault-tolerant microhypervisor, that it outperforms the highly efficient MinBFT by one order of magnitude.
Aging detection capability for switch-mode power converters
The detection of degradations and resulting failures in electronic components and systems is of paramount importance for complex industrial applications including nuclear power reactors, aerospace, automotive, and space applications. There is increasing acceptance of the importance of detecting failures and degradations in electronic components, and of the prospect that system-level health monitoring can make a key contribution to detecting and predicting impending failures. This paper describes a parametric system identification-based health-monitoring method for detecting aging degradations of passive components in switch-mode power converters (SMPCs). A nonparametric system response is identified by perturbing the system with an optimized multitone sinusoidal signal on the order of millivolts. The parametric system model is estimated from the nonparametric system response using a recursive weighted least-squares (WLS) algorithm. Finally, the power-stage component values, including their parasitics, are extracted from the numerator and denominator coefficients based on the assumed Laplace system model. These extracted component values provide direct diagnostic information on any degradation or anomalies in the components and the system. A proof of concept is initially verified on a simple point-of-load (POL) converter, but the same methodology can be applied to other SMPC topologies.
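As an illustration of the recursive weighted least-squares step mentioned in this abstract, the following is a minimal sketch and not the authors' implementation. The function names, the initial covariance value and the two-parameter toy model are assumptions made only for this example.

```python
def rwls_update(theta, P, x, y, weight=1.0):
    """One recursive weighted least-squares update (hypothetical helper).

    theta: current parameter estimate, P: current covariance matrix,
    x: regressor vector, y: measured output, weight: sample weight.
    """
    n = len(theta)
    # P is kept symmetric, so P @ x also gives the row vector x^T @ P.
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = 1.0 / weight + sum(x[i] * Px[i] for i in range(n))
    gain = [Px[i] / denom for i in range(n)]
    error = y - sum(x[i] * theta[i] for i in range(n))
    new_theta = [theta[i] + gain[i] * error for i in range(n)]
    new_P = [[P[i][j] - gain[i] * Px[j] for j in range(n)] for i in range(n)]
    return new_theta, new_P


def rwls_fit(samples, n_params, p0=1e6):
    """Run the update over (x, y, weight) samples, starting from zero."""
    theta = [0.0] * n_params
    P = [[p0 if i == j else 0.0 for j in range(n_params)] for i in range(n_params)]
    for x, y, weight in samples:
        theta, P = rwls_update(theta, P, x, y, weight)
    return theta


# Toy data generated from y = 2*u + 3 with regressor x = [u, 1].
samples = [([float(u), 1.0], 2.0 * u + 3.0, 1.0) for u in range(5)]
theta = rwls_fit(samples, n_params=2)
```

In the setting the abstract describes, the regressors would come from the measured frequency response and the estimated coefficients would then be mapped to component values; that mapping depends on the assumed converter model and is not shown here.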
Root Cause Analysis Frameworks for Information Systems
Telecommunications systems have evolved to include an ever-growing number of interdependent hardware and software components with complex interactions. This exponential increase in complexity affects the reliability and stability of network systems. This thesis provides two systematic approaches to improve the speed and quality of the Root Cause Analysis task in telecommunications systems.
The first approach introduces a new fault analysis framework based on association rule mining and evaluates it for telecommunication systems. The approach describes a strategy using association rules to specifically target faults while improving runtime performance relative to the standard Apache Spark implementation. It also introduces a novel filtering strategy called Cover Set filtering that prunes and merges rule sets to produce high-quality, concise and interpretable results. The proposed framework is evaluated with real-world telecommunication datasets. Compared with other strategies, we demonstrate a better rule diversity in general and a sufficiently compact fault analysis.
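For readers unfamiliar with association rules, the two basic measures behind any such framework can be sketched as follows. This is a generic illustration only: it is not the thesis's framework, it does not implement Cover Set filtering, and the event names in the toy data are invented.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base > 0 else 0.0


# Hypothetical event logs: each set lists what was observed together.
transactions = [
    {"high_cpu", "fault"},
    {"high_cpu", "link_down", "fault"},
    {"link_down"},
    {"high_cpu"},
]

rule_support = support(transactions, {"high_cpu", "fault"})
rule_confidence = confidence(transactions, {"high_cpu"}, {"fault"})
```

Restricting the consequent to fault items, as in the rule above, is one simple way to target faults specifically; the thesis's own strategy may differ.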
The second approach tackles Root Cause Analysis from the causal perspective. It is based on Counterfactuals and Nearest Neighbour Matching concepts to identify fault types and highlight the most fault contributing variables. The proposed framework is a proof of concept for finding the root cause of problems based on the causal learning technique. It is demonstrated to be highly compatible with numerical data and highly robust with noisy data.
In conclusion, the proposed frameworks improve the quality and performance of fault troubleshooting tasks in telecommunication systems. Last but not least, the proposed frameworks can be adapted to other information systems with minor modifications.
Behind the last line of defense: Surviving SoC faults and intrusions
Today, leveraging the enormous modular power, diversity and flexibility of manycore systems-on-a-chip (SoCs) requires careful orchestration of complex and heterogeneous resources, a task left to low-level software, e.g., hypervisors. In current architectures, this software forms a single point of failure and a worthwhile target for attacks: once compromised, adversaries can gain access to all information and full control over the platform and the environment it controls. This article proposes Midir, an enhanced manycore architecture, effecting a paradigm shift from SoCs to distributed SoCs. Midir changes the way platform resources are controlled, by retrofitting tile-based fault containment through well-known mechanisms, while securing low-overhead quorum-based consensus on all critical operations, in particular privilege management and, thus, management of containment domains. Allowing versatile redundancy management, Midir promotes resilience for all software levels, including at low level. We explain this architecture, its associated algorithms and hardware mechanisms and show, for the example of a Byzantine fault-tolerant microhypervisor, that it outperforms the highly efficient MinBFT by one order of magnitude.
Deep learning driven data analytics for smart grids
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London. As advanced metering infrastructure (AMI) and wide area monitoring systems (WAMSs) are being deployed rapidly and widely, the conventional power grid is transitioning towards the smart grid at an increasing speed. A number of smart metering devices and real-time monitoring systems are capable of generating a huge volume of data on a daily basis. This variety of data can be fully exploited to advance the development of the smart grid through big data analytics, especially deep learning. Thus, the thesis is focused on data analysis for smart grids from three different aspects.
Firstly, a real-time data-driven event detection method is presented, which is robust when dealing with corrupted and significantly noisy data from phasor measurement units (PMUs). Specifically, the presented event detection method is based on a novel combination of random matrix theory (RMT) and Kalman filtering. Furthermore, a dynamic Kalman filtering technique, which adjusts the measurement noise covariance matrix, is proposed as the data conditioner of the presented method in order to condition PMU data. The experimental results show that the presented method is robust in practical situations that include significant levels of noisy or missing PMU data.
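To make the role of the measurement noise covariance concrete, here is a minimal scalar Kalman filter sketch. It assumes a random-walk state model and is not the thesis's dynamic Kalman filtering technique; the function name and the convention of passing `None` for a missing sample are assumptions for illustration.

```python
def kalman_step(x, p, z, q, r):
    """One predict/update cycle of a scalar Kalman filter (illustrative only).

    x, p: previous state estimate and its variance
    z:    new measurement, or None if the sample is missing
    q:    process noise variance (random-walk model assumed)
    r:    measurement noise variance; a caller may raise it for suspect samples
    """
    p_pred = p + q                # predict: the state is assumed to persist
    if z is None:                 # missing measurement: keep the prediction
        return x, p_pred
    k = p_pred / (p_pred + r)     # Kalman gain shrinks as r grows
    return x + k * (z - x), (1.0 - k) * p_pred


# A larger r makes the filter trust the same measurement less.
trusting = kalman_step(0.0, 1.0, 2.0, q=0.0, r=1.0)
sceptical = kalman_step(0.0, 1.0, 2.0, q=0.0, r=9.0)
skipped = kalman_step(0.0, 1.0, None, q=0.5, r=1.0)
```

Increasing `r` for a noisy or corrupted sample reduces the gain, so that sample moves the estimate less. This is the general mechanism by which adjusting the measurement noise covariance can condition the data.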
Secondly, a short-term residential load forecasting method is proposed on the basis of deep learning and k-means clustering, which is capable of extracting similarity among residential loads effectively and performing accurate prediction at the individual residential level. Specifically, it uses k-means clustering to extract similarity among residential loads and deep learning to extract complex patterns of residential load. In addition, in order to improve the forecasting accuracy, a comprehensive feature expression strategy is utilised to describe the load characteristics of each time step in detail. The experimental results suggest that the proposed method can achieve a high forecasting accuracy in terms of both root mean square error (RMSE) and mean absolute error (MAE).
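The two accuracy measures named in this abstract have standard definitions, sketched below for reference. The function names and the sample load values are illustrative and are not taken from the thesis.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up load values (e.g. in kW) purely to exercise the functions.
actual_load = [1.0, 2.0, 3.0]
forecast = [1.0, 2.0, 5.0]
example_rmse = rmse(actual_load, forecast)
example_mae = mae(actual_load, forecast)
```

RMSE penalises large individual errors more heavily than MAE, which is why the two are commonly reported together.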
Thirdly, an online individual residential load forecasting method is developed based on a combination of deep learning and dynamic mirror descent (DMD), which is able to predict residential load in real time and adjust the prediction error over time in order to improve the prediction performance. More specifically, it first employs a long short-term memory (LSTM) network to build a prediction model offline, and then applies it online with DMD correcting the prediction error. In order to increase the prediction accuracy, a comprehensive feature expression strategy is used to describe load characteristics at each time step in detail. The experimental results indicate that the developed method can obtain a high prediction accuracy in terms of both RMSE and MAE.
To sum up, the proposed real-time event detection method contributes to the monitoring and operation of smart grids, while the proposed residential load forecasting methods contribute to the demand side response in smart grids.
30th International Conference on Condition Monitoring and Diagnostic Engineering Management (COMADEM 2017)
Proceedings of COMADEM 201
Architectural Support for Hypervisor-Level Intrusion Tolerance in MPSoCs
Increasingly, more aspects of our lives rely on the correctness and safety of computing systems, namely in the embedded and cyber-physical (CPS) domains, which directly affect the physical world. While systems have been pushed to their limits of functionality and efficiency, security threats and generic hardware quality have challenged their safety.
Leveraging the enormous modular power, diversity and flexibility of these systems, often deployed in multi-processor systems-on-chip (MPSoC), requires careful orchestration of complex and heterogeneous resources, a task left to low-level software, e.g., hypervisors. In current architectures, this software forms a single point of failure (SPoF) and a worthwhile target for attacks: once compromised, adversaries can gain access to all information and full control over the platform and the environment it controls, for instance by means of privilege escalation and resource allocation. Currently, solutions to protect low-level software often rely on a simpler, underlying trusted layer which is often a SPoF itself and/or exhibits downgraded performance.
Architectural hybridization allows for the introduction of trusted-trustworthy components, which, combined with fault and intrusion tolerance (FIT) techniques leveraging replication, are capable of safely handling critical operations, thus eliminating SPoFs. Performing quorum-based consensus on all critical operations, in particular privilege management, ensures no compromised low-level software can single-handedly manipulate privilege escalation or resource allocation to negatively affect other system resources by propagating faults or further extend an adversary’s control. However, the performance impact of traditional Byzantine fault-tolerant state-machine replication (BFT-SMR) protocols is prohibitive in the context of MPSoCs due to the high costs of cryptographic operations and the quantity of messages exchanged. Furthermore, fault isolation, one of the key prerequisites in FIT, presents a complicated challenge to tackle, given the whole system resides within one chip in such platforms.
There is so far no solution completely and efficiently addressing the SPoF issue in critical low-level management software. It is our aim, then, to devise such a solution that, additionally, reaps the benefits of the tightly coupled nature of such manycore systems. In this thesis we present two architectures, using trusted-trustworthy mechanisms and consensus protocols, capable of protecting all software layers, specifically at low level, by performing critical operations only when a majority of correct replicas agree to their execution: iBFT and Midir. Moreover, we discuss ways in which these can be used at application level on the example of replicated applications sharing critical data structures. It then becomes possible to confine software-level faults and some hardware faults to the individual tiles of an MPSoC, converting tiles into fault containment domains, thus enabling fault isolation and, consequently, making way for high-performance FIT at the lowest level.
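The idea of executing a critical operation only when enough replicas agree can be sketched in a few lines. This is a toy voting check, not the iBFT or Midir protocol: the function name, the operation strings and the quorum size are assumptions, and real protocols also authenticate votes and order requests.

```python
from collections import Counter


def quorum_decision(votes, quorum):
    """Return the operation backed by at least `quorum` votes, else None.

    votes:  one proposed operation per replica (hypothetical string labels)
    quorum: required number of matching votes; assumed to exceed half the
            replicas so that at most one operation can reach it
    """
    if not votes:
        return None
    if 2 * quorum <= len(votes):
        raise ValueError("quorum must be larger than half the number of votes")
    operation, count = Counter(votes).most_common(1)[0]
    return operation if count >= quorum else None


# Three replicas, one of which disagrees (for example, because it is compromised).
agreed = quorum_decision(["grant_page_42", "grant_page_42", "grant_all"], quorum=2)
# No operation reaches the quorum, so nothing is executed.
blocked = quorum_decision(["grant_page_42", "grant_all", "revoke_all"], quorum=2)
```

With this check in front of privilege changes, a single faulty replica cannot push through an operation on its own, which is the property the abstract attributes to quorum-based consensus.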