    Overhead Management Strategies for Internet of Things Devices

    Overhead (time and energy) management is paramount for Internet of Things (IoT) edge devices, given their typically resource-constrained nature. In this thesis we present two contributions for lowering the resource consumption of IoT devices. The first contribution is minimizing the overhead of the Transport Layer Security (TLS) authentication protocol in IoT networks by selecting a lightweight cipher suite configuration. TLS is the de facto authentication protocol for secure communication in IoT applications; however, its processing and energy demands are the two essential parameters that must be taken into account given the resource-constrained nature of IoT devices. For the first contribution, we study these parameters using a testbed in which an IoT board (Cypress CYW43907) communicates with a server over an 802.11 wireless link. Although TLS supports a wide array of cipher suites, we focus on DHE-RSA, ECDHE-RSA, and ECDHE-ECDSA, which are among the most popular ciphers due to their robustness. Our studies show that ciphers using Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) key exchange are considerably more efficient than those using Diffie-Hellman Ephemeral (DHE). Furthermore, with ECDHE key exchange, ECDSA signature verification consumes more time and energy than RSA signature verification. This study helps IoT designers choose an appropriate TLS cipher suite based on application demands, computational capabilities, and available energy resources.

    The second contribution of this thesis is deploying supervised machine learning anomaly detection algorithms on an IoT edge device to reduce data transmission overhead and cloud storage requirements. With continuous monitoring and sensing, millions of IoT sensors all over the world generate tremendous amounts of data every minute. As a result, recent studies have begun to ask whether to send all sensing data directly to the cloud (i.e., direct transmission) or to preprocess the data at the network edge and send only the necessary data to the cloud (i.e., preprocessing at the edge). Anomaly detection is particularly useful as an edge mining technique for reducing transmission overhead when the frequently monitored activities contain only a sparse set of anomalies. We analyze the potential overhead savings of machine-learning-based anomaly detection models on the edge in three different IoT scenarios. Our experimental results show that, by choosing appropriate anomaly detection models, we can effectively reduce the total transmission energy as well as the required cloud storage. We show that Random Forest, Multilayer Perceptron, and Discriminant Analysis models can viably save time and energy on the edge device during data transmission, whereas K-Nearest Neighbors, although reliable in terms of prediction accuracy, demands exorbitant overhead and results in a net time and energy loss on the edge device. In addition to presenting our model results for the different IoT scenarios, we provide guidelines for model selection through an analysis of the tradeoffs involved, such as training overhead, prediction overhead, and classification accuracy.
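    As an illustration of the first contribution's cipher-suite selection, the sketch below pins a TLS client to a single ECDHE-RSA suite. This is a minimal sketch in desktop Python, not the embedded TLS stack used on the Cypress CYW43907 board, and the host name is a placeholder.

```python
import socket
import ssl

# Minimal sketch: restrict the client to one TLS 1.2 cipher suite so the
# handshake exercises ECDHE key exchange with RSA signatures, the combination
# the study found most efficient. HOST is a placeholder.
HOST = "example.com"

context = ssl.create_default_context()
context.maximum_version = ssl.TLSVersion.TLSv1_2  # set_ciphers() targets TLS 1.2 suites
context.set_ciphers("ECDHE-RSA-AES128-GCM-SHA256")

with socket.create_connection((HOST, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        print("Negotiated cipher:", tls.cipher())
```

    For the second contribution, the prediction-overhead gap between Random Forest and K-Nearest Neighbors can be reproduced in miniature with scikit-learn; the data below is a synthetic stand-in for the thesis's IoT sensor traces.

```python
from time import perf_counter

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for an IoT sensor stream.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

for model in (RandomForestClassifier(n_estimators=50, random_state=0),
              KNeighborsClassifier(n_neighbors=5)):
    model.fit(X, y)
    start = perf_counter()
    model.predict(X)  # per-sample prediction cost drives the edge time/energy budget
    print(f"{type(model).__name__}: {perf_counter() - start:.3f} s")
```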

    Inferring Anomalies from Data using Bayesian Networks

    Existing studies in data mining have largely focused on the design of measures and algorithms to identify outliers in large, high-dimensional categorical and numeric databases. However, little attention has been paid to the interestingness of the reported outliers. One way to ascertain the interestingness and usefulness of a reported outlier is to make use of domain knowledge. In this thesis, we present measures to discover outliers based on background knowledge represented by a Bayesian network. Using the causal relationships between attributes encoded in the Bayesian framework, we demonstrate that meaningful outliers, i.e., outliers which encode important or new information, are those which violate the causal relationships encoded in the model. Depending upon the nature of the data, several approaches are proposed to identify and explain anomalies using Bayesian knowledge. Outliers are often identified as data points which are "rare", "isolated", or "far away from their nearest neighbors". We show that these characteristics may not be an accurate way of describing interesting outliers. Through a critical analysis of several existing outlier detection techniques, we show why there is a mismatch between outliers as entities described by these characteristics and "real" outliers as identified using the Bayesian approach. We show that the Bayesian approaches presented in this thesis have better accuracy in mining genuine outliers while keeping a lower false-positive rate than traditional outlier detection techniques.
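    As a toy illustration of the core idea (a hand-specified two-node network, not the thesis's actual measures), records can be scored by their joint probability under a Bayesian network; points that violate the encoded causal relationship receive a very low likelihood.

```python
import pandas as pd

# Hand-specified Bayesian network: Rain -> WetGrass (illustrative numbers).
p_rain = {0: 0.8, 1: 0.2}                      # P(Rain)
p_wet_given_rain = {(0, 0): 0.9, (0, 1): 0.1,  # P(WetGrass | Rain)
                    (1, 0): 0.05, (1, 1): 0.95}

data = pd.DataFrame({"rain": [0, 1, 0, 1], "wet": [0, 1, 1, 0]})

def joint_prob(row):
    # Joint probability under the network: P(Rain) * P(WetGrass | Rain).
    return p_rain[row.rain] * p_wet_given_rain[(row.rain, row.wet)]

data["likelihood"] = data.apply(joint_prob, axis=1)
# The record (rain=1, wet=0) contradicts the causal link and scores lowest.
print(data[data["likelihood"] < 0.05])
```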

    Oil and Gas flow Anomaly Detection on offshore naturally flowing wells using Deep Neural Networks

    Dissertation presented as partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. The Oil and Gas industry, as never before, faces multiple challenges. It is criticized as dirty and polluting, hence the growing demand for green alternatives. Nevertheless, the world still has to rely heavily on hydrocarbons, since they are the most traditional and stable source of energy, as opposed to the extensively promoted hydro, solar, and wind power. Major operators are challenged to produce oil more efficiently to counteract the newly arising energy sources, with a smaller climate footprint and more closely scrutinized expenditure, while facing high skepticism regarding the industry's future. While most of the tools used by the hydrocarbon E&P industry are expensive and have been in use for many years, it is paramount for the industry's survival and prosperity to apply predictive maintenance technologies that foresee potential failures, making production safer, lowering downtime, increasing productivity, and diminishing maintenance costs. Many efforts have been made to define the most accurate and effective predictive methods; however, data scarcity limits the speed and capacity for further experimentation. While it would be highly beneficial for the industry to invest in Artificial Intelligence, this research aims at exploring, in depth, the subject of Anomaly Detection, using open public data from Petrobras that was prepared by experts. For this research, deep neural networks, namely Recurrent Neural Networks with LSTM and GRU backbones, were implemented for multi-class classification of undesirable events on naturally flowing wells. Further, several hyperparameter optimization tools were explored, mainly focusing on Genetic Algorithms, among the most advanced methods for such tasks. The research concluded with the best-performing algorithm using 2 stacked GRU layers and the hyperparameter vector [1, 47, 40, 14], which stands for timestep 1, 47 hidden units, 40 epochs, and batch size 14, producing an F1 score of 0.97. As the world faces many issues, one of which is the detrimental effect of heavy industry on the environment and, as a result, adverse global climate change, this project is an attempt to contribute to the field of applying Artificial Intelligence in the Oil and Gas industry, with the intention of making it more efficient, transparent, and sustainable.
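    A minimal Keras sketch of the reported best configuration (2 stacked GRU layers, 47 hidden units, timestep 1, 40 epochs, batch size 14) follows; the feature and class counts are assumed placeholders rather than values taken from the Petrobras data.

```python
import tensorflow as tf
from tensorflow.keras import layers

TIMESTEPS, N_FEATURES, N_CLASSES = 1, 8, 9  # assumed placeholder dimensions

model = tf.keras.Sequential([
    layers.GRU(47, return_sequences=True,
               input_shape=(TIMESTEPS, N_FEATURES)),  # 47 hidden units
    layers.GRU(47),                                   # second stacked GRU
    layers.Dense(N_CLASSES, activation="softmax"),    # multi-class output
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Reported training settings: model.fit(X, y, epochs=40, batch_size=14)
model.summary()
```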

    ServeNet: A Deep Neural Network for Web Services Classification

    Automated service classification plays a crucial role in service discovery, selection, and composition. Machine learning has been widely used for service classification in recent years. However, the performance of conventional machine learning methods depends heavily on the quality of manual feature engineering. In this paper, we present a novel deep neural network that automatically abstracts low-level representations of both the service name and the service description into high-level merged features, without feature engineering or input-length limitations, and then predicts the service classification across 50 service categories. To demonstrate the effectiveness of our approach, we conduct a comprehensive experimental study comparing 10 machine learning methods on 10,000 real-world web services. The results show that the proposed deep neural network achieves higher classification accuracy and greater robustness than the other machine learning methods.
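    The two-input idea, embedding the service name and description separately and merging them before classification, can be sketched as follows. This is an illustrative Keras model under assumed vocabulary and sequence-length parameters, not the published ServeNet architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Assumed sizes; only the 50-category output comes from the abstract.
VOCAB, NAME_LEN, DESC_LEN, N_CATEGORIES = 20000, 10, 200, 50

name_in = layers.Input(shape=(NAME_LEN,), name="service_name")
desc_in = layers.Input(shape=(DESC_LEN,), name="service_description")

embed = layers.Embedding(VOCAB, 128)                        # shared embedding
name_feat = layers.GlobalAveragePooling1D()(embed(name_in))
desc_feat = layers.Bidirectional(layers.LSTM(64))(embed(desc_in))

merged = layers.concatenate([name_feat, desc_feat])         # merged features
output = layers.Dense(N_CATEGORIES, activation="softmax")(merged)

model = tf.keras.Model(inputs=[name_in, desc_in], outputs=output)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```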

    Meta-survey on outlier and anomaly detection

    The impact of outliers and anomalies on model estimation and data processing is of paramount importance, as evidenced by the extensive body of research spanning various fields over several decades: thousands of research papers have been published on the subject. As a consequence, numerous reviews, surveys, and textbooks have sought to summarize the existing literature, encompassing a wide range of methods from both the statistical and data mining communities. While these endeavors to organize and summarize the research are invaluable, they face inherent challenges due to the pervasive nature of outliers and anomalies in all data-intensive applications, irrespective of the specific application field or scientific discipline; the collection of papers thus remains voluminous and somewhat heterogeneous. To address the need for knowledge organization in this domain, this paper presents the first systematic meta-survey of general surveys and reviews on outlier and anomaly detection. Employing a classical systematic survey approach, the study collects nearly 500 papers using two specialized scientific search engines. From this comprehensive collection, a subset of 56 papers that claim to be general surveys on outlier detection is selected using a snowball search technique to enhance field coverage. A meticulous quality assessment phase further refines the selection to a subset of 25 high-quality general surveys. Using this curated collection, the paper investigates the evolution of the outlier detection field over a 20-year period, revealing emerging themes and methods. Furthermore, an analysis of the surveys sheds light on the survey writing practices adopted by scholars from different communities who have contributed to this field. Finally, the paper delves into several topics where consensus has emerged from the literature, including taxonomies of outlier types, challenges posed by high-dimensional data, the importance of anomaly scores, the impact of learning conditions, difficulties in benchmarking, and the significance of neural networks. Aspects on which no consensus has emerged are also discussed, particularly the distinction between local and global outliers and the challenges in organizing detection methods into meaningful taxonomies.

    D-AREdevil: a novel approach for discovering disease-associated rare cell populations in mass cytometry data

    Background: Advances in single-cell technologies such as mass cytometry provide increasing resolution of the complexity of cellular samples, allowing researchers to investigate and understand cellular heterogeneity more deeply and possibly to detect and discover previously undetectable rare cell populations. The identification of rare cell populations is of paramount importance for understanding the onset, progression, and pathogenesis of many diseases. However, their identification remains challenging due to the ever-increasing dimensionality and throughput of the data generated. Aim: This study aimed at implementing a straightforward approach that efficiently supports a data analyst in identifying disease-associated rare cell populations in large, complex biological samples within reasonable limits of time and computational infrastructure. Methods: We propose a novel computational framework called D-AREdevil (disease-associated rare cell detection) for cytometry datasets. The main characteristic of our computational framework is the combination of an anomaly detection algorithm (i.e., LOF or FiRE) that provides a continuous score for individual cells with one of the best-performing and fastest unsupervised clustering methods (i.e., FlowSOM). In our approach, the LOF score serves to select a set of candidate cells belonging to one or more subgroups of similar rare cell populations. We then test these subgroups of rare cells for association with a patient group, disease type, clinical outcome, or other characteristic of interest. Results: We report the properties and implementation of D-AREdevil and present an evaluation of its performance and applications on three different testing datasets based on mass cytometry data. We generated data mixed with one or more known rare cell populations at varying frequencies (below 1%) and tested the ability of our approach to identify those cells and bring them to the attention of the data analyst. This is a key step in finding cell subgroups associated with a disease or outcome of interest when their existence is not known in advance and has yet to be discovered. Conclusions: We propose a novel computational framework with demonstrated good sensitivity and precision in detecting target rare cell populations present at very low frequencies (<1%) in the total dataset.
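    The two-step idea, anomaly scoring followed by clustering of the top-scoring cells, can be sketched with scikit-learn as below. FlowSOM has no scikit-learn equivalent, so KMeans stands in for the clustering step, and the data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in: abundant populations plus one rare population (<1%).
rng = np.random.default_rng(0)
cells = np.vstack([rng.normal(0.0, 1.0, size=(5000, 10)),
                   rng.normal(6.0, 0.3, size=(30, 10))])

lof = LocalOutlierFactor(n_neighbors=50)
lof.fit(cells)
scores = -lof.negative_outlier_factor_       # higher score = more anomalous

# Keep the top 1% of cells as rare-population candidates, then subgroup them.
candidates = cells[scores > np.quantile(scores, 0.99)]
labels = KMeans(n_clusters=3, n_init=10).fit_predict(candidates)
print(len(candidates), "candidate cells in", len(set(labels)), "subgroups")
```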

    Neuromorphic Learning Systems for Supervised and Unsupervised Applications

    The advancements in high performance computing (HPC) have enabled the large-scale implementation of neuromorphic learning models and pushed research on computational intelligence into a new era. These bio-inspired models are constructed from unified building blocks, i.e., neurons, and have shown potential for learning complex information. Two major challenges remain in neuromorphic computing. First, sophisticated structuring methods are needed to determine the connectivity of the neurons in order to model various problems accurately. Second, the models need to adapt to non-traditional architectures for improved computation speed and energy efficiency. In this thesis, we address these two problems and apply our techniques to different cognitive applications. The thesis first presents the self-structured confabulation network for anomaly detection. Among machine learning applications, unsupervised detection of anomalous streams is especially challenging because it requires both detection accuracy and real-time performance. Designing a computing framework that harnesses the growing computing power of multicore systems while maintaining high sensitivity and specificity to anomalies is an urgent research need. We present AnRAD (Anomaly Recognition And Detection), a bio-inspired detection framework that performs probabilistic inferences. We leverage the mutual information between features and develop a self-structuring procedure that learns a succinct confabulation network from unlabeled data. This network is capable of fast incremental learning, continuously refining its knowledge base from the data streams. Compared to several existing anomaly detection methods, the proposed approach provides competitive detection accuracy as well as insight into the reasoning behind its decisions. Furthermore, we exploit the massively parallel structure of the AnRAD framework. Our implementations of the recall algorithms on the graphics processing unit (GPU) and the Xeon Phi co-processor both obtain substantial speedups over the sequential implementation on a general-purpose microprocessor (GPP). The implementation enables real-time service to concurrent data streams with diversified contexts and can be applied to large problems with multiple local patterns. Experimental results demonstrate high computing performance and memory efficiency. For vehicle abnormal behavior detection, the framework is able to monitor up to 16,000 vehicles and their interactions in real time with a single commodity co-processor, using less than 0.2 ms per test subject. When adapting our streaming anomaly detection model to mobile devices or unmanned systems, the key challenge is delivering the required performance under stringent power constraints. To address the paradox between performance and power consumption, brain-inspired hardware, such as the IBM Neurosynaptic System, has been developed to enable low-power implementations of neural models. As a follow-up to the AnRAD framework, we propose porting the detection network to the TrueNorth architecture. Implementing inference-based anomaly detection on a neurosynaptic processor is not straightforward due to hardware limitations. A design flow and a supporting component library are developed to flexibly map the learned detection networks to the neurosynaptic cores. Instead of the popular rate code, a burst code is adopted in the design, which represents a numerical value using the phase of a burst of spike trains. This not only reduces hardware complexity but also increases the accuracy of the results. A Corelet library, NeoInfer-TN, is implemented for basic operations in burst code, and two-phase pipelines are constructed from the library components. The design can be configured for different tradeoffs between detection accuracy, hardware resource consumption, throughput, and energy. We evaluate the system using network intrusion detection data streams. The results show a higher detection rate than some conventional approaches and real-time performance, with only 50 mW power consumption; overall, it achieves 10^8 operations per joule. In addition to the modeling and implementation of unsupervised anomaly detection, we also investigate a supervised learning model based on neural networks and deep fragment embedding and apply it to text-image retrieval. The study aims at bridging the gap between images and natural language and improving bidirectional retrieval performance across the two modalities. Unlike existing works that target a single sentence densely describing the image objects, we elevate the topic to associating deep image representations with noisy texts that are only loosely correlated. Based on text-image fragment embedding, our model employs a sequential configuration that connects two embedding stages: the first stage learns the relevancy of the text fragments, and the second stage uses the filtered output from the first to improve the matching results. The model also integrates multiple convolutional neural networks (CNNs) to construct the image fragments, from which rich context information such as human faces can be extracted to increase alignment accuracy. The proposed method is evaluated with both a synthetic dataset and a real-world dataset collected from a picture-news website. The results show up to 50% ranking performance improvement over the comparison models.
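    One plausible reading of the burst code mentioned above, sketched in Python: a value is represented by the phase (start tick) at which a short burst of spikes begins within a fixed time window. The window and burst lengths are illustrative assumptions, not details of the NeoInfer-TN Corelet library.

```python
import numpy as np

NUM_TICKS, BURST_LEN = 16, 4  # assumed window and burst sizes

def burst_encode(value):
    """Encode a value in [0, 1] as the start phase of a spike burst."""
    start = int(round(value * (NUM_TICKS - BURST_LEN)))
    spikes = np.zeros(NUM_TICKS, dtype=int)
    spikes[start:start + BURST_LEN] = 1
    return spikes

def burst_decode(spikes):
    """Recover the value from the first tick at which the burst fires."""
    start = int(np.argmax(spikes))
    return start / (NUM_TICKS - BURST_LEN)

spikes = burst_encode(0.6)
print(spikes, "->", round(burst_decode(spikes), 2))  # quantized to 1/12 steps
```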
    • 

    corecore