
    Real-Time Intelligence

    Master's dissertation in Computer Science. Over the past 20 years, the amount of data has grown at a large scale across many fields. This explosive increase in global data led to the coining of the term Big Data, which is mainly used to describe enormous datasets that typically include masses of unstructured data and may require real-time analysis. This paradigm brings important challenges to tasks such as data acquisition, storage, and analysis. The ability to perform these tasks efficiently has attracted the attention of researchers, as it opens many opportunities for creating new value. Another topic of growing importance is the use of biometrics, which have been applied in a wide range of areas such as healthcare and security. This work handles the data pipeline of a large-scale biometrics application, providing a basis for real-time analytics and behavioural classification. The challenges regarding analytical queries (with real-time requirements, due to the need to monitor metrics and behaviour) and classifier training are particularly addressed.

    Over the last 20 years, the amount of data stored and available for processing has been increasing across very diverse areas. This explosive growth, together with the possibilities it opens, led to the emergence of the term Big Data. Big Data essentially covers large volumes of data, possibly with little structure and requiring real-time processing. These characteristics have raised challenges in the various tasks of the typical data-processing pipeline, such as acquisition, storage, and analysis. The ability to perform these tasks efficiently has been studied by both industry and academia, opening doors to the creation of value. Another area of notable evolution is the use of behavioural biometrics, which has become increasingly common in scenarios such as healthcare and security. One of the goals of this work is to manage the data-processing pipeline of a large-scale application in the area of behavioural biometrics, so as to enable real-time metrics over the data (making its monitoring viable) and the automatic, large-scale classification of records of fatigue in human-computer interaction.

    Collective Contextual Anomaly Detection for Building Energy Consumption

    Commercial and residential buildings are responsible for a substantial portion of total global energy consumption and, as a result, make a significant contribution to global carbon emissions. Hence, energy-saving goals that target buildings can have a major impact in reducing environmental damage. During building operation, a significant amount of energy is wasted due to equipment and human-related faults. To reduce waste, today's smart buildings monitor energy usage with the aim of identifying abnormal consumption behaviour and notifying the building manager to implement appropriate energy-saving procedures. To this end, this research proposes the ensemble anomaly detection (EAD) framework. The EAD is a generic framework that combines several anomaly detection classifiers using majority voting. These anomaly detection classifiers are formed using existing machine learning algorithms, and each is assumed to have equal weight. More importantly, to ensure diversity of anomaly classifiers, the EAD is implemented by combining pattern-based and prediction-based anomaly classifiers. For this reason, this research also proposes a new pattern-based anomaly classifier, the collective contextual anomaly detection using sliding window (CCAD-SW) framework. The CCAD-SW is also a machine learning-based framework; it identifies anomalous consumption patterns using overlapping sliding windows. The EAD framework combines the CCAD-SW, which is implemented using an autoencoder, with two prediction-based anomaly classifiers that are implemented using the support vector regression and random forest machine-learning algorithms. In addition, it determines an ensemble threshold that yields an anomaly classifier with optimal anomaly detection capability and minimal false positives. Results show that the EAD performs better than the individual anomaly detection classifiers. In the EAD framework, the optimal ensemble anomaly classifier is not attained by combining the individual learners at their respective optimal performance levels. Instead, an ensemble threshold combination that yields the optimal anomaly classifier was identified by searching through the ensemble threshold space. The research was evaluated using real-world data provided by Powersmiths, located in Brampton, Ontario, Canada.
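
    The thesis code is not included here; as a hedged sketch only, the Python fragment below illustrates the equal-weight majority-voting idea described above, combining a pattern-based classifier (a small reconstruction-error model standing in for the autoencoder) with two prediction-based classifiers (support vector regression and random forest). The data layout (consumption windows X_windows, following readings y_next) and the per-classifier thresholds are assumptions, not the author's implementation.

        # Hypothetical majority-voting ensemble sketch (not the EAD implementation).
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.neural_network import MLPRegressor
        from sklearn.svm import SVR

        def anomaly_votes(scores, threshold):
            """Convert continuous anomaly scores into 0/1 anomaly votes."""
            return (scores > threshold).astype(int)

        def ensemble_detect(X_windows, y_next, thresholds, min_votes=2):
            """Equal-weight majority voting over three anomaly classifiers.

            X_windows  : consumption windows, shape (n_samples, window_len)
            y_next     : observed reading following each window, shape (n_samples,)
            thresholds : per-classifier anomaly-score thresholds,
                         e.g. {"ae": 0.5, "svr": 2.0, "rf": 2.0}
            """
            # Pattern-based classifier: an MLP trained to reconstruct its input
            # stands in for the autoencoder; the score is the reconstruction error.
            ae = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
            ae.fit(X_windows, X_windows)
            ae_err = np.mean((ae.predict(X_windows) - X_windows) ** 2, axis=1)

            # Prediction-based classifiers: the score is the absolute difference
            # between the predicted and the observed consumption.
            svr = SVR().fit(X_windows, y_next)
            rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_windows, y_next)
            svr_err = np.abs(svr.predict(X_windows) - y_next)
            rf_err = np.abs(rf.predict(X_windows) - y_next)

            votes = (anomaly_votes(ae_err, thresholds["ae"])
                     + anomaly_votes(svr_err, thresholds["svr"])
                     + anomaly_votes(rf_err, thresholds["rf"]))
            return votes >= min_votes  # True where the majority flags an anomaly

    In this reading, the ensemble-threshold search described in the abstract would sweep the thresholds dictionary rather than fixing each classifier at its individually optimal operating point.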

    An ensemble learning framework for anomaly detection in building energy consumption

    During building operation, a significant amount of energy is wasted due to equipment and human-related faults. To reduce waste, today's smart buildings monitor energy usage with the aim of identifying abnormal consumption behaviour and notifying the building manager to implement appropriate energy-saving procedures. To this end, this research proposes a new pattern-based anomaly classifier, the collective contextual anomaly detection using sliding window (CCAD-SW) framework. The CCAD-SW framework identifies anomalous consumption patterns using overlapping sliding windows. To enhance the anomaly detection capacity of the CCAD-SW, this research also proposes the ensemble anomaly detection (EAD) framework. The EAD is a generic framework that combines several anomaly detection classifiers using majority voting. To ensure diversity of anomaly classifiers, the EAD is implemented by combining pattern-based (e.g., CCAD-SW) and prediction-based anomaly classifiers. The research was evaluated using real-world data provided by Powersmiths, located in Brampton, Ontario, Canada. Results show that the EAD framework improved the sensitivity of the CCAD-SW by 3.6% and reduced the false alarm rate by 2.7%.
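
    The construction of the overlapping windows is not spelled out in the abstract; a minimal sketch, assuming an evenly sampled consumption series and hypothetical window and stride lengths:

        import numpy as np

        def sliding_windows(series, window_len, stride):
            """Split a 1-D consumption series into overlapping windows.

            With stride < window_len consecutive windows overlap, so each
            reading is scored in several contexts before votes are combined.
            """
            series = np.asarray(series, dtype=float)
            starts = range(0, len(series) - window_len + 1, stride)
            return np.stack([series[s:s + window_len] for s in starts])

        # Example: hourly readings, 24-hour windows advancing 6 hours at a time.
        hourly = np.random.default_rng(0).normal(10.0, 2.0, size=7 * 24)
        windows = sliding_windows(hourly, window_len=24, stride=6)
        print(windows.shape)  # (25, 24)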

    ExoGAN: Retrieving Exoplanetary Atmospheres Using Deep Convolutional Generative Adversarial Networks

    Atmospheric retrievals on exoplanets usually involve computationally intensive Bayesian sampling methods. Large parameter spaces and increasingly complex atmospheric models create a computational bottleneck, forcing a trade-off between statistical sampling accuracy and model complexity. This is especially true for upcoming JWST and ARIEL observations. We introduce ExoGAN, the Exoplanet Generative Adversarial Network, a new deep learning algorithm able to recognise molecular features, atmospheric trace-gas abundances and planetary parameters using unsupervised learning. Once trained, ExoGAN is widely applicable to a large number of instruments and planetary types. The ExoGAN retrievals constitute a significant speed improvement over traditional retrievals and can be used either as a final atmospheric analysis or to provide prior constraints for a subsequent retrieval. Comment: 19 pages, 17 figures, 7 tables.
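
    The retrieval mechanics are not described in this abstract; purely as an illustration of how a trained generator can be inverted to match an observation, the PyTorch sketch below optimizes a latent vector so that a toy generator reproduces an observed spectrum. The generator, shapes, and loss here are placeholders, not ExoGAN's architecture.

        # Toy GAN-inversion sketch (hypothetical; not the ExoGAN implementation).
        import torch
        import torch.nn as nn

        LATENT_DIM, SPECTRUM_BINS = 32, 100

        # Stand-in for a trained generator mapping latent codes to spectra.
        generator = nn.Sequential(
            nn.Linear(LATENT_DIM, 128), nn.ReLU(),
            nn.Linear(128, SPECTRUM_BINS),
        )
        generator.eval()
        for p in generator.parameters():          # keep the generator fixed
            p.requires_grad_(False)

        observed = torch.randn(SPECTRUM_BINS)     # placeholder observed spectrum
        z = torch.zeros(LATENT_DIM, requires_grad=True)
        opt = torch.optim.Adam([z], lr=1e-2)

        for step in range(500):
            opt.zero_grad()
            loss = torch.mean((generator(z) - observed) ** 2)  # match the observation
            loss.backward()
            opt.step()

        # After optimization, z encodes a generator input consistent with the
        # observation; a real retrieval would decode abundances and planetary
        # parameters from the completed output of the trained network.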

    AIMS: An Automatic Semantic Machine Learning Microservice Framework to Support Biomedical and Bioengineering Research

    The fusion of machine learning and biomedical research offers novel ways to understand, diagnose, and treat various health conditions. However, the complexities of biomedical data, coupled with the intricate process of developing and deploying machine learning solutions, often pose significant challenges to researchers in these fields. Our pivotal achievement in this research is the introduction of the Automatic Semantic Machine Learning Microservice Framework (AIMS). AIMS addresses these challenges by automating various stages of the machine learning pipeline, with a particular emphasis on an ontology of machine learning services tailored for the biomedical domain. This ontology encompasses everything from task representation, service modeling, and knowledge acquisition to knowledge reasoning and the establishment of a self-supervised learning policy. Our framework has been crafted to prioritize model interpretability, integrate domain knowledge effortlessly, and handle biomedical data efficiently. Additionally, AIMS has a distinctive feature: it leverages self-supervised knowledge learning through reinforcement learning techniques, paired with an ontology-based policy recording schema. This enables it to autonomously generate, fine-tune, and continually adapt machine learning models, especially when faced with new tasks and data. Our work makes two standout contributions: it demonstrates that machine learning processes in the biomedical domain can be automated while integrating a rich domain knowledge base, and it provides a way for machines to acquire a self-learning ability so that they handle new tasks effectively. To showcase AIMS in action, we highlight its use in three case studies drawn from biomedical tasks. These examples emphasize how our framework can simplify research routines, raise the caliber of scientific exploration, and set the stage for notable advances.
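
    The framework itself is not reproduced in the abstract; as a heavily simplified, hypothetical illustration of the "service ontology plus learned selection policy" idea, the sketch below registers candidate services under task tags and updates an epsilon-greedy policy from observed scores. All names are invented for the example.

        # Toy sketch of a service registry with an epsilon-greedy selection policy
        # (hypothetical illustration only; not the AIMS implementation).
        import random
        from collections import defaultdict

        SERVICES = {  # minimal "ontology": task tag -> candidate services
            "classification": ["random_forest_service", "svm_service"],
            "regression": ["linear_service", "gradient_boosting_service"],
        }

        value = defaultdict(float)   # running mean score per (tag, service)
        count = defaultdict(int)

        def choose_service(tag, epsilon=0.1):
            """Epsilon-greedy choice among the services registered for a task tag."""
            candidates = SERVICES[tag]
            if random.random() < epsilon:
                return random.choice(candidates)
            return max(candidates, key=lambda s: value[(tag, s)])

        def record_outcome(tag, service, score):
            """Update the policy with the score observed after running the service."""
            key = (tag, service)
            count[key] += 1
            value[key] += (score - value[key]) / count[key]

        # Usage: pick a service for a classification task, then feed back its accuracy.
        svc = choose_service("classification")
        record_outcome("classification", svc, score=0.87)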

    Energy Consumption Prediction with Big Data: Balancing Prediction Accuracy and Computational Resources

    In recent years, advances in sensor technologies and the expansion of smart meters have resulted in massive growth of energy data sets. These Big Data have created new opportunities for energy prediction, but at the same time they impose new challenges for traditional technologies. On the other hand, new approaches for handling and processing these Big Data have emerged, such as MapReduce, Spark, Storm, and 0xdata H2O. This paper explores how findings from machine learning with Big Data can benefit energy consumption prediction. An approach based on local learning with support vector regression (SVR) is presented. Although local learning itself is not a novel concept, it has great potential in the Big Data domain because it reduces computational complexity. The local SVR approach presented here is compared to traditional SVR and to deep neural networks on the H2O machine learning platform for Big Data. Local SVR outperformed both SVR and H2O deep learning in terms of prediction accuracy and computation time. Especially significant was the reduction in training time; local SVR training was an order of magnitude faster than SVR or H2O deep learning.
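
    The paper's code is not shown here; a minimal sketch of the local-learning idea, assuming scikit-learn, training arrays X_train/y_train, and a hypothetical neighbourhood size k: each query is predicted by an SVR fitted only on its k nearest training samples, which keeps every individual fit small and fast compared with one global SVR.

        import numpy as np
        from sklearn.neighbors import NearestNeighbors
        from sklearn.svm import SVR

        def local_svr_predict(X_train, y_train, X_query, k=200):
            """Predict each query with an SVR fitted only on its k nearest
            training samples instead of a single global SVR."""
            nn = NearestNeighbors(n_neighbors=k).fit(X_train)
            _, idx = nn.kneighbors(X_query)
            preds = np.empty(len(X_query))
            for i, neighbours in enumerate(idx):
                model = SVR().fit(X_train[neighbours], y_train[neighbours])
                preds[i] = model.predict(X_query[i:i + 1])[0]
            return preds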

    Enabling Design of Middleware for Massive Scale IOT-based Systems

    Recently, Internet of Things (IoT) technology has advanced rapidly to the stage where it is feasible to discover, locate, and identify various smart sensors and devices based on their context, situation, characteristics, and relevance in order to query their data or control their actions. Taking things a step further to develop large-scale applications requires that two serious issues be overcome. The first is to find a solution for sensing and collecting data from a massive number of diverse ubiquitous devices as they converge into next-generation networks. The second is to deal with the “Big Data” that arrive from a very large number of sources. This research emphasizes the need for a solution for large-scale data aggregation and delivery. The paper introduces biomimetic design methods for data aggregation in the context of large-scale IoT-based systems.