3,322 research outputs found

    Towards Automated Machine Learning on Imperfect Data for Situational Awareness in Power System

    The increasing penetration of renewable energy sources (such as solar and wind) and the coming widespread charging of electric vehicles introduce new challenges for the power system. Due to the variability and uncertainty of these sources, reliable and cost-effective operation of the power system relies on a high level of situational awareness. Thanks to the wide deployment of sensors (e.g., phasor measurement units (PMUs) and smart meters) and emerging smart Internet of Things (IoT) sensing devices in the electric grid, large amounts of data are being collected, which provide a valuable opportunity to achieve a high level of situational awareness for reliable and cost-effective grid operations. To better utilize the data, this dissertation aims to develop Machine Learning (ML) methods and to provide a fundamental understanding and systematic exploitation of ML for situational awareness using large amounts of imperfect data collected in power systems, in order to improve the reliability and resilience of power systems. However, building good ML models requires clean, accurate, and sufficient training data, and the data collected from real-world power systems is of low quality. For example, the data collected from wind farms mixes ramp and non-ramp events as well as heterogeneous dynamics; the data in the transmission grid is noisy, incomplete, and insufficient, and has inaccurate timestamps. Applying ML in real-world applications without accounting for these distinct features does not yield good models. This dissertation aims to address these challenges in two applications, wind generation forecasting and power system event classification, by developing ML models in an automated way with less effort from domain experts, as the cost of having experts process such large amounts of imperfect data can be prohibitive in practice. First, we take heterogeneous dynamics into consideration, especially for ramp events. 
A short-term wind farm generation forecast based on self-evolving neural networks and enhanced by Drifting Streaming Peaks-over-Threshold (DSPOT) is proposed. It uses dynamic ramp thresholds to separate ramp and non-ramp events, and different neural networks are then trained to learn the different dynamics of wind farm generation. As the efficacy of the neural networks relies on the quality of the training datasets (i.e., the classification accuracy of ramp and non-ramp events), a Bayesian-optimization-based approach is developed to optimize the parameters of DSPOT, improving the quality of the training datasets and the corresponding performance of the neural networks. Experimental results show that, compared with other forecast approaches, the proposed approach can substantially improve forecast accuracy, especially for ramp events. Next, we address the challenges of event classification due to low-quality PMU measurements and event logs. A novel machine learning framework is proposed for robust event classification, which consists of three main steps: data preprocessing, fine-grained event data extraction, and feature engineering. Specifically, the data preprocessing step addresses the data quality issues of PMU measurements (e.g., bad data and missing data); in the fine-grained event data extraction step, a model-free event detection method is developed to accurately localize the events despite the inaccurate event timestamps in the event logs; and the feature engineering step constructs event features based on the patterns of different event types, in order to improve the performance and interpretability of the event classifiers. Moreover, with a small number of good features, much less training data is needed to train a good event classifier, which addresses the challenge of insufficient and imbalanced training data, and the training time is negligible compared to neural-network-based approaches. 
Based on the proposed framework, we developed a workflow for event classification using real-world PMU data streaming into the system in real time. Using the proposed framework, robust event classifiers can be efficiently trained based on many off-the-shelf lightweight machine learning models. Numerical experiments using a real-world dataset from the Western Interconnection of the U.S. power transmission grid show that the event classifiers trained under the proposed framework can achieve high classification accuracy while being robust against low-quality data. Subsequently, we address the challenge of insufficient training labels. Real-world PMU data is often incomplete and noisy, which can significantly reduce the efficacy of existing machine learning techniques that require high-quality labeled training data. Obtaining high-quality event logs for large amounts of PMU measurements requires significant effort from domain experts to maintain the event logs and even hand-label the events, which can be prohibitively costly or impractical. We therefore develop a weakly supervised machine learning approach that can learn a good event classifier from only a small amount of labeled PMU data. The key idea is to learn the labels from unlabeled data using a probabilistic generative model, in order to improve the training of the event classifiers. Experimental results show that even when 95% of the data is unlabeled, the proposed method still achieves an average accuracy of 78.4%. This provides a promising way for domain experts to maintain the event logs in a less expensive and automated manner. Finally, we conclude the dissertation and discuss future directions.
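
The idea of separating ramp from non-ramp events with a dynamic threshold, mentioned in the abstract above, can be illustrated with a minimal sketch. This is not DSPOT itself: DSPOT sets its threshold from an extreme-value fit, whereas the sketch below swaps in a much simpler rolling empirical quantile. The function name `split_ramp_events` and the parameters `window` and `quantile` are hypothetical and chosen only for illustration.

```python
from collections import deque


def split_ramp_events(power, window=5, quantile=0.9):
    """Label each step of a power series as ramp (True) or non-ramp (False).

    Simplified stand-in for a dynamic ramp threshold (NOT DSPOT): a step is
    a ramp when its absolute change exceeds the empirical `quantile` of the
    absolute changes observed over the previous `window` steps. Steps seen
    before the window is full are labeled non-ramp.
    """
    recent = deque(maxlen=window)  # most recent absolute changes
    labels = []
    for prev, curr in zip(power, power[1:]):
        change = abs(curr - prev)
        if len(recent) == window:
            ordered = sorted(recent)
            idx = min(int(quantile * window), window - 1)
            threshold = ordered[idx]  # threshold drifts with the window
            labels.append(change > threshold)
        else:
            labels.append(False)  # still warming up the threshold
        recent.append(change)
    return labels


# Hypothetical generation readings with one sudden jump.
series = [10, 11, 10, 11, 10, 11, 30, 31, 30]
labels = split_ramp_events(series)
```

In a setup like the one the abstract describes, the two resulting subsets would then be used to train separate forecasting models; the sketch covers only the separation step.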

    Crowdsourcing in Computer Vision

    Computer vision systems require large amounts of manually annotated data to properly learn challenging visual concepts. Crowdsourcing platforms offer an inexpensive method to capture human knowledge and understanding for a vast number of visual perception tasks. In this survey, we describe the types of annotations computer vision researchers have collected using crowdsourcing, and how they have ensured that this data is of high quality while annotation effort is minimized. We begin by discussing data collection on both classic (e.g., object recognition) and recent (e.g., visual story-telling) vision tasks. We then summarize key design decisions for creating effective data collection interfaces and workflows, and present strategies for intelligently selecting the most important data instances to annotate. Finally, we conclude with some thoughts on the future of crowdsourcing in computer vision. Comment: A 69-page meta review of the field, Foundations and Trends in Computer Graphics and Vision, 201

    RTLS-enabled clinical workflow predictive analysis


    Detecting human and non-human vocal productions in large scale audio recordings

    We propose an automatic data processing pipeline to extract vocal productions from large-scale natural audio recordings. Through a series of computational steps (windowing, creation of a noise class, data augmentation, re-sampling, transfer learning, Bayesian optimisation), it automatically trains a neural network for detecting various types of natural vocal productions in a noisy data stream without requiring a large sample of labeled data. We test it on two different data sets, one from a group of Guinea baboons recorded at a primate research center and one from human babies recorded at home. The pipeline trains a model on 72 and 77 minutes of labeled audio recordings, reaching accuracies of 94.58% and 99.76%, respectively. It is then used to process 443 and 174 hours of natural continuous recordings, from which it creates two new databases of 38.8 and 35.2 hours, respectively. We discuss the strengths and limitations of this approach, which can be applied to any massive audio recording.
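
The first step listed in the abstract above, windowing, can be sketched as follows. This is an illustrative assumption about what windowing means here (cutting a continuous recording into fixed-length, possibly overlapping segments), not the authors' implementation. The name `frame_signal` and its parameters are hypothetical.

```python
def frame_signal(samples, window_size, hop_size):
    """Split a 1-D sequence into fixed-length, possibly overlapping windows.

    Consecutive windows start `hop_size` samples apart. Trailing samples
    that do not fill a whole window are dropped.
    """
    if window_size <= 0 or hop_size <= 0:
        raise ValueError("window_size and hop_size must be positive")
    last_start = len(samples) - window_size
    return [
        samples[start:start + window_size]
        for start in range(0, last_start + 1, hop_size)
    ]


# Ten dummy samples, windows of 4 samples with 50% overlap.
frames = frame_signal(list(range(10)), window_size=4, hop_size=2)
```

Each window would then be passed to the detector as one candidate segment. The window length and overlap trade temporal resolution against the number of segments to classify.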

    Cardea: An Open Automated Machine Learning Framework for Electronic Health Records

    An estimated 180 papers focusing on deep learning and EHR were published between 2010 and 2018. Despite the common workflow structure appearing in these publications, no trusted and verified software framework exists, forcing researchers to arduously repeat previous work. In this paper, we propose Cardea, an extensible open-source automated machine learning framework that encapsulates common prediction problems in the health domain and allows users to build predictive models with their own data. This system relies on two components: Fast Healthcare Interoperability Resources (FHIR) -- a standardized data structure for electronic health systems -- and several AUTOML frameworks for automated feature engineering, model selection, and tuning. We augment these components with an adaptive data assembler and comprehensive data- and model-auditing capabilities. We demonstrate our framework via 5 prediction tasks on MIMIC-III and Kaggle datasets, which highlight Cardea's human competitiveness, flexibility in problem definition, extensive feature generation capability, adaptable automatic data assembler, and usability.