
    Data mining based cyber-attack detection


    Load curve data cleansing and imputation via sparsity and low rank

    The smart grid vision is to build an intelligent power network with an unprecedented level of situational awareness and controllability over its services and infrastructure. This paper advocates statistical inference methods to robustify power monitoring tasks against the outlier effects owing to faulty readings and malicious attacks, as well as against missing data due to privacy concerns and communication errors. In this context, a novel load cleansing and imputation scheme is developed leveraging the low intrinsic dimensionality of spatiotemporal load profiles and the sparse nature of "bad data." A robust estimator based on principal components pursuit (PCP) is adopted, which effects a twofold sparsity-promoting regularization through an ℓ1-norm of the outliers, and the nuclear norm of the nominal load profiles. Upon recasting the non-separable nuclear norm into a form amenable to decentralized optimization, a distributed (D-)PCP algorithm is developed to carry out the imputation and cleansing tasks using networked devices comprising the so-termed advanced metering infrastructure. If D-PCP converges and a qualification inequality is satisfied, the novel distributed estimator provably attains the performance of its centralized PCP counterpart, which has access to all networkwide data. Computer simulations and tests with real load curve data corroborate the convergence and effectiveness of the novel D-PCP algorithm.
    Comment: 8 figures; submitted to IEEE Transactions on Smart Grid, Special Issue on "Optimization Methods and Algorithms Applied to Smart Grid"
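
    For intuition, the centralized PCP estimator underlying D-PCP decomposes the observed load matrix M into a low-rank component L (nominal profiles) plus a sparse component S (outliers) by minimizing ||L||_* + λ||S||_1 subject to L + S = M. Below is a minimal single-machine sketch using the standard inexact augmented Lagrangian iteration; the step-size heuristics and variable names are illustrative assumptions, not the paper's distributed algorithm.

```python
# Minimal centralized PCP (robust PCA) sketch via the inexact augmented
# Lagrangian method. M = observed loads, L = low-rank nominal profiles,
# S = sparse outliers. Step sizes are common defaults assumed here for
# illustration; this is not the paper's distributed D-PCP algorithm.
import numpy as np

def shrink(X, tau):
    """Soft thresholding: proximal operator of the l1-norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_shrink(X, tau):
    """Singular-value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(shrink(s, tau)) @ Vt

def pcp(M, lam=None, mu=None, tol=1e-7, max_iter=500):
    m, n = M.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))       # standard PCP weight
    mu = mu or 0.25 * m * n / np.abs(M).sum()   # common step-size heuristic
    L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
    for _ in range(max_iter):
        L = svd_shrink(M - S + Y / mu, 1.0 / mu)  # nuclear-norm step
        S = shrink(M - L + Y / mu, lam / mu)      # l1 outlier step
        R = M - L - S                             # constraint residual
        Y += mu * R                               # dual ascent
        if np.linalg.norm(R) <= tol * np.linalg.norm(M):
            break
    return L, S  # cleansed profiles and flagged "bad data"
```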

    Big Data Caching for Networking: Moving from Cloud to Edge

    In order to cope with the relentless data tsunami in 5G wireless networks, current approaches such as acquiring new spectrum, deploying more base stations (BSs) and increasing nodes in mobile packet core networks are becoming ineffective in terms of scalability, cost and flexibility. In this regard, context-aware 5G networks with edge/cloud computing and exploitation of big data analytics can yield significant gains to mobile operators. In this article, proactive content caching in 5G wireless networks is investigated and a big-data-enabled architecture is proposed. In this practical architecture, a vast amount of data is harnessed for content popularity estimation, and strategic contents are cached at the BSs to achieve higher user satisfaction and backhaul offloading. To validate the proposed solution, we consider a real-world case study in which several hours of mobile data traffic collected from a major telecom operator in Turkey are analyzed using tools from machine learning. Based on the available information and storage capacity, numerical studies show that gains are achieved both in terms of user satisfaction and backhaul offloading. For example, in the case of 16 BSs with 30% of content ratings and 13 Gbyte of storage size (78% of total library size), proactive caching yields 100% user satisfaction and offloads 98% of the backhaul.
    Comment: accepted for publication in IEEE Communications Magazine, Special Issue on Communications, Caching, and Computing for Content-Centric Mobile Network
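
    At its core, the proposed architecture estimates content popularity from collected traffic data and fills each BS cache with the most valuable contents subject to a storage budget. The toy sketch below ranks contents by requests per byte and caches greedily; the Zipf-distributed demand and the simple count-based popularity estimate are illustrative assumptions (the article uses collaborative filtering on sparse content ratings).

```python
# Toy sketch of proactive caching: rank contents by estimated popularity
# per unit size and fill the cache greedily until storage runs out.
# All numbers below are illustrative assumptions, not the case-study data.
import numpy as np

def cache_top_contents(request_counts, sizes, capacity):
    """Greedily cache contents with the highest requests-per-byte."""
    order = np.argsort(-request_counts / sizes)  # best value first
    cached, used = [], 0.0
    for c in order:
        if used + sizes[c] <= capacity:
            cached.append(c)
            used += sizes[c]
    return set(cached)

rng = np.random.default_rng(0)
n_contents = 100
requests = rng.zipf(1.8, size=n_contents).astype(float)  # skewed demand
sizes = rng.uniform(0.5, 2.0, size=n_contents)           # Gbyte per content
cache = cache_top_contents(requests, sizes, capacity=30.0)

hit = requests[list(cache)].sum() / requests.sum()
print(f"backhaul offloading from this cache: {hit:.1%}")
```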

    Towards a framework for designing full model selection and optimization systems

    People from a variety of industrial domains are beginning to realise that appropriate use of machine learning techniques for their data mining projects could bring great benefits. End-users now face the new problem of how to choose a combination of data processing tools and algorithms for a given dataset. This problem is usually termed the Full Model Selection (FMS) problem. Extending our previous work [10], in this paper we introduce a framework for designing FMS algorithms. Under this framework, we propose a novel algorithm named GPS (for GA-PSO-FMS) that combines genetic algorithms (GA) and particle swarm optimization (PSO): a GA searches for the optimal structure of a data mining solution, and PSO searches for the optimal parameters of a particular structure instance. Given a classification dataset, GPS outputs an FMS solution as a directed acyclic graph consisting of the diverse data mining operators available to the problem. Experimental results demonstrate the benefit of the algorithm. We also present, with detailed analysis, two model-tree-based variants for speeding up the GPS algorithm.
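
    The nesting described above, a GA over discrete pipeline structures with PSO tuning the continuous parameters of each candidate, can be sketched in a few lines. The tiny structure space, naive mutation, and synthetic fitness function below are illustrative assumptions, not the GPS operators from the paper.

```python
# Sketch of the GA-outer / PSO-inner nesting: the GA proposes pipeline
# structures, PSO tunes one continuous parameter per structure. The
# fitness function stands in for cross-validated accuracy (assumption).
import random

STRUCTURES = [("standardize", "svm"), ("standardize", "knn"),
              ("minmax", "svm"), ("minmax", "knn")]

def fitness(structure, param):
    """Stand-in for cross-validated accuracy of the assembled pipeline."""
    base = {"svm": 0.80, "knn": 0.75}[structure[1]]
    bonus = 0.05 if structure[0] == "standardize" else 0.0
    return base + bonus - (param - 1.0) ** 2  # peaked at param = 1.0

def pso_tune(structure, n_particles=10, iters=30):
    """Inner PSO loop: tune the parameter of a fixed structure."""
    pos = [random.uniform(0.0, 2.0) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]
    gbest = max(pbest, key=lambda p: fitness(structure, p))
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = random.random(), random.random()
            vel[i] = (0.7 * vel[i] + 1.5 * r1 * (pbest[i] - pos[i])
                      + 1.5 * r2 * (gbest - pos[i]))
            pos[i] += vel[i]
            if fitness(structure, pos[i]) > fitness(structure, pbest[i]):
                pbest[i] = pos[i]
        gbest = max(pbest, key=lambda p: fitness(structure, p))
    return gbest, fitness(structure, gbest)

def ga_search(generations=5, pop_size=4):
    """Outer GA loop: vary structures, score each via the PSO inner loop."""
    pop = random.sample(STRUCTURES, pop_size)
    best = max((pso_tune(s) + (s,) for s in pop), key=lambda t: t[1])
    for _ in range(generations):
        pop = [random.choice(STRUCTURES) for _ in pop]  # naive mutation
        cand = max((pso_tune(s) + (s,) for s in pop), key=lambda t: t[1])
        best = max(best, cand, key=lambda t: t[1])
    return best  # (best parameter, best score, best structure)

print(ga_search())
```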

    MARVEL: measured active rotational-vibrational energy levels

    An algorithm is proposed, based principally on an earlier proposition of Flaud and co-workers [Mol. Phys. 32 (1976) 499], that inverts the information contained in uniquely assigned experimental rotational-vibrational transitions in order to obtain measured active rotational-vibrational energy levels (MARVEL). The procedure starts with collecting, critically evaluating, selecting, and compiling all available measured transitions, including assignments and uncertainties, into a single database. Then, spectroscopic networks (SNs) are determined which contain all interconnecting rotational-vibrational energy levels supported by the grand database of the selected transitions. Adjustment of the uncertainties of the lines is performed next, with the help of a robust weighting strategy, until a self-consistent set of lines and uncertainties is achieved. Inversion of the transitions through a weighted least-squares-type procedure results in MARVEL energy levels and associated uncertainties. Local sensitivity coefficients can be computed for each energy level. The resulting set of MARVEL levels is called active because, when new experimental measurements become available, the same evaluation, adjustment, and inversion procedure should be repeated in order to obtain more dependable energy levels and uncertainties. MARVEL is tested on the H₂¹⁷O isotopologue of water, for which a list of 2736 dependable energy levels, based on 8369 transitions, has been obtained.
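
    Since each assigned transition constrains the difference of two energy levels, stacking all transitions yields a sparse linear system that a weighted least-squares solve inverts for the levels, with the ground state pinned at zero. A minimal sketch follows; the toy level labels, wavenumbers, and uncertainties are illustrative assumptions, not data from the paper.

```python
# MARVEL-style inversion sketch: each transition gives one linear
# constraint E_up - E_lo = sigma, weighted by 1/uncertainty. Toy data.
import numpy as np

# (upper level, lower level, wavenumber / cm^-1, uncertainty / cm^-1)
transitions = [(1, 0, 100.02, 0.01),
               (2, 1, 150.05, 0.02),
               (2, 0, 250.00, 0.02)]
n_levels = 3

A = np.zeros((len(transitions), n_levels))
b = np.zeros(len(transitions))
w = np.zeros(len(transitions))
for i, (up, lo, sigma, unc) in enumerate(transitions):
    A[i, up], A[i, lo] = 1.0, -1.0   # E_up - E_lo = sigma
    b[i], w[i] = sigma, 1.0 / unc    # weight each row by 1/uncertainty

# Pin level 0 at zero energy by dropping its column, then solve.
E, *_ = np.linalg.lstsq(w[:, None] * A[:, 1:], w * b, rcond=None)
print(np.concatenate([[0.0], E]))  # MARVEL-style energy levels
```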

    Predicting 'Attention Deficit Hyperactive Disorder' using large scale child data set

    Attention deficit hyperactivity disorder (ADHD) is a disorder found in children, affecting about 9.5% of American children aged 13 years or more. Every year, the number of children diagnosed with ADHD increases. There is no single test that can diagnose ADHD. In fact, a health practitioner has to analyze the behavior of the child to determine whether the child has ADHD, gathering information about the child and his/her behavior and environment. Because of these difficulties in diagnosis, I propose to use machine learning techniques to predict ADHD from a large-scale child data set. Machine learning offers a principled approach for developing sophisticated, automatic, and objective algorithms for the analysis of disease. Many new approaches have emerged that deepen understanding and enable advanced analysis, and the use of classification models has had a significant impact on the detection and diagnosis of diseases. I propose to use binary classification techniques for the detection and diagnosis of ADHD.
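
    A minimal sketch of such a binary-classification setup follows: train a classifier on per-child features with an ADHD label and report held-out performance. The synthetic features and the choice of logistic regression are illustrative assumptions; the thesis targets a real large-scale child data set.

```python
# Binary-classification sketch for ADHD prediction on synthetic data.
# Feature count, label rule, and model choice are assumptions for
# illustration only; real behavioral/survey features would replace X.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))                 # stand-in survey features
y = (X[:, 0] + 0.5 * X[:, 1]                    # synthetic ADHD label
     + rng.normal(scale=1.0, size=5000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```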

    A Survey on IT-Techniques for a Dynamic Emergency Management in Large Infrastructures

    This deliverable is a survey of the IT techniques that are relevant to the three use cases of the project EMILI. It describes the state of the art in four complementary IT areas: data cleansing, supervisory control and data acquisition, wireless sensor networks, and complex event processing. Even though the deliverable's authors have tried to avoid overly technical language and to explain every concept referred to, the deliverable may still seem rather technical to readers as yet unfamiliar with the techniques it describes.

    Structural health monitoring of offshore wind turbines: A review through the Statistical Pattern Recognition Paradigm

    Offshore wind has become the most profitable renewable energy source due to the remarkable development it has experienced in Europe over the last decade. In this paper, a review of Structural Health Monitoring Systems (SHMS) for offshore wind turbines (OWT) is carried out, treating the topic as a Statistical Pattern Recognition problem. Accordingly, each stage of this paradigm is reviewed with a focus on OWT applications. These stages are: Operational Evaluation; Data Acquisition, Normalization and Cleansing; Feature Extraction and Information Condensation; and Statistical Model Development. It is expected that, by optimizing each stage, SHMS can contribute to the development of efficient Condition-Based Maintenance strategies. Optimizing this strategy will help reduce the labor costs of OWT inspection, avoid unnecessary maintenance, identify design weaknesses before failure, and improve the availability of power production while preventing wind turbine overloading, thereby maximizing the return on investment. In the forthcoming years, a growing interest in SHM technologies for OWT is expected, enhancing the potential of wind farm deployments further offshore. Increasing efficiency in operational management will contribute towards achieving the UK's 2020 and 2050 targets, ultimately by reducing the Levelised Cost of Energy (LCOE).
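
    As a concrete instance of the Statistical Model Development stage, a common novelty-detection baseline learns the distribution of features from the healthy structure and flags observations whose Mahalanobis distance exceeds a chi-square threshold. The synthetic vibration features and the 99% threshold below are illustrative assumptions, not a method prescribed by the review.

```python
# Novelty-detection sketch for SHM: fit a baseline feature distribution
# on healthy data, flag new observations by Mahalanobis distance.
# Synthetic features and the 99% chi-square threshold are assumptions.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
healthy = rng.normal(size=(500, 4))             # baseline feature vectors
mu = healthy.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(healthy, rowvar=False))

def mahalanobis_sq(x):
    """Squared Mahalanobis distance to the healthy baseline."""
    d = x - mu
    return float(d @ cov_inv @ d)

threshold = chi2.ppf(0.99, df=4)                # novelty boundary
new_obs = rng.normal(loc=1.5, size=4)           # possibly damaged state
print("damage flagged:", mahalanobis_sq(new_obs) > threshold)
```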