
    Automatic data cleaning


    Heterogeneous data source integration for smart grid ecosystems based on metadata mining

    The arrival of new technologies related to smart grids and the resulting ecosystem of applications and management systems pose many new problems. The databases of the traditional grid and the various initiatives related to new technologies have given rise to many different management systems with several formats and different architectures. A heterogeneous data source integration system is necessary to update these systems for the new smart grid reality. Additionally, it is necessary to take advantage of the information smart grids provide. In this paper, the authors propose a heterogeneous data source integration based on IEC standards and metadata mining. Additionally, an automatic data mining framework is applied to model the integrated information. Ministerio de Economía y Competitividad TEC2013-40767-
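    The integration step the abstract describes can be illustrated with a small sketch: column metadata mined from each source is mapped onto one canonical schema before records are merged. This is a minimal illustration in Python; the canonical field names and source layouts below are our own assumptions, not the paper's IEC-based model.

```python
# Minimal sketch of metadata-driven integration: map column names from
# heterogeneous sources onto one canonical schema before merging records.
# The canonical names and aliases are illustrative, not the paper's.

CANONICAL = {
    "meter_id": {"meterid", "device_id", "id_contador"},
    "timestamp": {"ts", "read_time", "fecha"},
    "energy_kwh": {"kwh", "active_energy", "energia"},
}

def harmonize(record: dict) -> dict:
    """Rename a source record's keys to canonical names where metadata matches."""
    out = {}
    for key, value in record.items():
        for canon, aliases in CANONICAL.items():
            if key.lower() == canon or key.lower() in aliases:
                out[canon] = value
                break
        else:
            out[key] = value  # keep unmapped fields for later inspection
    return out

# Two sources with different layouts collapse onto one schema.
print(harmonize({"device_id": "M-17", "ts": "2015-01-01T00:00", "kwh": 3.2}))
print(harmonize({"id_contador": "M-18", "fecha": "2015-01-01T00:15", "energia": 2.9}))
```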

    FRIOD: a deeply integrated feature-rich interactive system for effective and efficient outlier detection

    In this paper, we propose a novel interactive outlier detection system called feature-rich interactive outlier detection (FRIOD), which features a deep integration of human interaction to improve detection performance and greatly streamline the detection process. A user-friendly interactive mechanism is developed to allow easy and intuitive user interaction in all the major stages of the underlying outlier detection algorithm, which include dense cell selection, location-aware distance thresholding, and final top outlier validation. By doing so, we can mitigate the major difficulty of competing outlier detection methods in specifying key parameter values, such as the density and distance thresholds. An innovative optimization approach is also proposed to optimize the grid-based space partitioning, which is a critical step of FRIOD. This optimization fully considers the high-quality outliers detected with the aid of human interaction. The experimental evaluation demonstrates that FRIOD can improve the quality of the detected outliers and make the detection process more intuitive, effective, and efficient.
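    The dense-cell-selection stage the abstract names can be sketched as a simple grid partitioning: points landing in sparsely populated cells become outlier candidates. In FRIOD the cell size and density threshold would be set interactively; the hard-coded values below are illustrative only.

```python
import random
from collections import defaultdict

def grid_outliers(points, cell=1.0, min_density=3):
    """Flag points falling in sparsely populated grid cells as outlier candidates."""
    cells = defaultdict(list)
    for p in points:
        cells[(int(p[0] // cell), int(p[1] // cell))].append(p)
    # A point is a candidate when its cell holds fewer than min_density points.
    return [p for members in cells.values() if len(members) < min_density
            for p in members]

random.seed(0)
cluster = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(100)]
points = cluster + [(5.0, 5.0)]  # one far-away point
print(grid_outliers(points, cell=1.0, min_density=3))
```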

    DQTunePipe: a Set of Python Tools for LIGO Detector Characterization

    When LIGO's interferometers are in operation, many auxiliary data channels monitor and record the state of the instruments and surrounding environmental conditions. Analyzing these channels allows LIGO scientists to evaluate the quality of the data collected and veto data segments of poor quality. A set of scripts was built up in an ad hoc fashion, sometimes with limited documentation, to assist in this analysis. In this thesis, we present DQTunePipe, a set of Python modules to replace these scripts and aid in the detector characterization of the LIGO instruments. The use of Python makes the analysis method more compatible with existing LIGO tools. DQTunePipe improves data quality analysis by allowing users to select specific detector characterization tasks and by providing a maintainable framework upon which additional modules may be built. The structure of the DQTunePipe code makes it simple to add new features. This thesis details the structure of DQTunePipe, serves as its documentation at the time of this writing, and outlines the procedures for incorporating new features.
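    The veto idea described above can be sketched in a few lines: scan an auxiliary channel and return the time segments where it exceeds a threshold, marking them as poor quality. This is a hedged illustration, not DQTunePipe's actual API; the function name and the synthetic channel are ours.

```python
import numpy as np

def veto_segments(times, channel, threshold):
    """Return (start, end) intervals where an auxiliary channel exceeds a threshold."""
    bad = channel > threshold
    segments, start = [], None
    for t, flag in zip(times, bad):
        if flag and start is None:
            start = t                       # segment of poor quality opens
        elif not flag and start is not None:
            segments.append((start, t))     # segment closes
            start = None
    if start is not None:
        segments.append((start, times[-1]))
    return segments

# Synthetic seismometer-like channel: quiet except for a noisy burst.
times = np.arange(0.0, 10.0, 0.1)
channel = np.where((times > 4) & (times < 6), 5.0, 0.1)
print(veto_segments(times, channel, threshold=1.0))
```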

    A Survey on IT-Techniques for a Dynamic Emergency Management in Large Infrastructures

    This deliverable is a survey of the IT techniques that are relevant to the three use cases of the project EMILI. It describes the state of the art in four complementary IT areas: data cleansing, supervisory control and data acquisition, wireless sensor networks, and complex event processing. Even though the deliverable’s authors have tried to avoid overly technical language and to explain every concept referred to, the deliverable may still seem rather technical to readers not yet familiar with the techniques it describes.

    Correlation-based methods for data cleaning, with application to biological databases

    Ph.D. thesis (Doctor of Philosophy)

    Analysis of the NIST database towards the composition of vulnerabilities in attack scenarios

    The composition of vulnerabilities in attack scenarios has traditionally been performed based on detailed pre- and post-conditions. Although very precise, this approach depends on human analysis, is time consuming, and does not scale. We investigate the NIST National Vulnerability Database (NVD) with three goals: (i) understand the associations among vulnerability attributes related to impact, exploitability, privilege, type of vulnerability, and clues derived from plaintext descriptions; (ii) validate our initial composition model, which is based on required access and resulting effect; and (iii) investigate the maturity of XML database technology for performing statistical analyses like this directly on the XML data. In this report, we analyse 27,273 vulnerability entries (CVE) from the NVD. Using only nominal information, we are able, for example, to identify clusters in the class of vulnerabilities requiring no privilege, which represents 52% of the entries.
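    The nominal analysis the report performs can be sketched as grouping entries by their attribute signature and reporting each group's share. The field names and toy records below are illustrative stand-ins for parsed NVD entries, not the report's actual schema.

```python
from collections import Counter

# Toy stand-in for parsed NVD entries; real records come from the NVD XML feed.
entries = [
    {"access": "network", "privilege": "none", "effect": "code-execution"},
    {"access": "network", "privilege": "none", "effect": "denial-of-service"},
    {"access": "local", "privilege": "user", "effect": "privilege-escalation"},
    {"access": "network", "privilege": "none", "effect": "code-execution"},
]

# Group entries by their nominal (access, privilege, effect) signature and
# report each cluster's share, mirroring the kind of counts the report derives.
signatures = Counter((e["access"], e["privilege"], e["effect"]) for e in entries)
for sig, count in signatures.most_common():
    print(f"{sig}: {count} ({100 * count / len(entries):.0f}%)")
```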

    Empowering Patient Similarity Networks through Innovative Data-Quality-Aware Federated Profiling

    Continuous monitoring of patients involves collecting and analyzing sensory data from a multitude of sources. To overcome communication overhead, ensure data privacy and security, reduce data loss, and maintain efficient resource usage, the processing and analytics are moved close to where the data are located (e.g., the edge). However, data quality (DQ) can be degraded because of imprecise or malfunctioning sensors, dynamic changes in the environment, transmission failures, or delays. It is therefore crucial to monitor data quality and spot problems as quickly as possible, so that they do not mislead clinical judgments and lead to the wrong course of action. In this article, a novel approach called federated data quality profiling (FDQP) is proposed to assess the quality of the data at the edge. FDQP is inspired by federated learning (FL) and serves as a condensed document, or guide, for node data quality assurance. A formal FDQP model is developed to capture the quality dimensions specified in the data quality profile (DQP). The proposed approach uses federated feature selection to improve classifier precision and to rank features based on criteria such as feature value, outlier percentage, and missing data percentage. Extensive experiments were conducted using a fetal dataset split across different edge nodes, with a set of scenarios carefully chosen to evaluate the proposed FDQP model. The results demonstrate that the proposed data-quality-aware federated PSN architecture, leveraging the FDQP model with data collected from edge nodes, effectively improves both the data quality and the accuracy of the federated patient similarity network (FPSN)-based machine learning models. The profiling algorithm exchanges lightweight profiles instead of processing full data at the edge, which improves efficiency while preserving data quality. Overall, FDQP is an effective method for assessing data quality in the edge computing environment, and we believe the proposed approach can be applied to other scenarios beyond patient monitoring.
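    The lightweight-profile idea can be sketched as follows: each node computes a small per-feature quality profile (missing and outlier percentages, as in the abstract) and shares only that profile, never the raw readings. The MAD-based outlier rule and all field names below are our own illustrative choices, not the paper's model.

```python
import statistics

def profile(values):
    """Build a lightweight quality profile for one edge node's feature column."""
    present = [v for v in values if v is not None]
    missing_pct = 100 * (len(values) - len(present)) / len(values)
    med = statistics.median(present)
    # Median absolute deviation; fall back to 1.0 when all values coincide.
    mad = statistics.median(abs(v - med) for v in present) or 1.0
    outliers = [v for v in present if abs(v - med) > 5 * mad]
    return {"missing_pct": round(missing_pct, 1),
            "outlier_pct": round(100 * len(outliers) / len(present), 1)}

# Each node shares only its profile; raw readings never leave the edge.
node_a = profile([120, 118, None, 119, 400])   # one spike, one gap
node_b = profile([121, 122, 120, 119, 118])
print(node_a, node_b)
```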