
    Automatic data cleaning


    Heterogeneous data source integration for smart grid ecosystems based on metadata mining

    The arrival of new technologies related to smart grids and the resulting ecosystem of applications and management systems pose many new problems. The databases of the traditional grid and the various initiatives related to new technologies have given rise to many different management systems with several formats and different architectures. A heterogeneous data source integration system is necessary to update these systems for the new smart grid reality. Additionally, it is necessary to take advantage of the information smart grids provide. In this paper, the authors propose a heterogeneous data source integration based on IEC standards and metadata mining. Additionally, an automatic data mining framework is applied to model the integrated information. Ministerio de Economía y Competitividad TEC2013-40767-
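    The integration step the abstract describes can be illustrated with a small sketch: column metadata mined from each source is mapped onto one canonical schema before records are merged. This is a minimal illustration in Python; the canonical field names and source layouts below are our own assumptions, not the paper's IEC-based model.

```python
# Minimal sketch of metadata-driven integration: map column names from
# heterogeneous sources onto one canonical schema before merging records.
# The canonical names and aliases are illustrative, not the paper's.

CANONICAL = {
    "meter_id": {"meterid", "device_id", "id_contador"},
    "timestamp": {"ts", "read_time", "fecha"},
    "energy_kwh": {"kwh", "active_energy", "energia"},
}

def harmonize(record: dict) -> dict:
    """Rename a source record's keys to canonical names where metadata matches."""
    out = {}
    for key, value in record.items():
        for canon, aliases in CANONICAL.items():
            if key.lower() == canon or key.lower() in aliases:
                out[canon] = value
                break
        else:
            out[key] = value  # keep unmapped fields for later inspection
    return out

# Two sources with different layouts collapse onto one schema.
print(harmonize({"device_id": "M-17", "ts": "2015-01-01T00:00", "kwh": 3.2}))
print(harmonize({"id_contador": "M-18", "fecha": "2015-01-01T00:15", "energia": 2.9}))
```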

    FRIOD: a deeply integrated feature-rich interactive system for effective and efficient outlier detection

    In this paper, we propose a novel interactive outlier detection system called feature-rich interactive outlier detection (FRIOD), which features a deep integration of human interaction to improve detection performance and greatly streamline the detection process. A user-friendly interactive mechanism is developed to allow easy and intuitive user interaction in all the major stages of the underlying outlier detection algorithm, which include dense cell selection, location-aware distance thresholding, and final top outlier validation. By doing so, we can mitigate the major difficulty of competing outlier detection methods in specifying key parameter values, such as the density and distance thresholds. An innovative optimization approach is also proposed to optimize the grid-based space partitioning, which is a critical step of FRIOD. This optimization fully considers the high-quality outliers detected with the aid of human interaction. The experimental evaluation demonstrates that FRIOD can improve the quality of the detected outliers and make the detection process more intuitive, effective, and efficient.
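    The dense-cell-selection stage the abstract names can be sketched as a simple grid partitioning: points landing in sparsely populated cells become outlier candidates. In FRIOD the cell size and density threshold would be set interactively; the hard-coded values below are illustrative only.

```python
import random
from collections import defaultdict

def grid_outliers(points, cell=1.0, min_density=3):
    """Flag points falling in sparsely populated grid cells as outlier candidates."""
    cells = defaultdict(list)
    for p in points:
        cells[(int(p[0] // cell), int(p[1] // cell))].append(p)
    # A point is a candidate when its cell holds fewer than min_density points.
    return [p for members in cells.values() if len(members) < min_density
            for p in members]

random.seed(0)
cluster = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(100)]
points = cluster + [(5.0, 5.0)]  # one far-away point
print(grid_outliers(points, cell=1.0, min_density=3))
```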

    DQTunePipe: a Set of Python Tools for LIGO Detector Characterization

    When LIGO's interferometers are in operation, many auxiliary data channels monitor and record the state of the instruments and surrounding environmental conditions. Analyzing these channels allows LIGO scientists to evaluate the quality of the data collected and veto data segments of poor quality. A set of scripts was built up in an ad hoc fashion, sometimes with limited documentation, to assist in this analysis. In this thesis, we present DQTunePipe, a set of Python modules to replace these scripts and aid in the detector characterization of the LIGO instruments. The use of Python makes the analysis method more compatible with existing LIGO tools. DQTunePipe improves data quality analysis by allowing users to select specific detector characterization tasks and by providing a maintainable framework upon which additional modules may be built. The structure of the DQTunePipe code makes it simple to add new features. This thesis details the structure of DQTunePipe, serves as its documentation at the time of this writing, and outlines the procedures for incorporating new features.
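    The veto idea described above can be sketched in a few lines: scan an auxiliary channel and return the time segments where it exceeds a threshold, marking them as poor quality. This is a hedged illustration, not DQTunePipe's actual API; the function name and the synthetic channel are ours.

```python
import numpy as np

def veto_segments(times, channel, threshold):
    """Return (start, end) intervals where an auxiliary channel exceeds a threshold."""
    bad = channel > threshold
    segments, start = [], None
    for t, flag in zip(times, bad):
        if flag and start is None:
            start = t                       # segment of poor quality opens
        elif not flag and start is not None:
            segments.append((start, t))     # segment closes
            start = None
    if start is not None:
        segments.append((start, times[-1]))
    return segments

# Synthetic seismometer-like channel: quiet except for a noisy burst.
times = np.arange(0.0, 10.0, 0.1)
channel = np.where((times > 4) & (times < 6), 5.0, 0.1)
print(veto_segments(times, channel, threshold=1.0))
```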

    A Survey on IT-Techniques for a Dynamic Emergency Management in Large Infrastructures

    This deliverable is a survey of the IT techniques that are relevant to the three use cases of the project EMILI. It describes the state of the art in four complementary IT areas: data cleansing, supervisory control and data acquisition, wireless sensor networks, and complex event processing. Even though the deliverable’s authors have tried to avoid overly technical language and to explain every concept referred to, the deliverable may still seem rather technical to readers not yet familiar with the techniques it describes.

    Correlation-based methods for data cleaning, with application to biological databases

    Ph.D. thesis (Doctor of Philosophy)

    Analysis of the NIST database towards the composition of vulnerabilities in attack scenarios

    The composition of vulnerabilities in attack scenarios has traditionally been performed based on detailed pre- and post-conditions. Although very precise, this approach depends on human analysis, is time consuming, and does not scale. We investigate the NIST National Vulnerability Database (NVD) with three goals: (i) understand the associations among vulnerability attributes related to impact, exploitability, privilege, type of vulnerability, and clues derived from plaintext descriptions; (ii) validate our initial composition model, which is based on required access and resulting effect; and (iii) investigate the maturity of XML database technology for performing statistical analyses like this directly on the XML data. In this report, we analyse 27,273 vulnerability entries (CVE) from the NVD. Using only nominal information, we are able, for example, to identify clusters in the class of vulnerabilities requiring no privilege, which represents 52% of the entries.
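    The nominal analysis the report performs can be sketched as grouping entries by their attribute signature and reporting each group's share. The field names and toy records below are illustrative stand-ins for parsed NVD entries, not the report's actual schema.

```python
from collections import Counter

# Toy stand-in for parsed NVD entries; real records come from the NVD XML feed.
entries = [
    {"access": "network", "privilege": "none", "effect": "code-execution"},
    {"access": "network", "privilege": "none", "effect": "denial-of-service"},
    {"access": "local", "privilege": "user", "effect": "privilege-escalation"},
    {"access": "network", "privilege": "none", "effect": "code-execution"},
]

# Group entries by their nominal (access, privilege, effect) signature and
# report each cluster's share, mirroring the kind of counts the report derives.
signatures = Counter((e["access"], e["privilege"], e["effect"]) for e in entries)
for sig, count in signatures.most_common():
    print(f"{sig}: {count} ({100 * count / len(entries):.0f}%)")
```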

    Empowering Patient Similarity Networks through Innovative Data-Quality-Aware Federated Profiling

    Continuous monitoring of patients involves collecting and analyzing sensory data from a multitude of sources. To overcome communication overhead, ensure data privacy and security, reduce data loss, and maintain efficient resource usage, the processing and analytics are moved close to where the data are located (e.g., the edge). However, data quality (DQ) can be degraded because of imprecise or malfunctioning sensors, dynamic changes in the environment, transmission failures, or delays. It is therefore crucial to monitor data quality and spot problems as quickly as possible, so that they do not mislead clinical judgments and lead to the wrong course of action. In this article, a novel approach called federated data quality profiling (FDQP) is proposed to assess the quality of the data at the edge. FDQP is inspired by federated learning (FL) and serves as a condensed document, or guide, for node data quality assurance. A formal FDQP model is developed to capture the quality dimensions specified in the data quality profile (DQP). The proposed approach uses federated feature selection to improve classifier precision and to rank features based on criteria such as feature value, outlier percentage, and missing data percentage. Extensive experiments were conducted using a fetal dataset split across different edge nodes, with a set of scenarios carefully chosen to evaluate the proposed FDQP model. The results demonstrate that the proposed data-quality-aware federated PSN architecture, leveraging the FDQP model with data collected from edge nodes, effectively improves both the data quality and the accuracy of the federated patient similarity network (FPSN)-based machine learning models. The profiling algorithm exchanges lightweight profiles instead of processing full data at the edge, which improves efficiency while preserving data quality. Overall, FDQP is an effective method for assessing data quality in the edge computing environment, and we believe the proposed approach can be applied to other scenarios beyond patient monitoring.
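    The lightweight-profile idea can be sketched as follows: each node computes a small per-feature quality profile (missing and outlier percentages, as in the abstract) and shares only that profile, never the raw readings. The MAD-based outlier rule and all field names below are our own illustrative choices, not the paper's model.

```python
import statistics

def profile(values):
    """Build a lightweight quality profile for one edge node's feature column."""
    present = [v for v in values if v is not None]
    missing_pct = 100 * (len(values) - len(present)) / len(values)
    med = statistics.median(present)
    # Median absolute deviation; fall back to 1.0 when all values coincide.
    mad = statistics.median(abs(v - med) for v in present) or 1.0
    outliers = [v for v in present if abs(v - med) > 5 * mad]
    return {"missing_pct": round(missing_pct, 1),
            "outlier_pct": round(100 * len(outliers) / len(present), 1)}

# Each node shares only its profile; raw readings never leave the edge.
node_a = profile([120, 118, None, 119, 400])   # one spike, one gap
node_b = profile([121, 122, 120, 119, 118])
print(node_a, node_b)
```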