Search CORE

37,052 research outputs found

FRIOD: a deeply integrated feature-rich interactive system for effective and efficient outlier detection

Author: Chang Liang
Fournier-Viger Philippe
Li Hongzhou
Lin Jerry Chun-Wei
Zhang Ji
Zhu Xiaodong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/11/2017
Field of study

In this paper, we propose an novel interactive outlier detection system called feature-rich interactive outlier detection (FRIOD), which features a deep integration of human interaction to improve detection performance and greatly streamline the detection process. A user-friendly interactive mechanism is developed to allow easy and intuitive user interaction in all the major stages of the underlying outlier detection algorithm which includes dense cell selection, location-aware distance thresholding, and final top outlier validation. By doing so, we can mitigate the major difficulty of the competitive outlier detection methods in specifying the key parameter values, such as the density and distance thresholds. An innovative optimization approach is also proposed to optimize the grid-based space partitioning, which is a critical step of FRIOD. Such optimization fully considers the high-quality outliers it detects with the aid of human interaction. The experimental evaluation demonstrates that FRIOD can improve the quality of the detected outliers and make the detection process more intuitive, effective, and efficient

University of Southern Queensland ePrints

FlashProfile: A Framework for Synthesizing Data Profiles

Author: Gulwani Sumit
Jain Prateek
Millstein Todd
Padhi Saswat
Perelman Daniel
Polozov Oleksandr
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 24/10/2018
Field of study

We address the problem of learning a syntactic profile for a collection of strings, i.e. a set of regex-like patterns that succinctly describe the syntactic variations in the strings. Real-world datasets, typically curated from multiple sources, often contain data in various syntactic formats. Thus, any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify the different formats is infeasible in standard big-data scenarios. Prior techniques are restricted to a small set of pre-defined patterns (e.g. digits, letters, words, etc.), and provide no control over granularity of profiles. We define syntactic profiling as a problem of clustering strings based on syntactic similarity, followed by identifying patterns that succinctly describe each cluster. We present a technique for synthesizing such profiles over a given language of patterns, that also allows for interactive refinement by requesting a desired number of clusters. Using a state-of-the-art inductive synthesis framework, PROSE, we have implemented our technique as FlashProfile. Across

153

tasks over

75

large real datasets, we observe a median profiling time of only

\sim\,0.7\,

s. Furthermore, we show that access to syntactic profiles may allow for more accurate synthesis of programs, i.e. using fewer examples, in programming-by-example (PBE) workflows such as FlashFill.Comment: 28 pages, SPLASH (OOPSLA) 201

arXiv.org e-Print Archive

eScholarship - University of California

Direct measurement of tree height provides different results on the assessment of LiDAR accuracy

Author
Publication venue: 'MDPI AG'
Publication date: 01/04/2017
Field of study

open8noopenSibona, Emanuele; Vitali, Alessandro; Meloni, Fabio; Caffo, Lucia; Dotta, Alberto; Lingua, Emanuele; Motta, Renzo; Garbarino, MatteoSibona, Emanuele; Vitali, Alessandro; Meloni, Fabio; Caffo, Lucia; Dotta, Alberto; Lingua, Emanuele; Motta, Renzo; Garbarino, Matte

Archivio istituzionale della ricerca - Università di Padova