1 research outputs found

    Minimizing User Effort in Large Scale Example-driven Data Exploration

    Get PDF
    Data Exploration is a key ingredient in a widely diverse set of discovery-oriented applications, including scientific computing, financial analysis, and evidence-based medicine. It refers to a series of exploratory tasks that aim to extract useful pieces of knowledge from data, and its challenge is to do so without requiring the user to specify with precision what information is being searched for. The goal of assisting users in constructing their exploratory queries effortlessly, which effectively reveals interesting data objects, has led to the development of a variety of intelligent semi-automatic approaches. Among such approaches, Example-driven Exploration is rapidly becoming an attractive choice for exploratory query formulation since it attempts to minimize the amount of prior knowledge required from the user to form an accurate exploratory query. In particular, this dissertation focuses on interactive Example-driven Exploration, which steers the user towards discovering all data objects relevant to the users’ exploration based on their feedback on a small set of examples. Interactive Example-driven Exploration is especially beneficial for non-expert users, as it enables them to circumvent query languages by assigning relevancy to examples as a proxy for the intended exploratory analysis. However, existing interactive Example-driven Exploration systems fall short of supporting the need to perform complex explorations over large, unstructured high-dimensional data. To overcome these challenges, we have developed new methods of data reduction, example selection, data indexing, and result refinement that support practical, interactive data exploration. The novelty of our approach is anchored on leveraging active learning and query optimization techniques that strike a balance between maximizing accuracy and minimizing user effort in providing feedback while enabling interactive performance for exploration tasks with arbitrary, large-sized datasets. Furthermore, it extends the exploration beyond the structured data by supporting a variety of high-dimensional unstructured data and enables the refinement of results when the exploration task is associated with too many relevant data objects that could be overwhelming to the user. To affirm the effectiveness of our proposed models, techniques, and algorithms, we implemented multiple prototype systems and evaluated them using real datasets. Some of them were also used in domain-specific analytics tools
    corecore