Search CORE

8 research outputs found

Scalable diversification for data exploration platforms

Author: Khan Hina Anwar
Publication venue: 'University of Queensland Library'
Publication date: 18/11/2016

Minimizing User Effort in Large Scale Example-driven Data Exploration

Author: Ge Xiaoyu
Publication date: 08/10/2021

Data Exploration is a key ingredient in a widely diverse set of discovery-oriented applications, including scientific computing, financial analysis, and evidence-based medicine. It refers to a series of exploratory tasks that aim to extract useful pieces of knowledge from data, and its challenge is to do so without requiring the user to specify with precision what information is being searched for. The goal of assisting users in constructing their exploratory queries effortlessly, which effectively reveals interesting data objects, has led to the development of a variety of intelligent semi-automatic approaches. Among such approaches, Example-driven Exploration is rapidly becoming an attractive choice for exploratory query formulation since it attempts to minimize the amount of prior knowledge required from the user to form an accurate exploratory query. In particular, this dissertation focuses on interactive Example-driven Exploration, which steers the user towards discovering all data objects relevant to the users’ exploration based on their feedback on a small set of examples. Interactive Example-driven Exploration is especially beneficial for non-expert users, as it enables them to circumvent query languages by assigning relevancy to examples as a proxy for the intended exploratory analysis. However, existing interactive Example-driven Exploration systems fall short of supporting the need to perform complex explorations over large, unstructured high-dimensional data. To overcome these challenges, we have developed new methods of data reduction, example selection, data indexing, and result refinement that support practical, interactive data exploration. The novelty of our approach is anchored on leveraging active learning and query optimization techniques that strike a balance between maximizing accuracy and minimizing user effort in providing feedback while enabling interactive performance for exploration tasks with arbitrary, large-sized datasets. 
Furthermore, it extends the exploration beyond the structured data by supporting a variety of high-dimensional unstructured data and enables the refinement of results when the exploration task is associated with too many relevant data objects that could be overwhelming to the user. To affirm the effectiveness of our proposed models, techniques, and algorithms, we implemented multiple prototype systems and evaluated them using real datasets. Some of them were also used in domain-specific analytics tools.

D-Scholarship@Pitt

Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries

Author: Ajwani Deepak, Gatterbauer Wolfgang, Riedewald Mirek, Tziavelis Nikolaos, Yang Xiaofeng
Publication date: 11/09/2020

We study ranked enumeration of join-query results according to very general orders defined by selective dioids. Our main contribution is a framework for ranked enumeration over a class of dynamic programming problems that generalizes seemingly different problems that had been studied in isolation. To this end, we extend classic algorithms that find the k-shortest paths in a weighted graph. For full conjunctive queries, including cyclic ones, our approach is optimal in terms of the time to return the top result and the delay between results. These optimality properties are derived for the widely used notion of data complexity, which treats query size as a constant. By performing a careful cost analysis, we are able to uncover a previously unknown tradeoff between two incomparable enumeration approaches: one has lower complexity when the number of returned results is small, the other when the number is very large. We theoretically and empirically demonstrate the superiority of our techniques over batch algorithms, which produce the full result and then sort it. Our technique is not only faster for returning the first few results, but on some inputs beats the batch algorithm even when all results are produced.
Comment: 50 pages, 19 figures

arXiv.org e-Print Archive

Research Repository UCD

Similarity-aware query refinement for data exploration

Author: Albarrak Abdullah Mohammed
Publication venue: 'University of Queensland Library'
Publication date: 21/05/2018

University of Queensland eSpace

Special Topics in Information Technology

Publication venue: 'Springer Science and Business Media LLC'
Publication date: 14/01/2022

This open access book presents thirteen outstanding doctoral dissertations in Information Technology from the Department of Electronics, Information and Bioengineering, Politecnico di Milano, Italy. Information Technology has always been highly interdisciplinary, as many aspects have to be considered in IT systems. The doctoral studies program in IT at Politecnico di Milano emphasizes this interdisciplinary nature, which is becoming more and more important in recent technological advances, in collaborative projects, and in the education of young researchers. Accordingly, the focus of advanced research is on pursuing a rigorous approach to specific research topics starting from a broad background in various areas of Information Technology, especially Computer Science and Engineering, Electronics, Systems and Control, and Telecommunications. Each year, more than 50 PhDs graduate from the program. This book gathers the outcomes of the thirteen best theses defended in 2020-21 and selected for the IT PhD Award. Each of the authors provides a chapter summarizing his/her findings, including an introduction, description of methods, main achievements and future work on the topic. Hence, the book provides a cutting-edge overview of the latest research trends in Information Technology at Politecnico di Milano, presented in an easy-to-read format that will also appeal to non-specialists.

Directory of Open Access Books (DOAB)

Special Topics in Information Technology

Publication venue: 'Springer Science and Business Media LLC'

OAPEN Library

Report on the Second International Workshop on Exploratory Search in Databases and the Web (ExploreDB 2015)

Author: Koutrika Georgia, Lakshmanan Laks V. S., Riedewald Mirek, Sharaf Mohamed A., Stefanidis Kostas
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/12/2015

The second ExploreDB 2015 workshop intends to bring together researchers and practitioners from different fields, ranging from data management and information retrieval to data visualization and human-computer interaction. The workshop program consisted of two keynote talks and six peer-reviewed research papers. The first keynote talk titled 'Explore-By-Example: A New Database Service for Interactive Data Exploration' was given by Prof. Yanlei Diao from the University of Massachusetts at Amherst. Prof. Diao pointed out that while computing power, memory size, and the ability to collect data are growing exponentially, human ability to understand data remains practically flat. In the second keynote, titled 'Principled Optimization Frameworks for Query Reformulation of Database Queries', Prof. Gautam Das from the University of Texas at Arlington focused on solutions for the many-answers and the empty-answers problems. He proposed to address both problems through ranked retrieval. Xiaoyu Ge, Panos Chrysanthis and Alexandros Labrinidis ('Preferential Diversity') explored how to achieve personalization through preferences on result diversity. Diversity was also the focus in 'Diversifying with Few Regrets, But too Few to Mention' by Zaeem Hussain, Hina Khan and Mohamed Sharaf.

University of Queensland eSpace