31 research outputs found
Automatic Table Extension with Open Data
With thousands of data sources available on the web as well as within organisations, data scientists increasingly spend more time searching for data than analysing it. To ease the task of find and integrating relevant data for data mining projects, this dissertation presents two new methods for automatic table extension. Automatic table extension systems take over the task of tata discovery and data integration by adding new columns with new information (new attributes) to any table. The data values in the new columns are extracted from a given corpus of tables
On-the-fly Table Generation
Many information needs revolve around entities, which would be better
answered by summarizing results in a tabular format, rather than presenting
them as a ranked list. Unlike previous work, which is limited to retrieving
existing tables, we aim to answer queries by automatically compiling a table in
response to a query. We introduce and address the task of on-the-fly table
generation: given a query, generate a relational table that contains relevant
entities (as rows) along with their key properties (as columns). This problem
is decomposed into three specific subtasks: (i) core column entity ranking,
(ii) schema determination, and (iii) value lookup. We employ a feature-based
approach for entity ranking and schema determination, combining deep semantic
features with task-specific signals. We further show that these two subtasks
are not independent of each other and can assist each other in an iterative
manner. For value lookup, we combine information from existing tables and a
knowledge base. Using two sets of entity-oriented queries, we evaluate our
approach both on the component level and on the end-to-end table generation
task.Comment: The 41st International ACM SIGIR Conference on Research and
Development in Information Retrieva
Dataset search: a survey
Generating value from data requires the ability to find, access and make
sense of datasets. There are many efforts underway to encourage data sharing
and reuse, from scientific publishers asking authors to submit data alongside
manuscripts to data marketplaces, open data portals and data communities.
Google recently beta released a search service for datasets, which allows users
to discover data stored in various online repositories via keyword queries.
These developments foreshadow an emerging research field around dataset search
or retrieval that broadly encompasses frameworks, methods and tools that help
match a user data need against a collection of datasets. Here, we survey the
state of the art of research and commercial systems in dataset retrieval. We
identify what makes dataset search a research field in its own right, with
unique challenges and methods and highlight open problems. We look at
approaches and implementations from related areas dataset search is drawing
upon, including information retrieval, databases, entity-centric and tabular
search in order to identify possible paths to resolve these open problems as
well as immediate next steps that will take the field forward.Comment: 20 pages, 153 reference