3,584 research outputs found
Bridging the Semantic Gap with SQL Query Logs in Natural Language Interfaces to Databases
A critical challenge in constructing a natural language interface to database
(NLIDB) is bridging the semantic gap between a natural language query (NLQ) and
the underlying data. Two specific ways this challenge exhibits itself is
through keyword mapping and join path inference. Keyword mapping is the task of
mapping individual keywords in the original NLQ to database elements (such as
relations, attributes or values). It is challenging due to the ambiguity in
mapping the user's mental model and diction to the schema definition and
contents of the underlying database. Join path inference is the process of
selecting the relations and join conditions in the FROM clause of the final SQL
query, and is difficult because NLIDB users lack the knowledge of the database
schema or SQL and therefore cannot explicitly specify the intermediate tables
and joins needed to construct a final SQL query. In this paper, we propose
leveraging information from the SQL query log of a database to enhance the
performance of existing NLIDBs with respect to these challenges. We present a
system Templar that can be used to augment existing NLIDBs. Our extensive
experimental evaluation demonstrates the effectiveness of our approach, leading
up to 138% improvement in top-1 accuracy in existing NLIDBs by leveraging SQL
query log information.Comment: Accepted to IEEE International Conference on Data Engineering (ICDE)
201
Applying semantic web technologies to knowledge sharing in aerospace engineering
This paper details an integrated methodology to optimise Knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses Ontologies as a central modelling strategy for the Capture of Knowledge from legacy docu-ments via automated means, or directly in systems interfacing with Knowledge workers, via user-defined, web-based forms. The domain ontologies used for Knowledge Capture also guide the retrieval of the Knowledge extracted from the data using a Semantic Search System that provides support for multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale
Keyword Search in Large-Scale Databases with Topic Cluster Units
To solve the inefficiency of the existing keyword search methods in large databases, this paper proposes TCU-based query, an offline query method based on topic cluster units. First, topic cluster units (TCUs) are constructed through vertical grouping and horizontal grouping on tables and tuples. In contrast to traditional keyword query methods, this offline method cannot only reduce the query response time, but also return results comprising richer and more complete semantic information. In order to further improve the efficiency of data preprocessing, an optimized solution for table join ordering based on the genetic algorithm is presented. Second, we select index terms using the association rule, and then we build an index on every topic cluster; by doing so we can improve the query speed significantly. Finally, we conduct extensive experiments to demonstrate that our approach greatly improves the performance of keyword search
Access to information in digital libraries : users and digital divide
Recognising the importance of information and knowledge in all spheres of human life, the recently held World Summit on Information Society came up with a plan of action for building a global information society. The goal of the world information society initiatives is the same as that of digital library research and development - to make information and knowledge accessibleto everyone in the world. Digital libraries have progressed very rapidly over the past ten or soyears. This paper addresses the two most important aspects of the information society - information users and digital divide. Findings of some large-scale studies on human information behaviour on the web and digital libraries have been discussed. The major findings of a study on access to electronic resources by university students are the presented. Proposed that a one-stop window approach with a task-based information organisation and access system may be the way forward
Simulated evaluation of faceted browsing based on feature selection
In this paper we explore the limitations of facet based browsing which uses sub-needs of an information need for querying and organising the search process in video retrieval. The underlying assumption of this approach is that the search effectiveness will be enhanced if such an approach is employed for interactive video retrieval using textual and visual features. We explore the performance bounds of a faceted system by carrying out a simulated user evaluation on TRECVid data sets, and also on the logs of a prior user experiment with the system. We first present a methodology to reduce the dimensionality of features by selecting the most important ones. Then, we discuss the simulated evaluation strategies employed in our evaluation and the effect on the use of both textual and visual features. Facets created by users are simulated by clustering video shots using textual and visual features. The experimental results of our study demonstrate that the faceted browser can potentially improve the search effectiveness
Web Mining Functions in an Academic Search Application
This paper deals with Web mining and the different categories of Web mining like content, structure and usage mining. The application of Web mining in an academic search application has been discussed. The paper concludes with open problems related to Web mining. The present work can be a useful input to Web users, Web Administrators in a university environment.Database, HITS, IR, NLP, Web mining
Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions
Translating verbose information needs into crisp search queries is a
phenomenon that is ubiquitous but hardly understood. Insights into this process
could be valuable in several applications, including synthesizing large
privacy-friendly query logs from public Web sources which are readily available
to the academic research community. In this work, we take a step towards
understanding query formulation by tapping into the rich potential of community
question answering (CQA) forums. Specifically, we sample natural language (NL)
questions spanning diverse themes from the Stack Exchange platform, and conduct
a large-scale conversion experiment where crowdworkers submit search queries
they would use when looking for equivalent information. We provide a careful
analysis of this data, accounting for possible sources of bias during
conversion, along with insights into user-specific linguistic patterns and
search behaviors. We release a dataset of 7,000 question-query pairs from this
study to facilitate further research on query understanding.Comment: ECIR 2020 Short Pape
- …