28 research outputs found

    Mining Based Natural Language to Database Interface

    Data and information play an important role in daily life. Data is generated by many sources, such as hospitals, organizations, and educational institutions, and needs to be managed and stored in a database, which becomes the main source of information. Accessing, storing, and manipulating data in a database is a critical task that requires knowledge of a database language such as SQL, in which the user writes a query to retrieve data. This is a barrier for ordinary users who are not familiar with database languages. To reduce this complexity, NLQP (Natural Language Query System) is designed. The system provides an interface in which end users write queries in a natural language such as English and receive the results back in natural language. The NLQP system converts the natural language query into an SQL-like query and fetches the required results from the database for the user. The main goal of the NLQP system is to provide user-friendly communication between the end user and the computer from which the data is to be fetched

    Resource description framework triples entity formations using statistical language model

    A method for formatting unstructured sentences from a source corpus into a specific knowledge representation such as RDF is needed. A method for RDF entity formation from a paragraph of text, using a statistical language model based on N-grams, is introduced. The RDF entity formation is applied to natural language queries for information retrieval of Islamic knowledge. 300 concepts from the English translation of the Holy Quran, with 350 relationships, are used as a knowledge base. We evaluate our approach on a collection of queries from the Islamic Research Foundation website, a total of 82 queries, and compare its performance against the previous method used in FREyA. The results show that the proposed method improved the accuracy of the natural language formulation analysis, tested on the search strategy, by 17.07%. Recall and precision also increased, by 7% and 3% respectively. Keywords: semantic web; N-gram; ontology; statistical mode
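As a rough illustration of what an N-gram statistical language model does, the sketch below trains a bigram model with add-one smoothing and scores candidate phrases. It is not the method from the abstract above: the function names, the smoothing choice, and the toy corpus are all assumptions made for illustration only.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count unigrams and bigrams over whitespace-tokenised sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        # Sentence boundary markers let the model learn typical starts and ends.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_probability(prev, word, unigrams, bigrams):
    """Estimate P(word | prev) with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def score_phrase(phrase, unigrams, bigrams):
    """Multiply the smoothed bigram probabilities along a phrase."""
    tokens = phrase.lower().split()
    probability = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        probability *= bigram_probability(prev, word, unigrams, bigrams)
    return probability


if __name__ == "__main__":
    # Hypothetical toy corpus, not taken from any of the listed papers.
    corpus = [
        "the database stores patient records",
        "the database stores student records",
        "the user queries the database",
    ]
    unigrams, bigrams = train_bigram_model(corpus)
    for candidate in ("database stores", "stores database"):
        print(candidate, score_phrase(candidate, unigrams, bigrams))
```

A word sequence that occurs in the training text scores higher than the same words in an unseen order, which is the kind of signal a system could use to decide which adjacent words belong together as one entity.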

    CONFSYS2: a redesigned web-based multi-conference management system

    This thesis presents the design and implementation of ConfSys2, an advanced redesign of the ConfSys system with additional features that help conference/journal organizers manage the processes of academic conferences/journals and provide related services for authors and conference participants. ConfSys2 uses the same Tomcat - Java Servlet/JSP - MySQL platform as ConfSys, but with a redesigned database structure, software framework, and user interfaces. The features of ConfSys have been retained and improved in both user friendliness and efficiency, and the lessons learnt from developing and using ConfSys have been incorporated. ConfSys2 not only implements a better user interface for ConfSys's useful functions, such as automatically/manually allocating papers to reviewers and debating and rating papers, but also introduces new concepts, such as conference/journal series management, user-group-function management, and a smart daemon in conference management, to improve data sharing, reduce repetitive work, and make management work more flexible. Furthermore, the redesign of the databases and software framework provides a clear structure and flexibility, making the software system easier to maintain and extend

    IDEAS-1997-2021-Final-Programs

    This document records the final program for each of the 26 meetings of the International Database and Engineering Application Symposium from 1997 through 2021. These meetings were organized in various locations on three continents. Most of the papers published during these years are in the digital libraries of IEEE (1997-2007) or ACM (2008-2021)

    An authoring tool for decision support systems in context questions of ecological knowledge

    Decision support systems (DSS) support business or organizational decision-making activities, which require access to information stored internally in databases or data warehouses and externally on the Web, accessed through Information Retrieval (IR) or Question Answering (QA) systems. Graphical interfaces for querying these sources make it easy to constrain query formulation dynamically based on user selections, but they lack flexibility, since their expressive power is limited by the user interface design. Natural language interfaces (NLI) are expected to be the optimal solution. However, especially for non-expert users, truly natural communication is the most difficult to realize effectively. In this paper, we propose an NLI that improves the interaction between the user and the DSS by means of referencing previous questions or their answers (i.e. anaphora such as the pronoun reference in “What traits are affected by them?”), or by eliding parts of the question (i.e. ellipsis such as “And to glume colour?” after the question “Tell me the QTLs related to awn colour in wheat”). Moreover, to overcome one of the main problems of NLIs, the difficulty of adapting an NLI to a new domain, our proposal is based on ontologies that are obtained semi-automatically from a framework that allows the integration of internal and external, structured and unstructured information. Therefore, our proposal can interface with databases, data warehouses, QA and IR systems. Because of the high NL ambiguity of the resolution process, our proposal is presented as an authoring tool that helps the user to query efficiently in natural language. Finally, our proposal is tested on a DSS case scenario about Biotechnology and Agriculture, whose knowledge base is the CEREALAB database as internal structured data, and the Web (e.g. PubMed) as external unstructured information. This paper has been partially supported by the MESOLAP (TIN2010-14860), GEODAS-BI (TIN2012-37493-C03-03), LEGOLANGUAGE (TIN2012-31224) and DIIM2.0 (PROMETEOII/2014/001) projects from the Spanish Ministry of Education and Competitivity. Alejandro Maté is funded by the Generalitat Valenciana under an ACIF grant (ACIF/2010/298)

    DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction

    We study the problem of decomposing a complex text-to-sql task into smaller sub-tasks and how such a decomposition can significantly improve the performance of Large Language Models (LLMs) in the reasoning process. There is currently a significant gap between the performance of fine-tuned models and prompting approaches using LLMs on challenging text-to-sql datasets such as Spider. We show that SQL queries, despite their declarative structure, can be broken down into sub-problems and the solutions of those sub-problems can be fed into LLMs to significantly improve their performance. Our experiments with three LLMs show that this approach consistently improves their performance by roughly 10%, pushing the accuracy of LLMs towards state-of-the-art, and even beating large fine-tuned models on the holdout Spider dataset

    Aneesah: a novel methodology and algorithms for sustained dialogues and query refinement in natural language interfaces to databases

    This thesis presents the research undertaken to develop a novel approach towards the development of a text-based Conversational Natural Language Interface to Databases, known as ANEESAH. Natural Language Interfaces to Databases (NLIDBs) are computer applications, which replace the requirement for an end user to commission a skilled programmer to query a database by using natural language. The aim of the proposed research is to investigate the use of a Natural Language Interface to Database (NLIDB) capable of conversing with users to automate the query formulation process for database information retrieval. Historical challenges and limitations have prevented the wider use of NLIDB applications in real-life environments. The challenges relevant to the scope of proposed research include the absence of flexible conversation between NLIDB applications and users, automated database query building from multiple dialogues and flexibility to sustain dialogues for information refinement. The areas of research explored include; NLIDBs, conversational agents (CAs), natural language processing (NLP) techniques, artificial intelligence (AI), knowledge engineering, and relational databases. Current NLIDBs do not have conversational abilities to sustain dialogues, especially with regards to information required for dynamic query formulation. A novel approach, ANEESAH is introduced to deal with these challenges. ANEESAH was developed to allow users to communicate using natural language to retrieve information from a relational database. ANEESAH can interact with the users conversationally and sustain dialogues to automate the query formulation and information refinement process. 
The research and development of ANEESAH steered the engineering of several novel NLIDB components such as a CA implemented NLIDB framework, a rule-based CA that combines pattern matching and sentence similarity techniques, algorithms to engage users in conversation and support sustained dialogues for information refinement. Additional components of the proposed framework include a novel SQL query engine for the dynamic formulation of queries to extract database information and perform querying the query operations to support the information refinement. Furthermore, a generic evaluation methodology combining subjective and objective measures was introduced to evaluate the implemented conversational NLIDB framework. Empirical end user evaluation was also used to validate the components of the implemented framework. The evaluation results demonstrated ANEESAH produced the desired database information for users over a set of test scenarios. The evaluation results also revealed that the proposed framework components can overcome the challenges of sustaining dialogues, information refinement and querying the query operations

    Feasibility of using citations as document summaries

    The purpose of this research is to establish whether it is feasible to use citations as document summaries. People are good at creating and selecting summaries and are generally the standard for evaluating computer generated summaries. Citations can be characterized as concept symbols or short summaries of the document they are citing. Similarity metrics have been used in retrieval and text summarization to determine how alike two documents are. Similarity metrics have never been compared to what human subjects think are similar between two documents. If similarity metrics reflect human judgment, then we can mechanize the selection of citations that act as short summaries of the document they are citing. The research approach was to gather rater data comparing document abstracts to citations about the same document and then to statistically compare those results to several document metrics: frequency count, similarity metric, citation location and type of citation. There were two groups of raters, subject experts and non-experts. Both groups of raters were asked to evaluate seven parameters between abstract and citations: purpose, subject matter, methods, conclusions, findings, implications, readability, and understandability. The rater was to identify how strongly the citation represented the content of the abstract, on a five-point Likert scale. Document metrics were collected for frequency count, cosine, and similarity metric between abstracts and associated citations. In addition, data was collected on the location of the citations and the type of citation. Location was identified and dummy coded for introduction, method, discussion, review of the literature and conclusion. Citations were categorized and dummy coded for whether they refuted, noted, supported, reviewed, or applied information about the cited document. 
The results show there is a relationship between some similarity metrics and human judgment of similarity. Ph.D., Information Studies -- Drexel University, 200
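The cosine metric mentioned in the abstract above can be sketched as follows. This is a minimal bag-of-words version that assumes plain whitespace tokenisation and raw term counts; the thesis may have used different weighting or preprocessing, and the function name and sample texts are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two term-count vectors (0.0 to 1.0)."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Only terms shared by both texts contribute to the dot product.
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty text has no direction, so treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical abstract and citing sentence, for illustration only.
    abstract = "citations can act as short summaries of documents"
    citation = "the authors argue that citations act as summaries"
    print(cosine_similarity(abstract, citation))
```

A score near 1 means the citing sentence uses much the same vocabulary as the abstract, and a score of 0 means the two share no terms. Scores of this kind are what would then be compared against the raters' Likert judgments.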

    Metadatos y recuperación de información: estándares, problemas y aplicabilidad en bibliotecas digitales

    Doctoral Program in Documentation. Chair: Mercedes Caridad Sebastián. Secretary: Antonio Hernández Pérez. Committee members: José Carlos Rovira Soler, Eulalia Fuentes i Pujol, José Antonio Gómez Hernánde