Using domain specific language and sequence to sequence models as a hybrid framework for a natural language interface to a database solution
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
The aim of this project is to provide a new approach to the problem of converting natural language into a language capable of querying a database or data repository. This problem has a long history: in the 1970s the US Navy developed a solution called LADDER, and since then an array of solutions, approaches and tweaks has kept the research community busy. The introduction of electronic assistants into the smartphone in 2010 has given new impetus to this problem.
With the increasingly pervasive nature of data, and its ever-expanding use to answer questions in business, science and medicine, extracting data is becoming ever more important. The idea behind this project is to democratise data by allowing access to it without the need for specialist languages. The performance and reliability of converting natural language into structured query language can be problematic because of the nuances that are prevalent in natural language; relational databases are not designed to understand language nuance.
This project introduces the following components as part of a holistic approach to improving
the conversion of a natural language statement into a language capable of querying a data
repository.
● The idea proposed in this project combines sequence-to-sequence models with natural-language part-of-speech technologies and domain-specific languages to convert natural language queries into SQL. The proposed approach uses natural language processing to perform an initial shallow pass over the incoming query, and then uses Google's TensorFlow to refine the query with a sequence-to-sequence model.
● This thesis also proposes using a Domain Specific Language (DSL) as part of the conversion process. The DSL has the potential to allow the natural language query to be translated into more than just an SQL statement: any query language, such as NoSQL or XQuery, could be targeted.
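The two-stage idea above (a shallow NLP pass into a DSL, then rendering the DSL into a target query language) can be sketched as follows. This is an illustration only: the lexicon, the mini-DSL grammar, and the helper names are hypothetical, and the seq2seq refinement step is omitted.

```python
import re

# Hypothetical mini-DSL: SELECT(measure) FILTER(filters) -- an illustration,
# not the thesis's actual DSL grammar.
DSL_TEMPLATE = "SELECT({measure}) FILTER({filters})"

# Toy lexicon standing in for the shallow NLP pass (POS tagging /
# keyword spotting) that precedes the seq2seq refinement step.
MEASURES = {"sales", "revenue"}
DIMENSIONS = {"georgia": ("location", "Georgia"),
              "mountain bikes": ("item", "mountain bike")}

def shallow_parse(query: str) -> str:
    """First pass: map surface tokens to DSL slots."""
    q = query.lower()
    measure = next((m for m in MEASURES if m in q), None)
    filters = [f"{col}='{val}'" for key, (col, val) in DIMENSIONS.items() if key in q]
    return DSL_TEMPLATE.format(measure=measure, filters=", ".join(filters))

def dsl_to_sql(dsl: str) -> str:
    """Backend-specific rendering; swapping this function could target NoSQL or XQuery instead."""
    m = re.match(r"SELECT\((\w+)\) FILTER\((.*)\)", dsl)
    measure, filters = m.group(1), m.group(2).replace(",", " AND")
    return f"SELECT SUM({measure}) FROM facts WHERE {filters};"

dsl = shallow_parse("What were the sales for mountain bikes in Georgia?")
print(dsl)
print(dsl_to_sql(dsl))
```

Because the DSL sits between the natural language and the query language, only the final rendering function has to change when the data repository changes.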
Natural Language Processing on Data Warehouses
The main problem addressed in this research was to use natural language to query data in a data warehouse. To this end, two natural language processing models were developed and compared on a classic star-schema sales data warehouse with sales facts and date, location and item dimensions. Utterances are queries that people make in natural language, for example, "What is the sales value for mountain bikes in Georgia for 1 July 2005?" The first model, the heuristics model, implemented an algorithm that steps through the sequence of utterance words and matches the longest number of consecutive words at the highest grain of the hierarchy. In contrast, the embedding model implemented the word2vec algorithm to create different kinds of vectors from the data warehouse. These vectors are aggregated, and the cosine similarity between vectors is then used to identify concepts in the utterances that can be converted to a programming language. To understand question style, a survey was set up, which then helped shape the random utterances created for the evaluation of both methods.

The first key insight, and the main premise for the embedding model to work, is a three-step process of creating three types of vectors. The first step is to train vectors (word vectors) for each individual word in the data warehouse; these are called word embeddings. For instance, the word `bike' will have a vector. In the next step, the word vectors are averaged for each unique column value (column vectors) in the data warehouse, leaving an entry like `mountain bike' with one vector that is the average of the vectors for `mountain' and `bike'. Lastly, the user's utterance is averaged (utterance vectors) using the word vectors created in step one, and then, using cosine similarity, the utterance vector is matched to the closest column vectors in order to identify data warehouse concepts in the utterance.
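The heuristics model's longest-consecutive-match step described above can be sketched as follows; the catalogue of column values and the hierarchy labels are illustrative, not the study's actual star schema.

```python
def longest_match(tokens, catalogue):
    """Greedy pass: at each position, match the longest run of consecutive
    tokens that names a known data-warehouse value (a sketch; the study's
    actual handling of hierarchy grain is more involved)."""
    matches, i = [], 0
    while i < len(tokens):
        best = None
        for j in range(len(tokens), i, -1):   # try longest phrases first
            phrase = " ".join(tokens[i:j])
            if phrase in catalogue:
                best = (phrase, catalogue[phrase])
                i = j                          # skip past the matched run
                break
        if best:
            matches.append(best)
        else:
            i += 1
    return matches

# Toy catalogue of column values drawn from the star schema's dimensions.
catalogue = {"mountain bikes": "item.subcategory",
             "georgia": "location.state",
             "sales value": "fact.sales"}
utterance = "what is the sales value for mountain bikes in georgia".split()
print(longest_match(utterance, catalogue))
```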
The second key insight was to train word vectors first for location and then separately for item; in other words, per dimension (one set for location, and one set for item). Removing stop words was the third key insight, and the last key insight was to use Global Vectors (GloVe) to initialise the training of the word vectors. The evaluation of the models indicated that the embedding model was ten times faster than the heuristics model. In terms of accuracy, the embedding model (95.6% accurate) also outperformed the heuristics model (70.1% accurate). The practical application of the research is that these models can be used as a component in a chatbot over data warehouses. Combined with a Structured Query Language query generation component, and with Application Programming Interfaces built on top, this facilitates the quick and easy distribution of data; no knowledge of a programming language such as Structured Query Language is needed to query the data.
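The three-step embedding pipeline can be sketched with tiny hand-made 3-dimensional vectors in place of trained word2vec embeddings; all vector values below are illustrative.

```python
from math import sqrt

# Step 1 stand-in: toy 3-d "word vectors" (real word2vec embeddings would
# be ~100-300 dimensions, initialised from GloVe per the study).
word_vecs = {
    "mountain": [0.9, 0.1, 0.0],
    "bike":     [0.8, 0.2, 0.1],
    "road":     [0.2, 0.9, 0.0],
    "georgia":  [0.0, 0.1, 0.9],
}

def average(words):
    """Average the word vectors of the known words (steps 2 and 3)."""
    vecs = [word_vecs[w] for w in words if w in word_vecs]  # unknown/stop words dropped
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Step 2: one averaged vector per unique column value.
column_vecs = {"mountain bike": average(["mountain", "bike"]),
               "road bike":     average(["road", "bike"]),
               "georgia":       average(["georgia"])}

# Step 3: average the utterance, then match it to the nearest column value.
utterance_vec = average("sales for mountain bike in georgia".split())
best = max(column_vecs, key=lambda c: cosine(utterance_vec, column_vecs[c]))
print(best)
```

In the full model each concept in the utterance is matched separately rather than taking a single global maximum; this sketch only shows the averaging and cosine-similarity mechanics.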
Results of the seventh edition of the BioASQ Challenge
The results of the seventh edition of the BioASQ challenge are presented in
this paper. The aim of the BioASQ challenge is the promotion of systems and
methodologies through the organization of a challenge on the tasks of
large-scale biomedical semantic indexing and question answering. In total, 30
teams with more than 100 systems participated in the challenge this year. As in
previous years, the best systems were able to outperform the strong baselines.
This suggests that state-of-the-art systems are continuously improving, pushing
the frontier of research.
Proceedings TLAD 2012: 10th International Workshop on the Teaching, Learning and Assessment of Databases
This is the tenth in the series of highly successful international workshops on the Teaching, Learning and Assessment of Databases (TLAD 2012). TLAD 2012 is held on the 9th July at the University of Hertfordshire and hopes to be just as successful as its predecessors. The teaching of databases is central to all Computing Science, Software Engineering, Information Systems and Information Technology courses, and this year, the workshop aims to continue the tradition of bringing together both database teachers and researchers, in order to share good learning, teaching and assessment practice and experience, and further the growing community amongst database academics. As well as attracting academics and teachers from the UK community, the workshop has also been successful in attracting academics from the wider international community, through serving on the programme committee, and attending and presenting papers. Due to the healthy number of high quality submissions this year, the workshop will present eight peer reviewed papers. Of these, six will be presented as full papers and two as short papers. These papers cover a number of themes, including: the teaching of data mining and data warehousing, SQL and NoSQL, databases at school, and database curricula themselves. The final paper will give a timely ten-year review of TLAD workshops, and it is expected that these papers will lead to a stimulating closing discussion, which will continue beyond the workshop. We also look forward to a keynote presentation by Karen Fraser, who has contributed to many TLAD workshops as the HEA organizer. Titled “An Effective Higher Education Academy”, the keynote will discuss the Academy’s plans for the future and outline how participants can get involved
Clinical Decision Support System for Unani Medicine Practitioners
Like other fields of Traditional Medicine, Unani Medicine has been an effective medical practice for ages. It is still widely used in the subcontinent, particularly in Pakistan and India. However, Unani Medicine practitioners lack modern IT applications in their everyday clinical practice. An online Clinical Decision Support System may address this challenge by assisting apprentice Unani Medicine practitioners in their diagnostic processes. The proposed system provides a web-based interface for entering the patient's symptoms, which are then automatically analyzed by the system to generate a list of probable diseases. The system allows practitioners to choose the most likely disease and inform patients about the associated treatment options remotely. The system consists of three modules: an Online Clinical Decision Support System, an Artificial Intelligence Inference Engine, and a comprehensive Unani Medicines Database. The system employs AI techniques such as Decision Trees, Deep Learning, and Natural Language Processing. For system development, the project team used a technology stack that includes React, FastAPI, and MySQL. Data and functionality of the application are exposed via APIs for integration and extension with similar domain applications. The novelty of the project is that it addresses the challenge of diagnosing diseases accurately and efficiently in the context of Unani Medicine principles. By leveraging the power of technology, the proposed Clinical Decision Support System has the potential to ease access to healthcare services and information, reduce cost, boost practitioner and patient satisfaction, improve the speed and accuracy of the diagnostic process, and provide effective treatments remotely. The application will be useful for Unani Medicine practitioners, patients, government drug regulators, software developers, and medical researchers.
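As a rough sketch of the symptom-to-disease step, the inference could be reduced to ranking by symptom overlap. The disease and symptom entries below are purely illustrative, not taken from the actual Unani Medicines Database, and the real system combines Decision Trees and Deep Learning rather than this simple score.

```python
# Hypothetical knowledge base: disease -> set of known symptoms.
DISEASE_SYMPTOMS = {
    "Nazla (common cold)": {"sneezing", "runny nose", "headache"},
    "Humma (fever)":       {"high temperature", "headache", "fatigue"},
}

def probable_diseases(symptoms):
    """Rank diseases by the fraction of their known symptoms reported."""
    reported = set(symptoms)
    scores = {d: len(reported & s) / len(s) for d, s in DISEASE_SYMPTOMS.items()}
    # Keep only diseases with at least one matching symptom, best first.
    return sorted((d for d, v in scores.items() if v > 0),
                  key=lambda d: scores[d], reverse=True)

print(probable_diseases(["sneezing", "headache"]))
```

In the described architecture, a function like this would sit inside the Inference Engine module and be exposed to the React front end through a FastAPI endpoint.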
Graphical Database Architecture For Clinical Trials
The general area of the research is Health Informatics. The research focuses on creating an innovative and novel solution to manage and analyze clinical trials data. It constructs a Graphical Database Architecture (GDA) for Clinical Trials (CT) using New Technology for Java (Neo4j) as a robust, scalable and high-performance database. The purpose of the research project is to develop concepts and techniques, based on the architecture, that accelerate clinical data navigation at lower cost. The research design uses a positivist approach to empirical research. The research is significant because it proposes a new approach to clinical trials through graph theory and designs a responsive structure for clinical data that can be deployed across the whole health informatics landscape. It uniquely contributes to the scholarly literature on Not only SQL (NoSQL) graph databases, mainly Neo4j in CT, for future research in clinical informatics. A prototype was created and examined to validate the concepts, taking advantage of Neo4j's high availability, scalability, and powerful graph query language (Cypher). This study finds that integrating search methodologies and information retrieval with the graphical database provides a solid starting point to manage, query, and analyze clinical trials data; furthermore, the design and development of the prototype demonstrate the conceptual model of this study. Likewise, the proposed clinical trials ontology (CTO) incorporates all data elements of a standard clinical study, facilitating a heuristic overview of the treatments, interventions, and outcome results of these studies.
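The kind of graph traversal such an architecture enables can be sketched in plain Python, with the equivalent Cypher pattern shown in a comment; the node labels, relationship types, and trial identifiers below are hypothetical, not the study's actual clinical trials ontology.

```python
# Equivalent Cypher pattern (illustrative):
#   MATCH (t:Trial)-[:HAS_ARM]->(a:Arm)-[:RECEIVES]->(i:Intervention)
#   WHERE i.name = 'Drug X' RETURN t
#
# The same graph modelled as plain adjacency lists keyed by (node, relationship).
edges = {
    ("Trial:NCT001", "HAS_ARM"): ["Arm:A1"],
    ("Trial:NCT002", "HAS_ARM"): ["Arm:B1"],
    ("Arm:A1", "RECEIVES"): ["Intervention:Drug X"],
    ("Arm:B1", "RECEIVES"): ["Intervention:Drug Y"],
}

def trials_with_intervention(name):
    """Traverse Trial -HAS_ARM-> Arm -RECEIVES-> Intervention."""
    hits = []
    for (node, rel), targets in edges.items():
        if rel == "HAS_ARM":
            for arm in targets:
                if f"Intervention:{name}" in edges.get((arm, "RECEIVES"), []):
                    hits.append(node)
    return hits

print(trials_with_intervention("Drug X"))
```

In Neo4j the traversal is expressed declaratively in Cypher and executed over the stored graph, which is what makes multi-hop navigation of trial, arm, intervention and outcome nodes fast relative to equivalent relational joins.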