Using domain specific language and sequence to sequence models as a hybrid framework for a natural language interface to a database solution
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University London.
The aim of this project is to provide a new approach to the problem of converting natural language into a language capable of querying a database or data repository. This problem has a long history: in the 1970s the US Navy developed a solution called LADDER, and since then an array of solutions, approaches and tweaks has kept the research community busy. The introduction of electronic assistants into the smartphone in 2010 has given new impetus to this problem.
With the increasingly pervasive nature of data, and its ever-expanding use to answer questions in business, science and medicine, extracting data is becoming ever more important. The idea behind this project is to democratise data by allowing access to it without the need for specialist languages. The performance and reliability of converting natural language into structured query language can be problematic because of the nuances that are prevalent in natural language; relational databases are not designed to understand language nuance.
This project introduces the following components as part of a holistic approach to improving
the conversion of a natural language statement into a language capable of querying a data
repository.
● The idea proposed in this project combines sequence-to-sequence models with natural-language part-of-speech technologies and domain-specific languages to convert natural language queries into SQL. The proposed approach uses natural language processing to perform an initial shallow pass over the incoming query, and then uses Google's TensorFlow to refine the query with a sequence-to-sequence model.
● This thesis also proposes using a Domain Specific Language (DSL) as part of the conversion process. The DSL has the potential to allow the natural language query to be translated into more than just an SQL statement: any query language, such as NoSQL or XQuery, could be targeted.
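The two-stage idea above (a shallow NLP pass into a DSL, then rendering the DSL into a target query language) can be sketched as follows. This is an illustration only: the lexicon, the mini-DSL grammar, and the helper names are hypothetical, and the seq2seq refinement step is omitted.

```python
import re

# Hypothetical mini-DSL: SELECT(measure) FILTER(filters) -- an illustration,
# not the thesis's actual DSL grammar.
DSL_TEMPLATE = "SELECT({measure}) FILTER({filters})"

# Toy lexicon standing in for the shallow NLP pass (POS tagging /
# keyword spotting) that precedes the seq2seq refinement step.
MEASURES = {"sales", "revenue"}
DIMENSIONS = {"georgia": ("location", "Georgia"),
              "mountain bikes": ("item", "mountain bike")}

def shallow_parse(query: str) -> str:
    """First pass: map surface tokens to DSL slots."""
    q = query.lower()
    measure = next((m for m in MEASURES if m in q), None)
    filters = [f"{col}='{val}'" for key, (col, val) in DIMENSIONS.items() if key in q]
    return DSL_TEMPLATE.format(measure=measure, filters=", ".join(filters))

def dsl_to_sql(dsl: str) -> str:
    """Backend-specific rendering; swapping this function could target NoSQL or XQuery instead."""
    m = re.match(r"SELECT\((\w+)\) FILTER\((.*)\)", dsl)
    measure, filters = m.group(1), m.group(2).replace(",", " AND")
    return f"SELECT SUM({measure}) FROM facts WHERE {filters};"

dsl = shallow_parse("What were the sales for mountain bikes in Georgia?")
print(dsl)
print(dsl_to_sql(dsl))
```

Because the DSL sits between the natural language and the query language, only the final rendering function has to change when the data repository changes.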
Natural Language Processing on Data Warehouses
The main problem addressed in this research was to use natural language to query data in a data warehouse. To this end, two natural language processing models were developed and compared on a classic star-schema sales data warehouse with sales facts and date, location and item dimensions. Utterances are queries that people make in natural language, for example, "What is the sales value for mountain bikes in Georgia for 1 July 2005?" The first model, the heuristics model, implemented an algorithm that steps through the sequence of utterance words and matches the longest number of consecutive words at the highest grain of the hierarchy. In contrast, the embedding model implemented the word2vec algorithm to create different kinds of vectors from the data warehouse. These vectors are aggregated, and the cosine similarity between vectors is then used to identify concepts in the utterances that can be converted to a programming language. To understand question style, a survey was set up, which then helped shape the random utterances created for the evaluation of both methods.

The first key insight, and the main premise for the embedding model to work, is a three-step process of creating three types of vectors. The first step is to train vectors (word vectors) for each individual word in the data warehouse; these are called word embeddings. For instance, the word `bike' will have a vector. In the next step, the word vectors are averaged for each unique column value (column vectors) in the data warehouse, leaving an entry like `mountain bike' with one vector that is the average of the vectors for `mountain' and `bike'. Lastly, the user's utterance is averaged (utterance vectors) using the word vectors created in step one, and then, using cosine similarity, the utterance vector is matched to the closest column vectors in order to identify data warehouse concepts in the utterance.
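The heuristics model's longest-consecutive-match step described above can be sketched as follows; the catalogue of column values and the hierarchy labels are illustrative, not the study's actual star schema.

```python
def longest_match(tokens, catalogue):
    """Greedy pass: at each position, match the longest run of consecutive
    tokens that names a known data-warehouse value (a sketch; the study's
    actual handling of hierarchy grain is more involved)."""
    matches, i = [], 0
    while i < len(tokens):
        best = None
        for j in range(len(tokens), i, -1):   # try longest phrases first
            phrase = " ".join(tokens[i:j])
            if phrase in catalogue:
                best = (phrase, catalogue[phrase])
                i = j                          # skip past the matched run
                break
        if best:
            matches.append(best)
        else:
            i += 1
    return matches

# Toy catalogue of column values drawn from the star schema's dimensions.
catalogue = {"mountain bikes": "item.subcategory",
             "georgia": "location.state",
             "sales value": "fact.sales"}
utterance = "what is the sales value for mountain bikes in georgia".split()
print(longest_match(utterance, catalogue))
```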
The second key insight was to train word vectors first for location and then separately for item; in other words, per dimension (one set for location, and one set for item). Removing stop words was the third key insight, and the last key insight was to use Global Vectors (GloVe) to initialise the training of the word vectors. The evaluation of the models indicated that the embedding model was ten times faster than the heuristics model. In terms of accuracy, the embedding model (95.6% accurate) also outperformed the heuristics model (70.1% accurate). The practical application of the research is that these models can be used as a component in a chatbot over data warehouses. Combined with a Structured Query Language query generation component, and with Application Programming Interfaces built on top, this facilitates the quick and easy distribution of data; no knowledge of a programming language such as Structured Query Language is needed to query the data.
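The three-step embedding pipeline can be sketched with tiny hand-made 3-dimensional vectors in place of trained word2vec embeddings; all vector values below are illustrative.

```python
from math import sqrt

# Step 1 stand-in: toy 3-d "word vectors" (real word2vec embeddings would
# be ~100-300 dimensions, initialised from GloVe per the study).
word_vecs = {
    "mountain": [0.9, 0.1, 0.0],
    "bike":     [0.8, 0.2, 0.1],
    "road":     [0.2, 0.9, 0.0],
    "georgia":  [0.0, 0.1, 0.9],
}

def average(words):
    """Average the word vectors of the known words (steps 2 and 3)."""
    vecs = [word_vecs[w] for w in words if w in word_vecs]  # unknown/stop words dropped
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Step 2: one averaged vector per unique column value.
column_vecs = {"mountain bike": average(["mountain", "bike"]),
               "road bike":     average(["road", "bike"]),
               "georgia":       average(["georgia"])}

# Step 3: average the utterance, then match it to the nearest column value.
utterance_vec = average("sales for mountain bike in georgia".split())
best = max(column_vecs, key=lambda c: cosine(utterance_vec, column_vecs[c]))
print(best)
```

In the full model each concept in the utterance is matched separately rather than taking a single global maximum; this sketch only shows the averaging and cosine-similarity mechanics.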
Results of the seventh edition of the BioASQ Challenge
The results of the seventh edition of the BioASQ challenge are presented in
this paper. The aim of the BioASQ challenge is the promotion of systems and
methodologies through the organization of a challenge on the tasks of
large-scale biomedical semantic indexing and question answering. In total, 30
teams with more than 100 systems participated in the challenge this year. As in
previous years, the best systems were able to outperform the strong baselines.
This suggests that state-of-the-art systems are continuously improving, pushing
the frontier of research.
Proceedings TLAD 2012: 10th International Workshop on the Teaching, Learning and Assessment of Databases
This is the tenth in the series of highly successful international workshops on the Teaching, Learning and Assessment of Databases (TLAD 2012). TLAD 2012 is held on the 9th July at the University of Hertfordshire and hopes to be just as successful as its predecessors. The teaching of databases is central to all Computing Science, Software Engineering, Information Systems and Information Technology courses, and this year, the workshop aims to continue the tradition of bringing together both database teachers and researchers, in order to share good learning, teaching and assessment practice and experience, and further the growing community amongst database academics. As well as attracting academics and teachers from the UK community, the workshop has also been successful in attracting academics from the wider international community, through serving on the programme committee, and attending and presenting papers. Due to the healthy number of high quality submissions this year, the workshop will present eight peer reviewed papers. Of these, six will be presented as full papers and two as short papers. These papers cover a number of themes, including: the teaching of data mining and data warehousing, SQL and NoSQL, databases at school, and database curricula themselves. The final paper will give a timely ten-year review of TLAD workshops, and it is expected that these papers will lead to a stimulating closing discussion, which will continue beyond the workshop. We also look forward to a keynote presentation by Karen Fraser, who has contributed to many TLAD workshops as the HEA organizer. Titled “An Effective Higher Education Academy”, the keynote will discuss the Academy’s plans for the future and outline how participants can get involved
Clinical Decision Support System for Unani Medicine Practitioners
Like other fields of Traditional Medicine, Unani Medicine has been an effective medical practice for ages. It is still widely used in the subcontinent, particularly in Pakistan and India. However, Unani Medicine practitioners lack modern IT applications in their everyday clinical practice. An online Clinical Decision Support System may address this challenge by assisting apprentice Unani Medicine practitioners in their diagnostic processes. The proposed system provides a web-based interface for entering the patient's symptoms, which are then automatically analyzed by the system to generate a list of probable diseases. The system allows practitioners to choose the most likely disease and inform patients about the associated treatment options remotely. The system consists of three modules: an Online Clinical Decision Support System, an Artificial Intelligence Inference Engine, and a comprehensive Unani Medicines Database. The system employs AI techniques such as Decision Trees, Deep Learning, and Natural Language Processing. For system development, the project team used a technology stack that includes React, FastAPI, and MySQL. Data and functionality of the application are exposed via APIs for integration and extension with similar domain applications. The novelty of the project is that it addresses the challenge of diagnosing diseases accurately and efficiently in the context of Unani Medicine principles. By leveraging the power of technology, the proposed Clinical Decision Support System has the potential to ease access to healthcare services and information, reduce cost, boost practitioner and patient satisfaction, improve the speed and accuracy of the diagnostic process, and provide effective treatments remotely. The application will be useful for Unani Medicine practitioners, patients, government drug regulators, software developers, and medical researchers.
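As a rough sketch of the symptom-to-disease step, the inference could be reduced to ranking by symptom overlap. The disease and symptom entries below are purely illustrative, not taken from the actual Unani Medicines Database, and the real system combines Decision Trees and Deep Learning rather than this simple score.

```python
# Hypothetical knowledge base: disease -> set of known symptoms.
DISEASE_SYMPTOMS = {
    "Nazla (common cold)": {"sneezing", "runny nose", "headache"},
    "Humma (fever)":       {"high temperature", "headache", "fatigue"},
}

def probable_diseases(symptoms):
    """Rank diseases by the fraction of their known symptoms reported."""
    reported = set(symptoms)
    scores = {d: len(reported & s) / len(s) for d, s in DISEASE_SYMPTOMS.items()}
    # Keep only diseases with at least one matching symptom, best first.
    return sorted((d for d, v in scores.items() if v > 0),
                  key=lambda d: scores[d], reverse=True)

print(probable_diseases(["sneezing", "headache"]))
```

In the described architecture, a function like this would sit inside the Inference Engine module and be exposed to the React front end through a FastAPI endpoint.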
Graphical Database Architecture For Clinical Trials
The general area of the research is Health Informatics. The research focuses on creating an innovative and novel solution to manage and analyze clinical trials data. It constructs a Graphical Database Architecture (GDA) for Clinical Trials (CT) using New Technology for Java (Neo4j) as a robust, scalable and high-performance database. The purpose of the research project is to develop concepts and techniques, based on the architecture, that accelerate clinical data navigation at lower cost. The research design uses a positivist approach to empirical research. The research is significant because it proposes a new approach to clinical trials through graph theory and designs a responsive structure for clinical data that can be deployed across the whole health informatics landscape. It uniquely contributes to the scholarly literature on Not only SQL (NoSQL) graph databases, mainly Neo4j in CT, for future research in clinical informatics. A prototype was created and examined to validate the concepts, taking advantage of Neo4j's high availability, scalability, and powerful graph query language (Cypher). This study finds that integrating search methodologies and information retrieval with the graphical database provides a solid starting point to manage, query, and analyze clinical trials data; furthermore, the design and development of the prototype demonstrate the conceptual model of this study. Likewise, the proposed clinical trials ontology (CTO) incorporates all data elements of a standard clinical study, facilitating a heuristic overview of the treatments, interventions, and outcome results of these studies.
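The kind of graph traversal such an architecture enables can be sketched in plain Python, with the equivalent Cypher pattern shown in a comment; the node labels, relationship types, and trial identifiers below are hypothetical, not the study's actual clinical trials ontology.

```python
# Equivalent Cypher pattern (illustrative):
#   MATCH (t:Trial)-[:HAS_ARM]->(a:Arm)-[:RECEIVES]->(i:Intervention)
#   WHERE i.name = 'Drug X' RETURN t
#
# The same graph modelled as plain adjacency lists keyed by (node, relationship).
edges = {
    ("Trial:NCT001", "HAS_ARM"): ["Arm:A1"],
    ("Trial:NCT002", "HAS_ARM"): ["Arm:B1"],
    ("Arm:A1", "RECEIVES"): ["Intervention:Drug X"],
    ("Arm:B1", "RECEIVES"): ["Intervention:Drug Y"],
}

def trials_with_intervention(name):
    """Traverse Trial -HAS_ARM-> Arm -RECEIVES-> Intervention."""
    hits = []
    for (node, rel), targets in edges.items():
        if rel == "HAS_ARM":
            for arm in targets:
                if f"Intervention:{name}" in edges.get((arm, "RECEIVES"), []):
                    hits.append(node)
    return hits

print(trials_with_intervention("Drug X"))
```

In Neo4j the traversal is expressed declaratively in Cypher and executed over the stored graph, which is what makes multi-hop navigation of trial, arm, intervention and outcome nodes fast relative to equivalent relational joins.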