3,377 research outputs found
Annotation Graphs and Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development
Annotation graphs and annotation servers offer infrastructure to support the
analysis of human language resources in the form of time-series data such as
text, audio and video. This paper outlines areas of common need among empirical
linguists and computational linguists. After reviewing examples of data and
tools used or under development for each of several areas, it proposes a common
framework for future tool development, data annotation and resource sharing
based upon annotation graphs and servers.Comment: 8 pages, 6 figure
Recommended from our members
Using domain specific language and sequence to sequence models as a hybrid framework for a natural language interface to a database solution
This thesis was submitted for the award of Doctor of Philosophy and was awarded by Brunel University LondonThe aim of this project is to provide a new approach to solving the problem of
converting natural language into a language capable of querying a database or data
repository. This problem has been around for a while, in the 1970's the US Navy
developed a solution called LADDER and since then there have been an array of
solutions, approaches and tweaks that have kept the research community busy. The
introduction of electronic assistants into the smart phone in 2010 has given new
impetus to this problem.
With the increasingly pervasive nature of data and its ever expanding use to answer
questions within business science, medicine extracting data is becoming more important.
The idea behind this project is to make data more democratised by allowing access to it
without the need for specialist languages. The performance and reliability of converting
natural language into structured query language can be problematic in handling nuances
that are prevalent in natural language. Relational databases are not designed to understand
language nuance.
This project introduces the following components as part of a holistic approach to improving
the conversion of a natural language statement into a language capable of querying a data
repository.
ā The idea proposed in this project combines the use of sequence to sequence models
in conjunction with the natural language part of speech technologies and domain
specific languages to convert natural language queries into SQL. The approach
being proposed by this chapter is to use natural language processing to perform an
initial shallow pass of the incoming query and then use Google's Tensor Flow to
refine the query with the use of a sequence to sequence model.
ā This thesis is also proposing to use a Domain Specific Language (DSL) as part of the
conversion process. The use of the DSL has the potential to allow the natural
language query to be translated into more than just an SQL statement, but any query
language such as NoSQL or XQuery
Survey on Chatbot Design Techniques in Speech Conversation Systems
Human-Computer Speech is gaining momentum as a technique of computer interaction. There has been a recent upsurge in speech based search engines and assistants such as Siri, Google Chrome and Cortana. Natural Language Processing (NLP) techniques such as NLTK for Python can be applied to analyse speech, and intelligent responses can be found by designing an engine to provide appropriate human like responses. This type of programme is called a Chatbot, which is the focus of this study. This paper presents a survey on the techniques used to design Chatbots and a comparison is made between different design techniques from nine carefully selected papers according to the main methods adopted. These papers are representative of the significant improvements in Chatbots in the last decade. The paper discusses the similarities and differences in the techniques and examines in particular the Loebner prize-winning Chatbots
Some Requests for Machine Learning Research from the East African Tech Scene
Based on 46 in-depth interviews with scientists, engineers, and CEOs, this
document presents a list of concrete machine research problems, progress on
which would directly benefit tech ventures in East Africa.Comment: Presented at NIPS 2018 Workshop on Machine Learning for the
Developing Worl
Anomaly Detection in Multivariate Non-stationary Time Series for Automatic DBMS Diagnosis
Anomaly detection in database management systems (DBMSs) is difficult because
of increasing number of statistics (stat) and event metrics in big data system.
In this paper, I propose an automatic DBMS diagnosis system that detects
anomaly periods with abnormal DB stat metrics and finds causal events in the
periods. Reconstruction error from deep autoencoder and statistical process
control approach are applied to detect time period with anomalies. Related
events are found using time series similarity measures between events and
abnormal stat metrics. After training deep autoencoder with DBMS metric data,
efficacy of anomaly detection is investigated from other DBMSs containing
anomalies. Experiment results show effectiveness of proposed model, especially,
batch temporal normalization layer. Proposed model is used for publishing
automatic DBMS diagnosis reports in order to determine DBMS configuration and
SQL tuning.Comment: 8 page
Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective
This paper presents a Lisp architecture for a portable NLP system, termed
LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard,
customized and in-house developed NLP tools. Our system facilitates portability
across different institutions and data systems by incorporating an enriched
Common Data Model (CDM) to standardize necessary data elements. It utilizes
UMLS to perform domain adaptation when integrating generic domain NLP tools. It
also features stand-off annotations that are specified by positional reference
to the original document. We built an interval tree based search engine to
efficiently query and retrieve the stand-off annotations by specifying
positional requirements. We also developed a utility to convert an inline
annotation format to stand-off annotations to enable the reuse of clinical text
datasets with inline annotations. We experimented with our system on several
NLP facilitated tasks including computational phenotyping for lymphoma patients
and semantic relation extraction for clinical notes. These experiments
showcased the broader applicability and utility of LAPNLP.Comment: 6 pages, accepted by IEEE BIBM 2018 as regular pape
- ā¦