214 research outputs found
Object-oriented Neural Programming (OONP) for Document Understanding
We propose Object-oriented Neural Programming (OONP), a framework for
semantically parsing documents in specific domains. Basically, OONP reads a
document and parses it into a predesigned object-oriented data structure
(referred to as ontology in this paper) that reflects the domain-specific
semantics of the document. An OONP parser models semantic parsing as a decision
process: a neural net-based Reader sequentially goes through the document, and
during the process it builds and updates an intermediate ontology to summarize
its partial understanding of the text it covers. OONP supports a rich family of
operations (both symbolic and differentiable) for composing the ontology, and a
wide variety of forms (both symbolic and differentiable) for representing the
state and the document. An OONP parser can be trained with supervision of
different forms and strengths, including supervised learning (SL),
reinforcement learning (RL), and a hybrid of the two. Our experiments on both
synthetic and real-world document parsing tasks have shown that OONP can learn
to handle fairly complicated ontologies with training data of modest size.
Comment: accepted by ACL 201
Ontology of core data mining entities
In this article, we present OntoDM-core, an ontology of core data mining
entities. OntoDM-core defines the most essential data mining entities in a three-layered
ontological structure comprising a specification, an implementation and an application
layer. It provides a representational framework for the description of mining
structured data, and in addition provides taxonomies of datasets, data mining tasks,
generalizations, data mining algorithms and constraints, based on the type of data.
OntoDM-core is designed to support a wide range of applications/use cases, such as
semantic annotation of data mining algorithms, datasets and results; annotation of
QSAR studies in the context of drug discovery investigations; and disambiguation of
terms in text mining. The ontology has been thoroughly assessed following the practices
in ontology engineering, is fully interoperable with many domain resources and
is easy to extend.
Standardizing New Diagnostic Tests to Facilitate Rapid Responses to The Covid-19 Pandemic
To enhance data interoperability, an expeditious and accurate standardization solution is highly desirable for naming rapidly emerging novel lab tests, thereby diminishing confusion in early responses to pandemic outbreaks. This is a preliminary study to explore the roles and implementation of medical informatics technology, especially natural language processing and ontology methods, in standardizing information about emerging lab tests during a pandemic, thereby facilitating rapid responses to the pandemic. The ultimate goal of this study is to develop an informatics framework for rapid standardization of lab test names during a pandemic to better prepare for future public health threats. We first constructed an information model for lab tests approved during the COVID-19 pandemic and built a named entity recognition tool that can automatically extract the lab test information specified in the information model from the Emergency Use Authorization (EUA) documents of the U.S. Food and Drug Administration (FDA), thus creating a catalog of approved lab tests with detailed information. To facilitate the standardization of lab testing data in electronic health records, we further developed COVID-19 TestNorm, a tool that normalizes the names of the various COVID-19 lab tests used by different healthcare facilities into standard Logical Observation Identifiers Names and Codes (LOINC). The overall accuracy of COVID-19 TestNorm was 98.9% on the development set and 97.4% on the independent test set. Lastly, we conducted a clinical study on COVID-19 re-positivity to demonstrate the utility of standardized lab test information in supporting clinical research. We believe that the results of our study indicate the great potential of medical informatics technologies for facilitating rapid responses to both current and future pandemics.
Information Extraction based on Named Entity for Tourism Corpus
Tourism information is scattered around nowadays. To search for the
information, it is usually time consuming to browse through the results from
search engine, select and view the details of each accommodation. In this
paper, we present a methodology to extract particular information from full
text returned from the search engine to facilitate the users. Then, the users
can specifically look to the desired relevant information. The approach can be
used for the same task in other domains. The main steps are 1) building
training data and 2) building recognition model. First, the tourism data is
gathered and the vocabularies are built. The raw corpus is used to train for
creating vocabulary embedding. Also, it is used for creating annotated data.
The process of creating named entity annotation is presented. Then, the
recognition model of a given entity type can be built. From the experiments,
given a hotel description, the model can extract the desired entities, i.e., name,
location, and facility. The extracted data can further be stored as structured
information, e.g., in an ontology format, for future querying and inference.
The model for automatic named entity identification, based on machine learning,
yields an error ranging from 8% to 25%.
Comment: 6 pages, 9 figures
Braid: Weaving Symbolic and Neural Knowledge into Coherent Logical Explanations
Traditional symbolic reasoning engines, while attractive for their precision
and explicability, have a few major drawbacks: the use of brittle inference
procedures that rely on exact matching (unification) of logical terms, an
inability to deal with uncertainty, and the need for a precompiled rule-base of
knowledge (the "knowledge acquisition" problem). To address these issues, we
devise a novel logical reasoner called Braid, that supports probabilistic
rules, and uses the notion of custom unification functions and dynamic rule
generation to overcome the brittle matching and knowledge-gap problem prevalent
in traditional reasoners. In this paper, we describe the reasoning algorithms
used in Braid, and their implementation in a distributed task-based framework
that builds proof/explanation graphs for an input query. We use a simple QA
example from a children's story to motivate Braid's design and explain how the
various components work together to produce a coherent logical explanation.
Finally, we evaluate Braid on the ROC Story Cloze test and achieve close to
state-of-the-art results while providing frame-based explanations.
Comment: Accepted at AAAI-202
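The idea of replacing exact term matching with a custom unification function and probabilistic rules can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not Braid's actual implementation: the names `soft_unify` and `apply_rule` are hypothetical, string similarity from Python's `difflib` stands in for whatever matching function a real system would use, and multiplying confidences is only one simple way to combine them.

```python
from difflib import SequenceMatcher


def soft_unify(term_a, term_b, threshold=0.8):
    """Return a match confidence in [0, 1], or None if it falls below threshold.

    Character-level similarity is a stand-in (assumption) for a custom
    unification function; exact matching is the special case score == 1.0.
    """
    score = SequenceMatcher(None, term_a.lower(), term_b.lower()).ratio()
    return score if score >= threshold else None


def apply_rule(rule_confidence, premise, fact, threshold=0.8):
    """Fire a probabilistic rule whose premise softly matches a known fact.

    The rule's confidence is scaled by the match confidence (an assumed,
    deliberately simple combination scheme).
    """
    match = soft_unify(premise, fact, threshold)
    return None if match is None else rule_confidence * match


# Case differences no longer block the match, unlike strict unification.
exact = apply_rule(0.9, "plant", "Plant")
# Unrelated terms fall below the threshold, so the rule does not fire.
unrelated = apply_rule(0.9, "plant", "zebra")
```

A strict unifier would reject "plant" against "Plant"; the soft version accepts it with full confidence and still refuses clearly unrelated terms.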
Knowledge-based Biomedical Data Science 2019
Knowledge-based biomedical data science (KBDS) involves the design and
implementation of computer systems that act as if they knew about biomedicine.
Such systems depend on formally represented knowledge in computer systems,
often in the form of knowledge graphs. Here we survey the progress in the last
year in systems that use formally represented knowledge to address data science
problems in both clinical and biological domains, as well as on approaches for
creating knowledge graphs. Major themes include the relationships between
knowledge graphs and machine learning, the use of natural language processing,
and the expansion of knowledge-based approaches to novel domains, such as
Chinese Traditional Medicine and biodiversity.
Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages
with 3 tables
A Survey on Knowledge Graphs: Representation, Acquisition and Applications
Human knowledge provides a formal understanding of the world. Knowledge
graphs that represent structural relations between entities have become an
increasingly popular research direction towards cognition and human-level
intelligence. In this survey, we provide a comprehensive review of knowledge
graph covering overall research topics about 1) knowledge graph representation
learning, 2) knowledge acquisition and completion, 3) temporal knowledge graph,
and 4) knowledge-aware applications, and summarize recent breakthroughs and
perspective directions to facilitate future research. We propose a full-view
categorization and new taxonomies on these topics. Knowledge graph embedding is
organized from four aspects of representation space, scoring function, encoding
models, and auxiliary information. For knowledge acquisition, especially
knowledge graph completion, embedding methods, path inference, and logical rule
reasoning, are reviewed. We further explore several emerging topics, including
meta relational learning, commonsense reasoning, and temporal knowledge graphs.
To facilitate future research on knowledge graphs, we also provide a curated
collection of datasets and open-source libraries on different tasks. In the
end, we provide a thorough outlook on several promising research directions.
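To make the "scoring function" aspect of knowledge graph embedding concrete, here is a minimal sketch of a translational-distance score in the style of TransE. The abstract does not name a specific model, so the choice of TransE is an assumption, and the 2-dimensional embeddings below are invented toy values, not learned vectors.

```python
import math


def transe_score(head, relation, tail):
    """Negative L2 distance between (head + relation) and tail.

    A higher (less negative) score means the triple is more plausible.
    """
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Toy embeddings, made up for illustration only.
paris = [1.0, 0.0]
berlin = [0.0, 0.0]
france = [1.0, 1.0]
capital_of = [0.0, 1.0]

# paris + capital_of lands exactly on france, so the distance is zero.
true_score = transe_score(paris, capital_of, france)
# berlin + capital_of misses france by one unit.
false_score = transe_score(berlin, capital_of, france)
```

In a trained model the embeddings would be learned so that observed triples score higher than corrupted ones; here the vectors are chosen by hand so the ranking is easy to verify.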