13,066 research outputs found
The Neuro-Symbolic Concept Learner: Interpreting Scenes, Words, and Sentences From Natural Supervision
We propose the Neuro-Symbolic Concept Learner (NS-CL), a model that learns
visual concepts, words, and semantic parsing of sentences without explicit
supervision on any of them; instead, our model learns by simply looking at
images and reading paired questions and answers. Our model builds an
object-based scene representation and translates sentences into executable,
symbolic programs. To bridge the learning of two modules, we use a
neuro-symbolic reasoning module that executes these programs on the latent
scene representation. Analogical to human concept learning, the perception
module learns visual concepts based on the language description of the object
being referred to. Meanwhile, the learned visual concepts facilitate learning
new words and parsing new sentences. We use curriculum learning to guide the
searching over the large compositional space of images and language. Extensive
experiments demonstrate the accuracy and efficiency of our model on learning
visual concepts, word representations, and semantic parsing of sentences.
Further, our method allows easy generalization to new object attributes,
compositions, language concepts, scenes and questions, and even new program
domains. It also empowers applications including visual question answering and
bidirectional image-text retrieval.Comment: ICLR 2019 (Oral). Project page: http://nscl.csail.mit.edu
Statistical parsing of morphologically rich languages (SPMRL): what, how and whither
The term Morphologically Rich Languages (MRLs) refers to languages in which significant information concerning syntactic units and relations is expressed at word-level. There is ample evidence that the application of readily available statistical parsing models to such languages is susceptible to serious performance degradation. The first workshop on statistical parsing of MRLs hosts a variety of contributions which show that despite language-specific idiosyncrasies, the problems associated with parsing MRLs cut across languages and parsing frameworks. In this paper we review the current state-of-affairs with respect to parsing MRLs and point out central challenges. We synthesize the contributions of researchers working on parsing Arabic, Basque, French, German, Hebrew, Hindi and Korean to point out shared solutions across languages. The overarching analysis suggests itself as a source of directions for future investigations
Bridging the Semantic Gap with SQL Query Logs in Natural Language Interfaces to Databases
A critical challenge in constructing a natural language interface to database
(NLIDB) is bridging the semantic gap between a natural language query (NLQ) and
the underlying data. Two specific ways this challenge exhibits itself is
through keyword mapping and join path inference. Keyword mapping is the task of
mapping individual keywords in the original NLQ to database elements (such as
relations, attributes or values). It is challenging due to the ambiguity in
mapping the user's mental model and diction to the schema definition and
contents of the underlying database. Join path inference is the process of
selecting the relations and join conditions in the FROM clause of the final SQL
query, and is difficult because NLIDB users lack the knowledge of the database
schema or SQL and therefore cannot explicitly specify the intermediate tables
and joins needed to construct a final SQL query. In this paper, we propose
leveraging information from the SQL query log of a database to enhance the
performance of existing NLIDBs with respect to these challenges. We present a
system Templar that can be used to augment existing NLIDBs. Our extensive
experimental evaluation demonstrates the effectiveness of our approach, leading
up to 138% improvement in top-1 accuracy in existing NLIDBs by leveraging SQL
query log information.Comment: Accepted to IEEE International Conference on Data Engineering (ICDE)
201
Open-Vocabulary Semantic Parsing with both Distributional Statistics and Formal Knowledge
Traditional semantic parsers map language onto compositional, executable
queries in a fixed schema. This mapping allows them to effectively leverage the
information contained in large, formal knowledge bases (KBs, e.g., Freebase) to
answer questions, but it is also fundamentally limiting---these semantic
parsers can only assign meaning to language that falls within the KB's
manually-produced schema. Recently proposed methods for open vocabulary
semantic parsing overcome this limitation by learning execution models for
arbitrary language, essentially using a text corpus as a kind of knowledge
base. However, all prior approaches to open vocabulary semantic parsing replace
a formal KB with textual information, making no use of the KB in their models.
We show how to combine the disparate representations used by these two
approaches, presenting for the first time a semantic parser that (1) produces
compositional, executable representations of language, (2) can successfully
leverage the information contained in both a formal KB and a large corpus, and
(3) is not limited to the schema of the underlying KB. We demonstrate
significantly improved performance over state-of-the-art baselines on an
open-domain natural language question answering task.Comment: Re-written abstract and intro, other minor changes throughout. This
version published at AAAI 201
Recommended from our members
An improved hidden vector state model approach and its adaptation in extracting protein interaction information from biomedical literature
Large quantity of knowledge, which is important for biological researchers to unveil the mechanism of life, often hides in the literature, such as journal articles, reports, books and so on. Many approaches focusing on extracting information from unstructured text, such as pattern matching, shallow and full parsing, have been proposed especially for biomedical applications. In this paper, we present an information extraction system employing a semantic parser using the Hidden Vector State (HVS) model for protein-protein interactions. We found that it performed better than other established statistical methods and achieved 58.3% and 76.8% in recall and precision respectively. Moreover, the pure data-driven HVS model can be easily adapted to other domains, which is rarely mentioned and possessed by other approaches. Experimental results prove that the model trained on one domain can still generate satisfactory results when shifting to another domain with a small amount of adaptation training data
Natural Language Query in the Biochemistry and Molecular Biology Domains Based on Cognition Search™
Motivation: With the tremendous growth in scientific literature, it is necessary to improve upon the standard pattern matching style of the available search engines. Semantic NLP may be the solution to this problem. Cognition Search (CSIR) is a natural language technology. It is best used by asking a simple question that might be answered in textual data being queried, such as MEDLINE. CSIR has a large English dictionary and semantic database. Cognition’s semantic map enables the search process to be based on meaning rather than statistical word pattern matching and, therefore, returns more complete and relevant results. The Cognition Search engine uses downward reasoning and synonymy which also improves recall. It improves precision through phrase parsing and word sense disambiguation.
Result: Here we have carried out several projects to "teach" the CSIR lexicon medical, biochemical and molecular biological language and acronyms from curated web-based free sources. Vocabulary from the Alliance for Cell Signaling (AfCS), the Human Genome Nomenclature Consortium (HGNC), the United Medical Language System (UMLS) Meta-thesaurus, and The International Union of Pure and Applied Chemistry (IUPAC) was introduced into the CSIR dictionary and curated. The resulting system was used to interpret MEDLINE abstracts. Meaning-based search of MEDLINE abstracts yields high precision (estimated at >90%), and high recall (estimated at >90%), where synonym information has been encoded. The present implementation can be found at http://MEDLINE.cognition.com. 

- …