187,449 research outputs found
Structured Training for Neural Network Transition-Based Parsing
We present structured perceptron training for neural network transition-based
dependency parsing. We learn the neural network representation using a gold
corpus augmented by a large number of automatically parsed sentences. Given
this fixed network representation, we learn a final layer using the structured
perceptron with beam-search decoding. On the Penn Treebank, our parser reaches
94.26% unlabeled and 92.41% labeled attachment accuracy, which to our knowledge
is the best accuracy on Stanford Dependencies to date. We also provide in-depth
ablative analysis to determine which aspects of our model provide the largest
gains in accuracy.
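A minimal sketch (not the authors' implementation) of the two components the abstract names, a structured perceptron weight update over a fixed representation and generic beam-search decoding over action sequences; the function names and toy scoring interface are illustrative:

```python
import numpy as np

def structured_perceptron_update(w, phi_gold, phi_pred, lr=1.0):
    # Move the weights toward the gold structure's features and away
    # from the features of the highest-scoring (wrong) prediction.
    return w + lr * (phi_gold - phi_pred)

def beam_decode(score_action, n_steps, n_actions, beam_size=4):
    # Generic beam search over action sequences: keep the beam_size
    # best-scoring prefixes at every step, return the best full sequence.
    # score_action(prefix, a) returns the incremental score of action a.
    beams = [((), 0.0)]
    for _ in range(n_steps):
        candidates = [(prefix + (a,), score + score_action(prefix, a))
                      for prefix, score in beams
                      for a in range(n_actions)]
        candidates.sort(key=lambda c: -c[1])
        beams = candidates[:beam_size]
    return beams[0][0]
```

For example, with a toy scorer that always prefers the higher-numbered action, `beam_decode(lambda p, a: float(a), 3, 2)` returns the sequence `(1, 1, 1)`.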
Methodological proposal to build a corpus-based ontology in terminology
Corpora are an indispensable resource for improving quality in both the linguistic and the conceptual dimension of terminological projects. However, while there is complete agreement that specialised corpora are vital in the linguistic dimension of any terminological project (e.g. to select real contextual examples), there are three different approaches with regard to the conceptual dimension, and not all of them employ corpora in their projects. In an attempt to shed some light on the advantages that corpora bring to the representation of specialised knowledge in terminology, this research follows the ontoterminography methodology (Durán-Muñoz 2012) to propose the building of a corpus-based ontology within a terminological project, in particular a specialised resource about an adventure activity (canyoning) in English. More specifically, it describes the different steps that are required to create such an ontology, from the analysis of the specialised domain and the compilation of the corpus to the representation of the specialised knowledge in the form of a corpus-based ontology.
Improving self-organising information maps as navigational tools: A semantic approach
Purpose - The goal of the research is to explore whether the use of higher-level semantic features can help us to build better self-organising map (SOM) representations as measured from a human-centred perspective. The authors also explore an automatic evaluation method that utilises human expert knowledge encapsulated in the structure of traditional textbooks to determine map representation quality.
Design/methodology/approach - Two types of document representations involving semantic features have been explored, i.e. using only one individual semantic feature, and mixing a semantic feature with keywords. Experiments were conducted to investigate the impact of semantic representation quality on the map. The experiments were performed on data collections from a single-book corpus and a multiple-book corpus.
Findings - Combining keywords with certain semantic features achieves a significant improvement in representation quality over the keywords-only approach in a relatively homogeneous single-book corpus. Changing the ratios in combining different features also affects the performance. While semantic mixtures can work well in a single-book corpus, they lose their advantages over keywords in the multiple-book corpus. This raises a concern about whether the semantic representations in the multiple-book corpus are homogeneous and coherent enough for applying semantic features. The terminology issue among textbooks affects the ability of the SOM to generate a high-quality map for heterogeneous collections.
Originality/value - The authors explored the use of higher-level document representation features for the development of a better-quality SOM. In addition, the authors have piloted a specific method for evaluating SOM quality based on the organisation of information content in the map. © 2011 Emerald Group Publishing Limited
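The core of the self-organising map the abstract evaluates can be sketched in a few lines; this is a deliberately minimal single-step update (neighbourhood function omitted for brevity), with illustrative names throughout:

```python
import numpy as np

def som_step(weights, x, lr=0.1):
    # One self-organising map update: find the best-matching unit (BMU),
    # i.e. the map node whose weight vector is closest to the input
    # document vector x, and pull that unit toward x.
    bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
    weights[bmu] += lr * (x - weights[bmu])
    return bmu, weights
```

A full SOM would also update the BMU's map neighbours with a decaying neighbourhood radius, which is what produces the topology-preserving layout used for navigation.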
Towards a Knowledge Graph based Speech Interface
Applications which use human speech as an input require a speech interface
with high recognition accuracy. The words or phrases in the recognised text are
annotated with a machine-understandable meaning and linked to knowledge graphs
for further processing by the target application. These semantic annotations of
recognised words can be represented as subject-predicate-object triples, which collectively form a graph often referred to as a knowledge graph. This type of knowledge representation makes it possible to use speech interfaces with any spoken-input application: since the information is represented in a logical, semantic form, it can be stored and retrieved using any standard web query language. In this work, we develop a methodology for linking speech input to
knowledge graphs and study the impact of recognition errors in the overall
process. We show that for a corpus with a lower WER, the annotation and linking of entities to the DBpedia knowledge graph improve considerably. DBpedia Spotlight, a tool for interlinking text documents with linked open data, is used to link the speech recognition output to the DBpedia knowledge graph. Such a knowledge-based speech recognition interface is useful for applications such as question answering or spoken dialogue systems. Comment: Under review at the International Workshop on Grounding Language Understanding, Satellite of Interspeech 201
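The subject-predicate-object representation the abstract describes can be sketched directly as a set of triples; the entities and predicates below are illustrative stand-ins (in practice DBpedia Spotlight would return DBpedia URIs):

```python
# A tiny knowledge graph as a set of subject-predicate-object triples.
triples = {
    ("Berlin", "capitalOf", "Germany"),
    ("Berlin", "type", "City"),
    ("Germany", "type", "Country"),
}

def objects(graph, subject, predicate):
    # Retrieve every object linked to `subject` via `predicate`;
    # a SPARQL engine answers the same query pattern at scale.
    return {o for s, p, o in graph if s == subject and p == predicate}
```

For example, `objects(triples, "Berlin", "capitalOf")` returns `{"Germany"}`, mirroring what a standard web query language (e.g. SPARQL) would retrieve from the full graph.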
Leveraging Knowledge Graph Embeddings to Enhance Contextual Representations for Relation Extraction
Relation extraction is a crucial and challenging task in Natural Language Processing. Several methods have surfaced recently, exhibiting notable performance on the task; however, most of these approaches rely on vast amounts of data from large-scale knowledge graphs or on language models pretrained on voluminous corpora. In this paper, we focus on effectively using only the knowledge supplied by a corpus to create a high-performing model. Our objective is to show that by leveraging the hierarchical structure and relational distribution of entities within a corpus, without introducing external knowledge, a relation extraction model can achieve significantly enhanced performance. We therefore propose a relation extraction approach based on incorporating knowledge graph embeddings, pretrained at the corpus scale, into the sentence-level contextual representation. We conducted a series of experiments whose results are promising: our method outperforms context-based relation extraction models. Comment: 15 pages, 1 figure, The 17th International Conference on Document Analysis and Recognitio
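One common way to realise the incorporation step the abstract describes is simple feature concatenation before the relation classifier; this is a hedged sketch of that pattern, not the paper's architecture, and all names and dimensions are illustrative:

```python
import numpy as np

def build_features(sentence_vec, head_kg_vec, tail_kg_vec):
    # Concatenate the sentence-level contextual representation with the
    # corpus-scale KG embeddings of the head and tail entities; the
    # combined vector is then fed to a relation classifier.
    return np.concatenate([sentence_vec, head_kg_vec, tail_kg_vec])
```

Concatenation keeps the two knowledge sources separable for the classifier; alternatives such as gated fusion or attention over the KG embeddings are also common in the literature.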
The role of the cognitive model profile in knowledge representation and meaning construction: the case of the lexical item Europe
The paper addresses the role of the cognitive model profile, one of the fundamental constructs in LCCM Theory (a.k.a. access semantics), in meaning construction and knowledge representation with respect to the concept of Europe. The study is based on a corpus of news articles retrieved from the Guardian from May 2004 through December 2009 (approximately 930,000 words) and focuses on the lexical item Europe (over 4000 corpus occurrences). The study takes its theoretical underpinnings from LCCM Theory, a theory of lexical representation and semantic composition, which delineates the roles the linguistic and the conceptual systems play in meaning construction (e.g., Evans 2009, 2013). The paper documents the immense semantic potential of the lexical item Europe as manifest in the Guardian’s discourse under analysis. In terms of knowledge representation, to account for the coherent body of multimodal knowledge which the lexical item Europe affords access to, its cognitive model profiles relevant to its two lexical concepts are constructed. As far as the role of the cognitive model profile in meaning construction is concerned, the study demonstrates how the context, specifically the co-text, determines the activation of a respective portion of the cognitive model profile of the lexical item Europe.
Evaluating Word Embeddings in Multi-label Classification Using Fine-grained Name Typing
Embedding models typically associate each word with a single real-valued
vector, representing its different properties. Evaluation methods, therefore,
need to analyze the accuracy and completeness of these properties in
embeddings. This requires fine-grained analysis of embedding subspaces.
Multi-label classification is an appropriate way to do so. We propose a new evaluation method for word embeddings based on multi-label classification. The task we use is fine-grained name typing: given a large
corpus, find all types that a name can refer to based on the name embedding.
Given the scale of entities in knowledge bases, we can build datasets for this
task that are complementary to the current embedding evaluation datasets in that they are very large, contain fine-grained classes, and allow the direct evaluation of embeddings without confounding factors like sentence context. Comment: 6 pages, The 3rd Workshop on Representation Learning for NLP (RepL4NLP @ ACL2018
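Fine-grained name typing as described above is naturally multi-label: a single name embedding may map to several types at once. A minimal one-vs-rest sketch (not the paper's classifier; the weight matrix, type inventory, and threshold are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_types(name_vec, W, b, types, threshold=0.5):
    # One-vs-rest multi-label prediction: each row of W scores one type
    # independently, so a name can receive several types at once
    # (e.g. both 'person' and 'politician').
    probs = sigmoid(W @ name_vec + b)
    return {t for t, p in zip(types, probs) if p >= threshold}
```

Evaluating the predicted type sets against gold types (e.g. with micro-F1) then probes which properties the name embedding actually encodes.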
Learning Content Selection Rules for Generating Object Descriptions in Dialogue
A fundamental requirement of any task-oriented dialogue system is the ability
to generate object descriptions that refer to objects in the task domain. The
subproblem of content selection for object descriptions in task-oriented
dialogue has been the focus of much previous work and a large number of models
have been proposed. In this paper, we use the annotated COCONUT corpus of
task-oriented design dialogues to develop feature sets based on Dale and Reiter's (1995) incremental model, Brennan and Clark's (1996) conceptual pact model, and Jordan's (2000b) intentional influences model, and use these feature sets in a machine learning experiment to automatically learn a model of content selection for object descriptions. Since Dale and Reiter's model requires a representation of discourse structure, the corpus annotations are used to derive a representation based on Grosz and Sidner's (1986) theory of the
intentional structure of discourse, as well as two very simple representations
of discourse structure based purely on recency. We then apply the
rule-induction program RIPPER to train and test the content selection component
of an object description generator on a set of 393 object descriptions from the
corpus. To our knowledge, this is the first reported experiment of a trainable
content selection component for object description generation in dialogue.
Three separate content selection models that are based on the three theoretical
models, all independently achieve accuracies significantly above the majority
class baseline (17%) on unseen test data, with the intentional influences model
(42.4%) performing significantly better than either the incremental model
(30.4%) or the conceptual pact model (28.9%). But the best performing models
combine all the feature sets, achieving accuracies near 60%. Surprisingly, a
simple recency-based representation of discourse structure does as well as one
based on intentional structure. To our knowledge, this is also the first empirical comparison of a representation of Grosz and Sidner's model of discourse structure with a simpler model for any generation task.
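A rule learner like RIPPER produces an ordered list of if-then rules; applying such rules to content selection can be sketched as below. The rules, attributes, and default are illustrative, not the learned COCONUT models:

```python
def select_attributes(obj, rules):
    # Apply an ordered rule list (RIPPER-style) to decide which
    # attributes of `obj` to mention in its description; the first
    # rule whose condition fires wins.
    for condition, attrs in rules:
        if condition(obj):
            return attrs
    return ["type"]  # default rule: mention only the object's type
```

For instance, a learned rule might say "if the object was mentioned under a different description earlier in the dialogue, include its colour", falling through to the default when no rule fires.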