187,449 research outputs found

    Structured Training for Neural Network Transition-Based Parsing

    Full text link
    We present structured perceptron training for neural network transition-based dependency parsing. We learn the neural network representation using a gold corpus augmented by a large number of automatically parsed sentences. Given this fixed network representation, we learn a final layer using the structured perceptron with beam-search decoding. On the Penn Treebank, our parser reaches 94.26% unlabeled and 92.41% labeled attachment accuracy, which to our knowledge is the best accuracy on Stanford Dependencies to date. We also provide an in-depth ablative analysis to determine which aspects of our model provide the largest gains in accuracy.
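    The training recipe this abstract describes (a fixed, separately learned representation plus a structured-perceptron final layer decoded with beam search) is compact enough to sketch. Below is a minimal toy sketch in Python; the feature function phi, the action inventory, and all dimensions are illustrative stand-ins, not the paper's parser or its transition system.

```python
# Toy sketch of structured perceptron training over a frozen representation,
# with beam-search decoding. phi() stands in for the fixed neural-network
# activations described in the abstract; everything here is illustrative.
import numpy as np

ACTIONS = [0, 1, 2]   # toy transition inventory (e.g. SHIFT/LEFT/RIGHT)
DIM = 8               # dimensionality of the frozen representation

rng = np.random.default_rng(0)
PHI = rng.normal(size=(len(ACTIONS), DIM))  # stand-in network activations

def phi(prefix, action):
    """Fixed (frozen) representation of taking `action` after `prefix`."""
    return PHI[action] * (1.0 + 0.1 * len(prefix))

def beam_search(w, length, beam_size=4):
    """Highest-scoring action sequence under final-layer weights w."""
    beam = [([], 0.0)]
    for _ in range(length):
        cands = [(seq + [a], score + w @ phi(seq, a))
                 for seq, score in beam for a in ACTIONS]
        cands.sort(key=lambda c: -c[1])
        beam = cands[:beam_size]
    return beam[0][0]

def seq_features(seq):
    return sum(phi(seq[:i], a) for i, a in enumerate(seq))

def train(gold_seqs, epochs=10):
    w = np.zeros(DIM)
    for _ in range(epochs):
        for gold in gold_seqs:
            pred = beam_search(w, len(gold))
            if pred != gold:  # standard structured-perceptron update
                w += seq_features(gold) - seq_features(pred)
    return w

w = train([[0, 1, 2, 1], [2, 0, 1, 0]])
print(beam_search(w, 4))
```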

    Methodological proposal to build a corpus-based ontology in terminology

    Get PDF
    Corpora are an indispensable resource for improving quality in both the linguistic and the conceptual dimensions of terminological projects. However, while there is complete agreement that specialised corpora are vital in the linguistic dimension of any terminological project (e.g. to select real contextual examples), there are three different approaches with regard to the conceptual dimension, and not all of them employ corpora. In an attempt to shed some light on the advantages that corpora bring to the representation of specialised knowledge in terminology, this research follows the ontoterminography methodology (Durán-Muñoz 2012) to propose the building of a corpus-based ontology within a terminological project, in particular a specialised resource about an adventure activity (canyoning) in English. More specifically, it describes the different steps required to create such an ontology, from the analysis of the specialised domain and the compilation of the corpus to the representation of the specialised knowledge in the form of a corpus-based ontology.
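    As an illustration of the final step (representing the specialised knowledge as an ontology), the short Python sketch below encodes a few corpus-derived canyoning concepts with rdflib. The namespace, class names, and hierarchy are invented examples, not the actual resource the abstract describes.

```python
# Illustrative sketch: encoding corpus-derived canyoning concepts as a small
# OWL ontology with rdflib. Names and hierarchy are hypothetical examples.
from rdflib import Graph, Literal, Namespace, RDF, RDFS
from rdflib.namespace import OWL

CANYON = Namespace("http://example.org/canyoning#")  # hypothetical namespace
g = Graph()
g.bind("canyon", CANYON)

# Concepts extracted from the specialised corpus, arranged hierarchically.
for cls in ("Activity", "Descent", "Abseiling", "Equipment", "Rope"):
    g.add((CANYON[cls], RDF.type, OWL.Class))
g.add((CANYON.Descent, RDFS.subClassOf, CANYON.Activity))
g.add((CANYON.Abseiling, RDFS.subClassOf, CANYON.Descent))
g.add((CANYON.Rope, RDFS.subClassOf, CANYON.Equipment))

# A relation linking activities to the equipment they require.
g.add((CANYON.requiresEquipment, RDF.type, OWL.ObjectProperty))
g.add((CANYON.requiresEquipment, RDFS.domain, CANYON.Activity))
g.add((CANYON.requiresEquipment, RDFS.range, CANYON.Equipment))
g.add((CANYON.Abseiling, CANYON.requiresEquipment, CANYON.Rope))

# Real contextual examples from the corpus can be attached as annotations.
g.add((CANYON.Abseiling, RDFS.comment,
       Literal("e.g. 'abseil the final waterfall pitch'", lang="en")))

print(g.serialize(format="turtle"))
```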

    Improving self-organising information maps as navigational tools: A semantic approach

    Get PDF
    Purpose - The goal of the research is to explore whether the use of higher-level semantic features can help build better self-organising map (SOM) representations, as measured from a human-centred perspective. The authors also explore an automatic evaluation method that utilises human expert knowledge encapsulated in the structure of traditional textbooks to determine map representation quality. Design/methodology/approach - Two types of document representation involving semantic features were explored: using one individual semantic feature alone, and mixing a semantic feature with keywords. Experiments were conducted to investigate the impact of semantic representation quality on the map, using data collections from a single-book corpus and a multiple-book corpus. Findings - Combining keywords with certain semantic features achieves a significant improvement in representation quality over the keywords-only approach in a relatively homogeneous single-book corpus. Changing the ratio in which the different features are combined also affects performance. While semantic mixtures work well in a single-book corpus, they lose their advantage over keywords in the multiple-book corpus. This raises the concern of whether the semantic representations in the multiple-book corpus are homogeneous and coherent enough for applying semantic features; terminology differences among textbooks limit the SOM's ability to generate a high-quality map for heterogeneous collections. Originality/value - The authors explored the use of higher-level document representation features for the development of better-quality SOMs, and piloted a specific method for evaluating SOM quality based on the organisation of information content in the map.
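    The feature-mixing idea can be sketched with off-the-shelf tools: scikit-learn TF-IDF vectors for the keyword block, a random matrix as a stand-in for the semantic features, and MiniSom for the map. The mixing ratio and the documents below are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch of mixing keyword and semantic features before SOM training.
# The semantic block and the mixing ratio alpha are illustrative stand-ins.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from minisom import MiniSom

docs = ["neural networks and learning",
        "self organising maps for navigation",
        "semantic features in document representation"]

keywords = TfidfVectorizer().fit_transform(docs).toarray()
semantic = np.random.default_rng(0).random((len(docs), 5))  # stand-in features

def mix(kw, sem, alpha=0.5):
    """Concatenate L2-normalised keyword and semantic blocks, weighted by alpha."""
    kw = kw / (np.linalg.norm(kw, axis=1, keepdims=True) + 1e-9)
    sem = sem / (np.linalg.norm(sem, axis=1, keepdims=True) + 1e-9)
    return np.hstack([alpha * kw, (1 - alpha) * sem])

data = mix(keywords, semantic, alpha=0.7)  # the ratio affects map quality

som = MiniSom(4, 4, data.shape[1], sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(data, 500)
for i, vec in enumerate(data):
    print(f"doc {i} -> map cell {som.winner(vec)}")
```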

    Towards a Knowledge Graph based Speech Interface

    Full text link
    Applications which use human speech as an input require a speech interface with high recognition accuracy. The words or phrases in the recognised text are annotated with a machine-understandable meaning and linked to knowledge graphs for further processing by the target application. These semantic annotations of recognised words can be represented as subject-predicate-object triples, which collectively form a graph often referred to as a knowledge graph. This type of knowledge representation allows a speech interface to be used with any spoken-input application: since the information is represented in a logical, semantic form, it can be retrieved and stored using standard web query languages. In this work, we develop a methodology for linking speech input to knowledge graphs and study the impact of recognition errors on the overall process. We show that for a corpus with lower WER, the annotation and linking of entities to the DBpedia knowledge graph is considerably more accurate. DBpedia Spotlight, a tool for interlinking text documents with linked open data, is used to link the speech recognition output to the DBpedia knowledge graph. Such a knowledge-based speech recognition interface is useful for applications such as question answering or spoken dialog systems. Comment: Under review at the International Workshop on Grounding Language Understanding, satellite of Interspeech 2017
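    The linking step can be sketched against the public DBpedia Spotlight REST service. The endpoint URL, confidence parameter, and response handling below are assumptions about the current public deployment, not the authors' exact pipeline.

```python
# Sketch: linking ASR output to the DBpedia knowledge graph via the public
# DBpedia Spotlight endpoint. URL and parameters are assumptions; consult
# the Spotlight documentation for the current service details.
import requests

def link_transcript(text, confidence=0.5):
    """Annotate recognised text with DBpedia entities via Spotlight."""
    resp = requests.get(
        "https://api.dbpedia-spotlight.org/en/annotate",
        params={"text": text, "confidence": confidence},
        headers={"Accept": "application/json"},
        timeout=10,
    )
    resp.raise_for_status()
    return [(r["@surfaceForm"], r["@URI"])
            for r in resp.json().get("Resources", [])]

# ASR noise matters: a misrecognition such as "barlin" would derail linking.
print(link_transcript("the wall separated east and west berlin"))
```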

    Leveraging Knowledge Graph Embeddings to Enhance Contextual Representations for Relation Extraction

    Full text link
    Relation extraction is a crucial and challenging task in Natural Language Processing. Several methods have surfaced of late, exhibiting notable performance on the task; however, most of these approaches rely on vast amounts of data from large-scale knowledge graphs or on language models pretrained on voluminous corpora. In this paper, we focus on effectively using only the knowledge supplied by a corpus itself to create a high-performing model. Our objective is to show that by leveraging the hierarchical structure and relational distribution of entities within a corpus, without introducing external knowledge, a relation extraction model can achieve significantly enhanced performance. We therefore propose a relation extraction approach that incorporates knowledge graph embeddings pretrained at the corpus scale into the sentence-level contextual representation. We conducted a series of experiments which yielded promising results: our method outperforms context-based relation extraction models. Comment: 15 pages, 1 figure, The 17th International Conference on Document Analysis and Recognition
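    A schematic PyTorch sketch of the fusion being described: pretrained corpus-level knowledge graph embeddings for the two entities are concatenated with a sentence-level contextual representation before classification. The dimensions, the random embeddings, and the classifier head are stand-ins, not the paper's architecture.

```python
# Schematic fusion of corpus-level KG embeddings with a contextual sentence
# representation for relation classification. All components are stand-ins.
import torch
import torch.nn as nn

class KGEnhancedRelationClassifier(nn.Module):
    def __init__(self, num_entities, kg_dim=100, ctx_dim=768, num_relations=10):
        super().__init__()
        # In the described setting these would be pretrained on the corpus
        # graph and possibly frozen; here they are randomly initialised.
        self.kg_emb = nn.Embedding(num_entities, kg_dim)
        self.classifier = nn.Sequential(
            nn.Linear(ctx_dim + 2 * kg_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_relations),
        )

    def forward(self, ctx_repr, head_id, tail_id):
        # ctx_repr: (batch, ctx_dim) sentence-level contextual representation,
        # e.g. a [CLS]-style vector from a pretrained language encoder.
        fused = torch.cat(
            [ctx_repr, self.kg_emb(head_id), self.kg_emb(tail_id)], dim=-1)
        return self.classifier(fused)

model = KGEnhancedRelationClassifier(num_entities=1000)
logits = model(torch.randn(2, 768), torch.tensor([3, 7]), torch.tensor([42, 9]))
print(logits.shape)  # -> torch.Size([2, 10]), one score per relation
```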

    The role of the cognitive model profile in knowledge representation and meaning construction: the case of the lexical item Europe

    Get PDF
    The paper addresses the role of the cognitive model profile, one of the fundamental constructs in LCCM Theory (a.k.a. access semantics), in meaning construction and knowledge representation with respect to the concept of Europe. The study is based on a corpus of news articles retrieved from the Guardian from May 2004 through December 2009 (approximately 930,000 words) and focuses on the lexical item Europe (over 4,000 corpus occurrences). It takes its theoretical underpinnings from LCCM Theory, a theory of lexical representation and semantic composition which delineates the roles the linguistic and conceptual systems play in meaning construction (e.g., Evans 2009, 2013). The paper documents the immense semantic potential of the lexical item Europe as manifest in the Guardian discourse under analysis. In terms of knowledge representation, to account for the coherent body of multimodal knowledge to which the lexical item Europe affords access, the cognitive model profiles for its two lexical concepts are constructed. As for the role of the cognitive model profile in meaning construction, the study demonstrates how context, specifically the co-text, determines the activation of the relevant portion of the cognitive model profile of the lexical item Europe.

    Evaluating Word Embeddings in Multi-label Classification Using Fine-grained Name Typing

    Full text link
    Embedding models typically associate each word with a single real-valued vector representing its different properties. Evaluation methods therefore need to analyze the accuracy and completeness of these properties in embeddings, which requires fine-grained analysis of embedding subspaces. Multi-label classification is an appropriate way to do so. We propose a new evaluation method for word embeddings based on multi-label classification given a word embedding. The task we use is fine-grained name typing: given a large corpus, find all types that a name can refer to based on the name embedding. Given the scale of entities in knowledge bases, we can build datasets for this task that are complementary to current embedding evaluation datasets in that they are very large, contain fine-grained classes, and allow the direct evaluation of embeddings without confounding factors like sentence context. Comment: 6 pages, The 3rd Workshop on Representation Learning for NLP (RepL4NLP @ ACL 2018)
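    Operationally, the proposed evaluation reduces to ordinary multi-label classification over name embeddings, as the scikit-learn sketch below illustrates. The embeddings, names, and type inventory are toy assumptions, not the datasets the abstract proposes.

```python
# Toy sketch of fine-grained name typing as multi-label classification:
# predict all types of a name from its embedding alone. Data is invented.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 50))  # stand-in for pretrained name embeddings
types = [["person", "athlete"], ["person", "politician"],
         ["location", "city"], ["organization"],
         ["person", "athlete"], ["location"],
         ["person", "politician"], ["organization"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(types)

# One binary classifier per type, probing the embedding subspaces.
clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X[:6], Y[:6])
pred = clf.predict(X[6:])
print(mlb.inverse_transform(pred))  # types recovered from embeddings alone
```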

    Learning Content Selection Rules for Generating Object Descriptions in Dialogue

    Full text link
    A fundamental requirement of any task-oriented dialogue system is the ability to generate object descriptions that refer to objects in the task domain. The subproblem of content selection for object descriptions in task-oriented dialogue has been the focus of much previous work, and a large number of models have been proposed. In this paper, we use the annotated COCONUT corpus of task-oriented design dialogues to develop feature sets based on Dale and Reiter's (1995) incremental model, Brennan and Clark's (1996) conceptual pact model, and Jordan's (2000b) intentional influences model, and use these feature sets in a machine learning experiment to automatically learn a model of content selection for object descriptions. Since Dale and Reiter's model requires a representation of discourse structure, the corpus annotations are used to derive a representation based on Grosz and Sidner's (1986) theory of the intentional structure of discourse, as well as two very simple representations of discourse structure based purely on recency. We then apply the rule-induction program RIPPER to train and test the content selection component of an object description generator on a set of 393 object descriptions from the corpus. To our knowledge, this is the first reported experiment with a trainable content selection component for object description generation in dialogue. Three separate content selection models, based on the three theoretical models, all independently achieve accuracies significantly above the majority-class baseline (17%) on unseen test data, with the intentional influences model (42.4%) performing significantly better than either the incremental model (30.4%) or the conceptual pact model (28.9%). But the best-performing models combine all the feature sets, achieving accuracies near 60%. Surprisingly, a simple recency-based representation of discourse structure does as well as one based on intentional structure. To our knowledge, this is also the first empirical comparison of a representation of Grosz and Sidner's model of discourse structure with a simpler model for any generation task.
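    Since RIPPER has no scikit-learn implementation, the sketch below swaps in a decision tree as the rule inducer to illustrate the train-and-inspect workflow; the features and labels are invented stand-ins for the COCONUT annotations, not the paper's feature sets.

```python
# Rule induction for content selection, sketched with a decision tree as a
# stand-in for RIPPER. Features and labels are invented for illustration.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Toy per-description features: [mentioned_before, in_focus_space, recency]
X = rng.integers(0, 2, size=(40, 3)).astype(float)
# Toy content-selection target: which attribute combination to include.
y = (X[:, 0] + 2 * X[:, 1]).astype(int)

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(export_text(
    tree, feature_names=["mentioned_before", "in_focus_space", "recency"]))
```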