13,210 research outputs found
A Trie-Structured Bayesian Model for Unsupervised Morphological Segmentation
In this paper, we introduce a trie-structured Bayesian model for unsupervised
morphological segmentation. We adopt prior information from different sources
in the model. We use neural word embeddings to discover words that are
morphologically derived from each other and thereby that are semantically
similar. We use letter successor variety counts obtained from tries that are
built by neural word embeddings. Our results show that using different
information sources such as neural word embeddings and letter successor variety
as prior information improves morphological segmentation in a Bayesian model.
Our model outperforms other unsupervised morphological segmentation models on
Turkish and gives promising results on English and German for scarce resources.Comment: 12 pages, accepted and presented at the CICLING 2017 - 18th
International Conference on Intelligent Text Processing and Computational
Linguistic
Unsupervised Extraction of Representative Concepts from Scientific Literature
This paper studies the automated categorization and extraction of scientific
concepts from titles of scientific articles, in order to gain a deeper
understanding of their key contributions and facilitate the construction of a
generic academic knowledgebase. Towards this goal, we propose an unsupervised,
domain-independent, and scalable two-phase algorithm to type and extract key
concept mentions into aspects of interest (e.g., Techniques, Applications,
etc.). In the first phase of our algorithm we propose PhraseType, a
probabilistic generative model which exploits textual features and limited POS
tags to broadly segment text snippets into aspect-typed phrases. We extend this
model to simultaneously learn aspect-specific features and identify academic
domains in multi-domain corpora, since the two tasks mutually enhance each
other. In the second phase, we propose an approach based on adaptor grammars to
extract fine grained concept mentions from the aspect-typed phrases without the
need for any external resources or human effort, in a purely data-driven
manner. We apply our technique to study literature from diverse scientific
domains and show significant gains over state-of-the-art concept extraction
techniques. We also present a qualitative analysis of the results obtained.Comment: Published as a conference paper at CIKM 201
Using ontology in query answering systems: Scenarios, requirements and challenges
Equipped with the ultimate query answering system, computers would finally be in a position to address all our information needs in a natural way. In this paper, we describe how Language and Computing nv (L&C), a developer of ontology-based natural language understanding systems for the healthcare domain, is working towards the ultimate Question Answering (QA) System for healthcare workers. L&C’s company strategy in this area is to design in a step-by-step fashion the essential components of such a system, each component being designed to solve some one part of the total problem and at the same time reflect well-defined needs on the prat of our customers. We compare our strategy with the research roadmap proposed by the Question Answering Committee of the National Institute of Standards and Technology (NIST), paying special attention to the role of ontology
- …