    Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages

    Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages publishes 17 papers that were presented at the conference organised in Dubrovnik, Croatia, 4-6 October 2010.

    Proceedings of the EACL Hackashop on News Media Content Analysis and Automated Report Generation

    Peer reviewed

    N-gram Based Croatian Language Network: Application in a Smart Environment

    In the field of natural language processing, language networks represent a method for observing linguistic units and their interactions in different linguistic contexts. This paper uses the previously presented Croatian language network to build a solution capable of generating spoken notifications in the Croatian language. The novelty of this paper is that it proposes an approach for generating spoken notifications in smart environments by combining specialized services that enable interaction with the environment and with human users. The process employed for generating spoken notifications is described in detail. A further novel contribution of this paper is the case-study evaluation of the proposed approach in a smart home environment.
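The abstract does not say how the Croatian language network is constructed. As a rough illustration only, the sketch below assumes an n-gram network with n = 2: each word is a node, and a directed edge from one word to the next is weighted by how often that bigram occurs in the corpus. The function names and the toy corpus are hypothetical, not taken from the paper.

```python
from collections import Counter, defaultdict


def build_bigram_network(sentences):
    """Return a directed, weighted word network: network[w1][w2] = bigram count.

    Assumption: a plain bigram (n = 2) network built from whitespace tokens.
    """
    network = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Each pair of consecutive tokens creates (or strengthens) a directed edge.
        for left, right in zip(tokens, tokens[1:]):
            network[left][right] += 1
    return network


def most_likely_next(network, word):
    """Return the highest-weight successor of `word`, or None if it has no outgoing edges."""
    # .get() does not trigger the defaultdict factory, so unknown words stay absent.
    successors = network.get(word.lower())
    if not successors:
        return None
    return successors.most_common(1)[0][0]


# Hypothetical toy corpus of smart-home style phrases.
corpus = [
    "svjetlo je upaljeno",
    "svjetlo je ugašeno",
    "svjetlo je upaljeno u kuhinji",
]
net = build_bigram_network(corpus)
print(dict(net["je"]))              # {'upaljeno': 2, 'ugašeno': 1}
print(most_likely_next(net, "je"))  # upaljeno
```

Querying the highest-weight successor, as `most_likely_next` does, is one simple way such a network could suggest word order when assembling a notification; the paper's actual generation process may well differ.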

    The Future of Information Sciences: INFuture2009: Digital Resources and Knowledge Sharing

    Semantic Feature Extraction Using Multi-Sense Embeddings and Lexical Chains

    The relationships between words in a sentence often tell us more about the underlying semantic content of a document than its individual words do. Natural language understanding has seen increasing effort over the last few years in developing techniques that produce non-trivial features, especially since robust word embedding models became prominent by proving able to capture and represent semantic relationships from massive amounts of data. These dense vector representations have indeed raised the baseline in natural language processing, but they still fall short in dealing with intrinsic linguistic issues such as polysemy and homonymy. Systems built around natural language can be affected by a weak semantic representation of human language, leading to inaccurate outcomes based on poor decisions. In this area, word sense disambiguation and lexical chains have been explored as alternatives to alleviate several linguistic problems, such as semantic representation, definitions, differentiation, polysemy, and homonymy. However, little effort has gone into combining recent advances in token embeddings (e.g. words, documents) with word sense disambiguation and lexical chains. To help bridge these areas, the main contribution of this work is a collection of algorithms that extract semantic features from large corpora, named MSSA, MSSA-D, MSSA-NR, FLLC II, and FXLC II. The MSSA techniques disambiguate and annotate each word with its specific sense, taking the semantic effects of its context into account. The lexical chain techniques derive the semantic relations between consecutive words in a document, in a dynamic and in a pre-defined manner. The goal of these original techniques is to uncover the implicit semantic links between words using their lexical structure, incorporating multi-sense embeddings, word sense disambiguation, lexical chains, and lexical databases. 
A few natural language problems are selected to validate the contributions of this work, and on these our techniques outperform state-of-the-art systems. All the proposed algorithms can be used separately as independent components or combined in a single system to improve the semantic representation of words, sentences, and documents. Additionally, they can work in a recurrent form, refining their results even further.
Ph.D. dissertation, College of Engineering & Computer Science, University of Michigan-Dearborn: https://deepblue.lib.umich.edu/bitstream/2027.42/149647/1/Terry Ruas Final Dissertation.pdf
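MSSA is described here only at a high level. To picture the core idea of choosing a word's sense from its context, the sketch below picks, among candidate sense vectors, the one with the highest cosine similarity to the average of the context word vectors. This is a simplified illustration, not the MSSA algorithm itself; the sense labels and the 3-dimensional vectors are made-up toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean; `vectors` must be non-empty."""
    n = len(vectors)
    return [sum(components) / n for components in zip(*vectors)]


def pick_sense(sense_vectors, context_vectors):
    """Return the sense label whose vector is most similar to the averaged context."""
    context = mean_vector(context_vectors)
    return max(sense_vectors, key=lambda sense: cosine(sense_vectors[sense], context))


# Hypothetical toy vectors; dimensions loosely mean [finance, nature, sport].
bank_senses = {
    "bank.n.finance": [0.9, 0.1, 0.0],
    "bank.n.river": [0.1, 0.9, 0.0],
}
context = [[0.8, 0.2, 0.1], [0.7, 0.0, 0.2]]  # e.g. vectors for "loan", "deposit"
print(pick_sense(bank_senses, context))  # bank.n.finance
```

Averaging the context vectors keeps the sketch short; a practical system would likely draw sense and context vectors from trained multi-sense embeddings and a lexical database instead of hand-written values.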

    Natural Language Processing: Emerging Neural Approaches and Applications

    This Special Issue highlights the most recent research in the NLP field and discusses related open issues, with a particular focus both on emerging approaches to language learning, understanding, production, and grounding, whether interactively or autonomously from data, in cognitive and neural systems, and on their potential or real applications in different domains.

    Automatic Structured Text Summarization with Concept Maps

    Efficiently exploring a collection of text documents in order to answer a complex question is a challenge that many people face. As abundant information on almost any topic is electronically available nowadays, supporting tools are needed to ensure that people can profit from the information's availability rather than suffer from information overload. Structured summaries can help in this situation: they can be used to provide a concise overview of the contents of a document collection, they can reveal interesting relationships and they can be used as a navigation structure to further explore the documents. A concept map, which is a graph representing concepts and their relationships, is a specific form of a structured summary that offers these benefits. However, despite its appealing properties, only a limited amount of research has studied how concept maps can be automatically created to summarize documents. Automating that task is challenging and requires a variety of text processing techniques including information extraction, coreference resolution and summarization. The goal of this thesis is to better understand these challenges and to develop computational models that can address them. As a first contribution, this thesis lays the necessary ground for comparable research on computational models for concept map-based summarization. We propose a precise definition of the task together with suitable evaluation protocols and carry out experimental comparisons of previously proposed methods. As a result, we point out limitations of existing methods and gaps that have to be closed to successfully create summary concept maps. Towards that end, we also release a new benchmark corpus for the task that has been created with a novel, scalable crowdsourcing strategy. Furthermore, we propose new techniques for several subtasks of creating summary concept maps. 
First, we introduce the usage of predicate-argument analysis for the extraction of concept and relation mentions, which greatly simplifies the development of extraction methods. Second, we demonstrate that a predicate-argument analysis tool can be ported from English to German with low effort, indicating that the extraction technique can also be applied to other languages. We further propose to group concept mentions using pairwise classifications and set partitioning, which significantly improves the quality of the created summary concept maps. We show similar improvements for a new supervised importance estimation model and an optimal subgraph selection procedure. By combining these techniques in a pipeline, we establish a new state-of-the-art for the summarization task. Additionally, we study the use of neural networks to model the summarization problem as a single end-to-end task. While such approaches are not yet competitive with pipeline-based approaches, we report several experiments that illustrate the challenges, mostly related to training data, that currently limit the performance of this technique. We conclude the thesis by presenting a prototype system that demonstrates the use of automatically generated summary concept maps in practice and by pointing out promising directions for future research on the topic of this thesis.
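The abstract says concept mentions are grouped using pairwise classifications and set partitioning, without giving details. The sketch below swaps in the simplest possible partitioning step: every pair the classifier marks as the same concept is merged, and the transitive closure (computed with union-find) gives the groups. This is not an optimal set partitioning, and `toy_classifier` is a hypothetical stand-in for a trained pairwise model.

```python
from itertools import combinations


def partition_mentions(mentions, same_concept):
    """Group mentions into clusters by merging every pair the classifier links.

    Uses transitive closure via union-find, a simplification of set partitioning.
    """
    parent = list(range(len(mentions)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if same_concept(mentions[i], mentions[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i, mention in enumerate(mentions):
        groups.setdefault(find(i), []).append(mention)
    return list(groups.values())


def toy_classifier(a, b):
    """Hypothetical stand-in for a trained classifier: same head word => same concept."""
    return a.lower().split()[-1] == b.lower().split()[-1]


mentions = ["concept map", "the map", "summary", "a short summary", "graph"]
print(partition_mentions(mentions, toy_classifier))
# [['concept map', 'the map'], ['summary', 'a short summary'], ['graph']]
```

The weakness of plain transitive closure is that a single wrong positive decision can chain two otherwise separate groups together; treating the grouping as an explicit optimization over partitions is one way to guard against that.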

    Study on open science: The general state of the play in Open Science principles and practices at European life sciences institutes

    Nowadays, open science is a hot topic at all levels and is also one of the priorities of the European Research Area. Components commonly associated with open science are open access, open data, open methodology, open source, open peer review, open science policies and citizen science. Open science may have great potential to connect and influence the practices of researchers, funding institutions and the public. In this paper, we evaluate the level of openness based on public surveys at four European life sciences institutes.

    Semantic approaches to domain template construction and opinion mining from natural language

    Most text mining algorithms in use today are based on a lexical representation of input texts, for example bag of words. A possible alternative is to first convert text into a semantic representation, one that captures the text content in a structured way and uses only a set of pre-agreed labels. This thesis explores the feasibility of such an approach for two tasks on collections of documents: identifying common structure in input documents (»domain template construction«), and helping users find differing opinions in input documents (»opinion mining«). We first discuss ways of converting natural text to a semantic representation. We propose and compare two new methods with varying degrees of target representation complexity. The first method, which shows more promise, converts dependency parser output to lightweight semantic frames, with role fillers aligned to WordNet. The second method structures text using Semantic Role Labeling techniques and aligns the output to the Cyc ontology. Based on the first of these representations, we next propose and evaluate two methods for constructing frame-based templates for documents from a given domain (e.g. bombing attack news reports). A template is the set of all salient attributes (e.g. attacker, number of casualties, ...). The idea of both methods is to construct abstract frames for which more specific instances (according to the WordNet hierarchy) can be found in the input documents. Fragments of these abstract frames represent the sought-for attributes. We achieve state-of-the-art performance and additionally provide detailed type constraints for the attributes, something not possible with competing methods. Finally, we propose a software system for exposing differing opinions in the news. For any given event, we present the user with all known articles on the topic and let them navigate the articles by three semantic properties simultaneously: sentiment, topical focus and geography of origin. 
The result is a dynamically reranked set of relevant articles and a near-real-time focused summary of those articles. The summary, too, is computed from the semantic text representation discussed above. We conducted a user study of the whole system with very positive results.
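The template construction methods build abstract frames whose role fillers generalize, via the WordNet hierarchy, over more specific fillers found in the documents. One simple way to picture that generalization step is to take the lowest common hypernym of the observed fillers. The sketch below uses a tiny hand-written, single-parent hierarchy (hypothetical, not WordNet data) and is not the thesis's method.

```python
# Hypothetical toy hypernym links (child -> parent); real WordNet is far larger.
HYPERNYM = {
    "suicide_bomber": "bomber",
    "bomber": "attacker",
    "gunman": "attacker",
    "attacker": "person",
    "civilian": "person",
    "person": "entity",
}


def ancestors(word):
    """Return [word, parent, grandparent, ...] up to the root of the toy hierarchy."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def lowest_common_hypernym(words):
    """Most specific node that is an ancestor (or self) of every word, or None."""
    common = set(ancestors(words[0]))
    for w in words[1:]:
        common &= set(ancestors(w))
    # Walk up from the first word; the first shared node is the most specific one.
    for node in ancestors(words[0]):
        if node in common:
            return node
    return None


print(lowest_common_hypernym(["suicide_bomber", "gunman"]))  # attacker
print(lowest_common_hypernym(["gunman", "civilian"]))        # person
```

WordNet allows several hypernyms per synset, so a faithful version would have to search a graph rather than follow a single parent chain as this sketch does.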