905 research outputs found

    Spoken content retrieval: A survey of techniques and technologies

    Get PDF
    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

    A survey on opinion summarization technique s for social media

    Get PDF
    The volume of data on the social media is huge and even keeps increasing. The need for efficient processing of this extensive information resulted in increasing research interest in knowledge engineering tasks such as Opinion Summarization. This survey shows the current opinion summarization challenges for social media, then the necessary pre-summarization steps like preprocessing, features extraction, noise elimination, and handling of synonym features. Next, it covers the various approaches used in opinion summarization like Visualization, Abstractive, Aspect based, Query-focused, Real Time, Update Summarization, and highlight other Opinion Summarization approaches such as Contrastive, Concept-based, Community Detection, Domain Specific, Bilingual, Social Bookmarking, and Social Media Sampling. It covers the different datasets used in opinion summarization and future work suggested in each technique. Finally, it provides different ways for evaluating opinion summarization

    Theory and Applications for Advanced Text Mining

    Get PDF
    Due to the growth of computer technologies and web technologies, we can easily collect and store large amounts of text data. We can believe that the data include useful knowledge. Text mining techniques have been studied aggressively in order to extract the knowledge from the data since late 1990s. Even if many important techniques have been developed, the text mining research field continues to expand for the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques. They are various techniques from relation extraction to under or less resourced language. I believe that this book will give new knowledge in the text mining field and help many readers open their new research fields

    Knowledge Expansion of a Statistical Machine Translation System using Morphological Resources

    Get PDF
    Translation capability of a Phrase-Based Statistical Machine Translation (PBSMT) system mostly depends on parallel data and phrases that are not present in the training data are not correctly translated. This paper describes a method that efficiently expands the existing knowledge of a PBSMT system without adding more parallel data but using external morphological resources. A set of new phrase associations is added to translation and reordering models; each of them corresponds to a morphological variation of the source/target/both phrases of an existing association. New associations are generated using a string similarity score based on morphosyntactic information. We tested our approach on En-Fr and Fr-En translations and results showed improvements of the performance in terms of automatic scores (BLEU and Meteor) and reduction of out-of-vocabulary (OOV) words. We believe that our knowledge expansion framework is generic and could be used to add different types of information to the model.JRC.G.2-Global security and crisis managemen

    Automatic rule learning exploiting morphological features for named entity recognition in Turkish

    Get PDF
    Named entity recognition (NER) is one of the basic tasks in automatic extraction of information from natural language texts. In this paper, we describe an automatic rule learning method that exploits different features of the input text to identify the named entities located in the natural language texts. Moreover, we explore the use of morphological features for extracting named entities from Turkish texts. We believe that the developed system can also be used for other agglutinative languages. The paper also provides a comprehensive overview of the field by reviewing the NER research literature. We conducted our experiments on the TurkIE dataset, a corpus of articles collected from different Turkish newspapers. Our method achieved an average F-score of 91.08% on the dataset. The results of the comparative experiments demonstrate that the developed technique is successfully applicable to the task of automatic NER and exploiting morphological features can significantly improve the NER from Turkish, an agglutinative language. © The Author(s) 2011
    corecore