68 research outputs found

    Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

    The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0063-8
    Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. It is currently receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with the Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for a moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and provides an in-depth analysis based on some search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).
    This work has been partly supported by project CMC-V2 (TEC2012-37585-C02-01) from the Spanish Ministry of Economy and Competitiveness. This research was also funded by the European Regional Development Fund, the Galician Regional Government (GRC2014/024, “Consolidation of Research Units: AtlantTIC Project” CN2012/160).

    Automatic topic detection of multi-lingual news stories.

    Wong Kam Lai.
    Thesis (M.Phil.)--Chinese University of Hong Kong, 2000.
    Includes bibliographical references (leaves 92-98).
    Abstracts in English and Chinese.
    Contents:
    Chapter 1: Introduction
        1.1 Our Contributions
        1.2 Organization of this Thesis
    Chapter 2: Literature Review
        2.1 Dragon Systems
        2.2 Carnegie Mellon University (CMU)
        2.3 University of Massachusetts (UMass)
        2.4 IBM T.J. Watson Research Center
        2.5 BBN Technologies
        2.6 National Taiwan University (NTU)
        2.7 Drawbacks of Existing Approaches
    Chapter 3: Overview of Proposed Approach
        3.1 News Source
        3.2 Story Preprocessing
        3.3 Concept Term Generation
        3.4 Named Entity Extraction
        3.5 Gross Translation of Chinese to English
        3.6 Topic Detection method
            3.6.1 Deferral Period
            3.6.2 Detection Approach
    Chapter 4: Concept Term Model
        4.1 Background of Contextual Analysis
        4.2 Concept Term Generation
            4.2.1 Concept Generation Algorithm
            4.2.2 Concept Term Representation for Detection
    Chapter 5: Topic Detection Model
        5.1 Text Representation and Term Weights
            5.1.1 Story Representation
            5.1.2 Topic Representation
            5.1.3 Similarity Score
            5.1.4 Time adjustment scheme
        5.2 Gross Translation Method
        5.3 The Detection System
            5.3.1 Detection Requirement
            5.3.2 The Top Level Model
        5.4 The Clustering Algorithm
            5.4.1 Similarity Calculation
            5.4.2 Grouping Related Elements
            5.4.3 Topic Identification
    Chapter 6: Experimental Results and Analysis
        6.1 Evaluation Model
            6.1.1 Evaluation Methodology
        6.2 Experiments on the effects of tuning the parameter
            6.2.1 Experiment Setup
            6.2.2 Results and Analysis
        6.3 Experiments on the effects of named entities and concept terms
            6.3.1 Experiment Setup
            6.3.2 Results and Analysis
        6.4 Experiments on the effect of using time adjustment
            6.4.1 Experiment Setup
            6.4.2 Results and Analysis
        6.5 Experiments on mono-lingual detection
            6.5.1 Experiment Setup
            6.5.2 Results and Analysis
    Chapter 7: Conclusions and Future Work
        7.1 Conclusions
        7.2 Future Work
    Chapter A: List of Topics annotated for TDT3 Corpus
    Chapter B: Matching evaluation topics to hypothesized topics
    Bibliography

    Sixth Goddard Conference on Mass Storage Systems and Technologies Held in Cooperation with the Fifteenth IEEE Symposium on Mass Storage Systems

    This document contains copies of those technical papers received in time for publication prior to the Sixth Goddard Conference on Mass Storage Systems and Technologies, which is being held in cooperation with the Fifteenth IEEE Symposium on Mass Storage Systems at the University of Maryland-University College Inn and Conference Center, March 23-26, 1998. As one of an ongoing series, this Conference continues to provide a forum for discussion of issues relevant to the management of large volumes of data. The Conference encourages all interested organizations to discuss long term mass storage requirements and experiences in fielding solutions. Emphasis is on current and future practical solutions addressing issues in data management, storage systems and media, data acquisition, long term retention of data, and data distribution. This year's discussion topics include architecture, tape optimization, new technology, performance, standards, site reports, and vendor solutions. Tutorials will be available on shared file systems, file system backups, data mining, and the dynamics of obsolescence.

    Unlocking Environmental Narratives

    Understanding the role of humans in environmental change is one of the most pressing challenges of the 21st century. Environmental narratives – written texts with a focus on the environment – offer rich material capturing relationships between people and surroundings. We take advantage of two key opportunities for their computational analysis: massive growth in the availability of digitised contemporary and historical sources, and parallel advances in the computational analysis of natural language. We open by introducing interdisciplinary research questions related to the environment and amenable to analysis through written sources. The reader is then introduced to potential collections of narratives including newspapers, travel diaries, policy documents, scientific proposals and even fiction. We demonstrate the application of a range of approaches to analysing natural language computationally, introducing key ideas through worked examples, and providing access to the sources analysed and accompanying code. The second part of the book is centred around case studies, each applying computational analysis to some aspect of environmental narrative. Themes include the use of language to describe narratives about glaciers, urban gentrification, diversity and writing about nature and ways in which locations are conceptualised and described in nature writing. We close by reviewing the approaches taken, and presenting an interdisciplinary research agenda for future work. The book is designed to be of interest to newcomers to the field and experienced researchers, and set out in a way that it can be used as an accompanying text for graduate level courses in, for example, geography, environmental history or the digital humanities

    Unlocking environmental narratives: towards understanding human environment interactions through computational text analysis

    Understanding the role of humans in environmental change is one of the most pressing challenges of the 21st century. Environmental narratives – written texts with a focus on the environment – offer rich material capturing relationships between people and surroundings. We take advantage of two key opportunities for their computational analysis: massive growth in the availability of digitised contemporary and historical sources, and parallel advances in the computational analysis of natural language. We open by introducing interdisciplinary research questions related to the environment and amenable to analysis through written sources. The reader is then introduced to potential collections of narratives including newspapers, travel diaries, policy documents, scientific proposals and even fiction. We demonstrate the application of a range of approaches to analysing natural language computationally, introducing key ideas through worked examples, and providing access to the sources analysed and accompanying code. The second part of the book is centred around case studies, each applying computational analysis to some aspect of environmental narrative. Themes include the use of language to describe narratives about glaciers, urban gentrification, diversity and writing about nature and ways in which locations are conceptualised and described in nature writing. We close by reviewing the approaches taken, and presenting an interdisciplinary research agenda for future work. The book is designed to be of interest to newcomers to the field and experienced researchers, and set out in a way that it can be used as an accompanying text for graduate level courses in, for example, geography, environmental history or the digital humanities

    Information fusion for automated question answering

    Until recently, research efforts in automated Question Answering (QA) have mainly focused on getting a good understanding of questions to retrieve correct answers. This includes deep parsing, lookups in ontologies, question typing and machine learning of answer patterns appropriate to question forms. In contrast, I have focused on the analysis of the relationships between answer candidates as provided in open domain QA on multiple documents. I argue that such candidates have intrinsic properties, partly regardless of the question, and those properties can be exploited to provide better quality and more user-oriented answers in QA.
    Information fusion refers to the technique of merging pieces of information from different sources. In QA over free text, it is motivated by the frequency with which different answer candidates are found in different locations, leading to a multiplicity of answers. The reason for such multiplicity is, in part, the massive amount of data used for answering, and also its unstructured and heterogeneous content: Besides ambiguities in user questions leading to heterogeneity in extractions, systems have to deal with redundancy, granularity and possible contradictory information. Hence the need for answer candidate comparison. While frequency has proved to be a significant characteristic of a correct answer, I evaluate the value of other relationships characterizing answer variability and redundancy.
    Partially inspired by recent developments in multi-document summarization, I redefine the concept of "answer" within an engineering approach to QA based on the Model-View-Controller (MVC) pattern of user interface design. An "answer model" is a directed graph in which nodes correspond to entities projected from extractions and edges convey relationships between such nodes. The graph represents the fusion of information contained in the set of extractions. Different views of the answer model can be produced, capturing the fact that the same answer can be expressed and presented in various ways: picture, video, sound, written or spoken language, or a formal data structure. Within this framework, an answer is a structured object contained in the model and retrieved by a strategy to build a particular view depending on the end user (or task's) requirements.
    I describe shallow techniques to compare entities and enrich the model by discovering four broad categories of relationships between entities in the model: equivalence, inclusion, aggregation and alternative. Quantitatively, answer candidate modeling improves answer extraction accuracy. It also proves to be more robust to incorrect answer candidates than traditional techniques. Qualitatively, models provide meta-information encoded by relationships that allow shallow reasoning to help organize and generate the final output.
