5,031 research outputs found

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

    Utilisation of metadata fields and query expansion in cross-lingual search of user-generated Internet video

    Recent years have seen significant efforts in the area of Cross Language Information Retrieval (CLIR) for text retrieval. This work initially focused on formally published content, but more recently research has begun to concentrate on CLIR for informal social media content. However, despite the current expansion in online multimedia archives, there has been little work on CLIR for this content. While there has been some limited work on Cross-Language Video Retrieval (CLVR) for professional videos, such as documentaries or TV news broadcasts, there has, to date, been no significant investigation of CLVR for the rapidly growing archives of informal user-generated content (UGC). Key differences between such UGC and professionally produced content are the nature and structure of the textual UGC metadata associated with it, as well as the form and quality of the content itself. In this setting, retrieval effectiveness may suffer not only from translation errors common to all CLIR tasks, but also from recognition errors associated with the automatic speech recognition (ASR) systems used to transcribe the spoken content of the video and from the informality and inconsistency of the associated user-created metadata for each video. This work proposes and evaluates techniques to improve CLIR effectiveness of such noisy UGC content. Our experimental investigation shows that different sources of evidence, e.g. the content from different fields of the structured metadata, significantly affect CLIR effectiveness. Results from our experiments also show that each metadata field has a varying robustness to query expansion (QE) and hence can have a negative impact on the CLIR effectiveness. Our work proposes a novel adaptive QE technique that predicts the most reliable source for expansion and shows how this technique can be effective for improving CLIR effectiveness for UGC content.

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    Towards effective cross-lingual search of user-generated internet speech

    The very rapid growth in user-generated social spoken content on online platforms is creating new challenges for Spoken Content Retrieval (SCR) technologies. There are many potential choices for how to design a robust SCR framework for UGS content, but the current lack of detailed investigation means that there is a lack of understanding of the specific challenges, and little or no guidance available to inform these choices. This thesis investigates the challenges of effective SCR for UGS content, and proposes novel SCR methods that are designed to cope with the challenges of UGS content. The work presented in this thesis can be divided into three areas of contribution as follows. The first contribution of this work is a critique of the issues and challenges that influence the effectiveness of searching UGS content in both mono-lingual and cross-lingual settings. The second contribution is the development of an effective Query Expansion (QE) method for UGS. This research reports that the variation in the length, quality and structure of the relevant documents encountered in UGS content can harm the effectiveness of QE techniques across different queries. Seeking to address this issue, this work examines the utilisation of Query Performance Prediction (QPP) techniques for improving QE in UGS, and presents a novel framework specifically designed for predicting the effectiveness of QE. Thirdly, this work extends the utilisation of QPP in UGS search to improve cross-lingual search for UGS by predicting the translation effectiveness. The thesis proposes novel methods to estimate the quality of translation for cross-lingual UGS search. An empirical evaluation demonstrates the quality of the proposed method on alternative translation outputs extracted from several Machine Translation (MT) systems developed for this task. The research then shows how this framework can be integrated in cross-lingual UGS search to find relevant translations for improved retrieval performance.

    Language Modeling Approaches to Information Retrieval

    This article surveys recent research in the area of language modeling (sometimes called statistical language modeling) approaches to information retrieval. Language modeling is a formal probabilistic retrieval framework with roots in speech recognition and natural language processing. The underlying assumption of language modeling is that human language generation is a random process; the goal is to model that process via a generative statistical model. In this article, we discuss current research in the application of language modeling to information retrieval, the role of semantics in the language modeling framework, cluster-based language models, use of language modeling for XML retrieval and future trends

    Behavioral Task Modeling for Entity Recommendation

    Our everyday tasks involve interactions with a wide range of information. The information that we manage is often associated with a task context. However, current computer systems do not organize information in this way, do not help the user find information in task context, but require explicit user actions such as searching and information seeking. We explore the use of task context to guide the delivery of information to the user proactively, that is, to have the right information easily available at the right time. In this thesis, we used two types of novel contextual information: 24/7 behavioral recordings and spoken conversations for task modeling. The task context is created by monitoring the user's information behavior from temporal, social, and topical aspects; that can be contextualized by several entities such as applications, documents, people, time, and various keywords determining the task. By tracking the association amongst the entities, we can infer the user's task context, predict future information access, and proactively retrieve relevant information for the task at hand. The approach is validated with a series of field studies, in which altogether 47 participants voluntarily installed a screen monitoring system on their laptops 24/7 to collect available digital activities, and their spoken conversations were recorded. Different aspects of the data were considered to train the models. In the evaluation, we treated information sourced from several applications, spoken conversations, and various aspects of the data as different kinds of influence on the prediction performance. The combined influences of multiple data sources and aspects were also considered in the models. Our findings revealed that task information could be found in a variety of applications and spoken conversations. 
In addition, we found that task context models that consider behavioral information captured from the computer screen and spoken conversations could yield a promising improvement in recommendation quality compared to the conventional modeling approach that considered only pre-determined interaction logs, such as query logs or Web browsing history. We also showed how a task context model could support the users' work performance, reducing their effort in searching by ranking and suggesting relevant information. Our results and findings have direct implications for information personalization and recommendation systems that leverage contextual information to predict and proactively present personalized information to the user to improve the interaction experience with the computer systems.

    Affect-based indexing and retrieval of multimedia data

    Digital multimedia systems are creating many new opportunities for rapid access to content archives. In order to explore these collections using search, the content must be annotated with significant features. An important and often overlooked aspect of human interpretation of multimedia data is the affective dimension. The hypothesis of this thesis is that affective labels of content can be extracted automatically from within multimedia data streams, and that these can then be used for content-based retrieval and browsing. A novel system is presented for extracting affective features from video content and mapping them onto a set of keywords with predetermined emotional interpretations. These labels are then used to demonstrate affect-based retrieval on a range of feature films. Because of the subjective nature of the words people use to describe emotions, an approach towards an open vocabulary query system utilizing the electronic lexical database WordNet is also presented. This gives flexibility for search queries to be extended to include keywords without predetermined emotional interpretations using a word-similarity measure. The thesis presents the framework and design for the affect-based indexing and retrieval system along with experiments, analysis, and conclusions.