4,917 research outputs found

Robust audio indexing for Dutch spoken-word collections

Author: Huijbregts Marijn
Jong Franciska de
Leeuwen David van
Ordelman Roeland
Publication venue: KNAW
Publication date: 01/01/2005

Abstract: Whereas the growth of storage capacity is in accordance with widely acknowledged predictions, the possibilities to index and access the archives created are lagging behind. This is especially the case in the oral history domain, and much of the rich content in these collections risks remaining inaccessible for lack of robust search technologies. This paper addresses the history and development of robust audio indexing technology for searching Dutch spoken-word collections and compares Dutch audio indexing in the well-studied broadcast news domain with an oral-history case study. It is concluded that despite significant advances in Dutch audio indexing technology and demonstrated applicability in several domains, further research is indispensable for successful automatic disclosure of spoken-word collections.

University of Twente Research Information

Unravelling the voice of Willem Frederik Hermans: an oral history indexing case study

Author: Huijbregts Marijn
Jong Franciska de
Ordelman Roeland
Publication venue: University of Twente, Centre for Telematics and Information Technology (CTIT)
Publication date: 01/01/2009

University of Twente Research Information

Towards Query Logs for Privacy Studies: On Deriving Search Queries from Questions

Author: Biega Asia J.
Roy Rishiraj Saha
Schmidt Jana
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

Translating verbose information needs into crisp search queries is a phenomenon that is ubiquitous but hardly understood. Insights into this process could be valuable in several applications, including synthesizing large privacy-friendly query logs from public Web sources which are readily available to the academic research community. In this work, we take a step towards understanding query formulation by tapping into the rich potential of community question answering (CQA) forums. Specifically, we sample natural language (NL) questions spanning diverse themes from the Stack Exchange platform, and conduct a large-scale conversion experiment where crowdworkers submit search queries they would use when looking for equivalent information. We provide a careful analysis of this data, accounting for possible sources of bias during conversion, along with insights into user-specific linguistic patterns and search behaviors. We release a dataset of 7,000 question-query pairs from this study to facilitate further research on query understanding.

Comment: ECIR 2020 Short Paper

arXiv.org e-Print Archive

MPG.PuRe

The many aspects of fine-grained sentiment analysis : an overview of the task and its main challenges

Author: De Clercq Orphée
Publication venue: IARIA
Publication date: 01/01/2016

Ghent University Academic Bibliography

Transductive Learning with String Kernels for Cross-Domain Text Classification

Author: AM Fernández
D Bollegala
G Ifrim
H Lodhi
J Shawe-Taylor
M Franco-Salvador
M Long
Marius Popescu
RT Ionescu
TG Dietterich
Publication date: 02/11/2018

For many text classification tasks, there is a major problem posed by the lack of labeled data in a target domain. Although classifiers for a target domain can be trained on labeled text data from a related source domain, the accuracy of such classifiers is usually lower in the cross-domain setting. Recently, string kernels have obtained state-of-the-art results in various text classification tasks such as native language identification or automatic essay scoring. Moreover, classifiers based on string kernels have been found to be robust to the distribution gap between different domains. In this paper, we formally describe an algorithm composed of two simple yet effective transductive learning approaches to further improve the results of string kernels in cross-domain settings. By adapting string kernels to the test set without using the ground-truth test labels, we report significantly better accuracy rates in cross-domain English polarity classification.

Comment: Accepted at ICONIP 2018. arXiv admin note: substantial text overlap with arXiv:1808.0840

arXiv.org e-Print Archive

Crossref

User sentiment detection: a YouTube use case

Author: Breslin John G.
Choudhury Smitashree
Publication date: 01/08/2010

In this paper, we propose an unsupervised lexicon-based approach to detect the sentiment polarity of user comments on YouTube. Polarity detection in social media content is challenging not only because of the limitations of current sentiment dictionaries but also because of the informal linguistic styles used by users. Present dictionaries fail to capture the sentiments of community-created terms. To address this challenge, we adopted a data-driven approach and prepared a social-media-specific list of terms and phrases expressing user sentiments and opinions. Experimental evaluation shows the combinatorial approach has greater potential. Finally, we discuss many research challenges involving social media sentiment analysis.

Open Research Online