5 research outputs found
The Role of the Researcher in Creating a Corpus of Conversational Narratives
In this contribution, an anthropolinguistic analysis of examples extracted from the conversations that make up the corpus compiled for the study The Stereotype of Time in the Discourse of Displaced Persons from Kosovo and Metohija points to the role of the researcher in conducting field conversations with displaced persons. The analysis focuses on the researcher's interventions in the conversations, which, on the one hand, played an important role in forming the complete corpus and, on the other, reveal some of the differences in how the researcher and the interlocutors conceptualize the world; these differences could not have been anticipated in advance and became apparent only after the transcripts were analyzed.
Extracting Multilingual Topics from Unaligned Comparable Corpora
Topic models have been studied extensively in the context of monolingual corpora. Though there have been attempts to mine topical structure from cross-lingual corpora, they require clues about document alignments. In this paper we present a generative model called JointLDA, which uses a bilingual dictionary to mine multilingual topics from an unaligned corpus. Experiments conducted on different data sets confirm our conjecture that jointly modeling the cross-lingual corpora offers several advantages over individual monolingual models. Since the JointLDA model merges related topics in different languages into a single multilingual topic: a) it can fit the data with relatively fewer topics; b) it can predict related words from a language different from that of the given document. In fact, it has better predictive power than the bag-of-words translation model, leaving open the possibility that JointLDA is preferable to the bag-of-words model for cross-lingual IR applications. We also found that the monolingual models learnt while optimizing the cross-lingual corpora are more effective than the corresponding LDA models.
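The abstract does not give JointLDA's internals, but its key mechanism is that a bilingual dictionary bridges the two vocabularies so one topic model can span both languages. The toy sketch below illustrates only that preprocessing idea under assumed conventions: the dictionary, the documents, and the `concept_id` naming scheme are all hypothetical, not the paper's actual implementation.

```python
# Hypothetical sketch: a bilingual dictionary collapses translation
# pairs to a shared language-neutral token, so documents in either
# language can feed a single topic model over one merged vocabulary.
# Dictionary and documents are toy examples, not real data.

BILINGUAL_DICT = {  # toy English -> Spanish entries
    "war": "guerra",
    "peace": "paz",
    "economy": "economia",
}
REVERSE_DICT = {v: k for k, v in BILINGUAL_DICT.items()}

def concept_id(word, lang):
    """Map a word to a shared concept id if the dictionary covers it;
    otherwise keep a language-tagged id of its own."""
    if lang == "en" and word in BILINGUAL_DICT:
        return "en+es:" + word
    if lang == "es" and word in REVERSE_DICT:
        return "en+es:" + REVERSE_DICT[word]
    return lang + ":" + word

docs = [
    (["war", "economy", "treaty"], "en"),
    (["guerra", "paz", "tratado"], "es"),
]
merged = [[concept_id(w, lang) for w in words] for words, lang in docs]
print(merged)
# "war" and "guerra" now share the id "en+es:war", so a topic that
# loads on it is multilingual by construction; out-of-dictionary
# words ("treaty", "tratado") remain language-specific.
```

Fitting a standard LDA model over `merged` would then yield topics whose top words mix both languages, which is consistent with the merging behavior the abstract describes.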
Classifying Bias in Large Multilingual Corpora via Crowdsourcing and Topic Modeling
Our project extends previous algorithmic approaches to finding bias in large text corpora. We used multilingual topic modeling to examine language-specific bias in the English, Spanish, and Russian versions of Wikipedia. In particular, we placed Spanish articles discussing the Cold War on a Russian-English viewpoint spectrum based on similarity in topic distribution. We then crowdsourced human annotations of Spanish Wikipedia articles for comparison to the topic model. Our hypothesis was that human annotators and topic modeling algorithms would provide correlated results for bias. However, that was not the case. Our annotations indicated that humans were more perceptive of sentiment in the article text than of topic distribution, which suggests that our classifier provides a different perspective on a text's bias.
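The abstract does not specify how articles were placed on the Russian-English viewpoint spectrum beyond "similarity in topic distribution." One plausible reading, sketched below with made-up numbers, compares a document's topic distribution to a Russian-leaning and an English-leaning reference distribution using Jensen-Shannon divergence; the reference distributions, the three-topic space, and the normalized score are all assumptions for illustration.

```python
import math

# Hypothetical sketch of a viewpoint spectrum: score a document by how
# close its topic distribution is to each of two reference
# distributions. All distributions below are toy values.

def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

russian_ref = [0.6, 0.3, 0.1]   # toy topic mix of Russian-leaning articles
english_ref = [0.1, 0.3, 0.6]   # toy topic mix of English-leaning articles

def spectrum_position(doc):
    """0.0 = identical to the Russian reference, 1.0 = English."""
    d_ru = jsd(doc, russian_ref)
    d_en = jsd(doc, english_ref)
    return d_ru / (d_ru + d_en)

spanish_doc = [0.5, 0.3, 0.2]   # toy distribution for a Spanish article
pos = spectrum_position(spanish_doc)
print(round(pos, 3))  # well below 0.5: closer to the Russian reference
```

Any distribution distance (cosine, Hellinger, KL) could stand in for Jensen-Shannon here; JSD is used only because it is symmetric and bounded, which keeps the 0-to-1 spectrum score well behaved.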