323 research outputs found

A link to the past: Constructing historical social networks from unstructured data

Author: van de Camp Matje
Publication venue: Tilburg University
Publication date: 01/01/2016

Tilburg University Repository

Entity-Centric Text Mining for Historical Documents

Author: Coll Ardanuy Maria
Publication date: 07/07/2017

Georg-August-University Göttingen

HiER 2015. Proceedings des 9. Hildesheimer Evaluierungs- und Retrievalworkshop

Author: Elbeshausen Stefanie
Faaß Getrud
Friesbaum Joachim
Heuwing Ben
Jürgens Julia
Publication venue: Hildesheim : Universitätsverlag Hildesheim
Publication date: 01/01/2015

Digitalization is shaping our information environments. Disruptive technologies are entering our everyday lives to a growing degree and at an ever faster pace, changing how we seek information and communicate. Information markets are changing. The 9th Hildesheim Evaluation and Retrieval Workshop, HiER 2015, addresses the design and evaluation of information systems against the background of accelerating digitalization. It focuses on the following topics: digital humanities, internet search and online marketing, information seeking and user-centered development, e-learning.

University of Hildesheim

Classifying Attitude by Topic Aspect for English and Chinese Document Collections

Author: Wu Yejun
Publication date: 25/04/2008

The goal of this dissertation is to explore the design of tools to help users make sense of subjective information in English and Chinese by comparing attitudes on aspects of a topic in English and Chinese document collections. This involves two coupled challenges: topic aspect focus and attitude characterization. The topic aspect focus is specified by using information retrieval techniques to obtain documents on a topic that are of interest to a user and then allowing the user to designate a few segments of those documents to serve as examples for aspects that she wishes to see characterized. A novel feature of this work is that the examples can be drawn from documents in two languages (English and Chinese). A bilingual aspect classifier, which applies monolingual and cross-language classification techniques, is used to automatically assemble a large set of document segments on those same aspects. A test collection was designed for aspect classification by annotating consecutive sentences in documents from the Topic Detection and Tracking collections as aspect instances. Experiments show that classification effectiveness can often be increased by using training examples from both languages. Attitude characterization is achieved by classifiers that determine the subjectivity and polarity of document segments. Sentence attitude classification is the focus of the experiments in the dissertation because the best presently available test collection for Chinese attitude classification (the NTCIR-6 Chinese Opinion Analysis Pilot Task) is focused on sentence-level classification. A large Chinese sentiment lexicon was constructed by leveraging existing Chinese and English lexical resources, and an existing character-based approach for estimating the semantic orientation of other Chinese words was extended. A shallow linguistic analysis approach was adopted to classify the subjectivity and polarity of a sentence. Using the large sentiment lexicon with appropriate handling of negation, and leveraging sentence subjectivity density, sentence positivity and negativity, the resulting sentence attitude classifier was more effective than the best previously reported systems.

Digital Repository at the University of Maryland

Can human association norm evaluate latent semantic analysis?

Author: Gatkowska Izabela
Korzycki Michał
Lubaszewski Wiesław
Publication venue: [s.n.]
Publication date: 01/01/2013

This paper presents a comparison of a word association norm created by a psycholinguistic experiment with association lists generated by algorithms operating on text corpora. We compare lists generated by the Church and Hanks algorithm and lists generated by the LSA algorithm. An argument is presented on how those automatically generated lists reflect real semantic relations.

Jagiellonian University Repository

Exploiting the conceptual space in hybrid recommender systems: a semantic-based approach

Author: Cantador Iván
Publication date: 01/01/2008

Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, October 200

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Biblos-e Archivo

Context & Semantics in News & Web Search

Author: Daan Odijk
Publication venue: Association for Computing Machinery (ACM)

Crossref

Social-media monitoring for cold-start recommendations

Author: Santos João Manuel Espada dos
Publication date: 01/11/2014

Generating personalized movie recommendations for users is a problem that most commonly relies on user-movie ratings. These ratings are generally used either to understand the user preferences or to recommend movies that users with similar rating patterns have rated highly. However, movie recommenders are often subject to the Cold-Start problem: new movies have not been rated by anyone, so they will not be recommended to anyone; likewise, the preferences of new users who have not rated any movie cannot be learned. In parallel, Social-Media platforms, such as Twitter, collect great amounts of user feedback on movies, as these are very popular nowadays. This thesis proposes to explore feedback shared on Twitter to predict the popularity of new movies and show how it can be used to tackle the Cold-Start problem. It also proposes, at a finer grain, to explore the reputation of directors and actors on IMDb to tackle the Cold-Start problem. To assess these aspects, a Reputation-enhanced Recommendation Algorithm is implemented and evaluated on a crawled IMDb dataset with previous user ratings of old movies, together with Twitter data crawled from January 2014 to March 2014, to recommend 60 movies affected by the Cold-Start problem. Twitter proved to be a strong reputation predictor, and the Reputation-enhanced Recommendation Algorithm improved over several baseline methods. Additionally, the algorithm proved useful when recommending movies in an extreme Cold-Start scenario, where both new movies and users are affected by the Cold-Start problem.

Repositório da Universidade Nova de Lisboa

Unsupervised Recognition of Motion Verbs Metaphoricity in Atypical Political Dialogues

Author: Esposito Fabrizio
Publication date: 10/10/2017

This thesis deals with the unsupervised recognition of the novel metaphorical use of lexical items in dialogical, naturally occurring political texts without recourse to task-specific hand-crafted knowledge. The focus of the metaphorical analysis is the class of verbs of motion identified by Beth Levin. These lexical items are investigated in the atypical political genre of the White House Press Briefings due to their role in the communication strategies deployed in public and political discourse. The Computational White House Press Briefings (CompWHoB) corpus, a large resource developed as one of the main objectives of the present work, is used for the extraction of the press briefings including the lexical items under analysis. The metaphor recognition of the motion verbs is addressed employing unsupervised techniques whose theoretical foundations primarily lie in the Distributional Hypothesis theory, i.e. word embeddings and topic models. Three algorithms are developed for the task, combining the Word2Vec and the Latent Dirichlet Allocation models, and based on two approaches representing their foundational theoretical framework. The first one is defined as "local" and leverages the syntactic relations of the verb of motion with its direct object for the detection of metaphoricity. The second one, termed "global", drifts away from the use of syntactic knowledge as a feature of the system, hence using only the information inferred from the discourse context. The three systems and their corresponding approaches are evaluated against 1220 instances of verbs of motion annotated by human judges according to their metaphoricity. Results show that the global approach performs poorly compared to the other two models, which also implement the local approach, leading to the conclusion that a syntax-agnostic system is still far from reaching a significant performance. The evaluation of the local approach instead yields promising results, proving the importance of endowing the machine with syntactic knowledge, as also confirmed by a qualitative analysis of the influence of the linguistic properties of metaphorical utterances.

Università degli Studi di Napoli Federico II Open Archive