HiER 2015. Proceedings of the 9th Hildesheim Evaluation and Retrieval Workshop (Hildesheimer Evaluierungs- und Retrievalworkshop)
Digitalization is shaping our information environments. Disruptive technologies are entering our everyday lives more intensively and ever faster, changing how we handle information and communicate. Information markets are changing. The 9th Hildesheim Evaluation and Retrieval Workshop, HiER 2015, addresses the design and evaluation of information systems against the backdrop of accelerating digitalization. The focus is on the following topics: digital humanities, web search and online marketing, information seeking and user-centered development, and e-learning.
Classifying Attitude by Topic Aspect for English and Chinese Document Collections
The goal of this dissertation is to explore the design of tools to help users make sense of subjective information in English and Chinese by comparing attitudes on aspects of a topic in English and Chinese document collections. This involves two coupled challenges: topic aspect focus and attitude characterization. The topic aspect focus is specified by using information retrieval techniques to obtain documents on a topic that are of interest to a user and then allowing the user to designate a few segments of those documents to serve as examples for aspects that she wishes to see characterized. A novel feature of this work is that the examples can be drawn from documents in two languages (English and Chinese). A bilingual aspect classifier which applies monolingual and cross-language classification techniques is used to assemble automatically a large set of document segments on those same aspects. A test collection was designed for aspect classification by annotating consecutive sentences in documents from the Topic Detection and Tracking collections as aspect instances. Experiments show that classification effectiveness can often be increased by using training examples from both languages.
Attitude characterization is achieved by classifiers which determine the subjectivity and polarity of document segments. Sentence attitude classification is the focus of the experiments in the dissertation because the best presently available test collection for Chinese attitude classification (the NTCIR-6 Chinese Opinion Analysis Pilot Task) is focused on sentence-level classification. A large Chinese sentiment lexicon was constructed by leveraging existing Chinese and English lexical resources, and an existing character-based approach for estimating the semantic orientation of other Chinese words was extended. A shallow linguistic analysis approach was adopted to classify the subjectivity and polarity of a sentence. Using the large sentiment lexicon with appropriate handling of negation, and leveraging sentence subjectivity density, sentence positivity and negativity, the resulting sentence attitude classifier was more effective than the best previously reported systems.
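A lexicon-based sentence attitude classifier with negation handling and a subjectivity-density threshold might look roughly like the sketch below. This is only an illustration of the general idea, not the dissertation's classifier: the tiny English lexicon, the negator list, the two-token negation window, and the 0.1 density threshold are all assumptions chosen for readability.

```python
# Hypothetical toy lexicon; the dissertation builds a large Chinese lexicon instead.
POSITIVE = {"good", "excellent", "support"}
NEGATIVE = {"bad", "poor", "oppose"}
NEGATORS = {"not", "never", "no"}


def classify_sentence(tokens, density_threshold=0.1, negation_window=2):
    """Label a tokenized sentence as objective, positive, negative, or neutral."""
    pos = neg = 0
    for i, tok in enumerate(tokens):
        if tok in POSITIVE:
            polarity = 1
        elif tok in NEGATIVE:
            polarity = -1
        else:
            continue
        # Flip polarity if a negator appears shortly before the sentiment word.
        preceding = tokens[max(0, i - negation_window):i]
        if any(t in NEGATORS for t in preceding):
            polarity = -polarity
        if polarity > 0:
            pos += 1
        else:
            neg += 1
    # Subjectivity density: share of tokens that carry sentiment.
    density = (pos + neg) / len(tokens) if tokens else 0.0
    if density < density_threshold:
        return "objective"
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


print(classify_sentence("the plan is not good".split()))
print(classify_sentence("the meeting is on monday".split()))
```

The fixed window is the simplest possible negation rule; a real system would need language-specific segmentation and scope handling.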
Can human association norm evaluate latent semantic analysis?
This paper compares a word association norm created by a psycholinguistic experiment with association lists generated by algorithms operating on text corpora. We compare lists generated by the Church and Hanks algorithm with lists generated by the LSA algorithm. An argument is presented on how those automatically generated lists reflect real semantic relations.
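As a rough illustration of how a corpus-based association list can be produced, the sketch below computes the Church and Hanks association ratio, log2(P(x, y) / (P(x) P(y))), where the joint count is the number of times y follows x within a fixed window. The toy corpus, the window size, and the function names are assumptions made for this example; the paper's actual corpora and settings are not given in the abstract.

```python
import math
from collections import Counter


def association_ratios(sentences, window=5):
    """Return {(x, y): log2(P(x, y) / (P(x) * P(y)))} for observed ordered pairs."""
    word_counts = Counter()
    pair_counts = Counter()
    total = 0
    for tokens in sentences:
        for i, x in enumerate(tokens):
            word_counts[x] += 1
            total += 1
            # Count y when it follows x within the window.
            for y in tokens[i + 1:i + 1 + window]:
                pair_counts[(x, y)] += 1
    ratios = {}
    for (x, y), f_xy in pair_counts.items():
        p_xy = f_xy / total
        p_x = word_counts[x] / total
        p_y = word_counts[y] / total
        ratios[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return ratios


def association_list(ratios, cue, top_n=3):
    """Rank the words following `cue` by descending association ratio."""
    scored = [(y, r) for (x, y), r in ratios.items() if x == cue]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [y for y, _ in scored[:top_n]]


# Hypothetical toy corpus, already tokenized.
corpus = [["strong", "tea"], ["strong", "tea"], ["strong", "coffee"], ["hot", "tea"]]
ratios = association_ratios(corpus)
print(association_list(ratios, "strong"))
```

Note that the ratio favors rare pairs: in this toy corpus "coffee" outranks "tea" for the cue "strong" even though "strong tea" occurs more often, which is one reason such lists can diverge from human association norms.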
Exploiting the conceptual space in hybrid recommender systems: a semantic-based approach
Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, October 200
Social-media monitoring for cold-start recommendations
Generating personalized movie recommendations for users is a problem that most commonly relies on user-movie ratings. These ratings are generally used either to understand user preferences or to recommend movies that users with similar rating patterns have rated highly. However, movie recommenders are often subject to the Cold-Start problem: new movies have not been rated by anyone, so they will not be recommended to anyone; likewise, the preferences of new users who have not rated any movie cannot be learned. In parallel, social media platforms such as Twitter collect great amounts of user feedback on movies, as these are very popular nowadays. This thesis proposes to explore feedback shared on Twitter to predict the popularity of new movies and to show how it can be used to tackle the Cold-Start problem. It also proposes, at a finer grain, to explore the reputation of directors and actors on IMDb to tackle the Cold-Start problem. To assess these aspects, a Reputation-enhanced Recommendation Algorithm is implemented and evaluated on a crawled IMDb dataset with previous user ratings of old movies, together with Twitter data crawled from January 2014 to March 2014, to recommend 60 movies affected by the Cold-Start problem. Twitter proved to be a strong reputation predictor, and the Reputation-enhanced Recommendation Algorithm improved over several baseline methods. Additionally, the algorithm proved useful when recommending movies in an extreme Cold-Start scenario, where both new movies and new users are affected by the Cold-Start problem.
Unsupervised Recognition of Motion Verbs Metaphoricity in Atypical Political Dialogues
This thesis deals with the unsupervised recognition of the novel metaphorical use of lexical items in naturally occurring dialogical political texts, without recourse to task-specific hand-crafted knowledge. The metaphorical analysis focuses on the class of verbs of motion identified by Beth Levin. These lexical items are investigated in the atypical political genre of the White House Press Briefings because of their role in the communication strategies deployed in public and political discourse. The Computational White House press Briefings (CompWHoB) corpus, a large resource developed as one of the main objectives of the present work, is used to extract the press briefings that include the lexical items under analysis. The metaphor recognition of the motion verbs is addressed using unsupervised techniques whose theoretical foundations lie primarily in the Distributional Hypothesis, i.e. word embeddings and topic models. Three algorithms are developed for the task, combining the Word2Vec and the Latent Dirichlet Allocation models, and based on two approaches representing their foundational theoretical framework. The first one is defined as "local" and leverages the syntactic relations of the verb of motion with its direct object for the detection of metaphoricity. The second one, termed "global", moves away from the use of syntactic knowledge as a feature of the system, using only the information inferred from the discourse context. The three systems and their corresponding approaches are evaluated against 1220 instances of verbs of motion annotated by human judges according to their metaphoricity. Results show that the global approach performs poorly compared to the other two models, which also implement the local approach, leading to the conclusion that a syntax-agnostic system is still far from reaching a significant performance. The evaluation of the local approach instead yields promising results, showing the importance of endowing the machine with syntactic knowledge, as also confirmed by a qualitative analysis of the influence of the linguistic properties of metaphorical utterances.