
    Topic-dependent sentiment analysis of financial blogs

    While most work on sentiment analysis in the financial domain has focused on content from traditional financial news, in this work we concentrate on a more subjective source of information: blogs. We aim to automatically determine the sentiment of financial bloggers towards companies and their stocks. To do this we develop a corpus of financial blogs, annotated with polarity of sentiment with respect to a number of companies. We conduct an analysis of the annotated corpus, from which we show there is a significant level of topic shift within the collection, and we illustrate the difficulty that human annotators have when annotating certain sentiment categories. To deal with the problem of topic shift within blog articles, we propose text extraction techniques that create topic-specific sub-documents, which we use to train a sentiment classifier. We show that such approaches provide a substantial improvement over full-document classification and that word-based approaches perform better than sentence-based or paragraph-based approaches.
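    Below is a minimal sketch of the word-based extraction idea: collect the words within a fixed window of each mention of a target company and concatenate them into a topic-specific sub-document, on which a sentiment classifier could then be trained. The company name, window size and example post are illustrative, not taken from the paper's corpus.

```python
def topic_subdocument(text, company, window=10):
    """Collect the words within +/- `window` tokens of each mention of
    `company`, concatenated into one topic-specific sub-document."""
    tokens = text.split()
    spans = []
    for i, tok in enumerate(tokens):
        if company.lower() in tok.lower():
            spans.append((max(0, i - window), min(len(tokens), i + window + 1)))
    # Merge overlapping windows so shared words are not duplicated.
    merged, words = [], []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    for start, end in merged:
        words.extend(tokens[start:end])
    return " ".join(words)

post = ("Markets were mixed today. Acme beat earnings expectations and "
        "the stock rallied, while analysts remain bearish on other names.")
print(topic_subdocument(post, "Acme", window=5))
```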

    Relating Web pages to enable information-gathering tasks

    We argue that relationships between Web pages are functions of the user's intent. We identify a class of Web tasks - information-gathering - that can be facilitated by a search engine that provides links to pages related to the page the user is currently viewing. We define three kinds of intentional relationships that correspond to whether the user is (a) seeking sources of information, (b) reading pages which provide information, or (c) surfing through pages as part of an extended information-gathering process. We show that these three relationships can be productively mined using a combination of textual and link information, and we provide three scoring mechanisms that correspond to them: SeekRel, FactRel and SurfRel. We build a set of capacitated subnetworks - each corresponding to a particular keyword - that mirror the interconnection structure of the World Wide Web. The scores are computed as flows on these subnetworks. The capacities of the links are derived from the hub and authority values of the nodes they connect, following the work of Kleinberg (1998) on assigning authority to pages in hyperlinked environments. We evaluated our scoring mechanisms by running experiments on four data sets taken from the Web. We present user evaluations of the relevance of the top results returned by our scoring mechanisms and compare them to the top results returned by Google's Similar Pages feature and the Companion algorithm proposed by Dean and Henzinger (1999). Comment: In Proceedings of ACM Hypertext 200
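    A small sketch of the flow-based scoring idea, using networkx: hub and authority values are computed with HITS (Kleinberg, 1998), edge capacities are derived from them, and relatedness between two pages is taken as the maximum flow between them. The toy graph and the specific capacity rule (source hub score times target authority score) are assumptions for illustration; they are not the paper's exact SeekRel, FactRel or SurfRel definitions.

```python
import networkx as nx

# Toy hyperlink graph; edges point from linking page to linked page.
G = nx.DiGraph([("p1", "p2"), ("p1", "p3"), ("p2", "p3"),
                ("p3", "p4"), ("p2", "p4"), ("p4", "p2")])

hubs, auths = nx.hits(G, normalized=True)

# Assumed capacity rule: an edge can carry flow in proportion to the
# hub score of its source and the authority score of its target.
F = nx.DiGraph()
for u, v in G.edges():
    F.add_edge(u, v, capacity=hubs[u] * auths[v])

# Relatedness of p4 to p1 taken as the maximum flow between them.
flow_value, _ = nx.maximum_flow(F, "p1", "p4")
print(f"flow-based relatedness(p1 -> p4) = {flow_value:.4f}")
```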

    Personalised Web Search using Browsing History and Domain Knowledge

    Different users have different needs when they submit a query to a web search engine. Personalized web search can satisfy an individual's information needs by modeling long-term and short-term user interests from past queries and actions, and by incorporating these in the search process. Personalized web search varies in effectiveness across contexts, queries and users. Personalized search has been an important research area; although many techniques have been developed and tested, many challenges and issues are yet to be explored. This paper proposes a framework for building an Enhanced User Profile from the user's browsing history, improved using domain knowledge. The Enhanced User Profile is used for suggesting relevant web pages to the user. Experimental results show that the suggestions provided using the Enhanced User Profile are better than those obtained using a plain User Profile. DOI: 10.17762/ijritcc2321-8169.150315
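    A minimal sketch of the profile-building idea, under assumed representations: a plain profile as term frequencies over browsing history, an enhanced profile obtained by expanding terms through a tiny hand-made domain map (a stand-in for real domain knowledge), and cosine similarity for ranking suggested pages. All names and data here are illustrative, not the paper's framework.

```python
from collections import Counter
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Plain profile: term frequencies from the user's browsing history.
history = ["python pandas tutorial", "pandas dataframe merge",
           "python web scraping"]
profile = Counter(w for page in history for w in page.split())

# Enhanced profile: expand terms with related concepts from a tiny
# hand-made domain map (stand-in for real domain knowledge).
domain = {"pandas": ["numpy", "dataframe"], "scraping": ["crawler", "html"]}
enhanced = Counter(profile)
for term, related in domain.items():
    for r in related:
        enhanced[r] += profile[term]

# Rank candidate pages by similarity to the enhanced profile.
results = ["numpy array tutorial", "zoo pandas exhibit", "html crawler basics"]
for page in sorted(results, key=lambda p: -cosine(enhanced, Counter(p.split()))):
    print(f"{cosine(enhanced, Counter(page.split())):.3f}  {page}")
```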

    Web Structure Mining: Exploring Hyperlinks and Algorithms for Information Retrieval

    This paper focuses on hyperlink analysis, the algorithms used for link analysis, a comparison of those algorithms, and the role of hyperlink analysis in Web searching. In hyperlink analysis, the number of incoming links to a page, the number of outgoing links from that page, and the reliability of the links are analyzed. The concepts of authorities and hubs among Web pages are explored. The different algorithms used for link analysis, such as PageRank and HITS (Hyperlink-Induced Topic Search), are discussed and compared, and the formulas used by those algorithms are examined.
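    As an illustration of the kind of formula involved, here is a minimal power-iteration implementation of PageRank, PR(p) = (1-d)/N + d * sum over in-neighbours q of PR(q)/outdeg(q). The toy link graph is made up, and dangling pages (no out-links) are not handled in this sketch.

```python
def pagerank(links, d=0.85, iters=50):
    """Power-iteration PageRank over a dict mapping page -> list of
    pages it links to. Assumes every page has at least one out-link."""
    pages = set(links) | {p for outs in links.values() for p in outs}
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        nxt = {p: (1 - d) / n for p in pages}
        for q, outs in links.items():
            share = pr[q] / len(outs)   # q spreads its score evenly
            for p in outs:
                nxt[p] += d * share
        pr = nxt
    return pr

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```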

    Generating Clusters of Duplicate Documents: An Approach Based on Frequent Closed Itemsets

    A vast number of documents on the Web have duplicates, which necessitates efficient methods for computing clusters of duplicate documents [1-5, 8-10, 13-14]. In this paper, Data Mining algorithms are used for constructing clusters of duplicate documents, with documents represented by both syntactic and lexical methods. A series of experiments suggests some conclusions about choosing the parameters of the methods.
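    A much-simplified sketch of the syntactic side of this approach: each document's image is its set of k-word shingles, and near-duplicates are grouped with union-find over pairs whose shingle sets are sufficiently similar. Note that this substitutes a plain Jaccard threshold for the paper's frequent-closed-itemset mining; it only illustrates the document representation and the clustering outcome.

```python
from itertools import combinations

def shingles(text, k=3):
    """Syntactic document image: the set of k-word shingles."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def duplicate_clusters(docs, k=3, threshold=0.7):
    """Union-find over document pairs whose shingle images are similar."""
    parent = list(range(len(docs)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    images = [shingles(d, k) for d in docs]
    for i, j in combinations(range(len(docs)), 2):
        if jaccard(images[i], images[j]) >= threshold:
            parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(docs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

docs = ["the quick brown fox jumps over the lazy dog",
        "the quick brown fox jumps over a lazy dog",
        "completely different text about web mining"]
print(duplicate_clusters(docs, k=2, threshold=0.5))
```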

    Cluster Generation and Cluster Labelling for Web Snippets: A Fast and Accurate Hierarchical Solution

    This paper describes Armil, a meta-search engine that groups into disjoint labelled clusters the Web snippets returned by auxiliary search engines. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labelling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric k-center clustering. Cluster labelling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in Web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Directory Project hierarchy. According to two widely accepted 'external' metrics of clustering quality, Armil achieves better performance levels by 10%. We also report the results of a thorough user evaluation of both the clustering and the cluster labelling algorithms. On a standard 1GHz machine, Armil performs clustering and labelling altogether in less than one second.
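    A minimal sketch of the furthest-point-first (Gonzalez) heuristic for metric k-center on which the clustering step is based: repeatedly pick as the next center the point furthest from all centers chosen so far, which yields a 2-approximation for k-center. Euclidean toy points stand in here for Armil's snippet vectors and distance metric.

```python
import random

def furthest_point_first(points, k, dist):
    """Gonzalez's FPF heuristic for metric k-center: greedily pick the
    point furthest from the centers chosen so far."""
    centers = [random.choice(points)]
    d = {p: dist(p, centers[0]) for p in points}   # distance to nearest center
    while len(centers) < k:
        nxt = max(points, key=lambda p: d[p])
        centers.append(nxt)
        for p in points:
            d[p] = min(d[p], dist(p, nxt))
    # Assign each point to its nearest center.
    return {p: min(centers, key=lambda c: dist(p, c)) for p in points}

pts = [(0, 0), (0, 1), (10, 10), (11, 10), (5, 20)]
euclid = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
for point, center in furthest_point_first(pts, 3, euclid).items():
    print(point, "->", center)
```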

    Investigating the document structure as a source of evidence for multimedia fragment retrieval

    Multimedia objects can be retrieved using their context, which can be, for instance, the text surrounding them in documents. This text may be either near or far from the searched objects. Our goal in this paper is to study the impact, in terms of effectiveness, of text position relative to the searched objects. The multimedia objects we consider are described in structured documents, such as XML ones. The document structure is therefore exploited to provide the text position within documents. Although structural information has been shown to be an effective source of evidence in textual information retrieval, only a few works have investigated its interest for multimedia retrieval. More precisely, the task we address in this paper is to retrieve multimedia fragments (i.e. XML elements having at least one multimedia object). Our general approach is built on two steps: we first retrieve XML elements containing multimedia objects, and we then explore the surrounding information to retrieve relevant multimedia fragments. In both cases, we study the impact of the surrounding information using the document structure. Our work is carried out on images, but it can be extended to any other medium, since the physical content of multimedia objects is not used. We conducted several experiments in the context of the Multimedia track of the INEX evaluation campaign. Results showed that structural evidence is of high interest for tuning the importance of textual context in multimedia retrieval. Moreover, the proposed approach outperforms state-of-the-art approaches.
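    A small sketch of the two-step idea on a toy XML document: first locate elements containing an image, then score each fragment by combining query matches in its own text with matches in the wider structural context (here, the enclosing section). The markup, weights and query are illustrative assumptions, not the paper's model.

```python
import re
import xml.etree.ElementTree as ET

DOC = """<article>
  <section>
    <title>Sailing boats</title>
    <p>A regatta at sunset. <image src="boat.jpg"/> Crews trim the sails.</p>
  </section>
  <section>
    <title>Harbour cranes</title>
    <p>Container terminal at dawn. <image src="crane.jpg"/></p>
  </section>
</article>"""

def score(query_terms, text):
    """Count occurrences of the query terms in the text."""
    words = re.findall(r"\w+", text.lower())
    return sum(words.count(t) for t in query_terms)

root = ET.fromstring(DOC)
query = ["sails", "regatta", "sunset"]
for section in root.iter("section"):
    for p in section.iter("p"):
        if p.find("image") is None:     # step 1: keep fragments with an image
            continue
        own = " ".join(p.itertext())        # text inside the fragment itself
        context = " ".join(section.itertext())  # wider structural context
        s = 0.7 * score(query, own) + 0.3 * score(query, context)
        print(p.find("image").get("src"), round(s, 2))
```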