1,079 research outputs found

Do peers see more in a paper than its authors?

Author: Divoli Anna
Hearst Marti
Nakov Preslav
Publication venue: eScholarship, University of California
Publication date: 01/01/2012

Recent years have shown a gradual shift in the content of biomedical publications that is freely accessible, from titles and abstracts to full text. This has enabled new forms of automatic text analysis and has given rise to some interesting questions: How informative is the abstract compared to the full text? What important information in the full text is not present in the abstract? What should a good summary contain that is not already in the abstract? Do authors and peers see an article differently? We answer these questions by comparing the information content of the abstract to that in citances, that is, sentences containing citations to that article. We contrast the important points of an article as judged by its authors versus as seen by peers. Focusing on the area of molecular interactions, we perform manual and automatic analysis, and we find that the set of all citances to a target article not only covers most information (entities, functions, experimental methods, and other biological concepts) found in its abstract, but also contains 20% more concepts. We further present a detailed summary of the differences across information types, and we examine the effects other citations and time have on the content of citances.
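
The comparison described in this abstract can be pictured as a set comparison between the concepts found in an abstract and the concepts found across all citances. The following is a minimal sketch under that assumption only; the function name `compare_concepts` and the toy concept labels are hypothetical and are not taken from the paper, which relies on manual and automatic annotation that is not reproduced here.

```python
def compare_concepts(abstract_concepts, citance_concepts_per_sentence):
    """Compare concepts in an abstract with concepts pooled over all citances.

    Both inputs are assumed to be already-annotated concept labels;
    how the labels are produced is outside the scope of this sketch.
    """
    abstract = set(abstract_concepts)
    # Pool the concepts of every citing sentence into one set.
    citances = set().union(*citance_concepts_per_sentence)

    covered = abstract & citances
    coverage = len(covered) / len(abstract) if abstract else 0.0
    return {
        "coverage": coverage,              # share of abstract concepts seen in citances
        "extra": citances - abstract,      # concepts only peers mention
        "missing": abstract - citances,    # concepts only the authors mention
    }


# Toy, made-up annotations for illustration only.
result = compare_concepts(
    {"p53", "MDM2", "binding"},
    [{"p53", "MDM2"}, {"binding", "ubiquitination"}],
)
```

In this toy case every abstract concept is covered and the citances contribute one additional concept.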

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Towards semantic web mining

Author: A. Hotho
A. Maedche
B. Berendt
B. Ganter
B. Mobasher
D. Hand
E.H. Chi
G. Chang
J. Hobbs
J. M. Kleinberg
J. Srivastava
L. Dehaspe
M. Craven
M. Fernández
M. Kifer
M. Spiliopoulou
R. Cooley
S. Chakrabarti
W. Lin
Publication venue: Springer
Publication date: 01/01/2002

Semantic Web Mining aims at combining the two fast-developing research areas Semantic Web and Web Mining. The idea is to improve, on the one hand, the results of Web Mining by exploiting the new semantic structures in the Web; and to make use of Web Mining, on the other hand, for building up the Semantic Web. This paper gives an overview of where the two areas meet today, and sketches ways in which a closer integration could be profitable.

CiteSeerX

Crossref

DSpace an der Universität Kassel

Integrating the document object model with hyperlinks for enhanced topic distillation and information extraction

Author: Chakrabarti Soumen
Publication date: 01/01/2011

Topic distillation is the process of finding authoritative Web pages and comprehensive “hubs” which reciprocally endorse each other and are relevant to a given query. Hyperlink-based topic distillation has been traditionally applied to a macroscopic Web model where documents are nodes in a directed graph and hyperlinks are edges. Macroscopic models miss valuable clues such as banners, navigation panels, and template-based inclusions, which are embedded in HTML pages using markup tags. Consequently, results of macroscopic distillation algorithms have been deteriorating in quality as Web pages are becoming more complex. We propose a uniform fine-grained model for the Web in which pages are represented by their tag trees (also called their Document Object Models or DOMs) and these DOM trees are interconnected by ordinary hyperlinks. Surprisingly, macroscopic distillation algorithms do not work in the fine-grained scenario. We present a new algorithm suitable for the fine-grained model. It can dis-aggregate hubs into coherent regions by segmenting their DOM trees. Mutual endorsement between hubs and authorities involves these regions, rather than single nodes representing complete hubs. Anecdotes and measurements using a 28-query, 366,000-document benchmark suite, used in earlier topic distillation research, reveal two benefits from the new algorithm: distillation quality improves and a by-product of distillation is the ability to extract relevant snippets from hubs which are only partially relevant to the query.
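
For orientation, the macroscopic model mentioned in this abstract (whole pages as nodes, hyperlinks as edges, hubs and authorities endorsing each other) can be sketched with the classic hub/authority iteration commonly known as HITS. This is only an illustration of the coarse-grained baseline, not the fine-grained DOM-based algorithm the paper proposes; the function name `hits`, the iteration count, and the toy link graph are assumptions made for this example.

```python
import math


def _normalize(scores):
    """Scale scores to unit L2 norm; leave all-zero scores untouched."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def hits(edges, iterations=50):
    """Hub/authority iteration on a directed graph given as (source, target) pairs."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # A page is a good authority if good hubs link to it.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]

        # A page is a good hub if it links to good authorities.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]

        auth = _normalize(new_auth)
        hub = _normalize(new_hub)

    return hub, auth


# Hypothetical three-link graph: h1 links to a1 and a2, h2 links to a1 only.
hub, auth = hits([("h1", "a1"), ("h1", "a2"), ("h2", "a1")])
```

Here `a1` ends up with a higher authority score than `a2` because both hubs point to it, and `h1` scores higher as a hub than `h2`. Because each page is a single node, a hub that is only partly on topic is scored as a whole, which is the limitation the fine-grained model is meant to address.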

Automatic cataloguing of Web resources on a personalized taxonomy

Author: Bidarra José
Escudeiro Paula
Publication date: 01/01/2008

Information overload is a major concern that retrieval systems face. Information is ubiquitous and available from many distinct sources, and the main issue is to get just the right piece of information that might satisfy our specific needs. Many of these sources organize their informational resources on a given ontology. However, these ontologies are static and do not allow for personalization. This fact degrades the value of the service if there is no easy mental mapping between user-specific needs and the general source ontology. Organizing informational resources according to particular needs might increase users’ satisfaction and save their time. In this paper we present a methodology to filter and organize informational resources according to users’ interests, thus providing users with a personalized edition of the resource, especially tailored towards their specific needs. We believe that this methodology may be applied in educational scenarios, where we have a repository of educational objects that are organized according to specific objectives, automatically producing specific courseware. Our experimental results confirm that it is possible to automatically personalize document resources with high precision at a reduced editor workload. This work is supported by the POSC/EIA/58367/2004/Site-o-Matic Project (Fundação Ciência e Tecnologia), FEDER and Programa de Financiamento Plurianual de Unidades de I & D. We would like to thank the Expresso newspaper for their support throughout this work.

Repositório Aberto da Universidade Aberta

Accelerated focused crawling through online relevance feedback

Author: Chakrabarti Soumen
Mallela Subramanyam
Punera Kunal
Publication date: 01/01/2002

The organization of HTML into a tag tree structure, which is rendered by browsers as roughly rectangular regions with embedded text and HREF links, greatly helps surfers locate and click on links that best satisfy their information need. Can an automatic program emulate this human behavior and thereby learn to predict the relevance of an unseen HREF target page with respect to an information need, based on information limited to the HREF source page? Such a capability would be of great interest in focused crawling and resource discovery, because it can fine-tune the priority of unvisited URLs in the crawl frontier, and reduce the number of irrelevant pages which are fetched and discarded.
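
The idea of prioritizing unvisited URLs in the crawl frontier can be sketched with a priority queue keyed on a predicted relevance score. The sketch below is an assumption-laden illustration: the class `CrawlFrontier`, the helper `anchor_score`, and the example URLs are hypothetical, and the keyword-overlap score is a crude stand-in for the learned predictor the abstract describes, which is not specified here.

```python
import heapq
import itertools


class CrawlFrontier:
    """Priority queue of unvisited URLs, highest predicted relevance first."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def push(self, url, score):
        if url in self._seen:
            return  # never enqueue the same URL twice
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, next(self._counter), url))

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)


def anchor_score(anchor_text, topic_terms):
    """Stand-in relevance estimate: share of topic terms present in the anchor text."""
    if not topic_terms:
        return 0.0
    words = set(anchor_text.lower().split())
    return len(words & topic_terms) / len(topic_terms)


# Hypothetical links found on a source page.
topic = {"focused", "crawling"}
frontier = CrawlFrontier()
frontier.push("http://example.org/contact", anchor_score("Contact us", topic))
frontier.push("http://example.org/survey", anchor_score("Focused crawling survey", topic))

first_url, first_score = frontier.pop()
```

The link whose anchor text matches the topic is fetched first, so pages that look irrelevant from the source page sink toward the back of the frontier.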

TR-2004017: Towards a Formal Concept Analysis Approach to Exploring Communities on the World Wide Web

Author: Haralick Robert M.
Rome Jayson E.
Publication venue: CUNY Academic Works
Publication date: 01/01/2004

City University of New York

Topic-dependent sentiment analysis of financial blogs

Author: Bermingham Adam
Davy Michael
Ferguson Paul
Gurrin Cathal
O'Hare Neil
Sheridan Páraic
Smeaton Alan F.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2009

While most work in sentiment analysis in the financial domain has focused on the use of content from traditional finance news, in this work we concentrate on a more subjective source of information: blogs. We aim to automatically determine the sentiment of financial bloggers towards companies and their stocks. To do this we develop a corpus of financial blogs, annotated with polarity of sentiment with respect to a number of companies. We conduct an analysis of the annotated corpus, from which we show there is a significant level of topic shift within this collection, and also illustrate the difficulty that human annotators have when annotating certain sentiment categories. To deal with the problem of topic shift within blog articles, we propose text extraction techniques to create topic-specific sub-documents, which we use to train a sentiment classifier. We show that such approaches provide a substantial improvement over full document classification and that word-based approaches perform better than sentence-based or paragraph-based approaches.
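
One way to picture a word-based topic-specific sub-document is to keep only the words within a fixed window around each mention of the topic. The sketch below assumes that reading; the function name `word_window_subdocument`, the window size, and the sample text are hypothetical, and the abstract does not state that this is the exact extraction technique used.

```python
def word_window_subdocument(text, topic_terms, window=5):
    """Keep only tokens within `window` words of any topic mention."""
    tokens = text.split()
    topic = {term.lower() for term in topic_terms}

    keep = set()
    for i, token in enumerate(tokens):
        # Strip surrounding punctuation before matching against topic terms.
        if token.lower().strip(".,;:!?()\"'") in topic:
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            keep.update(range(start, end))

    return " ".join(tokens[i] for i in sorted(keep))


# Made-up blog text for illustration only.
post = "Markets fell today. Acme shares rose sharply after earnings. Weather was mild."
sub_document = word_window_subdocument(post, {"Acme"}, window=2)
```

With a window of two words the sub-document is "fell today. Acme shares rose", which drops the unrelated closing sentence before the text is passed to a sentiment classifier.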

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

Data DNA: The Next Generation of Statistical Metadata

Author: Cynthia M. Taeuber
Daniel W. Gillman
Laura Smith
Publication venue: Brookings Institution Press
Publication date: 03/03/2007

Describes the components of a complete statistical metadata system and suggests ways to create and structure metadata for better access and understanding of data sets by diverse users.

IssueLab