5 research outputs found

Finding related sentence pairs in MEDLINE

Author: C Friedman
CJ Rijsbergen van
DK Milton
EW Sayers
GW Furnas
H Zou
KL Currie
L Smith
Larry H. Smith
P Langley
Q Ma
R Artstein
R Wadden
S Jellali
T Dietterich
V Vapnik
W Wilbur
W. John Wilbur
WG Kim
WJ Wilbur
Z Lu
Publication venue: Springer Netherlands
Publication date: 01/01/2010

We explore the feasibility of automatically identifying sentences in different MEDLINE abstracts that are related in meaning. We compared traditional vector space models with machine learning methods for detecting relatedness, and found that machine learning was superior. The Huber method, a variant of Support Vector Machines that minimizes the modified Huber loss function, achieves 73% precision when the score cutoff is set high enough to identify about one related sentence per abstract on average. We illustrate how an abstract viewed in PubMed might be modified to present the related sentences found in other abstracts by this automatic procedure.
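
The abstract above names the modified Huber loss but does not spell it out. The following is a minimal sketch of the commonly used definition of that loss for a label in {-1, +1}, together with a hypothetical helper that applies a score cutoff to candidate sentence pairs. The function names, the pair representation, and the cutoff value are assumptions for illustration, not the authors' implementation.

```python
def modified_huber_loss(label, score):
    """Modified Huber loss for a label in {-1, +1} and a real-valued score.

    Standard definition: quadratic hinge for margin >= -1, linear below that.
    """
    margin = label * score
    if margin >= -1.0:
        return max(0.0, 1.0 - margin) ** 2
    return -4.0 * margin


def select_related(scored_pairs, cutoff):
    """Keep candidate sentence pairs whose classifier score reaches the cutoff.

    scored_pairs is an assumed format: a list of (pair_id, score) tuples.
    """
    return [pair_id for pair_id, score in scored_pairs if score >= cutoff]


if __name__ == "__main__":
    print(modified_huber_loss(1, 0.0))
    print(select_related([("a-b", 1.7), ("a-c", 0.2)], cutoff=1.0))
```

Raising the cutoff trades recall for precision, which is the trade-off the abstract describes when it reports precision at about one related sentence per abstract.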

Crossref

Springer - Publisher Connector

PubMed Central

Automatic Term Identification for Bibliometric Mapping

Author: Buter R.K. (Reindert)
Eck N.J.P. (Nees Jan) van
Noyons E.C.M. (Ed)
Waltman L. (Ludo)
Publication venue: Eck, N.J.P. (Nees Jan) van
Publication date: 03/12/2008

A term map is a map that visualizes the structure of a scientific field by showing the relations between important terms in the field. The terms shown in a term map are usually selected manually with the help of domain experts. Manual term selection has the disadvantages of being subjective and labor-intensive. To overcome these disadvantages, we propose a methodology for automatic term identification and we use this methodology to select the terms to be included in a term map. To evaluate the proposed methodology, we use it to construct a term map of the field of operations research. The quality of the map is assessed by a number of operations research experts. It turns out that in general the proposed methodology performs quite well.

Erasmus University Digital Repository

Scientific structures in context : identification and use of structures, context, and new developments in science

Author: Buter R.K.
Publication date: 26/04/2012

The use and visualisation of structures in science (sets of related publications, authors, words) is investigated in a number of applications. We hold that the common ground of a field can explain the use and applicability of these structures. LEI Universiteit Leiden. FSW - CWTS - Ou

Leiden University Scholarly Publications

Unsupervised Graph-Based Similarity Learning Using Heterogeneous Features

Author: Muthukrishnan Pradeep
Publication date: 01/01/2011

Relational data refers to data that contains explicit relations among objects. Nowadays, relational data are universal and have a broad appeal in many different application domains. The problem of estimating similarity between objects is a core requirement for many standard Machine Learning (ML), Natural Language Processing (NLP) and Information Retrieval (IR) problems such as clustering, classification, word sense disambiguation, etc. Traditional machine learning approaches represent the data using simple, concise representations such as feature vectors. While this works very well for homogeneous data, i.e., data with a single feature type such as text, it does not exploit the availability of different feature types fully. For example, scientific publications have text, citations, authorship information, and venue information. Each of these features can be used for estimating similarity. Representing such objects has been a key issue in efficient mining (Getoor and Taskar, 2007). In this thesis, we propose natural representations for relational data using multiple, connected layers of graphs, one for each feature type. We also propose novel algorithms for estimating similarity using multiple heterogeneous features, and we present novel algorithms for tasks such as topic detection and music recommendation that use the estimated similarity measure. We demonstrate superior performance of the proposed algorithms (root mean squared error of 24.81 on the Yahoo! KDD Music recommendation data set and classification accuracy of 88% on the ACL Anthology Network data set) over many state-of-the-art algorithms, such as Latent Semantic Analysis (LSA), Multiple Kernel Learning (MKL) and spectral clustering, and over baselines on large, standard data sets. Ph.D. Computer Science & Engineering. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/89824/1/mpradeep_1.pd
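
The abstract above does not describe the thesis's algorithms in detail. As a rough illustration of what it means to estimate similarity from several feature types at once, the sketch below treats each feature type as its own layer, scores each layer with Jaccard overlap, and takes a weighted average. This is a simple stand-in baseline, not the method proposed in the thesis; the function names, the dictionary layout, and the weighting scheme are all assumptions.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def layered_similarity(x, y, weights=None):
    """Weighted average of per-feature-type Jaccard similarities.

    x and y map a feature type (e.g. "text", "authors") to its values.
    weights maps a feature type to a non-negative weight (default: equal).
    """
    feature_types = sorted(set(x) | set(y))
    if not feature_types:
        return 0.0
    if weights is None:
        weights = {t: 1.0 for t in feature_types}
    total = sum(weights.get(t, 0.0) for t in feature_types)
    if total == 0:
        return 0.0
    weighted = sum(
        weights.get(t, 0.0) * jaccard(x.get(t, ()), y.get(t, ()))
        for t in feature_types
    )
    return weighted / total


if __name__ == "__main__":
    paper_a = {"text": ["graph", "similarity"], "authors": ["A"]}
    paper_b = {"text": ["graph", "kernel"], "authors": ["A"]}
    print(layered_similarity(paper_a, paper_b))
```

With equal weights, two publications that share an author but only part of their vocabulary score higher than the text layer alone would suggest, which is the kind of signal a single feature vector over text would miss.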

Deep Blue Documents at the University of Michigan

Efficient query expansion

Author: Billerbeck B
Publication venue: RMIT University
Publication date: 01/01/2005

Hundreds of millions of users each day search the web and other repositories to meet their information needs. However, queries can fail to find documents due to a mismatch in terminology. Query expansion seeks to address this problem by automatically adding terms from highly ranked documents to the query. While query expansion has been shown to be effective at improving query performance, the gain in effectiveness comes at a cost: expansion is slow and resource-intensive. Current techniques for query expansion use fixed values for key parameters, determined by tuning on test collections. We show that these parameters may not be generally applicable, and, more significantly, that the assumption that the same parameter settings can be used for all queries is invalid. Using detailed experiments, we demonstrate that new methods for choosing parameters must be found. In conventional approaches to query expansion, the additional terms are selected from highly ranked documents returned from an initial retrieval run. We demonstrate a new method of obtaining expansion terms, based on past user queries that are associated with documents in the collection. The most effective query expansion methods rely on costly retrieval and processing of feedback documents. We explore alternative methods for reducing query-evaluation costs, and propose a new method based on keeping a brief summary of each document in memory. This method allows query expansion to proceed three times faster than previously, while approximating the effectiveness of standard expansion. We investigate the use of document expansion, in which documents are augmented with related terms extracted from the corpus during indexing, as an alternative to query expansion. The overheads at query time are small. We propose and explore a range of corpus-based document expansion techniques and compare them to corpus-based query expansion on TREC data. 
These experiments show that document expansion delivers at best limited benefits, while query expansion, including standard techniques and efficient approaches described in recent work, usually delivers good gains. We conclude that document expansion is unpromising, but it is likely that the efficiency of query expansion can be further improved.
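
The conventional approach described in the abstract above, adding terms from highly ranked documents to the query, can be sketched as follows. This is a minimal illustration under stated assumptions: documents are already tokenized and ranked by an initial retrieval run, and candidate terms are chosen by raw frequency in the top documents. The thesis's actual term-selection formula and parameter values are not given in the abstract, so expand_query, top_docs, and num_terms are hypothetical names and settings.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, top_docs=2, num_terms=2):
    """Append the most frequent new terms from the top-ranked documents.

    ranked_docs is an assumed format: a list of token lists, best match first.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(term for term in doc if term not in query_terms)
    ranked_terms = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return list(query_terms) + [term for term, _ in ranked_terms[:num_terms]]


if __name__ == "__main__":
    docs = [
        ["car", "automobile", "engine"],
        ["automobile", "vehicle", "car"],
        ["banana"],
    ]
    print(expand_query(["car"], docs))
```

The fixed top_docs and num_terms arguments correspond to the kind of key parameters the abstract argues should not be set to the same values for all queries.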

CiteSeerX

RMIT Research Repository