A framework for list representation, enabling list stabilization through incorporation of gene exchangeabilities
Analysis of multivariate data sets from, e.g., microarray studies frequently
results in lists of genes which are associated with some response of interest.
The biological interpretation is often complicated by the statistical
instability of the obtained gene lists with respect to sampling variations,
which may partly be due to the functional redundancy among genes, implying that
multiple genes can play exchangeable roles in the cell. In this paper we use
the concept of exchangeability of random variables to model this functional
redundancy and thereby account for the instability attributable to sampling
variations. We present a flexible framework to incorporate the exchangeability
into the representation of lists. The proposed framework supports
straightforward robust comparison between any two lists. It can also be used to
generate new, more stable gene rankings incorporating more information from the
experimental data. Using a microarray data set from lung cancer patients we
show that the proposed method provides more robust gene rankings than existing
methods with respect to sampling variations, without compromising the
biological significance.
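To make the idea concrete, here is a minimal Python sketch of why modelling exchangeable genes can stabilize list comparison. It is not the paper's framework. It assumes the exchangeability groups are already known (the mapping `groups`, the gene names and the helper functions are hypothetical) and simply collapses each ranked list to groups before measuring top-k overlap.

```python
from typing import Dict, List, Sequence


def collapse_to_groups(ranked_genes: Sequence[str], group_of: Dict[str, str]) -> List[str]:
    """Replace each gene by its (assumed) exchangeability group, keeping only the
    first occurrence of each group and preserving rank order. Genes without a
    group are treated as singleton groups named after themselves."""
    seen = set()
    collapsed = []
    for gene in ranked_genes:
        group = group_of.get(gene, gene)
        if group not in seen:
            seen.add(group)
            collapsed.append(group)
    return collapsed


def top_k_overlap(a: Sequence[str], b: Sequence[str], k: int) -> float:
    """Fraction of items shared by the top-k entries of two lists."""
    return len(set(a[:k]) & set(b[:k])) / k


# Two rankings from different resamples; GENE_B and GENE_C are assumed exchangeable.
groups = {"GENE_B": "G1", "GENE_C": "G1"}
run1 = ["GENE_A", "GENE_B", "GENE_D"]
run2 = ["GENE_A", "GENE_C", "GENE_D"]

print(top_k_overlap(run1, run2, 3))  # raw gene lists share 2 of 3 entries
print(top_k_overlap(collapse_to_groups(run1, groups),
                    collapse_to_groups(run2, groups), 3))  # 1.0
```

Two resampled rankings that differ only by swapping exchangeable genes agree fully once collapsed, whereas the raw gene lists look partly unstable.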
An LSH Index for Computing Kendall's Tau over Top-k Lists
We consider the problem of similarity search within a set of top-k lists
under the Kendall's Tau distance function. This distance describes how related
two rankings are in terms of concordantly and discordantly ordered items. As
top-k lists are usually very short compared to the global domain of possible
items to be ranked, creating an inverted index to look up overlapping lists is
possible but does not capture the similarity measure tightly enough. In this
work, we investigate locality sensitive hashing schemes for the Kendall's Tau
distance and evaluate the proposed methods using two real-world datasets.
Comment: 6 pages, 8 subfigures, presented at the Seventeenth International
Workshop on the Web and Databases (WebDB 2014), co-located with ACM SIGMOD201
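The abstract does not say how pairs involving items missing from one of the lists are scored. The sketch below assumes the K^(p) variant of Kendall's distance for top-k lists defined by Fagin, Kumar and Sivakumar, which charges a parameter `p` for pairs whose relative order cannot be inferred. The WebDB paper's exact choice may differ, the function name is hypothetical, and the LSH index itself is not shown.

```python
from itertools import combinations
from typing import Sequence


def kendall_topk(tau1: Sequence[str], tau2: Sequence[str], p: float = 0.5) -> float:
    """Kendall distance K^(p) between two top-k lists, after Fagin, Kumar and Sivakumar.

    An item missing from a list is treated as ranked below every item in that list.
    """
    pos1 = {item: rank for rank, item in enumerate(tau1)}
    pos2 = {item: rank for rank, item in enumerate(tau2)}
    distance = 0.0
    for i, j in combinations(sorted(set(tau1) | set(tau2)), 2):
        in1 = (i in pos1, j in pos1)
        in2 = (i in pos2, j in pos2)
        if all(in1) and all(in2):
            # Case 1: both items ranked in both lists; penalize a discordant pair.
            if (pos1[i] - pos1[j]) * (pos2[i] - pos2[j]) < 0:
                distance += 1
        elif (all(in1) and any(in2)) or (all(in2) and any(in1)):
            # Case 2: both items in one list, exactly one in the other. In the
            # other list the present item is implicitly ahead of the absent one.
            full, partial = (pos1, pos2) if all(in1) else (pos2, pos1)
            present, absent = (i, j) if i in partial else (j, i)
            if full[present] > full[absent]:
                distance += 1
        elif all(in1) or all(in2):
            # Case 4: both items in one list, neither in the other; order unknown.
            distance += p
        else:
            # Case 3: one item appears only in tau1, the other only in tau2.
            distance += 1
    return distance


print(kendall_topk(["a", "b", "c"], ["b", "a", "d"]))  # 2.0
print(kendall_topk(["a", "b", "c"], ["d", "e", "f"], p=0.5))  # 12.0
```

For two disjoint top-3 lists, each of the 9 cross pairs costs 1 and each of the 6 within-list pairs costs `p`, giving 9 + 6p.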
Diversity and Polarization of Research Performance: Evidence from Hungary
Measuring the intellectual diversity encoded in publication records as a
proxy for the degree of interdisciplinarity has recently received considerable
attention in the science mapping community. The present paper draws upon the
use of the Stirling index as a diversity measure applied to a network model
(customized science map) of research profiles, proposed by several authors. A
modified version of the index is used and compared with the previous versions
on a sample data set in order to rank top Hungarian research organizations
(HROs) according to their research performance diversity. Results, unexpected
in several respects, show that the modified index is a candidate for measuring
the degree of polarization of a research profile. The study also points towards
a possible typology of publication portfolios that instantiate different types
of diversity.
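For reference, here is a sketch of the standard Rao-Stirling form of the index, Δ = Σ_{i≠j} d_ij^α (p_i p_j)^β, where p_i is the share of a profile's publications in category i and d_ij is a distance between categories (often taken as 1 minus their cosine similarity on the science map). This is not the paper's modified index, which the abstract does not define. The function name and the example counts and distances are hypothetical, and some authors sum over i < j only, which halves the value.

```python
from typing import Sequence


def rao_stirling(
    counts: Sequence[float],
    dist: Sequence[Sequence[float]],
    alpha: float = 1.0,
    beta: float = 1.0,
) -> float:
    """Rao-Stirling diversity: sum over ordered pairs i != j of d_ij^alpha * (p_i p_j)^beta."""
    total = sum(counts)
    p = [c / total for c in counts]  # normalize publication counts to shares
    n = len(p)
    return sum(
        dist[i][j] ** alpha * (p[i] * p[j]) ** beta
        for i in range(n)
        for j in range(n)
        if i != j
    )


# Two equally weighted categories at maximal distance 1.
print(rao_stirling([5, 5], [[0, 1], [1, 0]]))  # 0.5
```

With all distances equal to 1 and α = β = 1, the index reduces to the Gini-Simpson diversity 1 − Σ p_i², and a profile concentrated in a single category scores 0.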
Kernel functions based on triplet comparisons
Given only information in the form of similarity triplets "Object A is more
similar to object B than to object C" about a data set, we propose two ways of
defining a kernel function on the data set. While previous approaches construct
a low-dimensional Euclidean embedding of the data set that reflects the given
similarity triplets, we aim at defining kernel functions that correspond to
high-dimensional embeddings. These kernel functions can subsequently be used to
apply any kernel method to the data set.
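One simple construction in this spirit, sketched here as a hypothetical illustration rather than either of the paper's two kernels: describe each object by how it answers triplet questions in which it is the anchor, using an explicit feature vector indexed by ordered pairs (b, c), i.e. an n²-dimensional embedding, and take normalized inner products. Because the kernel is an inner product of explicit feature maps, it is positive semidefinite. All names are assumptions, conflicting triplets are resolved by "last answer wins", and objects never used as an anchor get an all-zero row.

```python
from math import sqrt
from typing import Dict, List, Tuple

# (a, b, c) encodes the answer "object a is more similar to object b than to object c".
Triplet = Tuple[int, int, int]
Features = Dict[Tuple[int, int], int]


def anchor_features(n: int, triplets: List[Triplet]) -> List[Features]:
    """Sparse feature vector per object, indexed by ordered pairs (b, c):
    +1 if the object is said to be closer to b than to c, -1 for the reverse,
    and absent (implicitly 0) when no triplet answers the question."""
    feats: List[Features] = [{} for _ in range(n)]
    for a, b, c in triplets:
        feats[a][(b, c)] = 1   # a later, conflicting triplet overwrites these entries
        feats[a][(c, b)] = -1
    return feats


def dot(u: Features, v: Features) -> int:
    """Inner product of two sparse feature vectors."""
    return sum(val * v[key] for key, val in u.items() if key in v)


def triplet_kernel(n: int, triplets: List[Triplet]) -> List[List[float]]:
    """Cosine-normalized inner products of the explicit feature maps, so the
    resulting kernel matrix is symmetric positive semidefinite."""
    feats = anchor_features(n, triplets)
    sq = [dot(f, f) for f in feats]
    kernel = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            norm = sqrt(sq[x] * sq[y])
            kernel[x][y] = dot(feats[x], feats[y]) / norm if norm > 0 else 0.0
    return kernel


triplets = [(0, 1, 2), (0, 1, 3), (3, 1, 2)]
K = triplet_kernel(4, triplets)
print(round(K[0][3], 4))  # 0.7071
```

Objects 0 and 3 agree on the one question both have answered (closer to 1 than to 2), so their kernel value is positive; the resulting matrix can be passed to any method that accepts a precomputed kernel.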
Relation Discovery from Web Data for Competency Management
This paper describes a technique for automatically discovering associations between people and expertise from an analysis of very large data sources (including web pages, blogs and emails), using a family of algorithms that perform accurate named-entity recognition, assign different weights to terms according to an analysis of document structure, and assess distances between terms in a document. My contribution is to add a social networking approach called BuddyFinder, which relies on associations within a large enterprise-wide "buddy list" to help delimit the search space and also to provide a form of 'social triangulation' whereby the system can discover documents from your colleagues that contain pertinent information about you. This work has been influential in the information retrieval community generally, as it is the basis of a landmark system that achieved overall first place in every category in the Enterprise Search Track of TREC 2006.
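A minimal sketch of how the three ingredients named above could combine into a person-expertise score, under loudly stated assumptions: person names are already merged into single tokens by named-entity recognition, the structure weights are invented, the proximity weight 1 / (1 + gap) is a hypothetical choice, and the buddy-list restriction is a simplified reading of 'social triangulation'. None of this is the published system's actual scoring.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical weights for where a person/term co-occurrence is found in a document.
STRUCTURE_WEIGHT = {"title": 3.0, "heading": 2.0, "body": 1.0}


@dataclass
class Section:
    kind: str          # structural role, e.g. "title" or "body"
    tokens: List[str]  # tokens, with person names assumed already merged by NER


Document = List[Section]


def association(person: str, term: str, doc: Document) -> float:
    """Sum over sections of structure weight / (1 + closest person-term token gap)."""
    score = 0.0
    for sec in doc:
        p_pos = [i for i, tok in enumerate(sec.tokens) if tok == person]
        t_pos = [i for i, tok in enumerate(sec.tokens) if tok == term]
        if p_pos and t_pos:
            gap = min(abs(i - j) for i in p_pos for j in t_pos)
            score += STRUCTURE_WEIGHT.get(sec.kind, 1.0) / (1 + gap)
    return score


def buddy_score(person: str, term: str,
                docs_by_author: Dict[str, List[Document]],
                buddies: Iterable[str]) -> float:
    """Aggregate evidence only from documents written by the person's buddies,
    which both narrows the search space and surfaces colleagues' mentions."""
    return sum(association(person, term, doc)
               for buddy in buddies
               for doc in docs_by_author.get(buddy, []))


doc = [Section("title", ["alice_smith", "ontology"]),
       Section("body", ["alice_smith", "wrote", "the", "ontology", "parser"])]
print(association("alice_smith", "ontology", doc))  # 1.75
print(buddy_score("alice_smith", "ontology", {"bob": [doc]}, ["bob", "carol"]))  # 1.75
```

The title co-occurrence (gap 1, weight 3) contributes 1.5 and the body co-occurrence (gap 3, weight 1) contributes 0.25; buddies with no documents contribute nothing.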
- …