11,756 research outputs found

A framework for list representation, enabling list stabilization through incorporation of gene exchangeabilities

Authors: Fontes, Magnus; Soneson, Charlotte
Publication date: 15/03/2011

Analysis of multivariate data sets, e.g. from microarray studies, frequently results in lists of genes that are associated with some response of interest. The biological interpretation is often complicated by the statistical instability of the obtained gene lists with respect to sampling variations, which may partly be due to the functional redundancy among genes, implying that multiple genes can play exchangeable roles in the cell. In this paper we use the concept of exchangeability of random variables to model this functional redundancy and thereby account for the instability attributable to sampling variations. We present a flexible framework to incorporate the exchangeability into the representation of lists. The proposed framework supports straightforward, robust comparison between any two lists. It can also be used to generate new, more stable gene rankings incorporating more information from the experimental data. Using a microarray data set from lung cancer patients, we show that the proposed method provides more robust gene rankings than existing methods with respect to sampling variations, without compromising the biological significance.

arXiv.org e-Print Archive

Lund University Publications

An LSH Index for Computing Kendall's Tau over Top-k Lists

Authors: Michel, Sebastian; Pal, Koninika
Publication date: 01/01/2014

We consider the problem of similarity search within a set of top-k lists under the Kendall's Tau distance function. This distance describes how related two rankings are in terms of concordantly and discordantly ordered items. As top-k lists are usually very short compared to the global domain of possible items to be ranked, creating an inverted index to look up overlapping lists is possible but does not capture the similarity measure tightly enough. In this work, we investigate locality sensitive hashing schemes for the Kendall's Tau distance and evaluate the proposed methods using two real-world datasets.
Comment: 6 pages, 8 subfigures, presented in Seventeenth International Workshop on the Web and Databases (WebDB 2014) co-located with ACM SIGMOD201
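As an illustration of the distance this abstract refers to, the following is a minimal sketch of a Kendall's Tau style distance between two top-k lists. It is not taken from the paper: the function name `kendall_tau_topk` and the penalty parameter `p` are assumptions, and the handling of items missing from one list follows one commonly used generalization for top-k lists, which may differ from the variant the authors use. Lists are assumed to contain no duplicates.

```python
from itertools import combinations
from math import inf


def kendall_tau_topk(list_a, list_b, p=0.5):
    """Count discordant item pairs between two top-k lists (hypothetical helper).

    Items absent from a list are treated as ranked below every item present
    in it. A pair whose relative order cannot be inferred in one of the lists
    (both items absent from that list) contributes the penalty p.
    """
    rank_a = {item: pos for pos, item in enumerate(list_a)}
    rank_b = {item: pos for pos, item in enumerate(list_b)}
    distance = 0.0
    for x, y in combinations(set(rank_a) | set(rank_b), 2):
        xa, ya = rank_a.get(x, inf), rank_a.get(y, inf)
        xb, yb = rank_b.get(x, inf), rank_b.get(y, inf)
        if xa == ya or xb == yb:
            # Both items are missing from one list: order unknown there.
            distance += p
        elif (xa < ya) != (xb < yb):
            # The two lists disagree on the order of x and y.
            distance += 1.0
    return distance
```

For two lists over the same items, this reduces to the number of discordantly ordered pairs: identical lists are at distance 0, and a list of three items compared with its reverse is at distance 3.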

arXiv.org e-Print Archive

MPG.PuRe

Diversity and Polarization of Research Performance: Evidence from Hungary

Authors: Kampis, George; Soos, Sandor
Publication date: 01/01/2010

Measuring the intellectual diversity encoded in publication records as a proxy for the degree of interdisciplinarity has recently received considerable attention in the science mapping community. The present paper draws upon the use of the Stirling index as a diversity measure applied to a network model (customized science map) of research profiles, as proposed by several authors. A modified version of the index is used and compared with the previous versions on a sample data set in order to rank top Hungarian research organizations (HROs) according to their research performance diversity. Results, unexpected in several respects, show that the modified index is a candidate for measuring the degree of polarization of a research profile. The study also points towards a possible typology of publication portfolios that instantiate different types of diversity.
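For readers unfamiliar with the Stirling index mentioned in this abstract, here is a minimal sketch of the standard (unmodified) form, often called Rao-Stirling diversity. The function name `rao_stirling`, the summation over unordered category pairs, and the default exponents are assumptions made for illustration; the modified version used in the paper is not described in the abstract and is not reproduced here.

```python
def rao_stirling(proportions, distance, alpha=1.0, beta=1.0):
    """Standard Stirling-type diversity over categories (hypothetical helper).

    proportions[i] is the share of a profile's output in category i;
    distance[i][j] is a dissimilarity between categories i and j.
    The sum runs over unordered pairs i < j (a convention assumed here).
    """
    n = len(proportions)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            pair_weight = (proportions[i] * proportions[j]) ** beta
            total += pair_weight * distance[i][j] ** alpha
    return total
```

A profile concentrated in a single category scores 0, while a profile spread evenly over two maximally distant categories (distance 1) scores 0.25 under these conventions.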

arXiv.org e-Print Archive

ELTE Digital Institutional Repository (EDIT)

Kernel functions based on triplet comparisons

Authors: Kleindessner, Matthäus; von Luxburg, Ulrike
Publication date: 01/01/2017

Given only information in the form of similarity triplets "Object A is more similar to object B than to object C" about a data set, we propose two ways of defining a kernel function on the data set. While previous approaches construct a low-dimensional Euclidean embedding of the data set that reflects the given similarity triplets, we aim to define kernel functions that correspond to high-dimensional embeddings. These kernel functions can subsequently be used to apply any kernel method to the data set.

arXiv.org e-Print Archive

Publikationsserver der Universität Tübingen

MPG.PuRe

Relation Discovery from Web Data for Competency Management

Authors: Eisenstadt, M.; Goncalves, A.; Motta, E.; Pacheco, R.; Song, D.; Uren, V.; Zhu, J.L.
Publication date: 01/12/2007

This paper describes a technique for automatically discovering associations between people and expertise from an analysis of very large data sources (including web pages, blogs and emails), using a family of algorithms that perform accurate named-entity recognition, assign different weights to terms according to an analysis of document structure, and assess distances between terms in a document. My contribution is to add a social networking approach called BuddyFinder, which relies on associations within a large enterprise-wide "buddy list" to help delimit the search space and also to provide a form of 'social triangulation' whereby the system can discover documents from your colleagues that contain pertinent information about you. This work has been influential in the information retrieval community generally, as it is the basis of a landmark system that achieved overall first place in every category in the Enterprise Search Track of TREC 2006.

Open Access Institutional Repository at Robert Gordon University

Open Research Online (The Open University)