
361 research outputs found

An adaptive version of k-medoids to deal with the uncertainty in clustering heterogeneous data using an intermediary fusion approach

Author: A Oliva
A Strehl
Aalaa Mojahed
B Khaleghi
Beatriz de la Iglesia
BV Dasarathy
D Hall
DJ Berndt
E Acar
G Salton
GRG Lanckriet
H-S Park
L Kaufman
LR Dice
M Žitnik
MA Abidi
MH Vliet van
N-EE Faouzi
OA Akeem
P Pavlidis
RA Baeza-Yates
S Jaccard
TN Manjunath
TY Chan
WM Rand
Y Shi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

This paper introduces Hk-medoids, a modified version of the standard k-medoids algorithm. The modification extends the algorithm to the problem of clustering complex heterogeneous objects that are described by a diversity of data types, e.g. text, images, structured data and time series. We previously proposed an intermediary fusion approach, SMF, which calculates fused similarities between objects by taking into account the similarities between their component elements, using an appropriate similarity measure for each. The fused approach entails uncertainty for incomplete objects and for objects whose distances diverge across the different components. The implementation of Hk-medoids proposed here works with the fused distances and deals with the uncertainty in the fusion process. We experimentally evaluate the potential of the proposed algorithm using five datasets with different combinations of data types that define the objects. Our results show the feasibility of our algorithm, and also show a performance enhancement compared with the original SMF approach combined with a standard k-medoids that does not take uncertainty into account. In addition, from a theoretical point of view, the proposed algorithm has lower computational complexity than the popular PAM implementation.
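To make the idea of clustering on fused distances concrete, here is a minimal sketch. It is not the Hk-medoids algorithm or the SMF fusion from the paper, whose details the abstract does not give. It assumes the simplest possible fusion (an entrywise mean over the components that are observed), records the share of missing components per pair as a crude uncertainty indicator, and runs a plain alternating k-medoids on the result. All names (`fuse_distances`, `k_medoids`, the toy matrices) are hypothetical.

```python
from math import isnan

NAN = float("nan")


def fuse_distances(matrices):
    """Average per-component distance matrices entrywise, skipping NaN entries.

    Returns the fused matrix and, for each pair, the fraction of components
    that were missing. Assumes every pair is observed in at least one component.
    """
    n = len(matrices[0])
    fused = [[0.0] * n for _ in range(n)]
    missing = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            known = [m[i][j] for m in matrices if not isnan(m[i][j])]
            missing[i][j] = 1.0 - len(known) / len(matrices)
            fused[i][j] = sum(known) / len(known)
    return fused, missing


def assign(dist, medoids):
    """Label each object with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, initial_medoids, max_iter=100):
    """Plain alternating k-medoids on a precomputed distance matrix."""
    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c, old in enumerate(medoids):
            members = [i for i in range(len(dist)) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                updated.append(old)
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(dist, medoids)


# Toy data: four objects described by two components (say, text and time series).
# The pair (0, 3) has no time-series distance, so its fused value rests on text alone.
text_dist = [[0.0, 0.1, 0.9, 0.8],
             [0.1, 0.0, 0.8, 0.9],
             [0.9, 0.8, 0.0, 0.2],
             [0.8, 0.9, 0.2, 0.0]]
series_dist = [[0.0, 0.3, 0.7, NAN],
               [0.3, 0.0, 0.6, 0.7],
               [0.7, 0.6, 0.0, 0.1],
               [NAN, 0.7, 0.1, 0.0]]

fused, missing = fuse_distances([text_dist, series_dist])
medoids, labels = k_medoids(fused, initial_medoids=[0, 2])
```

In this sketch the `missing` matrix is only reported, not used. An uncertainty-aware variant would let it influence the assignment or update step, which is the part the paper addresses.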

Crossref

University of East Anglia digital repository

L2-norm multiple kernel learning and its application to biomedical data fusion

Author: A Daemen
Anneleen Daemen
AY Ng
B Schölkopf
Bart De Moor
C Bottomley
C Leslie
DMJ Tax
ED Andersen
FR Bach
G Condous
G Thomas
GC Cawley
GRG Lanckriet
J Gudmundsson
J Shawe-Taylor
JAK Suykens
Johan AK Suykens
JP Ye
K Tretyakov
K Veropoulos
Leon-Charles Tranchevent
M Grant
M Kloft
M Kowalski
O Gevaert
R Hettich
R Reemtsen
RA Eeles
S Aerts
S Sonnenburg
S Yu
Shi Yu
SJ Kim
T De Bie
T van den Bosch
Tillmann Falck
V Vapnik
Y Zheng
Yves Moreau
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: This paper introduces the notion of optimizing different norms in the dual problem of support vector machines with multiple kernels. The choice of norm yields different extensions of multiple kernel learning (MKL), such as L∞, L1, and L2 MKL. In particular, L2 MKL is a novel method that leads to non-sparse optimal kernel coefficients, in contrast to the sparse kernel coefficients optimized by the existing L∞ MKL method. In real biomedical applications, L2 MKL may have advantages over sparse integration methods for thoroughly combining complementary information in heterogeneous data sources. Results: We provide a theoretical analysis of the relationship between the L2 optimization of kernels in the dual problem and the L2 coefficient regularization in the primal problem. Understanding the dual L2 problem grants a unified view of MKL and enables us to extend the L2 method to a wide range of machine learning problems. We implement L2 MKL for ranking and classification problems and compare its performance with the sparse L∞ and the averaging L1 MKL methods. The experiments are carried out on six real biomedical data sets and two large-scale UCI data sets. L2 MKL yields better performance on most of the benchmark data sets. In particular, we propose a novel L2 MKL least squares support vector machine (LSSVM) algorithm, which is shown to be an efficient and promising classifier for processing large-scale data sets. Conclusions: This paper extends the statistical framework of genomic data fusion based on MKL. Allowing non-sparse weights on the data sources is an attractive option in settings where we believe most data sources to be relevant to the problem at hand and want to avoid the "winner-takes-all" effect seen in L∞ MKL, which can be detrimental to performance in prospective studies. The notion of optimizing L2 kernels can be straightforwardly extended to ranking, classification, regression, and clustering algorithms. To tackle the computational burden of MKL, this paper proposes several novel LSSVM-based MKL algorithms. Systematic comparison on real data sets shows that LSSVM MKL has performance comparable to that of the conventional SVM MKL algorithms. Moreover, large-scale numerical experiments indicate that, when cast as semi-infinite programming, LSSVM MKL can be solved more efficiently than SVM MKL. Availability: The MATLAB code of the algorithms implemented in this paper is downloadable from http://homes.esat.kuleuven.be/~sistawww/bioi/syu/l2lssvm.html.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Enhanced protein fold recognition through a novel data integration approach

Author: A Andreeva
A Rakotomamonjy
AL Yuille
B Schölkopf
C Ding
CA Micchelli
CE Rasmussen
Colin Campbell
DT Jones
F Bach
GRG Lanckriet
HB Shen
HW Mewes
I Dubchak
J Shawe-Taylor
J Ye
JM Borwein
JV Davis
K Bleakley
K Chou
K Tsuda
Kaizhu Huang
L Liao
L Lo Conte
L Sun
L Vandenberghe
M Girolami
N Aronszajn
N Cristianini
ND Lawrence
PD Tao
R Hettich
RI Kondor
S Amari
S Ji
S Sonnenburg
T Damoulas
T Hastie
T Kato
Y Lin
Y Nesterov
Y Yamanishi
Y Ying
Yiming Ying
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Protein fold recognition is a key step in protein three-dimensional (3D) structure discovery. There are multiple fold-discriminatory data sources which use physicochemical and structural properties, as well as further data sources derived from local sequence alignments. This raises the issue of finding the most efficient method for combining these different informative data sources and exploring their relative significance for protein fold classification. Kernel methods have been extensively used for biological data analysis. They can incorporate separate fold-discriminatory features into kernel matrices which encode the similarity between samples in their respective data sources. Results: In this paper we consider the problem of integrating multiple data sources using a kernel-based approach. We propose a novel information-theoretic approach based on a Kullback-Leibler (KL) divergence between the output kernel matrix and the input kernel matrix so as to integrate heterogeneous data sources. One of the most appealing properties of this approach is that it can easily cope with multi-class classification and multi-task learning by an appropriate choice of the output kernel matrix. Based on the position of the output and input kernel matrices in the KL-divergence objective, there are two formulations, which we refer to as MKLdiv-dc and MKLdiv-conv respectively. We propose to solve MKLdiv-dc efficiently by a difference of convex (DC) programming method and MKLdiv-conv by a projected gradient descent algorithm. The effectiveness of the proposed approaches is evaluated on a benchmark dataset for protein fold recognition and a yeast protein function prediction problem. Conclusion: Our proposed methods MKLdiv-dc and MKLdiv-conv are able to achieve state-of-the-art performance on the SCOP PDB-40D benchmark dataset for protein fold prediction and provide useful insights into the relative significance of informative data sources. In particular, MKLdiv-dc further improves the fold discrimination accuracy to 75.19%, which is a more than 5% improvement over competitive Bayesian probabilistic and SVM margin-based kernel learning methods. Furthermore, we report a competitive performance on the yeast protein function prediction problem.
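For orientation, a KL divergence between two kernel matrices is commonly defined by treating each matrix as the covariance of a zero-mean Gaussian. The abstract does not spell out the objective, so the expression below is an assumption about the general form rather than the paper's exact formulation. Here K_x (input kernel), K_y (output kernel) and n (number of samples) are notation introduced for illustration, and both matrices are assumed positive definite:

```latex
\mathrm{KL}\bigl(\mathcal{N}(0, K_y)\,\|\,\mathcal{N}(0, K_x)\bigr)
  = \tfrac{1}{2}\Bigl[\operatorname{tr}\bigl(K_x^{-1} K_y\bigr)
      + \log\det K_x - \log\det K_y - n\Bigr]
```

Swapping the roles of K_x and K_y gives the other ordering of the divergence, which is presumably what distinguishes the two formulations named in the abstract.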

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Explore Bristol Research


Once bitten, not necessarily shy? Determinants of foreign market re-entry commitment strategies

Author: A Bonaccorsi
A Delios
A Madhok
AS Gaur
B Hedberg
B Levitt
C Oliver
CC Chung
CL Welch
CM Chan
CM Sousa
D Yiu
DAN Li
DJ O’Keefe
E Anderson
E Tsang
ER Banalieva
F Vermeulen
GL Clark
GRG Benito
GY Gao
H Kok
HG Barkema
HJ Sapienza
I Surdu
Irina Surdu
J Anand
J Cantwell
J Cohen
J Johanson
J Lampel
J Xia
J-F Hennart
J-G Cegarra-Navarro
JC Casillas
JD Gwartney
JE Clarke
JG March
JM Hoenig
JM Shaver
JW Lu
K Mellahi
Kamel Mellahi
KD Brouthers
KE Meyer
Keith W Glaister
KL Newman
L Argote
M Bernini
M Demirbag
M Zollo
MA Hitt
MA Lyles
MA Villa De
MA Witt
MF Guillén
MK Christianson
MP Holan de
MS Feldman
MW Peng
N Nummela
P Cairns
P Meschi
P Padmanabhan
PM Madsen
Q Tan
R Belderbos
R Cyert
R García-García
RF Hurley
RP Rumelt
RRG Javalgi
S Ang
S Song
SJ Chang
T Hutzschenreuter
T Kostova
T Vissak
TL Amburgey
V Hernandez
VJ Duriau
WH Starbuck
Y Zeng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

We investigate foreign market re-entry commitment strategies, namely the changes in the modes of operation (commitment) undertaken by multinational enterprises (MNEs) as they return to foreign markets from which they had previously exited. We combine organisational learning theory with the institutional change literature to examine the antecedents of re-entry commitment strategies. From an analysis of 1,020 re-entry events between 1980 and 2016, we find that operation mode prior to exit is a strong predictor of subsequent re-entry mode. Contrary to the predictions of learning theory, we did not find support for the effect of experience accumulated during the initial market endeavour on the re-entry commitment strategies of MNEs. In turn, exit motives significantly affect re-entrants' decision to re-enter via a different mode of operation, by either increasing or decreasing their commitment to the market. We show that re-entrants do not replicate unsuccessful operation mode strategies if they had previously underperformed in the market. When favourable host institutional changes occur during the time-out period, re-entrants tend to increase commitment in the host market irrespective of the degree of prior experience accumulated in the market.

Central Archive at the University of Reading

Crossref

Warwick Research Archives Portal Repository

White Rose Research Online

A Regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data

Author: A Enright
A Gavin
A Grigoriev
A Hoerl
AJ Dobson
EG WS Cleveland
G GH
GRG Lanckriet
H Ge
M Deng
M Eisen
M Fellenberg
MPS Brown
O Troyanskaya
P Liang
P Pavlidis
R Overbeek
R Tibshirani
Walter L Ruzzo
WS Noble
Y Zheng
Zizhen Yao
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: As a variety of functional genomic and proteomic techniques become available, there is an increasing need for functional analysis methodologies that integrate heterogeneous data sources. METHODS: In this paper, we address this issue by proposing a general framework for gene function prediction based on the k-nearest-neighbor (KNN) algorithm. The choice of KNN is motivated by its simplicity, flexibility to incorporate different data types and adaptability to irregular feature spaces. A weakness of traditional KNN methods, especially when handling heterogeneous data, is that performance is subject to the often ad hoc choice of similarity metric. To address this weakness, we apply regression methods to infer a similarity metric as a weighted combination of a set of base similarity measures, which helps to locate the neighbors that are most likely to be in the same class as the target gene. We also suggest a novel voting scheme to generate confidence scores that estimate the accuracy of predictions. The method extends gracefully to multi-way classification problems. RESULTS: We apply this technique to gene function prediction according to three well-known Escherichia coli classification schemes suggested by biologists, using information derived from microarray and genome sequencing data. We demonstrate that our algorithm dramatically outperforms the naive KNN methods and is competitive with support vector machine (SVM) algorithms for integrating heterogeneous data. We also show that by combining different data sources, prediction accuracy can improve significantly. CONCLUSION: Our extension of KNN with automatic feature weighting, multi-class prediction, and probabilistic inference enhances prediction accuracy significantly while remaining efficient, intuitive and flexible. This general framework can also be applied to similar classification problems involving heterogeneous datasets.
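The core idea, a similarity metric formed as a weighted combination of base similarity measures and then used for KNN voting, can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: the weights are supplied by hand here (the paper infers them by regression), the confidence is a plain vote fraction (the paper's voting scheme is not described in the abstract), and the function names, labels and numbers are hypothetical.

```python
from collections import Counter


def combined_similarity(base_sims, weights):
    """Weighted sum of the base similarity scores for one pair of genes."""
    return sum(w * s for w, s in zip(weights, base_sims))


def knn_predict(query_sims, labels, weights, k=3):
    """Predict a class for a query gene from its k most similar training genes.

    query_sims[i] holds the base similarities (one per data source) between
    the query and training gene i. Returns (predicted label, vote fraction).
    """
    ranked = sorted(
        zip(query_sims, labels),
        key=lambda pair: combined_similarity(pair[0], weights),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    best, count = votes.most_common(1)[0]
    return best, count / k


# Toy data: two base similarities per training gene (say, expression and sequence).
query_sims = [(0.9, 0.2), (0.8, 0.1), (0.1, 0.9), (0.2, 0.8)]
labels = ["transport", "transport", "metabolism", "metabolism"]

# The same neighbours are ranked differently depending on the weights.
expression_heavy = knn_predict(query_sims, labels, weights=(0.8, 0.2))
sequence_heavy = knn_predict(query_sims, labels, weights=(0.1, 0.9))
```

The two calls return different classes for the same query, which shows why the choice of weights matters and why learning them from data is preferable to picking a metric ad hoc.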

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

A long-term mortality analysis of subsidized firms in rural areas: an empirical study in the Portuguese Alentejo region

Author: A Cerqua
A Monte Del
A Santos
B Guloglu
C Bernini
C Paunov
D Durafour
DG Silva De
E Battistin
EW Nafziger
GRG Clarke
INE
J Carvalho
K Fukuda
L Ferreira
M Grapeggia
M Verbeek
MG Colombo
MJ Alonso-Nuez
N Gur
O Falck
P Holmes
P Neto
P Voigt
PA Geroski
R Agarwal
R Mamede
S Tsoukas
TM Stearns
U Brixy
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Studies have demonstrated that public policies to support private firms’ investment can promote entrepreneurship, but the sustainability of subsidized firms has not often been analysed. This paper examines this dimension specifically by evaluating the long-term mortality of subsidized firms. The analysis focuses on a case study of the LEADER+ Programme in the Alentejo region of Portugal. To this end, the paper examines the activity status (active or not active) of 154 private, rural, for-profit firms in Alentejo that had received a subsidy to support investment between 2002 and 2008 under the LEADER+ Programme. The methodology is based on binary choice models in order to study the probability of these firms still being active. The explanatory variables used are the following: (1) the characteristics of entrepreneurs and managers’ strategic decisions, (2) firm profile and characteristics, (3) regional economic environment. Data assessment showed that the cumulative mortality rate of firms on 31st December 2013 was over 20%. Interpretation of the regression model revealed that the probability of firms’ survival increases with higher investment, firm age and regional business concentration, whereas the number of applications made by firms has a negative impact on their survival. So it seems that for subsidized firms the amount of investment is as important as its frequency.

Crossref

DI-fusion

Repositório Científico da Universidade de Évora

A new pairwise kernel for biological network inference with support vector machines

Author: A Ben-Hur
A Ramani
B Schölkopf
C Harbison
C von Mering
E Sprinzak
E Xing
EM Marcotte
F Pazos
GD Bader
GRG Lanckriet
GS Kimeldorf
HW Mewes
IW Tsang
Jean-Philippe Vert
Jian Qiu
JP Vert
KQ Weinberger
N Aronszajn
N Friedman
P Pavlidis
R Jansen
RI Kondor
S Boyd
S Martin
SF Altschul
SM Gomez
VN Vapnik
William S Noble
WK Huh
Y Qi
Y Yamanishi
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Much recent work in bioinformatics has focused on the inference of various types of biological networks, representing gene regulation, metabolic processes, protein-protein interactions, etc. A common setting involves inferring network edges in a supervised fashion from a set of high-confidence edges, possibly characterized by multiple, heterogeneous data sets (protein sequence, gene expression, etc.). RESULTS: Here, we distinguish between two modes of inference in this setting: direct inference based upon similarities between nodes joined by an edge, and indirect inference based upon similarities between one pair of nodes and another pair of nodes. We propose a supervised approach for the direct case by translating it into a distance metric learning problem. A relaxation of the resulting convex optimization problem leads to the support vector machine (SVM) algorithm with a particular kernel for pairs, which we call the metric learning pairwise kernel. This new kernel for pairs can easily be used by most SVM implementations to solve problems of supervised classification and inference of pairwise relationships from heterogeneous data. We demonstrate, using several real biological networks and genomic datasets, that this approach often improves upon the state-of-the-art SVM for indirect inference with another pairwise kernel, and that the combination of both kernels always improves upon each individual kernel. CONCLUSION: The metric learning pairwise kernel is a new formulation to infer pairwise relationships with SVM, which provides state-of-the-art results for the inference of several biological networks from heterogeneous genomic data.
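A kernel on pairs of nodes is built from an ordinary kernel on single nodes. The abstract gives no formulas, so the sketch below relies on the forms usually quoted in the literature for the metric learning pairwise kernel (MLPK) and for the tensor product pairwise kernel (TPPK). Treating TPPK as the "other pairwise kernel" mentioned above is an assumption. The linear base kernel and the toy vectors are illustrative only.

```python
def linear_kernel(x, y):
    """Base kernel on single nodes; any positive definite kernel could be used."""
    return sum(a * b for a, b in zip(x, y))


def mlpk(pair1, pair2, k=linear_kernel):
    """Metric learning pairwise kernel between the pairs (a, b) and (c, d).

    With a linear base kernel this equals ((a - b) . (c - d)) ** 2, so two pairs
    are similar when the differences between their members are aligned.
    """
    (a, b), (c, d) = pair1, pair2
    return (k(a, c) - k(a, d) - k(b, c) + k(b, d)) ** 2


def tppk(pair1, pair2, k=linear_kernel):
    """Tensor product pairwise kernel: pairs are similar when their members match up."""
    (a, b), (c, d) = pair1, pair2
    return k(a, c) * k(b, d) + k(a, d) * k(b, c)


def combined(pair1, pair2, k=linear_kernel):
    """Sum of both pairwise kernels; a sum of valid kernels is itself a valid kernel."""
    return mlpk(pair1, pair2, k) + tppk(pair1, pair2, k)


# Toy node features (hypothetical), e.g. expression profiles of four proteins.
a, b, c, d = (1.0, 0.0), (0.0, 1.0), (2.0, 0.0), (0.0, 0.0)

value_mlpk = mlpk((a, b), (c, d))
value_tppk = tppk((a, b), (c, d))
value_sum = combined((a, b), (c, d))
```

Both kernels are symmetric in the order of the nodes within a pair, which suits undirected edges. A Gram matrix of such values over all training pairs can be passed to an SVM implementation that accepts precomputed kernels.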

Crossref

Springer - Publisher Connector

PubMed Central

HAL Descartes

HAL-MINES ParisTech

Identidad étnica y redes personales entre jóvenes de Sarajevo (Ethnic identity and personal networks among young people in Sarajevo)

Author: A McAuley
B Petersen
C Halliburton
C Shapiro
CA Bartlett
DE Welch
DK Tse
EE Learner
EM Roche
FHR Seringhaus
G Knight
GRG Benito
GS Yip
I Ayal
J Oxley
J-E Vahlne
JA Quelch
L Oxelheim
L Welch
LC Leonidou
M Sarkar
ME Porter
MJ Blaine
N Piercy
Networks and Informal Communication
P Buckley
P Marshall
PD Lynch
PJ Dowling
R Cross
R Vernon
R Westbrook
R Widdows
RL Daft
S Burenstam-Linder
S Macdonald
T Coltman
T Kayworth
TK Madsen
TR Lituchy
Publication venue: 'Universitat Autonoma de Barcelona'
Publication date: 01/01/2003

After fieldwork conducted among young people in Sarajevo, we found a relation between the discourses they sustain and the ethnic categories they use to classify people and to identify themselves. We also found that people who self-identify as "Bosnians" play an important role in the network of multiethnic relationships, in which strong ties, surprisingly, are still very important. Finally, we found a relationship between the composition of personal networks and the ethnic discourses that are maintained. We live, or believe we live, in multiple "communities", imagined or not. At the same time, the individual, rather than the place, the family or the group, sits at the centre of social life and communications (cf. Wellman, 2001). In this context, induced by the advance of flexible capitalism (Castells, 1996), we think that to properly understand the identity or identities claimed by individuals it is necessary to study personal networks and their dynamics. From this perspective we cannot speak of "ethnic groups" or "multiethnicity" without further qualification, since these concepts rest on an essentialist and static conception of individual identity. The concept of a "multiethnic society" is used in a deceptively progressive and objective way, since what it actually legitimises is the existence of essential differences between people, pushing them apart instead of bringing them together.
However, we are fully aware that essentialist discourses of ethnic identity are omnipresent, with enormous political and individual effects. Our argument that the essentialist conception of identity is inappropriate from an academic point of view does not mean that it is not used politically, and therefore that it does not have formidable consequences for social relations. The study of personal networks is precisely what allows us to adopt a perspective that does not use "folk" concepts such as "ethnic group", "people" or "nation" with analytical pretensions, but instead places them in the realm of the discourses sustained by the actors (and by states and the media), and lets us contextualise them through etic concepts, that is, concepts imposed by the researchers. Only in this way can we overcome the tautologies that abound in ethnic discourses.

Crossref

Directory of Open Access Journals

Diposit Digital de Documents de la UAB

Estimation of ground reaction forces and ankle moment with multiple, low-cost sensors

Author: A Arndt
A Forner Cordero
A Pantelopoulos
AH Abdul Razak
AK Ramanathan
C Giacomozzi
C Liedtke
D Rosenbaum
Daniel A. Jacobs
Daniel P. Ferris
DT-P Fong
DW Hahs
GRG Barnes
H Rouhani
HLP Hurkmans
J Friedman
KG Silbernagel
KJ Chesnin
M Godi
P Pourcelot
PR Cavanagh
RR Picard
S Park
SB Williams
SL Delp
T Liu
YAM Kots
Publication venue: 'Springer Science and Business Media LLC'
Publication date

Crossref

ProDiGe: Prioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples

Author: A Su
B Brancotte
B Calvo
B Linghu
B Liu
B Schölkopf
C Giallourakis
C Perez-Iratxeta
C Son
CC Chang
EA Adie
F Denis
F Mordelet
Fantine Mordelet
FS Turner
G Lanckriet
GRG Lanckriet
J Freudenberg
Jean-Philippe Vert
K Bleakley
K Lage
L Jacob
LC Tranchevent
M van Driel
N López-Bigas
N Tiffin
O Vanunu
P Pavlidis
RI Kondor
S Aerts
S Köhler
S Yu
T De Bie
T Evgeniou
T Hwang
U Ala
V McKusick
X Wu
Y Yamanishi
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Elucidating the genetic basis of human diseases is a central goal of genetics and molecular biology. While traditional linkage analysis and modern high-throughput techniques often provide long lists of tens or hundreds of disease gene candidates, the identification of disease genes among the candidates remains time-consuming and expensive. Efficient computational methods are therefore needed to prioritize genes within the list of candidates, by exploiting the wealth of information available about the genes in various databases. Results: We propose ProDiGe, a novel algorithm for Prioritization of Disease Genes. ProDiGe implements a novel machine learning strategy based on learning from positive and unlabeled examples, which makes it possible to integrate various sources of information about the genes, to share information about known disease genes across diseases, and to perform genome-wide searches for new disease genes. Experiments on real data show that ProDiGe outperforms state-of-the-art methods for the prioritization of genes in human diseases. Conclusions: ProDiGe implements a new machine learning paradigm for gene prioritization, which could help the identification of new disease genes. It is freely available at http://cbio.ensmp.fr/prodige.

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Directory of Open Access Journals