
248 research outputs found

A Regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data

Author: A Enright
A Gavin
A Grigoriev
A Hoerl
AJ Dobson
EG WS Cleveland
G GH
GRG Lanckriet
H Ge
M Deng
M Eisen
M Fellenberg
MPS Brown
O Troyanskaya
P Liang
P Pavlidis
P Pavlidis
R Overbeek
R Tibshirani
Walter L Ruzzo
WS Noble
Y Zheng
Zizhen Yao
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: As a variety of functional genomic and proteomic techniques become available, there is an increasing need for functional analysis methodologies that integrate heterogeneous data sources. METHODS: In this paper, we address this issue by proposing a general framework for gene function prediction based on the k-nearest-neighbor (KNN) algorithm. The choice of KNN is motivated by its simplicity, flexibility to incorporate different data types and adaptability to irregular feature spaces. A weakness of traditional KNN methods, especially when handling heterogeneous data, is that performance is subject to the often ad hoc choice of similarity metric. To address this weakness, we apply regression methods to infer a similarity metric as a weighted combination of a set of base similarity measures, which helps to locate the neighbors that are most likely to be in the same class as the target gene. We also suggest a novel voting scheme to generate confidence scores that estimate the accuracy of predictions. The method gracefully extends to multi-way classification problems. RESULTS: We apply this technique to gene function prediction according to three well-known Escherichia coli classification schemes suggested by biologists, using information derived from microarray and genome sequencing data. We demonstrate that our algorithm dramatically outperforms the naive KNN methods and is competitive with support vector machine (SVM) algorithms for integrating heterogeneous data. We also show that by combining different data sources, prediction accuracy can improve significantly. CONCLUSION: Our extension of KNN with automatic feature weighting, multi-class prediction, and probabilistic inference enhances prediction accuracy significantly while remaining efficient, intuitive and flexible. This general framework can also be applied to similar classification problems involving heterogeneous datasets.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

L2-norm multiple kernel learning and its application to biomedical data fusion

Author: A Daemen
A Daemen
Anneleen Daemen
AY Ng
B Schölkopf
Bart De Moor
C Bottomley
C Leslie
DMJ Tax
ED Andersen
FR Bach
G Condous
G Thomas
GC Cawley
GRG Lanckriet
GRG Lanckriet
J Gudmundsson
J Shawe-Taylor
JAK Suykens
JAK Suykens
Johan AK Suykens
JP Ye
K Tretyakov
K Veropoulos
Leon-Charles Tranchevent
M Grant
M Grant
M Kloft
M Kloft
M Kowalski
O Gevaert
R Hettich
R Reemtsen
RA Eeles
S Aerts
S Sonnenburg
S Yu
Shi Yu
SJ Kim
T De Bie
T van den Bosch
Tillmann Falck
V Vapnik
Y Zheng
Yves Moreau
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract Background This paper introduces the notion of optimizing different norms in the dual problem of support vector machines with multiple kernels. The selection of norms yields different extensions of multiple kernel learning (MKL) such as L∞, L1, and L2 MKL. In particular, L2 MKL is a novel method that leads to non-sparse optimal kernel coefficients, which is different from the sparse kernel coefficients optimized by the existing L∞ MKL method. In real biomedical applications, L2 MKL may have more advantages over sparse integration method for thoroughly combining complementary information in heterogeneous data sources. Results We provide a theoretical analysis of the relationship between the L2 optimization of kernels in the dual problem with the L2 coefficient regularization in the primal problem. Understanding the dual L2 problem grants a unified view on MKL and enables us to extend the L2 method to a wide range of machine learning problems. We implement L2 MKL for ranking and classification problems and compare its performance with the sparse L∞ and the averaging L1 MKL methods. The experiments are carried out on six real biomedical data sets and two large scale UCI data sets. L2 MKL yields better performance on most of the benchmark data sets. In particular, we propose a novel L2 MKL least squares support vector machine (LSSVM) algorithm, which is shown to be an efficient and promising classifier for large scale data sets processing. Conclusions This paper extends the statistical framework of genomic data fusion based on MKL. Allowing non-sparse weights on the data sources is an attractive option in settings where we believe most data sources to be relevant to the problem at hand and want to avoid a "winner-takes-all" effect seen in L∞ MKL, which can be detrimental to the performance in prospective studies.
The notion of optimizing L2 kernels can be straightforwardly extended to ranking, classification, regression, and clustering algorithms. To tackle the computational burden of MKL, this paper proposes several novel LSSVM based MKL algorithms. Systematic comparison on real data sets shows that LSSVM MKL has comparable performance as the conventional SVM MKL algorithms. Moreover, large scale numerical experiments indicate that when cast as semi-infinite programming, LSSVM MKL can be solved more efficiently than SVM MKL. Availability The MATLAB code of algorithms implemented in this paper is downloadable from http://homes.esat.kuleuven.be/~sistawww/bioi/syu/l2lssvm.html.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Once bitten, not necessarily shy? Determinants of foreign market re-entry commitment strategies

Author: A Bonaccorsi
A Delios
A Madhok
AS Gaur
B Hedberg
B Levitt
C Oliver
CC Chung
CL Welch
CM Chan
CM Chan
CM Sousa
D Yiu
DAN Li
DJ O’Keefe
E Anderson
E Tsang
ER Banalieva
F Vermeulen
GL Clark
GRG Benito
GRG Benito
GRG Benito
GRG Benito
GY Gao
H Kok
HG Barkema
HG Barkema
HJ Sapienza
I Surdu
I Surdu
Irina Surdu
J Anand
J Cantwell
J Cohen
J Cohen
J Johanson
J Lampel
J Xia
J-F Hennart
J-G Cegarra-Navarro
JC Casillas
JC Casillas
JD Gwartney
JE Clarke
JG March
JG March
JM Hoenig
JM Shaver
JW Lu
K Mellahi
Kamel Mellahi
KD Brouthers
KD Brouthers
KE Meyer
KE Meyer
KE Meyer
KE Meyer
Keith W Glaister
KL Newman
L Argote
M Bernini
M Demirbag
M Zollo
MA Hitt
MA Hitt
MA Lyles
MA Villa De
MA Witt
MF Guillén
MF Guillén
MK Christianson
MP Holan de
MS Feldman
MW Peng
MW Peng
N Nummela
P Cairns
P Meschi
P Padmanabhan
PM Madsen
Q Tan
R Belderbos
R Cyert
R García-García
RF Hurley
RP Rumelt
RRG Javalgi
S Ang
S Song
SJ Chang
T Hutzschenreuter
T Kostova
T Vissak
TL Amburgey
V Hernandez
VJ Duriau
WH Starbuck
Y Zeng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

We investigate foreign market re-entry commitment strategies, namely the changes in the modes of operation (commitment) undertaken by multinational enterprises (MNEs) as they return to foreign markets from which they had previously exited. We combine organisational learning theory with the institutional change literature to examine the antecedents of re-entry commitment strategies. From an analysis of 1,020 re-entry events between 1980 and 2016, we find that operation mode prior to exit is a strong predictor of subsequent re-entry mode. Contrary to the predictions of learning theory, we did not find support for the effect of experience accumulated during the initial market endeavour on the re-entry commitment strategies of MNEs. In turn, exit motives significantly impact on the re-entrants' decision to re-enter via a different mode of operation, by either increasing or decreasing their commitment to the market. We show that re-entrants do not replicate unsuccessful operation mode strategies if they had previously underperformed in the market. When favourable host institutional changes occur during the time-out period re-entrants tend to increase commitment in the host market irrespective of the degree of prior experience accumulated in the market

Central Archive at the University of Reading

Crossref

Warwick Research Archives Portal Repository

White Rose Research Online

Enhanced protein fold recognition through a novel data integration approach

Author: A Andreeva
A Rakotomamonjy
AL Yuille
B Schölkopf
C Ding
CA Micchelli
CE Rasmussen
Colin Campbell
DT Jones
F Bach
F Bach
GRG Lanckriet
GRG Lanckriet
HB Shen
HW Mewes
I Dubchak
J Shawe-Taylor
J Ye
J Ye
JM Borwein
JV Davis
K Bleakley
K Chou
K Tsuda
Kaizhu Huang
L Liao
L Lo Conte
L Sun
L Vandenberghe
M Girolami
N Aronszajn
N Cristianini
ND Lawrence
PD Tao
R Hettich
RI Kondor
S Amari
S Ji
S Sonnenburg
T Damoulas
T Hastie
T Kato
Y Lin
Y Nesterov
Y Yamanishi
Y Ying
Yiming Ying
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract Background Protein fold recognition is a key step in protein three-dimensional (3D) structure discovery. There are multiple fold discriminatory data sources which use physicochemical and structural properties as well as further data sources derived from local sequence alignments. This raises the issue of finding the most efficient method for combining these different informative data sources and exploring their relative significance for protein fold classification. Kernel methods have been extensively used for biological data analysis. They can incorporate separate fold discriminatory features into kernel matrices which encode the similarity between samples in their respective data sources. Results In this paper we consider the problem of integrating multiple data sources using a kernel-based approach. We propose a novel information-theoretic approach based on a Kullback-Leibler (KL) divergence between the output kernel matrix and the input kernel matrix so as to integrate heterogeneous data sources. One of the most appealing properties of this approach is that it can easily cope with multi-class classification and multi-task learning by an appropriate choice of the output kernel matrix. Based on the position of the output and input kernel matrices in the KL-divergence objective, there are two formulations which we respectively refer to as MKLdiv-dc and MKLdiv-conv. We propose to efficiently solve MKLdiv-dc by a difference of convex (DC) programming method and MKLdiv-conv by a projected gradient descent algorithm. The effectiveness of the proposed approaches is evaluated on a benchmark dataset for protein fold recognition and a yeast protein function prediction problem. Conclusion Our proposed methods MKLdiv-dc and MKLdiv-conv are able to achieve state-of-the-art performance on the SCOP PDB-40D benchmark dataset for protein fold prediction and provide useful insights into the relative significance of informative data sources.
In particular, MKLdiv-dc further improves the fold discrimination accuracy to 75.19% which is a more than 5% improvement over competitive Bayesian probabilistic and SVM margin-based kernel learning methods. Furthermore, we report a competitive performance on the yeast protein function prediction problem.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Explore Bristol Research

“A long-term mortality analysis of subsidized firms in rural areas: an empirical study in the Portuguese Alentejo region”

Author: A Cerqua
A Monte Del
A Santos
A Santos
B Guloglu
C Bernini
C Bernini
C Paunov
D Durafour
DG Silva De
E Battistin
EW Nafziger
GRG Clarke
INE
INE
INE
J Carvalho
K Fukuda
L Ferreira
M Grapeggia
M Verbeek
MG Colombo
MJ Alonso-Nuez
N Gur
O Falck
P Holmes
P Neto
P Voigt
PA Geroski
R Agarwal
R Mamede
S Tsoukas
TM Stearns
U Brixy
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015

Studies have demonstrated that public policies to support private firms’ investment have the ability to promote entrepreneurship, but the sustainability of subsidized firms has not often been analysed. This paper aims to examine this dimension specifically through evaluating the mortality of subsidized firms in the long-term. The analysis focuses on a case study of the LEADER+ Programme in the Alentejo region of Portugal. With this purpose, the paper examines the activity status (active or not active) of 154 private, rural, for-profit firms in Alentejo that had received a subsidy to support investment between 2002 and 2008 under the LEADER+ Programme. The methodology is based on binary choice models in order to study the probability of these firms still being active. The explanatory variables used are the following: (1) the characteristics of entrepreneurs and managers’ strategic decisions, (2) firm profile and characteristics, (3) regional economic environment. Data assessment showed that the cumulative mortality rate of firms on 31st December 2013 is over 20%. Interpretation of the regression model revealed that the probability of firms’ survival increases with higher investment, firm age and regional business concentration, whereas the number of applications made by firms has a negative impact on their survival. So it seems that for subsidized firms the amount of investment is as important as its frequency.

Crossref

DI-fusion

Repositório Científico da Universidade de Évora

Identidad étnica y redes personales entre jóvenes de Sarajevo

Author: A McAuley
B Petersen
C Halliburton
C Shapiro
CA Bartlett
DE Welch
DK Tse
EE Learner
EM Roche
FHR Seringhaus
G Knight
GRG Benito
GS Yip
I Ayal
J Oxley
J-E Vahlne
JA Quelch
L Oxelheim
L Welch
LC Leonidou
LC Leonidou
M Sarkar
ME Porter
MJ Blaine
N Piercy
Networks and Informal Communication
P Buckley
P Marshall
PD Lynch
PJ Dowling
R Cross
R Vernon
R Westbrook
R Widdows
RL Daft
S Burenstam-Linder
S Macdonald
T Coltman
T Kayworth
TK Madsen
TR Lituchy
Publication venue: 'Universitat Autonoma de Barcelona'
Publication date: 01/01/2003

After fieldwork conducted among young people in Sarajevo, we found a relation between the discourses they sustain and the ethnic categories they use to classify people and to identify themselves. We also found that people who self-identify as "Bosnians" play an important role in the network of multiethnic relationships, in which strong ties, surprisingly, are still very important. Finally, we found a relationship between the composition of personal networks and the ethnic discourses that are maintained. We live, or believe we live, in multiple "communities", imagined or not. At the same time, the individual, rather than the place, the family or the group, sits at the centre of social life and communications (cf. Wellman, 2001). In this context, induced by the advance of flexible capitalism (Castells, 1996), we think that to properly understand the identity or identities claimed by individuals it is necessary to study personal networks and their dynamics. From this perspective we cannot speak of "ethnic groups" or "multiethnicity" without further qualification, since these concepts rest on an essentialist and static conception of individual identity. The concept of a "multiethnic society" is used in a deceptively progressive and objective way, since what it actually legitimises is the existence of essential differences between people, pushing them apart rather than bringing them together.
However, we are fully aware that essentialist discourses of ethnic identity are omnipresent, with enormous political and individual effects. That we regard the essentialist conception of identity as inappropriate from an academic point of view does not mean that it is not used politically and therefore has formidable consequences for social relations. Indeed, the study of personal networks lets us adopt a perspective that does not use "folk" concepts such as "ethnic group", "people" or "nation" with analytical pretensions, but instead places them in the realm of the discourses sustained by the actors (and by states and the media), and lets us contextualise them through etic concepts, that is, concepts imposed by the researchers. Only in this way can we overcome the tautologies that abound in ethnic discourses.

Crossref

Directory of Open Access Journals

Diposit Digital de Documents de la UAB

A new pairwise kernel for biological network inference with support vector machines

Author: A Ben-Hur
A Ramani
B Schölkopf
C Harbison
C von Mering
E Sprinzak
E Xing
EM Marcotte
F Pazos
GD Bader
GRG Lanckriet
GS Kimeldorf
HW Mewes
IW Tsang
Jean-Philippe Vert
Jian Qiu
JP Vert
KQ Weinberger
N Aronszajn
N Friedman
P Pavlidis
R Jansen
RI Kondor
S Boyd
S Martin
SF Altschul
SM Gomez
VN Vapnik
William S Noble
WK Huh
Y Qi
Y Yamanishi
Y Yamanishi
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Much recent work in bioinformatics has focused on the inference of various types of biological networks, representing gene regulation, metabolic processes, protein-protein interactions, etc. A common setting involves inferring network edges in a supervised fashion from a set of high-confidence edges, possibly characterized by multiple, heterogeneous data sets (protein sequence, gene expression, etc.). RESULTS: Here, we distinguish between two modes of inference in this setting: direct inference based upon similarities between nodes joined by an edge, and indirect inference based upon similarities between one pair of nodes and another pair of nodes. We propose a supervised approach for the direct case by translating it into a distance metric learning problem. A relaxation of the resulting convex optimization problem leads to the support vector machine (SVM) algorithm with a particular kernel for pairs, which we call the metric learning pairwise kernel. This new kernel for pairs can easily be used by most SVM implementations to solve problems of supervised classification and inference of pairwise relationships from heterogeneous data. We demonstrate, using several real biological networks and genomic datasets, that this approach often improves upon the state-of-the-art SVM for indirect inference with another pairwise kernel, and that the combination of both kernels always improves upon each individual kernel. CONCLUSION: The metric learning pairwise kernel is a new formulation to infer pairwise relationships with SVM, which provides state-of-the-art results for the inference of several biological networks from heterogeneous genomic data.

Crossref

Springer - Publisher Connector

PubMed Central

HAL Descartes

HAL-MINES ParisTech

Scoring Protein Relationships in Functional Interaction Networks Predicted from Sequence Data

Author: A Vazquez
B Schwikowski
C von Mering
C von Mering
CE Shannon
Christophe Herman
CL Myers
D Devos
E Nabieva
G Subramanian
Gaston K. Mazandu
GRG Lanckriet
HN Chua
HN Chua
HN Chua
J Krawczyk
J Xiong
JCD Mackay
K Raman
K Tsuda
LJ Jensen
M Deng
M Deng
M Li
MA Mahdavi
Nicola J. Mulder
NJ Mulder
NJ Mulder
O Bastian
O Bastian
OG Troyanskaya
P Baldi
PG Aaron
RVL Hartley
S Hunter
S Letovsky
S Yellaboina
SF Altschul
SF Altschul
SF Altschul
TM Murali
WR Pearson
X Mao
Y Chen
Y-R Cho
Publication venue: Public Library of Science
Publication date: 01/01/2010

The abundance of diverse biological data from various sources constitutes a rich source of knowledge, which has the power to advance our understanding of organisms. This requires computational methods in order to integrate and exploit these data effectively and elucidate local and genome wide functional connections between protein pairs, thus enabling functional inferences for uncharacterized proteins. These biological data are primarily in the form of sequences, which determine functions, although functional properties of a protein can often be predicted from just the domains it contains. Thus, protein sequences and domains can be used to predict protein pair-wise functional relationships, and thus contribute to the function prediction process of uncharacterized proteins in order to ensure that knowledge is gained from sequencing efforts. In this work, we introduce information-theoretic based approaches to score protein-protein functional interaction pairs predicted from protein sequence similarity and conserved protein signature matches. The proposed schemes are effective for data-driven scoring of connections between protein pairs. We applied these schemes to the Mycobacterium tuberculosis proteome to produce a homology-based functional network of the organism with a high confidence and coverage. We use the network for predicting functions of uncharacterised proteins

CiteSeerX

Cape Town University OpenUCT

Crossref

Directory of Open Access Journals

PubMed Central

Disease-Aging Network Reveals Significant Roles of Aging Genes in Connecting Genetic Diseases

Author: A Budovsky
A Budovsky
A Friedman
A Kowald
A Kriete
A Ozgur
AL Barabasi
C Soti
D Harman
David B. Searls
DJ Watts
E Ravasz
G Jin
GRG Lanckriet
H Kitano
H Xue
HD Osiewacz
HJ Kiss
I Feldman
J Hasty
JDJ Han
Jiguang Wang
JP de Magalhaes
JP de Magalhaes
JR Managbanag
KI Goh
L Hayflick
Luonan Chen
M Wolfson
MEJ Newman
P Shannon
P Zuppan
PF Jonsson
Q Cui
R Albert
R Bell
RI Kondor
S Karni
S Maere
S Maslov
S Peri
S Vasto
Shihua Zhang
T Ideker
T Ishunina
TBL Kirkwood
U Brandes
U Stelzl
X Jiang
X Wu
Xiang-Sun Zhang
Y Li
Yong Wang
Z Spiro
Z Tu
Publication venue: Public Library of Science
Publication date: 01/09/2009

One of the challenging problems in biology and medicine is exploring the underlying mechanisms of genetic diseases. Recent studies suggest that the relationship between genetic diseases and the aging process is important in understanding the molecular mechanisms of complex diseases. Although some intricate associations have been investigated for a long time, the studies are still in their early stages. In this paper, we construct a human disease-aging network to study the relationship among aging genes and genetic disease genes. Specifically, we integrate human protein-protein interactions (PPIs), disease-gene associations, aging-gene associations, and physiological system–based genetic disease classification information in a single graph-theoretic framework and find that (1) human disease genes are much closer to aging genes than expected by chance; and (2) diseases can be categorized into two types according to their relationships with aging. Type I diseases have their genes significantly close to aging genes, while type II diseases do not. Furthermore, we examine the topological characters of the disease-aging network from a systems perspective. Theoretical results reveal that the genes of type I diseases are in a central position of a PPI network while type II are not; (3) more importantly, we define an asymmetric closeness based on the PPI network to describe relationships between diseases, and find that aging genes make a significant contribution to associations among diseases, especially among type I diseases. In conclusion, the network-based study provides not only evidence for the intricate relationship between the aging process and genetic diseases, but also biological implications for prying into the nature of human diseases

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Internationalisation speed and MNE performance: A study of the market-seeking expansion of retail MNEs

Author: A Mohr
A Nadolska
Alexander Mohr
AM Rugman
AM Rugman
AM Rugman
AT Mohr
B Kogut
B Mascarenhas
CH Oh
E Penrose
F Bonaglia
F Vermeulen
Georgios Batsakis
GRG Benito
H Lee
H Wagner
HG Barkema
HG Barkema
HR Feeser
I Dierickx
J Bell
J Cohen
J Gunawan
J Johanson
J Wooldridge
JA Mathews
JC Casillas
JC Casillas
JE Clarke
JF Hennart
JJ Boddewyn
JM Pennings
JT Mentzer
JW Lu
K Gielens
K Gielens
KM Eisenhardt
KS Powell
LS Aiken
M Easterby-Smith
M Hilmersson
MA Cohen
MB Lieberman
R Davidson
RJ Jiang
S Chetty
S Zaheer
S-J Chang
S-J Chang
T Hutzschenreuter
T Hutzschenreuter
T Hutzschenreuter
TW Tong
VA Zeithaml
WM Cohen
Y Luo
YY Kor
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/05/2016

Existing research is divided on whether firms that rapidly expand their overseas operations perform better than firms that internationalize slowly. Drawing on Penrose’s theory of the growth of the firm we argue that the positive effects of rapid internationalization give way to negative effects with increasing internationalization speed, leading to an inverted U-shaped association between internationalization speed and firm performance. We analyse the market-seeking expansion of 110 retailers over a 10-year period (2003–2012) and find support for a curvilinear relationship between internationalization speed and firm performance that is moderated by the geographic scope of firms’ internationalization path and firms’ international experience. Our study contributes to resolving conflicting views on the link between internationalization speed and firm performance

Crossref

Brunel University Research Archive