A bagging SVM to learn from positive and unlabeled examples
We consider the problem of learning a binary classifier from a training set
of positive and unlabeled examples, both in the inductive and in the
transductive setting. This problem, often referred to as PU learning,
differs from the standard supervised classification problem by the lack of
negative examples in the training set. It corresponds to a ubiquitous
situation in many applications such as information retrieval or gene ranking,
when we have identified a set of data of interest sharing a particular
property, and we wish to automatically retrieve additional data sharing the
same property among a large and easily available pool of unlabeled data. We
propose a conceptually simple method, akin to bagging, to approach both
inductive and transductive PU learning problems by converting them into a
series of supervised binary classification problems that discriminate the
known positive examples from random subsamples of the unlabeled set. We
empirically demonstrate the relevance of the method on simulated and real
data, where it performs at least as well as existing methods while being faster.
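As a rough sketch of the bagging scheme just described, assuming scikit-learn's SVC as the base classifier (the function name, subsample size, and out-of-bag averaging rule below are illustrative choices, not the authors' reference implementation):

```python
import numpy as np
from sklearn.svm import SVC

def bagging_pu_scores(X_pos, X_unl, n_rounds=100, subsample=None, seed=0):
    """Score unlabeled points by repeatedly training positives vs. a random
    subsample of the unlabeled set and averaging out-of-bag decision values."""
    rng = np.random.default_rng(seed)
    n_unl = len(X_unl)
    k = subsample or len(X_pos)  # subsample size: match the positives (assumption)
    y = np.r_[np.ones(len(X_pos)), np.zeros(k)]
    scores, counts = np.zeros(n_unl), np.zeros(n_unl)
    for _ in range(n_rounds):
        idx = rng.choice(n_unl, size=k, replace=False)   # draw pseudo-negatives
        clf = SVC(kernel="rbf").fit(np.r_[X_pos, X_unl[idx]], y)
        oob = np.setdiff1d(np.arange(n_unl), idx)        # out-of-bag unlabeled points
        scores[oob] += clf.decision_function(X_unl[oob])
        counts[oob] += 1
    return scores / np.maximum(counts, 1)                # high score = likely positive
```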
Improving the Efficiency of a Multicast File Transfer Tool based on ALC
This work describes several techniques that we used to design a multicast file transfer tool on top of ALC, the Asynchronous Layered Coding protocol proposed by the RMT IETF working group. More specifically, we analyze several object and symbol ordering schemes that improve transmission efficiency, and we show how the Application Level Framing (ALF) paradigm can help to reduce memory requirements and enable processing to be hidden behind communications. Because of its popularity and availability we use a Reed-Solomon FEC code, yet most of our results can be applied to other FEC codes. A strength of this work is that all the techniques introduced have actually been implemented and their benefits quantified.
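As a toy illustration of the symbol-ordering idea, the sketch below contrasts a sequential baseline with a round-robin interleaving of FEC-encoded symbols across objects; the function names and two-object example are hypothetical, and the paper evaluates richer orderings on top of ALC:

```python
from itertools import zip_longest

def sequential_order(objects):
    """Baseline: send all of object 0's symbols, then all of object 1's, ..."""
    return [s for obj in objects for s in obj]

def interleaved_order(objects):
    """Round-robin across objects: one symbol per object per pass, so a
    receiver joining mid-session still collects useful symbols quickly."""
    return [s for group in zip_longest(*objects) for s in group if s is not None]

# Two objects, each already FEC-encoded into (object_id, symbol_id) symbols.
objs = [[(0, i) for i in range(4)], [(1, i) for i in range(4)]]
print(interleaved_order(objs))
# [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2), (0, 3), (1, 3)]
```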
Bayesian nonparametric discovery of isoforms and individual specific quantification
Most human protein-coding genes can be transcribed into multiple distinct mRNA isoforms. These alternative splicing patterns encourage molecular diversity, and dysregulation of isoform expression plays an important role in disease etiology. However, isoforms are difficult to characterize from short-read RNA-seq data because they share identical subsequences and occur at different frequencies across tissues and samples. Here, we develop BIISQ, a Bayesian nonparametric model for isoform discovery and individual specific quantification from short-read RNA-seq data. BIISQ does not require isoform reference sequences but instead estimates an isoform catalog shared across samples. We use stochastic variational inference for efficient posterior estimates and demonstrate superior precision and recall on simulations compared to state-of-the-art isoform reconstruction methods. BIISQ shows the most gains for low abundance isoforms, with 36% more isoforms correctly inferred at low coverage versus a multi-sample method and 170% more versus single-sample methods. We estimate isoforms in the GEUVADIS RNA-seq data and validate inferred isoforms by associating genetic variants with isoform ratios.
SIRENE: Supervised Inference of Regulatory Networks
Living cells are the product of gene expression programs that involve the
regulated transcription of thousands of genes. The elucidation of
transcriptional regulatory networks is thus needed to understand the cell's
working mechanism, and can for example be useful for the discovery of novel
therapeutic targets. Although several methods have been proposed to infer gene
regulatory networks from gene expression data, a recent comparison on a
large-scale benchmark experiment revealed that most current methods only
predict a limited number of known regulations at a reasonable precision level.
We propose SIRENE, a new method for the inference of gene regulatory networks
from a compendium of expression data. The method decomposes the problem of gene
regulatory network inference into a large number of local binary classification
problems, each of which focuses on separating the target genes of one
transcription factor (TF) from non-targets.
SIRENE is thus conceptually simple and computationally efficient. We test it on
a benchmark experiment aimed at predicting regulations in E. coli, and show
that it retrieves on the order of six times more known regulations than other
state-of-the-art inference methods.
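A minimal sketch of this local decomposition, assuming expression profiles as features and scikit-learn's SVC as the local classifier (the names and the direct rescoring of training genes are simplifications; in practice known examples are scored by cross-validation):

```python
import numpy as np
from sklearn.svm import SVC

def sirene_scores(expr, known_targets):
    """expr: (n_genes, n_experiments) expression matrix, one row per gene.
    known_targets: dict mapping TF name -> set of known target gene indices
    (each TF is assumed to have at least one known target).
    Returns a dict TF -> per-gene score (higher = more likely regulated)."""
    n_genes = expr.shape[0]
    scores = {}
    for tf, targets in known_targets.items():
        y = np.fromiter((g in targets for g in range(n_genes)), dtype=int)
        clf = SVC(kernel="rbf").fit(expr, y)   # one local classifier per TF
        scores[tf] = clf.decision_function(expr)
    return scores
```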
Chemokine transport across human vascular endothelial cells
Leukocyte migration across vascular endothelium is mediated by chemokines that are either synthesized by the endothelium or transferred across the endothelium from the tissue. The mechanism of transfer of two chemokines, CXCL10 (interferon gamma inducible protein [IP]-10) and CCL2 (macrophage chemotactic protein [MCP]-1), was compared across dermal and lung microvessel endothelium and saphenous vein endothelium. The rate of transfer depended on both the type of endothelium and the chemokine. The permeability coefficient (Pe) for CCL2 movement across saphenous vein was twice the value for dermal endothelium and four times that for lung endothelium. In contrast, the Pe value for CXCL10 was lower for saphenous vein endothelium than for the other endothelia. The differences in transfer rate between endothelia were not related to variation in paracellular permeability, as assessed with the paracellular tracer inulin. Immunoelectron microscopy showed that CXCL10 was transferred from the basal membrane in a vesicular compartment before being distributed to the apical membrane. Although all three endothelia expressed high levels of the receptor for CXCL10 (CXCR3), the transfer was not readily saturable and did not appear to be receptor dependent. After 30 min, the chemokine started to be reinternalized from the apical membrane in clathrin-coated vesicles. The data suggest a model for chemokine transcytosis, with a separate pathway for clearance of the apical surface.
Reverse Engineering Gene Networks with ANN: Variability in Network Inference Algorithms
Motivation: Reconstructing the topology of a gene regulatory network is one
of the key tasks in systems biology. Despite the wide variety of proposed
methods, very little work has been dedicated to assessing their stability
properties. Here we present a methodical comparison of the performance of a
novel method (RegnANN) for gene network inference, based on multilayer
perceptrons, with three reference algorithms (ARACNE, CLR, KELLER), focusing
our analysis on the prediction variability induced by both the intrinsic
structure of the network and the available data.
Results: The extensive evaluation on both synthetic data and a selection of
gene modules of Escherichia coli indicates that all the algorithms suffer from
instability and variability in the reconstruction of the network topology.
This instability makes it very hard to establish objectively which method
performs best. Nevertheless, RegnANN achieves MCC scores that compare very
favorably with those of all the other inference methods tested.
Availability: The software for the RegnANN inference algorithm is distributed
under GPL3 and is available on the corresponding author's home page
(http://mpba.fbk.eu/grimaldi/regnann-supmat).
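The MCC scores cited above compare an inferred topology against the true network edge by edge; for reference, a small self-contained version of the coefficient over binary adjacency matrices (variable names are illustrative):

```python
import numpy as np

def mcc(true_adj, pred_adj):
    """Matthews correlation coefficient between two binary adjacency
    matrices (1 = edge, 0 = no edge); returns 0 for degenerate cases."""
    t, p = np.asarray(true_adj).ravel(), np.asarray(pred_adj).ravel()
    tp = np.sum((t == 1) & (p == 1))
    tn = np.sum((t == 0) & (p == 0))
    fp = np.sum((t == 0) & (p == 1))
    fn = np.sum((t == 1) & (p == 0))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```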
ProDiGe: Prioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples
Background: Elucidating the genetic basis of human diseases is a central goal of genetics and molecular biology. While traditional linkage analysis and modern high-throughput techniques often provide long lists of tens or hundreds of disease gene candidates, the identification of disease genes among the candidates remains time-consuming and expensive. Efficient computational methods are therefore needed to prioritize genes within the list of candidates, by exploiting the wealth of information available about the genes in various databases.
Results: We propose ProDiGe, a novel algorithm for Prioritization of Disease Genes. ProDiGe implements a novel machine learning strategy based on learning from positive and unlabeled examples, which makes it possible to integrate various sources of information about the genes, to share information about known disease genes across diseases, and to perform genome-wide searches for new disease genes. Experiments on real data show that ProDiGe outperforms state-of-the-art methods for the prioritization of genes in human diseases.
Conclusions: ProDiGe implements a new machine learning paradigm for gene prioritization, which could help with the identification of new disease genes. It is freely available at http://cbio.ensmp.fr/prodige.
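One common way to realize the "share information across diseases" part of such a multitask strategy is a product kernel over (gene, disease) pairs, so that similar genes for similar diseases receive similar scores. The sketch below illustrates that construction; it is an assumption for exposition, not the ProDiGe code itself:

```python
import numpy as np

def multitask_kernel(K_gene, K_disease, pairs_a, pairs_b):
    """K((g, d), (g', d')) = K_gene[g, g'] * K_disease[d, d'].
    pairs_a, pairs_b: lists of (gene_index, disease_index) tuples.
    Returns the kernel matrix between the two lists of pairs."""
    ga, da = zip(*pairs_a)
    gb, db = zip(*pairs_b)
    return K_gene[np.ix_(ga, gb)] * K_disease[np.ix_(da, db)]
```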
Supervised prediction of drug–target interactions using bipartite local models
Motivation: In silico prediction of drug–target interactions from heterogeneous biological data is critical in the search for drugs for known diseases. This problem is currently being attacked from many different points of view, a strong indication of its current importance. Indeed, being able to predict new drug–target interactions with both high precision and accuracy is the holy grail, a fundamental requirement for in silico methods to be useful in a biological setting. This, however, remains extremely challenging due to, amongst other things, the rarity of known drug–target interactions.
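The bipartite local models of the title can be sketched as two per-pair classifiers, one over targets and one over drugs, whose outputs are combined. The kernel-matrix interface and the averaging rule below are assumptions for illustration (a proper evaluation would also hold out the queried pair from training):

```python
import numpy as np
from sklearn.svm import SVC

def blm_score(K_drug, K_target, Y, d, t):
    """K_drug, K_target: precomputed similarity (kernel) matrices.
    Y: binary interaction matrix, Y[i, j] = 1 if drug i binds target j.
    Assumes drug d and target t each have both positive and negative labels."""
    # Local model 1: which targets does drug d bind? (trained over targets)
    clf_t = SVC(kernel="precomputed").fit(K_target, Y[d, :])
    s1 = clf_t.decision_function(K_target[t:t + 1, :])[0]
    # Local model 2: which drugs bind target t? (trained over drugs)
    clf_d = SVC(kernel="precomputed").fit(K_drug, Y[:, t])
    s2 = clf_d.decision_function(K_drug[d:d + 1, :])[0]
    return (s1 + s2) / 2.0   # combine the two local predictions
```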
Scuba: Scalable kernel-based gene prioritization
Background: The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge; however, their practical implementation is often precluded by their limited scalability.
Results: We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large-scale predictions are required. Importantly, it is able to deal efficiently both with a large number of candidate genes and with an arbitrary number of data sources. As a direct consequence of scalability, Scuba also integrates a new efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods.
Conclusions: Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. The method can be useful for prioritizing candidate genes, particularly when their number is large or when the input data are highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba.
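As a rough illustration of the kernel-integration step, the sketch below combines per-source kernel matrices with nonnegative weights and ranks genes by similarity to known disease genes. The uniform-weight fallback is a placeholder; Scuba itself learns the weights by optimizing the margin distribution, which this sketch does not do:

```python
import numpy as np

def combine_kernels(kernels, weights=None):
    """kernels: list of (n_genes, n_genes) PSD matrices, one per data source.
    Returns a single combined kernel with normalized nonnegative weights."""
    w = np.ones(len(kernels)) if weights is None else np.maximum(weights, 0)
    w = w / w.sum()
    return sum(wi * K for wi, K in zip(w, kernels))

def prioritize(K, seed_genes):
    """Rank all genes by mean combined similarity to the known disease genes."""
    return np.argsort(-K[:, seed_genes].mean(axis=1))
```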
- …