A kernel-based approach for detecting outliers of high-dimensional biological data

Author: A Malossini
B Schölkopf
C Aggarwal
D Koller
E Knorr
F Angiulli
H Ressom
J Oh
Jean Gao
JS Wang
Jung Hun Oh
K Kadota
L Manevitz
M Tumminello
R Lilien
S Bandyopadhyay
S Zhou
T Fawcett
T Golub
U Alon
W Lee
Publication venue: BioMed Central
Publication date: 29/04/2009

Automatic Network Fingerprinting through Single-Node Motifs

Author: AK Jain
AL Barabási
Christoph Echtermeyer
D Arthur
D Centola
D Lazer
DJ MacKay
DJ Watts
E Bullmore
E Estrada
E Parzen
FA Rodrigues
Francisco A. Rodrigues
G Szabo
H Jeong
I Bordino
J Guare
J Ozik
J Wang
JJ Ramasco
JW Eaton
LDF Costa
Luciano da Fontoura Costa
M Barthélemy
M Faloutsos
M Groening
M Kaiser
M Kitsak
M Kuramochi
M Middendorf
M Perc
MA Nowak
Marcus Kaiser
Matjaz Perc
MEJ Newman
N Kashtan
O Sporns
P Erdös
P Ribeiro
PC Mahalanobis
R Albert
R Milo
R Pastor-Satorras
RA Johnson
RO Duda
S Boccaletti
S Carmi
S Funk
S Meloni
S Milgram
S Saavedra
S Schnettler
S Wasserman
SB Seidman
SP Borgatti
SV Buldyrev
T Gross
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2011

Complex networks have been characterised by their specific connectivity patterns (network motifs), but their building blocks can also be identified and described by node-motifs: combinations of local network features. One technique to identify single node-motifs has been presented by Costa et al. (L. D. F. Costa, F. A. Rodrigues, C. C. Hilgetag, and M. Kaiser, Europhys. Lett., 87, 1, 2009). Here, we first suggest improvements to the method, including how its parameters can be determined automatically. Such automatic routines make high-throughput studies of many networks feasible. Second, the new routines are validated in different network-series. Third, we provide an example of how the method can be used to analyse network time-series. In conclusion, we provide a robust method for systematically discovering and classifying characteristic nodes of a network. In contrast to classical motif analysis, our approach can identify individual components (here: nodes) that are specific to a network. Such special nodes, like hubs before them, might be found to play critical roles in real-world networks. Comment: 16 pages (4 figures) plus supporting information 8 pages (5 figures)

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals

RCAAP - Repositório Científico de Acesso Aberto de Portugal

Mining protein database using machine learning techniques

Author: Camargo Renata
Niranjan Mahesan
Publication venue: Walter de Gruyter GmbH
Publication date: 01/06/2008

With a large amount of information relating to proteins accumulating in databases widely available online, it is of interest to apply machine learning techniques that, by extracting underlying statistical regularities in the data, make predictions about the functional and evolutionary characteristics of unseen proteins. Such predictions can help reduce the space over which experiment designers need to search in order to improve our understanding of biochemical properties. Previously it has been suggested that features computable by comparing a pair of proteins can be integrated by an artificial neural network, hence predicting the degree to which the pair may be evolutionarily related and homologous. We compiled two datasets of pairs of proteins, each pair being characterised by seven distinct features. We performed an exhaustive search through all possible combinations of features for the problem of separating remote homologous from analogous pairs, and we note that a significant performance gain was obtained by the inclusion of sequence and structure information. We find that a linear classifier was enough to discriminate a protein pair at the family level. However, at the superfamily level, detecting remote homologous pairs was a relatively harder problem, and we find that nonlinear classifiers achieve significantly higher accuracies. In this paper, we compare three different pattern classification methods on two problems formulated as detecting evolutionary and functional relationships between pairs of proteins, and from extensive cross-validation and feature-selection studies quantify the average limits and uncertainties with which such predictions may be made. Feature selection points to a "knowledge gap" in currently available functional annotations. We demonstrate how the scheme may be employed in a framework to associate an individual protein with an existing family of evolutionarily related proteins.
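
The exhaustive search over feature combinations mentioned in this abstract can be illustrated with a minimal sketch. It assumes only what the abstract states (seven features per protein pair, every non-empty combination tried). The scoring function `toy_score` is a hypothetical placeholder, not the authors' evaluation; in practice it would presumably be something like cross-validated classifier accuracy.

```python
from itertools import combinations

def all_feature_subsets(n_features):
    """Yield every non-empty subset of feature indices 0..n_features-1."""
    for size in range(1, n_features + 1):
        yield from combinations(range(n_features), size)

def best_subset(n_features, score):
    """Return the subset with the highest score (exhaustive search)."""
    return max(all_feature_subsets(n_features), key=score)

def toy_score(subset):
    # Hypothetical stand-in for classifier accuracy: rewards features 0 and 3
    # and slightly penalises larger subsets.
    return len(set(subset) & {0, 3}) - 0.1 * len(subset)

subsets = list(all_feature_subsets(7))  # seven features, as in the abstract
winner = best_subset(7, toy_score)
```

With seven features the search space is small (2^7 - 1 non-empty subsets), which is what makes trying every combination feasible here.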

Southampton (e-Prints Soton)

Identification of Outlying Observations with Quantile Regression for Censored Data

Author: Cho HyungJun
Eo Soo-Heang
Hong Seung-Mo
Publication date: 30/04/2014

Outlying observations, which deviate significantly from other measurements, may distort the conclusions of data analysis. Therefore, identifying outliers is an important problem that should be solved to obtain reliable results. While there are many statistical outlier detection algorithms and software programs for uncensored data, few are available for censored data. In this article, we propose three outlier detection algorithms based on censored quantile regression, two of which are modified versions of existing algorithms for uncensored or censored data, while the third is a newly developed algorithm to overcome the demerits of previous approaches. The performance of the three algorithms was investigated in simulation studies. In addition, real data from the SEER database, which contains a variety of data sets related to various cancers, are used to illustrate the usefulness of our methodology. The algorithms are implemented in the R package OutlierDC, which can be conveniently employed in the R environment and freely obtained from CRAN.

arXiv.org e-Print Archive

CiteSeerX

One-Class Classification: Taxonomy of Study and Review of Techniques

Author: Khan Shehroz S.
Madden Michael G.
Publication venue: Cambridge University Press (CUP)
Publication date: 29/11/2013

One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers, because the class boundary must be defined using only knowledge of the positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures
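
To make the idea of learning a class boundary from positive examples only more concrete, here is a minimal sketch. It is not taken from the review; it assumes a deliberately simple model (a centroid plus a distance threshold), and the names `fit_one_class`, `is_positive` and the `quantile` parameter are illustrative choices.

```python
import math

def fit_one_class(positives, quantile=0.95):
    """Fit a centroid and a distance threshold from positive samples only."""
    dim = len(positives[0])
    centroid = [sum(p[i] for p in positives) / len(positives) for i in range(dim)]
    distances = sorted(math.dist(p, centroid) for p in positives)
    # Threshold: the training distance at the given quantile position
    # (clamped to the largest training distance).
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return centroid, distances[index]

def is_positive(model, x):
    """Accept x if it lies within the learned distance threshold."""
    centroid, threshold = model
    return math.dist(x, centroid) <= threshold

# Only positive-class examples are available at training time.
training = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8)]
model = fit_one_class(training)

near = is_positive(model, (1.0, 1.05))  # close to the training data
far = is_positive(model, (3.0, 3.0))    # far from the training data
```

No negative examples are used anywhere in fitting; anything outside the learned region is treated as an outlier or novelty.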

arXiv.org e-Print Archive

Access to Research at National University of Ireland, Galway

Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics

Author: Das Diya
Dudoit Sandrine
Fletcher Russell B
Ngai John
Purdom Elizabeth
Risso Davide
Street Kelly
Yosef Nir
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

Background: Single-cell transcriptomics allows researchers to investigate complex communities of heterogeneous cells. It can be applied to stem cells and their descendants in order to chart the progression from multipotent progenitors to fully differentiated cells. While a variety of statistical and computational methods have been proposed for inferring cell lineages, the problem of accurately characterizing multiple branching lineages remains difficult to solve. Results: We introduce Slingshot, a novel method for inferring cell lineages and pseudotimes from single-cell gene expression data. In previously published datasets, Slingshot correctly identifies the biological signal for one to three branching trajectories. Additionally, our simulation study shows that Slingshot infers more accurate pseudotimes than other leading methods. Conclusions: Slingshot is a uniquely robust and flexible tool which combines the highly stable techniques necessary for noisy single-cell data with the ability to identify multiple trajectories. Accurate lineage inference is a critical step in the identification of dynamic temporal gene expression.

Directory of Open Access Journals

eScholarship - University of California

Archivio istituzionale della ricerca - Università di Padova