9,485 research outputs found
Automatic Network Fingerprinting through Single-Node Motifs
Complex networks have been characterised by their specific connectivity
patterns (network motifs), but their building blocks can also be identified and
described by node-motifs---a combination of local network features. One
technique to identify single node-motifs has been presented by Costa et al. (L.
D. F. Costa, F. A. Rodrigues, C. C. Hilgetag, and M. Kaiser, Europhys. Lett.,
87, 1, 2009). Here, we first suggest improvements to the method including how
its parameters can be determined automatically. Such automatic routines make
high-throughput studies of many networks feasible. Second, the new routines are
validated in different network-series. Third, we provide an example of how the
method can be used to analyse network time-series. In conclusion, we provide a
robust method for systematically discovering and classifying characteristic
nodes of a network. In contrast to classical motif analysis, our approach can
identify individual components (here: nodes) that are specific to a network.
Such special nodes, as hubs before, might be found to play critical roles in
real-world networks.Comment: 16 pages (4 figures) plus supporting information 8 pages (5 figures
Mining protein database using machine learning techniques
With a large amount of information relating to proteins accumulating in databases widely available online, it is of interest to apply machine learning techniques that, by extracting underlying statistical regularities in the data, make predictions about the functional and evolutionary characteristics of unseen proteins. Such predictions can help in achieving a reduction in the space over which experiment designers need to search in order to improve our understanding of the biochemical properties. Previously it has been suggested that an integration of features computable by comparing a pair of proteins can be achieved by an artificial neural network, hence predicting the degree to which they may be evolutionary related and homologous. We compiled two datasets of pairs of proteins, each pair being characterised by seven distinct features. We performed an exhaustive search through all possible combinations of features, for the problem of separating remote homologous from analogous pairs, we note that significant performance gain was obtained by the inclusion of sequence and structure information. We find that the use of a linear classifier was enough to discriminate a protein pair at the family level. However, at the superfamily level, to detect remote homologous pairs was a relatively harder problem. We find that the use of nonlinear classifiers achieve significantly higher accuracies. In this paper, we compare three different pattern classification methods on two problems formulated as detecting evolutionary and functional relationships between pairs of proteins, and from extensive cross validation and feature selection based studies quantify the average limits and uncertainties with which such predictions may be made. Feature selection points to a "knowledge gap" in currently available functional annotations. We demonstrate how the scheme may be employed in a framework to associate an individual protein with an existing family of evolutionarily related proteins
Identification of Outlying Observations with Quantile Regression for Censored Data
Outlying observations, which significantly deviate from other measurements,
may distort the conclusions of data analysis. Therefore, identifying outliers
is one of the important problems that should be solved to obtain reliable
results. While there are many statistical outlier detection algorithms and
software programs for uncensored data, few are available for censored data. In
this article, we propose three outlier detection algorithms based on censored
quantile regression, two of which are modified versions of existing algorithms
for uncensored or censored data, while the third is a newly developed algorithm
to overcome the demerits of previous approaches. The performance of the three
algorithms was investigated in simulation studies. In addition, real data from
SEER database, which contains a variety of data sets related to various
cancers, is illustrated to show the usefulness of our methodology. The
algorithms are implemented into an R package OutlierDC which can be
conveniently employed in the \proglang{R} environment and freely obtained from
CRAN
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining class boundary just with the knowledge of positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure
Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics.
BackgroundSingle-cell transcriptomics allows researchers to investigate complex communities of heterogeneous cells. It can be applied to stem cells and their descendants in order to chart the progression from multipotent progenitors to fully differentiated cells. While a variety of statistical and computational methods have been proposed for inferring cell lineages, the problem of accurately characterizing multiple branching lineages remains difficult to solve.ResultsWe introduce Slingshot, a novel method for inferring cell lineages and pseudotimes from single-cell gene expression data. In previously published datasets, Slingshot correctly identifies the biological signal for one to three branching trajectories. Additionally, our simulation study shows that Slingshot infers more accurate pseudotimes than other leading methods.ConclusionsSlingshot is a uniquely robust and flexible tool which combines the highly stable techniques necessary for noisy single-cell data with the ability to identify multiple trajectories. Accurate lineage inference is a critical step in the identification of dynamic temporal gene expression
- …