220,969 research outputs found
Transcription Factor-DNA Binding Via Machine Learning Ensembles
We present ensemble methods in a machine learning (ML) framework combining
predictions from five known motif/binding site exploration algorithms. For a
given TF the ensemble starts with position weight matrices (PWM's) for the
motif, collected from the component algorithms. Using dimension reduction, we
identify significant PWM-based subspaces for analysis. Within each subspace a
machine classifier is built for identifying the TF's gene (promoter) targets
(Problem 1). These PWM-based subspaces form an ML-based sequence analysis tool.
Problem 2 (finding binding motifs) is solved by agglomerating k-mer (string)
feature PWM-based subspaces that stand out in identifying gene targets. We
approach Problem 3 (binding sites) with a novel machine learning approach that
uses promoter string features and ML importance scores in a classification
algorithm locating binding sites across the genome. For target gene
identification this method improves performance (measured by the F1 score) by
about 10 percentage points over the (a) motif scanning method and (b) the
coexpression-based association method. Top motif outperformed 5 component
algorithms as well as two other common algorithms (BEST and DEME). For
identifying individual binding sites on a benchmark cross species database
(Tompa et al., 2005) we match the best performer without much human
intervention. It also improved the performance on mammalian TFs.
The ensemble can integrate orthogonal information from different weak
learners (potentially using entirely different types of features) into a
machine learner that can perform consistently better for more TFs. The TF gene
target identification component (problem 1 above) is useful in constructing a
transcriptional regulatory network from known TF-target associations. The
ensemble is easily extendable to include more tools as well as future PWM-based
information.Comment: 33 page
Rectangular Layouts and Contact Graphs
Contact graphs of isothetic rectangles unify many concepts from applications
including VLSI and architectural design, computational geometry, and GIS.
Minimizing the area of their corresponding {\em rectangular layouts} is a key
problem. We study the area-optimization problem and show that it is NP-hard to
find a minimum-area rectangular layout of a given contact graph. We present
O(n)-time algorithms that construct -area rectangular layouts for
general contact graphs and -area rectangular layouts for trees.
(For trees, this is an -approximation algorithm.) We also present an
infinite family of graphs (rsp., trees) that require (rsp.,
) area.
We derive these results by presenting a new characterization of graphs that
admit rectangular layouts using the related concept of {\em rectangular duals}.
A corollary to our results relates the class of graphs that admit rectangular
layouts to {\em rectangle of influence drawings}.Comment: 28 pages, 13 figures, 55 references, 1 appendi
Truncating the loop series expansion for Belief Propagation
Recently, M. Chertkov and V.Y. Chernyak derived an exact expression for the
partition sum (normalization constant) corresponding to a graphical model,
which is an expansion around the Belief Propagation solution. By adding
correction terms to the BP free energy, one for each "generalized loop" in the
factor graph, the exact partition sum is obtained. However, the usually
enormous number of generalized loops generally prohibits summation over all
correction terms. In this article we introduce Truncated Loop Series BP
(TLSBP), a particular way of truncating the loop series of M. Chertkov and V.Y.
Chernyak by considering generalized loops as compositions of simple loops. We
analyze the performance of TLSBP in different scenarios, including the Ising
model, regular random graphs and on Promedas, a large probabilistic medical
diagnostic system. We show that TLSBP often improves upon the accuracy of the
BP solution, at the expense of increased computation time. We also show that
the performance of TLSBP strongly depends on the degree of interaction between
the variables. For weak interactions, truncating the series leads to
significant improvements, whereas for strong interactions it can be
ineffective, even if a high number of terms is considered.Comment: 31 pages, 12 figures, submitted to Journal of Machine Learning
Researc
Communities in Networks
We survey some of the concepts, methods, and applications of community
detection, which has become an increasingly important area of network science.
To help ease newcomers into the field, we provide a guide to available
methodology and open problems, and discuss why scientists from diverse
backgrounds are interested in these problems. As a running theme, we emphasize
the connections of community detection to problems in statistical physics and
computational optimization.Comment: survey/review article on community structure in networks; published
version is available at
http://people.maths.ox.ac.uk/~porterm/papers/comnotices.pd
Robustness: a New Form of Heredity Motivated by Dynamic Networks
We investigate a special case of hereditary property in graphs, referred to
as {\em robustness}. A property (or structure) is called robust in a graph
if it is inherited by all the connected spanning subgraphs of . We motivate
this definition using two different settings of dynamic networks. The first
corresponds to networks of low dynamicity, where some links may be permanently
removed so long as the network remains connected. The second corresponds to
highly-dynamic networks, where communication links appear and disappear
arbitrarily often, subject only to the requirement that the entities are
temporally connected in a recurrent fashion ({\it i.e.} they can always reach
each other through temporal paths). Each context induces a different
interpretation of the notion of robustness.
We start by motivating the definition and discussing the two interpretations,
after what we consider the notion independently from its interpretation, taking
as our focus the robustness of {\em maximal independent sets} (MIS). A graph
may or may not admit a robust MIS. We characterize the set of graphs \forallMIS
in which {\em all} MISs are robust. Then, we turn our attention to the graphs
that {\em admit} a robust MIS (\existsMIS). This class has a more complex
structure; we give a partial characterization in terms of elementary graph
properties, then a complete characterization by means of a (polynomial time)
decision algorithm that accepts if and only if a robust MIS exists. This
algorithm can be adapted to construct such a solution if one exists
- …