Transcription Factor-DNA Binding Via Machine Learning Ensembles
We present ensemble methods in a machine learning (ML) framework combining
predictions from five known motif/binding site exploration algorithms. For a
given TF, the ensemble starts with position weight matrices (PWMs) for the
motif, collected from the component algorithms. Using dimension reduction, we
identify significant PWM-based subspaces for analysis. Within each subspace a
machine classifier is built for identifying the TF's gene (promoter) targets
(Problem 1). These PWM-based subspaces form an ML-based sequence analysis tool.
Problem 2 (finding binding motifs) is solved by agglomerating k-mer (string)
feature PWM-based subspaces that stand out in identifying gene targets. We
approach Problem 3 (binding sites) with a novel machine learning approach that
uses promoter string features and ML importance scores in a classification
algorithm locating binding sites across the genome. For target gene
identification, this method improves performance (measured by the F1 score) by
about 10 percentage points over (a) the motif scanning method and (b) the
coexpression-based association method. The top motif outperformed the five
component algorithms as well as two other common algorithms (BEST and DEME). For
identifying individual binding sites on a benchmark cross-species database
(Tompa et al., 2005), we match the best performer without much human
intervention. The ensemble also improved performance on mammalian TFs.
The ensemble can integrate orthogonal information from different weak
learners (potentially using entirely different types of features) into a
machine learner that can perform consistently better for more TFs. The TF gene
target identification component (Problem 1 above) is useful in constructing a
transcriptional regulatory network from known TF-target associations. The
ensemble is easily extendable to include more tools as well as future PWM-based
information.
Comment: 33 pages
A General Framework for Complex Network Applications
Complex network theory has been applied to solving practical problems from
different domains. In this paper, we present a general framework for complex
network applications. The keys to a successful application are a thorough
understanding of the real system and a correct mapping of complex network
theory to practical problems in the system. Despite certain limitations
discussed in this paper, complex network theory provides a foundation on which
to develop powerful tools in analyzing and optimizing large interconnected
systems.
Comment: 8 pages
Communities in Networks
We survey some of the concepts, methods, and applications of community
detection, which has become an increasingly important area of network science.
To help ease newcomers into the field, we provide a guide to available
methodology and open problems, and discuss why scientists from diverse
backgrounds are interested in these problems. As a running theme, we emphasize
the connections of community detection to problems in statistical physics and
computational optimization.
Comment: survey/review article on community structure in networks; published
version is available at
http://people.maths.ox.ac.uk/~porterm/papers/comnotices.pd
Definition of Naturally Processed Peptides Reveals Convergent Presentation of Autoantigenic Topoisomerase I Epitopes in Scleroderma.
Objective: Autoimmune responses to DNA topoisomerase I (topo I) are found in a subset of scleroderma patients who are at high risk for interstitial lung disease (ILD) and mortality. Anti-topo I antibodies (ATAs) are associated with specific HLA-DRB1 alleles, and the frequency of HLA-DR-restricted topo I-specific CD4+ T cells is associated with the presence, severity, and progression of ILD. Although this strongly implicates the presentation of topo I peptides by HLA-DR in scleroderma pathogenesis, the processing and presentation of topo I has not been studied.
Methods: We developed a natural antigen processing assay (NAPA) to identify putative CD4+ T cell epitopes of topo I presented by monocyte-derived dendritic cells (mo-DCs) from 6 ATA-positive patients with scleroderma. Mo-DCs were pulsed with topo I protein, HLA-DR-peptide complexes were isolated, and eluted peptides were analyzed by mass spectrometry. We then examined the ability of these naturally presented peptides to induce CD4+ T cell activation in 11 ATA-positive and 11 ATA-negative scleroderma patients.
Results: We found that a common set of 10 topo I epitopes was presented by mo-DCs from scleroderma patients with diverse HLA-DR variants. Sequence analysis revealed shared peptide-binding motifs within the HLA-DRβ chains of ATA-positive patients and a subset of topo I epitopes with distinct sets of anchor residues capable of binding to multiple different HLA-DR variants. The NAPA-derived epitopes elicited robust CD4+ T cell responses in 73% of ATA-positive patients (8 of 11), and the number of epitopes recognized correlated with ILD severity (P = 0.025).
Conclusion: These findings mechanistically implicate the presentation of a convergent set of topo I epitopes in the development of scleroderma