
    Transcription Factor-DNA Binding Via Machine Learning Ensembles

    We present ensemble methods in a machine learning (ML) framework combining predictions from five known motif/binding site exploration algorithms. For a given TF the ensemble starts with position weight matrices (PWMs) for the motif, collected from the component algorithms. Using dimension reduction, we identify significant PWM-based subspaces for analysis. Within each subspace a machine classifier is built for identifying the TF's gene (promoter) targets (Problem 1). These PWM-based subspaces form an ML-based sequence analysis tool. Problem 2 (finding binding motifs) is solved by agglomerating k-mer (string) feature PWM-based subspaces that stand out in identifying gene targets. We approach Problem 3 (binding sites) with a novel machine learning approach that uses promoter string features and ML importance scores in a classification algorithm locating binding sites across the genome. For target gene identification this method improves performance (measured by the F1 score) by about 10 percentage points over (a) the motif scanning method and (b) the coexpression-based association method. The top motif outperformed the 5 component algorithms as well as two other common algorithms (BEST and DEME). For identifying individual binding sites on a benchmark cross-species database (Tompa et al., 2005) we match the best performer without much human intervention. It also improved the performance on mammalian TFs. The ensemble can integrate orthogonal information from different weak learners (potentially using entirely different types of features) into a machine learner that can perform consistently better for more TFs. The TF gene target identification component (Problem 1 above) is useful in constructing a transcriptional regulatory network from known TF-target associations. The ensemble is easily extendable to include more tools as well as future PWM-based information. Comment: 33 pages
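For readers unfamiliar with the terms in this abstract, the following is a minimal sketch of two of its building blocks: scoring a promoter sequence against a PWM, and computing the F1 score used to report performance. It is not the authors' ensemble method. The function names, the pseudocount, the uniform background frequency of 0.25, and the toy counts are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into a log-odds PWM.

    counts: list of dicts, one per motif position, mapping base -> count.
    The pseudocount and uniform background are illustrative assumptions.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def scan(sequence, pwm):
    """Slide the PWM along the sequence; return (best_score, best_offset).

    Assumes the sequence contains only A, C, G, T and is at least as
    long as the motif.
    """
    width = len(pwm)
    best_score, best_offset = float("-inf"), -1
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score > best_score:
            best_score, best_offset = score, offset
    return best_score, best_offset


def f1_score(true_positives, false_positives, false_negatives):
    """F1 is the harmonic mean of precision and recall."""
    denominator = 2 * true_positives + false_positives + false_negatives
    return 2 * true_positives / denominator if denominator else 0.0


# Toy motif "TATA" built from made-up counts (hypothetical data).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
toy_pwm = log_odds_pwm(toy_counts)
best_score, best_offset = scan("GGCTATAGG", toy_pwm)
```

In a motif-scanning baseline of the kind the abstract compares against, a promoter would typically be called a target when its best window score exceeds a threshold; the threshold choice is outside the scope of this sketch.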

    A General Framework for Complex Network Applications

    Complex network theory has been applied to solving practical problems from different domains. In this paper, we present a general framework for complex network applications. The keys to a successful application are a thorough understanding of the real system and a correct mapping of complex network theory to practical problems in the system. Despite certain limitations discussed in this paper, complex network theory provides a foundation on which to develop powerful tools for analyzing and optimizing large interconnected systems. Comment: 8 pages

    Communities in Networks

    We survey some of the concepts, methods, and applications of community detection, which has become an increasingly important area of network science. To help ease newcomers into the field, we provide a guide to available methodology and open problems, and discuss why scientists from diverse backgrounds are interested in these problems. As a running theme, we emphasize the connections of community detection to problems in statistical physics and computational optimization. Comment: survey/review article on community structure in networks; published version is available at http://people.maths.ox.ac.uk/~porterm/papers/comnotices.pd

    Definition of Naturally Processed Peptides Reveals Convergent Presentation of Autoantigenic Topoisomerase I Epitopes in Scleroderma.

    Objective: Autoimmune responses to DNA topoisomerase I (topo I) are found in a subset of scleroderma patients who are at high risk for interstitial lung disease (ILD) and mortality. Anti-topo I antibodies (ATAs) are associated with specific HLA-DRB1 alleles, and the frequency of HLA-DR-restricted topo I-specific CD4+ T cells is associated with the presence, severity, and progression of ILD. Although this strongly implicates the presentation of topo I peptides by HLA-DR in scleroderma pathogenesis, the processing and presentation of topo I has not been studied. Methods: We developed a natural antigen processing assay (NAPA) to identify putative CD4+ T cell epitopes of topo I presented by monocyte-derived dendritic cells (mo-DCs) from 6 ATA-positive patients with scleroderma. Mo-DCs were pulsed with topo I protein, HLA-DR-peptide complexes were isolated, and eluted peptides were analyzed by mass spectrometry. We then examined the ability of these naturally presented peptides to induce CD4+ T cell activation in 11 ATA-positive and 11 ATA-negative scleroderma patients. Results: We found that a common set of 10 topo I epitopes was presented by mo-DCs from scleroderma patients with diverse HLA-DR variants. Sequence analysis revealed shared peptide-binding motifs within the HLA-DRβ chains of ATA-positive patients and a subset of topo I epitopes with distinct sets of anchor residues capable of binding to multiple different HLA-DR variants. The NAPA-derived epitopes elicited robust CD4+ T cell responses in 73% of ATA-positive patients (8 of 11), and the number of epitopes recognized correlated with ILD severity (P = 0.025). Conclusion: These findings mechanistically implicate the presentation of a convergent set of topo I epitopes in the development of scleroderma.