40 research outputs found

    ISMB/ECCB 2009 Stockholm

    The International Society for Computational Biology (ISCB; http://www.iscb.org) presents the Seventeenth Annual International Conference on Intelligent Systems for Molecular Biology (ISMB), organized jointly with the Eighth Annual European Conference on Computational Biology (ECCB; http://bioinf.mpi-inf.mpg.de/conferences/eccb/eccb.htm), in Stockholm, Sweden, 27 June to 2 July 2009. The organizers are putting the finishing touches on the year's premier computational biology conference, with an expected attendance of 1400 computer scientists, mathematicians, statisticians, biologists and scientists from other disciplines related to and reliant on this multi-disciplinary science. ISMB/ECCB 2009 (http://www.iscb.org/ismbeccb2009/) follows the framework introduced at ISMB/ECCB 2007 (http://www.iscb.org/ismbeccb2007/) in Vienna and further refined at ISMB 2008 (http://www.iscb.org/ismb2008/) in Toronto, a framework developed specifically to encourage increased participation from disciplines that are often under-represented at conferences on computational biology. During the main ISMB conference dates of 29 June to 2 July, keynote talks from highly regarded scientists, including ISCB Award winners, are the featured presentations that bring all attendees together twice a day. The remainder of each day offers a carefully balanced selection of parallel sessions to choose from: proceedings papers, special sessions on emerging topics, highlights of the past year's published research, special interest group meetings, technology demonstrations, workshops and several unique sessions of value to the broad audience of students, faculty and industry researchers. Several hundred posters displayed for the duration of the conference have become a standard of the ISMB and ECCB conference series, and an extensive commercial exhibition showcases the latest bioinformatics publications, software, hardware and services available on the market today.
    The main conference is preceded by 2 days of Special Interest Group (SIG) and Satellite meetings running in parallel to the fifth Student Council Symposium on 27 June, and in parallel to Tutorials on 28 June. All scientific sessions take place at the Stockholmsmässan/Stockholm International Fairs conference and exposition facility.

    Paving the future: finding suitable ISMB venues

    ISCB, the International Society for Computational Biology, organizes the largest event in the field of computational biology and bioinformatics, namely the annual ISMB, the international conference on Intelligent Systems for Molecular Biology. This year at ISMB 2012 in Long Beach, ISCB celebrated the 20th anniversary of its flagship meeting. ISCB is a young, lean and efficient society that aspires to make a significant impact with only limited resources. Many constraints make the choice of venues for ISMB a tough challenge. Here, we describe those challenges and invite the contribution of ideas for solutions

    Inferring gene ontologies from pairwise similarity data.

    Motivation: While the manually curated Gene Ontology (GO) is widely used, inferring a GO directly from -omics data is a compelling new problem. Recognizing that ontologies are a directed acyclic graph (DAG) of terms and hierarchical relations, algorithms are needed that: analyze a full matrix of gene-gene pairwise similarities from -omics data; infer true hierarchical structure in these data rather than enforcing hierarchy as a computational artifact; and respect biological pleiotropy, by which a term in the hierarchy can relate to multiple higher level terms. Methods addressing these requirements are just beginning to emerge, and none has been evaluated for GO inference. Methods: We consider two algorithms [Clique Extracted Ontology (CliXO), LocalFitness] that uniquely satisfy these requirements, compared with methods including standard clustering. CliXO is a new approach that finds maximal cliques in a network induced by progressive thresholding of a similarity matrix. We evaluate each method's ability to reconstruct the GO biological process ontology from a similarity matrix based on (a) semantic similarities for GO itself or (b) three -omics datasets for yeast. Results: For task (a) using semantic similarity, CliXO accurately reconstructs GO (>99% precision, recall) and outperforms other approaches (<20% precision, <20% recall). For task (b) using -omics data, CliXO outperforms other methods using two -omics datasets and achieves ∼30% precision and recall using YeastNet v3, similar to an earlier approach (Network Extracted Ontology) and better than LocalFitness or standard clustering (20-25% precision, recall). Conclusion: This study provides algorithmic foundation for building gene ontologies by capturing hierarchical and pleiotropic structure embedded in biomolecular data.
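
The thresholding idea in the abstract above can be illustrated with a small sketch. This is not the published CliXO algorithm, whose details the abstract does not give. It only shows, under assumptions, how lowering a threshold on a similarity matrix and listing maximal cliques at each level yields nested and overlapping gene sets. The gene names, similarity values, threshold list and all function names are hypothetical.

```python
def threshold_graph(sim, threshold):
    """Adjacency sets for gene pairs whose similarity is >= threshold."""
    genes = sorted({g for pair in sim for g in pair})
    adj = {g: set() for g in genes}
    for (a, b), s in sim.items():
        if s >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def candidate_terms(sim, thresholds):
    """Map each clique of size >= 2 to the highest threshold at which it appears."""
    seen = {}
    for t in sorted(thresholds, reverse=True):
        for clique in maximal_cliques(threshold_graph(sim, t)):
            if len(clique) >= 2 and clique not in seen:
                seen[clique] = t
    return seen


# Hypothetical pairwise similarities between four genes.
sim = {
    ("A", "B"): 0.9, ("A", "C"): 0.8, ("B", "C"): 0.8,
    ("C", "D"): 0.5, ("A", "D"): 0.1, ("B", "D"): 0.1,
}
terms = candidate_terms(sim, [0.9, 0.8, 0.5])
for clique, level in terms.items():
    print(sorted(clique), level)
```

In this toy input, gene C ends up in two different sets, {A, B, C} and {C, D}, which is the kind of multiple membership that a strict tree from standard clustering cannot represent.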

    Detection of allele-specific methylation through a generalized heterogeneous epigenome model

    Motivations: High-throughput sequencing has made it possible to sequence DNA methylation of a whole genome at the single-base resolution. A sample, however, may contain a number of distinct methylation patterns. For instance, cells of different types and in different developmental stages may have different methylation patterns. Alleles may be differentially methylated, which may partially explain that the large portions of epigenomes from single cell types are partially methylated, and may have major effects on transcriptional output. Approaches relying on DNA sequence polymorphism to identify individual patterns from a mixture of heterogeneous epigenomes are insufficient as methylcytosines occur at a much higher density than SNPs

    WordSpy: identifying transcription factor binding motifs by building a dictionary and learning a grammar

    Transcription factor (TF) binding sites or motifs (TFBMs) are functional cis-regulatory DNA sequences that play an essential role in gene transcriptional regulation. Although many experimental and computational methods have been developed, finding TFBMs remains a challenging problem. We propose and develop a novel dictionary based motif finding algorithm, which we call WordSpy. One significant feature of WordSpy is the combination of a word counting method and a statistical model which consists of a dictionary of motifs and a grammar specifying their usage. The algorithm is suitable for genome-wide motif finding; it is capable of discovering hundreds of motifs from a large set of promoters in a single run. We further enhance WordSpy by applying gene expression information to separate true TFBMs from spurious ones, and by incorporating negative sequences to identify discriminative motifs. In addition, we also use randomly selected promoters from the genome to evaluate the significance of the discovered motifs. The output from WordSpy consists of an ordered list of putative motifs and a set of regulatory sequences with motif binding sites highlighted. The web server of WordSpy is available at

    IMMUNOCAT—A Data Management System for Epitope Mapping Studies

    To enable rational vaccine design, studies of molecular and cellular mechanisms of immune recognition need to be linked with clinical studies in humans. A major challenge in conducting such translational research studies lies in the management and integration of large amounts and various types of data collected from multiple sources. For this purpose, we have established “IMMUNOCAT”, an interactive data management system for the epitope discovery research projects conducted by our group. The system provides functions to store, query, and analyze clinical and experimental data, enabling efficient, systematic, and integrative data management. We demonstrate how IMMUNOCAT is utilized in a large-scale research contract that aims to identify epitopes in common allergens recognized by T cells from human donors, in order to facilitate the rational design of allergy vaccines. At clinical sites, demographic information and disease history of each enrolled donor are captured, followed by results of an allergen skin test and blood draw. At the laboratory site, T cells derived from blood samples are tested for reactivity against a panel of peptides derived from common human allergens. IMMUNOCAT stores results from these T cell assays along with MHC:peptide binding data, results from RAST tests for antibody titers in donor serum, and the respective donor HLA typing results. Through this system, we are able to perform queries and integrated analyses of the various types of data. This provides a case study for the use of bioinformatics and information management techniques to track and analyze data produced in a translational research study aimed at epitope identification.

    Department of Computer Science Activity 1998-2004

    This report summarizes much of the research and teaching activity of the Department of Computer Science at Dartmouth College between late 1998 and late 2004. The material for this report was collected as part of the final report for NSF Institutional Infrastructure award EIA-9802068, which funded equipment and technical staff during that six-year period. This equipment and staff supported essentially all of the department's research activity during that period.

    Parallel Minimum Cuts in Near-linear Work and Low Depth

    We present the first near-linear work and poly-logarithmic depth algorithm for computing a minimum cut in a graph, while previous parallel algorithms with poly-logarithmic depth required at least quadratic work in the number of vertices. In a graph with $n$ vertices and $m$ edges, our algorithm computes the correct result with high probability in $O(m \log^4 n)$ work and $O(\log^3 n)$ depth. This result is obtained by parallelizing a data structure that aggregates weights along paths in a tree and by exploiting the connection between minimum cuts and approximate maximum packings of spanning trees. In addition, our algorithm improves upon bounds on the number of cache misses incurred to compute a minimum cut.

    A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery

    Functional relationships between proteins that do not share global structure similarity can be established by detecting their ligand-binding-site similarity. For a large-scale comparison, it is critical to accurately and efficiently assess the statistical significance of this similarity. Here, we report an efficient statistical model that supports local sequence order independent ligand-binding-site similarity searching. Most existing statistical models only take into account the matching vertices between two sites that are defined by a fixed number of points. In reality, the boundary of the binding site is not known or is dependent on the bound ligand, which limits these approaches. To address these shortcomings and to perform binding-site mapping on a genome-wide scale, we developed a sequence-order independent profile–profile alignment (SOIPPA) algorithm that is able to detect local similarity between unknown binding sites a priori. The SOIPPA scoring integrates geometric, evolutionary and physical information into a unified framework. However, this imposes a significant challenge in assessing the statistical significance of the similarity because the conventional probability model that is based on fixed-point matching cannot be applied. Here we find that scores for binding-site matching by SOIPPA follow an extreme value distribution (EVD). Benchmark studies show that the EVD model is at least two orders of magnitude faster and is more accurate than the non-parametric statistical method in the previous SOIPPA version. Efficient statistical analysis makes it possible to apply SOIPPA to genome-based drug discovery. Consequently, we have applied the approach to the structural genome of Mycobacterium tuberculosis to construct a protein–ligand interaction network. The network reveals highly connected proteins, which represent suitable targets for promiscuous drugs.
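
To make the significance estimate in the abstract above concrete, here is a minimal sketch of scoring a match against an extreme value distribution. The abstract does not say which EVD family or fitting procedure SOIPPA uses, so the Gumbel (type I) form, the method-of-moments fit, the function names and the background scores are all assumptions for illustration.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel(scores):
    """Method-of-moments estimate of Gumbel location (mu) and scale (beta)."""
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)
    beta = std * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def gumbel_p_value(score, mu, beta):
    """P(S >= score) for a Gumbel distribution: 1 - exp(-exp(-z))."""
    z = (score - mu) / beta
    # expm1 keeps precision when the tail probability is very small.
    return -math.expm1(-math.exp(-z))


# Hypothetical background scores from unrelated binding-site pairs.
background = [3.1, 2.8, 3.5, 4.0, 2.9, 3.3, 3.7, 3.0]
mu, beta = fit_gumbel(background)
print(gumbel_p_value(6.0, mu, beta))
```

A parametric fit like this needs only two parameters per background set, which is one plausible reason such a model can be much faster to evaluate than a resampling-based estimate.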

    Systematic identification of conserved motif modules in the human genome

    Background: The identification of motif modules, groups of multiple motifs frequently occurring in DNA sequences, is one of the most important tasks necessary for annotating the human genome. Current approaches to identifying motif modules are often restricted to searches within promoter regions or rely on multiple genome alignments. However, the promoter regions only account for a limited number of locations where transcription factor binding sites can occur, and multiple genome alignments often cannot align binding sites with their true counterparts because of the short and degenerative nature of these transcription factor binding sites. Results: To identify motif modules systematically, we developed a computational method for the entire non-coding regions around human genes that does not rely upon the use of multiple genome alignments. First, we selected orthologous DNA blocks approximately 1-kilobase in length based on discontiguous sequence similarity. Next, we scanned the conserved segments in these blocks using known motifs in the TRANSFAC database. Finally, a frequent pattern mining technique was applied to identify motif modules within these blocks. In total, with a false discovery rate cutoff of 0.05, we predicted 3,161,839 motif modules, 90.8% of which are supported by various forms of functional evidence. Compared with experimental data from 14 ChIP-seq experiments, on average, our methods predicted 69.6% of the ChIP-seq peaks with TFBSs of multiple TFs. Our findings also show that many motif modules have distance preference and order preference among the motifs, which further supports the functionality of these predictions. Conclusions: Our work provides a large-scale prediction of motif modules in mammals, which will facilitate the understanding of gene regulation in a systematic way.
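
The final step described above, mining motif sets that recur across blocks, can be sketched as follows. The abstract does not name the mining technique used, so this brute-force count of co-occurring motif sets is a stand-in for illustration only. The motif names, block contents, support threshold and function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_motif_sets(blocks, min_support, max_size=3):
    """Count motif sets (size 2..max_size) co-occurring in at least min_support blocks."""
    counts = Counter()
    for motifs in blocks:
        unique = sorted(set(motifs))  # each block contributes at most once per set
        for k in range(2, max_size + 1):
            for combo in combinations(unique, k):
                counts[combo] += 1
    return {combo: n for combo, n in counts.items() if n >= min_support}


# Hypothetical motif hits found in four conserved blocks.
blocks = [
    ["SP1", "NFKB", "AP1"],
    ["SP1", "NFKB"],
    ["AP1", "SP1", "NFKB"],
    ["AP1", "MYC"],
]
modules = frequent_motif_sets(blocks, min_support=3)
print(modules)  # {('NFKB', 'SP1'): 3}
```

Exhaustive enumeration grows quickly with the number of motifs per block, so a genome-scale run would need a pruning strategy. The sketch ignores motif order and spacing, which the abstract reports as informative.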