137 research outputs found

    Computational Molecular Biology

    Computational Biology is a fairly new subject that arose in response to the computational problems posed by the analysis and the processing of biomolecular sequence and structure data. The field was initiated in the late 60's and early 70's largely by pioneers working in the life sciences. Physicists and mathematicians entered the field in the 70's and 80's, while Computer Science became involved with the new biological problems in the late 1980's. Computational problems have gained further importance in molecular biology through the various genome projects which produce enormous amounts of data. For this bibliography we focus on those areas of computational molecular biology that involve discrete algorithms or discrete optimization. We thus neglect several other areas of computational molecular biology, like most of the literature on the protein folding problem, as well as databases for molecular and genetic data, and genetic mapping algorithms. Due to the availability of review papers and a bibliography this bibliography

    Identification of mammalian orthologs using local synteny

    Background: Accurate determination of orthology is central to comparative genomics. For vertebrates in particular, very large gene families, high rates of gene duplication and loss, multiple mechanisms of gene duplication, and high rates of retrotransposition all combine to make inference of orthology between genes difficult. Many methods have been developed to identify orthologous genes, mostly based upon analysis of the inferred protein sequence of the genes. More recently, methods have been proposed that use genomic context in addition to protein sequence to improve orthology assignment in vertebrates. Such methods have been most successfully implemented in fungal genomes and have long been used in prokaryotic genomes, where gene order is far less variable than in vertebrates. However, to our knowledge, no explicit comparison of synteny and sequence based definitions of orthology has been reported in vertebrates, or, more specifically, in mammals. Results: We test a simple method for the measurement and utilization of gene order (local synteny) in the identification of mammalian orthologs by investigating the agreement between coding sequence based orthology (Inparanoid) and local synteny based orthology. In the 5 mammalian genomes studied, 93% of the sampled inter-species pairs were found to be concordant between the two orthology methods, illustrating that local synteny is a robust substitute to coding sequence for identifying orthologs. However, 7% of pairs were found to be discordant between local synteny and Inparanoid. These cases of discordance result from evolutionary events including retrotransposition and genome rearrangements. Conclusions: By analyzing cases of discordance between local synteny and Inparanoid we show that local synteny can distinguish between true orthologs and recent retrogenes, can resolve ambiguous many-to-many orthology relationships into one-to-one ortholog pairs, and might be used to identify cases of non-orthologous gene displacement by retroduplicated paralogs.

    Comparing De Novo Genome Assembly: The Long and Short of It

    Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled a growing interest in the whole-genome sequence assembly (WGSA) problem, thereby, inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics to compare the assembled sequences emphasize only size, poorly capturing the contig quality and accuracy. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) as well as a more comprehensive metric (Feature-Response Curves, FRC) that is introduced here; FRC transparently captures the trade-offs between contigs' quality against their sizes. For this purpose, most of the publicly available major sequence assemblers – both for low-coverage long (Sanger) and high-coverage short (Illumina) reads technologies – are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read-lengths, coverages, accuracies, and with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing “next-generation” assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium
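    For readers unfamiliar with the N50 metric named in the abstract above, the following is a minimal sketch of the commonly used definition: the length of the contig at which the running total of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the sample lengths are illustrative assumptions, not taken from the paper, and the paper's FRC metric is not reproduced here.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Illustrative helper (name and interface are assumptions): walks the
    contigs from longest to shortest and returns the length of the contig
    at which the cumulative length first reaches half of the total.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths, for illustration only.
example_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_contigs))
```

    Because the value depends only on contig lengths, a misassembled contig raises N50 just as much as a correct one of the same size, which is the kind of size-only bias the abstract objects to.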

    H3K56me3 is a novel, conserved heterochromatic mark that largely but not completely overlaps with H3K9me3 in both regulation and localization.

    Histone lysine (K) methylation has been shown to play a fundamental role in modulating chromatin architecture and regulation of gene expression. Here we report on the identification of histone H3K56, located at the pivotal, nucleosome DNA entry/exit point, as a novel methylation site that is evolutionarily conserved. We identify trimethylation of H3K56 (H3K56me3) as a modification that is present during all cell cycle phases, with the exception of S-phase, where it is underrepresented on chromatin. H3K56me3 is a novel heterochromatin mark, since it is enriched at pericentromeres but not telomeres and is thereby similar, but not identical, to the localization of H3K9me3 and H4K20me3. Possibly due to H3 sequence similarities, Suv39h enzymes, responsible for trimethylation of H3K9, also affect methylation of H3K56. Similarly, we demonstrate that trimethylation of H3K56 is removed by members of the JMJD2 family of demethylases that also target H3K9me3. Furthermore, we identify and characterize mouse mJmjd2E and its human homolog hKDM4L as novel, functionally active enzymes that catalyze the removal of two methyl groups from trimethylated H3K9 and K56. H3K56me3 is also found in C. elegans, where it co-localizes with H3K9me3 in most, but not all, tissues. Taken together, our findings raise interesting questions regarding how methylation of H3K9 and H3K56 is regulated in different organisms and their functional roles in heterochromatin formation and/or maintenance.

    DNA typing of the human small intestinal protozoan parasite Giardia lamblia

    PhD thesis. At present there is no satisfactory means of typing strains of Giardia lamblia which can explain the broad range of clinical symptoms seen in giardiasis or which can identify genotypes in epidemiological studies. This thesis attempts to address these problems by developing DNA based typing systems sensitive enough to be able to identify many different Giardia genotypes and which may be applied to the organisms found in clinical samples. Four different techniques were assessed for their ability to identify multiple polymorphic loci in the Giardia genome which may be used to genotype and identify isolates of Giardia and upon which the future development of PCR-typing protocols may be based. These techniques included RFLP analysis, random amplified polymorphic DNA (RAPD) analysis, M13 DNA fingerprinting and minisatellite DNA fingerprinting. Minisatellite DNA fingerprinting proved to be the most discriminatory, recognising many hypervariable loci within the Giardia genome which proved useful for in vitro studies on genotypic heterogeneity within Giardia isolates. This approach would require further development in order to be used on in vivo infections where it could directly assess the relationship between genotype and pathogenicity. Therefore the variable repeats recognised on Giardia fingerprints were sought by constructing and screening a Giardia genomic DNA cosmid library. Once cloned these repeats would form the basis of sensitive and specific PCR-based fingerprinting protocols ideal for typing large numbers of infections. The repeat sequences cloned in this way turned out to be Giardia variable surface protein genes with short, imperfect tandem repeats in their 3' flanking DNA. This work has important implications for the future development and use of fingerprinting techniques on Giardia and may be useful in the study of chromosome rearrangement in Giardia which is likely to be involved in surface antigen switching.

    Genome evolution in Reptilia: in silico chicken mapping of 12,000 BAC-end sequences from two reptiles and a basal bird

    Background: With the publication of the draft chicken genome and the recent production of several BAC clone libraries from non-avian reptiles and birds, it is now possible to undertake more detailed comparative genomic studies in Reptilia. Of interest in particular are the genomic events that transformed the large, repeat-rich genomes of mammals and non-avian reptiles into the minimalist chicken genome. We have used paired BAC end sequences (BESs) from the American alligator (Alligator mississippiensis), painted turtle (Chrysemys picta) and emu (Dromaius novaehollandiae) to investigate patterns of sequence divergence, gene and retroelement content, and microsynteny between these species and chicken. Results: From a total of 11,967 curated BESs, we successfully mapped 725, 773 and 2597 sequences in alligator, turtle, and emu, respectively, to sites in the draft chicken genome using a stringent BLAST protocol. Most commonly, sequences mapped to a single site in the chicken genome. Of 1675, 1828 and 2936 paired BESs obtained for alligator, turtle, and emu, respectively, a total of 34 (alligator, 2%), 24 (turtle, 1.3%) and 479 (emu, 16.3%) pairs were found to map with high confidence and in the correct orientation and with BAC-sized intermarker distances to single chicken chromosomes, including 25 such paired hits in emu mapping to the chicken Z chromosome. By determining the insert sizes of a subset of BAC clones from these three species, we also found a significant correlation between the intermarker distance in alligator and turtle and in chicken, with slopes as expected on the basis of the ratio of the genome sizes. Conclusion: Our results suggest that a large number of small-scale chromosomal rearrangements and deletions in the lineage leading to chicken have drastically reduced the number of detected syntenies observed between the chicken and alligator, turtle, and emu genomes and imply that small deletions occurring widely throughout the genomes of reptilian and avian ancestors led to the ~50% reduction in genome size observed in birds compared to reptiles. We have also mapped and identified likely gene regions in hundreds of new BAC clones from these species.
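    As a quick arithmetic check on the paired-BES figures quoted in the abstract above, the sketch below recomputes the percentage of pairs mapped per species from the counts given there. The dictionary layout and the helper name `percent_mapped` are illustrative assumptions; only the counts come from the abstract.

```python
# Counts quoted in the abstract: (confidently mapped pairs, total paired BESs).
paired_bes = {
    "alligator": (34, 1675),
    "turtle": (24, 1828),
    "emu": (479, 2936),
}


def percent_mapped(mapped, total):
    """Percentage of paired BESs mapped, rounded to one decimal place.

    Hypothetical helper for illustration only.
    """
    return round(100 * mapped / total, 1)


for species, (mapped, total) in paired_bes.items():
    print(f"{species}: {percent_mapped(mapped, total)}% of {total} pairs")
```

    The recomputed values agree with the rounded percentages in the abstract (about 2%, 1.3% and 16.3%).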

    Next-Generation Sequencing: Acquisition, Analysis, and Assembly

    The process of sequencing a genome involves many steps, and accordingly, this project contains work from each of those steps. Genome sequencing begins with acquisition of sequence data; therefore, a novel biochemistry was utilized and optimized for the Sequencing By Ligation (SBL) process. A cyclic SBL protocol was created that could be utilized to extend sequencing reads in both the 5' and 3' directions, for an increase in read length and throughput. After sequence acquisition, there is the process of data analysis, and the focus shifted to creating software that could take sequence information and match up the individual reads to a reference genome with greater speed and efficiency than other commonly used software. The Sequence Analysis Workbench Tool, SAWTooth, was written and shown to outperform contemporaries NOVOAlign and BOWTIE. Finally, the last aspect of genome sequencing is de novo assembly, prompting a comparative analysis of three assemblers: CLC Genomics Workbench, Velvet Assembler, and MIRA. Results were generated using Mauve to assess the general effects of different sequencing platforms on the final assembly.

    On the role of metaheuristic optimization in bioinformatics

    Metaheuristic algorithms are employed to solve complex and large-scale optimization problems in many different fields, from transportation and smart cities to finance. This paper discusses how metaheuristic algorithms are being applied to solve different optimization problems in the area of bioinformatics. While the text provides references to many optimization problems in the area, it focuses on those that have attracted more interest from the optimization community. Among the problems analyzed, the paper discusses in more detail the molecular docking problem, the protein structure prediction, phylogenetic inference, and different string problems. In addition, references to other relevant optimization problems are also given, including those related to medical imaging or gene selection for classification. From the previous analysis, the paper generates insights on research opportunities for the Operations Research and Computer Science communities in the field of bioinformatics

    Discovery of novel molecular and biochemical predictors of response and outcome in diffuse large B-cell lymphoma

    PhD thesis. Diffuse large B-cell lymphoma (DLBCL) is the commonest form of non-Hodgkin lymphoma and responds to treatment with a 5-year overall survival (OS) of 40-50%. Predicting outcome using the best available method, the International Prognostic Index (IPI), is inaccurate and unsatisfactory. This thesis describes research undertaken to discover, explore and validate new molecular and biochemical predictors of response and long-term outcome with the aims of improving on the inaccurate IPI and of suggesting novel therapeutic approaches. Two strategies were adopted: a rational and an empirical approach. The rational strategy used gene expression profiling to identify transcriptional signatures that correlated with outcome to treatment and from which a 13-gene model accurately predicted long-term OS. Two components of the 13-gene model, PKC and PDE4B, were studied using inhibitors in lymphoma cell-lines and primary cell cultures. PKC inhibition using SC-236 proved to be cytostatic and cytotoxic in the cell-lines examined and to a lesser extent in primary tumours. PDE4 inhibition using piclamilast and rolipram had no effect either alone or in combination with chemotherapy. The empirical approach investigated the trace element selenium in presentation serum and found that it was a biochemical predictor of response and outcome to treatment. In an attempt to provide evidence of a causal relationship as an explanation for the associations between presentation serum selenium, response and outcome, two selenium compounds, methylseleninic acid (MSA) and selenodiglutathione (SDG) were studied in vitro in the same lymphoma cell-lines and primary cell cultures. Both MSA and SDG exhibited cytostatic and cytotoxic activity and caspase-8 and caspase-9 driven apoptosis.
For SDG reactive oxygen species generation was important for its activity in three of the four cell-lines. In conclusion, molecular and biochemical predictors of response and survival were discovered in DLBCL that led to viable targets for drug intervention being validated in vitro

    Loss of DCC gene expression during ovarian tumorigenesis: relation to tumour differentiation and progression

    To clarify the possible role of DCC gene alteration in ovarian neoplasias, we immunohistochemically investigated 124 carcinomas, as well as 55 cystadenomas and 41 low malignant potential (LMP) tumours and compared the results with those for p53 protein expression, clinicopathological factors and survival. A combination of the reverse transcription polymerase chain reaction (RT-PCR) and Southern blot hybridization (SBH) for DCC mRNA levels was also carried out on 26 malignant, five LMP, eight benign and seven normal ovarian samples. Significantly decreased levels of overall DCC values in carcinomas compared with benign and LMP lesions were revealed by both immunohistochemical and RT-PCR/SBH assays. Similar findings were also noted when subdivision was into serous and mucinous categories. In carcinomas, reduction or loss of DCC expression was significantly related to the serous phenotype (serous vs non-serous, P< 0.0001), a high histological grade (grade 1 vs 2 or 3, P< 0.02) and a more advanced stage (FIGO stage I vs II/III/IV, P = 0.0083), while no association was noted with survival. Although p53 immunopositivity demonstrated significant stepwise increase from benign through to malignant lesions, there was no clear association with DCC score values. The results indicated that impaired DCC expression may play an important role in ovarian tumorigenesis. In ovarian carcinomas, the altered expression is closely linked with tumour differentiation and progression. © 2000 Cancer Research Campaig