243 research outputs found

Proceedings of the 1st Computer Science Student Workshop: Koc University Istinye Campus, Istanbul, Turkey, February 21, 2010

Publication venue: Sabancı University
Publication date: 01/01/2010

Robust Algorithms for Detecting Hidden Structure in Biological Data

Author: Sloutsky Roman
Publication venue: Washington University Open Scholarship
Publication date: 15/08/2017

Biological data, such as molecular abundance measurements and protein sequences, harbor complex hidden structure that reflects their underlying biological mechanisms. For example, high-throughput abundance measurements provide a snapshot of the global state of a living cell, while homologous protein sequences encode the residue-level logic of the proteins' function and provide a snapshot of the evolutionary trajectory of the protein family. In this work I describe algorithmic approaches and analysis software I developed for uncovering hidden structure in both kinds of data. Clustering is an unsupervised machine learning technique commonly used to map the structure of data collected in high-throughput experiments, such as quantification of gene expression by DNA microarrays or short-read sequencing. Clustering algorithms always yield a partitioning of the data, but relying on a single partitioning solution can lead to spurious conclusions. In particular, noise in the data can cause objects to fall into the same cluster by chance rather than due to meaningful association. In the first part of this thesis I demonstrate approaches to clustering data robustly in the presence of noise and apply robust clustering to analyze the transcriptional response to injury in a neuron cell. In the second part of this thesis I describe identifying hidden specificity-determining residues (SDPs) from alignments of protein sequences descended through gene duplication from a common ancestor (paralogs) and apply the approach to identify numerous putative SDPs in bacterial transcription factors in the LacI family. Finally, I describe and demonstrate a new algorithm for reconstructing the history of duplications by which paralogs descended from their common ancestor.
This algorithm addresses the complexity of such reconstruction due to indeterminate or erroneous homology assignments made by sequence alignment algorithms and to the vast prevalence of divergence through speciation over divergence through gene duplication in protein evolution.

Washington University St. Louis: Open Scholarship

A new method for identifying site-specific evolutionary rates and its applications.

Author: Cummins Carla A.
Publication date: 01/10/2011

In this thesis, I discuss each stage in the development of a new method for identifying site-specific evolutionary rates, from conception of the idea, through the implementation, to its application to data. TIGER, or tree independent generation of evolutionary rates, is based largely on the works of LeQuesne (1989), Wilkinson (1998) and Pisani (2004) and on the premise that each site in a multi-state character matrix can be scored based on the level of agreement it displays with the other sites. In these earlier studies, however, agreement was measured in a binary manner: sites were either compatible with each other or they were not. TIGER allows various degrees of agreement to occur between two sites, allowing it to pick up more subtle signals in the data. After the method was implemented in a software program, it could be applied to data. Using a combination of simulated and empirical datasets, TIGER was shown to produce desirable results. In particular, removal of sites identified by TIGER was shown to improve phylogenetic reconstruction of deeply diverging lineages and of taxa displaying compositional attraction. Additionally, TIGER was applied to a gene content matrix in order to identify HGT signals and integrated into the analysis of a current phylogenetic problem, the origin of the mitochondria. Although it is widely accepted that eukaryotes have a chimeric genome, the specific “parent” of the mitochondria is, as of yet, unclear. Previous studies have failed to reach agreement regarding this issue for a number of reasons. Exploration of the signals using TIGER and heterogeneous modelling reveals that multiple signals and compositional heterogeneity are among the biggest problems with datasets containing both mitochondrial and α-proteobacterial sequences.
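
The idea of graded (rather than binary) agreement between sites can be illustrated with a short sketch. This is not the TIGER program itself: it assumes that the agreement of site j with site i is the fraction of site j's character-state groups that fit entirely inside one group of site i, and that a site's score is its mean agreement with all other sites. The function names are hypothetical.

```python
from collections import defaultdict

def site_partition(column):
    """Group taxon indices by the character state they carry at one site."""
    groups = defaultdict(set)
    for taxon, state in enumerate(column):
        groups[state].add(taxon)
    return list(groups.values())

def agreement(part_i, part_j):
    """Fraction of groups in part_j contained in some group of part_i (assumed measure)."""
    hits = sum(any(g <= h for h in part_i) for g in part_j)
    return hits / len(part_j)

def site_scores(alignment):
    """Mean agreement of each site with every other site; needs at least two sites."""
    n_sites = len(alignment[0])
    parts = [site_partition([seq[k] for seq in alignment]) for k in range(n_sites)]
    scores = []
    for i in range(n_sites):
        others = [agreement(parts[i], parts[j]) for j in range(n_sites) if j != i]
        scores.append(sum(others) / len(others))
    return scores

# Toy alignment: four taxa, three sites. Sites 0 and 1 split the taxa
# identically; site 2 splits them in a conflicting way.
toy = ["AAC", "AAG", "CCC", "CCG"]
print(site_scores(toy))
```

Under this assumed measure, a site whose grouping of taxa conflicts with most other sites receives a low score, while a constant site agrees fully with every other site.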

MURAL - Maynooth University Research Archive Library

AutoCoEv-A High-Throughput In Silico Pipeline for Predicting Inter-Protein Coevolution

Author: Awoniyi Luqman O.
Balc M. Özge
Mattila Pieta K.
Petrov Petar B.
Šuštar Vid
Publication venue: MDPI AG
Publication date: 01/03/2022

Protein-protein interactions govern cellular processes via complex regulatory networks, which are still far from being understood. Thus, identifying and understanding connections between proteins can significantly facilitate our comprehension of the mechanistic principles of protein functions. Coevolution between proteins is a sign of functional communication and, as such, provides a powerful approach to search for novel direct or indirect molecular partners. However, an evolutionary analysis of large arrays of proteins in silico is a highly time-consuming effort that has limited the usage of this method to protein pairs or small protein groups. Here, we developed AutoCoEv, a user-friendly, open source, computational pipeline for the search of coevolution between a large number of proteins. By driving 15 individual programs, culminating in CAPS2 as the software for detecting coevolution, AutoCoEv achieves a seamless automation and parallelization of the workflow. Importantly, we provide a patch to the CAPS2 source code to strengthen its statistical output, allowing for multiple comparison corrections and an enhanced analysis of the results. We apply the pipeline to inspect coevolution among 324 proteins identified to be located in the vicinity of the lipid rafts of B lymphocytes. We successfully detected multiple coevolutionary relations between the proteins, predicting many novel partners and previously unidentified clusters of functionally related molecules. We conclude that AutoCoEv can be used to predict functional interactions from large datasets in a time- and cost-efficient manner.
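
To illustrate what a multiple comparison correction does when many protein pairs are tested at once, here is a minimal sketch of the Benjamini-Hochberg adjustment. The abstract does not say which correction AutoCoEv or the patched CAPS2 applies, so the choice of Benjamini-Hochberg, the function name, and the p-values below are assumptions made purely for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for four protein pairs.
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
```

Pairs whose adjusted value stays below a chosen false discovery rate would be retained; the threshold itself is a separate analysis decision.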

Directory of Open Access Journals

PubMed Central

UTUPub

Fusion genes in breast cancer

Author: Batty Elizabeth
Publication venue: University of Cambridge
Publication date: 07/02/2012

Fusion genes caused by chromosomal rearrangements are a common and important feature in haematological malignancies, but have until recently been seen as unimportant in epithelial cancers. The discovery of recurrent fusion genes in prostate and lung cancer suggests that fusion genes may play an important role in epithelial carcinogenesis, and that they have been previously under-reported due to the difficulties of cytogenetic analysis of solid tumours. In particular, breast cancers often have complex, highly rearranged karyotypes which have proved difficult to analyse using classical cytogenetic techniques. The aim of this project was to search for fusion genes in breast cancer by using high-resolution mapping of chromosome rearrangements in breast cancer cell lines. Mapping the chromosome rearrangements was initially done using high-resolution DNA microarrays and fluorescence in situ hybridisation, but moved to high-throughput sequencing as it became available. Interesting candidate genes identified from the mapped chromosome rearrangements were investigated on a larger set of cell lines and primary tumours. The complete karyotypes of two breast cancer cell lines were constructed using a combination of microarrays, fluorescence microscopy, and high-throughput sequencing. A number of potential fusion genes were identified in these two cell lines. Although no expressed fusion genes were found, the complete karyotypes gave insight into the number and mechanisms of chromosome rearrangement in breast cancer, and identified interesting candidate genes which may be of importance in tumourigenesis. Two genes which were fused in other breast cancer cell lines, BCAS3 and ODZ4, were disrupted by chromosome rearrangements and identified as interesting candidate genes in tumourigenesis.
A bioinformatic pipeline to process high-throughput sequencing data was set up and validated. It was shown to predict fusion genes more accurately than other methods, and can be used to investigate further cell lines and tumours for recurrent fusion genes. The pipeline was used to analyse data from 3 other breast cancer cell lines and predict chromosomal rearrangements and fusion genes, several of which were found to be expressed. Of the fusions predicted in the cell line ZR-75-30, 7 expressed fusion genes were identified, and may have functional significance in breast cancer. This work was supported by a grant from Breast Cancer Campaign.

Apollo (Cambridge)

Bioinformatics and Next Generation Sequencing: Applications of Arthropod Genomes

Author: Zhang Zaichao
Publication venue: Scholarship@Western
Publication date: 22/09/2017

Over the past decade, Next Generation Sequencing (NGS) technology has been broadly applied in many areas such as genomics, medical diagnosis, biotechnology, virology, biological systematics, forensic biology, and anthropology. Taken together, these applications have offered us valuable insights into the life sciences. Most of the work presented in this thesis describes NGS applications in genome assembly, genome annotation, and comparative genomics, using arthropods as case studies: (1) by sequencing and analyzing the genomes of three Tetranychus spider mites with three completely different feeding behaviors, we uncovered genomic signature variations indicative of pest adaptations; (2) we sequenced, assembled and annotated five Brevipalpus flat mite genomes and their corresponding endosymbiont Cardinium genomes, and comparative genomics reveals herbivorous pest adaptations and parthenogenesis; (3) the complete genomic analysis of the parasitoid wasp Copidosoma floridanum indicates the mechanism of polyembryony in this primary parasite of moths. Through bioinformatics and genomics approaches, my study provides the genomic basis and establishes hypotheses for future biological research on pests and arthropods. These NGS applications of arthropod genomes will offer new insights into arthropod evolution and plant-herbivore interactions, open unique opportunities to develop novel plant protection strategies, and provide arthropod genomic resources.

Scholarship@Western

Probabilistic Protein Design, Comparative Modeling, and the Structure of a Multidomain P53 Oligomer Bound to DNA

Author: Petty II Thomas John
Publication venue: ScholarlyCommons
Publication date: 01/01/2010

Proteins are the main functional components of all cellular processes, and most of them fold into unique three-dimensional shapes guided by their amino-acid sequence. Discovering the structure of a protein, or protein complexes, can provide important clues about how they perform their function. However, the chemical, physical or architectural properties of many proteins impede traditional approaches to structure determination. Two such proteins, the tumor suppressor p53 and the cholesterol processing enzyme endothelial lipase, are prime examples of problematic proteins that defy structural investigation via crystallographic methods. Therefore, new techniques must be developed to gain valuable structural insights, such as: computationally assisted protein design strategies, more efficient crystal screening, or a combination of both. We applied a statistical computationally assisted design strategy to stabilize a p53 variant consisting of two independently folding domains. The re-engineered variant retained normal DNA-binding activities, and allowed us to experimentally determine the first structure of a physiologically active multi-domain p53 tetramer bound to a full-length DNA response element. We then demonstrated how computational methodology can be used to gain functional detail of proteins in the absence of experimentally determined structures. By creating comparative models of endothelial lipase, we discovered structural features that describe function and regulation, and gained a better understanding of the mechanisms conferring substrate specificity. Additionally, traditional methods for protein structure determination, such as X-ray crystallography, require relatively large amounts of purified sample in order to screen a sufficient variety of conditions. To improve this process, we developed a novel method for protein crystal screening using a microfluidics platform. 
We show how it is possible to use smaller quantities of protein to screen larger varieties of conditions, in turn increasing the probability of success in obtaining crystals. Furthermore, in contrast to current crystallographic approaches, all steps from screening to crystal growth to data collection were performed within the same reaction chamber, without any manipulation of the crystal, dramatically reducing both the time and the sample required to determine the structure. Collectively, these results demonstrate how advances in computational and experimental approaches can provide structural detail for proteins in circumstances where traditional methodology fails.

ScholarlyCommons@Penn