1,608 research outputs found

    SimSearch: A new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences

    http://www.informatik.uni-trier.de/%7Eley/db/conf/iwpacbb/iwpacbb2008.html In this paper, we propose SimSearch, an algorithm implementing a new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences. The initial phase of SimSearch is devoted to filling the binary similarity matrices by signalling the distances between occurrences of the same symbol. The scoring scheme is then applied when the maximal extension of the pattern is analysed. Employing bit parallelism to analyse the global similarity matrix’s upper triangle, the new methodology searches the sequence(s) for all the exact and approximate patterns in regular or reverse order. The algorithm accepts parameterization to work with greater seeds for near-optimal results. Performance tests show significant efficiency improvement over traditional optimal methods based on dynamic programming. When compared against heuristic-based methods at an equal required sensitivity, the proposed algorithm’s efficiency remains acceptable. This work has been partially supported by PRODEP
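
    The abstract above mentions recording the distances between occurrences of the same symbol. The sketch below illustrates only that idea on a plain string; it is not the SimSearch implementation, and the function name `distance_series` and the dictionary-of-lists output format are assumptions made for illustration.

```python
from collections import defaultdict


def distance_series(sequence):
    """Map each symbol to the gaps between its consecutive occurrences.

    Hypothetical helper: the output layout is an assumption, not the
    binary similarity matrix described in the SimSearch abstract.
    """
    last_seen = {}             # symbol -> index of its most recent occurrence
    gaps = defaultdict(list)   # symbol -> list of distances between occurrences
    for position, symbol in enumerate(sequence):
        if symbol in last_seen:
            gaps[symbol].append(position - last_seen[symbol])
        last_seen[symbol] = position
    return dict(gaps)


series = distance_series("ACGTACGA")
```

    Symbols that occur only once (here `T`) have no gaps and so do not appear in the result.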

    An Alignment-Free Approach for Eukaryotic ITS2 Annotation and Phylogenetic Inference

    The ITS2 gene class shows a high sequence divergence among its members that has complicated its annotation and its use for reconstructing phylogenies at a higher taxonomical level (beyond species and genus). Several alignment strategies have been implemented to improve the ITS2 annotation quality and its use for phylogenetic inferences. Although alignment-based methods have been exploited to the limit of their complexity to tackle both issues, no alignment-free approaches have been able to successfully address both topics. By contrast, the use of simple alignment-free classifiers, like the topological indices (TIs) containing information about the sequence and structure of ITS2, may prove to be a useful approach for gene prediction and for assessing the phylogenetic relationships of the ITS2 class in eukaryotes. Thus, we used the TI2BioP (Topological Indices to BioPolymers) methodology [1], [2], freely available at http://ti2biop.sourceforge.net/, to calculate two different classes of TIs. One class was derived from the ITS2 artificial 2D structures generated from DNA strings and the other from the secondary structure inferred from RNA folding algorithms. Two alignment-free models based on Artificial Neural Networks were developed for the ITS2 class prediction using the two classes of TIs referred to above. Both models showed similar performances on the training and the test sets, reaching values above 95% in the overall classification. Due to the importance of the ITS2 region for fungi identification, a novel ITS2 genomic sequence was isolated from Petrakia sp. This sequence and the test set were used to comparatively evaluate the conventional classification models based on multiple sequence alignments, like Hidden Markov based approaches, revealing the success of our models in identifying novel ITS2 members. The isolated sequence was assessed using traditional and alignment-free based techniques applied to phylogenetic inference to complement the taxonomy of the Petrakia sp. fungal isolate

    Special Topics in Information Technology

    This open access book presents thirteen outstanding doctoral dissertations in Information Technology from the Department of Electronics, Information and Bioengineering, Politecnico di Milano, Italy. Information Technology has always been highly interdisciplinary, as many aspects have to be considered in IT systems. The doctoral studies program in IT at Politecnico di Milano emphasizes this interdisciplinary nature, which is becoming more and more important in recent technological advances, in collaborative projects, and in the education of young researchers. Accordingly, the focus of advanced research is on pursuing a rigorous approach to specific research topics starting from a broad background in various areas of Information Technology, especially Computer Science and Engineering, Electronics, Systems and Control, and Telecommunications. Each year, more than 50 PhDs graduate from the program. This book gathers the outcomes of the thirteen best theses defended in 2020-21 and selected for the IT PhD Award. Each of the authors provides a chapter summarizing his/her findings, including an introduction, description of methods, main achievements and future work on the topic. Hence, the book provides a cutting-edge overview of the latest research trends in Information Technology at Politecnico di Milano, presented in an easy-to-read format that will also appeal to non-specialists

    Limited Predictability of Amino Acid Substitutions in Seasonal Influenza Viruses

    Seasonal influenza viruses repeatedly infect humans in part because they rapidly change their antigenic properties and evade host immune responses, necessitating frequent updates of the vaccine composition. Accurate predictions of strains circulating in the future could therefore improve the vaccine match. Here, we studied the predictability of frequency dynamics and fixation of amino acid substitutions. Current frequency was the strongest predictor of eventual fixation, as expected in neutral evolution. Other properties, such as occurrence in previously characterized epitopes or high Local Branching Index (LBI) had little predictive power. Parallel evolution was found to be moderately predictive of fixation. Although the LBI had little power to predict frequency dynamics, it was still successful at picking strains representative of future populations. The latter is due to a tendency of the LBI to be high for consensus-like sequences that are closer to the future than the average sequence. Simulations of models of adapting populations, in contrast, show clear signals of predictability. This indicates that the evolution of influenza HA and NA, while driven by strong selection pressure to change, is poorly described by common models of directional selection such as traveling fitness waves

    INTEGRATIVE ANALYSIS OF OMICS DATA IN ADULT GLIOMA AND OTHER TCGA CANCERS TO GUIDE PRECISION MEDICINE

    Transcriptomic profiling and gene expression signatures have been widely applied as effective approaches for enhancing the molecular classification, diagnosis, prognosis or prediction of therapeutic response towards personalized therapy for cancer patients. Thanks to modern genome-wide profiling technology, scientists are able to build engines leveraging massive genomic variations and integrating with clinical data to identify “at risk” individuals for the sake of prevention, diagnosis and therapeutic interventions. In my graduate work for my Ph.D. thesis, I have investigated genomic sequencing data mining to comprehensively characterise molecular classifications and aberrant genomic events associated with clinical prognosis and treatment response, through applying high-dimensional omics genomic data to promote the understanding of gene signatures and somatic molecular alterations contributing to cancer progression and clinical outcomes. Following this motivation, my dissertation has been focused on the following three topics in translational genomics. 1) Characterization of transcriptomic plasticity and its association with the tumor microenvironment in glioblastoma (GBM). I have integrated transcriptomic, genomic, protein and clinical data to increase the accuracy of GBM classification, and identified the association between the GBM mesenchymal subtype and reduced tumor purity, accompanied by increased presence of tumor-associated microglia. Then I have tackled the sole source of microglia as intrinsic tumor bulk but not their corresponding neurosphere cells through both transcriptional and protein level analysis using a panel of sphere-forming glioma cultures and their parent GBM samples. Furthermore, I have demonstrated my hypothesis through longitudinal analysis of paired primary and recurrent GBM samples that the phenotypic alterations of GBM subtypes are not due to intrinsic proneural-to-mesenchymal transition in tumor cells; rather, they are intertwined with an increased level of microglia upon disease recurrence. Collectively, I have elucidated the critical role of the tumor microenvironment (microglia and macrophages from the central nervous system) in contributing to intra-tumor heterogeneity and to accurate classification of GBM patients based on transcriptomic profiling, which will not only have a significant clinical impact but also pave the way for preclinical cancer research. 2) Identification of prognostic gene signatures that stratify adult diffuse glioma patients harboring 1p/19q co-deletions. I have compared multiple statistical methods and derived a gene signature significantly associated with survival by applying a machine learning algorithm. Then I have identified inflammatory response and acetylation activity that are associated with malignant progression of 1p/19q co-deleted glioma. In addition, I showed that this signature translates to other types of adult diffuse glioma, suggesting its universality in the pathobiology of other subsets of gliomas. My efforts on integrative data analysis of this highly curated data set using optimized statistical models will reflect the pending update to the WHO classification system of tumors in the central nervous system (CNS). 3) Comprehensive characterization of somatic fusion transcripts in Pan-Cancers. I have identified a panel of novel fusion transcripts across all of TCGA cancer types through transcriptomic profiling. Then I have predicted fusion proteins with kinase activity and hub function of pathway network based on the annotation of genetically mobile domains and functional domain architectures. I have evaluated a panel of in-frame gene fusions as potential driver mutations based on the network fusion centrality hypothesis. I have also characterised the emerging complexity of genetic architecture in fusion transcripts through integrating genomic structure and somatic variants and delineating the distinct genomic patterns of fusion events across different cancer types. Overall, my exploration of the pathogenetic impact and clinical relevance of candidate gene fusions has provided fundamental insights into the management of a subset of cancer patients by predicting the oncogenic signalling and specific drug targets encoded by these fusion genes. Taken together, the translational genomic research I have conducted during my Ph.D. study will shed new light on precision medicine and contribute to the cancer research community. The novel classification concept, gene signature and fusion transcripts I have identified will address several hotly debated issues in translational genomics, such as complex interactions between tumor bulks and their adjacent microenvironments, prognostic markers for clinical diagnostics and personalized therapy, and distinct patterns of genomic structure alterations and oncogenic events in different cancer types, therefore facilitating our understanding of genomic alterations and moving us towards the development of precision medicine

    Gene expression in diapausing rotifer eggs in response to divergent environmental predictability regimes

    In unpredictable environments in which reliable cues for predicting environmental variation are lacking, a diversifying bet-hedging strategy for diapause exit is expected to evolve, whereby only a portion of diapausing forms will resume development at the first occurrence of suitable conditions. This study focused on diapause termination in the rotifer Brachionus plicatilis s.s., addressing the transcriptional profile of diapausing eggs from environments differing in the level of predictability and the relationship of such profiles with hatching patterns. RNA-Seq analyses revealed significant differences in gene expression between diapausing eggs produced in the laboratory under combinations of two contrasting selective regimes of environmental fluctuation (predictable vs unpredictable) and two different diapause conditions (passing or not passing through forced diapause). The results showed that the selective regime was more important than the diapause condition in driving differences in the transcriptome profile. Most of the differentially expressed genes were upregulated in the predictable regime and mostly associated with molecular functions involved in embryo morphological development and hatching readiness. This was in concordance with observations of earlier, higher, and more synchronous hatching in diapausing eggs produced under the predictable regime

    A study of intrinsic disorder and its role in functional proteomics

    Thesis (Ph.D.) - Indiana University, Informatics, 2009. The last decade has witnessed the emergence of an alternate view on how protein function arises. This view attributes the functionality of many proteins to the presence of an ensemble of flexible regions popularly known as 'intrinsically disordered' or 'unstructured'. Several proteomic studies have corroborated the existence of either wholly disordered proteins or proteins that contain regions of disorder in them. The purpose of this dissertation was to investigate the consistency of such regions across experiments, their mechanism of facilitating function via disorder-to-order transitions, their presence and significance in pathogenic versus non-pathogenic organisms and their promise of applicability towards the computational prediction of peptides involved in the most common class of post-translational modifications, phosphorylation. Besides these, a new algorithm exploiting the strong correlation between phosphorylation and intrinsic disorder has also been proposed to improve the detection of phosphorylated peptides via high-throughput methods such as tandem mass-spectrometry (LC-MS/MS). Results presented in this study guide us in understanding the robustness of unstructured regions in proteins to sequence changes and environment, their role in facilitating molecular recognition, as well as in improving currently available methods for identification of post-translationally modified peptides. The findings and conclusions of this dissertation have the potential to impact ongoing structural genomics initiatives by suggesting alternative methods for determining structure for targets containing regions of disorder. Additional ramifications of results from this work include directing attention towards the possible use of regions of intrinsic disorder by pathogenic organisms for host cell invasion. We believe that, unlike the traditional reductionist approach in a scientific method, this study gathers strength and utility by investigating the role of intrinsic disorder on more than one front in order to provide a novel perspective to the understanding of complex interactions within biological systems. Concluding arguments presented in this study pique one's curiosity regarding the evolution of disordered regions and proteins in general. On the technological side, the findings from this study unequivocally support the viable use of informatics methods in gaining new insights about a relatively young class of proteins known as intrinsically disordered proteins and their applicability to improve our present knowledge of cellular physiology

    Splicing and the Evolution of Proteins in Mammals

    It is often supposed that a protein's rate of evolution and its amino acid content are determined by the function and anatomy of the protein. Here we examine an alternative possibility, namely that the requirement to specify in the unprocessed RNA, in the vicinity of intron–exon boundaries, information necessary for removal of introns (e.g., exonic splice enhancers) affects both amino acid usage and rates of protein evolution. We find that the majority of amino acids show skewed usage near intron–exon boundaries, and that differences in the trends for the 2-fold and 4-fold blocks of both arginine and leucine show this to be owing to effects mediated at the nucleotide level. More specifically, there is a robust relationship between the extent to which an amino acid is preferred/avoided near boundaries and its enrichment/paucity in splice enhancers. As might then be expected, the rate of evolution is lowest near intron–exon boundaries, at least in part owing to splice enhancers, such that domains flanking intron–exon junctions evolve on average at under half the rate of exon centres from the same gene. In contrast, the rate of evolution of intronless retrogenes is highest near the domains where intron–exon junctions previously resided. The proportion of sequence near intron–exon boundaries is one of the stronger predictors of a protein's rate of evolution in mammals yet described. We conclude that after intron insertion selection favours modification of amino acid content near intron–exon junctions, so as to enable efficient intron removal, these changes then being subject to strong purifying selection even if nonoptimal for protein function. Thus there exists a strong force operating on protein evolution in mammals that is not explained directly in terms of the biology of the protein

    IST Austria Thesis

    Horizontal gene transfer (HGT), the lateral acquisition of genes across existing species boundaries, is a major evolutionary force shaping microbial genomes that facilitates adaptation to new environments as well as resistance to antimicrobial drugs. As such, understanding the mechanisms and constraints that determine the outcomes of HGT events is crucial to understand the dynamics of HGT and to design better strategies to overcome the challenges that originate from it. Following the insertion and expression of a newly transferred gene, the success of an HGT event will depend on the fitness effect it has on the recipient (host) cell. Therefore, predicting the impact of HGT on the genetic composition of a population critically depends on the distribution of fitness effects (DFE) of horizontally transferred genes. However, to date, we have little knowledge of the DFE of newly transferred genes, and hence little is known about the shape and scale of this distribution. It is particularly important to better understand the selective barriers that determine the fitness effects of newly transferred genes. In spite of substantial bioinformatics efforts to identify horizontally transferred genes and selective barriers, a systematic experimental approach to elucidate the roles of different selective barriers in defining the fate of a transfer event has largely been absent. Similarly, although the fact that environment might alter the fitness effect of a horizontally transferred gene may seem obvious, little attention has been given to it in a systematic experimental manner. In this study, we developed a systematic experimental approach that consists of transferring 44 arbitrarily selected Salmonella typhimurium orthologous genes into an Escherichia coli host, and estimating the fitness effects of these transferred genes at a constant expression level by performing competition assays against the wild type. In chapter 2, we performed one-to-one competition assays between a mutant strain carrying a transferred gene and the wild type strain. By using flow cytometry we estimated selection coefficients for the transferred genes with a precision level of 10^-3, and obtained the DFE of horizontally transferred genes. We then investigated if these fitness effects could be predicted by any of the intrinsic properties of the genes, namely, functional category, degree of complexity (protein-protein interactions), GC content, codon usage and length. Our analyses revealed that the functional category and length of the genes act as potential selective barriers. Finally, using the same procedure with the endogenous E. coli orthologs of these 44 genes, we demonstrated that gene dosage is the most prominent selective barrier to HGT. In chapter 3, using the same set of genes we investigated the role of environment on the success of HGT events. Under six different environments with different levels of stress we performed more complex competition assays, where we mixed all 44 mutant strains carrying transferred genes with the wild type strain. To estimate the fitness effects of genes relative to wild type we used next generation sequencing. We found that the DFEs of horizontally transferred genes are highly dependent on the environment, with abundant gene-by-environment interactions. Furthermore, we demonstrated a relationship between average fitness effect of a gene across all environments and its environmental variance, and thus its predictability. Finally, in spite of the fitness effects of genes being highly environment-dependent, we still observed a common shape of DFEs across all tested environments
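
    The abstract above does not say how selection coefficients were computed from the competition assays. As a minimal sketch, assuming the commonly used definition (the change in the natural log of the mutant-to-wild-type ratio per generation), the calculation could look like this. The function name, its parameters and the example counts are all hypothetical.

```python
import math


def selection_coefficient(mutant_start, wildtype_start,
                          mutant_end, wildtype_end, generations):
    """Per-generation selection coefficient from competition-assay counts.

    Assumed estimator (not taken from the thesis):
        s = ln(ratio_end / ratio_start) / generations
    where each ratio is mutant count divided by wild-type count.
    """
    ratio_start = mutant_start / wildtype_start
    ratio_end = mutant_end / wildtype_end
    return math.log(ratio_end / ratio_start) / generations


# Illustrative counts only: the mutant falls from 50% to about 45%
# of the mixed culture over 10 generations.
s = selection_coefficient(5000, 5000, 4500, 5500, 10)
```

    Under this definition a negative value means the strain carrying the transferred gene is outcompeted by the wild type, and zero means the two strains are equally fit.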