81 research outputs found
From genotypes to organisms: State-of-the-art and perspectives of a cornerstone in evolutionary dynamics
Understanding how genotypes map onto phenotypes, fitness, and eventually
organisms is arguably the next major missing piece in a fully predictive theory
of evolution. We refer to this generally as the problem of the
genotype-phenotype map. Though we are still far from achieving a complete
picture of these relationships, our current understanding of simpler questions,
such as the structure induced in the space of genotypes by sequences mapped to
molecular structures, has revealed important facts that deeply affect the
dynamical description of evolutionary processes. Empirical evidence supporting
the fundamental relevance of features such as phenotypic bias is mounting as
well, while the synthesis of conceptual and experimental progress leads to
questioning current assumptions on the nature of evolutionary dynamics, with cancer
progression models and synthetic biology approaches being notable examples. This
work examines, with a critical and constructive attitude, our current knowledge
of how genotypes map onto molecular phenotypes and organismal functions, and
discusses theoretical and empirical avenues to broaden and improve this
comprehension. As a final goal, this community should aim at deriving an
updated picture of evolutionary processes soundly relying on the structural
properties of genotype spaces, as revealed by modern techniques of molecular
and functional analysis.
Comment: 111 pages, 11 figures, uses elsarticle latex clas
Determinants of CRISPR array non-canonical adaptation mechanism
CRISPR-cas systems are highly diverse and are currently classified into six major types
and over 30 subtypes. Apart from their role in adaptive immunity, it has been shown that
some of the CRISPR-cas subtypes are also involved in host gene regulation and even in
collateral damage leading to bacteriostatic or lethal outcomes for the host. CRISPR array
spacers direct and influence canonical and non-canonical functions of the CRISPR-cas
system together with subtype Cas proteins. A better understanding of spacer adaptation
mechanisms is crucial for uncovering the intricacies of the evolutionary arms race between
prokaryotes and phages.
Here we present a large-scale analysis of CRISPR array spacers originating from 31845
complete bacterial genomes. All bacterial and 16388 viral genomes were retrieved using
the NCBI Datasets API. The CRISPRidentify and CRISPRcasIdentifier tools were used for CRISPR
array and Cas gene detection and subtyping. Viral genomes were mapped to their hosts
using the latest version of the Virus-Host DB. Mapping was performed at the genus level
of the host phylogenetic tree. A Gumbel extreme value distribution was used to determine
the statistical significance of each spacer's Smith-Waterman alignment score.
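As an illustration of the significance step, the following is a minimal sketch of scoring an alignment against a Gumbel (extreme value) null distribution. It is not the authors' pipeline: the function names, the method-of-moments fit, and the idea of obtaining null scores from alignments against shuffled sequences are all assumptions made for this example.

```python
import math
import statistics

# Euler-Mascheroni constant, used by the method-of-moments Gumbel fit.
EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(null_scores):
    """Estimate Gumbel location (mu) and scale (beta) from null alignment scores.

    Assumption: null_scores come from aligning each spacer against
    shuffled or unrelated sequences; the abstract does not specify this.
    """
    mean = statistics.fmean(null_scores)
    std = statistics.stdev(null_scores)
    beta = std * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def gumbel_p_value(score, mu, beta):
    """Probability that a null score is at least `score` under Gumbel(mu, beta)."""
    z = (score - mu) / beta
    if z < -700:
        # exp(-z) would overflow; the p-value is 1 to machine precision.
        return 1.0
    # 1 - exp(-exp(-z)), written with expm1 for accuracy at small p-values.
    return -math.expm1(-math.exp(-z))
```

A spacer-to-genome hit would then be called significant when `gumbel_p_value` falls below a chosen threshold, ideally after correcting for the number of alignments tested.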
Differences in melting energy and GC content between the identified spacers, their source bacterial
genomes, and the infecting bacteriophages were explored for different CRISPR-cas subtypes
and for different bacterial genera. Spacers from the extremes of the GC content distribution
were aligned to the source bacterial and infecting phage genomes in order to determine
their origin.
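The GC content comparison and the selection of spacers from the extremes of the distribution can be sketched as follows. This is only an illustration: the helper names and the 5% tail fraction are hypothetical, since the abstract does not state the cutoff that was used.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return gc / (gc + at)


def gc_tails(spacers, fraction=0.05):
    """Return (lowest-GC, highest-GC) spacers from each tail of the distribution.

    Assumption: `fraction` is an illustrative cutoff, not the one used in the study.
    """
    ranked = sorted(spacers, key=gc_content)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]
```

The spacers returned in each tail would then be aligned against the bacterial and phage genomes to assess where they most likely came from.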
The GC content of the spacers was lower than that of the source bacterial genome
but higher than that of the infecting viral genome. This observation aligns with the hypothesis that
the majority of CRISPR spacers were adapted from bacteriophage genomes and serve
a canonical function. Alignments of the spacers from the GC-rich tail of the distribution showed
their preferential targeting of host genomes, which further supports the hypothesis that
GC-rich spacers originated from the bacterial genome and have a non-canonical function.
Book of abstract: 4th Belgrade Bioinformatics Conference, June 19-23, 202
Analysis of High-Throughput Data - Protein-Protein Interactions, Protein Complexes and RNA Half-life
The development of high-throughput techniques has led to a paradigm change in biology from the small-scale analysis of individual genes and proteins to a genome-scale analysis of biological systems. Proteins and genes can now be studied in their interaction with each other and the cooperation within multi-subunit protein complexes can be investigated. Moreover, time-dependent dynamics and regulation of these processes and associations can now be explored by monitoring mRNA changes and turnover. The in-depth analysis of these large and complex data sets would not be possible without sophisticated algorithms for integrating different data sources, identifying interesting patterns in the data and addressing the high variability and error rates in biological measurements. In this thesis, we developed such methods for the investigation of protein interactions and complexes and the corresponding regulatory processes.
In the first part, we analyze networks of physical protein-protein interactions measured in large-scale experiments. We show that the topology of complete interactomes can be confidently extrapolated from only partial measurements of interaction networks, despite high numbers of missing and wrong interactions. Furthermore, we find that the structure and stability of protein interaction networks are influenced not only by the degree distribution of the network but also, considerably, by the suppression or propagation of interactions between highly connected proteins. As analysis of network topology is generally focused on large eukaryotic networks, we developed new methods to analyze smaller networks of intraviral and virus-host interactions. By comparing interactomes of related herpesviral species, we could detect a conserved core of protein interactions and could address the low coverage of the yeast two-hybrid system. In addition, common strategies in the interaction of the viruses with the host cell were identified.
New affinity purification methods now make it possible to directly study associations of proteins in complexes. Due to experimental errors, the individual protein complexes have to be predicted from these purification results with computational methods. As previously published methods relied more or less heavily on existing knowledge of complexes, we developed an unsupervised prediction algorithm that is independent of such additional data. Using this approach, high-quality protein complexes can be identified from the raw purification data alone for any species for which purification experiments are performed. To identify the direct, physical interactions within these predicted complexes and their subcomponent structure, we describe a new approach to extract the highest scoring subnetwork connecting the complex and interactions not explained by alternative paths of indirect interactions. In this way, important interactions within the complexes can be identified and their substructure can be resolved in a straightforward way.
To explore the regulation of proteins and complexes, we analyzed microarray measurements of mRNA abundance, de novo transcription and decay. Based on the relationship between newly transcribed, pre-existing and total RNA, transcript half-life can be estimated for individual genes using a new microarray normalization method, and quality control can be applied. We show that precise measurements of RNA half-life can be obtained from de novo transcription and that these are more accurate than previously published results from RNA decay. Using such precise measurements, we studied RNA half-lives in human B-cells and mouse fibroblasts to identify conserved patterns governing RNA turnover. Our results show that transcript half-lives are strongly conserved and specifically correlated to gene function. Although transcript half-life is highly similar in protein complexes and families, individual proteins may deviate significantly from the remaining complex subunits or family members to efficiently support the regulation of protein complexes or to create non-redundant roles of functionally similar proteins.
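The relationship between newly transcribed and total RNA can be turned into a half-life estimate under simple assumptions. The sketch below assumes steady-state transcript levels and first-order decay, so that the pre-existing fraction after a labeling time t is exp(-λt) = 1 - R, where R is the newly transcribed to total RNA ratio. The function name and parameters are hypothetical, and the thesis's normalization method is not reproduced here.

```python
import math


def half_life_from_new_to_total(ratio, labeling_time):
    """Estimate transcript half-life from the newly transcribed / total RNA ratio.

    Assumptions (not taken from the abstract): steady-state abundance and
    first-order decay, so 1 - ratio = exp(-decay_rate * labeling_time).
    The result is in the same time unit as `labeling_time`.
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    if labeling_time <= 0:
        raise ValueError("labeling_time must be positive")
    decay_rate = -math.log1p(-ratio) / labeling_time
    return math.log(2) / decay_rate
```

For example, a transcript for which half of the total RNA is newly transcribed after 60 minutes of labeling has an estimated half-life of 60 minutes under these assumptions; a higher ratio implies faster turnover.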
These results illustrate several of the many ways in which high-throughput measurements lead to a better understanding of biological systems. By studying large-scale measurements in this thesis, the structure of protein interaction networks and protein complexes could be better characterized, important interactions and conserved strategies for herpesviral infection could be identified and interesting insights could be gained into the regulation of important biological processes and protein complexes. This was made possible by the development of novel algorithms and analysis approaches which will also be valuable for further research on these topics.
Can we use biobanks to study infectious diseases?
Understanding the molecular and environmental basis of diseases in order to improve
diagnosis and treatment represents a top priority for researchers. Much of the progress
occurred following the growth of various omics technologies and the IT progress
in developing large electronic databases capable of storing huge amounts of data.
Biobanks represent the most valuable resource for personalized medicine, as they
are large collections of various patient samples with well-annotated clinical data
that serve to identify possible links between genetic predisposition and disease. A
significant step forward is represented by biobanks that are linked to the electronic health records of
each participant, providing an up-to-date source of relevant medical information, and those
“deeply phenotyped” for various other omics data, such as microbiome, epigenome,
transcriptome, metabolome and proteome.
Since infectious diseases still represent a huge threat to global human health, and host
genetic factors have been implicated as determining risk factors for observed variations in
disease susceptibility, severity, and outcome, during this lecture we will discuss challenges
and opportunities of using biobanks as a potential source to study infectious diseases
based on the case example of the isolated population-based longitudinal biobank “10,001
Dalmatians”. Genome-wide association meta-analyses of 14 different
infection-related phenotypes identified 29 infection-related genetic associations, most
involving rare variants and all with a role in the immune response. These findings
support the concept that host genetic susceptibility to bacterial and viral infections in
adults is polygenic, involving common variants that each explain very little variance and/or
“unfortunate” combinations of numerous rare variants. Expanding our understanding
of rare variants may help in the construction of genetic panels which might predict an
individual’s lifetime vulnerability to major infectious diseases. Furthermore, longitudinal
biobanks are a valuable source of data for discovering host genetic variations involved
in infectious disease susceptibility and severity. Because infectious diseases continue to
exert selective pressure on our genomes, a global network of biobanks with access to
genetic and environmental data is required to further explain complicated mechanisms
underlying host-pathogen interactions and infectious disease vulnerability.
Book of abstract: 4th Belgrade Bioinformatics Conference, June 19-23, 202