201 research outputs found

    Darwin and Fisher meet at biotech : on the potential of computational molecular evolution in industry

    Today computational molecular evolution is a vibrant research field that benefits from the availability of large and complex new-generation sequencing data, ranging from full genomes and proteomes to microbiomes, metabolomes and epigenomes. The grounds for this progress were established long before the discovery of the DNA structure. Specifically, Darwin's theory of evolution by means of natural selection not only remains relevant today, but also provides a solid basis for computational research with a variety of applications. Long-term progress in biology, however, was ensured by the mathematical sciences, as exemplified by Sir R. Fisher in the early 20th century. This is now truer than ever: the size and complexity of the data require biologists to work in close collaboration with experts in computational sciences, modeling and statistics.

    Joint alignment and phylogeny for large genomics data

    Evolutionary genomics : statistical and computational methods

    This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences, genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.

    Phylogenetics of tandem repeats with circular HMMs : a case study on Armadillo Repeat Proteins

    Accounting for programmed ribosomal frameshifting in the computation of codon usage bias indices

    Experimental evidence shows that synonymous mutations can have important consequences on genetic fitness. Many organisms display codon usage bias (CUB), where synonymous codons that are translated into the same amino acid appear at different frequencies. CUB is thought to arise from selection for translational efficiency and accuracy, termed the translational efficiency hypothesis (TEH). Indeed, CUB indices correlate with protein expression levels, which is widely interpreted as evidence for translational selection. However, these tests neglect -1 programmed ribosomal frameshifting (-1 PRF), an important translational disruption effect found across all organisms of the tree of life. Genes that contain -1 PRF signals should cost more to express than genes without them. Thus, CUB indices that do not consider -1 PRF may overestimate genes' true adaptation to translational efficiency and accuracy constraints. Here, we first investigate whether -1 PRF signals do indeed carry such a translational cost. We then propose two corrections for CUB indices for genes containing -1 PRF signals. We retest the TEH under these corrections. We find that the correlation between corrected CUB index and protein expression remains intact for most levels of uniform -1 PRF efficiencies, and tends to increase when these efficiencies decline with protein expression. We conclude that the TEH is strengthened and that -1 PRF events constitute a promising and useful tool to examine the relationships between CUB and selection for translation efficiency and accuracy.

    Phylogenetics reveals competition of human flu subtypes

    Phylogenetics of tandem repeats with circular HMMs : a case study on Armadillo Repeat Proteins

    Diversity of verbalization of the concept UNIVERSITY in English-language educational discourse

    This article discusses the diversity in the verbalization of the concept UNIVERSITY in English educational discourse. The notions "concept" and "educational discourse" are defined from the perspective of cognitive linguistics. The analysis is based on the two most common variants of the English language, British and American. The research draws on the educational discourse of 5 leading universities of the UK and 5 universities of the USA. A frame structure was chosen to study the diversity in verbalization of the concept UNIVERSITY, as it most fully reflects the lexical-semantic features of the concept under study. The article presents a schematic view of the concept UNIVERSITY in which its components are highlighted: subframes, slots and subslots. Component and conceptual analysis were used to examine the concept, so the dictionary definitions of its verbalizers are compared. The analysis shows that the lexical unit "university" is a concept that includes a whole range of characteristics and associations. The lexical units found in the educational discourse were examined on the basis of how English speakers perceive them. The lexical-semantic structure of the concept UNIVERSITY is found to be complex and well developed. Significant differences in the use of lexical units that actualize the concept UNIVERSITY are considered. These differences stem from linguocultural as well as historical features of the development of the two variants of the English language.

    A new census of protein tandem repeats and their relationship with intrinsic disorder

    Protein tandem repeats (TRs) are often associated with immunity-related functions and diseases. Since the last census of protein TRs in 1999, the number of curated proteins has increased more than seven-fold and new TR prediction methods have been published. TRs appear to be enriched with intrinsic disorder and vice versa. The significance and the biological reasons for this association are unknown. Here, we characterize protein TRs across all kingdoms of life and their overlap with intrinsic disorder in unprecedented detail. Using state-of-the-art prediction methods, we estimate that 50.9% of proteins contain at least one TR, often located at the sequence flanks. A positive linear correlation between the proportion of TRs and protein length was observed universally. Eukaryotes in general have more TRs, but the difference is quite small once the difference in protein length is taken into account. TRs were enriched with disorder-promoting amino acids and were located inside intrinsically disordered regions. Many such TRs were homorepeats. Our results support the view that TRs mostly originate by duplication and are involved in essential functions such as transcription processes, structural organization, electron transport and iron-binding. In viruses, TRs are found in proteins essential for virulence.

    Graph-based modeling of tandem repeats improves global multiple sequence alignment

    Tandem repeats (TRs) are often present in proteins with crucial functions, including proteins responsible for resistance and pathogenicity and proteins associated with infectious or neurodegenerative diseases. This motivates numerous studies of TRs and their evolution, which require accurate multiple sequence alignment. TRs may be lost or inserted at any position of a TR region by replication slippage or recombination, but current methods assume fixed unit boundaries and are nevertheless of high complexity. We present a new global graph-based alignment method that does not restrict TR unit indels by unit boundaries. TR indels are modeled separately and penalized using the phylogeny-aware alignment algorithm. This ensures enhanced accuracy of reconstructed alignments, disentangling TRs and measuring indel events and rates in a biologically meaningful way. Our method detects not only duplication events but also all changes in TR regions owing to recombination, strand slippage and other events inserting or deleting TR units. We evaluate our method by simulation incorporating TR evolution, either by sampling TRs from a profile hidden Markov model or by mimicking strand slippage with duplications. The new method is illustrated on a family of type III effectors, a pathogenicity determinant in the agriculturally important bacterium Ralstonia solanacearum. We show that TR indel rate variation contributes to the diversification of this protein family.