Evolution of substrate specificity in a retained enzyme driven by gene loss.
The connection between gene loss and the functional adaptation of retained proteins is still poorly understood. We apply phylogenomics and metabolic modeling to detect bacterial species that are evolving by gene loss, and find that Actinomycetaceae genomes from human cavities are undergoing sizable reductions, including loss of L-histidine and L-tryptophan biosynthesis. We observe that the dual-substrate phosphoribosyl isomerase A gene (priA), at which these pathways converge, appears to coevolve with the occurrence of trp and his genes. Characterization of a dozen PriA homologs shows that these enzymes adapt from bifunctionality in the largest genomes to a monofunctional, yet not necessarily specialized, inefficient form in genomes undergoing reduction. These functional changes are accomplished via mutations, which result from relaxation of purifying selection, in residues structurally mapped after sequence and X-ray structural analyses. Our results show how gene loss can drive the evolution of substrate specificity from retained enzymes.
BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction
A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms that generate mathematical representations directly from DNA/RNA or protein sequences and then output numerical (scalar or vector) and visual (graph) characteristics. The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called Boolean analysis, or BOOL-AN.
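The downstream step described above (vector data, then a distance matrix, then UPGMA) can be sketched as follows. This is only an illustration under stated assumptions: the 2-bit nucleotide encoding, the Hamming distance, and the function names `encode`, `hamming`, and `upgma` are hypothetical choices, and the ICF transform itself is not reproduced.

```python
from itertools import combinations

# Assumed 2-bit encoding per nucleotide; this is NOT the ICF of BOOL-AN.
BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def encode(seq):
    """Flatten a DNA string into a tuple of bits."""
    return tuple(bit for ch in seq for bit in BITS[ch])


def hamming(u, v):
    """Fraction of positions at which two equal-length bit vectors differ."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    Returns the tree topology as nested tuples (branch lengths omitted).
    """
    clusters = {name: (name, 1) for name in names}  # id -> (subtree, size)
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        (ta, na), (tb, nb) = clusters.pop(a), clusters.pop(b)
        new = a + "+" + b
        # Size-weighted average distance from the merged cluster to the rest.
        for c in clusters:
            d[frozenset((new, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[new] = ((ta, tb), na + nb)
    return next(iter(clusters.values()))[0]


seqs = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "TTGTCCGA", "s4": "TTGTCCGT"}
vectors = {name: encode(s) for name, s in seqs.items()}
dist = {
    frozenset((a, b)): hamming(vectors[a], vectors[b])
    for a, b in combinations(seqs, 2)
}
tree = upgma(list(seqs), dist)
```

Neighbor-joining or principal coordinates analysis could be run on the same `dist` table; UPGMA is shown only because it is the shortest to write out.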
Topology Discovery of Sparse Random Graphs With Few Participants
We consider the task of topology discovery of sparse random graphs using
end-to-end random measurements (e.g., delay) between a subset of nodes,
referred to as the participants. The rest of the nodes are hidden, and do not
provide any information for topology discovery. We consider topology discovery
under two routing models: (a) the participants exchange messages along the
shortest paths and obtain end-to-end measurements, and (b) additionally, the
participants exchange messages along the second shortest path. For scenario
(a), our proposed algorithm results in a sub-linear edit-distance guarantee
using a sub-linear number of uniformly selected participants. For scenario (b),
we obtain a much stronger result, and show that we can achieve consistent
reconstruction when a sub-linear number of uniformly selected nodes
participate. This implies that accurate discovery of sparse random graphs is
tractable using an extremely small number of participants. We finally obtain a
lower bound on the number of participants required by any algorithm to
reconstruct the original random graph up to a given edit distance. We also
demonstrate that while consistent discovery is tractable for sparse random
graphs using a small number of participants, in general, there are graphs which
cannot be discovered by any algorithm even with a significant number of
participants, and with the availability of end-to-end information along all the
paths between the participants.
Comment: A shorter version appears in ACM SIGMETRICS 2011. This version is
scheduled to appear in J. on Random Structures and Algorithm
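The measurement model of scenario (a) can be sketched as follows: a sparse random graph in which only the participants report end-to-end shortest-path values and every other node stays hidden. This is a minimal illustration, not the reconstruction algorithm of the paper. Hop count stands in for delay, and the names `sparse_random_graph`, `hop_distances`, and `end_to_end` as well as the edge probability c/n are assumptions.

```python
import random
from collections import deque


def sparse_random_graph(n, c, seed=0):
    """Random graph on n nodes where each edge appears with probability c/n."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < c / n:
                adj[u].add(v)
                adj[v].add(u)
    return adj


def hop_distances(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def end_to_end(adj, participants):
    """Shortest-path hop counts between ordered pairs of participants only.

    Hidden (non-participant) nodes contribute nothing to the output,
    which is all a discovery algorithm would get to see.
    """
    table = {}
    for p in participants:
        d = hop_distances(adj, p)
        for q in participants:
            if q != p and q in d:
                table[(p, q)] = d[q]
    return table


graph = sparse_random_graph(200, 2.0, seed=1)
participants = random.Random(1).sample(sorted(graph), 15)  # uniformly selected
measurements = end_to_end(graph, participants)
```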
Cultural Phylogenetics of the Tupi Language Family in Lowland South America
Background: Recent advances in automated assessment of basic vocabulary lists allow the construction of linguistic phylogenies useful for tracing dynamics of human population expansions, reconstructing ancestral cultures, and modeling transition rates of cultural traits over time. Methods: Here we investigate the expansion of Tupi, a widely dispersed language family in lowland South America, using a distance-based phylogeny built from 40-word vocabulary lists from 48 languages. We coded 11 cultural traits across the diverse Tupi family: traditional warfare patterns, post-marital residence, corporate structure, community size, paternity beliefs, sibling terminology, presence of canoes, tattooing, shamanism, men’s houses, and lip plugs. Results/Discussion: The linguistic phylogeny supports a Tupi homeland in west-central Brazil with subsequent major expansions across much of lowland South America. Consistently, ancestral reconstructions of cultural traits over the linguistic phylogeny suggest that social complexity has tended to decline through time, most notably in the independent emergence of several nomadic hunter-gatherer societies. Estimated rates of cultural change across the Tupi expansion are on the order of only a few changes per 10,000 years, in accord with previous cultural phylogenetic results in other languag
Evolutionary dynamics of structural features
Structural features have the potential to push back the time barrier beyond which hypotheses about the relatedness of languages cannot be tested. However, we have to know the stability of structural features in order to apply them for such purposes. In this thesis I describe the typological profile of the Transeurasian languages, which serve as a data sample for the analysis of stability; build a phylogenetic tree with these languages; measure the stability of structural features as phylogenetic signal and evolutionary rate; reconstruct ancestral states of structural features; and apply an admixture model from population genetics to test the performance of phonological, morphological and syntactic features in assigning languages to their respective language families and to investigate the level of diffusion in these three feature sets. More than half of the structural features appear to have a high phylogenetic signal and evolve at a slow rate. I compare stability across functional categories, parts of speech and language levels and conclude that argument marking (flagging and indexing), derivation and valency are the most stable functional categories, pronouns and nouns the most stable parts of speech, and phonology and morphology the most stable language levels. The admixture model as implemented in STRUCTURE correctly identifies the Turkic, Mongolic and Tungusic language families at the levels of morphology and syntax, whereas Japonic and Koreanic languages are assigned to the same ancestry. We see the least admixture at the level of morphology and the most in syntactic features. One of the most important insights is that morphological features carry the most genealogical information, and these features could be used in the future to test relationships above the language family level.
A Fast-Graph Approach to Modeling Similarity of Whole Genomes
As increasing numbers of closely related genomic sequences become available, the need for methods that detect fine differences among them also grows. Several calls have been made for improved algorithms to exploit the wealth of pathogenic viral and bacterial sequence data that are rapidly becoming available to researchers. The first stage of our research addresses the computational limitations associated with whole-genome comparisons of large numbers of subspecies sequences. We investigate the potential for the use of fast, word-based comparative measures to approximate computationally expensive, full alignment comparison methods. Recent advances in next generation sequencing are providing a number of large whole-genome sequence datasets stemming from globally distributed disease occurrences. This offers an unprecedented opportunity for epidemiological studies and the development of computationally efficient, robust tools for such studies. In the second stage of our research, we present an approach that enables a quick, effective, and robust epidemiological analysis of large whole-genome datasets. We then apply our method to a complex dataset of over 4,200 globally sampled Influenza A virus isolates from multiple host types, subtypes and years. These sequences are compared using an alignment-free method that runs in linear time. These comparisons enable us to build 2-dimensional graphs that represent the relationships between sequences, where sequences are viewed as vertices and high-degree sequence similarity as edges. These graphs prove useful, as they are able to model potential disease transmission paths when applied to viral sequences. Mixing patterns are then used to study the occurrence and patterns of edges between different types of sequence groups, such as host type and year of collection, to better understand the potential for genotypic transfer between sequence groups.
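The pipeline described above (word-based comparison, a similarity graph, then mixing patterns) can be sketched as follows. The abstract does not name the comparison measure, so Jaccard similarity over k-mer sets is an assumption here, as are the threshold, the toy sequences, and the names `kmers`, `jaccard`, `similarity_graph`, and `mixing_counts`.

```python
from collections import Counter
from itertools import combinations


def kmers(seq, k=3):
    """Set of overlapping words of length k, collected in one pass."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Shared words divided by total distinct words (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_graph(seqs, k=3, threshold=0.5):
    """Vertices are sequence names; an edge joins pairs at or above threshold."""
    profiles = {name: kmers(s, k) for name, s in seqs.items()}
    return {
        frozenset((u, v))
        for u, v in combinations(profiles, 2)
        if jaccard(profiles[u], profiles[v]) >= threshold
    }


def mixing_counts(edges, labels):
    """Count edges by the (sorted) pair of group labels at their endpoints."""
    counts = Counter()
    for edge in edges:
        u, v = tuple(edge)
        counts[tuple(sorted((labels[u], labels[v])))] += 1
    return counts


# Toy data; real isolates would be whole genomes with much larger k.
seqs = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "TTTTGGGGCC"}
hosts = {"a": "human", "b": "avian", "c": "swine"}
edges = similarity_graph(seqs)
mixing = mixing_counts(edges, hosts)
```

Building each k-mer set is linear in sequence length, but comparing every pair is still quadratic in the number of sequences, so a dataset of thousands of isolates would need a cheaper candidate-pair step than the `combinations` loop shown here.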