Improved Core Genes Prediction for Constructing well-supported Phylogenetic Trees in large sets of Plant Species
Inferring well-supported phylogenetic trees that precisely reflect the
evolutionary process is a challenging task that depends entirely on how the
relevant core genes are identified. In previous computational biology
studies, many similarity-based algorithms, mostly relying on the computation of
sequence alignment matrices, have been proposed to find them. In these
approaches, a significantly high similarity score between two coding sequences
extracted by a given annotation tool is taken to mean that they correspond to
the same gene. In previous work, we presented a quality test approach (QTA)
that improves core gene quality by combining two annotation tools (namely NCBI,
a partially human-curated database, and DOGMA, an efficient annotation
algorithm for chloroplasts). This method takes advantage of both sequence
similarity and gene features to guarantee that the core genome contains correct
and well-clustered coding sequences (i.e., genes). In this article, we show how
useful such well-defined core genes are for biomolecular phylogenetic
reconstruction by investigating various subsets of core genes at various family
or genus levels, leading to subtrees with strong bootstrap support that are
finally merged into a well-supported supertree.
Comment: 12 pages, 7 figures, IWBBIO 2015 (3rd International Work-Conference
on Bioinformatics and Biomedical Engineering)
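The similarity-only baseline described above (high alignment score between coding sequences implies the same gene) can be sketched as follows. This is a toy illustration, not the authors' QTA: it uses a plain Needleman-Wunsch global alignment with assumed scores (match +1, mismatch -1, gap -2), an assumed similarity cutoff of 0.8, and single-linkage clustering, then keeps only clusters containing a gene from every species. The species names and short sequences are hypothetical.

```python
from itertools import combinations


def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Alignment score normalised by the best achievable score (the longer length)."""
    return nw_score(a, b) / max(len(a), len(b))


def core_gene_clusters(genomes: dict[str, dict[str, str]],
                       threshold: float = 0.8) -> list[set[tuple[str, str]]]:
    """Single-linkage clusters of similar genes that span every species (hypothetical cutoff)."""
    items = [(sp, gid, seq) for sp, genes in genomes.items() for gid, seq in genes.items()]
    parent = list(range(len(items)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Link genes from different species whose similarity clears the threshold.
    for i, j in combinations(range(len(items)), 2):
        if items[i][0] != items[j][0] and similarity(items[i][2], items[j][2]) >= threshold:
            parent[find(i)] = find(j)

    clusters: dict[int, set[tuple[str, str]]] = {}
    for idx, (sp, gid, _) in enumerate(items):
        clusters.setdefault(find(idx), set()).add((sp, gid))
    species = set(genomes)
    # A core gene is a cluster represented in every genome.
    return [c for c in clusters.values() if {sp for sp, _ in c} == species]


# Hypothetical toy genomes; sp3 lacks a psbA-like gene, so only rbcL is core.
genomes = {
    "sp1": {"rbcL": "ATGTCACCACAAACAGAG", "psbA": "ATGACTGCAATTTTAGAG"},
    "sp2": {"rbcL": "ATGTCACCACAAACTGAG", "psbA": "ATGACTGCTATTTTAGAG"},
    "sp3": {"rbcL": "ATGTCACCGCAAACAGAG", "orf1": "GGGCCCAAATTTGGGCCC"},
}
print(core_gene_clusters(genomes))
```

Single linkage means two sequences below the cutoff can still share a cluster through a third, which is one of the ways a purely similarity-based core genome can go wrong and part of why the abstract adds gene features on top of similarity.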
How to infer relative fitness from a sample of genomic sequences
Mounting evidence suggests that natural populations can harbor extensive
fitness diversity with numerous genomic loci under selection. It is also known
that genealogical trees for populations under selection are quantifiably
different from those expected under neutral evolution and described
statistically by Kingman's coalescent. While differences in the statistical
structure of genealogies have long been used as a test for the presence of
selection, the full extent of the information that they contain has not been
exploited. Here we shall demonstrate that the shape of the reconstructed
genealogical tree for a moderately large number of random genomic samples taken
from a fitness diverse, but otherwise unstructured asexual population can be
used to predict the relative fitness of individuals within the sample. To
achieve this, we define a heuristic algorithm, which we test in silico using
simulations of a Wright-Fisher model for a realistic range of mutation rates
and selection strength. Our inferred fitness ranking is based on a linear
discriminator which identifies rapidly coalescing lineages in the reconstructed
tree. The inferred fitness ranking correlates strongly with actual fitness: a
genome ranked in the top 10% is among the 20% fittest, with a false discovery
rate of 0.1-0.3 depending on the mutation/selection parameters. The ranking
also makes it possible to predict the genotypes that future populations inherit
from the present one. While the inference accuracy increases monotonically with
sample size, samples of 200 nearly saturate the performance. We propose that our
approach can be used for inferring the relative fitness of genomes obtained by
single-cell sequencing of tumors and for monitoring viral outbreaks.
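The evaluation above (simulate a Wright-Fisher population, rank a sample, then ask how often a top-10% genome falls outside the 20% fittest) can be sketched as below. Several assumptions are made that the abstract does not specify: every mutation is beneficial with one fixed effect `s`, mutations arrive as Poisson(`mu`) per offspring, and the population size and run length are arbitrary. The paper's tree-based linear discriminator is not implemented; a noisy copy of the true fitness stands in for it only to exercise the metric, and all function names are hypothetical.

```python
import numpy as np


def wright_fisher_log_fitness(n=500, generations=300, mu=0.05, s=0.02, seed=0):
    """Asexual Wright-Fisher population with selection (simplified, hypothetical model).

    Assumption: each mutation adds s to log-fitness, so an individual with k
    mutations has relative fitness exp(s * k).
    """
    rng = np.random.default_rng(seed)
    k = np.zeros(n, dtype=int)  # mutation count per individual
    for _ in range(generations):
        w = np.exp(s * k)  # relative fitness
        # Offspring pick parents with probability proportional to fitness.
        parents = rng.choice(n, size=n, p=w / w.sum())
        # Offspring inherit the parent's mutations plus Poisson-many new ones.
        k = k[parents] + rng.poisson(mu, size=n)
    return s * k  # log-fitness of each individual


def top_rank_fdr(scores, fitness, top_frac=0.10, fit_frac=0.20):
    """Fraction of the top `top_frac` ranked genomes that are NOT among the
    `fit_frac` fittest (ties in fitness are broken by index order)."""
    scores, fitness = np.asarray(scores), np.asarray(fitness)
    n = len(scores)
    predicted = np.argsort(-scores, kind="stable")[: max(1, round(top_frac * n))]
    fittest = set(np.argsort(-fitness, kind="stable")[: max(1, round(fit_frac * n))].tolist())
    return sum(i not in fittest for i in predicted.tolist()) / len(predicted)


# Usage: sample 200 genomes (the size the abstract reports as nearly saturating)
# and score them with a stand-in ranker, NOT the paper's tree-based method.
rng = np.random.default_rng(1)
population = wright_fisher_log_fitness()
sample = rng.choice(population, size=200, replace=False)
noisy_scores = sample + rng.normal(0.0, sample.std() + 1e-9, size=sample.size)
print(f"FDR (top 10% vs 20% fittest): {top_rank_fdr(noisy_scores, sample):.2f}")
```

Keeping the metric separate from the simulator makes it easy to swap in any ranking method and compare it against the 0.1-0.3 range quoted above.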
Approximate Two-Party Privacy-Preserving String Matching with Linear Complexity
Consider two parties who want to compare their strings, e.g., genomes, but do
not want to reveal them to each other. We present a system for
privacy-preserving matching of strings, which differs from existing systems by
providing a deterministic approximation instead of an exact distance. It is
efficient (linear complexity), non-interactive, and does not involve a third
party, which makes it particularly suitable for cloud computing. We extend our
protocol so that it mitigates the iterated differential attacks proposed by
Goodrich. Furthermore, an implementation of the system is evaluated and compared
against current privacy-preserving string matching algorithms.
Comment: 6 pages, 4 figures
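To make the idea of a deterministic, linear-time approximation of string distance concrete, here is a plaintext sketch using the q-gram profile distance. This is a swapped-in, well-known technique and not necessarily what the authors' protocol computes. The privacy layer (encryption, non-interactivity, and the mitigation of Goodrich's attacks) is not shown at all. Because one edit changes at most 2q q-gram counts, the q-gram distance divided by 2q gives a lower bound on the edit distance. The function names and the choice q = 3 are assumptions for illustration.

```python
import math
from collections import Counter


def qgram_profile(s: str, q: int = 3) -> Counter:
    """Multiset of all length-q substrings of s, built in one linear pass."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(x: str, y: str, q: int = 3) -> int:
    """L1 distance between the q-gram profiles of x and y (Ukkonen's q-gram distance)."""
    px, py = qgram_profile(x, q), qgram_profile(y, q)
    # Counter returns 0 for missing q-grams, so the union of keys covers both profiles.
    return sum(abs(px[g] - py[g]) for g in px.keys() | py.keys())


def edit_distance_lower_bound(x: str, y: str, q: int = 3) -> int:
    """A single edit changes at most 2q q-gram counts, so ed(x, y) >= D_q / (2q)."""
    return math.ceil(qgram_distance(x, y, q) / (2 * q))


print(qgram_distance("ACGTACGT", "ACGTTCGT"))             # 6
print(edit_distance_lower_bound("ACGTACGT", "ACGTTCGT"))  # 1 (the true edit distance is 1)
```

Each party could compute its own profile locally in time linear in its string length, which is the property that makes profile-based approximations attractive for non-interactive settings.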
Alkane hydroxylase genes in psychrophile genomes and the potential for cold active catalysis.
Background: Psychrophiles are presumed to play a large role in the catabolism of alkanes and other components of crude oil in natural low-temperature environments. In this study, we analyzed the functional diversity of genes for alkane hydroxylases, the enzymes responsible for converting alkanes to more labile alcohols, as found in the genomes of nineteen psychrophiles for which alkane degradation has not been reported. To identify possible mechanisms of low-temperature optimization, we compared putative alkane hydroxylases from these psychrophiles with homologues from nineteen taxonomically related mesophilic strains.
Results: Seven of the analyzed psychrophile genomes contained a total of 27 candidate alkane hydroxylase genes, only two of which are currently annotated as alkane hydroxylases. These candidates were mostly related to the AlkB and cytochrome P450 alkane hydroxylases, but several homologues of the LadA and AlmA enzymes, notable for their ability to degrade long-chain alkanes, were also detected. These putative alkane hydroxylases showed significant differences in primary structure from their mesophilic homologues, with preferences for specific amino acids and increased flexibility in loops, bends, and α-helices.
Conclusion: A focused analysis of psychrophile genomes led to the discovery of numerous candidate alkane hydroxylase genes not currently annotated as alkane hydroxylases. The gene products show signs of optimization to low temperature, including regions of increased flexibility and amino acid preferences typical of psychrophilic proteins. These findings are consistent with observations of microbial degradation of crude oil in cold environments and identify proteins that can be targeted in rate studies and in the design of molecular tools for low-temperature bioremediation.
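A comparison of amino acid preferences between a psychrophilic protein and its mesophilic homologue can be sketched as a per-residue composition difference. The abstract does not say which residues or statistics were used. The Arg/(Arg+Lys) ratio below is included only because it is a commonly cited indicator in cold-adaptation studies, and it is an assumption here. The two sequences are hypothetical toy fragments, not real alkane hydroxylases.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq: str) -> dict[str, float]:
    """Relative frequency of each standard amino acid (other symbols ignored)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {a: counts[a] / total for a in AMINO_ACIDS}


def composition_shift(psychro: str, meso: str) -> dict[str, float]:
    """Per-residue frequency difference: psychrophile minus mesophile homologue."""
    cp, cm = composition(psychro), composition(meso)
    return {a: cp[a] - cm[a] for a in AMINO_ACIDS}


def arg_lys_ratio(seq: str) -> float:
    """Arg / (Arg + Lys); lower values are often reported for cold-adapted proteins."""
    r, k = seq.upper().count("R"), seq.upper().count("K")
    return r / (r + k) if r + k else float("nan")


# Hypothetical toy fragments, for illustration only.
psychro = "MSGNGKKLGGSEKAGNT"
meso = "MSPRRLAPEVRAGPT"
shift = composition_shift(psychro, meso)
print({a: round(d, 3) for a, d in shift.items() if abs(d) > 0.1})
print(arg_lys_ratio(psychro), arg_lys_ratio(meso))
```

In a real comparison, one would average such differences over many aligned homologue pairs before drawing conclusions, since single short sequences are dominated by noise.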
Joint assembly and genetic mapping of the Atlantic horseshoe crab genome reveals ancient whole genome duplication
Horseshoe crabs are marine arthropods with a fossil record extending back
approximately 450 million years. They exhibit remarkable morphological
stability over their long evolutionary history, retaining a number of ancestral
arthropod traits, and are often cited as examples of "living fossils." As
arthropods, they belong to the Ecdysozoa, an ancient super-phylum whose
sequenced genomes (including insects and nematodes) have thus far shown more
divergence from the ancestral pattern of eumetazoan genome organization than
cnidarians, deuterostomes, and lophotrochozoans. However, much of ecdysozoan
diversity remains unrepresented in comparative genomic analyses. Here we use a
new strategy of combined de novo assembly and genetic mapping to examine the
chromosome-scale genome organization of the Atlantic horseshoe crab Limulus
polyphemus. We constructed a genetic linkage map of this 2.7 Gbp genome by
sequencing the nuclear DNA of 34 wild-collected, full-sibling embryos and their
parents at a mean redundancy of 1.1x per sample. The map includes 84,307
sequence markers and 5,775 candidate conserved protein coding genes. Comparison
to other metazoan genomes shows that the L. polyphemus genome preserves
ancestral bilaterian linkage groups, and that a common ancestor of modern
horseshoe crabs underwent one or more ancient whole genome duplications (WGDs)
~300 MYA, followed by extensive chromosome fusion.
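Some back-of-the-envelope arithmetic helps explain why a map can be built from such shallow per-sample sequencing. The sketch assumes Poisson-distributed read depth (the Lander-Waterman model). At 1.1x, each embryo is expected to cover about two thirds of the sites, so a given site should be observed in roughly 23 of the 34 siblings. The sketch also shows the standard Haldane mapping function for turning a recombination fraction r into a map distance, d = -50 ln(1 - 2r) cM. The abstract does not state which mapping function or estimator the authors used, and the recombinant counts below are hypothetical.

```python
import math


def fraction_covered(mean_depth: float) -> float:
    """Expected fraction of genome positions hit by >= 1 read, assuming
    Poisson-distributed read depth (Lander-Waterman model)."""
    return 1.0 - math.exp(-mean_depth)


def recombination_fraction(recombinants: int, informative_meioses: int) -> float:
    """Naive point estimate r = recombinants / informative meioses."""
    return recombinants / informative_meioses


def haldane_cm(r: float) -> float:
    """Haldane map distance in centimorgans, d = -50 * ln(1 - 2r), for 0 <= r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


depth, embryos = 1.1, 34  # values taken from the abstract
p = fraction_covered(depth)
print(f"P(site covered in one embryo) ~ {p:.3f}")
print(f"expected embryos covering a site ~ {embryos * p:.1f}")
# Hypothetical counts (5 recombinants in 68 meioses), for illustration only:
print(f"{haldane_cm(recombination_fraction(5, 68)):.2f} cM")
```

The Haldane function assumes no crossover interference; other mapping functions (e.g. Kosambi) would give slightly different distances for the same r.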
- …