481 research outputs found
SATCHMO-JS: a webserver for simultaneous protein multiple sequence alignment and phylogenetic tree construction.
We present the jump-start simultaneous alignment and tree construction using hidden Markov models (SATCHMO-JS) web server for simultaneous estimation of protein multiple sequence alignments (MSAs) and phylogenetic trees. The server takes as input a set of sequences in FASTA format, and outputs a phylogenetic tree and MSA; these can be viewed online or downloaded from the website. SATCHMO-JS is an extension of the SATCHMO algorithm, and employs a divide-and-conquer strategy to jump-start SATCHMO at a higher point in the phylogenetic tree, reducing the computational complexity of the progressive all-versus-all HMM-HMM scoring and alignment. Results on 983 structurally aligned pairs from the PREFAB benchmark dataset show that SATCHMO-JS provides a statistically significant improvement in alignment accuracy over MUSCLE, Multiple Alignment using Fast Fourier Transform (MAFFT), ClustalW and the original SATCHMO algorithm. The SATCHMO-JS webserver is available at http://phylogenomics.berkeley.edu/satchmo-js. The datasets used in these experiments are available for download at http://phylogenomics.berkeley.edu/satchmo-js/supplementary/
Ortholog identification in the presence of domain architecture rearrangement
Ortholog identification is used in gene functional annotation, species phylogeny estimation, phylogenetic profile construction and many other analyses. Bioinformatics methods for ortholog identification are commonly based on pairwise protein sequence comparisons between whole genomes. Phylogenetic methods of ortholog identification have also been developed; these methods can be applied to protein data sets sharing a common domain architecture or which share a single functional domain but differ outside this region of homology. While promiscuous domains represent a challenge to all orthology prediction methods, overall structural similarity is highly correlated with proximity in a phylogenetic tree, conferring a degree of robustness to phylogenetic methods. In this article, we review the issues involved in orthology prediction when data sets include sequences with structurally heterogeneous domain architectures, with particular attention to automated methods designed for high-throughput application, and present a case study to illustrate the challenges in this area
Practical considerations for plant phylogenomics
Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/143756/1/aps31038_am.pdf https://deepblue.lib.umich.edu/bitstream/2027.42/143756/2/aps31038.pd
New compound sets identified from high throughput phenotypic screening against three kinetoplastid parasites: an open resource
Using whole-cell phenotypic assays, the GlaxoSmithKline high-throughput screening (HTS) diversity set of 1.8 million compounds was screened against the three kinetoplastids most relevant to human disease, i.e. Leishmania donovani, Trypanosoma cruzi and Trypanosoma brucei. Secondary confirmatory and orthogonal intracellular anti-parasiticidal assays were conducted, and the potential for non-specific cytotoxicity determined. Hit compounds were chemically clustered and triaged for desirable physicochemical properties. The hypothetical biological target space covered by these diversity sets was investigated through bioinformatics methodologies. Consequently, three anti-kinetoplastid chemical boxes of ~200 compounds each were assembled. Functional analyses of these compounds suggest a wide array of potential modes of action against kinetoplastid kinases, proteases and cytochromes as well as potential host–pathogen targets. This is the first published parallel high throughput screening of a pharma compound collection against kinetoplastids. The compound sets are provided as an open resource for future lead discovery programs, and to address important research questions. The support and funding of Tres Cantos Open Lab Foundation is gratefully acknowledged. Peer reviewed
Co-diversification of an intestinal Mycoplasma and its salmonid host
Understanding the evolutionary relationships between a host and its intestinal resident bacteria can transform how we understand adaptive phenotypic traits. The interplay between hosts and their resident bacteria inevitably affects the intestinal environment and, thereby, the living conditions of both the host and the microbiota. This co-existence therefore likely influences the fitness of both bacteria and host. Whether this co-existence leads to evolutionary co-diversification in animals is largely unexplored, mainly due to the complexity of the environment and microbial communities and the often low host selection. We present the gut metagenome from wild Atlantic salmon (Salmo salar), a new wild organism model with an intestinal microbiota of low complexity and a well-described population structure, making it well-suited for investigating co-evolution. Our data reveal a strong host selection of a core gut microbiota dominated by a single Mycoplasma species. We found a clear co-diversification between the population structure of Atlantic salmon and nucleotide variability of the intestinal Mycoplasma populations, conforming to expectations from co-evolution between host and resident bacteria. Our results show that the stable microbiota of Atlantic salmon has evolved with its salmonid host populations while potentially providing adaptive traits to the salmon host populations, including defence mechanisms, biosynthesis of essential amino acids, and metabolism of B vitamins. We highlight Atlantic salmon as a novel model for studying co-evolution between vertebrate hosts and their resident bacteria.
Automatic and manual functional annotation in a distributed web service environment
While the number of genomic sequences becoming available is increasing exponentially, most genes are not functionally well characterized. Finding out more about the function of a gene and about functional relationships between genes will be the next big bottleneck in the post-genomic era. On the one hand, improved pipelines and tools are needed in this context, because running experiments for all predicted genes is not feasible. On the other hand, manual curation of the automatic predictions is necessary to judge the reliability of the automatic annotation and to get a more comprehensive view of the function of each individual gene. Automatic functional annotation often applies a homology-based function transfer from functionally characterized genes, using methods like Blast. However, this approach has many drawbacks and makes systematic errors by not accounting for speciation and duplication events. Phylogenomics has been shown to improve the functional prediction accuracy by taking the evolutionary history of genes in a phylogenetic tree context into account. In this thesis the manual process from the assembly of the DNA sequence to the functional characterization of genes and the identification and comparison of shared syntenic regions, including the identification of candidate genes for pathogen resistance in potato chromosome V, is explained and problems discussed. To improve the automatic functional annotation in genome projects, a phylogenomic pipeline, which includes SIFTER, one of the best phylogenomic tools in this area, is introduced, improved and tested in the Medicago truncatula, Sorghum bicolor and Solanum lycopersicum genome projects. To obtain new candidate genes for the development of new drugs and crop protection products, non-plant specific genes, like the transferrin family which is not known in plants yet, are extracted from the M. truncatula and S. bicolor genomes and further investigated.
For further improvement of the annotation, a new phylogenomic approach is developed. This approach makes use of annotated functional attributes to calculate the functional mutation rate between genes and groups of genes in a phylogenetic tree and to find out whether the function of a gene can be transferred or not. The new approach is integrated into the SIFTER tool and tested on the blue-light photoreceptor/photolyase family and on a test set of manually curated Arabidopsis thaliana genes. Using both test sets, the prediction accuracy could be significantly improved and a more comprehensive view of the gene function could be obtained. Because no tool is yet able to annotate all functions of a gene with 100% accuracy, I introduce a system for manual functional annotation, called AFAWE. AFAWE runs different web services for the functional annotation and displays the results and intermediate results in a comprehensive web interface that facilitates comparison. It can be used for any organism and any kind of gene. The inputs are the amino acid sequence and the corresponding organism. Because of its flexible structure, new web services and workflows can be easily integrated. Besides Blast searches against different databases and protein domain prediction tools, AFAWE also includes the phylogenomic pipeline. Different filters help to identify trustworthy results from each analysis. Furthermore, a detailed manual annotation can be assigned to each protein, which will be used to update the functional annotation in public databases like MIPSPlantsDB
Evolutionary Developmental Leaf Morphology of the Plant Family Araceae
Studying the evolutionary developmental morphology of leaves using next-generation phylogenetics, a candidate gene approach and comparative developmental studies in the plant family Araceae is the overarching theme of the dissertation.
The plant family Araceae is an ancient lineage from the Early Cretaceous and belongs to the monocotyledons. Members of Araceae display striking variation in leaf development; such variation contradicts traditional models of monocot leaf development. Additionally, dissected leaves, which are rare in monocots, seem to have evolved independently multiple times in Araceae by various developmental mechanisms.
Despite extensive efforts to elucidate the evolutionary history of Araceae, phylogenetic ambiguity in the backbone of the tree has precluded answering questions about the early evolution of the family. To depict the sequence of morphological and developmental modifications to leaf ontogeny over time, it is essential to have a strongly supported hypothesis of the evolutionary relationships among species in the family.
To resolve the remaining questions in the deep phylogeny of Araceae a phylogenomic analysis was carried out using next-generation sequencing technology and reference-based assembly of chloroplast and mitochondrial genomes for 37 genera representing 42 of the 44 major clades in the family. Chloroplast sequences produced strongly supported phylogenies in contrast to mitochondrial sequences, which produced poorly supported trees although smaller clades were recovered. The plastid phylogeny obtained from this study is the first for Araceae with a strongly supported backbone and was used for subsequent studies of evolutionary developmental leaf morphology in the family.
Studies of the genetic basis of dissected leaf morphology via blastozone fractionation in plants outside monocots have almost always implicated the action of class I KNOX (KNOX1) genes with one exception - in peas a homolog of the floral meristem gene FLO/LFY is implicated. However, studies of dissected leaf development in monocots, and an examination of the developmental genetics for those monocots that putatively share the blastozone fractionation mechanism are lacking. Two genera in Araceae, Anthurium and Amorphophallus were studied and confirmed to produce lobes and leaflets through blastozone fractionation. To test whether KNOX1 genes are involved in leaf dissection in these genera, immunolocalizations using both a full-length and C-terminus anti-KN1 antibodies were performed on histological sections of developing dissected leaves. KNOX1 protein expression detected by the full-length anti-KN1 antibody and by the C-terminus anti-KN1 antibody was absent and present in developing dissected leaves, respectively. To resolve these conflicting results, an RT-PCR assay was designed to test for the presence of KNOX1 mRNA transcripts during leaf development in Anthurium. Results of the RT-PCR assay support the KNOX1 protein expression pattern seen in immunolocalizations using the C-terminus anti-KN1 antibody. This suggests that monocots share the same genetic mechanism for dissected leaf development with other angiosperms.
Historical models of leaf development posit that structural similarities between monocot and dicot leaves are the result of convergence, although this hypothesis has been contested. Araceae displays both dicot and monocot leaf characters. Previous researchers have remarked on the departure of leaf development in Araceae from traditional models of monocot leaf development. To test the hypothesis of a developmentally independent origin of dicot-like leaf characters in monocots, leaf primordium diversity was evaluated in 30 genera of Araceae, along with 36 taxa spanning the angiosperm phylogeny. Leaf primordia were scored for 14 developmental, morphological and anatomical leaf characters. Ancestral character state reconstruction was carried out using the phylogeny obtained from Chapter One, embedded in two contrasting phylogenetic hypotheses of angiosperm evolution. Taxa were plotted in morphospace constructed using the morphological matrix to test whether dicot and monocot leaves occupy similar or different parts of the morphospace. The results of ancestral character state reconstruction and morphospace plotting suggest that at the developmental morphological level, aroid and dicot leaves are homologous. However, at the molecular genetic level, a review of the literature suggests that statements of homology between monocot and dicot leaves must be tested within a framework of the hierarchically organized gene regulatory networks regulating leaf development.
The leaves of Araceae have historically been considered “odd” within monocots. However, the incredible morphological and developmental diversity of leaves in Araceae has provided a powerful study system with which to investigate the unifying aspects of leaf development across angiosperms
Developing and applying supertree methods in Phylogenomics and Macroevolution
Supertrees can be used to combine partially overlapping trees and generate more inclusive phylogenies. It has been proposed that a Maximum Likelihood (ML) supertree method (SM) could be developed using an exponential probability distribution to model errors in the input trees (given a proposed supertree). When the tree-to-tree distances used in the ML computation are symmetric differences, the ML SM has been shown to be equivalent to a Majority-Rule consensus SM, and hence, exactly as the latter, it has the desirable property of being a median tree (with reference to the set of input trees). The ability to estimate the likelihood of supertrees allows implementing Bayesian (MCMC) approaches, which have the advantage of allowing the support for the clades in a supertree to be properly estimated. I present here the L.U.St software package; it contains the first implementation of an ML SM and allows, for the first time, statistical tests on supertrees. I also characterized the first implementation of the Bayesian (MCMC) SM. Both the ML and the Bayesian (MCMC) SMs have been tested for and found to be immune to biases. The Bayesian (MCMC) SM is applied to the reanalyses of a variety of datasets (i.e. the datasets for the Metazoa and the Carnivora), and I have also recovered the first Bayesian supertree-based phylogeny of the Eubacteria and the Archaebacteria. These new SMs are discussed, with reference to other, well-known SMs like Matrix Representation with Parsimony. Both the ML and Bayesian SMs offer multiple attractive advantages over current alternatives
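
The exponential error model mentioned in this abstract can be illustrated with a minimal sketch. It assumes rooted trees written as nested tuples of taxon names, a clade-based symmetric difference, and a single rate parameter `beta`; the normalizing constant of the distribution is omitted. All function names and the tree encoding are hypothetical and chosen for illustration only. This is not the L.U.St implementation.

```python
from itertools import chain


def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaves(child) for child in tree))


def restrict(tree, taxa):
    """Prune a tree to the given taxa, collapsing nodes left with one child."""
    if isinstance(tree, str):
        return tree if tree in taxa else None
    kept = [r for r in (restrict(child, taxa) for child in tree) if r is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]
    return tuple(kept)


def clades(tree):
    """Return the leaf sets of all internal nodes except the root."""
    found = set()

    def visit(node, is_root):
        if isinstance(node, str):
            return
        if not is_root:
            found.add(leaves(node))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return found


def symmetric_difference(tree_a, tree_b):
    """Count clades present in exactly one of two trees on the same taxa."""
    return len(clades(tree_a) ^ clades(tree_b))


def log_likelihood(supertree, input_trees, beta=1.0):
    """Unnormalized log-likelihood: -beta * sum of distances to the input trees.

    Each distance is computed after pruning the supertree to the taxa of the
    input tree, so partially overlapping input trees can be compared.
    The rate `beta` is an assumed free parameter.
    """
    total = 0
    for tree in input_trees:
        pruned = restrict(supertree, leaves(tree))
        total += symmetric_difference(pruned, tree)
    return -beta * total
```

Under these assumptions, a candidate supertree that conflicts with fewer input-tree clades receives a higher score, which is the quantity an ML search or an MCMC sampler would compare across candidate supertrees.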