365 research outputs found
The Interrelationships of Placental Mammals and the Limits of Phylogenetic Inference
Placental mammals comprise three principal clades: Afrotheria (e.g., elephants and tenrecs), Xenarthra (e.g., armadillos and sloths), and Boreoeutheria (all other placental mammals), the relationships among which are the subject of controversy and a touchstone for debate on the limits of phylogenetic inference. Previous analyses have found support for all three hypotheses, leading some to conclude that this phylogenetic problem might be impossible to resolve due to the compounded effects of incomplete lineage sorting (ILS) and a rapid radiation. Here we show, using a genome-scale nucleotide data set, microRNAs, and the reanalysis of the three largest previously published amino acid data sets, that the root of Placentalia lies between Atlantogenata and Boreoeutheria. Although we found evidence for ILS in early placental evolution, we are able to reject previous conclusions that the placental root is a hard polytomy that cannot be resolved. Reanalyses of previous data sets recover Atlantogenata + Boreoeutheria and show that contradictory results are a consequence of poorly fitting evolutionary models; instead, when the evolutionary process is better modeled, all data sets converge on Atlantogenata. Our Bayesian molecular clock analysis estimates that marsupials diverged from placentals 157-170 Ma, crown Placentalia diverged 86-100 Ma, and crown Atlantogenata diverged 84-97 Ma. Our results are compatible with placental diversification being driven by dispersal rather than vicariance mechanisms, postdating early phases in the protracted opening of the Atlantic Ocean.
Estimating phylogenetic trees from genome-scale data
As researchers collect increasingly large molecular data sets to reconstruct
the Tree of Life, the heterogeneity of signals in the genomes of diverse
organisms poses challenges for traditional phylogenetic analysis. A class of
phylogenetic methods known as "species tree methods" have been proposed to
directly address one important source of gene tree heterogeneity, namely the
incomplete lineage sorting or deep coalescence that occurs when evolving
lineages radiate rapidly, resulting in a diversity of gene trees from a single
underlying species tree. Although such methods are gaining in popularity, they
are being adopted with caution in some quarters, in part because of an
increasing number of examples of strong phylogenetic conflict between
concatenation or supermatrix methods and species tree methods. Here we review
theory and empirical examples that help clarify these conflicts. Thinking of
concatenation as a special case of the more general model provided by the
multispecies coalescent can help explain a number of differences in the
behavior of the two methods on phylogenomic data sets. Recent work suggests
that species tree methods are more robust than concatenation approaches to some
of the classic challenges of phylogenetic analysis, including rapidly evolving
sites in DNA sequences, base compositional heterogeneity and long branch
attraction. We show that approaches such as binning, designed to augment the
signal in species tree analyses, can distort the distribution of gene trees and
are inconsistent. Computationally efficient species tree methods that
incorporate biological realism are a key to phylogenetic analysis of whole
genome data.
Comment: 39 pages, 3 figures
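The link between rapid radiation and gene tree heterogeneity described above can be illustrated with the standard three-taxon result under the multispecies coalescent. The sketch below is not taken from the review itself: the function name is hypothetical, and it assumes a rooted three-taxon species tree whose internal branch has length T in coalescent units. Under that assumption the gene tree matches the species tree with probability 1 - (2/3)exp(-T), and each of the two discordant topologies has probability (1/3)exp(-T).

```python
import math


def three_taxon_gene_tree_probs(t):
    """Gene tree topology probabilities for a rooted three-taxon species tree.

    t is the internal branch length in coalescent units. Returns a tuple
    (concordant, discordant_each). The name and interface are illustrative.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Probability that the two sister lineages fail to coalesce on the
    # internal branch, split evenly over the three possible topologies.
    discordant_each = math.exp(-t) / 3.0
    concordant = 1.0 - 2.0 * discordant_each
    return concordant, discordant_each


for t in (0.0, 0.5, 2.0):
    concordant, discordant_each = three_taxon_gene_tree_probs(t)
    print(f"T={t}: concordant={concordant:.3f}, each discordant={discordant_each:.3f}")
```

As T approaches zero, which corresponds to a very rapid radiation, all three topologies become equally likely, which is the situation in which incomplete lineage sorting is most severe.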
The molecular phylogeny of placental mammals and its application to uncovering signatures of molecular adaptation.
Considerable conflict remains in the literature as to the position of the root of placental mammals and the placement of several intra-ordinal groups. Debate continues over the use of DNA or amino acid datasets and over the use of Supertree or Supermatrix approaches. Known phenomena exist within mammal data that complicate the reconstruction of phylogeny. These include (but are not limited to) variation in longevity, body size, metabolic rates, and germ-line generation time that result in variation in mutation rates and composition biases. Previous attempts to resolve the placental mammal phylogeny have used homogeneous evolutionary models that cannot capture and adequately describe these features across the species sampled. In this thesis I explore the properties of different datasets and data types and their suitability to the resolution of the mammal phylogeny at different depths: (i) the position of the root of the placental mammals, and (ii) the intra-ordinal placements within the Laurasiatheria.
The datasets tested were (i) mitochondrial and nuclear data types, (ii) previously published datasets for mammals, and (iii) datasets I assembled specifically for analyses at different phylogenetic depths. I propose the use of heterogeneous models and apply them to these datasets to resolve the position of the root of the placental mammal phylogeny.
Reconstruction of a robust mammal phylogeny provides us with an essential framework for understanding the molecular underpinnings of adaptation to environment. The placental mammals have displayed huge variation in life traits such as longevity, body size and DNA repair efficiency since they emerged ~100 million years ago. With this robust phylogeny, I set out to determine the level of adaptive and non-adaptive processes acting on a set of mammal genes that are linked with longevity and cancer.
The results of these analyses yield important insights into data and model suitability, and provide strong evidence for a single hypothesis for the rooting of placental mammals. These results also show that Laurasiatheria intra-ordinal placements are not fully resolved and additional sampling from this diverse clade is required. Using this resolved phylogeny, specific molecular adaptations and non-adaptive mechanisms were identified in the Mammalia for a set of telomere-associated genes.
Rare coral under the genomic microscope: timing and relationships among Hawaiian Montipora
Background
Evolutionary patterns of scleractinian (stony) corals are difficult to infer given the existence of few diagnostic characters and pervasive phenotypic plasticity. A previous study of Hawaiian Montipora (Scleractinia: Acroporidae) based on five partial mitochondrial and two nuclear genes revealed the existence of a species complex, grouping one of the rarest known species (M. dilatata, which is listed as Endangered by the International Union for Conservation of Nature - IUCN) with widespread corals of very different colony growth forms (M. flabellata and M. cf. turgescens). These previous results could result from a lack of resolution due to a limited number of markers, compositional heterogeneity or reflect biological processes such as incomplete lineage sorting (ILS) or introgression.
Results
All 13 mitochondrial protein-coding genes from 55 scleractinians (14 lineages from this study) were used to evaluate if a recent origin of the M. dilatata species complex or rate heterogeneity could be compromising phylogenetic inference. Rate heterogeneity detected in the mitochondrial data set seems to have no significant impact on the phylogenies but clearly affects age estimates. Dating analyses give different estimates for the speciation of the M. dilatata species complex depending on whether compositional heterogeneity is taken into account (0.8 [0.05–2.6] Myr) or rate homogeneity is assumed (0.4 [0.14–0.75] Myr). Genomic data also provided evidence of introgression among all analysed samples of the complex. RADseq data indicated that M. capitata colour morphs may have a genetic basis.
Conclusions
Despite the volume of data (over 60,000 SNPs), phylogenetic relationships within the M. dilatata species complex remain unresolved, most likely due to a recent origin and ongoing introgression. Species delimitation with genomic data is not concordant with the current taxonomy, which does not reflect the true diversity of this group. Nominal species within the complex are either undergoing a speciation process or represent ecomorphs exhibiting phenotypic polymorphisms.
Developing and applying supertree methods in Phylogenomics and Macroevolution
Supertrees can be used to combine partially overlapping trees and generate more inclusive phylogenies. It has been proposed that Maximum Likelihood (ML) supertrees method (SM) could be developed using an exponential probability distribution to model errors in the input trees (given a proposed supertree). When the tree-to-tree distances used in the ML computation are symmetric differences, the ML SM has been shown to be equivalent to a Majority-Rule consensus SM, and hence, exactly as the latter, it has the desirable property of being a median tree (with reference to the set of input trees). The ability to estimate the likelihood of supertrees, allows implementing Bayesian (MCMC) approaches, which have the advantage to allow the support for the clades in a supertree to be properly estimated. I present here the L.U.St software package; it contains the first implementation of a ML SM and allows for the first time statistical tests on supertrees. I also characterized the first implementation of the Bayesian (MCMC) SM. Both the ML and the Bayesian (MCMC) SMs have been tested for and found to be immune to biases. The Bayesian (MCMC) SM is applied to the reanalyses of a variety of datasets (i.e. the datasets for the Metazoa and the Carnivora), and I have also recovered the first Bayesian supertree-based phylogeny of the Eubacteria and the Archaebacteria. These new SMs are discussed, with reference to other, well-known SMs like Matrix Representation with Parsimony. Both the ML and Bayesian SM offer multiple attractive advantages over current alternatives
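The exponential error model mentioned in this abstract can be sketched as follows. This is a minimal illustration and not the L.U.St implementation: it assumes rooted trees represented as sets of non-trivial clades, uses the symmetric difference between clade sets as the tree-to-tree distance, and treats each input tree as having probability proportional to exp(-beta * d), where d is the distance between the input tree and the supertree restricted to that tree's taxa. The normalizing constant is ignored, and all function names and the parameter beta are hypothetical.

```python
def restrict(clades, taxa):
    """Restrict a tree, given as a set of clades, to a subset of taxa.

    Clades that become trivial (fewer than two taxa, or all taxa) are dropped.
    """
    taxa = frozenset(taxa)
    restricted = set()
    for clade in clades:
        reduced = clade & taxa
        if 1 < len(reduced) < len(taxa):
            restricted.add(reduced)
    return restricted


def symmetric_difference(clades_a, clades_b):
    """Number of clades found in exactly one of the two trees."""
    return len(set(clades_a) ^ set(clades_b))


def log_likelihood_kernel(supertree, input_trees, beta=1.0):
    """Unnormalized log-likelihood of a supertree: -beta * sum of distances.

    input_trees is a list of (taxa, clades) pairs. The normalizing constant
    of the exponential model is deliberately left out of this sketch.
    """
    total = 0
    for taxa, clades in input_trees:
        total += symmetric_difference(restrict(supertree, taxa), clades)
    return -beta * total


fs = frozenset
candidate = {fs("AB"), fs("ABC"), fs("DE")}           # supertree on A..E
inputs = [
    (fs("ABCD"), {fs("AB"), fs("ABC")}),              # agrees with candidate
    (fs("BCDE"), {fs("CD"), fs("CDE")}),              # conflicts with candidate
]
print(log_likelihood_kernel(candidate, inputs))
```

Comparing this value across candidate supertrees gives a ranking; because the sum of symmetric differences is what is being minimized, the best-scoring candidate is a median tree in the sense used in the abstract.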
Estimating phylogenetic trees from genome-scale data
The heterogeneity of signals in the genomes of diverse organisms poses challenges for traditional phylogenetic analysis. Phylogenetic methods known as "species tree" methods have been proposed to directly address one important source of gene tree heterogeneity, namely the incomplete lineage sorting that occurs when evolving lineages radiate rapidly, resulting in a diversity of gene trees from a single underlying species tree. Here we review theory and empirical examples that help clarify conflicts between species tree and concatenation methods, and misconceptions in the literature about the performance of species tree methods. Considering concatenation as a special case of the multispecies coalescent model helps explain differences in the behavior of the two methods on phylogenomic data sets. Recent work suggests that species tree methods are more robust than concatenation approaches to some of the classic challenges of phylogenetic analysis, including rapidly evolving sites in DNA sequences and long-branch attraction. We show that approaches, such as binning, designed to augment the signal in species tree analyses can distort the distribution of gene trees and are inconsistent. Computationally efficient species tree methods incorporating biological realism are a key to phylogenetic analysis of whole-genome data.
Organismic and Evolutionary Biology
A study on gene marker selection and phylogenetic tree error assessment using a bioinformatics program
Thesis (Master's) -- Seoul National University Graduate School : College of Natural Sciences, Interdisciplinary Program in Bioinformatics, 2021.8.
The enormous volume of biological sequence data being continuously produced offers an opportunity to infer the evolutionary history of organisms and the phylogenetic relationships among them. Constructing phylogenetic trees has accordingly become a step carried out in almost all biological research. Phyloinformatics has advanced mainly through technical and methodological research, such as the development of tree-building algorithms and evolutionary models. Current phylogenetic analysis aims to infer a tree close to the true one by building trees from sequence data, that is, gene markers. However, as the size of data sets, including gene markers, grows exponentially and questions about the accuracy of the resulting analyses gain importance, many studies now evaluate the accuracy and reliability of phylogenetic trees. From the standpoint of molecular systematics, assessing tree accuracy can be approached from two angles: how well a phylogenetic algorithm performs under particular conditions, such as the evolutionary setting and the amount of molecular data, and how far a particular tree can be trusted. In terms of data set quality, it is also important, after a phylogenetic analysis has been run, to evaluate the suitability of the data set used in order to obtain a reliable tree. This is because recent analyses, which routinely handle large-scale data, have a lower probability of stochastic error but a higher probability of systematic error, so assessing the sources of systematic error in a data set after the analysis has become essential for evaluating and improving tree accuracy. This study therefore developed a program, APSE (Assessment Program for Systematic Error, tentative), to improve the reliability of phylogenetic trees from a data-quality standpoint. APSE computes taxon-specific relative composition frequency variability (RCFV) and symmetric skew values to characterize compositional bias in nucleotide sequences, from which the genetic heterogeneity and mutational bias of the data under study can be estimated. It can also compute other bias indicators that may cause systematic error: the frequencies of various nucleotide groups, saturation (multiple substitutions caused by mutation), and shared missing data. To evaluate the program, APSE was applied to specific cases in which different gene markers yield conflicting trees (Terebelliformia, Daphniid, Glires), confirming that it can properly assess systematic error in marker data sets and support inference about the accuracy of trees built from the selected markers. APSE is therefore expected to assess the agreement between a user's data and the resulting tree, with a focus on data quality from a systematics standpoint, so that more accurate trees are obtained; and, when a gene marker yields a misleading tree, to support a thorough analysis of the sources of systematic error and of how the affected data influence the tree.
The steadily increasing volume of biological data that is decisive for phylogenetic relationships provides unparalleled opportunities in bioinformatics. Phylogenetics, which uses large datasets to trace evolutionary history and assign the placement of taxa in a phylogeny, establishes the tree of life. Constructing a phylogeny through phylogenetic analysis is practiced in most branches of biology, and establishing the evolutionary history provides the phylogenetic background that is a prerequisite for interpreting a specific biological system. With advances in computing and sequencing techniques, phyloinformatics has rapidly advanced at the technical and methodological levels, along with phylogenetic reconstruction algorithms and evolutionary models. Unlike the classic approach using morphological data, modern phylogenetic analysis reconstructs a phylogeny from genetic information by inferring a phylogenetic tree from molecular data. Phylogeneticists have therefore naturally dealt with questions concerning the accuracy of phylogenetic estimation and carried out studies on the reliability of phylogenies. In terms of molecular systematics, concerns about phylogenetic accuracy, given specific evolutionary conditions and the amount of molecular data used, can be divided into two types: how well a phylogenetic method works under certain circumstances, and how reliable a given phylogeny is. Moreover, in terms of data quality, the suitability of a nuclear marker must be assessed before phylogenetic inference is performed if the phylogeny is to be trusted. Recently, the probability of stochastic errors in phylogenetic estimation with large-scale datasets has decreased, while the probability of systematic errors has increased.
Thus, before phylogenetic reconstruction is carried out, assessing the sources of systematic errors is indispensable for estimating and improving phylogenetic accuracy. The Assessment Program for Systematic Error (APSE) developed in this study will play a key role in assessing the fit between user datasets and phylogenies, improving the results of phylogenetic reconstruction in systematics, and will make it possible to analyze the effect of data bearing systematic errors on a phylogeny after misleading phylogenetic results are produced. This study uses APSE to infer phylogenetic accuracy and assess systematic errors in unresolved examples that show contradicting topologies between different gene markers within the same group. Furthermore, by selectively grouping the properties of the existing systematic biases reported by APSE, it moves toward proposing a new protocol that can identify the best gene marker among candidate markers for a specific taxon.
I. INTRODUCTION
1.1 Background of research
1.2 Necessity of research
1.3 Research objectives
II. MATERIALS AND METHODS
2.1 Datasets definition and data collection
2.2 Data processing and bioinformatics software used
2.3 Phylogenetic reconstruction and accuracy assessment
2.4 Software development environment and allowable data
2.5 Assessment of the systematic errors
III. RESULTS
3.1 Phylogenetic analysis results for incongruence between gene markers
3.2 Data-quality analysis using systematic errors
IV. DISCUSSION
4.1 Significance and implications of study
4.2 Application to bioinformatics research
4.3 Improvement and achievement
V. CONCLUSION AND SUMMARY
5.1 Conclusion
5.2 Summary
BIBLIOGRAPHY
ABSTRACT (KOREAN)
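The compositional statistics named in this abstract (RCFV and skew) can be illustrated with a short sketch. This is not the APSE code, whose internals are not described here. It assumes the commonly used definitions: a taxon's RCFV contribution is the sum over bases of the absolute difference between that taxon's base frequency and the mean frequency across taxa, divided by the number of taxa, and skew for a base pair X and Y is (X - Y) / (X + Y). All function names and the toy alignment are hypothetical.

```python
from collections import Counter

BASES = "ACGT"


def base_frequencies(seq):
    """Relative frequencies of A, C, G and T, ignoring gaps and ambiguity codes."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {b: counts[b] / total for b in BASES}


def taxon_rcfv(alignment):
    """Per-taxon RCFV contributions for a dict mapping taxon name to sequence."""
    freqs = {name: base_frequencies(seq) for name, seq in alignment.items()}
    n = len(freqs)
    mean = {b: sum(f[b] for f in freqs.values()) / n for b in BASES}
    return {
        name: sum(abs(f[b] - mean[b]) for b in BASES) / n
        for name, f in freqs.items()
    }


def skew(seq, x, y):
    """Compositional skew (x - y) / (x + y), e.g. AT skew or GC skew."""
    f = base_frequencies(seq)
    denom = f[x] + f[y]
    return (f[x] - f[y]) / denom if denom else 0.0


toy_alignment = {"taxon1": "AACCGGTT", "taxon2": "AAAACCGT"}  # made-up data
per_taxon = taxon_rcfv(toy_alignment)
print(per_taxon, sum(per_taxon.values()))
print(skew(toy_alignment["taxon2"], "A", "T"), skew(toy_alignment["taxon2"], "G", "C"))
```

Summing the per-taxon values gives the RCFV of the whole data set; taxa with large individual values are the ones whose base composition departs most from the average and are therefore candidates for compositional bias.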
Suprafamilial relationships among Rodentia and the phylogenetic effect of removing fast-evolving nucleotides in mitochondrial, exon and intron fragments
The number of rodent clades identified above the family level is contentious, and to date, no consensus has been reached on the basal evolutionary relationships among all rodent families. Rodent suprafamilial phylogenetic relationships are investigated in the present study using approximately 7600 nucleotide characters derived from two mitochondrial genes (Cytochrome b and 12S rRNA), two nuclear exons (IRBP and vWF) and four nuclear introns (MGF, PRKC, SPTBN, THY). Because increasing the number of nucleotides does not necessarily increase phylogenetic signal (especially if the data are saturated), we assess the potential impact of saturation for each dataset by removing the fastest-evolving positions that have been recognized as sources of inconsistencies in phylogenetics.
Investigating Evolutionary History Using Phylogenomics
Reconstructing the Tree of Life is one of the principal aims of evolutionary biology. The development of molecular phylogenetics has complemented palaeontology, biogeography, and archaeology in elucidating biological history. The development of molecular-clock analyses allowed evolutionary timescales to be estimated using nucleotide sequences and other products of the evolutionary process. Until recently, the twin challenges of molecular dating were in obtaining sufficient data and developing robust methods. The former concern is now less important as high-throughput sequencing technology allows entire genomes to be sampled. Genome-scale data enhances statistical power, but accompanying this wealth of data is a new suite of analytical challenges. One of these key challenges is analysing these data in synthesis with the palaeontological record without statistical overparameterisation. There are also aspects of the evolutionary process, such as among-lineage rate variation, that can affect the precision and accuracy of current methods. In this thesis, I first use the richest nucleotide sequence data set of insects available to estimate an authoritative insect evolutionary timescale that dates the origins and diversification of every major insect order. I then focus on molecular-clock methods by testing their performance in inferring evolutionary rates from time-structured data, common in the study of ancient DNA. I find that among-lineage rate variation and phylo-temporal clustering affect rate estimates. I also study data partitioning, a common technique used to optimise the analysis of multilocus data where independent parameters are applied across different subsets of the data. New data from the genomic revolution give biologists new opportunities to re-examine enduring questions about the evolutionary process. Here, I use phylogenetic tools to show that evolution leaves figurative fingerprints on genomes over millions of years.