Grammar-based distance in progressive multiple sequence alignment
Background: We propose a multiple sequence alignment (MSA) algorithm and compare its alignment quality and execution time with those of existing algorithms. The proposed progressive alignment algorithm uses a grammar-based distance metric to determine the order in which biological sequences are pairwise aligned. Progressive alignment proceeds by pairwise aligning each new sequence with an ensemble of the previously aligned sequences. Results: The performance of the proposed algorithm is validated by comparison with the popular progressive multiple alignment approaches ClustalW and T-Coffee, and with the more recently developed algorithms MAFFT, MUSCLE, Kalign, and PSAlign, using the BAliBASE 3.0 database of amino acid alignment files and a set of longer sequences generated by the Rose software. The proposed algorithm built multiple alignments comparable to those of the other programs, with significant improvements in running time. The results are especially striking for large datasets. Conclusion: We introduce a computationally efficient progressive alignment algorithm, based on a grammar-based sequence distance, that is particularly useful for aligning large datasets.
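The abstract does not spell out the grammar-based metric. As a rough illustration only, the sketch below assumes a Lempel-Ziv (LZ76) phrase count as the grammar-like complexity measure, builds one possible normalized distance from it, and derives a greedy order in which sequences could join the growing alignment. `lz_complexity`, `lz_distance`, and `progressive_order` are hypothetical helpers, not the authors' implementation, and the pairwise alignment step itself is omitted.

```python
from itertools import combinations


def lz_complexity(s: str) -> int:
    """Number of phrases in the LZ76 (exhaustive history) parsing of s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it can still be copied from earlier text
        # (the copy source may overlap the phrase itself, hence s[: i + length - 1]).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


def lz_distance(a: str, b: str) -> float:
    """Assumed normalized form: extra complexity one sequence adds to the other."""
    c_ab, c_ba = lz_complexity(a + b), lz_complexity(b + a)
    gain = max(c_ab - lz_complexity(a), c_ba - lz_complexity(b))
    return gain / max(c_ab, c_ba)


def progressive_order(seqs: dict) -> list:
    """Hypothetical greedy order for adding sequences to a growing alignment."""
    names = sorted(seqs)
    if len(names) < 2:
        return names
    dist = {frozenset(p): lz_distance(seqs[p[0]], seqs[p[1]])
            for p in combinations(names, 2)}
    # Start with the closest pair of sequences.
    order = list(min(combinations(names, 2), key=lambda p: dist[frozenset(p)]))
    remaining = [name for name in names if name not in order]
    while remaining:
        # Next, add the sequence nearest to any sequence already in the alignment.
        nxt = min(remaining, key=lambda r: min(dist[frozenset((r, m))] for m in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


if __name__ == "__main__":
    seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTTC", "s3": "TTGGCCAATT"}
    print(progressive_order(seqs))
```

The greedy rule (always add the sequence closest to anything already aligned) is only one way to turn a distance matrix into an alignment order; a guide tree, as used by ClustalW, is a common alternative.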
Evolutionary distances in the twilight zone -- a rational kernel approach
Phylogenetic tree reconstruction is traditionally based on multiple sequence
alignments (MSAs) and heavily depends on the validity of this information
bottleneck. With increasing sequence divergence, the quality of MSAs decays
quickly. Alignment-free methods, on the other hand, are based on abstract
string comparisons and avoid potential alignment problems. However, in general
they are not biologically motivated and ignore our knowledge about the
evolution of sequences. Thus, it is still a major open question how to define
an evolutionary distance metric between divergent sequences that makes use of
indel information and known substitution models without the need for a multiple
alignment. Here we propose a new evolutionary distance metric to close this
gap. It uses finite-state transducers to create a biologically motivated
similarity score which models substitutions and indels, and does not depend on
a multiple sequence alignment. The sequence similarity score is defined in
analogy to pairwise alignments and additionally has the positive semi-definite
property. We describe its derivation and show in simulation studies and
real-world examples that it is more accurate in reconstructing phylogenies than
competing methods. The result is a new and accurate way of determining
evolutionary distances in and beyond the twilight zone of sequence alignments
that is suitable for large datasets.
Comment: to appear in PLoS ON
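The abstract stresses that the similarity score is positive semi-definite. One standard consequence is that any PSD kernel k induces a distance d(x, y) = sqrt(k(x, x) + k(y, y) - 2k(x, y)), the Euclidean distance between x and y in the kernel's feature space. The transducer-based rational kernel itself is not reproduced here; a k-mer spectrum kernel stands in, purely as an assumption, because it is PSD and short to write. `spectrum_kernel` and `kernel_distance` are illustrative names.

```python
from collections import Counter
from math import sqrt


def spectrum(s: str, k: int) -> Counter:
    """Counts of every length-k substring (k-mer) of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 3) -> int:
    """Stand-in PSD kernel: inner product of the two k-mer count vectors."""
    fx, fy = spectrum(x, k), spectrum(y, k)
    # Counter returns 0 for absent k-mers, so only shared k-mers contribute.
    return sum(count * fy[kmer] for kmer, count in fx.items())


def kernel_distance(x: str, y: str, kernel=spectrum_kernel) -> float:
    """Feature-space distance induced by a positive semi-definite kernel."""
    squared = kernel(x, x) + kernel(y, y) - 2 * kernel(x, y)
    return sqrt(max(squared, 0))  # clamp tiny negatives from floating-point kernels


if __name__ == "__main__":
    print(kernel_distance("ACGTACGTGA", "ACGTTCGTGA"))
```

Because the distance is a Euclidean norm in feature space, it satisfies the triangle inequality, which is convenient when the resulting matrix is fed to distance-based tree methods such as neighbor joining.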
A grammar-based distance metric enables fast and accurate clustering of large sets of 16S sequences
Background: We propose a sequence clustering algorithm and compare the partition quality and execution time of the proposed algorithm with those of a popular existing algorithm. The proposed clustering algorithm uses a grammar-based distance metric to determine partitioning for a set of biological sequences. The algorithm performs clustering in which new sequences are compared with cluster-representative sequences to determine membership. If comparison fails to identify a suitable cluster, a new cluster is created.
Results: The performance of the proposed algorithm is validated by comparison with the popular DNA/RNA sequence clustering approach CD-HIT-EST and with the recently developed algorithm UCLUST, using two different sets of 16S rDNA sequences from 2,255 genera. The proposed algorithm has a CPU execution time comparable to that of CD-HIT-EST (which is much slower than UCLUST) and generated clusters with higher statistical accuracy than both CD-HIT-EST and UCLUST. The validation results are especially striking for large datasets.
Conclusions: We introduce a fast and accurate clustering algorithm that relies on a grammar-based sequence distance. Its statistical clustering quality is validated on large datasets of 16S rDNA sequences.
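The clustering procedure described in the Background (compare each new sequence with the cluster representatives, and open a new cluster when none fits) can be sketched as a single greedy pass. The abstract does not specify the grammar-based metric, so a k-mer Jaccard distance is substituted as an assumption; the `threshold` parameter, the nearest-representative rule, and the helper names are illustrative choices, not the published settings.

```python
def kmer_set(s: str, k: int = 4) -> set:
    """Set of distinct length-k substrings of s."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def jaccard_distance(a: str, b: str, k: int = 4) -> float:
    """Stand-in distance (assumption): 1 - Jaccard similarity of k-mer sets."""
    A, B = kmer_set(a, k), kmer_set(b, k)
    if not A and not B:
        return 0.0
    return 1 - len(A & B) / len(A | B)


def greedy_cluster(seqs, threshold: float, dist=jaccard_distance) -> list:
    """Assign each sequence to the nearest representative within `threshold`.

    Returns a list of clusters; the first member of each cluster is its representative.
    """
    clusters = []
    for s in seqs:
        best, best_d = None, None
        for cluster in clusters:
            d = dist(s, cluster[0])  # compare only with the cluster representative
            if d <= threshold and (best_d is None or d < best_d):
                best, best_d = cluster, d
        if best is None:
            clusters.append([s])  # no suitable cluster: s founds a new one
        else:
            best.append(s)
    return clusters


if __name__ == "__main__":
    reads = ["ACGTACGTAA", "ACGTACGTAT", "TTTTGGGGCC"]
    for c in greedy_cluster(reads, threshold=0.5):
        print(c)
```

The result depends on input order, because the first sequence placed in a cluster becomes its representative. CD-HIT, for example, processes sequences in order of decreasing length so that the longest sequence in each cluster serves as the representative.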
Mst1/2 signalling to Yap: gatekeeper for liver size and tumour development
The mechanisms controlling mammalian organ size have long been a source of fascination for biologists. These controls are needed both to ensure the integrity of the body plan and to restrict inappropriate proliferation that could lead to cancer. Regulation of liver size is of particular interest because this organ retains the capacity for regeneration throughout life and is able to regain precisely its original mass after partial surgical resection. Recent studies using genetically engineered mouse strains have shed new light on this problem: the Hippo signalling pathway, first elucidated as a regulator of organ size in Drosophila, has been identified as a dominant determinant of liver growth. Defects in this pathway in mouse liver lead to sustained liver overgrowth and the eventual development of both major types of liver cancer, hepatocellular carcinoma and cholangiocarcinoma. In this review, we discuss the role of Hippo signalling in liver biology and the contribution of this pathway to liver cancer in humans.
Complete Genome Viral Phylogenies Suggests the Concerted Evolution of Regulatory Cores and Accessory Satellites
We consider the concerted evolution of viral genomes in four families of DNA viruses. Given the high rate of horizontal gene transfer among viruses and their hosts, it is an open question how representative particular genes are of the evolutionary history of the complete genome. To address the concerted evolution of viral genes, we compared genomic evolution across four distinct, extant viral families. For all four families we constructed phylogenies based on the DNA-dependent DNA polymerase (DdDp) gene and, in addition, on whole-genome sequences, as quantitative descriptions of inter-genome relationships. We found that the history of the polymerase gene was highly predictive of the history of the genome as a whole, which we explain in terms of repeated co-divergence events of the core DdDp gene accompanied by a number of satellite, accessory genetic loci. We also found that gene gain in baculoviruses and poxviruses proceeds significantly faster than gene loss, and that there is convergent acquisition of satellite functions promoting contextual adaptation when distinct viral families infect related hosts. The congruence of the genome and polymerase trees suggests that a large set of viral genes, including the polymerase, derive from a phylogenetically conserved core of genes of host origin, secondarily reinforced by gene acquisition from common hosts or from co-infecting viruses within the host. A single viral genome can be thought of as a mutualistic network, with the core genes acting as an effective host and the satellite genes as effective symbionts. Larger viral genomes show a greater departure from linkage equilibrium between core and satellite functions.
Genome BLAST distance phylogenies inferred from whole plastid and whole mitochondrion genome sequences
BACKGROUND: Phylogenetic methods that do not rely on multiple sequence alignments are important tools for inferring trees directly from completely sequenced genomes. Here, we extend the recently described Genome BLAST Distance Phylogeny (GBDP) strategy to compute phylogenetic trees from all completely sequenced plastid genomes currently available and from a selection of mitochondrial genomes representing the major eukaryotic lineages. BLASTN, TBLASTX, or combinations of both are used to locate high-scoring segment pairs (HSPs) between two sequences, from which pairwise similarities and distances are computed in different ways, resulting in a total of 96 GBDP variants. The suitability of these distance formulae for phylogeny reconstruction is estimated directly by computing a recently described measure of "treelikeness", the so-called δ value, from the respective distance matrices. Additionally, we compare the trees inferred from these matrices using UPGMA, NJ, BIONJ, FastME, or STC with the NCBI taxonomy tree of the taxa under study. RESULTS: Our results indicate that, at this taxonomic level, plastid genomes are much more valuable for inferring phylogenies than mitochondrial genomes, and that distances based on breakpoints are of little use. Distances based on the proportion of "matched" HSP length to average genome length were best for tree estimation. Additionally, we found that using TBLASTX instead of BLASTN and, particularly, combining TBLASTX and BLASTN leads to a small but significant increase in accuracy. Other factors do not significantly affect the phylogenetic outcome. The BIONJ algorithm yields the phylogenies most in accordance with the current NCBI taxonomy, with NJ and FastME performing slightly but not significantly worse, and STC performing as well when applied to high-quality distance matrices. δ values are found to be a reliable predictor of phylogenetic accuracy.
CONCLUSION: Using the most treelike distance matrices, as judged by their δ values, distance methods are able to recover all major plant lineages, and their results are more consistent with Apicomplexa organelles being derived from "green" plastids than from plastids of the "red" type. GBDP-like methods can be used to reliably infer phylogenies from different kinds of genomic data. A framework is established to further develop and improve such methods. δ values are a topology-independent tool of general use for the development and assessment of distance methods for phylogenetic inference.
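Two quantities from this abstract lend themselves to a short sketch: a distance built from the proportion of HSP-matched length to average genome length, and the δ value used to judge treelikeness. The distance below is a simplified, assumed reading of that description, not one of the 96 published GBDP variants; HSP coordinates are taken as already extracted from BLASTN or TBLASTX output and are not parsed here. The δ computation follows the usual quartet definition: for each quartet the three pairwise sums are sorted as s_S ≤ s_M ≤ s_L, δ = (s_L - s_M) / (s_L - s_S), and the matrix-level δ is taken here as the mean over all quartets, with 0 meaning perfectly treelike. All function names are hypothetical.

```python
from itertools import combinations


def covered_length(intervals) -> int:
    """Total length covered by half-open [start, end) HSP intervals after merging overlaps."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlapping or touching: extend current block
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def hsp_coverage_distance(hsps_x, hsps_y, len_x: int, len_y: int) -> float:
    """Assumed GBDP-style distance: 1 - mean HSP-covered length / mean genome length."""
    return 1 - (covered_length(hsps_x) + covered_length(hsps_y)) / (len_x + len_y)


def delta_value(D) -> float:
    """Mean quartet delta of a symmetric distance matrix D (list of lists)."""
    deltas = []
    for x, y, u, v in combinations(range(len(D)), 4):
        s_small, s_mid, s_large = sorted([
            D[x][y] + D[u][v],
            D[x][u] + D[y][v],
            D[x][v] + D[y][u],
        ])
        # Assumed convention: a quartet whose three sums are all equal counts as treelike.
        deltas.append(0.0 if s_large == s_small else (s_large - s_mid) / (s_large - s_small))
    return sum(deltas) / len(deltas) if deltas else 0.0


if __name__ == "__main__":
    # Distances from the tree ((a,b),(c,d)) with unit edge lengths: perfectly treelike.
    tree_matrix = [[0, 2, 3, 3], [2, 0, 3, 3], [3, 3, 0, 2], [3, 3, 2, 0]]
    print(delta_value(tree_matrix))
```

Merging overlapping HSPs before summing avoids counting the same bases twice when several HSPs cover one region; how the published variants handle overlaps is not stated in the abstract.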
A20 Modulates Lipid Metabolism and Energy Production to Promote Liver Regeneration
Background: Liver regeneration is of major clinical importance in the setting of liver injury, resection, or transplantation. We have demonstrated that the NF-κB inhibitory protein A20 significantly improves recovery of liver function and mass following extended liver resection (LR) in mice. In this study, we explored the Systems Biology modulated by A20 following extended LR in mice. Methodology and Principal Findings: We performed transcriptional profiling using Affymetrix Mouse 430.2 arrays on liver mRNA retrieved from livers treated with recombinant adenovirus A20 (rAd.A20) or rAd.β-galactosidase, before and 24 hours after 78% LR. A20 overexpression impacted 1595 genes that were enriched for biological processes related to inflammatory and immune responses, cellular proliferation, energy production, oxidoreductase activity, and lipid and fatty acid metabolism. These pathways were modulated by A20 in a manner that favored decreased inflammation, heightened proliferation, and optimized metabolic control and energy production. Promoter analysis identified several transcription factors that mediated the effects of A20, including NF-κB, CEBPA, OCT-1, OCT-4 and EGR1. Interactive scale-free network analysis captured the key genes that delivered the specific functions of A20. Most of these genes were affected both at baseline and after resection. We validated a number of A20's target genes by real-time PCR, including p21, the mitochondrial solute carriers SLC25a10 and SLC25a13, and the fatty acid metabolism regulator peroxisome proliferator-activated receptor alpha. These changes resulted in greater energy production in A20-expressing livers following LR, as demonstrated by increased enzymatic activity of cytochrome c oxidase, or mitochondrial complex IV.
Conclusion: This Systems Biology-based analysis unravels novel mechanisms supporting the pro-regenerative function of A20 in the liver: optimizing energy production through improved lipid/fatty acid metabolism, and down-regulating inflammation. These findings support pursuit of A20-based therapies to improve patients' outcomes in the context of extreme liver injury and extensive LR for tumor treatment or donation.
Gene expression of purified beta-cell tissue obtained from human pancreas with laser capture microdissection.
Human β-cell gene profiling is a powerful tool for understanding β-cell biology in normal and pathological conditions. Assessment is complicated when isolated islets are studied, because of contamination by non-β-cells and the trauma of the isolation procedure. Objective: The objective was to use laser capture microdissection (LCM) of human β-cells from pancreases of cadaver donors and compare their gene expression with that of handpicked isolated islets. Design: Endogenous autofluorescence of β-cells facilitated procurement of purified β-cell tissue from frozen pancreatic sections with LCM. Gene expression profiles of three microdissected β-cell samples and three isolated islet preparations were obtained. The array data were normalized using DNA-Chip Analyzer software (Harvard School of Public Health, Boston, MA), and the lower confidence bound was used to identify differentially expressed genes. Real-time PCR was performed on selected acinar genes and on the duct cell markers carbonic anhydrase II and keratin 19. Results: Endogenous autofluorescence facilitates the microdissection of β-cell-rich tissue in human pancreas. Compared with the array profiles of purified β-cell tissue, with the lower confidence bound set at 1.2, isolated islets showed 4560 up-regulated and 1226 down-regulated genes. Among the genes up-regulated in isolated islets were pancreatic acinar and duct genes, chemokine genes, and genes associated with hypoxia, apoptosis, and stress. Quantitative RT-PCR confirmed the differential expression of acinar gene transcripts and of the duct marker carbonic anhydrase II in isolated islets. Conclusion: LCM makes it possible to obtain β-cell-enriched tissue from human pancreas sections without the trauma and ischemia of islet isolation.
Identification of Genes Differentially Expressed in Simvastatin‐Induced Alveolar Bone Formation
Local delivery of simvastatin (SIM) has shown potential for preventing inflammation and limiting bone loss associated with experimental periodontitis. The primary aim of this study was to analyze transcriptome changes that may contribute to SIM's reduction of periodontal inflammation and bone loss. We evaluated the global genetic profile and signaling mechanisms induced by SIM in experimental periodontitis bone loss and inflammation. Twenty mature female Sprague Dawley rats were subjected to ligature‐induced experimental periodontitis around maxillary second molars (M2), either unilaterally (one side untreated, n = 10) or bilaterally (n = 10). After ligature removal on day 7, sites were injected with carrier, pyrophosphate (PPi ×3), 1.5‐mg SIM‐dose equivalent SIM‐pyrophosphate prodrug, or received no injection. Three days after ligature removal, animals were euthanized, and the M1‐M2 interproximal region was evaluated with μCT, histology, and protein expression analysis. M2 palatal gingiva was harvested for RNA sequencing. Whereas ligature alone caused upregulation of proinflammatory and bone catabolic genes and proteins, as seen in human periodontitis, SIM‐PPi upregulated anti‐inflammatory (IL‐10, IL‐1 receptor‐like 1) and bone anabolic (insulin‐like growth factor, osteocrin, fibroblast growth factor, and Wnt/β‐catenin) genes. The PPi carrier alone did not have these effects. These genetic profile and signaling mechanism data may help identify enhanced pharmacotherapeutic approaches to limit or regenerate periodontitis bone loss. © 2018 The Authors. JBMR Plus Published by Wiley Periodicals, Inc. on behalf of the American Society for Bone and Mineral Research