    apex: phylogenetics with multiple genes.

    Genetic sequences of multiple genes are becoming increasingly common for a wide range of organisms, including viruses, bacteria and eukaryotes. While such data may sometimes be treated as a single locus, in practice a number of biological and statistical phenomena can lead to phylogenetic incongruence. In such cases, different loci should, at least as a preliminary step, be examined and analysed separately. The R software has become a popular platform for phylogenetics, with several packages implementing distance-based, parsimony and likelihood-based phylogenetic reconstruction, and an even greater number of packages implementing phylogenetic comparative methods. Unfortunately, basic data structures and tools for analysing multiple genes have so far been lacking, thereby limiting the potential for investigating phylogenetic incongruence. In this study, we introduce the new R package apex to fill this gap. apex implements new object classes, which extend existing standards for storing DNA and amino acid sequences, and provides a number of convenient tools for handling, visualizing and analysing these data. We present the main features of the package and illustrate its functionalities through the analysis of a simple data set.
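    To make the idea of a multi-gene data structure concrete, here is a minimal Python sketch, assuming each gene is stored as its own alignment keyed by individual and that genes an individual lacks are padded with gaps when loci are concatenated. The class `MultiGeneAlignment` and its methods are hypothetical illustrations, not the apex API (apex is an R package).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MultiGeneAlignment:
    """Hypothetical container holding one aligned set of sequences per gene.

    Genes are kept separate so each locus can be analysed on its own;
    an individual may be absent from some genes.
    """

    genes: dict[str, dict[str, str]] = field(default_factory=dict)

    def add_gene(self, name: str, seqs: dict[str, str]) -> None:
        """Store one gene's alignment after checking all sequences share a length."""
        if not seqs:
            raise ValueError(f"gene {name} has no sequences")
        if len({len(s) for s in seqs.values()}) != 1:
            raise ValueError(f"sequences for gene {name} are not aligned")
        self.genes[name] = dict(seqs)

    def individuals(self) -> list[str]:
        """All individuals sampled for at least one gene, sorted by name."""
        return sorted({ind for seqs in self.genes.values() for ind in seqs})

    def concatenate(self, gap: str = "-") -> dict[str, str]:
        """Join genes per individual, padding genes an individual lacks with gaps."""
        result = {}
        for ind in self.individuals():
            parts = []
            for seqs in self.genes.values():
                width = len(next(iter(seqs.values())))
                parts.append(seqs.get(ind, gap * width))
            result[ind] = "".join(parts)
        return result


if __name__ == "__main__":
    aln = MultiGeneAlignment()
    aln.add_gene("gene1", {"a": "ACGT", "b": "ACGA"})
    aln.add_gene("gene2", {"a": "TTG", "c": "TTA"})
    print(aln.concatenate())
    # {'a': 'ACGTTTG', 'b': 'ACGA---', 'c': '----TTA'}
```

    Keeping loci in a per-gene mapping rather than one concatenated matrix is what allows each gene to be analysed separately first; concatenation then becomes an explicit, optional step.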

    Anti-Fibrotic Effect of SDF-1β Overexpression in Bleomycin-Injured Rat Lung.

    Rationale: Idiopathic pulmonary fibrosis (IPF) is a progressive interstitial lung disease associated with high mortality due to a lack of effective treatment. Excessive deposition of extracellular matrix by activated myofibroblasts in the alveolar space leads to scar formation that hinders gas exchange. Therefore, selectively removing activated myofibroblasts with the aim of repairing and remodeling fibrotic lungs is a promising approach. Stromal cell-derived factor-1 (SDF-1) is known to stimulate cellular signals that attract stem cells to the site of injury for tissue repair and remodeling. Here, we investigate the effect of SDF-1β overexpression on lung structure using the bleomycin-injured rat lung model. Methods: Bleomycin was administered intratracheally to adult male rats (F344). Seven days later, in vivo electroporation-mediated gene transfer of either SDF-1β or the empty vector was performed. Animals were sacrificed seven days after gene transfer, and histology, design-based stereology, flow cytometry and collagen measurement were performed on the collected tissue. For in vitro experiments, lung fibroblasts obtained from IPF patients were used. Results: Seven days after SDF-1β gene transfer to bleomycin-injured rat lungs, we observed reduced total collagen, a reduction in collagen fibrils, improved histology and induced apoptosis of myofibroblasts. Furthermore, TNF-α was found to mediate SDF-1β-induced apoptosis of myofibroblasts. SDF-1β overexpression also increased alveolar epithelial cell numbers and proliferation in vivo and induced their migration in vitro. Conclusions: Our study demonstrates a new antifibrotic mechanism of SDF-1β overexpression and suggests SDF-1β as a potential new approach for the treatment of lung fibrosis.

    Spatio-temporal analysis of the extent of an extreme heat event

    Evidence of global warming induced by the increasing concentration of greenhouse gases in the atmosphere suggests more frequent warm days and heat waves. The concept of an extreme heat event (EHE), defined locally based on exceedance of a suitable local threshold, enables us to capture the notion of a period of persistent extremely high temperatures. Modeling of extreme heat events is customarily implemented using time series of temperatures collected at a set of locations. Since spatial dependence is anticipated in the occurrence of EHEs, a joint model for the time series that incorporates spatial dependence is needed. Recent work by Schliep et al. (J R Stat Soc Ser A Stat Soc 184(3):1070–1092, 2021) develops a space-time model based on a point-referenced collection of temperature time series that enables the prediction of both the incidence and characteristics of EHEs occurring at any location in a study region. The contribution here is to introduce a formal definition of the spatial extent of an extreme heat event and then to employ output from the Schliep et al. (2021) modeling work to illustrate the notion. For a specified region and a given day, the definition takes the form of a block average of indicator functions over the region. Our risk assessment examines extents for the Comunidad Autónoma de Aragón in northeastern Spain. We calculate daily, seasonal and decadal averages of the extents for two subregions in this comunidad. We generalize our definition to capture extents of persistence of extreme heat and make comparisons across decades, revealing evidence of increasing extent over time.
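    The block-average definition can be approximated numerically. Assuming the region is covered by an equal-area grid of points s_1, ..., s_n, the extent on day t is roughly (1/n) Σ_i 1[Y(s_i, t) > u(s_i)], where Y is temperature and u is the local threshold. The Python sketch below implements that approximation. The function names, the equal-area grid, and the reading of "persistence" as exceedance on each of k consecutive days are illustrative assumptions, not the authors' exact definitions.

```python
from __future__ import annotations


def spatial_extent(temps: list[float], thresholds: list[float]) -> float:
    """Fraction of grid points whose temperature exceeds the local threshold.

    Assumes temps[i] and thresholds[i] belong to the i-th point of an
    equal-area grid over the region, so the block average of the
    indicator functions reduces to a plain mean.
    """
    if not temps or len(temps) != len(thresholds):
        raise ValueError("need one threshold per grid point")
    hits = sum(1 for y, u in zip(temps, thresholds) if y > u)
    return hits / len(temps)


def persistence_extent(
    series: list[list[float]], thresholds: list[float], t: int, k: int
) -> float:
    """Hypothetical persistence variant: fraction of grid points that exceed
    their threshold on every one of days t-k+1, ..., t.

    series[d][i] is the temperature on day d at grid point i.
    """
    n = len(thresholds)
    if n == 0 or k < 1 or t - k + 1 < 0 or t >= len(series):
        raise ValueError("need grid points and a k-day window inside the series")
    hits = sum(
        1
        for i in range(n)
        if all(series[d][i] > thresholds[i] for d in range(t - k + 1, t + 1))
    )
    return hits / n


if __name__ == "__main__":
    print(spatial_extent([30, 35, 41, 42], [38, 38, 40, 40]))  # 0.5
```

    Averaging these daily values over a season or decade gives the kind of seasonal and decadal summaries mentioned above.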

    Accelerating Bayesian hierarchical clustering of time series data with a randomised algorithm

    We live in an era of abundant data. This has necessitated the development of new and innovative statistical algorithms to get the most from experimental data. For example, faster algorithms make the analysis of larger genomic data sets practical, allowing us to extend the utility of cutting-edge statistical methods. We present a randomised algorithm that accelerates the clustering of time series data using the Bayesian Hierarchical Clustering (BHC) statistical method. BHC is a general method for clustering any discretely sampled time series data. In this paper we focus on a particular application to microarray gene expression data. We define and analyse the randomised algorithm, before presenting results on both synthetic and real biological data sets. We show that the randomised algorithm leads to substantial gains in speed with minimal loss in clustering quality. The randomised time series BHC algorithm is available as part of the R package BHC, which can be downloaded from Bioconductor (version 2.10 and above) via http://bioconductor.org/packages/2.10/bioc/html/BHC.html. We have also made available a set of R scripts that can be used to reproduce the analyses carried out in this paper; these are available at https://sites.google.com/site/randomisedbhc/

    Turnip mosaic potyvirus probably first spread to Eurasian brassica crops from wild orchids about 1000 years ago

    Turnip mosaic potyvirus (TuMV) is probably the most widespread and damaging virus that infects cultivated brassicas worldwide. Previous work has indicated that the virus originated in western Eurasia, with all of its closest relatives being viruses of monocotyledonous plants. Here we report the identification of a sister lineage of TuMV-like potyviruses (TuMV-OM) from European orchids. The isolates of TuMV-OM form a monophyletic sister lineage to the brassica-infecting TuMVs (TuMV-BIs) and are nested within a clade of monocotyledon-infecting viruses. Extensive host-range tests showed that all of the TuMV-OMs are biologically similar to, but distinct from, TuMV-BIs and do not readily infect brassicas. We conclude that TuMV more likely evolved from a TuMV-OM-like ancestor than the reverse. We performed Bayesian coalescent analyses using a combination of novel and published sequence data from four TuMV genes [helper component-proteinase protein (HC-Pro), protein 3 (P3), nuclear inclusion b protein (NIb) and coat protein (CP)]. Three genes (HC-Pro, P3 and NIb), but not the CP gene, gave results indicating that the TuMV-BI viruses diverged from TuMV-OMs around 1000 years ago. Only 150 years later, the four lineages of the present global population of TuMV-BIs diverged from one another. These dates are congruent with historical records of the spread of agriculture in Western Europe. From about 1200 years ago, the climate warmed, and agriculture and the human population of the region greatly increased. Farming replaced woodlands, fostering viruses and aphid vectors that could invade the crops, which included several brassica cultivars and weeds. Later, starting 500 years ago, inter-continental maritime trade probably spread the TuMV-BIs to the remainder of the world.

    PhyloSim - Monte Carlo simulation of sequence evolution in the R statistical computing environment

    Background: The Monte Carlo simulation of sequence evolution is routinely used to assess the performance of phylogenetic inference methods and sequence alignment algorithms. Progress in the field of molecular evolution fuels the need for more realistic and hence more complex simulations, adapted to particular situations, yet current software makes unreasonable assumptions such as homogeneous substitution dynamics or a uniform distribution of indels across the simulated sequences. This calls for an extensible simulation framework written in a high-level functional language, offering new functionality and making it easy to incorporate further complexity. Results: PhyloSim is an extensible framework for the Monte Carlo simulation of sequence evolution, written in R, using the Gillespie algorithm to integrate the actions of many concurrent processes such as substitutions, insertions and deletions. Uniquely among sequence simulation tools, PhyloSim can simulate arbitrarily complex patterns of rate variation and multiple indel processes, and allows for the incorporation of selective constraints on indel events. User-defined complex patterns of mutation and selection can be easily integrated into simulations, allowing PhyloSim to be adapted to specific needs. Conclusions: Close integration with R and the wide range of features implemented offer unmatched flexibility, making it possible to simulate sequence evolution under a wide range of realistic settings. We believe that PhyloSim will be useful to future studies involving simulated alignments.
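    The role of the Gillespie algorithm in integrating concurrent substitution, insertion and deletion processes can be illustrated with a small Python sketch of Gillespie's direct method. The toy model is an assumption and deliberately uses exactly the homogeneous rates and single-base indels that PhyloSim is designed to go beyond; the function `gillespie_evolve` and its parameters are hypothetical and do not reflect PhyloSim's R interface.

```python
import random

ALPHABET = "ACGT"


def gillespie_evolve(
    seq: str,
    t_max: float,
    sub_rate: float,
    ins_rate: float,
    del_rate: float,
    rng: random.Random,
) -> str:
    """Evolve `seq` for time t_max with Gillespie's direct method.

    Toy model (an assumption, far simpler than PhyloSim): every site is
    substituted at sub_rate and deleted at del_rate, and each of the
    len(seq) + 1 gaps between sites receives one random base at ins_rate.
    """
    s = list(seq)
    t = 0.0
    while True:
        # Rates depend on the current length, so recompute them after each event.
        n = len(s)
        r_sub = n * sub_rate
        r_del = n * del_rate
        r_ins = (n + 1) * ins_rate
        total = r_sub + r_del + r_ins
        if total == 0:
            break  # no process can fire any more
        t += rng.expovariate(total)  # waiting time to the next event of any kind
        if t > t_max:
            break
        u = rng.random() * total  # choose the process with probability rate / total
        if u < r_sub:
            i = rng.randrange(n)
            s[i] = rng.choice([b for b in ALPHABET if b != s[i]])
        elif u < r_sub + r_del:
            del s[rng.randrange(n)]
        else:
            s.insert(rng.randrange(n + 1), rng.choice(ALPHABET))
    return "".join(s)


if __name__ == "__main__":
    print(gillespie_evolve("ACGTACGTAC", 1.0, 0.3, 0.05, 0.05, random.Random(42)))
```

    Because a single exponential clock drives all processes and the total rate is recomputed after every event, adding another process (for example a second indel process) only means adding one more rate term and one more branch.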

    Shared probe design and existing microarray reanalysis using PICKY

    Background: Large genomes contain families of highly similar genes that cannot be individually identified by microarray probes. This limitation is due to thermodynamic restrictions and cannot be resolved by any computational method. Since gene annotations are updated more frequently than microarrays, another common issue facing microarray users is that existing microarrays must be routinely reanalyzed to determine which probes are still useful with respect to the updated annotations. Results: PICKY 2.0 can design shared probes for sets of genes that cannot be individually identified using unique probes. PICKY 2.0 uses novel algorithms to track sharable regions among genes and to strictly distinguish them from other highly similar but nontarget regions during thermodynamic comparisons. Therefore, PICKY does not sacrifice the quality of shared probes when choosing them. The latest version, PICKY 2.1, adds the capability to reanalyze existing microarray probes against updated gene sets to determine which probes are still valid. In addition, more precise nonlinear salt effect estimates and other improvements have been added, making PICKY 2.1 more versatile for microarray users. Conclusions: Shared probes allow expressed gene family members to be detected; this capability is generally more desirable than knowing nothing about these genes. Shared probes also enable the design of cross-genome microarrays, which facilitate the identification of multiple species in environmental samples. The new nonlinear salt effect calculation significantly increases the precision of probes at lower buffer salt concentrations, and the probe reanalysis function improves the interpretation of existing microarray results.
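    The notion of a sharable region can be sketched with exact k-mer matching: keep subsequences present in every target gene of a family and absent from every nontarget gene. This Python sketch is a simplifying assumption and not PICKY's algorithm; PICKY distinguishes regions through thermodynamic comparisons, which exact string matching cannot capture (a near-identical nontarget site could still cross-hybridize). The function names are hypothetical.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_probe_candidates(
    targets: dict[str, str], nontargets: dict[str, str], k: int
) -> list[str]:
    """k-mers present in every target gene and in no nontarget gene.

    Exact matching only: a real probe designer would also score
    hybridization stability against similar nontarget sequences.
    """
    if not targets:
        return []
    shared = set.intersection(*(kmers(s, k) for s in targets.values()))
    for s in nontargets.values():
        shared -= kmers(s, k)
    return sorted(shared)


if __name__ == "__main__":
    family = {"g1": "AACGTTG", "g2": "CACGTTA"}
    others = {"n1": "TTCGTTT"}
    print(shared_probe_candidates(family, others, k=4))  # ['ACGT']
```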

    Identifying protein complexes directly from high-throughput TAP data with Markov random fields

    Background: Predicting protein complexes from experimental data remains a challenge due to the limited resolution and stochastic errors of high-throughput methods. Current algorithms to reconstruct the complexes typically rely on a two-step process: first, they construct an interaction graph from the data, predominantly using heuristics, and then they cluster its vertices to identify protein complexes. Results: We propose a model-based identification of protein complexes directly from the experimental observations. Our model of protein complexes, based on Markov random fields, explicitly incorporates false negative and false positive errors and exhibits high robustness to noise. A model-based quality score for the resulting clusters allows us to identify reliable predictions in the complete data set. Comparisons with prior work on reference data sets show favorable results, particularly for larger unfiltered data sets. Additional information on predictions, including the source code under the GNU General Public License, can be found at http://algorithmics.molgen.mpg.de/Static/Supplements/ProteinComplexes. Conclusion: We can identify complexes in the data obtained from high-throughput experiments without prior elimination of proteins or weak interactions. The few parameters of our model, which does not rely on heuristics, can be estimated using maximum likelihood without a reference data set. This is particularly important for protein complex studies in organisms that do not have an established reference frame of known protein complexes.
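    How false positive and false negative rates enter a likelihood, and how they can be estimated by maximum likelihood without reference data, can be shown with a deliberately simplified Python sketch. It assumes every protein pair is observed independently (probability 1 - fn within a complex, fp across complexes), which is not the authors' Markov random field; the function names and the pairwise observation format are hypothetical.

```python
from __future__ import annotations

import math
from itertools import combinations


def _pairs(assignment: dict[str, int], observed: set[frozenset[str]]):
    """Yield (same_complex, was_observed) for every unordered protein pair."""
    for a, b in combinations(sorted(assignment), 2):
        yield assignment[a] == assignment[b], frozenset((a, b)) in observed


def mle_error_rates(observed, assignment) -> tuple[float, float]:
    """Closed-form maximum-likelihood (fp, fn) for a fixed complex assignment."""
    same_total = same_seen = diff_total = diff_seen = 0
    for same, seen in _pairs(assignment, observed):
        if same:
            same_total += 1
            same_seen += seen
        else:
            diff_total += 1
            diff_seen += seen
    fn = 1 - same_seen / same_total if same_total else 0.0
    fp = diff_seen / diff_total if diff_total else 0.0
    return fp, fn


def log_likelihood(observed, assignment, fp, fn, eps=1e-12) -> float:
    """log P(observations | complexes) under the independent-pairs model.

    Probabilities are clamped to [eps, 1 - eps] so the log stays finite.
    """
    ll = 0.0
    for same, seen in _pairs(assignment, observed):
        p = 1 - fn if same else fp
        p = min(max(p, eps), 1 - eps)
        ll += math.log(p if seen else 1 - p)
    return ll


if __name__ == "__main__":
    obs = {frozenset(p) for p in [("A", "B"), ("C", "D"), ("A", "C")]}
    split = {"A": 0, "B": 0, "C": 1, "D": 1}
    merged = {"A": 0, "B": 0, "C": 0, "D": 0}
    for name, assign in [("split", split), ("merged", merged)]:
        fp, fn = mle_error_rates(obs, assign)
        print(name, fp, fn, round(log_likelihood(obs, assign, fp, fn), 3))
```

    Comparing the maximized likelihood of candidate assignments is one simple way to score clusterings; the actual method additionally models dependencies between observations.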