    Erratum to: Mirnacle: machine learning with SMOTE and random forest for improving selectivity in pre-miRNA ab initio prediction

    MicroRNAs (miRNAs) are key gene expression regulators in plants and animals. Therefore, miRNAs are involved in several biological processes, making the study of these molecules one of the most relevant topics of molecular biology nowadays. However, characterizing miRNAs in vivo is still a complex task. As a consequence, in silico methods have been developed to predict miRNA loci. A common ab initio strategy to find miRNAs in genomic data is to search for sequences that can fold into the typical hairpin structure of miRNA precursors (pre-miRNAs). The current ab initio approaches, however, have selectivity issues, i.e., a high number of false positives is reported, which can lead to laborious and costly attempts to provide biological validation. This study presents an extension of the ab initio method miRNAFold, with the aim of improving selectivity through machine learning techniques, namely, random forest combined with the SMOTE procedure, which copes with imbalanced datasets. By comparing our method, termed Mirnacle, with other important approaches in the literature, we demonstrate that Mirnacle substantially improves selectivity without compromising sensitivity. For the three datasets used in our experiments, our method achieved at least 97% sensitivity and could deliver a two-fold, 20-fold, and 6-fold increase in selectivity, respectively, compared with the best results of current computational tools. The extension of miRNAFold by the introduction of machine learning techniques significantly increases selectivity in pre-miRNA ab initio prediction, which contributes to advanced studies on miRNAs, as the need for biological validation is diminished. Hopefully, new research, such as studies of severe diseases caused by miRNA malfunction, will benefit from the proposed computational tool.
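
The abstract above names SMOTE as the step that copes with class imbalance. The following is a minimal sketch of the core SMOTE idea only (synthetic minority samples interpolated between a sample and one of its nearest minority neighbours), written with the Python standard library. The function name `smote`, its parameters, and the toy feature vectors are assumptions made for illustration; this is not Mirnacle's implementation, and the random forest stage is not shown.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority-class samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (itself excluded)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature vectors for the rare class (e.g. true pre-miRNAs).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays inside the region the real samples already cover, which is the property that distinguishes SMOTE from simple duplication.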

    The Omicron lineages BA.1 and BA.2 (Betacoronavirus SARS-CoV-2) have repeatedly entered Brazil through a single dispersal hub

    Brazil currently ranks second in absolute deaths from COVID-19, even though most of its population has completed the vaccination protocol. With the introduction of Omicron in late 2021, the number of COVID-19 cases soared once again in the country. In this work, we investigated how lineages BA.1 and BA.2 entered and spread in the country by sequencing 2173 new SARS-CoV-2 genomes collected between October 2021 and April 2022 and analyzing them, together with more than 18,000 publicly available sequences, using phylodynamic methods. We found that Omicron was present in Brazil as early as 16 November 2021 and by January 2022 already accounted for more than 99% of samples. More importantly, we detected that Omicron has been mostly imported through the state of São Paulo, which in turn dispersed the lineages to other states and regions of Brazil. This knowledge can be used to implement more efficient non-pharmaceutical interventions against the introduction of new SARS-CoV variants, focused on surveillance of airports and ground transportation.

    From sequence to dynamics: the effects of transcription factor and polymerase concentration changes on activated and repressed promoters

    Background: The fine tuning of two features of the bacterial regulatory machinery has been known to contribute to the diversity of gene expression within the same regulon: the sequence of Transcription Factor (TF) binding sites, and their location with respect to promoters. While variations of binding sequences modulate the strength of the interaction between the TF and its binding sites, the distance between binding sites and promoters alters the interaction between the TF and the RNA polymerase (RNAP). Results: In this paper we estimated the dissociation constants (K_d) of several E. coli TFs in their interaction with variants of their binding sequences from the scores resulting from aligning them to Positional Weight Matrices. A correlation coefficient of 0.78 was obtained when pooling together sites for different TFs. The theoretically estimated K_d values were then used, together with the dissociation constants of the RNAP-promoter interaction, to analyze activated and repressed promoters. The strength of repressor sites (i.e., the strength of the interaction between TFs and their binding sites) is slightly higher than that of activated sites. We explored how different factors such as the variation of binding sequences, the occurrence of more than one binding site, or different RNAP concentrations may influence the promoters' response to the variations of TF concentrations. We found that the occurrence of several regulatory sites bound by the same TF close to a promoter, if they are bound by the TF in an independent manner, changes the effect of TF concentrations on promoter occupancy, with respect to individual sites.
We also found that the occupancy of a promoter will never be more than half if the RNAP concentration-to-K_p ratio is 1 and the promoter is subject to repression, or less than half if the promoter is subject to activation. If the ratio falls to 0.1, the upper limit of occupancy probability for repressed promoters drops below 10%; a descent of the limits occurs also for activated promoters. Conclusion: The number of regulatory sites may thus act as a versatility-producing device, in addition to serving as a source of robustness of the transcription machinery. Furthermore, our results show that the effects of TF concentration fluctuations on promoter occupancy are constrained by RNAP concentrations.
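
The occupancy limits quoted above can be reproduced with a simple equilibrium (thermodynamic) occupancy model. The sketch below is an assumption for illustration, not necessarily the model used in the paper: `r` stands for the ratio [RNAP]/K_p, `t` for [TF]/K_d, a repressor is assumed to exclude RNAP from the promoter, and an activator is assumed to stabilise RNAP binding by a hypothetical cooperativity factor `omega` greater than 1.

```python
def occupancy_free(r):
    """Promoter occupancy with no TF: states are empty (1) and RNAP-bound (r)."""
    return r / (1.0 + r)


def occupancy_repressed(r, t):
    """Repressor and RNAP exclude each other: states are 1, r and t."""
    return r / (1.0 + r + t)


def occupancy_activated(r, t, omega=10.0):
    """Activator helps RNAP bind: states are 1, t, r and omega*r*t."""
    bound = r * (1.0 + omega * t)
    return bound / (1.0 + t + bound)


# r = [RNAP]/K_p, t = [TF]/K_d; the TF levels below are arbitrary test values.
tf_levels = [0.0, 0.1, 1.0, 10.0, 100.0]
repressed_at_1 = [occupancy_repressed(1.0, t) for t in tf_levels]
activated_at_1 = [occupancy_activated(1.0, t) for t in tf_levels]
repressed_at_01 = [occupancy_repressed(0.1, t) for t in tf_levels]
```

Under these assumptions the TF-free occupancy r/(1 + r) is the ceiling for a repressed promoter and the floor for an activated one, so a ratio of 1 gives the one-half bound and a ratio of 0.1 gives a ceiling of 0.1/1.1, just under 10%.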

    A Cellular Automata-Based Mathematical Model for Thymocyte Development

    Intrathymic T cell development is an important process necessary for the normal formation of cell-mediated immune responses. Importantly, such a process depends on interactions of developing thymocytes with cellular and extracellular elements of the thymic microenvironment. Additionally, it includes a series of oriented and finely regulated migration events, ultimately allowing mature cells to cross endothelial barriers and leave the organ. Herein we built a cellular automata-based mathematical model for thymocyte migration and development. The rules comprised in this model take into account the main stages of thymocyte development, two-dimensional sections of the normal thymic microenvironmental network, as well as the chemokines involved in intrathymic cell migration. Parameters of our computer simulations were further adjusted to results derived from previous experimental data using sub-lethally irradiated mice, in which thymus recovery can be evaluated. The model fitted with the increasing numbers of each CD4/CD8-defined thymocyte subset. It was further validated since it fitted with the times of permanence experimentally ascertained in each CD4/CD8-defined differentiation stage. Importantly, correlations using the whole mean volume of young normal adult mice revealed that the numbers of cells generated in silico with the mathematical model fall within the range of total thymocyte numbers seen in these animals. Furthermore, simulations made with a human thymic epithelial network using the same mathematical model generated similar profiles for temporal evolution of thymocyte developmental stages. Lastly, we provided in silico evidence that the thymus architecture is important in thymocyte development, since changes in the epithelial network result in different theoretical profiles for T cell development/migration.
This model can likely be used to predict thymocyte evolution following therapeutic strategies designed for recovery of the thymus in diseases accompanied by thymus involution, such as some primary immunodeficiencies, acute infections, and malnutrition.

    An MLSA-based online scheme for the rapid identification of Stenotrophomonas isolates

    An online scheme to assign Stenotrophomonas isolates to genomic groups was developed using multilocus sequence analysis (MLSA), which is based on the DNA sequencing of selected fragments of the housekeeping genes ATP synthase alpha subunit (atpA), the recombination repair protein (recA), the RNA polymerase alpha subunit (rpoA) and the excision repair beta subunit (uvrB). This MLSA-based scheme was validated using eight of the 10 Stenotrophomonas species that have been previously described. The environmental and nosocomial Stenotrophomonas strains were characterised using MLSA, 16S rRNA sequencing and DNA-DNA hybridisation (DDH) analyses. Strains of the same species were found to have greater than 95% concatenated sequence similarity, and specific strains formed cohesive, readily recognisable phylogenetic groups. Therefore, MLSA appeared to be an effective alternative methodology to amplified fragment length polymorphism fingerprint and DDH techniques. Strains of Stenotrophomonas can be readily assigned through the open database resource that was developed in the current study (www.steno.lncc.br/).
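
The "greater than 95% concatenated sequence similarity" criterion above can be illustrated with a short sketch. It assumes the four gene fragments are already aligned and of equal length in both strains, and it measures similarity as plain percent identity; the helper names and the toy 10-nucleotide sequences are hypothetical and are not taken from the study or its database.

```python
LOCI = ("atpA", "recA", "rpoA", "uvrB")  # housekeeping genes named in the abstract


def concatenate(fragments, loci=LOCI):
    """Join one strain's gene fragments in a fixed locus order."""
    return "".join(fragments[locus] for locus in loci)


def percent_identity(seq_a, seq_b):
    """Percentage of identical positions between two pre-aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical toy fragments; real MLSA fragments are hundreds of bases long.
strain_a = {"atpA": "ATGGCTAAAC", "recA": "ATGGACGAAA",
            "rpoA": "ATGCAGGGTT", "uvrB": "ATGAGCAAAC"}
strain_b = {"atpA": "ATGGCTAAAC", "recA": "ATGGACGAGA",
            "rpoA": "ATGCAGGGTT", "uvrB": "ATGAGCAAAC"}

similarity = percent_identity(concatenate(strain_a), concatenate(strain_b))
same_species_candidate = similarity > 95.0  # threshold reported in the abstract
```

Concatenating in a fixed locus order matters: the comparison is only meaningful if position i in one strain corresponds to position i in the other.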

    Microbiome overview in swine lungs

    Mycoplasma hyopneumoniae is the etiologic agent of swine enzootic pneumonia. However, other mycoplasma species and secondary bacteria are found as inhabitants of the swine respiratory tract, which can also be related to disease. In the present study we have performed a total DNA metagenomic analysis from the lungs of pigs kept in a field condition, with signs suggestive of enzootic pneumonia and without any signs of infection, to evaluate the bacterial variability of the lung microbiota. Libraries from metagenomic DNA were prepared and sequenced using total DNA shotgun metagenomic pyrosequencing. The metagenomic distribution showed a great abundance of bacteria. The most common microbial families identified from pneumonic swine's lungs were Mycoplasmataceae, Flavobacteriaceae and Pasteurellaceae, whereas in the carrier swine's lungs the most common families were Mycoplasmataceae, Bradyrhizobiaceae and Flavobacteriaceae. Analysis of community composition in both samples confirmed the high prevalence of M. hyopneumoniae. Moreover, the carrier lungs had a more diverse family population, which should be related to the normal lung flora. In summary, we provide a wide view of the bacterial population from lungs with signs of enzootic pneumonia and lungs without signs of enzootic pneumonia in a field situation. These bacterial patterns provide information that may be important for the establishment of disease control measures and give insights for further studies.

    Cestode strobilation: prediction of developmental genes and pathways

    Background: Cestoda is a class of endoparasitic worms in the flatworm phylum (Platyhelminthes). During the course of their evolution cestodes have evolved some interesting aspects, such as their increased reproductive capacity. In this sense, they have serial repetition of their reproductive organs in the adult stage, which is often associated with external segmentation in a developmental process called strobilation. However, the molecular basis of strobilation is poorly understood. To address this issue, an evolutionary comparative study among strobilated and non-strobilated flatworm species was conducted to identify genes and proteins related to the strobilation process. Results: We compared the genomic content of 10 parasitic platyhelminth species; five from cestode species, representing strobilated parasitic platyhelminths, and five from trematode species, representing non-strobilated parasitic platyhelminths. This dataset was used to identify 1813 genes with orthologues that are present in all cestode (strobilated) species, but absent from at least one trematode (non-strobilated) species. Development-related genes, along with genes of unknown function (UF), were then selected based on their transcriptional profiles, resulting in a total of 34 genes that were differentially expressed between the larval (pre-strobilation) and adult (strobilated) stages in at least one cestode species. These 34 genes were then assumed to be strobilation related; they included 12 encoding proteins of known function, with 6 related to the Wnt, TGF-β/BMP, or G-protein coupled receptor signaling pathways; and 22 encoding UF proteins. In order to assign function to at least some of the UF genes/proteins, a global gene co-expression analysis was performed for the cestode species Echinococcus multilocularis. This resulted in eight UF genes/proteins being predicted as related to developmental, reproductive, vesicle transport, or signaling processes.
Conclusions: Overall, the described in silico data provided evidence of the involvement of 34 genes/proteins and at least 3 developmental pathways in the cestode strobilation process. These results shed light on the molecular mechanisms and evolution of the cestode strobilation process, and point to several interesting proteins as potential developmental markers and/or targets for the development of novel antihelminthic drugs.
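
The first selection step described above (orthologues present in all cestode species but absent from at least one trematode species) reduces to set operations. The sketch below shows only that presence/absence filter; the function name, species labels and orthogroup identifiers are hypothetical, and the later expression-based filtering is not shown.

```python
def strobilation_candidates(cestodes, trematodes):
    """Orthogroups found in every cestode but missing from >= 1 trematode.

    Both arguments map a species name to the set of orthogroup IDs it carries.
    """
    in_all_cestodes = set.intersection(*cestodes.values())
    in_all_trematodes = set.intersection(*trematodes.values())
    # "Absent from at least one trematode" == "not present in all trematodes".
    return in_all_cestodes - in_all_trematodes


# Hypothetical toy data; the study compared five species in each group.
cestodes = {
    "cestode_A": {"OG1", "OG2", "OG3"},
    "cestode_B": {"OG1", "OG2", "OG3", "OG4"},
}
trematodes = {
    "trematode_A": {"OG1", "OG2"},
    "trematode_B": {"OG1", "OG4"},
}

candidates = strobilation_candidates(cestodes, trematodes)
```

Here OG1 is dropped because every trematode has it, OG4 is dropped because one cestode lacks it, and OG2 and OG3 remain as candidates.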

    Shifts in taxonomic and functional microbial diversity with agriculture: How fragile is the Brazilian Cerrado?

    Rarefaction curves generated with the MG-RAST software against the M5NR database using normalized values between 0 and 1 for no-tillage (NT), conventional tillage (CT) and undisturbed Cerrado (Native) soil metagenomes. Figure S2. Sequence abundance of orders of Betaproteobacteria compared to the M5NR database using normalized values between 0 and 1 for no-tillage (NT), conventional tillage (CT) and undisturbed Cerrado (Native) soil metagenomes. The order Burkholderiales was the most abundant in the NT system, followed by Nitrosomonadales, both in CT and NT (p < 0.05). Figure S3. Sequence abundance of phyla of the Archaea domain compared to the M5NR database, using normalized values between 0 and 1 for no-tillage (NT), conventional tillage (CT) and undisturbed Cerrado (Native) soil metagenomes. Crenarchaeota was higher in NT, while Thaumarchaeota and unclassified were higher in the NT and CT treatments (p < 0.05). Figure S4. Sequence abundance of phyla of the Eukaryota domain compared to the M5NR database, using normalized values between 0 and 1 for no-tillage (NT), conventional tillage (CT) and undisturbed Cerrado (Native) soil metagenomes. Figure S5. Sequence abundance in the Viruses domain compared to the M5NR database using normalized values between 0 and 1 for no-tillage (NT), conventional tillage (CT) and undisturbed Cerrado (Native) soil metagenomes. Caudovirales was higher in the NT and CT systems (p < 0.05). (DOCX 423 kb)

    Secondary metabolite gene clusters in the entomopathogen fungus Metarhizium anisopliae: genome identification and patterns of expression in a cuticle infection model

    Background: The described species from the Metarhizium genus are cosmopolitan fungi that infect arthropod hosts. Interestingly, while some species infect a wide range of hosts (host-generalists), other species infect only a few arthropods (host-specialists). This singular evolutionary trait permits unique comparisons to determine how pathogens and virulence determinants emerge. Among the several virulence determinants that have been described, secondary metabolites (SMs) are suggested to play essential roles during fungal infection. Despite progress in the study of pathogen-host relationships, the majority of genes related to SM production in Metarhizium spp. are uncharacterized, and little is known about their genomic organization, expression and regulation. To better understand how infection conditions may affect SM production in Metarhizium anisopliae, we have performed a deep survey and description of SM biosynthetic gene clusters (BGCs) in M. anisopliae, analyzed RNA-seq data from fungi grown on cattle-tick cuticles, evaluated the differential expression of BGCs, and assessed conservation among the Metarhizium genus. Furthermore, our analysis extended to the construction of a phylogeny for the following three BGCs: a tropolone/citrinin-related compound (MaPKS1), a pseurotin-related compound (MaNRPS-PKS2), and a putative helvolic acid (MaTERP1). Results: Among 73 BGCs identified in M. anisopliae, 20% were up-regulated during initial tick cuticle infection and presumably possess virulence-related roles. These up-regulated BGCs include known clusters, such as destruxin, NG39x and ferricrocin, together with putative helvolic acid, pseurotin and tropolone/citrinin-related compound clusters, as well as uncharacterized clusters. Furthermore, several previously characterized and putative BGCs were silent or down-regulated in initial infection conditions, indicating minor participation over the course of infection.
Interestingly, several up-regulated BGCs were not conserved in host-specialist species from the Metarhizium genus, indicating differences in the metabolic strategies employed by generalist and specialist species to overcome and kill their host. These differences in metabolic potential may have been partially shaped by horizontal gene transfer (HGT) events, as our phylogenetic analysis provided evidence that the putative helvolic acid cluster in Metarhizium spp. originated from an HGT event. Conclusions: Several unknown BGCs are described, and aspects of their organization, regulation and origin are discussed, providing further support for the impact of SM on the Metarhizium genus lifestyle and infection process.