60 research outputs found

    Machado: open source genomics data integration framework.

    Abstract. Background: Genome projects and multiomics experiments generate huge volumes of data that must be stored, mined, and transformed into useful knowledge. All this information is supposed to be accessible and, if possible, browsable afterwards. Computational biologists have been dealing with this scenario for more than a decade and have been implementing software and databases to meet this challenge. The GMOD (Generic Model Organism Database) biological relational database schema, known as Chado, is one of the few successful open source initiatives; it is widely adopted and many software packages are able to connect to it. Findings: We have been developing an open source software package named Machado, a genomics data integration framework implemented in Python, to enable research groups to both store and visualize genomics data. The framework relies on the Chado database schema and, therefore, should be intuitive for current developers to adopt or to run on top of existing databases. It has several data-loading tools for genomics and transcriptomics data and also for annotation results from tools such as BLAST, InterproScan, OrthoMCL, and LSTrAP. There is an API to connect to JBrowse, and a web visualization tool is implemented using Django Views and Templates. The Haystack library integrated with the ElasticSearch engine was used to implement a Google-like search, i.e., a single auto-complete search box that provides fast results and filters. Conclusion: Machado aims to be a modern object-relational framework that uses the latest Python libraries to produce an effective open source resource for genomics research. In the publication: Adhemar Zerlotini

    Detection and bioinformatic analysis of genes under evidence of positive selection in parasite genomes.

    The ecological relationship of parasitism is a constant arms race between parasitic organisms and their hosts. Infection by parasites reduces the evolutionary fitness of hosts and, consequently, anti-parasitism mechanisms are continually positively selected among the genes that make up the host genome. However, positive selection of anti-parasitism mechanisms in hosts imposes new selective pressures on the parasites. Thus, parasite genes that allow escape from the host's anti-parasitism mechanisms increase the evolutionary fitness of the parasite and are also positively selected. This phenomenon drives a spiral of coevolutionary events over time in both genomes with respect to the genes involved in the molecular parasite-host relationship. Genes evolving under this type of selective pressure in the parasite-host system often show a higher frequency of nonsynonymous relative to synonymous mutations than the vast majority of other genes in these genomes, a phenomenon called positive selection. Therefore, among all the genes observed in host and parasite genomes, genes showing evidence of positive selection are excellent candidates for involvement in the ecological relationship of parasitism. However, existing software for computing positive selection is computationally expensive, making genome-scale searches for positive selection prohibitive. In this context, the present work describes software that uses parallelization to enable genome-scale searches for positive selection in feasible time. CIIC 2012. No 12612

    Using the Galaxy platform for RNA-seq data analysis.

    This work aimed to evaluate the Galaxy platform for the analysis of RNA-seq data, a methodology for sequencing transcripts (mRNA molecules) that uses the new sequencing technologies (NTS).

    Reconstructing the whole mitochondrial DNA (mtDNA) from nuclear genome.

    In several eukaryotic organisms, the nuclear genome contains several partial copies of the mitochondrial DNA (mtDNA). These copies are called NUMTs (NUclear MiTochondrial DNA) and have been known since 1967, when the first evidence of them was reported in the mouse nuclear genome. Although almost fifty years have passed, the reason for their existence remains controversial. However, their presence has been confirmed in an increasing number of genomes. NUMTs could be just another DNA idiosyncrasy, but they actually represent a serious issue for important applications such as genome barcoding. There are many open questions about them. X-meeting 2015

    Plant Co-expression Annotation Resource: a web server for identifying targets for genetically modified crop breeding pipelines.

    Abstract. The development of genetically modified (GM) crops includes the discovery of candidate genes through bioinformatics analysis using genomics data, gene expression, and others. Proteins of unknown function (PUFs) are interesting targets for GM crop breeding pipelines because of the novelty associated with such targets and also to avoid copyright protection. One method of inferring the putative function of PUFs is by relating them to factors of interest such as abiotic stresses using orthology and co-expression networks, in a guilt-by-association manner. In this regard, we have downloaded, analyzed, and processed genomics data of 53 angiosperms, totaling 1,862,010 genes and 2,332,974 RNAs. Diamond and InterproScan were used to discover 72,266 PUFs for all organisms. RNA-seq datasets related to abiotic stresses were downloaded from NCBI/GEO. The RNA-seq data were used as input to the LSTrAP software to construct co-expression networks. LSTrAP also created clusters of transcripts with correlated expression, whose members are more likely to be related to the molecular mechanisms associated with abiotic stresses in the plants. Orthologous groups were created (OrthoMCL) using all 2,332,974 proteins in order to associate PUFs with abiotic stress-related clusters of co-expression and therefore infer their function in a guilt-by-association manner. A freely available web resource named "Plant Co-expression Annotation Resource" (https://www.machado.cnptia.embrapa.br/plantannot), Plantannot, was created to provide indexed queries to search for PUFs putatively associated with abiotic stresses. The web interface also allows browsing, querying, and retrieving of public genomics data from 53 plants. We hope Plantannot will be useful for researchers trying to obtain novel GM crops resistant to climate change hazards. Article 46. In the publication: Adhemar Zerlotini
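The guilt-by-association step in the abstract above can be illustrated with a minimal Python sketch. It is not the Plantannot implementation: the function name `infer_puf_clusters`, the dictionary-based inputs, and the simplification that orthologous groups and co-expression clusters share one identifier space are all assumptions made for illustration.

```python
from collections import defaultdict


def infer_puf_clusters(orthogroups, stress_clusters, pufs):
    """Associate PUFs with stress-related co-expression clusters.

    Hypothetical inputs (not taken from the paper):
      orthogroups:     orthogroup id -> set of gene ids
      stress_clusters: cluster id -> set of gene ids in that cluster
      pufs:            set of gene ids annotated as PUFs
    Returns a dict: PUF id -> sorted list of cluster ids reached via
    any member of the same orthogroup (guilt by association).
    """
    # Invert the cluster membership so each gene maps to its clusters.
    gene_to_clusters = defaultdict(set)
    for cluster_id, members in stress_clusters.items():
        for gene in members:
            gene_to_clusters[gene].add(cluster_id)

    result = {}
    for members in orthogroups.values():
        # Collect every stress-related cluster touched by this orthogroup.
        group_clusters = set()
        for gene in members:
            group_clusters |= gene_to_clusters.get(gene, set())
        if not group_clusters:
            continue
        # Each PUF in the orthogroup inherits the group's clusters.
        for gene in members:
            if gene in pufs:
                result[gene] = sorted(group_clusters)
    return result


example = infer_puf_clusters(
    orthogroups={"OG1": {"geneA", "pufX"}, "OG2": {"geneB", "pufY"}},
    stress_clusters={"drought_1": {"geneA"}, "salt_3": {"geneC"}},
    pufs={"pufX", "pufY"},
)
```

In this toy input, `pufX` shares an orthogroup with `geneA`, which belongs to the hypothetical `drought_1` cluster, so only `pufX` receives an association.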

    BDGF: a database and web-based information retrieval system for genotype and phenotype.

    To achieve efficient storage and fast queries over this high volume of data, we present the BDGF system (Genotypes and Phenotypes Database). It is based on a data model first proposed by Higa (2015). X-Meeting 2016

    In silico prediction of effectors of Fusarium decemcellulare, the causal agent of oversprouting in guarana.

    The guarana plant is native to the Amazon and of great economic and social importance to the state of Amazonas, where its production has been compromised by fungal diseases such as oversprouting, caused by Fusarium decemcellulare. In the present study, we carried out the in silico prediction of effectors in the secretome of this important guarana pathogen. CDMICRO 2023

    Copy number variation in dairy cattle using next-generation sequencing.

    Gene copy number variants (CNVs) have been shown to be associated with several production traits in dairy cattle; however, the detection and validation of CNVs in crossbred cattle are currently lacking. In order to provide a basis for future association studies, we sought to identify CNV regions (CNVRs) within the Girolando composite breed resulting from a mating of the Holstein (taurine) and Gir (indicine) breeds. A read depth method was performed using CNVnator software on NGS data from two Girolando, two Gir and ten Holstein bulls. The individual CNVs were merged into CNVRs based on genomic regions overlapping by at least 1 bp. In total, we identified a composite of 1,286 CNVRs (520 deletions, 255 duplications, 511 mixed) on the genomes of all samples. We observed 34 CNVRs (nine deletions, 25 mixed) in common (overlapping > 50%) only between Girolando and Holstein and 181 CNVRs (20 deletions, 21 duplications, 140 mixed) only in Girolando and Gir, suggesting parent-of-origin inheritance from Holstein and Gir cattle, respectively. One of these Holstein-specific CNVRs intersected with the interleukin 6 family cytokine (LIF) gene, which is linked to fat production and fertility traits in Holstein. Genes related to disease resistance (e.g. the CD4 gene) also coincided with CNVRs present only in Gir and Girolando cattle, suggesting an indicine origin for the CNV. These results showed evidence of specific CNVRs shared by Girolando and purebred breeds which may be targeted for future selective breeding. PAG 2018. P0490. In the publication: Adhemar Zerlotini, Marcos Vinicius B. da Silva
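The rule for merging individual CNVs into CNVRs (overlap of at least 1 bp) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the tuple layout, the 1-based inclusive coordinates, the function names, and the labeling of a region as "mixed" when it combines deletions and duplications are all hypothetical choices.

```python
def label(kinds):
    """Label a region by the CNV types it contains (assumed convention)."""
    return next(iter(kinds)) if len(kinds) == 1 else "mixed"


def merge_cnvs(cnvs, min_overlap=1):
    """Merge CNVs into CNV regions (CNVRs).

    cnvs: iterable of (chromosome, start, end, kind) tuples with
          1-based inclusive coordinates (an assumption of this sketch);
          kind is "deletion" or "duplication".
    Two calls are merged when they share at least min_overlap bases.
    """
    regions = []
    for chrom, start, end, kind in sorted(cnvs, key=lambda c: (c[0], c[1], c[2])):
        if regions:
            last = regions[-1]
            # Input is sorted by start, so overlap with the growing region
            # is measured from this CNV's start to the nearer end.
            overlap = min(last["end"], end) - start + 1
            if last["chrom"] == chrom and overlap >= min_overlap:
                last["end"] = max(last["end"], end)
                last["kinds"].add(kind)
                continue
        regions.append({"chrom": chrom, "start": start, "end": end, "kinds": {kind}})
    return [(r["chrom"], r["start"], r["end"], label(r["kinds"])) for r in regions]


cnvrs = merge_cnvs([
    ("chr1", 100, 200, "deletion"),
    ("chr1", 200, 300, "duplication"),  # shares base 200 with the first call
    ("chr1", 301, 400, "deletion"),     # adjacent but not overlapping
    ("chr2", 50, 80, "duplication"),
])
```

Sorting first means each CNV only needs to be compared with the most recently opened region, so the merge runs in a single pass after the sort.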