10 research outputs found

    Search and comparison of (epi)genomic feature patterns in multiple genome browser tracks

    Background: Genome browsers are widely used for locating interesting genomic regions, but their interactive use is inherently limited to inspecting short genomic portions. An ideal interaction would let the user select a pattern of regions on the browser and then retrieve, across the whole genome, the regions where similar patterns occur, ranked by similarity. Results: We developed SimSearch, an optimized pattern-search method and an open-source plugin for the Integrated Genome Browser (IGB), to find genomic region sets that are similar to a given region pattern. It provides efficient genome-wide visual analytics on large datasets; the plugin supports intuitive user interactions for selecting an interesting pattern on IGB tracks and for visualizing the computed occurrences of similar patterns along the entire genome. SimSearch also includes functions for the annotation and enrichment of results, and is complemented by a Quickload repository with numerous epigenomic feature datasets from ENCODE and Roadmap Epigenomics. The paper also includes use cases showing genome-wide analyses of biological interest that can be easily performed with the presented approach. Conclusions: The novel SimSearch method provides innovative support for effective genome-wide pattern search and visualization; its relevance and practical usefulness are demonstrated through a number of significant use cases of biological interest. The SimSearch IGB plugin, documentation, and code are freely available at https://deibgeco.github.io/simsearch-app/ and https://github.com/DEIB-GECO/simsearch-app/
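    The scoring used by SimSearch is defined in the paper itself; purely as an illustration of the general idea of similarity-ranked pattern search over binned feature tracks (and not of SimSearch's actual algorithm), a minimal Python sketch might look as follows. The bin size, toy track, and query coordinates are hypothetical.

```python
# Minimal sketch of genome-wide pattern search over a binned feature track.
# This is NOT the SimSearch algorithm, only an illustration of the idea:
# take the signal profile of a selected region and rank all other windows
# of the same size by similarity to it. Bin size and track are assumptions.
import numpy as np

BIN_SIZE = 200  # bp per bin (assumption)

def rank_similar_windows(track, query_start_bin, query_len_bins, top_k=10):
    """Slide a window of the query's length along `track` (a 1-D signal array)
    and rank windows by cosine similarity to the query profile."""
    query = track[query_start_bin:query_start_bin + query_len_bins]
    qnorm = np.linalg.norm(query) or 1.0
    hits = []
    for start in range(len(track) - query_len_bins + 1):
        window = track[start:start + query_len_bins]
        wnorm = np.linalg.norm(window) or 1.0
        sim = float(np.dot(query, window) / (qnorm * wnorm))
        hits.append((sim, start * BIN_SIZE, (start + query_len_bins) * BIN_SIZE))
    # Highest similarity first; the query region itself ranks at the top.
    return sorted(hits, reverse=True)[:top_k]

# Toy example with random data standing in for an epigenomic signal track.
rng = np.random.default_rng(0)
toy_track = rng.random(10_000)
for sim, start, end in rank_similar_windows(toy_track, query_start_bin=100, query_len_bins=25):
    print(f"{start}-{end}\tsimilarity={sim:.3f}")
```

    A real genome-wide search over many tracks needs indexing and pruning rather than this naive scan; SimSearch is described as an optimized method precisely so that such searches scale to large datasets.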

    Development of biological data integration methods using Elasticsearch

    In biology, data appear at every stage of a project, from study preparation to publication of results, yet many factors limit their use. The volume, velocity, and variety of the data produced have brought biology into an era dominated by the phenomenon of big data. Since 1980, in order to organize the generated data, the scientific community has produced numerous data repositories. These repositories can contain data on various biological elements such as genes, transcripts, proteins, and metabolites, but also on other concepts such as toxins, biological vocabularies, and scientific publications. Storing all of these data requires robust and durable hardware and software infrastructures. To date, because of the diversity of biology and of the computing architectures in use, there is still no centralized repository containing all public biological databases. The many existing repositories are scattered and generally self-managed by the research teams that published them. With the rapid evolution of information technology, data-sharing interfaces have also evolved, from file transfer protocols to data query interfaces. As a result, access to the data dispersed across these many repositories is inconsistent, and this diversity of access requires automation tools for data retrieval. When a study requires several data sources, the data follow several steps: first data integration, which combines multiple sources under a unified access interface, followed by various uses such as exploration through scripts or visualizations, transformation, and analysis. The literature describes numerous initiatives for sharing and standardizing data, yet the complexity induced by these multiple systems continues to constrain the dissemination of biological data: the ever-increasing production of data, their management, and many technical aspects hinder researchers who want to exploit these data and make them available. The hypothesis tested in this thesis is that broad exploitation of data can be modernized with recent tools and methods, in particular a tool named Elasticsearch. This tool was expected to meet the needs already identified in the literature, while also addressing more recent considerations such as easier data sharing. Building an architecture on this data management tool makes it possible to share data according to interoperability standards, and data disseminated according to these standards can serve biological data mining as well as data transformation and analysis. The results presented in this thesis are based on tools that can be used by all researchers, in biology but also in other fields; applying and testing them in those other fields remains to be done in order to identify their limits more precisely.
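    As a minimal sketch of the kind of unified access interface described above, the snippet below indexes records from two hypothetical sources into Elasticsearch and queries them through a single endpoint, using the official Elasticsearch Python client (version 8 or later). The index name, document fields, records, and server address are illustrative assumptions, not the architecture built in the thesis.

```python
# Minimal sketch of integrating heterogeneous biological records behind a
# single Elasticsearch index and querying them through one interface.
# Requires the elasticsearch>=8 Python client and a locally running node.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local node

# Records coming from different (hypothetical) sources, stored schema-light.
records = [
    {"source": "uniprot-like", "type": "protein", "id": "P12345", "name": "Example kinase"},
    {"source": "ensembl-like", "type": "gene", "id": "ENSG00000000001", "name": "EXAMPLE1"},
]
for doc in records:
    es.index(index="bio-integration-demo", id=doc["id"], document=doc)

es.indices.refresh(index="bio-integration-demo")

# A single full-text query now spans every integrated source.
resp = es.search(
    index="bio-integration-demo",
    query={"match": {"name": "example"}},
)
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["source"], hit["_source"]["type"], hit["_source"]["name"])
```

    Because every source lands in the same index under a shared minimal schema, the same query spans all of them, which is the essence of the unified access interface argued for in the thesis.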

    Enhanced JBrowse plugins for epigenomics data visualization

    Background: New sequencing techniques require new visualization strategies, as is the case for epigenomics data such as DNA base modifications, small non-coding RNAs, and histone modifications. Results: We present a set of plugins for the genome browser JBrowse that are targeted at epigenomics visualizations. Specifically, we have focused on visualizing DNA base modifications, small non-coding RNAs, stranded read coverage, and sequence motif density. Additionally, we present several plugins for an improved user experience, such as configurable, high-quality screenshots. Conclusions: By visualizing epigenomics data alongside traditional genomics data, we see these plugins improving scientific communication and leading to discoveries within the field of epigenomics.
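    The plugins themselves are JavaScript extensions to JBrowse and are not reproduced here; as a small illustration of one kind of data they display, the sketch below computes a sequence motif density track and writes it as a wiggle (.wig) file that a genome browser can load. The motif, window size, and toy sequence are placeholder assumptions.

```python
# Minimal sketch of producing a "sequence motif density" track: count motif
# occurrences per fixed, non-overlapping window and emit fixedStep wiggle
# lines. The motif, window size, and sequence are placeholders.
import re

def motif_density_wig(chrom, sequence, motif="CG", window=100):
    """Yield wiggle-format lines with motif counts per non-overlapping window."""
    yield f"fixedStep chrom={chrom} start=1 step={window} span={window}"
    pattern = re.compile(motif, re.IGNORECASE)
    for start in range(0, len(sequence), window):
        count = len(pattern.findall(sequence[start:start + window]))
        yield str(count)

# Toy example: write a track for a short made-up sequence.
toy_seq = "ACGTCGCGTTACGCGA" * 50
with open("motif_density.wig", "w") as fh:
    for line in motif_density_wig("chr1", toy_seq, motif="CG", window=100):
        fh.write(line + "\n")
```

    The resulting file can be converted to BigWig (for example with the UCSC wigToBigWig tool) and loaded as a quantitative track alongside base-modification or small-RNA tracks.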

    Forest genomics and biotechnology

    This Research Topic addresses research in genomics and biotechnology to improve the growth and quality of forest trees for wood, pulp, biorefineries, and carbon capture. Forests are the world’s greatest repository of terrestrial biomass and biodiversity. Forests provide critical ecological services, supporting the preservation of fauna and flora and of water resources. Planted forests also offer a renewable source of timber for pulp and paper production and for the biorefinery. Despite their fundamental role in society, thousands of hectares of forest are lost annually to deforestation, pests and pathogens, and urban development. As a consequence, there is an increasing need to develop trees that are more productive under lower inputs, while understanding how they adapt to the environment and respond to biotic and abiotic stress. Forest genomics and biotechnology, the disciplines that study the genetic composition of trees and the methods required to modify them, began over a quarter of a century ago with the development of the first genetic maps and the establishment of early methods of genetic transformation. Since then, genomics and biotechnology have impacted all research areas of forestry. Genome analyses of tree populations have uncovered genes involved in adaptation and in the response to biotic and abiotic stress. Genes that regulate growth and development have been identified, and in many cases their mechanisms of action have been described. Genetic transformation is now widely used to understand the roles of genes and to develop germplasm that is better suited for commercial tree plantations. However, in contrast to many annual crops that have benefited from centuries of domestication and extensive genomic and biotechnology research, the field is still in its infancy in forestry, and tremendous opportunities remain unexplored. This Research Topic aims to briefly summarize recent findings, to discuss long-term goals, and to think ahead about future developments and how they can be applied to improve the growth and quality of forest trees. Mini-review articles are sought in forest genomics and biotechnology, with a focus on future directions applied to (1) genetic engineering, (2) adaptation, (3) genomics of conifers and hardwoods, (4) cell wall and wood formation, (5) development, (6) metabolic engineering, (7) biotic and abiotic stress resistance, and (8) the biorefinery.

    Unruptured brain arteriovenous malformations: primary ONYX embolization in ARUBA (A Randomized Trial of Unruptured Brain Arteriovenous Malformations)-eligible patients

    Background and Purpose: In light of evidence from ARUBA (A Randomized Trial of Unruptured Brain Arteriovenous Malformations), neurovascular specialists have had to reconsider the deliberate treatment of unruptured brain arteriovenous malformations (uBAVMs). Our objective was to determine the outcomes of uBAVMs treated with primary embolization using ethylene vinyl alcohol copolymer (Onyx). Methods: Patients with uBAVM who met the inclusion criteria of ARUBA and were treated with primary Onyx embolization were included in this retrospective study. The primary outcome was the modified Rankin Scale score. Secondary outcomes were stroke or death due to uBAVM or intervention, and uBAVM obliteration. Results: Sixty-one patients (mean age, 38 years) were included. The median observation period was 60 months. Patients were treated by embolization alone (41.0%), embolization and radiosurgery (57.4%), or embolization and excision (1.6%). Occlusion was achieved in 44 of 57 patients with completed treatment (77.2%). Forty-seven patients (77.1%) had no clinical impairment at the end of observation (modified Rankin Scale score <2). Twelve patients (19.7%) reached the outcome of stroke or death due to uBAVM or intervention. Treatment-related mortality was 6.6% (4 patients). Conclusions: In uBAVMs, Onyx embolization alone or combined with stereotactic radiosurgery achieves a high occlusion rate. Morbidity remains a challenge, even if it appears lower than in the ARUBA trial.