
    Reaching a modular, domain-agnostic and containerized development in biomedical Natural Language Processing systems.

    The last century saw an exponential increase in scientific publications in the biomedical domain. Despite the potential value of this knowledge, most of this data is only available as unstructured textual literature, which has limited its systematic access, use and exploitation. This limitation can be avoided, or at least mitigated, by relying on text mining techniques to automatically extract relevant data from textual documents and structure it. A significant challenge for scientific software applications, including Natural Language Processing (NLP) systems, is providing facilities to share, distribute and run such systems in a simple and convenient way. Software containers can host their own dependencies and auxiliary programs, isolating them from the execution environment. In addition, a workflow manager can be used for the automated orchestration and execution of text mining pipelines. Our work focuses on the study and design of new techniques and approaches to construct, develop, validate and deploy NLP components and workflows with sufficient genericity, scalability and interoperability to allow their use and instantiation across different domains. The results and techniques obtained will be applied in two main use cases: the detection of relevant information in preclinical toxicological reports, under the eTRANSAFE project [1]; and the indexing of biomaterials publications with relevant concepts, as part of the DEBBIE project.

    FAIRsoft - A practical implementation of FAIR principles for research software

    Computational tools are increasingly becoming constitutive parts of scientific research, from experimentation and data collection to the dissemination and storage of results. Unfortunately, research software is not subjected to the same requirements as other methods of scientific research: being peer-reviewed, being reproducible and allowing others to build upon it. This situation is detrimental to the integrity and advancement of scientific research, as computational methods are frequently impossible to reproduce and/or verify [1]. Moreover, they are often opaque, not directly available, or impossible for others to use [2]. One step towards addressing this problem could be to formulate a set of principles that research software should meet to ensure its quality and sustainability, resembling the FAIR (Findable, Accessible, Interoperable and Reusable) Data Principles [3]. The FAIR Data Principles were created to solve similar issues affecting scholarly data, namely the great difficulty of sharing and accessing it, and are currently widely recognized across fields. We present here FAIRsoft, our initial effort to assess the quality of research software using a FAIR-like framework, as a first step towards its implementation in OpenEBench [4], the ELIXIR benchmarking platform.
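To make the idea of a FAIR-like assessment more concrete, here is a minimal Python sketch that scores one tool against a few indicators per principle. The indicator names, the `SoftwareMetadata` record and the "fraction of indicators met" score are hypothetical illustrations. They are not the actual FAIRsoft indicators or the OpenEBench scoring.

```python
from dataclasses import dataclass, field

# Hypothetical indicators for illustration only; NOT the real FAIRsoft set.
INDICATORS: dict[str, list[str]] = {
    "Findable": ["has_persistent_id", "registered_in_catalogue"],
    "Accessible": ["source_code_public", "downloadable_release"],
    "Interoperable": ["standard_io_formats", "documented_api"],
    "Reusable": ["has_license", "has_usage_docs"],
}


@dataclass
class SoftwareMetadata:
    """Hypothetical record of which indicators a tool satisfies."""
    name: str
    facts: set[str] = field(default_factory=set)


def fair_scores(tool: SoftwareMetadata) -> dict[str, float]:
    """Return, per principle, the fraction of its indicators the tool meets."""
    return {
        principle: sum(ind in tool.facts for ind in indicators) / len(indicators)
        for principle, indicators in INDICATORS.items()
    }


tool = SoftwareMetadata(
    "example-tool",
    {"has_persistent_id", "source_code_public", "downloadable_release", "has_license"},
)
print(fair_scores(tool))
# {'Findable': 0.5, 'Accessible': 1.0, 'Interoperable': 0.0, 'Reusable': 0.5}
```

Keeping one score per principle, rather than a single total, shows which FAIR dimension a tool is weakest in.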

    Perspectives on automated composition of workflows in the life sciences [version 1; peer review: 2 approved]

    Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have brought the long-standing vision of automated workflow composition back into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the “big picture” of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. 
Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future. Stian Soiland-Reyes was supported by BioExcel-2 Centre of Excellence, funded by European Commission Horizon 2020 programme under European Commission contract H2020-INFRAEDI-02-2018 823830. Carole Goble was supported by EOSC-Life, funded by European Commission Horizon 2020 programme under grant agreement H2020-INFRAEOSC-2018-2 824087. We gratefully acknowledge the financial support from the Lorentz Center, ELIXIR, and the Leiden University Medical Center (LUMC) that made the workshop possible. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Peer Reviewed. Article signed by 33 authors: Anna-Lena Lamprecht, Magnus Palmblad, Jon Ison, Veit Schwämmle, Mohammad Sadnan Al Manir, Ilkay Altintas, Christopher J. O. Baker, Ammar Ben Hadj Amor, Salvador Capella-Gutierrez, Paulos Charonyktakis, Michael R. Crusoe, Yolanda Gil, Carole Goble, Timothy J. Griffin, Paul Groth, Hans Ienasescu, Pratik Jagtap, Matúš Kalaš, Vedran Kasalica, Alireza Khanteymoori, Tobias Kuhn, Hailiang Mei, Hervé Ménager, Steffen Möller, Robin A. Richardson, Vincent Robert, Stian Soiland-Reyes, Robert Stevens, Szoke Szaniszlo, Suzan Verberne, Aswin Verhoeven, Katherine Wolstencroft. Postprint (published version).
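The six-stage workflow life cycle described in this abstract can be written down as an ordered data structure. The following Python sketch does this. The stage names come from the abstract, but `WorkflowStage` and `next_stage` are hypothetical helpers. The strictly linear progression is also a simplification: the article describes transitions supported by tools and domain knowledge, and real development may iterate between stages.

```python
from enum import IntEnum
from typing import Optional


class WorkflowStage(IntEnum):
    """The six life-cycle stages listed in the workshop summary."""
    SCIENTIFIC_QUESTION = 1
    CONCEPTUAL_WORKFLOW = 2
    ABSTRACT_WORKFLOW = 3
    CONCRETE_WORKFLOW = 4
    PRODUCTION_WORKFLOW = 5
    SCIENTIFIC_RESULTS = 6


def next_stage(stage: WorkflowStage) -> Optional[WorkflowStage]:
    """Return the following stage, or None once results have been produced."""
    if stage is WorkflowStage.SCIENTIFIC_RESULTS:
        return None
    return WorkflowStage(stage + 1)


# Walk the whole cycle from question to results.
path = []
stage: Optional[WorkflowStage] = WorkflowStage.SCIENTIFIC_QUESTION
while stage is not None:
    path.append(stage.name)
    stage = next_stage(stage)
print(" -> ".join(path))
```

Using an `IntEnum` keeps the stage order explicit. Stages can then be compared, for example `stage >= WorkflowStage.CONCRETE_WORKFLOW` to check that a workflow is executable.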

    Forms of management of agricultural and livestock holdings in the central zone of the Department of Magdalena

    This research, carried out in the central zone of the Department of Magdalena, where the municipalities of Plato and Ariguaní are located, concerns the characteristics or forms of use and exploitation of the resources existing within the medium-sized and large holdings of the zone. The main activity of the zone is livestock farming: 99% of the medium-sized and large holdings have it as their principal activity, with breeding and rearing being the main type of specialization for export. The traditional way of managing the various activities and the use of rudimentary methods prevent an adequate return on the resources involved in the production process of the medium-sized and large holdings of the zone. Of the total area of the medium-sized and large holdings, 87.2% is suitable for crops; only 2.8% is used for crops, and the rest is underused for extensive livestock farming. Although the stock of machinery and equipment in the zone is low, it remains idle for part of the year, given the traditional way in which the various activities are managed. Only 23% of the total credit capacity available to the medium-sized and large holdings is used, owing to the low entrepreneurial level of the owners in the zone. The medium-sized and large holdings employ only 3.8% of the total rural labour force. The use made of the norms that the administrative process provides as rules for the proper management of the holding as a business is very low; consequently, since no effective production plans and programmes are drawn up, the organisation, execution and control of activities are inefficient. Profitability indices for the various holdings are low, which demonstrates the low level of use of the various resources. 
Producers do not take part in the marketing process, since they are forced to sell on the farm because of transport conditions and the distance to consumption centres.

    Detection of early seeding of Richter transformation in chronic lymphocytic leukemia

    Richter transformation (RT) is a paradigmatic evolution of chronic lymphocytic leukemia (CLL) into a very aggressive large B cell lymphoma conferring a dismal prognosis. The mechanisms driving RT remain largely unknown. We characterized the whole genome, epigenome and transcriptome, combined with single-cell DNA/RNA-sequencing analyses and functional experiments, of 19 cases of CLL developing RT. Studying 54 longitudinal samples covering up to 19 years of disease course, we uncovered minute subclones carrying genomic, immunogenetic and transcriptomic features of RT cells already at CLL diagnosis, which were dormant for up to 19 years before transformation. We also identified new driver alterations, discovered a new mutational signature (SBS-RT), recognized an oxidative phosphorylation (OXPHOS)-high, B cell receptor (BCR)-low signaling transcriptional axis in RT and showed that OXPHOS inhibition reduces the proliferation of RT cells. These findings demonstrate the early seeding of subclones driving advanced stages of cancer evolution and uncover potential therapeutic targets for RT. The authors thank the Hematopathology Collection registered at the Biobank of Hospital Clínic, Institut d’Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS) and the Biobank HUB-ICO-IDIBELL (PT20/00171) for sample procurement, S. Martín, F. Arenas, the Genomics Core Facility of the IDIBAPS, CNAG Sequencing Unit, Mission Bio, Omniscope and Barcelona Supercomputing Center for the technical support and the computer resources at MareNostrum4 (RES activity, BCV-2018-3-0001). This study was supported by the la Caixa Foundation (CLLEvolution-LCF/PR/HR17/52150017, Health Research 2017 Program HR17-00221, to E.C.), the European Research Council under the European Union’s Horizon 2020 Research and Innovation Program (810287, BCLLatlas, to E.C., J.I.M.-S., H.H. and I.G.), the Instituto de Salud Carlos III and the European Regional Development Fund Una Manera de Hacer Europa (PMP15/00007 to E.C. 
and RTI2018-094584-B-I00 to D.C.), the American Association for Cancer Research (2021 AACR-Amgen Fellowship in Clinical/Translational Cancer Research, 21-40-11-NADE to F.N.), the European Hematology Association (EHA Junior Research Grant 2021, RG-202012-00245 to F.N.), the Lady Tata Memorial Trust (International Award for Research in Leukaemia 2021-2022, LADY_TATA_21_3223 to F.N.), the Generalitat de Catalunya Suport Grups de Recerca AGAUR (2017-SGR-1142 to E.C., 2017-SGR-736 to J.I.M.-S. and 2017-SGR-1009 to D.C.), the Accelerator award CRUK/AIRC/AECC joint funder partnership (AECC_AA17_SUBERO to J.I.M.-S.), the Fundació La Marató de TV3 (201924-30 to J.I.M.-S.), the Centro de Investigación Biomédica en Red Cáncer (CIBERONC; CB16/12/00225, CB16/12/00334, CB16/12/00236), the Ministerio de Ciencia e Innovación (PID2020-117185RB-I00 to X.S.P.), the Fundación Asociación Española Contra el Cáncer (FUNCAR-PRYGN211258SUÁR to X.S.P.), the Associazione Italiana per la Ricerca sul Cancro Foundation (AIRC 5 × 1,000 no. 21198 to G.G.) and the CERCA Programme/Generalitat de Catalunya. H.P.-A. is a recipient of a predoctoral fellowship from the Spanish Ministry of Science, Innovation and Universities (FPU19/03110). A.D.-N. is supported by the Department of Education of the Basque Government (PRE_2017_1_0100). E.C. is an Academia Researcher of the Institució Catalana de Recerca i Estudis Avançats of the Generalitat de Catalunya. This work was partially developed at the Center Esther Koplowitz (Barcelona, Spain). Peer Reviewed. Article signed by 52 authors: Ferran Nadeu, Romina Royo, Ramon Massoni-Badosa, Heribert Playa-Albinyana, Beatriz Garcia-Torre, Martí Duran-Ferrer, Kevin J. Dawson, Marta Kulis, Ander Diaz-Navarro, Neus Villamor, Juan L. 
Melero, Vicente Chapaprieta, Ana Dueso-Barroso, Julio Delgado, Riccardo Moia, Sara Ruiz-Gil, Domenica Marchese, Ariadna Giró, Núria Verdaguer-Dot, Mónica Romo, Guillem Clot, Maria Rozman, Gerard Frigola, Alfredo Rivas-Delgado, Tycho Baumann, Miguel Alcoceba, Marcos González, Fina Climent, Pau Abrisqueta, Josep Castellví, Francesc Bosch, Marta Aymerich, Anna Enjuanes, Sílvia Ruiz-Gaspà, Armando López-Guillermo, Pedro Jares, Sílvia Beà, Salvador Capella-Gutierrez, Josep Ll. Gelpí, Núria López-Bigas, David Torrents, Peter J. Campbell, Ivo Gut, Davide Rossi, Gianluca Gaidano, Xose S. Puente, Pablo M. Garcia-Roves, Dolors Colomer, Holger Heyn, Francesco Maura, José I. Martín-Subero & Elías Campo. Postprint (published version).

    COVID-19 Flow-Maps an open geographic information system on COVID-19 and human mobility for Spain

    COVID-19 is an infectious disease caused by the SARS-CoV-2 virus, which has spread all over the world, leading to a global pandemic. The fast progression of COVID-19 has been mainly related to the high contagion rate of the virus and the worldwide mobility of humans. In the absence of pharmacological therapies, governments from different countries have introduced several non-pharmaceutical interventions to reduce human mobility and social contact. Several studies based on anonymized mobile phone data have been published analysing the relationship between human mobility and the spread of coronavirus. However, to our knowledge, none of these data-sets integrates cross-referenced geo-localised data on human mobility and COVID-19 cases into one all-inclusive open resource. Herein we present COVID-19 Flow-Maps, a cross-referenced Geographic Information System that integrates regularly updated time-series accounting for population mobility and daily reports of COVID-19 cases in Spain at different temporal and spatial resolutions. This integrated and up-to-date data-set can be used to analyse human dynamics to guide and support the design of more effective non-pharmaceutical interventions. This work was supported by the Generalitat de Catalunya through the project PDAD14/20/00001, and by the H2020 programme under Grant Agreement 825070 (INFORE) and the INB Grant (PT17/0009/0001 - ISCIII-SGEFI/ERDF). Peer Reviewed. Postprint (published version).
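The core idea of cross-referencing mobility flows with case reports is to bring both time series onto a shared (date, area) key. The following minimal Python sketch shows that step. The records are toy values invented for illustration, and the field layout and the `incoming_trips_with_cases` helper are hypothetical. They do not reflect the actual Flow-Maps schema or data.

```python
from collections import defaultdict

# Toy, made-up records for illustration; NOT the real Flow-Maps schema or data.
# Each mobility record: (date, origin_area, destination_area, trips).
mobility = [
    ("2020-03-01", "A", "B", 120),
    ("2020-03-01", "C", "B", 30),
    ("2020-03-02", "A", "B", 40),
]
# Daily reported cases keyed by (date, area).
cases = {("2020-03-01", "B"): 5, ("2020-03-02", "B"): 9}


def incoming_trips_with_cases(mobility, cases):
    """Cross-reference total incoming trips per (date, area) with daily cases."""
    inflow = defaultdict(int)
    for date, _origin, destination, trips in mobility:
        inflow[(date, destination)] += trips
    # Areas with no case report for that date are assigned 0 cases.
    return {
        key: {"incoming_trips": trips, "cases": cases.get(key, 0)}
        for key, trips in sorted(inflow.items())
    }


for key, row in incoming_trips_with_cases(mobility, cases).items():
    print(key, row)
```

Treating a missing case report as zero is a simplifying assumption of this sketch. A real integration might instead need to distinguish "no cases" from "no report".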

    Automatic, efficient and scalable provenance registration for FAIR HPC workflows

    Provenance registration is becoming increasingly important as we increase the size and number of experiments performed using computers. In particular, when provenance is recorded in HPC environments, it must be efficient and scalable. In this paper, we propose a provenance registration method for scientific workflows that is efficient enough to run on supercomputers (and could therefore also run in environments with more relaxed restrictions, such as distributed ones). It must also scale to the large workflows that are typical of HPC. We also target transparency for the user, shielding them from having to specify how provenance must be recorded. We implement our design using the COMPSs programming model as a Workflow Management System (WfMS) and use RO-Crate as a well-established specification to record and publish provenance. Experiments are provided, demonstrating the run time efficiency and scalability of our solution. This work has been supported by the Spanish Government (PID2019-107255GB-C21), by Generalitat de Catalunya (contract 2017-SGR-01414) and the EU’s Horizon research and innovation programme under Grant agreement No 101058129 (DT-GEO). It has also been carried out within the CECH project, 50% co-funded by the European Regional Development Fund under the framework of the ERDF Operational Programme for Catalunya 2014-2020, with a grant of €1,527,637.88. LRN, JMF and SCG are partly supported by INB Grant (PT17/0009/0001 - ISCIII-SGEFI / ERDF), and their work received funding from the EU’s Horizon 2020 research and innovation programme under grant agreements EOSC-Life No 824087, and EJP RD No 825575. Peer Reviewed. Postprint (author's final draft).
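As an illustration of what recording provenance with RO-Crate involves, the following Python sketch builds a minimal RO-Crate 1.1 metadata document describing one workflow run. The descriptor entity, the context URL and the root `Dataset` follow the RO-Crate 1.1 specification. The run is modelled with a schema.org `CreateAction`, loosely in the spirit of the Workflow Run RO-Crate profiles. The `provenance_crate` helper, the file names and the `#run-1` identifier are hypothetical. This is not the COMPSs implementation, which records far richer provenance automatically.

```python
import json


def provenance_crate(name, workflow_file, outputs, date_published):
    """Build a minimal RO-Crate 1.1 metadata dict recording one workflow run."""
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": f"Provenance of one run of {workflow_file}",
        "datePublished": date_published,
        "hasPart": [{"@id": p} for p in [workflow_file, *outputs]],
        "mentions": {"@id": "#run-1"},
    }
    # The run itself: which workflow (instrument) produced which files (result).
    run = {
        "@id": "#run-1",
        "@type": "CreateAction",
        "instrument": {"@id": workflow_file},
        "result": [{"@id": p} for p in outputs],
    }
    files = [{"@id": p, "@type": "File"} for p in [workflow_file, *outputs]]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, run, *files],
    }


crate = provenance_crate("Example run", "workflow.py", ["results/out.csv"], "2024-01-01")
print(json.dumps(crate, indent=2))  # contents of a ro-crate-metadata.json file
```

Because the crate is plain JSON-LD, it can be generated after a run finishes without the user specifying anything beyond the files involved. This matches the transparency goal described above.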

    The use of chloroplast genome sequences to solve phylogenetic incongruences in Polystachya Hook (Orchidaceae Juss)

    Background: Current evidence suggests that several unlinked genes are required for more robust estimates of species trees and divergence times. However, most phylogenetic trees for non-model organisms are based on single sequences or just a few regions, using traditional sequencing methods. Techniques for massively parallel sequencing, or next generation sequencing (NGS), are an alternative to traditional methods that allow access to hundreds of DNA regions. Here we use this approach to resolve the phylogenetic incongruence found in Polystachya Hook. (Orchidaceae), a genus that stands out for several interesting aspects, including cytological (polyploid and diploid species), evolutionary (reticulate evolution) and biogeographical (species widely distributed in the tropics and high endemism in Brazil). The genus has a notoriously complicated taxonomy, with several sections that are widely used but probably not monophyletic. Methods: We generated the complete plastid genome of 40 individuals from one clade within the genus. The method consisted of the construction of genomic libraries, hybridization to RNA probes designed from available sequences of a related species, and subsequent sequencing of the product. We also tested how well a smaller sample of the plastid genome would perform in phylogenetic inference in two ways: by duplicating a fast region and analyzing multiple copies of this dataset, and by sampling without replacement from all non-coding regions in our alignment. We further examined the phylogenetic implications of non-coding sequences that appear to have undergone hairpin inversions (reverse complemented sequences associated with small loops). Results: We retrieved 131,214 bp, including coding and non-coding regions of the plastid genome. The phylogeny was able to fully resolve the relationships among all species in the targeted clade with high support values. 
The earliest-diverging species are represented by African accessions, and the most recently diverged ones are among the Neotropical species. Discussion: Our results indicate that using the entire plastid genome is a better option than screening highly variable markers, especially when the expected tree is likely to contain many short branches. The phylogeny inferred is consistent with the proposed origin of the genus, showing a probable origin in Africa, with later dispersal into the Neotropics, as evidenced by a clade containing all Neotropical individuals. The multiple positions of Polystachya concreta (Jacq.) Garay & Sweet in the phylogeny are explained by allotetraploidy. Polystachya estrellensis Rchb.f. can be considered a genetically distinct species from P. concreta and P. foliosa (Lindl.) Rchb.f., but the delimitation of P. concreta remains uncertain. Our study shows that NGS provides a powerful tool for inferring relationships at low taxonomic levels, even in taxonomically challenging groups with short branches and intricate morphology. Swedish Research Council [B0569601]; European Research Council under the European Union's Seventh Framework Programme (ERC) [331024]; Swedish Foundation for Strategic Research; Knut and Alice Wallenberg Foundation; Biodiversity and Ecosystems in a Changing Climate programme; Wenner-Gren Foundations; David Rockefeller Center for Latin American Studies at Harvard University; Faculty of Science at the University of Gothenburg.
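The second subsampling test, sampling without replacement from the non-coding regions of the alignment, can be sketched in Python as follows. The toy alignment, the region and taxon names, and the `subsample_regions` helper are all hypothetical. The sketch only builds the concatenated subsampled matrix. The phylogenetic inference the authors ran on such matrices is left out, and this is not their actual pipeline.

```python
import random

# Toy alignment: region -> {taxon -> aligned sequence}. Hypothetical data only.
regions = {
    "region_1": {"taxon_A": "ACGTA", "taxon_B": "ACGTT"},
    "region_2": {"taxon_A": "TTGA", "taxon_B": "TTGC"},
    "region_3": {"taxon_A": "GGC", "taxon_B": "GAC"},
}


def subsample_regions(regions, k, seed=None):
    """Draw k distinct regions (without replacement) and concatenate per taxon."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(regions), k)  # no region can be drawn twice
    taxa = regions[chosen[0]].keys()
    concatenated = {t: "".join(regions[r][t] for r in chosen) for t in taxa}
    return chosen, concatenated


chosen, concat = subsample_regions(regions, k=2, seed=42)
print(chosen)
print(concat)
```

Sorting the region names before sampling makes the result depend only on the seed, not on dictionary insertion order. This keeps replicate subsamples reproducible.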