50 research outputs found

    Study on open science: The general state of the play in Open Science principles and practices at European life sciences institutes

    Nowadays, open science is a hot topic on all levels and is also one of the priorities of the European Research Area. Components that are commonly associated with open science are open access, open data, open methodology, open source, open peer review, open science policies and citizen science. Open science has great potential to connect and influence the practices of researchers, funding institutions and the public. In this paper, we evaluate the level of openness based on public surveys at four European life sciences institutes.

    Automatic Extraction and Assessment of Entities from the Web

    The search for information about entities, such as people or movies, plays an increasingly important role on the Web. This information is still scattered across many Web pages, making it more time-consuming for a user to find all relevant information about an entity. This thesis describes techniques to extract entities and information about these entities from the Web, such as facts, opinions, questions and answers, interactive multimedia objects, and events. The findings of this thesis are that it is possible to create a large knowledge base automatically using a manually crafted ontology. The precision of the extracted information was found to be between 75% and 90% (for facts and entities, respectively) after using assessment algorithms. The algorithms from this thesis can be used to create such a knowledge base, which can be used in various research fields, such as question answering, named entity recognition, and information retrieval.
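    The thesis' own extraction and assessment algorithms are not given in this abstract; as a minimal sketch of the general idea, the toy example below extracts candidate entities with a naive capitalized-phrase pattern and scores precision against a small hand-labelled gold set. All names and data are invented for illustration.

```python
# Illustrative only: NOT the thesis' algorithm. Naive entity candidates
# via a capitalized-phrase regex, assessed for precision against gold.
import re

def extract_entities(text):
    """Return capitalized word sequences as candidate entities."""
    return set(re.findall(r"\b(?:[A-Z][a-z]+(?:\s[A-Z][a-z]+)*)\b", text))

def precision(extracted, gold):
    """Fraction of extracted candidates that are correct entities."""
    if not extracted:
        return 0.0
    return len(extracted & gold) / len(extracted)

text = "During 1942 Alan Turing worked at Bletchley Park near London."
gold = {"Alan Turing", "Bletchley Park", "London"}
candidates = extract_entities(text)
print(candidates, precision(candidates, gold))
```

Even this toy shows why an assessment step matters: the sentence-initial "During" is a false positive that drags precision down until it is filtered out.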

    A study assessing the characteristics of big data environments that predict high research impact: application of qualitative and quantitative methods

    BACKGROUND: Big data offers new opportunities to enhance healthcare practice. While researchers have shown increasing interest in using it, little is known about what drives research impact. We explored predictors of research impact across three major sources of healthcare big data derived from the government and the private sector. METHODS: This study was based on a mixed methods approach. Using quantitative analysis, we first clustered peer-reviewed original research that used data from government sources, derived through the Veterans Health Administration (VHA), and private sources of data from IBM MarketScan and Optum, using social network analysis. We analyzed a battery of research impact measures as a function of the data sources. Other main predictors were topic clusters and authors' social influence. Additionally, we conducted key informant interviews (KII) with a purposive sample of high-impact researchers who have knowledge of the data. We then compiled the findings of the KIIs into two case studies to provide a rich understanding of the drivers of research impact. RESULTS: Analysis of 1,907 peer-reviewed publications using VHA, IBM MarketScan and Optum found that the overall research enterprise was highly dynamic and growing over time. With less than 4 years of observation, research productivity, use of machine learning (ML), natural language processing (NLP), and the Journal Impact Factor showed substantial growth. Studies that used ML and NLP, however, showed limited visibility. After adjustments, VHA studies had generally higher impact (10% and 27% higher annualized Google citation rates) compared to MarketScan and Optum (p<0.001 for both). Analysis of co-authorship networks showed that no single social actor, either a community of scientists or institutions, was dominating. Other key opportunities to achieve high impact based on the KIIs include methodological innovations, under-studied populations and predictive modeling based on rich clinical data.
    CONCLUSIONS: Big data for purposes of research analytics has grown within the three data sources studied between 2013 and 2016. Despite important challenges, the research community is reacting favorably to the opportunities offered both by big data and advanced analytic methods. Big data may be a logical and cost-efficient choice to emulate research initiatives where randomized controlled trials (RCTs) are not possible.
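    The study's social network analysis pipeline is not specified in the abstract; the sketch below only illustrates the underlying idea of co-authorship degree centrality (used to ask whether any single actor dominates) on a toy network of hypothetical author names, using plain dictionaries.

```python
# Illustrative sketch, not the study's method: build a co-authorship
# graph from per-paper author lists and compute degree centrality.
from collections import defaultdict
from itertools import combinations

papers = [              # hypothetical author lists, one per paper
    ["Lee", "Kim"],
    ["Kim", "Patel", "Garcia"],
    ["Garcia", "Lee"],
]

adj = defaultdict(set)  # undirected co-authorship adjacency
for authors in papers:
    for a, b in combinations(authors, 2):
        adj[a].add(b)
        adj[b].add(a)

n = len(adj)
# degree centrality: number of co-authors / (n - 1)
centrality = {a: len(nbrs) / (n - 1) for a, nbrs in adj.items()}
print(centrality)
```

A roughly flat centrality distribution, as in the toy output here, is the pattern the authors report: no dominant community or institution.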

    Timely and reliable evaluation of the effects of interventions: a framework for adaptive meta-analysis (FAME)

    Most systematic reviews are retrospective and use aggregate data (AD) from publications, meaning they can be unreliable, lag behind therapeutic developments and fail to influence ongoing or new trials. Commonly, the potential influence of unpublished or ongoing trials is overlooked when interpreting results, or when determining the value of updating the meta-analysis or the need to collect individual participant data (IPD). Therefore, we developed a Framework for Adaptive Meta-analysis (FAME) to determine prospectively the earliest opportunity for reliable AD meta-analysis. We illustrate FAME using two systematic reviews in men with metastatic (M1) and non-metastatic (M0) hormone-sensitive prostate cancer (HSPC).
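    FAME itself is a decision framework, not code, but the AD meta-analysis it schedules is typically a standard fixed-effect inverse-variance pooled estimate. A minimal sketch of that calculation, on made-up trial effect sizes:

```python
# Standard fixed-effect inverse-variance meta-analysis (not FAME itself).
# Effect sizes and standard errors below are invented for illustration.
import math

def pooled_effect(effects, ses):
    """Inverse-variance weighted mean effect and its standard error."""
    weights = [1.0 / se**2 for se in ses]      # w_i = 1 / SE_i^2
    total = sum(weights)
    est = sum(w * y for w, y in zip(weights, effects)) / total
    return est, math.sqrt(1.0 / total)

effects = [-0.30, -0.10, -0.20]   # e.g. log hazard ratios per trial
ses = [0.10, 0.15, 0.12]
est, se = pooled_effect(effects, ses)
print(round(est, 3), round(se, 3))
```

FAME's contribution sits around this calculation: deciding prospectively, given the unpublished and ongoing trials, when pooling AD like this first becomes reliable.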

    Development and implementation of in silico molecule fragmentation algorithms for the cheminformatics analysis of natural product spaces

    Computational methodologies extracting specific substructures like functional groups or molecular scaffolds from input molecules can be grouped under the term "in silico molecule fragmentation". They can be used to investigate what specifically characterises a heterogeneous compound class, like pharmaceuticals or Natural Products (NP), and in which aspects they are similar or dissimilar. The aim is to determine what specifically characterises NP structures in order to transfer patterns favourable for bioactivity to drug development. As part of this thesis, the first algorithmic approach to in silico deglycosylation, the removal of glycosidic moieties for the study of aglycones, was developed with the Sugar Removal Utility (SRU) (Publication A). The SRU has also proven useful for investigating NP glycoside space. It was applied to one of the largest open NP databases, COCONUT (COlleCtion of Open Natural prodUcTs), for this purpose (Publication B). A contribution was made to the Chemistry Development Kit (CDK) by developing the open Scaffold Generator Java library (Publication C). Scaffold Generator can extract different scaffold types and dissect them into smaller parent scaffolds following the scaffold tree or scaffold network approach. Publication D describes the OngLai algorithm, the first automated method to identify homologous series in input datasets, group the member structures of each series, and extract their common core. To support the development of new fragmentation algorithms, the open Java rich client graphical user interface application MORTAR (MOlecule fRagmenTAtion fRamework) was developed as part of this thesis (Publication E). MORTAR allows users to quickly execute the steps of importing a structural dataset, applying a fragmentation algorithm, and visually inspecting the results in different ways. All software developed as part of this thesis is freely and openly available (see https://github.com/JonasSchaub).
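    The OngLai algorithm works on full molecular structures; as a much simpler formula-level sketch of the homologous-series idea (members of a series differ by repeated CH2 units), the toy below normalizes molecular formulas by stripping CH2 units and groups formulas that share the same residue. This is NOT the OngLai algorithm, and the input formulas are made up.

```python
# Toy illustration of homologous-series grouping at the formula level:
# strip CH2 units until none remain; the residue identifies the series.
import re
from collections import defaultdict

def parse_formula(formula):
    """'C2H6O' -> {'C': 2, 'H': 6, 'O': 1}"""
    counts = defaultdict(int)
    for elem, num in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[elem] += int(num) if num else 1
    return dict(counts)

def series_key(formula):
    """Remove CH2 units; remaining atom counts identify the series."""
    c = parse_formula(formula)
    while c.get("C", 0) > 0 and c.get("H", 0) >= 2:
        c["C"] -= 1
        c["H"] -= 2
    return tuple(sorted((e, n) for e, n in c.items() if n))

groups = defaultdict(list)
for f in ["CH4O", "C2H6O", "C3H8O", "C2H4", "C3H6"]:
    groups[series_key(f)].append(f)
print(dict(groups))
```

Here the three alcohols (methanol, ethanol, propanol) collapse to one key and the two alkenes to another; a structure-aware method like OngLai additionally recovers the common core as an actual substructure.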

    Integrative approaches to high-throughput data in lymphoid leukemias (on transcriptomes, the whole-genome mutational landscape, flow cytometry and gene copy-number alterations)

    In this thesis, I developed a new approach for the analysis and integration of heterogeneous leukemic data sets, applicable to any high-throughput analysis including basic research. All layers are stored in a semantic graph, which facilitates modifications by simply adding edges (relationships/attributes) and nodes (values/results), as well as calculating biological consensus and clinical correlation. The front-end is accessible through a GUI (graphical user interface) on a Java-based Semantic Web server. I used this framework to describe the genomic landscape of T-PLL (T-cell prolymphocytic leukemia), which is a rare (~0.6/million) mature T-cell malignancy with an aggressive clinical course, notorious treatment resistance, and generally low overall survival. We conducted gene expression and copy-number profiling as well as NGS (next-generation sequencing) analyses on a cohort comprising 94 T-PLL cases. TCL1A (T-cell leukemia/lymphoma 1A) overexpression and ATM (Ataxia Telangiectasia Mutated) impairment represent central hallmarks of T-PLL, predictive for patient survival, T-cell function and proper DNA damage responses. We identified new chromosomal lesions, including a gain of AGO2 (Argonaute 2, RISC Catalytic Component; 57.14% of cases), which is decisive for the chromosome 8q lesion. While we found significant enrichments of truncating mutations in ATM mut/no del (p=0.01365), as well as FAT (FAT Atypical Cadherin) domain mutations in ATM mut/del (p=0.01156), JAK3 (Janus Kinase 3) mut/ATM del cases may represent another tumor lineage. Using whole-transcriptome sequencing, we identified novel structural variants affecting chromosome 14 that lead to the expression of a TCL1A-TCR (T-cell receptor) fusion transcript and a likely degraded TCL1A protein. Two clustering approaches of normal T-cell subsets vs. leukemia gene expression profiles, as well as immunophenotyping-based agglomerative clustering and TCR repertoire reconstruction, further revealed a restricted, memory-like T-cell phenotype. This is to date the most comprehensive, multi-level, integrative study on T-PLL, and it led to an evolutionary disease model and a histone deacetylase-inhibiting / double-strand-break-inducing treatment that performs better than the current standard of chemoimmunotherapy in preclinical testing.
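    The thesis' framework is a Java-based Semantic Web server, not shown here; the sketch below only illustrates the "integration by just adding edges and nodes" idea with a tiny triple store on plain Python tuples. Entities and values are invented for illustration.

```python
# Minimal triple-store sketch (not the thesis' stack): every data layer
# is integrated by adding (subject, relation, object) triples.
triples = set()

def add(subject, relation, obj):
    triples.add((subject, relation, obj))

def query(subject=None, relation=None, obj=None):
    """Return triples matching all non-None fields."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)]

# integrating a new data layer is just more triples:
add("case_01", "has_mutation", "ATM")
add("case_01", "overexpresses", "TCL1A")
add("ATM", "pathway", "DNA damage response")
print(query(subject="case_01"))
```

The point of the graph representation is that adding a new assay (another layer of relations) never requires a schema migration, only new edges.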