
    BERT WEAVER: Using WEight AVERaging to enable lifelong learning for transformer-based models in biomedical semantic search engines

    Recent developments in transfer learning have boosted advances in natural language processing tasks. Performance, however, depends on high-quality, manually annotated training data. Especially in the biomedical domain, it has been shown that a single training corpus is not enough to learn generic models that predict well on new data. Therefore, to be usable in real-world applications, state-of-the-art models need the ability of lifelong learning: improving performance as soon as new data become available, without re-training the whole model from scratch. We present WEAVER, a simple yet efficient post-processing method that infuses old knowledge into the new model, thereby reducing catastrophic forgetting. We show that applying WEAVER sequentially results in word embedding distributions similar to those obtained by combined training on all data at once, while being computationally more efficient. Because no data sharing is needed, the method is also easily applicable to federated learning settings and can, for example, benefit the mining of electronic health records from different clinics.
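    As a rough illustration of weight averaging as a post-processing step, here is a minimal sketch; the abstract does not specify WEAVER's exact weighting scheme, so the interpolation factor and model names below are assumptions:

```python
# Minimal sketch: parameter-wise averaging of an old and a newly fine-tuned
# model, assuming both share the same architecture. In practice only
# floating-point parameters should be interpolated.
import torch

def average_state_dicts(old_sd, new_sd, alpha=0.5):
    """Interpolate between old and new weights; alpha is the weight of the
    old model, alpha=0.5 is a plain average (an assumption, not the paper's
    prescribed value)."""
    return {
        name: alpha * old_sd[name] + (1.0 - alpha) * new_sd[name]
        for name in new_sd
    }

# Hypothetical usage: after fine-tuning `new_model` on a fresh corpus, fold
# the previous weights back in instead of retraining on all data at once.
# merged = average_state_dicts(old_model.state_dict(), new_model.state_dict())
# new_model.load_state_dict(merged)
```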

    preVIEW: from a fast prototype towards a sustainable semantic search system for central access to COVID-19 preprints

    The current COVID-19 pandemic has emphasized the use of so-called preprints, a type of publication that is not subject to peer review. Due to the pandemic's global relevance, an immense number of COVID-19-related preprints appears every day. To help researchers find relevant information, we have developed the semantic search engine preVIEW, which currently integrates preprints from seven different preprint servers. For semantic indexing, we implemented various text mining components to tag, for example, diseases or SARS-CoV-2-specific proteins. While the service initially served as a prototype developed together with users, we present a re-engineering towards a sustainable semantic search system, which became inevitable due to the continuously growing number of preprint publications. This enables easy reuse of the components and allows rapid adaptation of the service to further user needs.
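    To illustrate the idea of reusable tagging components feeding a semantic index, here is a minimal sketch; the component names, interfaces, and concepts are hypothetical, not preVIEW's actual architecture:

```python
# Minimal sketch: semantic indexing as a pipeline of small, reusable
# tagging components, each mapping text to a set of semantic index terms.
from typing import Callable

Component = Callable[[str], set[str]]

def disease_tagger(text: str) -> set[str]:
    # Illustrative stand-in for a real disease tagger.
    return {"Disease:COVID-19"} if "covid-19" in text.lower() else set()

def protein_tagger(text: str) -> set[str]:
    # Illustrative stand-in for a SARS-CoV-2 protein tagger.
    return {"Protein:Spike"} if "spike" in text.lower() else set()

def index_preprint(text: str, components: list[Component]) -> set[str]:
    """Run each component and collect the semantic terms for the index."""
    terms: set[str] = set()
    for component in components:
        terms |= component(text)
    return terms

print(index_preprint("The COVID-19 spike protein mediates cell entry.",
                     [disease_tagger, protein_tagger]))
# {'Disease:COVID-19', 'Protein:Spike'}
```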

    The Autoimmune Disease Database: a dynamically compiled literature-derived database

    BACKGROUND: Autoimmune diseases are disorders caused by an immune response directed against the body's own organs, tissues and cells. In practice, more than 80 clinically distinct diseases, among them systemic lupus erythematosus and rheumatoid arthritis, are classified as autoimmune diseases. Although their etiology is unclear, these diseases share certain similarities at the molecular level, i.e. susceptibility regions on the chromosomes or the involvement of common genes. Gaining an overview of these related diseases is not feasible through a manual literature review; it requires automated analysis of the more than 500,000 Medline documents related to autoimmune disorders. RESULTS: In this paper we present the first version of the Autoimmune Disease Database, which to our knowledge is the first comprehensive literature-based database covering all known or suspected autoimmune diseases. This dynamically compiled database allows researchers to link autoimmune diseases to candidate genes or proteins through named entity recognition, which identifies genes/proteins in the corresponding Medline abstracts. The Autoimmune Disease Database covers 103 autoimmune disease concepts. This list was expanded to include synonyms and spelling variants, yielding a list of over 1,200 disease names. The current version of the database provides links to 541,690 abstracts and over 5,000 unique genes/proteins. CONCLUSION: The Autoimmune Disease Database provides researchers with a tool to navigate potential gene-disease relationships in Medline abstracts in the context of autoimmune diseases.
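    A minimal sketch of the underlying idea, linking synonym-expanded disease concepts to gene mentions via co-occurrence in abstracts; the synonym lists, abstracts, and gene mentions below are illustrative assumptions, not database content:

```python
# Minimal sketch: match disease synonyms in abstracts and link the disease
# concept to genes that a separate NER step found in the same abstract.
import re
from collections import defaultdict

DISEASE_SYNONYMS = {
    "systemic lupus erythematosus": ["systemic lupus erythematosus", "sle", "lupus"],
    "rheumatoid arthritis": ["rheumatoid arthritis", "ra"],
}

def link_genes_to_diseases(abstracts, gene_mentions):
    """abstracts: dict pmid -> text; gene_mentions: dict pmid -> genes
    found by an upstream gene/protein NER step."""
    links = defaultdict(set)
    for pmid, text in abstracts.items():
        for disease, synonyms in DISEASE_SYNONYMS.items():
            pattern = r"\b(" + "|".join(map(re.escape, synonyms)) + r")\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                links[disease].update(gene_mentions.get(pmid, []))
    return links

abstracts = {"pmid1": "HLA-DRB1 variants are associated with rheumatoid arthritis."}
genes = {"pmid1": ["HLA-DRB1"]}
print(dict(link_genes_to_diseases(abstracts, genes)))
# {'rheumatoid arthritis': {'HLA-DRB1'}}
```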

    ProMiner: rule-based protein and gene entity recognition

    doi:10.1186/1471-2105-6-S1-S14
    Background: Identification of gene and protein names in biomedical text is a challenging task, as the corresponding nomenclature has evolved over time. This has led to multiple synonyms for individual genes and proteins, as well as names that may be ambiguous with other gene names or with general English words. The Gene List Task of the BioCreAtIvE challenge evaluation enables comparison of systems addressing the problem of protein and gene name identification on common benchmark data. Methods: The ProMiner system uses a pre-processed synonym dictionary to identify potential name occurrences in biomedical text and associate protein and gene database identifiers with the detected matches. It follows a rule-based approach, and its search algorithm is geared towards recognition of multi-word names [1]. To account for the large number of ambiguous synonyms in the considered organisms, the system has been extended to use specific variants of the detection procedure for highly ambiguous and case-sensitive synonyms. Based on all detected synonyms fo…
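    A minimal sketch of dictionary-based matching with a case-sensitivity rule for ambiguous synonyms, in the spirit of ProMiner's rule-based approach; the dictionary entries and matching rules are illustrative assumptions, not the actual system:

```python
# Minimal sketch: a synonym dictionary maps names to database identifiers.
# Short, ambiguous symbols are matched case-sensitively so that e.g. the
# gene symbol "WAS" does not fire on the English word "was".
SYNONYMS = {
    # synonym -> (database identifier, requires case-sensitive match?)
    "tumor necrosis factor": ("TNF:7124", False),  # long name: case-insensitive
    "TNF": ("TNF:7124", True),
    "WAS": ("WAS:7454", True),
}

def find_genes(text):
    hits = []
    # Crude tokenization for the sketch; real systems tokenize properly.
    tokens = text.replace(",", " ").split()
    lowered = text.lower()
    for synonym, (identifier, case_sensitive) in SYNONYMS.items():
        if case_sensitive:
            if synonym in tokens:      # exact-case, token-level match
                hits.append((synonym, identifier))
        elif synonym in lowered:       # case-insensitive phrase match
            hits.append((synonym, identifier))
    return hits

print(find_genes("TNF was elevated, and tumor necrosis factor signaling rose."))
# [('tumor necrosis factor', 'TNF:7124'), ('TNF', 'TNF:7124')]
```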

    Interactive cohort exploration for spinocerebellar ataxias using synthetic cohort data for visualization

    Motivation: Visualization is a crucial step in understanding clinical data and deriving hypotheses from them. For clinicians, however, visualization often comes with great effort due to a lack of technical knowledge about data handling and plotting. SCAview offers an easy-to-use solution with an intuitive design that enables various kinds of plotting functions. The aim was to provide an intuitive solution with a low entry barrier for clinical users: little to no onboarding is required before creating plots, while the complexity of supported questions can grow up to specific corner cases. To allow for an easy start and testing of SCAview, we incorporated a synthetic cohort dataset based on real data of rare neurological movement disorders: the most common autosomal-dominantly inherited spinocerebellar ataxias (SCAs) type 1, 2, 3, and 6 (SCA1, 2, 3 and 6). Methods: We created a Django-based backend application that serves the data to a React-based frontend, which uses Plotly for plotting. A synthetic cohort was created so that a version of SCAview could be deployed without violating any data protection guidelines: we added normally distributed noise to the data, preventing re-identification while keeping distributions and general correlations. Results: This work presents SCAview, a user-friendly, interactive web-based service that enables data visualization in a clickable interface with intuitive graphical handling. The service is deployed and can be tested with a synthetic cohort created from a large, longitudinal dataset from observational studies in the most common SCAs.
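    A minimal sketch of the synthetic-cohort step, adding normally distributed noise scaled to each variable's spread; the column names and noise scale are assumptions, not the actual SCAview pipeline:

```python
# Minimal sketch: perturb each numeric column with Gaussian noise so that
# individual records change (hindering re-identification) while the overall
# distributions and correlations are approximately preserved.
import numpy as np
import pandas as pd

def synthesize(cohort: pd.DataFrame, noise_scale: float = 0.1,
               seed: int = 42) -> pd.DataFrame:
    """Add zero-mean Gaussian noise, scaled to each column's std."""
    rng = np.random.default_rng(seed)
    synthetic = cohort.copy()
    for col in synthetic.select_dtypes(include="number").columns:
        std = synthetic[col].std()
        synthetic[col] += rng.normal(0.0, noise_scale * std, len(synthetic))
    return synthetic

# Hypothetical clinical variables for demonstration only.
cohort = pd.DataFrame({"age_at_onset": [34.0, 41.0, 29.0],
                       "sara_score": [12.5, 18.0, 9.5]})
print(synthesize(cohort))
```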

    Patent Retrieval in Chemistry based on semantically tagged Named Entities

    Gurulingappa H, Müller B, Klinger R, et al. Patent Retrieval in Chemistry based on semantically tagged Named Entities. In: Voorhees EM, Buckland LP, eds. The Eighteenth Text REtrieval Conference (TREC 2009) Proceedings. Gaithersburg, Maryland, USA; 2009.
    This paper reports on the work conducted by Fraunhofer SCAI for the TREC Chemistry (TREC-CHEM) track 2009. The Fraunhofer SCAI team participated in two tasks, namely Technology Survey and Prior Art Search. The core of the framework is an index of 1.2 million chemical patents provided as a data set by TREC. For the technology survey, three runs were submitted based on semantic dictionaries and noun phrases. For the prior art search task, several fields were introduced into the index that contained normalized noun phrases as well as biomedical and chemical entities. Altogether, 36 runs were submitted for this task, based on automatic querying with tokens, noun phrases and entities along with different search strategies.
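    A minimal sketch of a fielded index in which normalized noun phrases and entities are searchable as separate fields; the field names and example document are assumptions, not the actual TREC-CHEM setup:

```python
# Minimal sketch: an inverted index keyed by (field, term), so that queries
# can target noun phrases, chemical entities, etc. independently.
from collections import defaultdict

index = defaultdict(set)  # (field, term) -> set of patent ids

def add_document(patent_id, fields):
    """fields: dict of field name -> list of terms extracted for that field."""
    for field, terms in fields.items():
        for term in terms:
            index[(field, term.lower())].add(patent_id)

def search(field, term):
    return index.get((field, term.lower()), set())

# Hypothetical patent record for demonstration only.
add_document("EP123456", {
    "noun_phrases": ["polymer coating", "drug delivery"],
    "chemical_entities": ["polyethylene glycol"],
})
print(search("chemical_entities", "polyethylene glycol"))  # {'EP123456'}
```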

    Overview of BioCreative II gene normalization

    Background: The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes (often from different organisms). For BioCreative II, the task was to list the Entrez Gene identifiers for human genes or gene products mentioned in PubMed/MEDLINE abstracts. We selected abstracts associated with articles previously curated for human genes. We provided 281 expert-annotated abstracts containing 684 gene identifiers for training, and a blind test set of 262 documents containing 785 identifiers, with a gold standard created by expert annotators. Inter-annotator agreement was measured at over 90%. Results: Twenty groups submitted one to three runs each, for a total of 54 runs. Three systems achieved F-measures (balanced precision and recall) between 0.80 and 0.81. Combining the system outputs using simple voting schemes and classifiers yielded improved results; the best composite system achieved an F-measure of 0.92 with 10-fold cross-validation. A 'maximum recall' system based on the pooled responses of all participants gave a recall of 0.97 (with precision 0.23), identifying 763 out of 785 identifiers. Conclusion: Major advances for the BioCreative II gene normalization task include broader participation (20 versus 8 teams) and a pooled system performance comparable to that of human experts, at over 90% agreement. These results show promise as tools to link the literature with biological databases.
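    A minimal sketch of combining run outputs with a simple voting scheme, one of the composition strategies mentioned above; the run contents and vote threshold are illustrative assumptions:

```python
# Minimal sketch: keep a gene identifier only if enough independent systems
# predicted it for the same abstract (simple majority-style voting).
from collections import Counter

def majority_vote(runs, min_votes):
    """runs: per-system sets of predicted gene identifiers for one abstract."""
    counts = Counter(gene_id for run in runs for gene_id in set(run))
    return {gene_id for gene_id, votes in counts.items() if votes >= min_votes}

# Hypothetical system outputs for a single abstract.
runs = [
    {"EntrezGene:7124", "EntrezGene:3569"},   # system A
    {"EntrezGene:7124"},                      # system B
    {"EntrezGene:7124", "EntrezGene:348"},    # system C
]
print(majority_vote(runs, min_votes=2))
# {'EntrezGene:7124'}
```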

    Making COVID-19 research data more easily accessible – building a nationwide information infrastructure

    Public health research and epidemiological and clinical studies are necessary to better understand the COVID-19 pandemic and to take appropriate action. Therefore, numerous research projects have also been initiated in Germany. At present, however, the sheer amount of information makes it difficult to get an overview of the diverse research activities and their results. Within the initiative "National Research Data Infrastructure for Personal Health Data" (NFDI4Health), the COVID-19 task force creates easier access to SARS-CoV-2- and COVID-19-related clinical, epidemiological, and public health research data. In doing so, the so-called FAIR principles (findable, accessible, interoperable, reusable), which are intended to promote faster communication of results, are taken into account. The essential work of the task force includes the creation of a study portal with metadata, data collection instruments, study documents, study results, and publications, as well as a search engine for preprint publications. Further contents are a concept for linking research and routine data, services for improved handling of image data, and the application of standardized analysis routines for harmonized quality assessments. The infrastructure currently under construction facilitates the findability and handling of German COVID-19 research. The developments begun within the NFDI4Health COVID-19 task force are reusable for further research topics, as the challenges addressed are generic to the findability and handling of research data.