A survey on bias in machine learning research
Current research on bias in machine learning often focuses on fairness while
overlooking the roots and causes of bias. However, bias was originally defined
as a "systematic error," often caused by humans at different stages of the
research process. This article aims to bridge the gap with past literature on
bias in research by providing a taxonomy of potential sources of bias and
errors in data and models, focusing on bias in machine learning pipelines. The
survey analyses over forty potential sources of bias in the machine learning
(ML) pipeline, providing clear examples for each. By understanding the sources
and consequences of bias in machine learning, better methods can be developed
for detecting and mitigating it, leading to fairer, more transparent, and more
accurate ML models.
Measuring the impact of COVID-19 on hospital care pathways
Care pathways in hospitals around the world reported significant disruption during the recent COVID-19 pandemic, but measuring the actual impact is more problematic. Process mining can be useful for hospital management to measure the conformance of real-life care to what might be considered normal operations. In this study, we aim to demonstrate that process mining can be used to investigate process changes associated with complex disruptive events. We studied perturbations to accident and emergency (A&E) and maternity pathways in a UK public hospital during the COVID-19 pandemic. Coincidentally, the hospital had implemented a Command Centre approach for patient-flow management, affording an opportunity to study both the planned improvement and the disruption due to the pandemic. Our study proposes and demonstrates a method for measuring and investigating the impact of such planned and unplanned disruptions affecting hospital care pathways. We found that during the pandemic, both A&E and maternity pathways had measurable reductions in the mean length of stay and a measurable drop in the percentage of pathways conforming to normative models. There were no distinctive patterns of monthly mean values of length of stay nor conformance throughout the phases of the installation of the hospital's new Command Centre approach. Due to a deficit in the available A&E data, the findings for A&E pathways could not be interpreted.
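The conformance measure described above, the percentage of recorded pathways that replay cleanly on a normative model, can be illustrated with a minimal sketch. The event names, the normative transition model, and the toy event log below are invented for illustration; the study itself used process-mining tooling on real A&E and maternity event logs.

```python
# Hedged sketch: trace conformance against a normative pathway model.
# Event names and the model below are invented, not the study's actual pathways.

NORMATIVE = {
    "arrival": {"triage"},
    "triage": {"treatment"},
    "treatment": {"discharge", "admission"},
}
END_STATES = {"discharge", "admission"}

def conforms(trace):
    """Return True if every transition in the trace is allowed by the model."""
    for current, nxt in zip(trace, trace[1:]):
        if nxt not in NORMATIVE.get(current, set()):
            return False
    return bool(trace) and trace[-1] in END_STATES

def conformance_rate(traces):
    """Percentage of traces that replay cleanly on the normative model."""
    return 100.0 * sum(conforms(t) for t in traces) / len(traces)

log = [
    ["arrival", "triage", "treatment", "discharge"],  # conformant
    ["arrival", "treatment", "discharge"],            # skips triage
]
```

Tracking `conformance_rate` month by month, as the study does, is what exposes a measurable drop during a disruption such as the pandemic.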
A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4
Large language models (LLMs) are a special class of pretrained language
models obtained by scaling model size, pretraining corpus and computation.
Because of their large size and pretraining on large volumes of text data,
LLMs exhibit special abilities which allow them to achieve remarkable
performance without any task-specific training in many natural language
processing tasks. The era of LLMs started with the OpenAI GPT-3 model, and the
popularity of LLMs has increased rapidly since the introduction of models
like ChatGPT and GPT-4. We refer to GPT-3 and its successor OpenAI models,
including ChatGPT and GPT-4, as GPT-3 family large language models (GLLMs).
With the ever-rising popularity of GLLMs, especially in the research
community, there is a strong need for a comprehensive survey which summarizes
recent research progress in multiple dimensions and can guide the research
community with insightful future research directions. We start the survey
with foundation concepts like transformers, transfer learning,
self-supervised learning, pretrained language models and large language
models. We then present a brief overview of GLLMs and discuss their
performance in various downstream tasks, specific domains and multiple
languages. We also discuss the data labelling and data augmentation abilities
of GLLMs, the robustness of GLLMs, the effectiveness of GLLMs as evaluators,
and finally conclude with multiple insightful future research directions. To
summarize, this comprehensive survey will serve as a good resource for both
academia and industry to stay updated with the latest research related to
GPT-3 family large language models.
Machine Learning Algorithm for the Scansion of Old Saxon Poetry
Several scholars have designed tools to perform the automatic scansion of poetry in many languages, but none of these tools
deal with Old Saxon or Old English. This project aims to be a first attempt to create a tool for these languages. We
implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform the automatic scansion of Old Saxon
and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript and
used the resulting corpus as a labeled dataset to train the model. The evaluation of the algorithm's performance
reached 97% accuracy and a 99% weighted average for precision, recall and F1 score. In addition, we tested
the model with some verses from the Old Saxon Genesis and some from The Battle of Brunanburh, and we observed that
the model predicted almost all Old Saxon metrical patterns correctly but misclassified the majority of the Old English input
verses.
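Since the model learns from the manually annotated Heliand corpus, the task reduces to sequence labeling: each syllable of a verse carries a metrical label. A minimal sketch of turning an annotated verse into a training pair follows; the inline `syllable/LABEL` annotation format and the lift/dip labels (L/D) are invented toy examples, not the authors' actual annotation scheme.

```python
# Hedged sketch: preparing supervised training pairs for a sequence-labeling
# scansion model. Annotation format and labels are illustrative assumptions.

def to_training_pair(annotated_verse):
    """Split 'syllable/LABEL' tokens into parallel input and label sequences."""
    syllables, labels = [], []
    for token in annotated_verse.split():
        syllable, label = token.rsplit("/", 1)
        syllables.append(syllable)
        labels.append(label)
    return syllables, labels

# A toy annotated half-line: L = lift (stressed), D = dip (unstressed).
x, y = to_training_pair("he/D riht/L ne/D con/L")
```

Pairs like `(x, y)` are what a BiLSTM tagger would consume: the syllable sequence as input, the metrical labels as the supervision signal.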
Beyond Quantity: Research with Subsymbolic AI
How do artificial neural networks and other forms of artificial intelligence interfere with methods and practices in the sciences? Which interdisciplinary epistemological challenges arise when we think about the use of AI beyond its dependency on big data? Not only the natural sciences, but also the social sciences and the humanities seem to be increasingly affected by current approaches of subsymbolic AI, which master problems of quality (fuzziness, uncertainty) in a hitherto unknown way. But what are the conditions, implications, and effects of these (potential) epistemic transformations, and how must research on AI be configured to address them adequately?
NBIAS: A Natural Language Processing Framework for Bias Identification in Text
Bias in textual data can lead to skewed interpretations and outcomes when the
data is used. These biases could perpetuate stereotypes, discrimination, or
other forms of unfair treatment. An algorithm trained on biased data ends up
making decisions that disproportionately impact a certain group of people.
Therefore, it is crucial to detect and remove these biases to ensure the fair
and ethical use of data. To this end, we develop Nbias, a comprehensive and
robust framework that consists of a data layer, a corpus construction layer, a
model development layer and an evaluation layer. The dataset is constructed by
collecting diverse data from various fields, including social media,
healthcare, and job hiring portals. On this dataset, we apply a
transformer-based token classification model that identifies bias words/phrases through
a unique named entity. In the assessment procedure, we incorporate a blend of
quantitative and qualitative evaluations to gauge the effectiveness of our
models. We achieve accuracy improvements ranging from 1% to 8% compared to
baselines. We are also able to generate a robust understanding of the model
functioning, capturing not only numerical data but also the quality and
intricacies of its performance. The proposed approach is applicable to a
variety of biases and contributes to the fair and ethical use of textual data.
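Treating bias phrases as a named entity, as described above, means the model emits token-level tags that must be merged back into surface phrases. A minimal sketch of that post-processing step follows; the BIO tag names (B-BIAS, I-BIAS, O) and the example sentence are assumptions for illustration, not the exact Nbias label set.

```python
# Hedged sketch: recovering bias phrases from token-level predictions, as a
# transformer token-classification head would emit them. Tag names are assumed.

def extract_bias_spans(tokens, tags):
    """Collect contiguous B-BIAS/I-BIAS runs into surface phrases."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-BIAS":
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-BIAS" and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = ["Nurses", "are", "naturally", "caring", "women", "."]
tags = ["O", "O", "B-BIAS", "I-BIAS", "I-BIAS", "O"]
```

Here `extract_bias_spans(tokens, tags)` yields the single phrase "naturally caring women", the kind of span-level output the qualitative evaluation would inspect.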
Managing healthcare transformation towards P5 medicine (Published in Frontiers in Medicine)
Health and social care systems around the world are facing radical organizational, methodological and technological paradigm changes to meet the requirements for improving quality and safety of care as well as the efficiency and efficacy of care processes. In doing so, they are trying to manage the challenges of ongoing demographic changes towards aging, multi-diseased societies, the development of human resources, health and social services consumerism, medical and biomedical progress, and exploding costs for health-related R&D as well as health services delivery. Furthermore, they intend to achieve sustainability of global health systems by transforming them towards intelligent, adaptive and proactive systems focusing on health and wellness with optimized quality and safety outcomes.
The outcome is a transformed health and wellness ecosystem combining the approaches of translational medicine, 5P medicine (personalized, preventive, predictive, participative precision medicine) and digital health towards ubiquitous personalized health services realized independent of time and location. It considers individual health status, conditions, and genetic and genomic dispositions in personal social, occupational, environmental and behavioural context, thus turning health and social care from reactive to proactive. This requires the advancement of communication and cooperation among business actors from different domains (disciplines) with different methodologies, terminologies/ontologies, education, skills and experiences, from the data level (data sharing) to the concept/knowledge level (knowledge sharing). The challenge here is the understanding and the formal as well as consistent representation of the world of sciences and practices, i.e. of multidisciplinary and dynamic systems in variable context, for enabling mapping between the different disciplines, methodologies, perspectives, intentions, languages, etc. Based on a framework for representing multi-domain ecosystems, including their development process, dynamically, use-case-specifically and context-aware, systems, models and artefacts can be consistently represented, harmonized and integrated. The response to that problem is the formal representation of health and social care ecosystems through a system-oriented, architecture-centric, ontology-based and policy-driven model and framework, addressing all domains and development process views contributing to the system and context in question.
Accordingly, this Research Topic addresses this change towards 5P medicine. Specifically, areas of interest include, but are not limited to:
• A multidisciplinary approach to the transformation of health and social systems
• Success factors for sustainable P5 ecosystems
• AI and robotics in transformed health ecosystems
• Transformed health ecosystems challenges for security, privacy and trust
• Modelling digital health systems
• Ethical challenges of personalized digital health
• Knowledge representation and management of transformed health ecosystems
Table of Contents:
04 Editorial: Managing healthcare transformation towards P5
medicine
Bernd Blobel and Dipak Kalra
06 Transformation of Health and Social Care Systems—An
Interdisciplinary Approach Toward a Foundational
Architecture
Bernd Blobel, Frank Oemig, Pekka Ruotsalainen and Diego M. Lopez
26 Transformed Health Ecosystems—Challenges for Security,
Privacy, and Trust
Pekka Ruotsalainen and Bernd Blobel
36 Success Factors for Scaling Up the Adoption of Digital
Therapeutics Towards the Realization of P5 Medicine
Alexandra Prodan, Lucas Deimel, Johannes Ahlqvist, Strahil Birov,
Rainer Thiel, Meeri Toivanen, Zoi Kolitsi and Dipak Kalra
49 EU-Funded Telemedicine Projects – Assessment of, and
Lessons Learned From, in the Light of the SARS-CoV-2
Pandemic
Laura Paleari, Virginia Malini, Gabriella Paoli, Stefano Scillieri,
Claudia Bighin, Bernd Blobel and Mauro Giacomini
60 A Review of Artificial Intelligence and Robotics in
Transformed Health Ecosystems
Kerstin Denecke and Claude R. Baudoin
73 Modeling digital health systems to foster interoperability
Frank Oemig and Bernd Blobel
89 Challenges and solutions for transforming health ecosystems
in low- and middle-income countries through artificial
intelligence
Diego M. López, Carolina Rico-Olarte, Bernd Blobel and Carol Hullin
111 Linguistic and ontological challenges of multiple domains
contributing to transformed health ecosystems
Markus Kreuzthaler, Mathias Brochhausen, Cilia Zayas, Bernd Blobel
and Stefan Schulz
126 The ethical challenges of personalized digital health
Els Maeckelberghe, Kinga Zdunek, Sara Marceglia, Bobbie Farsides
and Michael Rigby
LeafAI: query generator for clinical cohort discovery rivaling a human programmer
Objective: Identifying study-eligible patients within clinical databases is a
critical step in clinical research. However, accurate query design typically
requires extensive technical and biomedical expertise. We sought to create a
system capable of generating data model-agnostic queries while also providing
novel logical reasoning capabilities for complex clinical trial eligibility
criteria.
Materials and Methods: The task of query creation from eligibility criteria
requires solving several text-processing problems, including named entity
recognition and relation extraction, sequence-to-sequence transformation,
normalization, and reasoning. We incorporated hybrid deep learning and
rule-based modules for these, as well as a knowledge base of the Unified
Medical Language System (UMLS) and linked ontologies. To enable data-model
agnostic query creation, we introduce a novel method for tagging database
schema elements using UMLS concepts. To evaluate our system, called LeafAI, we
compared the capability of LeafAI to a human database programmer to identify
patients who had been enrolled in 8 clinical trials conducted at our
institution. We measured performance by the number of actual enrolled patients
matched by generated queries.
Results: LeafAI matched a mean 43% of enrolled patients with 27,225 eligible
across 8 clinical trials, compared to 27% matched and 14,587 eligible in
queries by a human database programmer. The human programmer spent 26 total
hours crafting queries compared to several minutes by LeafAI.
Conclusions: Our work contributes a state-of-the-art data model-agnostic
query generation system capable of conditional reasoning using a knowledge
base. We demonstrate that LeafAI can rival a human programmer in finding
patients eligible for clinical trials.
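The core of the data-model-agnostic idea described above is that schema elements are tagged with UMLS concept identifiers, so one logical eligibility criterion can be rendered against any tagged schema. A minimal sketch follows; the table and column names and the tiny tag map below are invented examples, not LeafAI's actual knowledge base (the CUIs shown are real UMLS identifiers, used here purely for illustration).

```python
# Hedged sketch: rendering an eligibility concept against a UMLS-tagged schema.
# Schema names and the tag map are assumptions, not LeafAI internals.

SCHEMA_TAGS = {
    "C0011849": ("condition_occurrence", "condition_code"),  # diabetes mellitus
    "C0020538": ("condition_occurrence", "condition_code"),  # hypertension
}

def render_criterion(cui):
    """Render one eligibility concept as SQL against the tagged schema."""
    table, column = SCHEMA_TAGS[cui]
    return f"SELECT person_id FROM {table} WHERE {column} = '{cui}'"

sql = render_criterion("C0011849")
```

Because the query is generated from concept tags rather than hard-coded table names, pointing the same criterion at a differently tagged schema only requires a new `SCHEMA_TAGS` map, which is the sense in which such a system is data-model agnostic.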
General Course Catalog [2022/23 academic year]
General Course Catalog, 2022/23 academic year
Hierarchical, informed and robust machine learning for surgical tool management
This thesis focuses on the development of a computer vision and deep learning based system for the intelligent management of surgical tools. The work accomplished included the development of a new dataset, the creation of state-of-the-art techniques to cope with volume, variety and vision problems, and the design or adaptation of algorithms to address specific surgical tool recognition issues. The system was trained to cope with a wide variety of tools with very subtle differences in shape, and was designed to work with high volumes as well as varying illuminations and backgrounds. The methodology adopted in this thesis included the creation of a surgical tool image dataset and the development of a surgical tool attribute matrix, or knowledge base. This was significant because there are no large-scale publicly available surgical tool datasets, nor are there established annotations or datasets of textual descriptions of surgical tools that can be used for machine learning. The work resulted in the development of a new hierarchical architecture for multi-level predictions at surgical speciality, pack, set and tool level. Additional work evaluated the use of synthetic data to improve the robustness of the CNN, and the infusion of knowledge to improve predictive performance.