
    Clinical chemistry in higher dimensions: machine-learning and enhanced prediction from routine clinical chemistry data

    Big data is having an impact on many areas of research, not least biomedical science. In this review paper, big data and machine learning are defined in terms accessible to the clinical chemistry community. Seven myths associated with machine learning and big data are then presented, with the aim of managing expectations of machine learning amongst clinical chemists. The myths are illustrated with four examples investigating the relationship between biomarkers in liver function tests, enhanced laboratory prediction of hepatitis virus infection, the relationship between bilirubin and white cell count, and the relationship between red cell distribution width and laboratory prediction of anaemia. This work was supported by the Quality Use of Pathology Programme (QUPP), Commonwealth Department of Health.

    Supervised Learning Models for the Preliminary Detection of COVID-19 in Patients Using Demographic and Epidemiological Parameters

    The World Health Organization declared the new COVID-19 outbreak a public health crisis of worldwide concern on 30 January 2020 and characterised it as a global pandemic in March 2020. It has had catastrophic consequences for the world economy and for people's well-being, and has put tremendous strain on already overstretched healthcare systems globally, particularly in underdeveloped countries. Over 11 billion vaccine doses have already been administered worldwide, and the benefits of these vaccinations will take some time to appear. Today, the only practical approaches to diagnosing COVID-19 are the RT-PCR and rapid antigen (RAT) tests, which are known to sometimes give unreliable results. Timely diagnosis and implementation of precautionary measures will likely improve survival outcomes and decrease fatality rates. In this study, we propose an innovative, non-clinical way to predict COVID-19: supervised machine learning models that identify patients at risk based on their characteristic parameters and underlying comorbidities. Medical records of patients from Mexico admitted between 23 January 2020 and 26 March 2022 were chosen for this purpose. Among the several supervised machine learning approaches tested, the XGBoost model achieved the best results, with an accuracy of 92%. It is an easy, non-invasive, inexpensive, instant and accurate way of identifying those at risk of contracting the virus. However, it is still too early to conclude that this method can be used as an alternative in the clinical diagnosis of coronavirus cases.

    Exploring the clinical features of narcolepsy type 1 versus narcolepsy type 2 from European Narcolepsy Network database with machine learning

    Narcolepsy is a rare lifelong disease that exists in two forms, narcolepsy type 1 (NT1) and type 2 (NT2), but only NT1 is accepted as a clearly defined entity. Both types belong to the group of central hypersomnias (CH), a spectrum of poorly defined diseases with excessive daytime sleepiness as a core feature. Because of the considerable overlap of symptoms and the rarity of these diseases, it is difficult to identify distinct phenotypes of CH. Machine learning (ML) can help to identify phenotypes because it learns to recognize clinical features that are invisible to humans. Here we apply ML to data from the large European Narcolepsy Network (EU-NN) database, which contains hundreds of mixed features of narcolepsy, making it difficult to analyze with classical statistics. Stochastic gradient boosting, a supervised learning model with built-in feature selection, achieves high performance on the test set. While cataplexy features are recognized as the most influential predictors, the model also finds additional features, e.g. the mean rapid-eye-movement sleep latency of the multiple sleep latency test, that contribute to classifying NT1 and NT2, as confirmed by classical statistical analysis. Our results suggest that ML can identify features of CH at machine scale from complex databases, thus providing 'ideas' and promising candidates for future diagnostic classifications.
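    Stochastic gradient boosting fits a sequence of small trees to the residuals of the current ensemble, each tree on a random subsample of the rows. When the trees are single-split stumps, every round selects exactly one feature, which is one way to read "built-in feature selection": features that never win a split keep zero importance. The sketch below is a minimal pure-Python illustration under our own assumptions, not the study's model. Squared loss on 0/1 labels stands in for the binomial deviance usually used for classification, stumps stand in for deeper trees, and all function names (`fit_stump`, `stochastic_gradient_boosting`, `predict`) are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Row = Sequence[float]
Stump = Tuple[int, float, float, float]  # (feature, threshold, left_step, right_step)


def fit_stump(X: List[Row], r: List[float],
              rows: List[int]) -> Optional[Tuple[float, int, float, float, float]]:
    """Best single split on the sampled rows, scored by squared-error reduction of residuals r."""
    n = len(rows)
    total = sum(r[i] for i in rows)
    best = None  # (gain, feature, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        order = sorted(rows, key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            a, b = X[order[k - 1]][j], X[order[k]][j]
            if a == b:
                continue  # no threshold can separate equal values
            right_sum = total - left_sum
            # SSE(parent) - SSE(left) - SSE(right), each leaf predicting its mean residual
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, j, (a + b) / 2, left_sum / k, right_sum / (n - k))
    return best


def stump_output(stump: Stump, x: Row) -> float:
    j, thr, left_step, right_step = stump
    return left_step if x[j] <= thr else right_step


def stochastic_gradient_boosting(X: List[Row], y: List[float], n_rounds: int = 50,
                                 learning_rate: float = 0.1, subsample: float = 0.5,
                                 seed: int = 0):
    rng = random.Random(seed)
    n = len(X)
    base = sum(y) / n  # initial constant prediction
    pred = [base] * n
    stumps: List[Stump] = []
    importance = [0.0] * len(X[0])
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        rows = rng.sample(range(n), max(2, int(subsample * n)))  # random row subsample
        best = fit_stump(X, residuals, rows)
        if best is None:
            continue  # every sampled value tied on every feature
        gain, j, thr, left_mean, right_mean = best
        importance[j] += gain  # features never chosen keep importance 0
        stump = (j, thr, learning_rate * left_mean, learning_rate * right_mean)
        stumps.append(stump)
        for i in range(n):
            pred[i] += stump_output(stump, X[i])
    return base, stumps, importance


def predict(base: float, stumps: List[Stump], x: Row) -> float:
    return base + sum(stump_output(s, x) for s in stumps)
```

    With one informative feature and one noise feature, the accumulated gains concentrate on the informative one; the learning rate and the row subsample are the two "stochastic" regularisers of this family of models.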

    Statistical Methods to Enhance Clinical Prediction with High-Dimensional Data and Ordinal Response

    Technological progress now makes it possible to examine the molecular configuration of single cells or of entire tissue samples. Such high-dimensional omics data from molecular biology, produced in large quantities, can be generated at ever lower cost and are therefore increasingly used in clinical questions as well. Personalized diagnosis, or the prediction of treatment success on the basis of such high-throughput data, is a modern application of machine learning techniques. In practice, clinical parameters such as health status or the side effects of a therapy are often recorded on an ordinal scale (for example good, normal, poor). It is common to treat classification problems with an ordinal endpoint like general multi-class problems, thereby ignoring the information contained in the ordering of the classes. However, neglecting this information can reduce classification performance or even produce an unfavourable, unordered classification. Classical approaches that model an ordinal endpoint directly, such as a cumulative link model, typically cannot be applied to high-dimensional data. In this work we present hierarchical twoing (hi2), an algorithm for classifying high-dimensional data into ordinal categories. hi2 harnesses the power of well-understood binary classification to classify into ordinal categories as well. An open-source implementation of hi2 is available online. In a comparison study classifying both real and simulated data with an ordinal endpoint, established methods designed specifically for ordered categories do not generally produce better results than state-of-the-art non-ordinal classifiers. An algorithm's ability to handle high-dimensional data dominates classification performance. We show that our algorithm hi2 consistently achieves good results and in many cases outperforms the other methods.
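    The abstract does not spell out how hi2 arranges its binary classifiers, so the sketch below is not hi2. It illustrates the broader idea the abstract relies on, reusing binary classification for ordered classes, with a different and well-known scheme, the cumulative binary decomposition of Frank and Hall: one binary model per threshold k estimates P(y > k), and adjacent differences give the class probabilities. Assumptions: plain logistic regression fitted by gradient descent as the binary learner, classes coded 0..K-1 in their natural order (e.g. good < normal < poor), and hypothetical names (`OrdinalViaBinary`, `train_logistic`).

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_logistic(X: List[Vector], y: List[int], lr: float = 0.5,
                   epochs: int = 2000) -> Tuple[List[float], float]:
    """Unregularised logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi  # d(log-loss)/d(logit)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


class OrdinalViaBinary:
    """Ordinal classifier built from K-1 binary 'is the class above k?' models."""

    def __init__(self, n_classes: int):
        self.n_classes = n_classes
        self.models: List[Tuple[List[float], float]] = []

    def fit(self, X: List[Vector], y: List[int]) -> "OrdinalViaBinary":
        # One binary target per threshold k: 1 if the ordinal label exceeds k.
        self.models = [train_logistic(X, [1 if yi > k else 0 for yi in y])
                       for k in range(self.n_classes - 1)]
        return self

    def predict_proba(self, x: Vector) -> List[float]:
        # p_above[k] estimates P(y > k); adjacent differences give P(y == k).
        p_above = [sigmoid(dot(w, x) + b) for w, b in self.models]
        probs = [1.0 - p_above[0]]
        probs += [p_above[k - 1] - p_above[k] for k in range(1, self.n_classes - 1)]
        probs.append(p_above[-1])
        return probs

    def predict(self, x: Vector) -> int:
        probs = self.predict_proba(x)
        return max(range(self.n_classes), key=probs.__getitem__)
```

    The ordering enters only through how the binary targets are built, so any binary learner can be swapped in. Because the K-1 models are trained independently, a difference such as `p_above[k - 1] - p_above[k]` can turn slightly negative when their estimates cross, one reason dedicated ordinal methods exist.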

    DNA as Patentable Subject Matter and a Narrow Framework for Addressing the Perceived Problems Caused by Gene Patents

    Concerns about the alleged harmful effects of gene patents, including hindered research and innovation and impeded patient access to high-quality genetic diagnostic tests, have resulted in overreactions from the public and throughout the legal profession. These overreactions are exemplified by Association for Molecular Pathology v. U.S. Patent and Trademark Office, a 2010 case in the Southern District of New York that held that isolated DNA is unpatentable subject matter under 35 U.S.C. § 101. The problem with these responses is that they fail to adequately consider the role that gene patents, and patents on similar biomolecules, play in facilitating investment in the costly and risky development processes required to transform the underlying inventions into marketable products. Accordingly, a more precisely targeted solution is advisable. This Note proposes a narrowly tailored set of solutions to address the concerns about gene patents without destroying the incentives for companies to create and commercialize inventions derived from these and similar patents.

    A Machine Learning Approach for Early Diagnosis of Transthyretin Amyloid Cardiomyopathy Among Heart Failure Patients

    Transthyretin amyloid cardiomyopathy (ATTR-CM) is a rare, progressive, and fatal disease. Its prevalence ranges from 4 to 17 per 100,000, and the mean survival time is less than 4 years. The disease has a history of being underdiagnosed and misdiagnosed: the diagnostic delay for wild-type ATTR-CM has a weighted mean of 6.1 years. Low awareness, the need for invasive procedures, and the lack of treatment are the key reasons for delayed diagnosis. With the introduction of non-invasive tests such as nuclear scintigraphy with 99mTc-PYP and of the disease-modifying drug tafamidis, however, this delay now represents a missed opportunity to increase life expectancy through early treatment. Studies show that mean life expectancy can be increased by 5.46 years with early treatment if the 6.1-year diagnostic delay is eliminated, whereas the current mean survival time is less than 4 years. Although there is no definitive symptom, studies have identified key prognostic flags: symptoms and comorbidities that co-exist with ATTR-CM. A prediction model built on the available electronic health record (EHR) data could therefore support early diagnosis and help increase mean life expectancy. This study aims to identify the top phenotypes that can be used for early diagnosis of ATTR-CM and to predict ATTR-CM among heart failure patients using machine learning models. Patient records from North American healthcare organizations were obtained from the EHR system ‘TriNetX’ for this study. Several statistical analyses (e.g., logistic regression, forward and backward elimination, LASSO, and survival analysis) were used to identify the top diagnostic procedures and comorbidities associated with the diagnosis of wild-type ATTR-CM. These key factors were then used as features to train machine learning models (e.g., XGBoost, Random Forest) to predict ATTR-CM early among heart failure patients. The study identified key factors related to diagnostic delay and to the early prediction of cases, with the aim of improving life expectancy and quality of life.