
    X-TIME: An in-memory engine for accelerating machine learning on tabular data with CAMs

    Structured, or tabular, data is the most common format in data science. While deep learning models have proven formidable in learning from unstructured data such as images or speech, they are less accurate than simpler approaches when learning from tabular data. In contrast, modern tree-based Machine Learning (ML) models shine in extracting relevant information from structured data. An essential requirement in data science is to reduce model inference latency in cases where, for example, models are used in a closed loop with simulation to accelerate scientific discovery. However, the hardware acceleration community has mostly focused on deep neural networks and largely ignored other forms of machine learning. Previous work has described the use of an analog content addressable memory (CAM) component for efficiently mapping random forests. In this work, we focus on an overall analog-digital architecture implementing a novel increased-precision analog CAM and a programmable network on chip, allowing the inference of state-of-the-art tree-based ML models such as XGBoost and CatBoost. Results evaluated for a single chip in 16 nm technology show 119x lower latency at 9740x higher throughput compared with a state-of-the-art GPU, with a 19 W peak power consumption.
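    The mapping idea behind CAM-based tree inference can be illustrated in software: every root-to-leaf path of a decision tree is a conjunction of threshold tests, i.e. one per-feature interval per path, which is exactly the kind of row an analog CAM can match against an input in parallel. The sketch below (plain Python with scikit-learn, not the paper's hardware flow or its XGBoost/CatBoost mapping) extracts those per-leaf intervals from a small tree.

```python
# Minimal sketch: flatten a fitted decision tree into per-leaf feature intervals,
# the "one row per root-to-leaf path" representation a CAM would match in parallel.
# Illustrative only; not the X-TIME compiler or hardware pipeline.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_to_ranges(tree, n_features):
    """Return one (per-feature [low, high] interval dict, leaf value) per leaf."""
    t = tree.tree_
    rows = []

    def walk(node, ranges):
        if t.children_left[node] == -1:            # reached a leaf
            rows.append((dict(ranges), t.value[node].ravel()))
            return
        f, thr = t.feature[node], t.threshold[node]
        lo, hi = ranges[f]
        walk(t.children_left[node],  {**ranges, f: (lo, min(hi, thr))})   # x[f] <= thr
        walk(t.children_right[node], {**ranges, f: (max(lo, thr), hi)})   # x[f] >  thr

    walk(0, {f: (-np.inf, np.inf) for f in range(n_features)})
    return rows

rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
for ranges, value in tree_to_ranges(clf, X.shape[1]):
    print(ranges, value)
```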

    Solve-RD: systematic pan-European data sharing and collaborative analysis to solve rare diseases.

    For the first time in Europe, hundreds of rare disease (RD) experts team up to actively share and jointly analyse existing patients' data. Solve-RD is a Horizon 2020-supported EU flagship project bringing together >300 clinicians, scientists, and patient representatives of 51 sites from 15 countries. Solve-RD is built upon a core group of four European Reference Networks (ERNs; ERN-ITHACA, ERN-RND, ERN-Euro NMD, ERN-GENTURIS) which annually see more than 270,000 RD patients with respective pathologies. The main ambition is to solve unsolved rare diseases for which a molecular cause is not yet known. This is achieved through an innovative clinical research environment that introduces novel ways to organise expertise and data. Two major approaches are being pursued: (i) massive data re-analysis of >19,000 unsolved rare disease patients and (ii) novel combined -omics approaches. The minimum requirement to be eligible for the analysis activities is an inconclusive exome that can be shared with controlled access. The first preliminary data re-analysis has already diagnosed 255 cases from 8393 exome/genome datasets. This unprecedented degree of collaboration, focused on the sharing of data and expertise, is expected to identify many new disease genes and enable the diagnosis of many so far undiagnosed patients from all over Europe.

    Solving patients with rare diseases through programmatic reanalysis of genome-phenome data.

    Funders: EC | EC Seventh Framework Programme | FP7 Health (FP7-HEALTH - Specific Programme "Cooperation": Health), doi: https://doi.org/10.13039/100011272, Grant(s): 305444; Ministerio de Economía y Competitividad (Ministry of Economy and Competitiveness), doi: https://doi.org/10.13039/501100003329; Generalitat de Catalunya (Government of Catalonia), doi: https://doi.org/10.13039/501100002809; EC | European Regional Development Fund (Europski Fond za Regionalni Razvoj), doi: https://doi.org/10.13039/501100008530; Instituto Nacional de Bioinformática ELIXIR Implementation Studies Centro de Excelencia Severo Ochoa.
    Reanalysis of inconclusive exome/genome sequencing data increases the diagnostic yield for patients with rare diseases. However, the cost and effort required for reanalysis prevent its routine implementation in research and clinical environments. The Solve-RD project aims to reveal the molecular causes underlying undiagnosed rare diseases. One of its goals is to implement innovative approaches to reanalyse the exomes and genomes of thousands of well-studied undiagnosed cases. The raw genomic data are submitted to Solve-RD through the RD-Connect Genome-Phenome Analysis Platform (GPAP) together with standardised phenotypic and pedigree data. We have developed a programmatic workflow to reanalyse genome-phenome data. It uses the RD-Connect GPAP's Application Programming Interface (API) and relies on the big-data technologies upon which the system is built. We have applied the workflow to prioritise rare known pathogenic variants from 4411 undiagnosed cases. The queries returned an average of 1.45 variants per case, which were evaluated first in bulk by a panel of disease experts and afterwards specifically by the submitter of each case. A total of 120 index cases (21.2% of prioritised cases, 2.7% of all exome/genome-negative samples) have already been solved, with others still under investigation. The implementation of solutions such as the one described here provides the technical framework to enable periodic case-level data re-evaluation in clinical settings, as recommended by the American College of Medical Genetics.
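    The programmatic workflow described above runs against the RD-Connect GPAP API; its actual endpoints are not reproduced here. The following is only a hedged sketch of the general shape of such a loop, with a hypothetical base URL, endpoint, and field names, showing how rare, ClinVar-pathogenic variants could be requested per case and handed on for expert review.

```python
# Hedged sketch of an API-driven reanalysis step. The URL, endpoint, parameters
# and response fields are hypothetical placeholders, NOT the RD-Connect GPAP API.
import requests

BASE_URL = "https://example.org/gpap-api"     # hypothetical
TOKEN = "controlled-access-token"             # placeholder credential

def prioritised_variants(case_id, max_af=0.01):
    """Query the (hypothetical) API for rare variants reported (likely) pathogenic in ClinVar."""
    resp = requests.post(
        f"{BASE_URL}/variants/query",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "case": case_id,
            "clinvar_significance": ["Pathogenic", "Likely pathogenic"],
            "max_allele_frequency": max_af,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["variants"]

# Each prioritised variant would then be reviewed in bulk by disease experts and,
# finally, by the submitter of the case, as in the workflow described above.
```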

    Solving unsolved rare neurological diseases-a Solve-RD viewpoint.

    Funders: Dutch Princess Beatrix Muscle Fund; Dutch Spieren voor Spieren Muscle Fund; University of Tübingen Medical Faculty PATE program; European Reference Network for Rare Neurological Diseases | 739510; European Joint Program on Rare Diseases (EJP-RD COFUND-EJP) | 44140962.

    Twist exome capture allows for lower average sequence coverage in clinical exome sequencing

    Background: Exome and genome sequencing are the predominant techniques in the diagnosis and research of genetic disorders. Sufficient, uniform, and reproducible sequence coverage is a main determinant of the sensitivity to detect single-nucleotide variants (SNVs) and copy number variants (CNVs). Here we compared the ability of recent exome capture kits and genome sequencing techniques to obtain comprehensive exome coverage. Results: We compared three widely used enrichment kits (Agilent SureSelect Human All Exon V5, Agilent SureSelect Human All Exon V7 and Twist Bioscience) as well as short-read and long-read whole-genome sequencing (WGS). We show that the Twist exome capture significantly improves complete coverage and coverage uniformity across coding regions compared to the other exome capture kits. Twist performance is comparable to that of both short- and long-read whole-genome sequencing. Additionally, we show that even at a reduced average coverage of 70× there is only minimal loss in sensitivity for SNV and CNV detection. Conclusion: We conclude that exome sequencing with Twist represents a significant improvement and can be performed at lower sequence coverage than other exome capture techniques.
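    The comparison above hinges on two kinds of numbers: completeness (the fraction of coding bases reaching a depth threshold) and coverage uniformity. Below is a minimal sketch of such metrics, assuming per-base depths over the target regions have already been extracted; the 20x threshold and the fold-80 penalty are common conventions, not necessarily the exact metrics used in the study.

```python
# Minimal sketch of coverage completeness and uniformity metrics over a per-base
# depth array. The Poisson-sampled depths are a toy stand-in, not real data.
import numpy as np

def completeness(depths, threshold=20):
    """Fraction of targeted bases covered by at least `threshold` reads."""
    return float(np.mean(depths >= threshold))

def fold80_penalty(depths):
    """Mean depth divided by the 20th-percentile depth; 1.0 means perfectly uniform."""
    return float(np.mean(depths) / np.percentile(depths, 20))

depths = np.random.poisson(lam=70, size=1_000_000)   # toy ~70x exome-like depths
print(f"completeness@20x: {completeness(depths):.3f}")
print(f"fold-80 penalty:  {fold80_penalty(depths):.2f}")
```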

    A Solve-RD ClinVar-based reanalysis of 1522 index cases from ERN-ITHACA reveals common pitfalls and misinterpretations in exome sequencing

    Purpose: Within the Solve-RD project (https://solve-rd.eu/), the European Reference Network for Intellectual disability, TeleHealth, Autism and Congenital Anomalies (ERN-ITHACA) aimed to investigate whether a reanalysis of exomes from unsolved cases based on ClinVar annotations could establish additional diagnoses. We present the results of the “ClinVar low-hanging fruit” reanalysis, the reasons for the failure of previous analyses, and the lessons learned. Methods: Data from the first 3576 exomes (1522 probands and 2054 relatives) collected from the European Reference Network for Intellectual disability, TeleHealth, Autism and Congenital Anomalies were reanalyzed by the Solve-RD consortium by evaluating for the presence of single-nucleotide variants and small insertions and deletions already reported as (likely) pathogenic in ClinVar. Variants were filtered according to frequency, genotype, and mode of inheritance and reinterpreted. Results: We identified causal variants in 59 cases (3.9%), 50 of them also raised by other approaches and 9 leading to new diagnoses, highlighting interpretation challenges: variants in genes not known to be involved in human disease at the time of the first analysis, misleading genotypes, or variants undetected by local pipelines (variants in off-target regions, low quality filters, low allelic balance, or high frequency). Conclusion: The “ClinVar low-hanging fruit” analysis represents an effective, fast, and easy approach to recovering causal variants from exome sequencing data, thereby contributing to reducing the diagnostic deadlock.
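    In software terms, the “ClinVar low-hanging fruit” pass boils down to a filter over annotated variant tables. A hedged sketch is shown below; the column names and cutoffs are illustrative assumptions, not the consortium's schema or exact criteria.

```python
# Hedged sketch of a ClinVar-based "low-hanging fruit" filter over a flat variant
# table. Column names and thresholds are illustrative, not the Solve-RD pipeline.
import pandas as pd

PATHOGENIC = {"Pathogenic", "Likely_pathogenic"}

def clinvar_low_hanging_fruit(variants: pd.DataFrame, max_af: float = 0.01) -> pd.DataFrame:
    rare = variants["gnomad_af"].fillna(0) <= max_af
    reported = variants["clinvar_significance"].isin(PATHOGENIC)
    # Genotype must fit the assumed mode of inheritance: a heterozygous call is
    # enough for dominant genes, while recessive genes need a homozygous call here.
    genotype_ok = (
        (variants["inheritance"] == "AD")
        | ((variants["inheritance"] == "AR") & (variants["genotype"] == "hom"))
    )
    return variants[rare & reported & genotype_ok]
```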

    Prevalence, associated factors and outcomes of pressure injuries in adult intensive care unit patients: the DecubICUs study

    Funders: European Society of Intensive Care Medicine, doi: http://dx.doi.org/10.13039/501100013347; Flemish Society for Critical Care Nurses.
    Purpose: Intensive care unit (ICU) patients are particularly susceptible to developing pressure injuries. Epidemiologic data are, however, unavailable. We aimed to provide an international picture of the extent of pressure injuries and of the factors associated with ICU-acquired pressure injuries in adult ICU patients. Methods: International 1-day point-prevalence study; follow-up for outcome assessment until hospital discharge (maximum 12 weeks). Factors associated with ICU-acquired pressure injury and hospital mortality were assessed by generalised linear mixed-effects regression analysis. Results: Data from 13,254 patients in 1117 ICUs (90 countries) revealed 6747 pressure injuries; 3997 (59.2%) were ICU-acquired. Overall prevalence was 26.6% (95% confidence interval [CI] 25.9–27.3). ICU-acquired prevalence was 16.2% (95% CI 15.6–16.8). Sacrum (37%) and heels (19.5%) were most affected. Factors independently associated with ICU-acquired pressure injuries were older age, male sex, being underweight, emergency surgery, higher Simplified Acute Physiology Score II, Braden score < 19, ICU stay > 3 days, comorbidities (chronic obstructive pulmonary disease, immunodeficiency), organ support (renal replacement, mechanical ventilation on ICU admission), and being in a low- or lower-middle-income economy. Gradually increasing associations with mortality were identified for increasing severity of pressure injury: stage I (odds ratio [OR] 1.5; 95% CI 1.2–1.8), stage II (OR 1.6; 95% CI 1.4–1.9), and stage III or worse (OR 2.8; 95% CI 2.3–3.3). Conclusion: Pressure injuries are common in adult ICU patients. ICU-acquired pressure injuries are associated with mainly intrinsic factors and mortality. Optimal care standards, increased awareness, appropriate resource allocation, and further research into optimal prevention are pivotal to tackle this important patient safety threat.
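    As a quick plausibility check of the reported overall prevalence, a normal-approximation confidence interval computed from the quoted numbers lands close to the published one; the study itself used exact counts and mixed-effects models, so the match is only approximate.

```python
# Back-of-envelope normal-approximation CI for the overall prevalence quoted above.
# Approximate only; the study reports 25.9-27.3% from its own exact counts/models.
import math

n = 13_254      # patients
p = 0.266       # overall pressure-injury prevalence
se = math.sqrt(p * (1 - p) / n)
print(f"95% CI: {p - 1.96 * se:.3f} to {p + 1.96 * se:.3f}")   # ~0.258 to 0.274
```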

    Toward transparent and parsimonious methods for automatic performance tuning

    The end of Moore's Law and the breakdown of Dennard scaling mean that increasing hardware complexity and optimizing code efficiently are indispensable to maintain the exponential performance improvements of the past decades. Hand-optimizing code is incompatible with the sheer number of configurations of many code optimization problems, but fitting these problems into the mathematical optimization and learning frameworks enables applying methods from these domains to automatically optimize code for performance, a process called autotuning. Commonly used autotuning methods are either not conducive to statistical analysis, such as genetic algorithms, or reliant on restrictive hypotheses about the target search space, such as gradient descent. In this thesis we develop and evaluate the performance of an autotuning method based on the Design of Experiments, a branch of statistics that is not widely studied or applied in autotuning problems, and which aids in the parsimonious production of statistically interpretable and accurate surrogate models. We present a series of descriptions and discussions of various optimization methods from the perspective of performance tuning. We describe heuristics from mathematical optimization and parametric and nonparametric statistical modeling methods, describing how these surrogate models can be used to minimize an unknown function. We then discuss how the Design of Experiments enables managing the compromise between experimental budget and model quality, establishing a link with Online Learning methods and focusing on parsimony, progressive model improvement, uncertainty, and robustness, the properties that are most relevant for a method's applicability to autotuning problems. The key contribution of this thesis is the development of a transparent and parsimonious autotuning approach based on the Design of Experiments, which we apply to diverse problems such as optimizing the configuration of GPU and CPU kernels and finding mixed-precision bit quantization policies for neural networks. We also present a series of empirical evaluations of other methods on autotuning problems from different High Performance Computing domains, such as search heuristics coordinated by a bandit algorithm to optimize the configuration of compilers for several GPU and FPGA kernels. Although some experimental scenarios eluded the detection and exploitation of search space structure, regardless of the chosen method, we demonstrate how autotuning methods based on the Design of Experiments can aid in interpretable, efficient, and effective code optimization.
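    To make the Design-of-Experiments idea concrete, the sketch below runs a tiny two-level factorial design over a few binary tuning factors, fits a transparent main-effects linear model, and picks the configuration the model predicts to be fastest. It is a generic illustration of the approach, not the thesis's implementation, and the factor names and timing function are made up.

```python
# Minimal sketch of a DoE-style autotuning step: a 2^3 factorial design, a
# transparent linear surrogate fitted by least squares, and a predicted-best pick.
# The factors and the measure() function are illustrative stand-ins.
import itertools
import numpy as np

factors = ["unroll", "vectorize", "tile_large"]      # hypothetical binary knobs
rng = np.random.default_rng(0)

def measure(cfg):
    """Stand-in for compiling and timing a kernel under configuration cfg."""
    return 10.0 - 2.0 * cfg[0] - 1.0 * cfg[1] + 0.5 * cfg[0] * cfg[2] + rng.normal(0, 0.1)

# Full two-level factorial design over the three factors (coded as 0/1 levels).
design = np.array(list(itertools.product([0, 1], repeat=len(factors))))
runtimes = np.array([measure(cfg) for cfg in design])

# Transparent main-effects model: runtime ~ intercept + sum_i beta_i * factor_i.
X = np.hstack([np.ones((len(design), 1)), design])
beta, *_ = np.linalg.lstsq(X, runtimes, rcond=None)

best = design[np.argmin(X @ beta)]
print("predicted-best configuration:", dict(zip(factors, map(int, best))))
```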