13 research outputs found

    Learning disease progression models with longitudinal data and missing values

    Get PDF
    Statistical methods have been developed for the analysis of longitudinal data in neurodegenerative diseases. To cope with the lack of temporal markers, i.e. to account for subject-specific disease progression with respect to age, a common strategy consists in realigning the individual data sequences in time. Patients' trajectories can indeed be seen as spatiotemporal perturbations of the same normative disease trajectory. However, these models do not easily account for multimodal data, which more often than not include missing values: imaging and clinical examinations, for instance, are rarely performed at the same frequency in clinical protocols. Multimodal models also need to allow different progression profiles for data with different structures and representations. We propose a generative mixed-effects model that considers progression trajectories as curves on a Riemannian manifold. We use the concept of a product manifold to handle multimodal data, and leverage the generative aspect of our model to handle missing values. We assess the robustness of our method to the frequency of missing values on both synthetic and real data. Finally, we apply our model to a real-world dataset to model Parkinson's disease progression from data derived from clinical examination and imaging.
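    The time-realignment idea above can be illustrated with a minimal sketch: each subject's curve is a reparameterization of one normative trajectory (here a logistic curve, with a hypothetical acceleration factor and onset shift per subject), and missing observations simply drop out of the generative model's likelihood. All names and parameter values are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def population_trajectory(t):
    """Normative disease progression: a logistic curve over disease time."""
    return 1.0 / (1.0 + np.exp(-t))

def individual_trajectory(t, alpha, tau):
    """Subject-specific spatiotemporal perturbation of the normative curve:
    progression speed alpha and onset shift tau (both hypothetical here)."""
    return population_trajectory(alpha * (t - tau))

# Simulate 3 subjects observed at the same ages, with ~30% missing values.
ages = np.linspace(50, 80, 7)
subjects = [(1.2, 62.0), (0.8, 70.0), (1.0, 65.0)]
observations = []
for alpha, tau in subjects:
    y = individual_trajectory(ages, alpha, tau) + rng.normal(0, 0.05, ages.size)
    y[rng.random(ages.size) < 0.3] = np.nan  # missing at random
    observations.append(y)

def log_likelihood(y, ages, alpha, tau, sigma=0.05):
    """Gaussian log-likelihood (up to a constant); NaN entries are masked
    out, which is what makes a generative model robust to missingness."""
    mask = ~np.isnan(y)
    resid = y[mask] - individual_trajectory(ages[mask], alpha, tau)
    return -0.5 * np.sum((resid / sigma) ** 2)

print(log_likelihood(observations[0], ages, 1.2, 62.0))
```

    In a full model the individual parameters would be random effects estimated jointly with the population curve; the sketch only shows how missingness is absorbed by masking the likelihood.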

    A comparison between early presentation of dementia with Lewy Bodies, Alzheimer's disease and Parkinson's disease: evidence from routine primary care and UK Biobank data

    Get PDF
    OBJECTIVE: To simultaneously contrast prediagnostic clinical characteristics of individuals with a final diagnosis of dementia with Lewy bodies, Parkinson's disease, or Alzheimer's disease with those of controls without neurodegenerative disorders. METHODS: Using the longitudinal THIN database in the UK, we tested the association of each neurodegenerative disorder with a selected list of symptoms and broad families of treatments, and compared the associations between disorders to detect disease-specific effects. We replicated the main findings in the UK Biobank. RESULTS: We used data from 28,222 patients with PD, 20,214 with AD, 4,682 with DLB and 20,214 controls. All neurodegenerative disorders were significantly associated with the presence of multiple clinical characteristics before their diagnosis, including sleep disorders, falls, psychiatric symptoms and autonomic dysfunctions. When comparing DLB patients with PD and AD patients, falls, psychiatric symptoms and autonomic dysfunction were all more strongly associated with DLB in the five years preceding the first neurodegenerative diagnosis. The use of statins was lower in patients who developed PD and higher in patients who developed DLB compared to AD. In PD patients, the use of statins was associated with the development of dementia in the five years following PD diagnosis. INTERPRETATION: Prediagnostic presentations of falls, psychiatric symptoms and autonomic dysfunctions were more strongly associated with DLB than with PD and AD. This study also suggests that whilst several associations with medications are similar across neurodegenerative disorders, statin usage is negatively associated with Parkinson's disease but positively with DLB and AD, as well as with the development of dementia in PD.
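    The basic quantity behind such case-control comparisons is an odds ratio per symptom per disorder, which can then be contrasted across disorders. A minimal sketch with entirely hypothetical counts (only the cohort sizes echo the abstract; the symptom counts are invented for illustration):

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """OR = (a/b) / (c/d) for a 2x2 exposure-by-outcome table."""
    return (exposed_cases / unexposed_cases) / (exposed_controls / unexposed_controls)

# Hypothetical counts of prediagnostic falls; NOT the paper's data.
or_dlb = odds_ratio(900, 3782, 1200, 19014)    # 4,682 DLB cases vs 20,214 controls
or_pd = odds_ratio(3200, 25022, 1200, 19014)   # 28,222 PD cases vs 20,214 controls

# A larger OR for DLB than for PD is the shape of the disease-specific
# effect the abstract reports for falls.
print(round(or_dlb, 2), round(or_pd, 2))
```

    The actual study additionally adjusts for covariates and tests the difference between disorders; this sketch only shows the unadjusted contrast.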

    Finishing the euchromatic sequence of the human genome

    Get PDF
    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∌99% of the euchromatic genome and is accurate to an error rate of ∌1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    PrePeP: A light-weight, extensible tool for predicting frequent hitters

    No full text
    We present PrePeP, a lightweight tool for predicting whether molecules are frequent hitters, and for visually inspecting the subgraphs supporting this decision. PrePeP contains three modules: a mining component, an encoding/predicting component, and a graphical interface, all of which are easily extensible.

    Validity, Agreement, Consensuality and Annotated Data Quality

    No full text
    Reference annotated (or gold-standard) datasets are required for various common tasks such as training machine learning systems or system validation. They are necessary to analyse or compare occurrences or items annotated by experts, or to compare objects resulting from any computational process to objects annotated (selected and characterized) by experts. But even if reference annotated gold-standard corpora are required, their production is known to be a difficult problem, from both a theoretical and a practical point of view. Many studies devoted to these issues conclude that multi-annotation is most of the time a necessity. Measuring inter-annotator agreement, which is required to check the reliability of data and the reproducibility of an annotation task, and thus to establish a gold standard, is another thorny problem. A fine-grained analysis of the metrics available for this specific task then becomes essential. Our work is part of this effort and more precisely focuses on several problems which are rarely discussed, although they are intrinsically linked with the interpretation and the evaluation of metrics. In particular, we focus here on the complex relations between agreement and reference (of which agreement among annotators is supposed to be an indicator), and the emergence of a consensus. We also introduce the notion of consensuality as another relevant indicator.
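    The gap between raw agreement and a chance-corrected metric, which complicates the interpretation the abstract discusses, can be shown with a standard example: Cohen's kappa versus observed agreement for two annotators (the labels below are invented for illustration).

```python
from collections import Counter

def observed_agreement(a, b):
    """Fraction of items on which two annotators assign the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement: kappa = (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = observed_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    # Expected agreement if both annotators labelled independently
    # with their observed label frequencies.
    p_e = sum(ca[lab] * cb[lab] for lab in set(a) | set(b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# One dominant label inflates raw agreement relative to kappa.
ann1 = ["pos"] * 8 + ["neg", "neg"]
ann2 = ["pos"] * 8 + ["neg", "pos"]
print(observed_agreement(ann1, ann2))  # 0.9
print(round(cohens_kappa(ann1, ann2), 3))  # ~0.615
```

    High observed agreement with a markedly lower kappa is precisely the kind of divergence that makes metric interpretation, and hence the link between agreement and reference, non-trivial.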


    Improving SAR analysis via pharmacophoric feature selection and feature transformation

    No full text
    Recently, the analysis of Structure-Activity Relationships has been confronted with the high dimensionality of chemical representations of molecular datasets, making analysis for drug discovery more complicated. To address this, different machine learning (ML) approaches have been exploited and have proved their effectiveness by extracting relevant information. Generally, before applying ML methods, the raw data must be preprocessed to obtain better results. In our work, we start with a dataset described by pharmacophores obtained with the Norns tool. Norns considers a dataset of molecules for which both structure and activity are given, and extracts a set of pharmacophores whose occurrences in the dataset fulfill specified properties. For example, Norns allows the automatic extraction of 112,047 pharmacophores from ligands tested on BCR-ABL, without any prior supervised selection. However, this set is too large to perform efficient analysis and offer explicable results based on ML algorithms. This is why, in a first step, we select a subset of pharmacophores by grouping them into equivalence classes: pharmacophores occurring in the same set of molecules. This first unsupervised selection step allows us to retain 22,127 pharmacophores. As it removes redundant pharmacophores without losing statistical information, it allows us to cope with the high dimensionality exhibited by chemogenomics datasets. We can then perform more sophisticated operations using the new representation: we do so by passing the data through two neural network-based transformations. The first neural network (NN) performs an unsupervised transformation by minimizing a loss function based on similarity computations. Its goal is to reduce the distance between similar data points and increase it between dissimilar ones. The second NN exploits (part of) the activity information with the aim of obtaining a more structured data space.
    While we use labels contained in the data in our work, the necessary information could also come from an expert interacting with the process, a first step towards interactive mining of pharmacophores. We obtain good clustering performance, which leads to an easier and more efficient analysis. In addition, feature weights derived from the NN-based transformations could help to explain the results of the clustering step.
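    The equivalence-class step above reduces to a simple grouping: features that occur in exactly the same set of molecules are interchangeable, so one representative per class suffices. A minimal sketch on hypothetical occurrence data (feature names and molecule sets are invented):

```python
from collections import defaultdict

def equivalence_classes(occurrence):
    """Group features (pharmacophores) by their exact occurrence set:
    two features are equivalent iff they match the same molecules."""
    classes = defaultdict(list)
    for feature, molecules in occurrence.items():
        # frozenset makes the occurrence set usable as a dictionary key.
        classes[frozenset(molecules)].append(feature)
    return list(classes.values())

# Hypothetical data: pharmacophore -> molecules it occurs in.
occurrence = {
    "ph1": {"m1", "m2"},
    "ph2": {"m1", "m2"},  # same support as ph1 -> redundant
    "ph3": {"m3"},
    "ph4": {"m1", "m3"},
}
classes = equivalence_classes(occurrence)
print(len(classes))  # 3 equivalence classes for 4 features
```

    Keeping one representative per class discards no statistical information, since equivalent features are indistinguishable to any model that only sees molecule-level occurrences.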