3,326 research outputs found

    Discovering novelty in sequential patterns: application for analysis of microarray data on Alzheimer disease

    Analyzing microarray data remains a great challenge because existing methods produce huge amounts of useless results. We propose a new method, NoDisco, for discovering novelties in gene sequences obtained by applying data-mining techniques to microarray data. Method: We identify popular genes, which are often cited in the literature, and innovative genes, which are linked to the popular genes in the sequences but are not mentioned in the literature. We also identify popular and innovative sequences containing these genes. Biologists can thus select interesting sequences from the two sets and obtain the k-best documents. Results: We show the efficiency of this method by applying it to real data used to decipher the mechanisms underlying Alzheimer disease. Conclusion: The first selection of sequences, based on popularity and innovation, helps experts focus on relevant sequences, while the top-k documents help them understand the sequences.

    Text mining: a sequential approach to discovering spatial relations [Fouille de texte : une approche séquentielle pour découvrir des relations spatiales]

    In this article, we present the first stages of a text data mining project. More precisely, we apply an algorithm for extracting sequential patterns under multiple constraints in order to identify relations between spatial entities. The first results show both the value of this approach and its limits. We detail the initial foundations of more ambitious work whose aim is to provide crucial information for complementing the analysis of satellite images.

    Machine learning techniques implementation in power optimization, data processing, and bio-medical applications

    The rapid progress of machine-learning algorithms has become a key factor in determining the future of humanity. These algorithms and techniques have been used to solve a wide spectrum of problems, ranging from data mining and knowledge discovery to unsupervised learning and optimization. This dissertation consists of two study areas. The first area investigates the use of reinforcement learning and adaptive critic design algorithms in the field of power grid control. The second area, consisting of three papers, focuses on developing and applying clustering algorithms to biomedical data. The first paper presents a novel modelling approach for demand-side management of electric water heaters using Q-learning and action-dependent heuristic dynamic programming. The implemented approaches provide an efficient load management mechanism that reduces the overall power cost and smooths the grid load profile. The second paper implements an ensemble statistical and subspace-clustering model for analyzing the heterogeneous data of autism spectrum disorder. The paper implements a novel k-dimensional algorithm that is efficient at handling heterogeneous datasets. The third paper provides a unified learning model for clustering neuroimaging data to identify the potential risk factors for suboptimal brain aging. In the last paper, clustering and clustering validation indices are used to identify the groups of compounds that are responsible for plant uptake and contaminant transportation from roots to the edible parts of plants. --Abstract, page iv

    Discriminant sequential patterns for DNA microarrays [Motifs Séquentiels Discriminants pour les puces ADN]

    Discovering new information about the groups of genes involved in a disease is a real challenge. DNA microarrays are powerful tools for analyzing gene expression. They measure the expression of thousands of genes under different biological conditions. In this article, we propose a new approach that highlights order relations between gene expressions. First, we extract sequential patterns that biologists can use as study material. However, because the density of the databases derived from DNA microarrays makes these patterns difficult to extract, we introduce a source of knowledge during the mining process. In this way, the search space is reduced and the results obtained are more relevant from a biological point of view. Experiments on real data underline the relevance of our proposal.

    Finding Relevant Sequences With The Least Temporal Contradiction Measure: Application to Hydrological Data

    In this paper, we present a knowledge discovery process applied to hydrological data. To this end, we apply a sequential pattern extraction algorithm to data collected at stations located along several rivers. The data are pre-processed to obtain different spatial proximities, and the number of patterns is estimated to highlight the influence of the defined spatial relationships. We provide an objective assessment measure, called the least temporal contradiction, to help the expert discover new knowledge. Such elements can be used to assess spatialized indicators that assist the interpretation of ecological and river monitoring pressure data.

    Mining microarray data to predict the histological grade of a breast cancer

    BACKGROUND: The aim of this study was to develop an original method to extract sets of relevant molecular biomarkers (gene sequences) that can be used for class prediction and can be included as prognostic and predictive tools. MATERIALS AND METHODS: The method is based on sequential patterns used as features for class prediction. We applied it to classify breast cancer tumors according to their histological grade. RESULTS: We obtained very good recall and precision for grade 1 and grade 3 tumors but, as other authors have found, our results were less satisfactory for grade 2 tumors. CONCLUSIONS: We demonstrated the value of sequential patterns for class prediction from microarray data, and we now have the material to use them for prognostic and predictive applications.

    Harnessing the Power of Machine Learning in Dementia Informatics Research: Issues, Opportunities and Challenges

    Dementia is a chronic and degenerative condition affecting millions globally. The care of patients with dementia presents a continuing challenge to healthcare systems in the 21st century. Owing to advances in information technology, the medical and health sciences have generated unprecedented volumes of data related to the health and wellbeing of patients with dementia, including genetics, neuroimaging, cognitive assessments, free text, and routine electronic health records. Making the best use of these diverse and strategic resources will lead to high-quality care of patients with dementia. Machine learning is therefore a crucial factor in achieving this objective. The aim of this paper is to provide a state-of-the-art review of machine learning methods applied to health informatics for dementia care. We collate and review the existing scientific methodologies and identify the relevant issues and challenges that arise with big health data. Machine learning has demonstrated promising applications to neuroimaging data analysis for dementia care, while relatively little effort has been made to use integrated heterogeneous data via advanced machine learning approaches. We further indicate the future potential and research directions of applying advanced machine learning, such as deep learning, to dementia informatics.

    Discriminative multi-task feature selection for multi-modality classification of Alzheimer’s disease

    Recently, multi-task feature selection methods have been used in multi-modality classification of Alzheimer’s disease (AD) and its prodromal stage, i.e., mild cognitive impairment (MCI). However, traditional multi-task feature selection methods usually do not fully exploit useful discriminative information among subjects that could further improve the subsequent classification performance. Accordingly, in this paper, we propose a discriminative multi-task feature selection method to select the most discriminative features for multi-modality classification of AD/MCI. Specifically, for each modality, we train a linear regression model using the corresponding modality of data, and further enforce group-sparsity regularization on the weights of those regression models for joint selection of common features across multiple modalities. Furthermore, we propose a discriminative regularization term based on the intra-class and inter-class Laplacian matrices to better use the discriminative information among subjects. To evaluate our proposed method, we perform extensive experiments on 202 subjects, including 51 AD patients, 99 MCI patients, and 52 healthy controls (HC), from the baseline MRI and FDG-PET image data of the Alzheimer’s Disease Neuroimaging Initiative (ADNI). The experimental results, including comparisons with several state-of-the-art methods for multi-modality AD/MCI classification, show that our proposed method not only improves the classification performance but also has the potential to discover disease-related biomarkers useful for diagnosis.