467 research outputs found

Spatio-temporal data classification through multidimensional sequential patterns: Application to crop mapping in complex landscape

Author: Bégué Agnès
Ienco Dino
Laurent Anne
Pitarch Yoann
Poncelet Pascal
Sala Michel
Teisseire Maguelonne
Vintrou Elodie
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

The main use of satellite imagery concerns the processing of the spectral and spatial dimensions of the data. However, to extract useful information, the temporal dimension also has to be accounted for, which increases the complexity of the problem. For this reason, there is a need for suitable data mining techniques for this source of data. In this work, we developed a data mining methodology to extract multidimensional sequential patterns to characterize temporal behaviors. We then used the extracted multidimensional sequences to build a classifier, and show how the patterns help to distinguish between the classes. We evaluated our technique using a real-world dataset containing information about land use in Mali (West Africa) to automatically recognize whether an area is cultivated or not.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

HAL Descartes

Agritrop

HAL-CIRAD

Hadoop neural network for parallel and distributed feature selection

Author: Jim Austin
Victoria J. Hodge
Simon O’Keefe
Publication venue: 'Elsevier BV'
Publication date: 01/06/2016

In this paper, we introduce a theoretical basis for a Hadoop-based neural network for parallel and distributed feature selection in Big Data sets. It is underpinned by an associative memory (binary) neural network which is highly amenable to parallel and distributed processing and fits with the Hadoop paradigm. There are many feature selectors described in the literature, all of which have various strengths and weaknesses. We present the implementation details of five feature selection algorithms constructed using our artificial neural network framework embedded in Hadoop YARN. Hadoop allows parallel and distributed processing. Each feature selector can be divided into subtasks and the subtasks can then be processed in parallel. Multiple feature selectors can also be processed simultaneously (in parallel), allowing multiple feature selectors to be compared. We identify commonalities among the five feature selectors. All can be processed in the framework using a single representation, and the overall processing can also be greatly reduced by only processing the common aspects of the feature selectors once and propagating these aspects across all five feature selectors as necessary. This allows the best feature selector and the actual features to select to be identified for large and high dimensional data sets through exploiting the efficiency and flexibility of embedding the binary associative-memory neural network in Hadoop.

Elsevier - Publisher Connector

Crossref

White Rose Research Online

Research summary, January 1989 - June 1990


The Research Institute for Advanced Computer Science (RIACS) was established at NASA ARC in June of 1983. RIACS is privately operated by the Universities Space Research Association (USRA), a consortium of 62 universities with graduate programs in the aerospace sciences, under a Cooperative Agreement with NASA. RIACS serves as the representative of the USRA universities at ARC. This document reports our activities and accomplishments for the period 1 Jan. 1989 - 30 Jun. 1990. The following topics are covered: learning systems, networked systems, and parallel systems.

NASA Technical Reports Server

Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model

Author: Letham Benjamin
Madigan David
McCormick Tyler H.
Rudin Cynthia
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 10/05/2018

We aim to produce predictive models that are not only accurate, but are also interpretable to human experts. Our models are decision lists, which consist of a series of if ... then ... statements (e.g., if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. We introduce a generative model called Bayesian Rule Lists that yields a posterior distribution over possible decision lists. It employs a novel prior structure to encourage sparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy on par with the current top algorithms for prediction in machine learning. Our method is motivated by recent developments in personalized medicine, and can be used to produce highly accurate and interpretable medical scoring systems. We demonstrate this by producing an alternative to the CHADS₂ score, actively used in clinical practice for estimating the risk of stroke in patients who have atrial fibrillation. Our model is as interpretable as CHADS₂, but more accurate. National Science Foundation (U.S.) (Grant IIS-1053407)

DSpace@MIT

Mining Predictive Patterns and Extension to Multivariate Temporal Data

Author: Batal Iyad
Publication date: 01/01/2012

An important goal of knowledge discovery is the search for patterns in the data that can help explain its underlying structure. To be practically useful, the discovered patterns should be novel (unexpected) and easy for humans to understand. In this thesis, we study the problem of mining patterns (defining subpopulations of data instances) that are important for predicting and explaining a specific outcome variable. An example is the task of identifying groups of patients that respond better to a certain treatment than the rest of the patients. We propose and present efficient methods for mining predictive patterns for both atemporal and temporal (time series) data. Our first method relies on frequent pattern mining to explore the search space. It applies a novel evaluation technique for extracting a small set of frequent patterns that are highly predictive and have low redundancy. We show the benefits of this method on several synthetic and public datasets. Our temporal pattern mining method works on complex multivariate temporal data, such as electronic health records, for the event detection task. It first converts time series into time-interval sequences of temporal abstractions and then mines temporal patterns backwards in time, starting from patterns related to the most recent observations. We show the benefits of our temporal pattern mining method on two real-world clinical tasks.

CiteSeerX

D-Scholarship@Pitt

Associative Pattern Recognition for Biological Regulation Data

Author: Xiao Yiou
Publication venue: SURFACE at Syracuse University
Publication date: 22/12/2017

In the last decade, bioinformatics data has accumulated at an unprecedented rate, thanks to advances in sequencing technologies. Such rapid development poses both challenges and promising research topics. In this dissertation, we propose a series of associative pattern recognition algorithms in biological regulation studies. In particular, we emphasize efficiently recognizing associative patterns between genes, transcription factors, histone modifications and functional labels using heterogeneous data sources (numeric, sequences, time series data and textual labels). In protein-DNA associative pattern recognition, we introduce an efficient algorithm for affinity testing that searches for over-represented DNA sequences using a hash function and modulo addition calculation. This substantially improves the efficiency of next generation sequencing data analysis. In gene regulatory network inference, we propose a framework for refining weak networks based on transcription factor binding sites, thus improving the precision of predicted edges by up to 52%. In histone modification code analysis, we propose an approach to genome-wide combinatorial pattern recognition for histone code to function associative pattern recognition, and achieve an improvement of up to 38.1%. We also propose a novel shape-based modification pattern analysis approach, and use it to successfully predict sub-classes of genes in the flowering-time category. We also propose a combination-to-combination associative pattern recognition approach, which achieves better performance than multi-label classification and bidirectional associative memory methods. Our proposed approaches recognize associative patterns from different types of data efficiently, and provide a useful toolbox for biological regulation analysis. This dissertation presents a road map to associative pattern recognition at the genome-wide level.

Syracuse University Research Facility and Collaborative Environment