    Multivariate Analysis Techniques for Optimal Vision System Design

    Novel Approach to Choosing Principal Components Number in Logistic Regression

    The established approach to choosing the number of principal components for prediction models is to examine the contribution of each principal component to the total variance of the target variable. A combination of important principal components can then be chosen to explain a large part of the variance in the target. Sometimes several combinations of principal components must be explored to achieve the highest classification accuracy. This research proposes a novel, automatic way of deciding how many principal components to retain in order to improve classification accuracy. We do this by combining principal components with ANOVA-based selection. To further improve the accuracy of this automatic approach, we use a bootstrap procedure for model selection. We call this procedure Bootstrapped-ANOVA PCA selection. Our results suggest that this procedure can automate principal component selection and improve the accuracy of classification models, in our example logistic regression.
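The abstract does not spell out the exact procedure, so the following is only a minimal sketch of the general idea under stated assumptions: principal component scores are assumed to be already computed (for example from a PCA of the predictors), each component is scored with a one-way ANOVA F statistic against the class labels, and a component is kept if it passes an F threshold in a majority of bootstrap resamples. The function names, the threshold of 10.0 and the 50% frequency rule are hypothetical choices, not the authors' settings.

```python
import random
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic for `values` grouped by class `labels`."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        return 0.0  # F is undefined with one class or no within-group degrees of freedom
    grand = mean(values)
    means = {y: mean(g) for y, g in groups.items()}
    ss_between = sum(len(g) * (means[y] - grand) ** 2 for y, g in groups.items())
    ss_within = sum((v - means[y]) ** 2 for y, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def bootstrap_anova_select(scores, labels, f_threshold=10.0,
                           n_boot=200, min_freq=0.5, seed=0):
    """Hypothetical selection rule: keep PC-score column j if its ANOVA F
    exceeds `f_threshold` in at least `min_freq` of bootstrap resamples."""
    rng = random.Random(seed)
    n, p = len(scores), len(scores[0])
    hits = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        y = [labels[i] for i in idx]
        for j in range(p):
            if anova_f([scores[i][j] for i in idx], y) > f_threshold:
                hits[j] += 1
    return [j for j in range(p) if hits[j] / n_boot >= min_freq]


# Hypothetical PC scores: column 0 separates the two classes, column 1 does not.
labels = [i % 2 for i in range(20)]
scores = [[10 * (i % 2) + 0.1 * (i % 5), (i // 2) % 5] for i in range(20)]
print(bootstrap_anova_select(scores, labels))
```

The selected indices would then define which components enter the logistic regression; the bootstrap stabilises the choice against the sampling noise of a single F test.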

    A framework to identify structured behavioral patterns within rodent spatial trajectories

    Animal behavior is highly structured. Yet, structured behavioral patterns, or "statistical ethograms", are not immediately apparent from the full spatiotemporal data that behavioral scientists usually collect. Here, we introduce a framework to quantitatively characterize rodent behavior during spatial (e.g., maze) navigation in terms of movement building blocks, or motor primitives. The hypothesis that we pursue is that rodent behavior is characterized by a small number of motor primitives, which are combined over time to produce open-ended movements. We assume motor primitives to be organized according to two sparsity principles: each movement is controlled using a limited subset of motor primitives (sparse superposition), and each primitive is active only for time-limited, time-contiguous portions of movements (sparse activity). We formalize this hypothesis using a sparse dictionary learning method, which we use to extract motor primitives from rodent position and velocity data collected during spatial navigation, and subsequently to reconstruct past trajectories and predict novel ones. Three main results validate our approach. First, rodent behavioral trajectories are robustly reconstructed from incomplete data, performing better than approaches based on standard dimensionality reduction methods, such as principal component analysis, or on a single sparsity principle. Second, the motor primitives extracted during one experimental session generalize and afford accurate reconstruction of rodent behavior across successive experimental sessions in the same or in modified mazes. Third, in our approach the number of motor primitives associated with each maze correlates with independent measures of maze complexity, showing that our formalism is sensitive to essential aspects of task structure. The framework introduced here can be used by behavioral scientists and neuroscientists as an aid for behavioral and neural data analysis.
Indeed, the extracted motor primitives enable the quantitative characterization of the complexity of and similarity between different mazes, and of behavioral patterns across multiple trials (i.e., habit formation). We provide example uses of this computational framework, showing how it can be used to identify behavioral effects of maze complexity, analyze stereotyped behavior, classify behavioral choices, and predict place and grid cell displacement in novel environments.
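To make the "sparse superposition" principle concrete, here is a minimal sketch that codes a signal as a weighted sum of a few dictionary atoms. It uses plain matching pursuit against a fixed, hypothetical dictionary; this is a swapped-in illustrative technique, not the authors' dictionary learning method, and it does not learn the atoms or enforce the time-contiguous "sparse activity" constraint.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matching_pursuit(signal, atoms, n_steps):
    """Greedy sparse code of `signal` over unit-norm `atoms`, using at most
    `n_steps` selection steps (plain matching pursuit)."""
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_steps):
        # Pick the atom most correlated with the current residual.
        corr = [dot(residual, a) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        if corr[k] == 0:
            break
        coeffs[k] += corr[k]
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


def reconstruct(coeffs, atoms):
    """Sparse superposition: weighted sum of the dictionary atoms."""
    out = [0.0] * len(atoms[0])
    for c, a in zip(coeffs, atoms):
        for i, v in enumerate(a):
            out[i] += c * v
    return out


# Hypothetical orthonormal "primitives" over a 4-sample velocity window.
atoms = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0.6, 0.8], [0, 0, 0.8, -0.6]]
signal = [2.0, 0.0, 1.8, 2.4]  # equals 2 * atoms[0] + 3 * atoms[2]
coeffs, residual = matching_pursuit(signal, atoms, n_steps=2)
print(coeffs, reconstruct(coeffs, atoms))
```

With an orthonormal dictionary, two steps recover the two active atoms; in a learned, overcomplete dictionary the greedy choice is only approximate.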

    A literature review of (sparse) exponential family PCA

    This is a brief overview of the methodology around exponential family PCA. We revisit classic PCA methodology and focus on exponential family PCA because of its applicability to a number of distributions and hence to a wide variety of problems. We discuss the applicability of these methods to text data analysis, given the high-dimensional and sparse nature of such data.

    Incremental Sparse-PCA Feature Extraction For Data Streams

    Intruders attempt to penetrate commercial systems daily and cause considerable financial losses for individuals and organizations. Intrusion detection systems monitor network events to detect computer security threats, and detecting malicious activities requires analyzing an extensive amount of network data. Storing, processing, and analyzing this massive volume of data is costly, which indicates the need for efficient network data reduction methods that do not require the data to be captured and stored first. A better approach extracts useful variables from data streams in real time and in a single pass. Removing irrelevant attributes reduces the data fed to the intrusion detection system (IDS) and shortens the analysis time while improving classification accuracy. This dissertation introduces an online, real-time data processing method for knowledge extraction. This incremental feature extraction is based on two approaches. First, Chunk Incremental Principal Component Analysis (CIPCA) detects intrusion in data streams. Then, two novel incremental feature extraction methods, Incremental Structured Sparse PCA (ISSPCA) and Incremental Generalized Power Method Sparse PCA (IGSPCA), find malicious elements. Performance metrics were used to compare all methods. IGSPCA was found to perform as well as or better than CIPCA overall in terms of dimensionality reduction, classification accuracy, and learning time. ISSPCA yielded better results for higher chunk values and greater accumulation ratio thresholds. CIPCA and IGSPCA reduced the IDS dataset to 10 principal components, as opposed to 14 eigenvectors for ISSPCA. ISSPCA is more expensive in terms of learning time than the other techniques. This dissertation presents new methods that perform feature extraction from continuous data streams to find the small number of features needed to express most of the data variance.
Data subsets derived from a few important variables are easier to interpret. Another goal of this dissertation was to propose incremental sparse PCA algorithms capable of processing data with concept drift and concept shift. Experiments using the WaveForm and WaveFormNoise datasets confirmed this ability. Like CIPCA, ISSPCA and IGSPCA updated the eigen-axes as a function of the accumulation ratio value, forming an informative eigenspace with few eigenvectors.
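The accumulation ratio mentioned above is, in the usual incremental PCA formulation, the share of total variance captured by the largest k eigenvalues; an eigenspace is grown until this share reaches a threshold. The sketch below assumes that definition and shows only the dimensionality decision; the chunk-wise eigenvector update that CIPCA, ISSPCA and IGSPCA perform is not shown, and the eigenvalues and thresholds are hypothetical.

```python
def accumulation_ratio(eigenvalues, k):
    """Fraction of total variance captured by the k largest eigenvalues."""
    vals = sorted(eigenvalues, reverse=True)
    return sum(vals[:k]) / sum(vals)


def components_needed(eigenvalues, threshold):
    """Smallest k whose accumulation ratio reaches `threshold`."""
    vals = sorted(eigenvalues, reverse=True)
    total = sum(vals)
    running = 0.0
    for k, lam in enumerate(vals, start=1):
        running += lam
        if running / total >= threshold:
            return k
    return len(vals)


# Hypothetical eigenvalues after processing a chunk of the stream.
eigs = [0.5, 5.0, 1.0, 3.0, 0.5]
print(components_needed(eigs, 0.75), components_needed(eigs, 0.92))
```

A higher threshold keeps more eigenvectors, which matches the abstract's observation that ISSPCA's results depended on the accumulation ratio threshold.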

    Novel Methods for Multi-Shape Analysis

    Multi-shape analysis aims to recognise, classify, or quantify morphological patterns or regularities within a set of shapes of a particular object class in order to better understand that object class. One important aspect of multi-shape analysis is Statistical Shape Models (SSMs), where a collection of shapes is analysed and modelled within a statistical framework. SSMs can be used as a (statistical) prior that describes which shapes are more and which are less likely to be plausible instances of the object class of interest. Assuming that the object class of interest is known, such a prior can, for example, be used to reconstruct a three-dimensional surface from only a few known surface points. One relevant application of this surface reconstruction is 3D image segmentation in medical imaging, where the anatomical structure of interest is known a priori and the surface points are obtained (either automatically or manually) from images. Frequently, Point Distribution Models (PDMs) are used to represent the distribution of shapes, where each shape is discretised and represented as a labelled point set. With that, a shape can be interpreted as an element of a vector space, the so-called shape space, and the shape distribution in shape space can be estimated from a collection of given shape samples. One crucial aspect of creating PDMs that is tackled in this thesis is how to establish (bijective) correspondences across the collection of training shapes. Evaluated on brain shapes, the proposed method improves model quality compared to existing approaches whilst also being faster. The second aspect considered in this work is how to learn a low-dimensional subspace of the shape space that is close to the training shapes, where all factors spanning this subspace have local support.
Compared to previous work, the proposed method models the local support regions implicitly, so that no initialisation of the size and location of these regions is necessary, which is advantageous in scenarios where this information is not available. The third topic covered in this thesis is how to use an SSM to reconstruct a surface from only a few surface points. By using a Gaussian Mixture Model (GMM) with anisotropic covariance matrices oriented according to the surface normals, a more surface-oriented fitting is achieved than with the purely point-based fitting of the common Iterative Closest Point (ICP) algorithm. Compared with ICP, the GMM-based approach gives superior accuracy and robustness on sparse data. Furthermore, this work covers transformation synchronisation, a noise-removal procedure that accounts for transitive inconsistency in a set of pairwise linear transformations. One interesting application of this methodology in the context of multi-shape analysis is solving the multi-alignment problem in an unbiased, reference-free manner. Moreover, with an improvement in numerical stability, the methodology can be used to solve the (affine) multi-image registration problem from pairwise registrations. Compared to reference-based multi-image registration, the proposed approach improves registration accuracy and is unbiased and reference-free, which makes it ideal for statistical analyses.
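The Point Distribution Model idea of treating each labelled point set as one vector in shape space can be sketched as follows. This assumes correspondences are already established and the shapes already aligned; the mode vector is a hypothetical stand-in for what would normally be a principal component (or, in the thesis, a locally supported factor) learned from the training shapes.

```python
def to_shape_vector(points):
    """Flatten a labelled point set [(x, y), ...] into one shape-space vector."""
    return [c for p in points for c in p]


def from_shape_vector(vec, dim=3):
    """Split a shape-space vector back into `dim`-dimensional points."""
    return [tuple(vec[i:i + dim]) for i in range(0, len(vec), dim)]


def mean_shape(shapes):
    """Component-wise mean of corresponding points across training shapes."""
    vecs = [to_shape_vector(s) for s in shapes]
    if len({len(v) for v in vecs}) != 1:
        raise ValueError("all shapes need the same number of corresponding points")
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def synthesize(mean, modes, weights):
    """Shape instance = mean + sum_i weights[i] * modes[i]."""
    out = list(mean)
    for w, mode in zip(weights, modes):
        for i, m in enumerate(mode):
            out[i] += w * m
    return out


# Two hypothetical 2D training triangles with known point correspondences.
shapes = [[(0, 0), (1, 0), (0, 1)], [(0, 0), (3, 0), (0, 3)]]
mu = mean_shape(shapes)
scale_mode = [0, 0, 1, 0, 0, 1]  # hypothetical mode: grow the triangle's legs
print(from_shape_vector(synthesize(mu, [scale_mode], [1.0]), dim=2))
```

Restricting the weights to a plausible range is what lets such a model act as the statistical prior described above.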