
    GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data

    Background: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors.
    Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples.
    Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.
    Department of Agriculture, Food and the Marine; European Commission - Seventh Framework Programme (FP7); Science Foundation Ireland; University College Dubli
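    The GO-term ranking step described above can be pictured with a small sketch. GOexpress itself is an R/Bioconductor package and this Python code does not reproduce its API. It assumes per-gene importance scores (for example, from a random forest) are already computed, and it assumes a term's score is the mean importance of its annotated genes that were actually measured. The function `score_go_terms`, the `min_genes` parameter, and the term names are hypothetical.

```python
from collections import defaultdict


def score_go_terms(gene_importance, go_annotations, min_genes=1):
    """Rank GO terms by the mean importance of their measured, annotated genes.

    gene_importance: dict gene -> importance score (assumed precomputed,
        e.g. by a random forest classifier).
    go_annotations: dict gene -> list of GO term identifiers.
    min_genes: hypothetical filter; terms with fewer measured genes are dropped.
    """
    per_term = defaultdict(list)
    for gene, terms in go_annotations.items():
        if gene not in gene_importance:
            continue  # annotated but not measured in this dataset
        for term in terms:
            per_term[term].append(gene_importance[gene])
    scores = {
        term: sum(vals) / len(vals)
        for term, vals in per_term.items()
        if len(vals) >= min_genes
    }
    # Highest mean importance first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    importance = {"g1": 4.0, "g2": 2.0, "g3": 0.5, "g4": 1.0}
    annotations = {
        "g1": ["term_A", "term_B"],
        "g2": ["term_A"],
        "g3": ["term_B", "term_C"],
        "g4": ["term_C"],
        "g5": ["term_A"],  # not measured, so ignored
    }
    for term, score in score_go_terms(importance, annotations):
        print(term, score)
```

    Averaging rewards terms whose genes are consistently informative, while `min_genes` guards against terms supported by a single gene; other aggregations (for example, mean rank) are equally plausible.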

    FORCE-TIME CURVE ALIGNMENT FOR FUNCTIONAL PRINCIPAL COMPONENT ANALYSIS IN VERTICAL JUMPING

    Functional principal component analysis (FPCA) can be used to extract key features from time series data for use in statistical models. This study evaluated time normalisation in combination with curve registration prior to performing FPCA. Using vertical ground reaction force data from countermovement jumps, evaluation was based on linear regression for predicting peak power and jump height, and logistic regression for classifying jump type (arm swing or not). Datasets not subject to time normalisation generally produced better results, with the highest accuracy being achieved when using registration with peak power as a landmark (peak power R² = 99.3%, jump height R² = 94.9%). Classification of jump type benefited in some cases from registration (87.0% to 91.2%). These techniques could be applied to data from wearable sensors to improve prediction and classification.
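    Time normalisation and landmark registration can be illustrated with piecewise-linear interpolation. This is a minimal sketch under stated assumptions, not the study's procedure: it assumes uniformly sampled curves, uses the peak sample as the landmark (the study used peak power), and warps each curve linearly on either side of the landmark. The names `resample` and `register_landmark` are hypothetical.

```python
def resample(y, n):
    """Time-normalise: linearly interpolate the samples y onto n equally spaced points."""
    m = len(y)
    if n == 1 or m == 1:
        return [y[0]] * n
    out = []
    for k in range(n):
        t = k * (m - 1) / (n - 1)  # fractional position in the original samples
        i = min(int(t), m - 2)     # left neighbour, kept inside the last interval
        frac = t - i
        out.append(y[i] + frac * (y[i + 1] - y[i]))
    return out


def register_landmark(y, landmark, target, n):
    """Piecewise-linear landmark registration onto an n-point grid.

    Sample index `landmark` of y is moved to position `target`; the segments
    before and after the landmark are each stretched or compressed linearly.
    """
    if not (0 < landmark < len(y) - 1 and 0 < target < n - 1):
        raise ValueError("landmark and target must be interior points")
    before = resample(y[: landmark + 1], target + 1)
    after = resample(y[landmark:], n - target)
    return before + after[1:]  # drop the duplicated landmark sample


if __name__ == "__main__":
    curve_a = [0, 2, 5, 3, 1, 0]     # peak at index 2
    curve_b = [0, 1, 2, 4, 6, 2, 0]  # peak at index 4
    for curve in (curve_a, curve_b):
        peak = max(range(len(curve)), key=curve.__getitem__)
        print(register_landmark(curve, peak, target=3, n=8))
```

    After registration both curves have the same length and peak at the same grid position, so FPCA would compare corresponding phases of the jump rather than raw clock time.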

    Decoding the Encoding of Functional Brain Networks: an fMRI Classification Comparison of Non-negative Matrix Factorization (NMF), Independent Component Analysis (ICA), and Sparse Coding Algorithms

    Brain networks in fMRI are typically identified using spatial independent component analysis (ICA), yet mathematical constraints such as sparse coding and positivity both provide alternate biologically plausible frameworks for generating brain networks. Non-negative Matrix Factorization (NMF) would suppress negative BOLD signal by enforcing positivity. Spatial sparse coding algorithms (L1 Regularized Learning and K-SVD) would impose local specialization and a discouragement of multitasking, where the total observed activity in a single voxel originates from a restricted number of possible brain networks. The assumptions of independence, positivity, and sparsity to encode task-related brain networks are compared; the resulting brain networks for different constraints are used as basis functions to encode the observed functional activity at a given time point. These encodings are decoded using machine learning to compare both the algorithms and their assumptions, using the time series weights to predict whether a subject is viewing a video, listening to an audio cue, or at rest, in 304 fMRI scans from 51 subjects. For classifying cognitive activity, the sparse coding algorithm of L1 Regularized Learning consistently outperformed 4 variations of ICA across different numbers of networks and noise levels (p << 0.001). The NMF algorithms, which suppressed negative BOLD signal, had the poorest accuracy. Within each algorithm, encodings using sparser spatial networks (containing more zero-valued voxels) had higher classification accuracy (p << 0.001). The success of sparse coding algorithms may suggest that algorithms which enforce sparse coding, discourage multitasking, and promote local specialization may better capture the underlying source processes than those which allow inexhaustible local processes such as ICA.
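    Why an L1 penalty yields sparser spatial networks (more zero-valued voxels) can be seen from its proximal step, soft thresholding. This is an illustration only, assuming a single voxel-wise shrinkage step; it is not the L1 Regularized Learning or K-SVD algorithm compared in the study. The names `soft_threshold` and `sparsity` and the penalty `lam` are hypothetical.

```python
def soft_threshold(values, lam):
    """Proximal operator of lam * |x|, applied voxel by voxel.

    Each value is shrunk toward zero by lam; values with |v| <= lam become
    exactly zero, which is what produces zero-valued voxels.
    """
    out = []
    for v in values:
        if v > lam:
            out.append(v - lam)
        elif v < -lam:
            out.append(v + lam)
        else:
            out.append(0.0)
    return out


def sparsity(network):
    """Fraction of zero-valued voxels in a spatial map."""
    return sum(1 for v in network if v == 0) / len(network)


if __name__ == "__main__":
    dense_map = [0.9, -0.2, 0.05, -1.5, 0.3]  # hypothetical voxel weights
    sparse_map = soft_threshold(dense_map, lam=0.25)
    print(sparse_map, sparsity(dense_map), sparsity(sparse_map))
```

    Raising `lam` zeroes more voxels, so each voxel is explained by fewer networks, which matches the "restricted number of possible brain networks" assumption described above.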

    Impact of lag information on network inference

    Extracting useful information from data is a fundamental challenge across disciplines as diverse as climate, neuroscience, genetics, and ecology. In the era of "big data", data is ubiquitous, but appropriate methods are needed to gain reliable information from it. In this work we consider a complex system composed of interacting units and aim to infer, directly from the observed data, which elements influence each other. The only assumption about the structure of the system is that it can be modeled by a network composed of a set of N units connected by L unweighted, undirected links; however, the structure of the connections is not known. In this situation, the underlying network is usually inferred using interdependency measures computed from the output signals of the units. Using experimental data recorded from randomly coupled electronic Rössler chaotic oscillators, we show that the lag times obtained from bivariate cross-correlation analysis can provide useful information about the real connectivity of the system.
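    The lag time mentioned above is the shift at which the bivariate cross-correlation between two signals is strongest. A minimal sketch, assuming equal-length, uniformly sampled signals and a Pearson cross-correlation over the overlapping samples; how the lag is then turned into a link decision is not specified here, and the names `cross_correlation` and `peak_lag` are hypothetical.

```python
def cross_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag] over the overlapping samples."""
    if lag >= 0:
        a, b = x[: len(x) - lag], y[lag:]
    else:
        a, b = x[-lag:], y[: len(y) + lag]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant segment carries no correlation information
    return cov / (va * vb) ** 0.5


def peak_lag(x, y, max_lag):
    """Return (lag, correlation) maximising |cross-correlation| within +/- max_lag."""
    best = max(range(-max_lag, max_lag + 1),
               key=lambda k: abs(cross_correlation(x, y, k)))
    return best, cross_correlation(x, y, best)


if __name__ == "__main__":
    x = [0, 1, 0, 0, 2, 0, 1, 3, 0, 0, 1, 0, 2, 0, 0, 1]
    y = [0, 0, 0] + x[:-3]  # y is x delayed by three samples
    print(peak_lag(x, y, max_lag=5))
```

    In this toy case the peak lies at lag 3, recovering the imposed delay; in a network setting the same quantity would be computed for every pair of units.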

    HR: A System for Machine Discovery in Finite Algebras

    We describe the HR concept formation program, which invents mathematical definitions and conjectures in finite algebras such as group theory and ring theory. We give the methods behind and the reasons for the concept formation in HR, an evaluation of its performance in its training domain, group theory, and a look at HR in domains other than group theory.
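    Empirical conjecture-making in group theory can be sketched by comparing the examples that satisfy two concepts. This is a minimal illustration, not HR's implementation: it assumes concepts are predicates on Cayley tables and that a conjecture is proposed when one concept's examples equal, or are contained in, another's. All function names and the choice of example groups are hypothetical.

```python
from itertools import permutations


def cyclic(n):
    """Cayley table of the cyclic group Z_n under addition mod n."""
    return [[(i + j) % n for j in range(n)] for i in range(n)]


def klein():
    """Cayley table of the Klein four-group, realised as XOR on {0, 1, 2, 3}."""
    return [[i ^ j for j in range(4)] for i in range(4)]


def s3():
    """Cayley table of S3, with (p*q)(i) = p(q(i))."""
    perms = list(permutations(range(3)))
    index = {p: k for k, p in enumerate(perms)}
    return [[index[tuple(p[q[i]] for i in range(3))] for q in perms] for p in perms]


def identity(t):
    n = len(t)
    return next(e for e in range(n) if all(t[e][x] == x == t[x][e] for x in range(n)))


def all_self_inverse(t):
    """Concept: every element x satisfies x*x = e."""
    e = identity(t)
    return all(t[x][x] == e for x in range(len(t)))


def abelian(t):
    """Concept: x*y = y*x for all x, y."""
    n = len(t)
    return all(t[x][y] == t[y][x] for x in range(n) for y in range(n))


def conjecture(name_a, concept_a, name_b, concept_b, examples):
    """Propose an equivalence or implication from the concepts' extensions."""
    ext_a = {g for g, t in examples.items() if concept_a(t)}
    ext_b = {g for g, t in examples.items() if concept_b(t)}
    if ext_a == ext_b:
        return f"{name_a} <=> {name_b}"
    if ext_a <= ext_b:
        return f"{name_a} => {name_b}"
    if ext_b <= ext_a:
        return f"{name_b} => {name_a}"
    return None


if __name__ == "__main__":
    groups = {"Z2": cyclic(2), "Z3": cyclic(3), "Z4": cyclic(4), "V4": klein(), "S3": s3()}
    print(conjecture("all_self_inverse", all_self_inverse, "abelian", abelian, groups))
```

    Such conjectures are only supported by the examples checked; here the proposed implication happens to be a true theorem, but a conjecture drawn from few examples can be false and would still need proof.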