GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data
Background: Identification of gene expression profiles that differentiate experimental groups is critical for discovery and analysis of key molecular pathways and also for selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors. Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows interactions between all experimental factors, and competitive scoring of expressed genes to evaluate their relative importance in classifying predefined groups of samples. Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. 
The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines. Department of Agriculture, Food and the Marine; European Commission - Seventh Framework Programme (FP7); Science Foundation Ireland; University College Dubli
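As a rough illustration of how gene-level scores could be rolled up into a ranking of gene ontology terms, the sketch below averages per-gene importance scores within each term. The gene names, term labels, scores, annotation map, and the averaging rule are all hypothetical and are not taken from the GOexpress package itself; the abstract only states that genes and terms are ranked using a supervised learning approach.

```python
from statistics import mean

def rank_terms(gene_scores, term_to_genes):
    """Rank ontology terms by the mean score of their annotated genes.

    Genes without a score (e.g. not expressed) count as 0.0, which is an
    assumption of this sketch rather than documented package behaviour.
    """
    term_scores = {
        term: mean(gene_scores.get(gene, 0.0) for gene in genes)
        for term, genes in term_to_genes.items()
        if genes  # skip terms with no annotated genes
    }
    # Highest average score first
    return sorted(term_scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical importance scores, e.g. from a random forest
gene_scores = {"geneA": 0.9, "geneB": 0.5, "geneC": 0.1}

# Hypothetical term-to-gene annotations
term_to_genes = {
    "GO:term1": ["geneA", "geneB"],
    "GO:term2": ["geneB", "geneC"],
    "GO:term3": ["geneC", "geneD"],
}

ranking = rank_terms(gene_scores, term_to_genes)
for term, score in ranking:
    print(f"{term}\t{score:.2f}")
```

Averaging is only one possible summary; a rank-based average would be less sensitive to a single dominant gene.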
FORCE-TIME CURVE ALIGNMENT FOR FUNCTIONAL PRINCIPAL COMPONENT ANALYSIS IN VERTICAL JUMPING
Functional principal component analysis (FPCA) can be used to extract key features from time series data for use in statistical models. This study evaluated time normalisation in combination with curve registration prior to performing FPCA. Using vertical ground reaction force data from countermovement jumps, evaluation was based on linear regression for predicting peak power and jump height, and logistic regression for classifying jump type (arm swing or not). Datasets not subject to time normalisation generally produced better results, with the highest accuracy being achieved when using registration with peak power as a landmark (peak power R² = 99.3%, jump height R² = 94.9%). Classification of jump type benefited in some cases from registration (87.0% to 91.2%). These techniques could be applied to data from wearable sensors to improve prediction and classification.
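The two generic steps named above, time normalisation followed by FPCA, can be sketched on discretised curves as follows. This is a minimal illustration under stated assumptions: the curves are synthetic, the 101-point grid and the function names are hypothetical, FPCA is approximated by an SVD of the centred curve matrix, and landmark registration (which the study found most accurate) is not implemented.

```python
import numpy as np

def time_normalise(curve, n_points=101):
    """Resample one curve onto a common 0..1 time axis by linear interpolation."""
    curve = np.asarray(curve, dtype=float)
    old_t = np.linspace(0.0, 1.0, curve.size)
    new_t = np.linspace(0.0, 1.0, n_points)
    return np.interp(new_t, old_t, curve)

def fpca_scores(curves, n_components=2):
    """Discretised FPCA: centre the curves, then project onto leading components."""
    X = np.vstack(curves)                  # shape: (n_curves, n_points)
    centred = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(centred, full_matrices=False)
    components = Vt[:n_components]         # component curves
    scores = centred @ components.T        # one row of scores per curve
    explained = (s[:n_components] ** 2) / np.sum(s ** 2)
    return scores, components, explained

# Synthetic "force-time" curves with different durations and amplitudes
raw = [
    amplitude * np.sin(np.linspace(0.0, np.pi, n_samples))
    for n_samples, amplitude in [(80, 1.0), (100, 1.5), (120, 2.0), (90, 1.2)]
]
curves = [time_normalise(c) for c in raw]
scores, components, explained = fpca_scores(curves)
print(scores.shape, explained.round(3))
```

The resulting scores are the features that would then be passed to a linear or logistic regression model.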
Decoding the Encoding of Functional Brain Networks: an fMRI Classification Comparison of Non-negative Matrix Factorization (NMF), Independent Component Analysis (ICA), and Sparse Coding Algorithms
Brain networks in fMRI are typically identified using spatial independent
component analysis (ICA), yet mathematical constraints such as sparse coding
and positivity both provide alternate biologically-plausible frameworks for
generating brain networks. Non-negative Matrix Factorization (NMF) would
suppress negative BOLD signal by enforcing positivity. Spatial sparse coding
algorithms (L1 Regularized Learning and K-SVD) would impose local
specialization and a discouragement of multitasking, where the total observed
activity in a single voxel originates from a restricted number of possible
brain networks.
The assumptions of independence, positivity, and sparsity to encode
task-related brain networks are compared; the resulting brain networks for
different constraints are used as basis functions to encode the observed
functional activity at a given time point. These encodings are decoded using
machine learning to compare both the algorithms and their assumptions, using
the time series weights to predict whether a subject is viewing a video,
listening to an audio cue, or at rest, in 304 fMRI scans from 51 subjects.
For classifying cognitive activity, the sparse coding algorithm of
Regularized Learning consistently outperformed 4 variations of ICA across
different numbers of networks and noise levels (p < 0.001). The NMF algorithms,
which suppressed negative BOLD signal, had the poorest accuracy. Within each
algorithm, encodings using sparser spatial networks (containing more
zero-valued voxels) had higher classification accuracy (p < 0.001). The success
of sparse coding algorithms may suggest that algorithms which enforce sparse
coding, discourage multitasking, and promote local specialization may better
capture the underlying source processes than those, such as ICA, which allow
inexhaustible local processes.
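The encoding step described above, in which spatial networks act as basis functions for the activity observed at one time point, can be sketched as a least-squares fit. This is a minimal illustration: the toy network matrix, the function names, and the use of ordinary least squares are assumptions and are not confirmed as the authors' procedure.

```python
import numpy as np

def spatial_sparsity(networks):
    """Fraction of zero-valued voxels in each network (one network per column)."""
    return np.mean(networks == 0, axis=0)

def encode_timepoint(networks, activity):
    """Least-squares weights expressing one volume as a mixture of networks."""
    weights, *_ = np.linalg.lstsq(networks, activity, rcond=None)
    return weights

# Toy example: 5 voxels, 2 spatially sparse networks (hypothetical values)
networks = np.array([
    [1.0, 0.0],
    [2.0, 0.0],
    [0.0, 1.0],
    [0.0, 3.0],
    [0.0, 0.0],
])
true_weights = np.array([0.5, -1.0])
activity = networks @ true_weights      # simulated activity at one time point

weights = encode_timepoint(networks, activity)
sparsity = spatial_sparsity(networks)
print(weights, sparsity)
```

Repeating the fit for every time point gives the time series of weights that a classifier can use to predict the task condition.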
Impact of lag information on network inference
Extracting useful information from data is a fundamental challenge across
disciplines as diverse as climate, neuroscience, genetics, and ecology. In the
era of "big data", data is ubiquitous, but appropriate methods are needed
for gaining reliable information from the data. In this work we consider a
complex system, composed of interacting units, and aim at inferring which
elements influence each other, directly from the observed data. The only
assumption about the structure of the system is that it can be modeled by a
network composed of a set of units connected with unweighted and
undirected links; however, the structure of the connections is not known. In
this situation the inference of the underlying network is usually done by using
interdependency measures, computed from the output signals of the units. We
show, using experimental data recorded from randomly coupled electronic
Rössler chaotic oscillators, that the information of the lag times obtained
from bivariate cross-correlation analysis can be useful to gain information
about the real connectivity of the system.
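The lag time from a bivariate cross-correlation analysis can be estimated as in the sketch below. This is a minimal illustration: the signals are synthetic white noise rather than oscillator recordings, and the function name, the `max_lag` window, and the choice of the peak absolute correlation are assumptions.

```python
import numpy as np

def peak_lag(x, y, max_lag):
    """Return (lag, correlation) maximising |corr(x[t], y[t + lag])|."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    n = len(x)
    best_lag, best_corr = 0, 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: n - lag], y[lag:]      # y trails x by `lag` samples
        else:
            a, b = x[-lag:], y[: n + lag]     # y leads x by `-lag` samples
        corr = float(np.mean(a * b))
        if abs(corr) > abs(best_corr):
            best_lag, best_corr = lag, corr
    return best_lag, best_corr

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.roll(x, 3)   # y is x delayed by 3 samples (with wrap-around)

lag, corr = peak_lag(x, y, max_lag=10)
print(lag, round(corr, 3))
```

Computing this for every pair of units gives both a correlation matrix and a lag matrix that can be compared against the known links.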
HR: A System for Machine Discovery in Finite Algebras
We describe the HR concept formation program which invents mathematical definitions and conjectures in finite algebras such as group theory and ring theory. We give the methods behind and the reasons for the concept formation in HR, an evaluation of its performance in its training domain, group theory, and a look at HR in domains other than group theory.