68 research outputs found

    FracSim: An R Package to Simulate Multifractional Lévy Motions

    In this article, a procedure is proposed to simulate fractional fields, which are non-Gaussian counterparts of the fractional Brownian motion. These fields, called real harmonizable (multi)fractional Lévy motions, allow fixing the Hölder exponent at each point. FracSim is an R package developed in R and C. Parallel computers have also been used.

    CCA: An R Package to Extend Canonical Correlation Analysis

    Canonical correlation analysis (CCA) is an exploratory statistical method to highlight correlations between two data sets acquired on the same experimental units. The cancor() function in R (R Development Core Team 2007) performs the core of the computations, but further work was required to provide the user with additional tools to facilitate the interpretation of the results. We implemented an R package, CCA, freely available from the Comprehensive R Archive Network (CRAN, http://CRAN.R-project.org/), to develop numerical and graphical outputs and to enable the user to handle missing values. The CCA package also includes a regularized version of CCA to deal with data sets with more variables than units. Illustrations are given through the analysis of a data set coming from a nutrigenomic study in the mouse.
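
    As an illustration of the regularized CCA idea mentioned in this abstract, here is a minimal NumPy sketch. It is not the CCA package's code or API: the function name `regularized_cca` and the parameters `lambda_x` and `lambda_y` are hypothetical, and ridge-style regularization (adding a multiple of the identity to each covariance matrix) is assumed as one common way to handle more variables than units.

```python
import numpy as np

def regularized_cca(X, Y, lambda_x=0.0, lambda_y=0.0):
    """Return canonical correlations and weight matrices for X and Y.

    Hypothetical helper: lambda_x and lambda_y are ridge penalties added
    to the diagonals of the covariance matrices (assumed regularization).
    """
    X = X - X.mean(axis=0)              # center each variable
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + lambda_x * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + lambda_y * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # M = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)

    A = np.linalg.solve(Lx.T, U)        # canonical weights for X
    B = np.linalg.solve(Ly.T, Vt.T)     # canonical weights for Y
    return s, A, B

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 8))         # more variables than units
    Y = rng.normal(size=(5, 6))
    s, A, B = regularized_cca(X, Y, lambda_x=0.1, lambda_y=0.1)
    print(s)
```

    With both penalties set to zero and more units than variables, this reduces to classical CCA. With more variables than units, the unregularized covariance matrices are singular, which is why a positive penalty is needed in the usage example.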

    Learning to Choose the Best System Configuration in Information Retrieval: the case of repeated queries

    This paper presents a method that automatically decides which system configuration should be used to process a query. This method is developed for the case of repeated queries and implements a new kind of meta-system. It is based on a training process: the meta-system learns the best system configuration to use on a per-query basis. After training, the meta-search system knows which configuration should treat a given query. The Learning to Choose method we developed selects the best configurations among many. This selective process rests on data analytics applied to system parameter values and their link with system effectiveness. Moreover, we optimize the parameters on a per-query basis. The training phase uses a limited amount of document relevance judgments. When the query is repeated or when an equal-query is submitted to the system, the meta-system automatically knows which parameters it should use to treat the query. This method fits the case of changing collections since what is learnt is the relationship between a query and the best parameters to use to process it, rather than the relationship between a query and documents to retrieve. In this paper, we describe how data analysis can help to select among various configurations the ones that will be useful. The Learning to Choose method is presented and evaluated using simulated data from TREC campaigns. We show that system performance increases substantially in terms of precision, specifically for the queries that are difficult or moderately difficult to answer. The other parameters of the method are also studied.
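
    The per-query selection described in this abstract can be sketched as a lookup table learned from training scores. This is only an illustration under stated assumptions, not the authors' method: the names `normalize`, `train` and `choose` are hypothetical, the configuration labels and scores are made up for the example, and treating two queries as "equal" when they share the same terms regardless of case or order is an assumption.

```python
from typing import Dict

def normalize(query: str) -> str:
    """Assumed notion of an 'equal' query: same terms, any case or order."""
    return " ".join(sorted(query.lower().split()))

def train(scores: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Learn the best configuration per query.

    scores[query][config] is the effectiveness (for example, precision)
    measured for that configuration on the training relevance judgments.
    """
    best = {}
    for query, per_config in scores.items():
        # Keep the configuration with the highest measured effectiveness.
        best[normalize(query)] = max(per_config, key=per_config.get)
    return best

def choose(best: Dict[str, str], query: str, default: str) -> str:
    """Return the learned configuration for a repeated query, else a default."""
    return best.get(normalize(query), default)

if __name__ == "__main__":
    # Illustrative scores only; configuration names are placeholders.
    training_scores = {
        "jaguar speed": {"config_a": 0.31, "config_b": 0.44},
        "solar panels cost": {"config_a": 0.52, "config_b": 0.40},
    }
    best = train(training_scores)
    print(choose(best, "speed Jaguar", default="config_a"))
    print(choose(best, "unseen query", default="config_a"))
```

    Because the table maps queries to configurations rather than to documents, it stays usable when the collection changes, which is the property the abstract highlights.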

    Improvement of variables interpretability in kernel PCA

    Kernel methods have been proven to be a powerful tool for the integration and analysis of data generated by high-throughput technologies. Kernels offer a nonlinear version of any linear algorithm solely based on dot products. The kernelized version of Principal Component Analysis is a valid nonlinear alternative to tackle the nonlinearity of biological sample spaces. This paper proposes a novel methodology to obtain a data-driven feature importance based on the KPCA representation of the data. The proposed method, kernel PCA Interpretable Gradient (KPCA-IG), provides a data-driven feature importance that is computationally fast and based solely on linear algebra calculations. It has been compared with existing methods on three benchmark datasets. The accuracy obtained using KPCA-IG selected features is equal to or greater than the other methods' average. Also, the computational complexity required demonstrates the high efficiency of the method. An exhaustive literature search has been conducted on the selected genes from a publicly available hepatocellular carcinoma dataset to validate the retained features from a biological point of view. The results once again underline the appropriateness of the computed ranking. The black-box nature of kernel PCA needs new methods to interpret the original features. Our proposed methodology, KPCA-IG, proved to be a valid alternative to select influential variables in high-dimensional high-throughput datasets, potentially unravelling new biological and medical biomarkers.

    integrOmics: an R package to unravel relationships between two omics datasets

    Motivation: With the availability of many ‘omics’ data, such as transcriptomics, proteomics or metabolomics, the integrative or joint analysis of multiple datasets from different technology platforms is becoming crucial to unravel the relationships between different biological functional levels. However, the development of such an analysis is a major computational and technical challenge as most approaches suffer from high data dimensionality. New methodologies need to be developed and validated.

    Muscle atrophy phenotype gene expression during spaceflight is linked to a metabolic crosstalk in both the liver and the muscle in mice

    Human expansion in space is hampered by the physiological risks of spaceflight. The muscle and the liver are among the most affected tissues during spaceflight and their relationships in response to space exposure have never been studied. We compared the transcriptome response of liver and quadriceps from mice on the NASA RR1 mission, after 37 days of exposure to spaceflight, using GSEA, ORA, and sparse partial least square-differential analysis. We found that lipid metabolism is the most affected biological process between the two organs. A specific gene cluster expression pattern in the liver strongly correlated with glucose sparing and an energy-saving response affecting high energy demand process gene expression such as DNA repair, autophagy, and translation in the muscle. Our results show that impaired lipid metabolism gene expression in the liver and muscle atrophy gene expression are two paired events during spaceflight, for which dietary changes represent a possible countermeasure.

    Urinary amine and organic acid metabolites evaluated as markers for childhood aggression : the ACTION biomarker study

    Biomarkers are of interest as potential diagnostic and predictive instruments in personalized medicine. We present the first urinary metabolomics biomarker study of childhood aggression. We aim to examine the association of urinary metabolites and neurotransmitter ratios involved in key metabolic and neurotransmitter pathways in a large cohort of twins (N = 1,347) and clinic-referred children (N = 183) with an average age of 9.7 years. This study is part of ACTION (Aggression in Children: Unraveling gene-environment interplay to inform Treatment and InterventiON strategies), in which we developed a standardized protocol for large-scale collection of urine samples in children. Our analytical design consisted of three phases: a discovery phase in twins scoring low or high on aggression (N = 783); a replication phase in twin pairs discordant for aggression (N = 378); and a validation phase in clinical cases and matched twin controls (N = 367). In the discovery phase, 6 biomarkers were significantly associated with childhood aggression, of which the association of O-phosphoserine (beta = 0.36; SE = 0.09; p = 0.004) and gamma-L-glutamyl-L-alanine (beta = 0.32; SE = 0.09; p = 0.01) remained significant after multiple testing. Although non-significant, the directions of effect were congruent between the discovery and replication analyses for six biomarkers and two neurotransmitter ratios, and the concentrations of 6 amines differed between low and high aggressive twins. In the validation analyses, the top biomarkers and neurotransmitter ratios, with congruent directions of effect, showed no significant associations with childhood aggression. We find suggestive evidence for associations of childhood aggression with metabolic dysregulation of neurotransmission, oxidative stress, and energy metabolism. Although replication is required, our findings provide starting points to investigate causal and pleiotropic effects of these dysregulations on childhood aggression.

    Analysis of the real EADGENE data set: Comparison of methods and guidelines for data normalisation and selection of differentially expressed genes (Open Access publication)

    A large variety of methods has been proposed in the literature for microarray data analysis. The aim of this paper was to present techniques used by the EADGENE (European Animal Disease Genomics Network of Excellence) WP1.4 participants for data quality control, normalisation and statistical methods for the detection of differentially expressed genes in order to provide some more general data analysis guidelines. All the workshop participants were given a real data set obtained in an EADGENE-funded microarray study looking at the gene expression changes following artificial infection with two different mastitis-causing bacteria: Escherichia coli and Staphylococcus aureus. It was reassuring to see that most of the teams found the same main biological results. In fact, most of the differentially expressed genes were found for infection by E. coli between uninfected and 24 h challenged udder quarters. Very little transcriptional variation was observed for the bacteria S. aureus. Lists of differentially expressed genes found by the different research teams were, however, quite dependent on the method used, especially concerning the data quality control step. These analyses also emphasised a biological problem of cross-talk between infected and uninfected quarters which will have to be dealt with for further microarray studies.
