66 research outputs found
Supervised classification of combined copy number and gene expression data
Summary: In this paper we apply a predictive profiling method to genome copy number aberrations (CNA) in combination with gene expression and clinical data to identify molecular patterns of cancer pathophysiology. Predictive models and optimal feature lists for the platforms are developed by a complete validation SVM-based machine learning system. Ranked lists of genome CNA sites (assessed by comparative genomic hybridization arrays, aCGH) and of differentially expressed genes (assessed by microarray profiling with Affy HG-U133A chips) are computed and combined on a breast cancer dataset for the discrimination of Luminal/ER+ (Lum/ER+) and Basal-like/ER- classes. Different encodings are developed and applied to the CNA data, and predictive variable selection is discussed. We analyze the combination of profiling information between the platforms, also considering the pathophysiological data. A specific subset of patients is identified that has a different response to classification by chromosomal gains and losses and by differentially expressed genes, corroborating the idea that genomic CNA can represent an independent source for tumor classification.
Two-omics data revealed commonalities and differences between Rpv12- and Rpv3-mediated resistance in grapevine
Plasmopara viticola is the causal agent of grapevine downy mildew (DM). DM resistant varieties deploy effector-triggered immunity (ETI) to inhibit pathogen growth, which is activated by major resistance loci, the most common of which are Rpv3 and Rpv12. We previously showed that a quick metabolome response lies behind the ETI conferred by Rpv3 TIR-NB-LRR genes. Here we used a grape variety operating Rpv12-mediated ETI, which is conferred by an independent locus containing CC-NB-LRR genes, to investigate the defence response using GC/MS, UPLC, UHPLC and RNA-Seq analyses. Eighty-eight metabolites showed significantly different concentrations and 432 genes showed differential expression between inoculated resistant leaves and controls. Most metabolite changes in sugars, fatty acids and phenols were similar in timing and direction to those observed in Rpv3-mediated ETI, but some of them were stronger or more persistent. Activators, elicitors and signal transducers for the formation of reactive oxygen species were observed early in samples undergoing Rpv12-mediated ETI and were paralleled and followed by the upregulation of genes belonging to ontology categories associated with salicylic acid signalling, signal transduction, WRKY transcription factors and synthesis of PR-1, PR-2, PR-5 pathogenesis-related proteins.
Algebraic Comparison of Partial Lists in Bioinformatics
The outcome of a functional genomics pipeline is usually a partial list of
genomic features, ranked by their relevance in modelling biological phenotype
in terms of a classification or regression model. Due to resampling protocols
or just within a meta-analysis comparison, instead of one list it is often the
case that sets of alternative feature lists (possibly of different lengths) are
obtained. Here we introduce a method, based on the algebraic theory of
symmetric groups, for studying the variability between lists ("list stability")
in the case of lists of unequal length. We provide algorithms evaluating
stability for lists embedded in the full feature set or just limited to the
features occurring in the partial lists. The method is demonstrated first on
synthetic data in a gene filtering task and then for finding gene profiles on a
recent prostate cancer dataset.
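The idea of measuring variability between partial lists of unequal length can be illustrated with a small sketch. The abstract does not spell out its algorithms, so everything below is an assumption made for illustration: lists are embedded in the union of their features, every feature absent from a list shares the rank just past that list's end, and a Canberra-type distance between rank vectors is averaged over all pairs of lists. The function names `rank_vector`, `canberra` and `mean_pairwise_distance` are hypothetical.

```python
from itertools import combinations


def rank_vector(partial_list, universe):
    """Return the 1-based rank of every feature in `universe`.

    Assumption: features missing from the partial list all share the
    rank len(partial_list) + 1, i.e. the first position past its end.
    """
    position = {feature: i + 1 for i, feature in enumerate(partial_list)}
    missing_rank = len(partial_list) + 1
    return [position.get(feature, missing_rank) for feature in universe]


def canberra(ranks_a, ranks_b):
    """Canberra distance between two rank vectors (ranks are positive)."""
    return sum(abs(a - b) / (a + b) for a, b in zip(ranks_a, ranks_b))


def mean_pairwise_distance(lists):
    """Average Canberra distance over all pairs of partial lists.

    Lower values mean the lists agree more closely (higher stability).
    The lists are compared on the union of the features they contain.
    """
    universe = sorted(set().union(*lists))
    ranks = [rank_vector(lst, universe) for lst in lists]
    pairs = list(combinations(ranks, 2))
    return sum(canberra(a, b) for a, b in pairs) / len(pairs)
```

Identical lists give a distance of zero, and disagreements near the top of the lists weigh more than disagreements near the bottom, because the Canberra terms divide by the sum of the two ranks.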
A new classification method using array Comparative Genome Hybridization data, based on the concept of Limited Jumping Emerging Patterns
BACKGROUND:
Classification using aCGH data is an important and insufficiently investigated problem in bioinformatics. In this paper we propose a new classification method of DNA copy number data based on the concept of limited Jumping Emerging Patterns. We present the comparison of our limJEPClassifier to SVM, which is considered the most successful classifier in the case of high-throughput data.
RESULTS:
Our results revealed that the classification performance using limJEPClassifier is significantly higher than that of other methods. Furthermore, we show that application of the limited JEPs can significantly improve classification when strongly unbalanced data are given.
CONCLUSION:
Nowadays, aCGH has become a very important tool, used in research of cancer or genomic disorders. Therefore, improving classification of aCGH data can have a great impact on many medical issues such as the process of diagnosis and finding disease-related genes. The performed experiment shows that the application of Jumping Emerging Patterns can be effective in the classification of high-dimensional data, including those from aCGH experiments.
Nutrimetabolomics: An Integrative Action for Metabolomic Analyses in Human Nutritional Studies
The life sciences are currently being transformed by an unprecedented wave of developments in molecular analysis, which include important advances in instrumental analysis as well as biocomputing. In light of the central role played by metabolism in nutrition, metabolomics is rapidly being established as a key analytical tool in human nutritional studies. Consequently, an increasing number of nutritionists integrate metabolomics into their study designs. Within this dynamic landscape, the potential of nutritional metabolomics (nutrimetabolomics) to be translated into a science that can impact health policies still needs to be realized. A key element to reach this goal is the ability of the research community to join forces and collectively make the best use of the potential offered by nutritional metabolomics. This article, therefore, provides a methodological description of nutritional metabolomics that reflects on the state-of-the-art techniques used in the laboratories of the Food Biomarker Alliance (funded by the European Joint Programming Initiative "A Healthy Diet for a Healthy Life" (JPI HDHL)) as well as points of reflection to harmonize this field. It is not intended to be exhaustive but rather to present pragmatic guidance on metabolomic methodologies, providing readers with useful "tips and tricks" along the analytical workflow.
Effect of Size and Heterogeneity of Samples on Biomarker Discovery: Synthetic and Real Data Assessment
MOTIVATION:
The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for the discovery of biomarkers using microarray data often provide results with limited overlap. These differences are imputable to 1) dataset size (few subjects with respect to the number of features); 2) heterogeneity of the disease; 3) heterogeneity of experimental protocols and computational pipelines employed in the analysis. In this paper, we focus on the first two issues and assess, both on simulated (through an in silico regulation network model) and real clinical datasets, the consistency of candidate biomarkers provided by a number of different methods.
METHODS:
We extensively simulated the effect of heterogeneity characteristic of complex diseases on different sets of microarray data. Heterogeneity was reproduced by simulating both intrinsic variability of the population and the alteration of regulatory mechanisms. Population variability was simulated by modeling evolution of a pool of subjects; then, a subset of them underwent alterations in regulatory mechanisms so as to mimic the disease state.
RESULTS:
The simulated data allowed us to outline advantages and drawbacks of different methods across multiple studies and varying numbers of samples and to evaluate precision of feature selection on a benchmark with known biomarkers. Although comparable classification accuracy was reached by different methods, the use of external cross-validation loops is helpful in finding features with a higher degree of precision and stability. Application to real data confirmed these results.
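The role of an external cross-validation loop can be sketched in a few lines. The example below is not the authors' pipeline: the mean-difference feature ranking, the nearest-centroid classifier, the toy data generator and all function names are assumptions chosen to keep the sketch dependency-free. The point it illustrates is that feature selection is repeated inside every training fold, so the held-out samples never influence which features are chosen, and the selection counts across folds give a rough view of stability.

```python
import random
from collections import Counter


def make_toy_data(n_per_class=20, n_features=10, shift=3.0, seed=1):
    """Hypothetical benchmark: only features 0 and 1 differ between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                row[0] += shift
                row[1] += shift
            X.append(row)
            y.append(label)
    return X, y


def rank_features(X, y, k):
    """Return the k features with the largest absolute class-mean difference."""
    scores = []
    for j in range(len(X[0])):
        v0 = [row[j] for row, lab in zip(X, y) if lab == 0]
        v1 = [row[j] for row, lab in zip(X, y) if lab == 1]
        scores.append(abs(sum(v1) / len(v1) - sum(v0) / len(v0)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


def centroid_predict(X_train, y_train, feats, x):
    """Assign x to the class whose centroid is nearest on the chosen features."""
    dist = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == lab]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in feats]
        dist[lab] = sum((x[j] - c) ** 2 for j, c in zip(feats, centroid))
    return min(dist, key=dist.get)


def external_cv(X, y, k_features=2, n_folds=5, seed=0):
    """Cross-validation with feature selection redone inside each fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct, selected = 0, Counter()
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train_idx]
        y_tr = [y[i] for i in train_idx]
        feats = rank_features(X_tr, y_tr, k_features)  # training data only
        selected.update(feats)  # how often each feature is picked
        correct += sum(
            centroid_predict(X_tr, y_tr, feats, X[i]) == y[i] for i in test_idx
        )
    return correct / len(X), selected
```

Calling `external_cv(*make_toy_data())` returns the held-out accuracy together with a counter of how many folds selected each feature; features picked in every fold are the stable ones.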
Performance of the ATLAS electromagnetic calorimeter end-cap module 0
The construction and beam test results of the ATLAS electromagnetic end-cap calorimeter pre-production module 0 are presented. The stochastic term of the energy resolution is between 10% GeV^1/2 and 12.5% GeV^1/2 over the full pseudorapidity range. Position and angular resolutions are found to be in agreement with simulation. A global constant term of 0.6% is obtained in the pseudorapidity range 2.5 < eta < 3.2 (inner wheel).
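For readers unfamiliar with the terms, calorimeter energy resolution is commonly parametrized as a quadrature sum of a stochastic, a noise and a constant term. The abstract does not state the exact form used, so the expression below is the conventional one and is given as an assumption; the symbols a, b and c are introduced here for illustration.

```latex
\frac{\sigma_E}{E} \;=\; \frac{a}{\sqrt{E}} \,\oplus\, \frac{b}{E} \,\oplus\, c ,
\qquad
x \oplus y \equiv \sqrt{x^2 + y^2}
% a: stochastic term (in % GeV^{1/2}), b: noise term, c: constant term.
% Illustration only, neglecting noise and combining the two quoted figures:
% a = 10\%\,\mathrm{GeV}^{1/2},\; c = 0.6\%,\; E = 100\,\mathrm{GeV}
% \Rightarrow \sigma_E / E = \sqrt{(1.0\%)^2 + (0.6\%)^2} \approx 1.17\%
```

The worked number only shows how the terms combine; the quoted constant term applies to the inner wheel, so it should not be read as a measured resolution.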
Projection to latent structures with orthogonal constraints for metabolomics data
Multivariate techniques based on projection methods such as Principal Component Analysis and Partial Least Squares (PLS) regression are widely applied in metabolomics. However, the effects of confounding factors and the presence of specific clusters in the data could force the projection to produce inefficient representations in the latent space, preventing the identification of the most relevant data variation. To overcome this issue, we introduce a general framework for projection methods, allowing an easy integration of orthogonal constraints, which help in reducing the effect of uninformative variations. In particular, the discussed algorithms address different scenarios. When known confounding factors can be explicitly encoded into a proper constraint matrix, orthogonally Constrained Principal Component Analysis (oCPCA) and orthogonally Constrained PLS2 (oCPLS2) can be used. Orthogonal PLS (OPLS) and post-transformation of PLS2 (ptPLS2), instead, are suited to problems in which a constraint matrix cannot be defined. Finally, a data integration task is considered: Orthogonal two-block PLS (O2PLS) and Orthogonal Wold's two-block Mode A PLS (OPLS-W2A) are used to identify the common variation between two data sets.
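One simple way to picture an orthogonal constraint is to remove a known confounder from the data before extracting a principal component, so that the resulting scores are orthogonal to that confounder. The sketch below does this for a single confounder vector using deflation followed by power iteration. It is an assumption-laden illustration, not the oCPCA algorithm of the paper, and the function names are hypothetical.

```python
import math


def center(values):
    """Subtract the mean from a list of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def constrained_first_component(X, z, n_iter=200):
    """First principal component of X with scores orthogonal to confounder z.

    X is a list of n rows with p columns; z is a list of n confounder values.
    Returns (loadings w, scores t, centred confounder zc).
    """
    n, p = len(X), len(X[0])
    zc = center(z)
    zz = sum(a * a for a in zc)
    cols = [center([X[i][j] for i in range(n)]) for j in range(p)]
    # Deflation: remove from every column its projection onto the confounder.
    for col in cols:
        coef = sum(a * b for a, b in zip(zc, col)) / zz
        for i in range(n):
            col[i] -= coef * zc[i]
    # Power iteration on the deflated data to find the leading loading vector.
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(n_iter):
        t = [sum(cols[j][i] * w[j] for j in range(p)) for i in range(n)]
        w = [sum(cols[j][i] * t[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(a * a for a in w))
        w = [a / norm for a in w]
    t = [sum(cols[j][i] * w[j] for j in range(p)) for i in range(n)]
    return w, t, zc
```

Because every deflated column is orthogonal to the centred confounder, any linear combination of them, including the score vector, is orthogonal to it as well. A constraint matrix with several columns would need a full projection rather than the single-vector deflation used here.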