    Supervised classification of combined copy number and gene expression data

    Summary: In this paper we apply a predictive profiling method to genome copy number aberrations (CNA) in combination with gene expression and clinical data to identify molecular patterns of cancer pathophysiology. Predictive models and optimal feature lists for each platform are developed by a complete-validation SVM-based machine learning system. Ranked lists of genome CNA sites (assessed by array comparative genomic hybridization, aCGH) and of differentially expressed genes (assessed by microarray profiling with Affy HG-U133A chips) are computed and combined on a breast cancer dataset to discriminate the Luminal/ER+ (Lum/ER+) and Basal-like/ER- classes. Different encodings are developed and applied to the CNA data, and predictive variable selection is discussed. We analyze the combination of profiling information across the platforms, also taking the pathophysiological data into account. A specific subset of patients is identified that is classified differently by chromosomal gains and losses than by differentially expressed genes, corroborating the idea that genomic CNA can represent an independent source for tumor classification.
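    The abstract does not specify the CNA encodings. As a hypothetical illustration only, one common scheme maps each aCGH log2 ratio to loss (-1), no change (0) or gain (+1). The function name `encode_cna` and the ±0.2 thresholds are assumptions, not values from the paper.

```python
def encode_cna(log2_ratios, loss_thr=-0.2, gain_thr=0.2):
    """Ternary CNA encoding of aCGH log2 ratios (hypothetical thresholds).

    A binary "aberrated vs. normal" encoding can be derived as abs(code).
    """
    encoded = []
    for ratio in log2_ratios:
        if ratio <= loss_thr:
            encoded.append(-1)  # chromosomal loss
        elif ratio >= gain_thr:
            encoded.append(1)   # chromosomal gain
        else:
            encoded.append(0)   # no aberration called
    return encoded


print(encode_cna([-0.5, 0.05, 0.31, -0.2]))  # [-1, 0, 1, -1]
```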

    Two-omics data revealed commonalities and differences between Rpv12- and Rpv3-mediated resistance in grapevine

    Plasmopara viticola is the causal agent of grapevine downy mildew (DM). DM-resistant varieties deploy effector-triggered immunity (ETI) to inhibit pathogen growth; ETI is activated by major resistance loci, the most common of which are Rpv3 and Rpv12. We previously showed that a quick metabolome response lies behind the ETI conferred by Rpv3 TIR-NB-LRR genes. Here we used a grape variety operating Rpv12-mediated ETI, which is conferred by an independent locus containing CC-NB-LRR genes, to investigate the defence response using GC/MS, UPLC, UHPLC and RNA-Seq analyses. Eighty-eight metabolites showed significantly different concentrations and 432 genes showed differential expression between inoculated resistant leaves and controls. Most metabolite changes in sugars, fatty acids and phenols were similar in timing and direction to those observed in Rpv3-mediated ETI, but some of them were stronger or more persistent. Activators, elicitors and signal transducers for the formation of reactive oxygen species were observed early in samples undergoing Rpv12-mediated ETI, and were paralleled and followed by the upregulation of genes belonging to ontology categories associated with salicylic acid signalling, signal transduction, WRKY transcription factors and the synthesis of the pathogenesis-related proteins PR-1, PR-2 and PR-5.

    Algebraic Comparison of Partial Lists in Bioinformatics

    The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling a biological phenotype through a classification or regression model. Because of resampling protocols, or simply within a meta-analysis comparison, one often obtains sets of alternative feature lists (possibly of different lengths) instead of a single list. Here we introduce a method, based on the algebraic theory of symmetric groups, for studying the variability between lists ("list stability") when the lists have unequal lengths. We provide algorithms evaluating stability for lists embedded in the full feature set or limited to the features occurring in the partial lists. The method is demonstrated first on synthetic data in a gene filtering task and then on finding gene profiles in a recent prostate cancer dataset.
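    The symmetric-group machinery of the paper is not reproduced here. As a simpler, hedged stand-in, the sketch below measures the disagreement between partial lists with the Canberra distance on rank positions, restricted to the features occurring in the lists. Placing a feature that is missing from a list at position len(list) + 1 is an assumption of this sketch, and the function names are hypothetical.

```python
from itertools import combinations


def canberra_partial(list_a, list_b):
    """Canberra distance between two ranked partial lists of feature names."""
    rank_a = {f: i + 1 for i, f in enumerate(list_a)}
    rank_b = {f: i + 1 for i, f in enumerate(list_b)}
    # Assumption: a feature absent from a list is ranked just past its end
    missing_a = len(list_a) + 1
    missing_b = len(list_b) + 1
    dist = 0.0
    for f in set(list_a) | set(list_b):
        ra = rank_a.get(f, missing_a)
        rb = rank_b.get(f, missing_b)
        dist += abs(ra - rb) / (ra + rb)
    return dist


def mean_pairwise_distance(lists):
    """Average distance over all pairs of lists; lower means more stable."""
    pairs = list(combinations(lists, 2))
    return sum(canberra_partial(a, b) for a, b in pairs) / len(pairs)


print(canberra_partial(["g1", "g2"], ["g2", "g1"]))
print(mean_pairwise_distance([["g1", "g2", "g3"], ["g1", "g3"], ["g2", "g1"]]))
```

Identical lists give distance 0, and swapping features near the top of a list costs more than swapping them near the bottom, which is why Canberra-type distances are often preferred for ranked biomarker lists.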

    Effect of Size and Heterogeneity of Samples on Biomarker Discovery: Synthetic and Real Data Assessment

    MOTIVATION: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for the discovery of biomarkers using microarray data often provide results with limited overlap. These differences are attributable to 1) dataset size (few subjects with respect to the number of features); 2) heterogeneity of the disease; 3) heterogeneity of the experimental protocols and computational pipelines employed in the analysis. In this paper, we focus on the first two issues and assess, both on simulated data (generated through an in silico regulation network model) and on real clinical datasets, the consistency of the candidate biomarkers provided by a number of different methods. METHODS: We extensively simulated the effect of the heterogeneity characteristic of complex diseases on different sets of microarray data. Heterogeneity was reproduced by simulating both the intrinsic variability of the population and the alteration of regulatory mechanisms. Population variability was simulated by modeling the evolution of a pool of subjects; a subset of them then underwent alterations in regulatory mechanisms to mimic the disease state. RESULTS: The simulated data allowed us to outline the advantages and drawbacks of different methods across multiple studies and varying numbers of samples, and to evaluate the precision of feature selection on a benchmark with known biomarkers. Although different methods reached comparable classification accuracy, the use of external cross-validation loops helps find features with a higher degree of precision and stability. Application to real data confirmed these results.
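    The abstract does not detail the compared pipelines. The following minimal, stdlib-only sketch shows the general principle of external cross-validation: the feature ranking is recomputed inside every training fold, so held-out samples never influence which features are selected. The ranking score (absolute difference of class means), the nearest-centroid classifier and all function names are illustrative assumptions, not the methods evaluated in the paper; the sketch also assumes binary 0/1 labels and that every training fold contains both classes.

```python
import random
from statistics import mean


def rank_features(X, y):
    """Rank feature indices by |mean(class 1) - mean(class 0)|, best first."""
    scores = []
    for j in range(len(X[0])):
        pos = [x[j] for x, label in zip(X, y) if label == 1]
        neg = [x[j] for x, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def fit_nearest_centroid(X, y, feats):
    """Return a predictor assigning the class whose centroid is closest."""
    centroids = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [mean(r[j] for r in rows) for j in feats]

    def predict(x):
        dists = {c: sum((x[j] - m) ** 2 for j, m in zip(feats, cen))
                 for c, cen in centroids.items()}
        return min(dists, key=dists.get)

    return predict


def external_cv(X, y, k_feats=2, n_folds=5, seed=0):
    """Cross-validated accuracy with feature selection redone in each fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    accuracies, selected = [], []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        # Selection sees the training fold only: no leakage from the test fold
        feats = rank_features(X_tr, y_tr)[:k_feats]
        selected.append(feats)
        predict = fit_nearest_centroid(X_tr, y_tr, feats)
        accuracies.append(mean(1.0 if predict(X[i]) == y[i] else 0.0 for i in test))
    return mean(accuracies), selected
```

Ranking the features once on the full dataset and only then cross-validating the classifier would let the test samples leak into the selection step, which is the optimistic bias that external loops are meant to remove. Comparing the per-fold `selected` lists also gives a first, rough view of list stability.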

    Performance of the ATLAS electromagnetic calorimeter end-cap module 0

    The construction and beam test results of the ATLAS electromagnetic end-cap calorimeter pre-production module 0 are presented. The stochastic term of the energy resolution is between 10% √GeV and 12.5% √GeV over the full pseudorapidity range. Position and angular resolutions are found to be in agreement with simulation. A global constant term of 0.6% is obtained in the pseudorapidity range 2.5 < |η| < 3.2 (inner wheel).
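    Calorimeter energy resolution is conventionally parameterized as σ_E/E = a/√E ⊕ c, where a is the stochastic term, c the constant term and ⊕ denotes addition in quadrature. The sketch below evaluates this expression; neglecting the electronic-noise term (b/E) and combining the quoted stochastic term with the inner-wheel constant term are simplifying assumptions made for illustration only.

```python
import math


def relative_resolution(energy_gev, stochastic=0.10, constant=0.006):
    """sigma_E / E with stochastic and constant terms added in quadrature.

    Assumption: the electronic-noise term b / E is omitted in this sketch.
    """
    return math.hypot(stochastic / math.sqrt(energy_gev), constant)


print(f"{100 * relative_resolution(100):.2f} %")  # 1.17 %
print(f"{100 * relative_resolution(100, stochastic=0.125):.2f} %")
```

At 100 GeV the stochastic contribution (1%) still dominates the 0.6% constant term; as the energy grows, the stochastic part falls as 1/√E and the constant term sets the resolution floor.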

    ATLAS detector and physics performance: Technical Design Report, 1

    Experimental shear testing of timber-masonry dry connections for the seismic retrofit of unreinforced masonry shear walls

    The mechanical coupling of timber products to the masonry walls of unreinforced masonry (URM) buildings is generating considerable interest for seismic vulnerability mitigation. An extensive experimental investigation on timber panel to masonry wall connections realised with screw anchor fasteners is presented. A total of 64 shear tests under monotonic, cyclic and semi-cyclic loading conditions were performed on site in a historic URM building. The examined parameters were masonry type, timber panel product and material, load-to-grain direction, fastener geometry and steel grade. The outcomes of the campaign are reported and discussed, focusing on the strength and stiffness properties and on the dissipation capacity and residual strength of the connection under cyclic load. Moreover, a lognormal distribution fit is proposed for the maximum load and slip modulus measurements of all the cyclic test configurations analysed. Finally, the principal experimental observations are listed along with recommendations for future work and for use in practice.
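    The abstract does not state how the lognormal parameters were estimated. A minimal sketch, assuming maximum-likelihood estimation: μ and σ are the mean and the population (1/n) standard deviation of the log-transformed measurements. The sample loads below are hypothetical placeholders, not test results, and reading off a 5% fractile is shown only as a typical use of such a fit.

```python
import math
from statistics import NormalDist, mean, pstdev


def fit_lognormal(samples):
    """MLE lognormal fit: (mu, sigma) = mean and 1/n std dev of ln(x)."""
    logs = [math.log(x) for x in samples]
    return mean(logs), pstdev(logs)


def lognormal_quantile(mu, sigma, p):
    """p-quantile of the fitted lognormal (e.g. p=0.05 for a 5% fractile)."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


# Hypothetical maximum-load values in kN (placeholders, not the paper's data)
loads = [11.2, 12.8, 10.5, 13.1, 12.0, 11.7]
mu, sigma = fit_lognormal(loads)
print(f"median = {math.exp(mu):.2f} kN, "
      f"5% fractile = {lognormal_quantile(mu, sigma, 0.05):.2f} kN")
```

The same two functions apply unchanged to slip modulus measurements; only the units of the input change.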

    Machine learning methods for predictive proteomics

    The search for predictive biomarkers of disease from high-throughput mass spectrometry (MS) data requires a complex analysis path. Preprocessing and machine-learning modules are pipelined, starting from raw spectra, to set up a predictive classifier based on a shortlist of candidate features. As a machine-learning problem, proteomic profiling on MS data requires the same caution as the microarray case. The risk of overfitting and of selection bias effects is pervasive: not only do potential features easily outnumber samples by a factor of 10^3, but it is also easy to neglect information-leakage effects during preprocessing from spectra to peaks. The aim of this review is to explain how to build a general-purpose design analysis protocol (DAP) for predictive proteomic profiling: we show how to limit leakage due to parameter tuning and how to organize classification and ranking on large numbers of replicate versions of the original data to avoid selection bias. The DAP can be used with alternative components, i.e. with different preprocessing methods (peak clustering or wavelet-based), classifiers (e.g. Support Vector Machines, SVM) or feature ranking methods (recursive feature elimination, RFE, or I-Relief). A procedure for assessing the stability and predictive value of the resulting biomarker list is also provided. The approach is exemplified with experiments on synthetic datasets (from the Cromwell MS simulator) and on publicly available datasets from cancer studies.
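    As a hedged illustration of ranking on many replicate versions of the data, the sketch below ranks features on repeated random subsamples and aggregates the results into a mean rank position and a top-k extraction frequency per feature. The ranker is pluggable, mirroring the interchangeable components described above, but the default ranker (absolute difference of class means), the 70% subsample fraction and all function names are assumptions; this is not the review's DAP and it does not implement SVM-RFE or I-Relief. It assumes binary 0/1 labels and that every subsample contains both classes.

```python
import random
from statistics import mean


def mean_diff_ranking(X, y):
    """Hypothetical default ranker: |mean(class 1) - mean(class 0)|, best first."""
    scores = []
    for j in range(len(X[0])):
        pos = [x[j] for x, label in zip(X, y) if label == 1]
        neg = [x[j] for x, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])


def replicate_rankings(X, y, rank_fn=mean_diff_ranking, n_rep=50, frac=0.7, seed=0):
    """Rank features on n_rep random subsamples; rank_fn is interchangeable."""
    rng = random.Random(seed)
    size = int(frac * len(X))
    rankings = []
    for _ in range(n_rep):
        sub = rng.sample(range(len(X)), size)
        rankings.append(rank_fn([X[i] for i in sub], [y[i] for i in sub]))
    return rankings


def aggregate(rankings, top_k):
    """(feature, mean position, top-k frequency), best mean position first."""
    n_feat = len(rankings[0])
    positions = {j: [] for j in range(n_feat)}
    in_top = {j: 0 for j in range(n_feat)}
    for ranking in rankings:
        for pos, j in enumerate(ranking, start=1):
            positions[j].append(pos)
            if pos <= top_k:
                in_top[j] += 1
    order = sorted(range(n_feat), key=lambda j: mean(positions[j]))
    return [(j, mean(positions[j]), in_top[j] / len(rankings)) for j in order]
```

Features that reach the top k in nearly every replicate are stable candidates, while a feature that ranks highly only on the full dataset may reflect selection bias. In a complete protocol the classifier would also be trained and tested inside each replicate rather than only ranked.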