
    Knowledge-based gene expression classification via matrix factorization

    Motivation: Modern machine learning methods based on matrix decomposition techniques, like independent component analysis (ICA) or non-negative matrix factorization (NMF), provide new and efficient analysis tools which are currently being explored for the analysis of gene expression profiles. These exploratory feature extraction techniques yield expression modes (ICA) or metagenes (NMF). The extracted features are considered indicative of underlying regulatory processes. They can also be applied to the classification of gene expression datasets, grouping samples into different categories for diagnostic purposes or grouping genes into functional categories for further investigation of related metabolic pathways and regulatory networks. Results: In this study we focus on unsupervised matrix factorization techniques and apply ICA and sparse NMF to microarray datasets. The latter monitor the gene expression levels of human peripheral blood cells during differentiation from monocytes to macrophages. We show that these tools are able to identify relevant signatures in the deduced component matrices and extract informative sets of marker genes from these gene expression profiles. The methods rely on the joint discriminative power of a set of marker genes rather than on single marker genes. With these sets of marker genes, corroborated by leave-one-out or random forest cross-validation, the datasets could easily be classified into related diagnostic categories, corresponding either to monocytes versus macrophages or to healthy versus Niemann-Pick C disease patients. Funding: Siemens AG, Munich; DFG (Graduate College 638); DAAD (PPP Luso-Alemã and PPP Hispano-Alemanas).
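    As an illustration of the metagene idea, the sketch below factorizes a simulated non-negative expression matrix with the classic Lee-Seung multiplicative updates. It is a minimal stand-in for the sparse NMF and ICA actually used in the study; all data here are synthetic.

```python
import numpy as np

def nmf(V, k, n_iter=500, eps=1e-9):
    """Lee-Seung multiplicative updates: factor V (genes x samples) into
    non-negative W (genes x k metagenes) and H (k x samples)."""
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update metagene weights
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update gene loadings
    return W, H

# Simulated data: 100 genes, 20 samples, driven by 2 hidden metagenes.
rng = np.random.default_rng(1)
V = np.abs(rng.normal(size=(100, 2))) @ np.abs(rng.normal(size=(2, 20)))
W, H = nmf(V, k=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)  # relative reconstruction error
```

    Samples could then be grouped for classification by their dominant metagene, i.e. the row of H carrying the largest weight for each sample.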

    Evaluating the discriminating capacity of cell death (apoptotic) biomarkers in sepsis.

    Background: Sepsis biomarker panels that provide diagnostic and prognostic discrimination in sepsis patients would be transformative to patient care. We assessed the mortality prediction and diagnostic discriminatory accuracy of two biomarkers reflective of cell death (apoptosis): circulating cell-free DNA (cfDNA) and nucleosomes. Methods: The cfDNA and nucleosome levels were assayed in plasma samples acquired from patients admitted from four emergency departments with suspected sepsis. Subjects with non-infectious systemic inflammatory response syndrome (SIRS) served as controls. Samples were acquired at enrollment (T0) and 24 h later (T24). We assessed diagnostic (differentiating SIRS from sepsis) and prognostic (28-day mortality) predictive power. Models incorporating procalcitonin (diagnostic prediction) and APACHE II scores (mortality prediction) were generated. Results: Two hundred three subjects were included (107 provided procalcitonin measurements). Four subjects exhibited uncomplicated sepsis, 127 severe sepsis, 35 septic shock, and 24 had non-infectious SIRS. There were 190 survivors and 13 non-survivors. Mortality prediction models using cfDNA, nucleosomes, or APACHE II yielded AUC values of 0.61, 0.75, and 0.81, respectively. A model combining nucleosomes with the APACHE II score improved the AUC to 0.84. Diagnostic models distinguishing sepsis from SIRS using procalcitonin, cfDNA(T0), or nucleosomes(T0) yielded AUC values of 0.64, 0.65, and 0.63, respectively. The three-parameter model yielded an AUC of 0.74. Conclusions: To our knowledge, this is the first head-to-head comparison of cfDNA and nucleosomes in diagnosing sepsis and predicting sepsis-related mortality. Both cfDNA and nucleosome concentrations demonstrated a modest ability to distinguish sepsis survivors from non-survivors and provided additive diagnostic predictive accuracy in differentiating sepsis from non-infectious SIRS when integrated into a diagnostic prediction model including PCT and APACHE II. A sepsis biomarker strategy incorporating measures of the apoptotic pathway may serve as an important component of a sepsis diagnostic and mortality prediction tool.
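    The AUC values reported above summarize discrimination as the probability that a randomly chosen positive case (e.g. a non-survivor) receives a higher biomarker score than a randomly chosen negative one. A minimal sketch of that rank-sum computation, on illustrative toy numbers rather than the study's measurements:

```python
import numpy as np

def auc(labels, scores):
    """AUC via the Mann-Whitney identity: the probability that a randomly
    chosen positive outscores a randomly chosen negative (ties count half)."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Toy data: label 1 = non-survivor; the marker trends higher in that group.
y = [0, 0, 0, 1, 1]
marker = [1.0, 2.0, 3.0, 2.5, 4.0]
```

    Here `auc(y, marker)` is 5/6: of the six positive-negative pairs, the positive case scores higher in five. An AUC of 0.5 is chance-level discrimination; 1.0 is perfect separation.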

    D-MaPs - DNA-microarray projects: Web-based software for multi-platform microarray analysis

    The web application D-MaPs provides a user-friendly interface to researchers performing studies based on microarrays. The program was developed to manage and process one- or two-color microarray data obtained from several platforms (currently, GeneTAC, ScanArray, CodeLink, NimbleGen and Affymetrix). Although many algorithms and software programs for microarray analysis are available on the internet, they usually require sophisticated knowledge of mathematics, statistics and computation. D-MaPs was developed to remove the need for high-performance computers or programming experience. D-MaPs performs raw data processing, normalization and statistical analysis, giving access to the analyzed data in text or graphical format. An original feature of D-MaPs is its GEO (Gene Expression Omnibus) submission format service. The D-MaPs application has already been used for the analysis of oligonucleotide microarrays and PCR-spotted arrays (one- and two-color, laser and light scanner). In conclusion, D-MaPs is a valuable tool for the microarray research community, especially for groups without a bioinformatics core.

    Gene identification and protein classification in microbial metagenomic sequence data via incremental clustering

    Background: The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets are characterized by the presence of organisms with varying GC composition, codon usage biases, etc., and consequently gene identification is challenging. The vast amount of sequence data also requires faster protein family classification tools. Results: We present a computational improvement to a sequence clustering approach that we developed previously to identify and classify protein coding genes in large microbial metagenomic datasets. The clustering approach can be used to identify protein coding genes in prokaryotes, viruses, and intron-less eukaryotes. The computational improvement is based on an incremental clustering method that does not require the expensive all-against-all computation of the original approach, while still preserving its remote homology detection capabilities. We present evaluations of the clustering approach in protein-coding gene identification and classification, and also present the results of updating the protein clusters from our previous work with recent genomic and metagenomic sequences. The clustering results are available via CAMERA (http://camera.calit2.net). Conclusion: The clustering paradigm is shown to be a very useful tool in the analysis of microbial metagenomic data. The incremental clustering method is shown to be much faster than the original approach in identifying genes, grouping sequences into existing protein families, and identifying novel families that have multiple members in a metagenomic dataset. These clusters provide a basis for further studies of protein families.
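    The incremental idea can be sketched as follows: each incoming sequence is compared only against one representative per existing cluster, never against every sequence seen so far. The k-mer Jaccard similarity below is a hypothetical stand-in for the alignment-based scoring a real pipeline would use; the threshold and sequences are illustrative.

```python
def kmers(seq, k=3):
    """All overlapping k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets (toy stand-in for alignment scores)."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka | kb) if ka | kb else 0.0

def incremental_cluster(seqs, threshold=0.5):
    """Greedy incremental clustering: cost is O(n * #clusters) comparisons
    against representatives, not the O(n^2) all-against-all compute."""
    reps, clusters = [], []
    for s in seqs:
        for i, r in enumerate(reps):
            if similarity(s, r) >= threshold:
                clusters[i].append(s)   # joins an existing family
                break
        else:                           # no representative matched:
            reps.append(s)              # the sequence founds a new cluster
            clusters.append([s])
    return clusters
```

    With `["MKVLAAGI", "MKVLAAGL", "TTTPPPQQ"]` the first two sequences share most k-mers and fall into one cluster, while the third founds its own, yielding two clusters.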

    Functional Annotation and Identification of Candidate Disease Genes by Computational Analysis of Normal Tissue Gene Expression Data

    Background: High-throughput gene expression data can predict gene function through the “guilt by association” principle: coexpressed genes are likely to be functionally associated. Methodology/Principal Findings: We analyzed publicly available expression data on normal human tissues. The analysis is based on the integration of data obtained with two experimental platforms (microarrays and SAGE) and of various measures of dissimilarity between expression profiles. The building blocks of the procedure are the Ranked Coexpression Groups (RCG), small sets of tightly coexpressed genes which are analyzed in terms of functional annotation. Functionally characterized RCGs are selected by means of the majority rule and used to predict new functional annotations. Functionally characterized RCGs are enriched in groups of genes associated with similar phenotypes. We exploit this fact to find new candidate disease genes for many OMIM phenotypes of unknown molecular origin. Conclusions/Significance: We predict new functional annotations for many human genes, showing that the integration of different data sets and coexpression measures significantly improves the scope of the results. Combining gene expression data, functional annotation and known phenotype-gene associations, we provide candidate genes for several genetic diseases.
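    A minimal sketch of the two building blocks, a ranked coexpression group around a seed gene and majority-rule annotation. The Pearson correlation, toy expression matrix, and labels below are illustrative assumptions, not the paper's actual dissimilarity measures or data.

```python
import numpy as np
from collections import Counter

def ranked_coexpression_group(expr, seed, size):
    """Indices of the `size` genes most correlated with gene `seed`
    (expr is genes x tissues; the seed itself is excluded)."""
    r = np.corrcoef(expr)[seed]
    r[seed] = -np.inf                     # never rank the seed in its own group
    return np.argsort(r)[::-1][:size]

def majority_annotation(labels):
    """Majority rule: annotate the group with its most common known label."""
    known = [l for l in labels if l is not None]
    return Counter(known).most_common(1)[0][0] if known else None

# Toy profiles over 8 tissues: gene 1 tracks gene 0, gene 2 is anti-correlated.
t = np.arange(8.0)
expr = np.vstack([t, 2 * t + 1, 7 - t, np.cos(t)])
group = ranked_coexpression_group(expr, seed=0, size=1)
```

    Here the group around gene 0 is just gene 1, the only perfectly correlated profile; an unannotated seed gene then inherits the group's majority label as its predicted function.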

    High-confidence glycosome proteome for procyclic form Trypanosoma brucei by epitope-tag organelle enrichment and SILAC proteomics

    The glycosome of the pathogenic African trypanosome Trypanosoma brucei is a specialized peroxisome that contains most of the enzymes of glycolysis and several other metabolic and catabolic pathways. The contents and transporters of this membrane-bounded organelle are of considerable interest as potential drug targets. Here we use epitope tagging, magnetic bead enrichment, and SILAC quantitative proteomics to determine a high-confidence glycosome proteome for the procyclic life cycle stage of the parasite, using isotope ratios to discriminate glycosomal from mitochondrial and other contaminating proteins. The data confirm the presence of several previously demonstrated and suggested pathways in the organelle and identify previously unanticipated activities, such as protein phosphatases. The implications of the findings are discussed.

    Lone-pair stabilization in transparent amorphous tin oxides: a potential route to p-type conduction pathways

    The electronic and atomic structures of amorphous transparent tin oxides have been investigated by a combination of X-ray spectroscopy and atomistic calculations. Crystalline SnO is a promising p-type transparent oxide semiconductor due to a complex lone-pair hybridization that affords both optical transparency, despite a small electronic band gap, and spherical s-orbital character at the valence band edge. We find that both of these desirable properties (transparency and s-orbital valence band character) are retained upon amorphization, despite the disruption of the layered lone-pair states by structural disorder. We explain the anomalously large band gap widening necessary to maintain transparency in terms of lone-pair stabilization via atomic clustering. Our understanding of this mechanism suggests that continuous hole conduction pathways along extended lone-pair clusters should be possible at certain stoichiometries. Moreover, these findings should be applicable to other lone-pair active semiconductors.

    Elastic SCAD as a novel penalization method for SVM classification tasks in high-dimensional data

    Background: Classification and variable selection play an important role in knowledge discovery in high-dimensional data. Although Support Vector Machine (SVM) algorithms are among the most powerful classification and prediction methods, with a wide range of scientific applications, the SVM does not include automatic feature selection, and therefore a number of feature selection procedures have been developed. Regularisation approaches extend the SVM to a feature selection method in a flexible way, using penalty functions like LASSO, SCAD and Elastic Net. We propose a novel penalty function for SVM classification tasks, Elastic SCAD, a combination of the SCAD and ridge penalties which overcomes the limitations of each penalty alone. Since SVM models are extremely sensitive to the choice of tuning parameters, we adopted an interval search algorithm, which finds a global optimal solution more rapidly and more precisely than a fixed grid search. Results: Feature selection methods with combined penalties (Elastic Net and Elastic SCAD SVMs) are more robust to changes in model complexity than methods using single penalties. Our simulation study showed that Elastic SCAD SVM outperformed LASSO (L1) and SCAD SVMs. Moreover, Elastic SCAD SVM provided sparser classifiers, in terms of the median number of features selected, than Elastic Net SVM, and often predicted better than Elastic Net in terms of misclassification error. Finally, we applied the penalization methods described above to four publicly available breast cancer data sets. Elastic SCAD SVM was the only method providing robust classifiers in both sparse and non-sparse situations. Conclusions: The proposed Elastic SCAD SVM algorithm provides the advantages of the SCAD penalty while avoiding its sparsity limitations for non-sparse data. We were the first to demonstrate that the integration of the interval search algorithm and penalized SVM classification techniques provides fast solutions for the optimization of tuning parameters. The penalized SVM classification algorithms, as well as fixed grid and interval search for finding appropriate tuning parameters, were implemented in our freely available R package 'penalizedSVM'. We conclude that the Elastic SCAD SVM is a flexible and robust tool for classification and feature selection tasks in high-dimensional data such as microarray data sets.
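    For intuition about penalized SVMs, the sketch below fits a linear SVM with the simpler elastic-net penalty (L1 plus ridge) by plain subgradient descent on toy data. SCAD itself is non-convex and the authors' actual implementation is the R package 'penalizedSVM', so this is only a didactic approximation of the combined-penalty idea; the data and hyperparameters are made up.

```python
import numpy as np

def elastic_svm(X, y, lam1=0.01, lam2=0.01, lr=0.01, epochs=500):
    """Linear SVM with an elastic-net penalty, fitted by subgradient descent:
    minimize mean hinge loss + lam1*||w||_1 + lam2*||w||_2^2, with y in {-1,+1}.
    The L1 term drives uninformative weights toward zero (feature selection);
    the ridge term keeps the solution stable when features are correlated."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1                    # margin violators
        grad_w = -(y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        grad_w += lam1 * np.sign(w) + 2 * lam2 * w      # elastic-net subgradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data: feature 0 separates the classes, feature 1 is noise.
X = np.array([[2.0, 0.1], [3.0, -0.1], [-2.0, 0.1], [-3.0, -0.1]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = elastic_svm(X, y)
```

    On this toy problem the fitted weight on the noise feature stays at zero while the informative feature carries the decision, which is the sparsity behavior the combined penalties aim for.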

    Auditory-inspired morphological processing of speech spectrograms: applications in automatic speech recognition and speech enhancement

    New auditory-inspired speech processing methods are presented in this paper, combining spectral subtraction and two-dimensional non-linear filtering techniques originally conceived for image processing purposes. In particular, mathematical morphology operations, like erosion and dilation, are applied to noisy speech spectrograms using specifically designed structuring elements inspired by the masking properties of the human auditory system. This is effectively complemented by a pre-processing stage comprising the conventional spectral subtraction procedure and auditory filterbanks. These methods were tested in both speech enhancement and automatic speech recognition tasks. For the former, time-frequency anisotropic structuring elements over grey-scale spectrograms were found to provide better perceptual quality than isotropic ones, proving more appropriate, under a number of perceptual quality estimation measures and several signal-to-noise ratios on the Aurora database, for retaining the structure of speech while removing background noise. For the latter, the combination of spectral subtraction and auditory-inspired morphological filtering was found to improve recognition rates on a noise-contaminated version of the Isolet database. This work has been partially supported by the Spanish Ministry of Science and Innovation, CICYT Project No. TEC2008-06382/TEC.
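    The morphological core of the method can be illustrated with a grey-scale opening (erosion followed by dilation) over a toy spectrogram. The rectangular, time-elongated footprint below is a simplified stand-in for the auditory-inspired structuring elements designed in the paper, and the "spectrogram" is synthetic.

```python
import numpy as np
from scipy.ndimage import grey_opening

# Toy spectrogram (freq bins x time frames): one sustained formant-like track.
spec = np.zeros((32, 64))
spec[10:13, :] = 1.0
noisy = spec.copy()
noisy[5, 20] = 1.0           # isolated noise "specks", shorter than any
noisy[25, 50] = 1.0          # plausible speech structure

# Opening = erosion then dilation. An anisotropic footprint elongated along
# time (1 frequency bin x 5 frames) keeps sustained speech energy and deletes
# anything shorter than the window.
cleaned = grey_opening(noisy, size=(1, 5))
```

    The sustained track survives untouched while both isolated specks are erased, which is the sense in which anisotropic elements "retain the structure of speech while removing background noise".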

    Microarray scanner calibration curves: characteristics and implications

    BACKGROUND: Microarray-based measurement of mRNA abundance assumes a linear relationship between the fluorescence intensity and the dye concentration. In reality, however, the calibration curve can be nonlinear. RESULTS: By scanning a microarray scanner calibration slide containing known concentrations of fluorescent dyes under 18 PMT gains, we were able to evaluate the differences in calibration characteristics of Cy5 and Cy3. First, the calibration curve for the same dye under the same PMT gain is nonlinear at both the high and low intensity ends. Second, the degree of nonlinearity of the calibration curve depends on the PMT gain. Third, the two PMTs (for Cy5 and Cy3) behave differently even under the same gain. Fourth, the background intensity for the Cy3 channel is higher than that for the Cy5 channel. The impact of such characteristics on the accuracy and reproducibility of measured mRNA abundance and the calculated ratios was demonstrated. Combined with simulation results, we provided explanations for the existence of ratio underestimation, the intensity-dependence of ratio bias, and the anti-correlation of ratios in dye-swap replicates. We further demonstrated that although Lowess normalization effectively eliminates the intensity-dependence of ratio bias, the systematic deviation from true ratios largely remains. A method of calculating ratios based on concentrations estimated from the calibration curves was proposed for correcting ratio bias. CONCLUSION: It is preferable to scan microarray slides at fixed, optimal gain settings under which the linearity between concentration and intensity is maximized. Although normalization methods improve the reproducibility of microarray measurements, they appear less effective in improving accuracy.
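    The proposed correction can be sketched as follows: invert a measured calibration curve to map intensities back to concentrations before forming ratios. The saturating response function below is a made-up toy, not the scanner's actual characteristic, but it reproduces the ratio-underestimation effect the abstract describes.

```python
import numpy as np

def scanner(conc, sat=1000.0):
    """Made-up saturating response: near-linear at low dye concentration,
    compressed at the high end (mimicking calibration-curve nonlinearity)."""
    return sat * conc / (conc + sat)

# Calibration curve measured from a slide with known dye concentrations.
known_conc = np.linspace(0.0, 5000.0, 200)
known_intensity = scanner(known_conc)

c_red, c_green = 2000.0, 1000.0        # true concentration ratio is 2:1
i_red, i_green = scanner(c_red), scanner(c_green)

naive_ratio = i_red / i_green          # biased toward 1 by saturation
# Correction: map each intensity back to a concentration along the curve,
# then form the ratio in concentration space.
est_red = np.interp(i_red, known_intensity, known_conc)
est_green = np.interp(i_green, known_intensity, known_conc)
corrected_ratio = est_red / est_green
```

    Because the stronger channel is compressed more, the naive intensity ratio comes out near 1.33 instead of 2; interpolating back through the calibration curve recovers the true 2:1 ratio up to grid-interpolation error.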