
    Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra

    In proteomics, liquid chromatography–tandem mass spectrometry (LC–MS/MS) is an established method for identifying peptides and proteins. Duplicated spectra, that is, multiple spectra of the same peptide, occur both in single MS/MS runs and in large spectral libraries. Clustering tandem mass spectra yields consensus spectra, which have several applications. First, it speeds up database searches such as those performed by Mascot. Second, it helps to identify novel peptides across species. Third, it is used for quality control to detect wrongly annotated spectra. We compare different clustering algorithms based on the cosine distance between spectra. CAST, MS-Cluster, and PRIDE Cluster are popular algorithms for clustering tandem mass spectra. We add well-known algorithms for large data sets (hierarchical clustering, DBSCAN, and connected components of a graph) as well as the new method N-Cluster. All algorithms are evaluated on real data with varied parameter settings. Cluster results are compared with each other and with peptide annotations using validation measures such as purity. Quality control, that is, the detection of wrongly (un)annotated spectra, is discussed for selected example clusters. N-Cluster proves to be highly competitive. All clustering results benefit from the so-called DISMS2 filter, which integrates additional information, for example, on precursor mass.
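The abstract does not spell out how cosine-distance clustering works, so here is a minimal sketch of the simplest approach it lists, connected components of a similarity graph, together with the purity measure. It is not CAST, MS-Cluster, PRIDE Cluster, N-Cluster, or the DISMS2 filter. Assumptions (all hypothetical, not from the source): peaks are binned into 1 m/z-unit bins, two spectra are linked when their cosine distance is at most `max_dist` and their precursor m/z values differ by at most `precursor_tol`, and all function and parameter names are illustrative.

```python
import math
from collections import Counter


def bin_spectrum(peaks, bin_width=1.0):
    """Map (m/z, intensity) peaks to a sparse vector keyed by m/z bin index."""
    vec = {}
    for mz, intensity in peaks:
        b = int(mz // bin_width)
        vec[b] = vec.get(b, 0.0) + intensity
    return vec


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse vectors; 1.0 if either is empty."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def connected_component_clusters(spectra, max_dist=0.3, precursor_tol=0.02,
                                 bin_width=1.0):
    """spectra: list of (precursor_mz, peaks). Returns one cluster label per spectrum.

    Hypothetical thresholds: an edge joins two spectra if their precursor m/z
    differ by <= precursor_tol AND their cosine distance is <= max_dist.
    Clusters are the connected components of that graph (via union-find).
    """
    n = len(spectra)
    vecs = [bin_spectrum(peaks, bin_width) for _, peaks in spectra]
    parent = list(range(n))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Cheap precursor check first (an assumption, not the DISMS2 filter).
            if abs(spectra[i][0] - spectra[j][0]) > precursor_tol:
                continue
            if cosine_distance(vecs[i], vecs[j]) <= max_dist:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def purity(labels, annotations):
    """Fraction of spectra carrying their cluster's most frequent peptide annotation."""
    clusters = {}
    for label, ann in zip(labels, annotations):
        clusters.setdefault(label, []).append(ann)
    majority = sum(Counter(m).most_common(1)[0][1] for m in clusters.values())
    return majority / len(labels)
```

Because every pair of spectra is compared, this sketch scales quadratically; a practical implementation for large data sets would likely restrict comparisons, for example by sorting spectra by precursor mass. The same `cosine_distance` could also serve as the metric for hierarchical clustering or DBSCAN.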

    Design Principles of Concentration-Dependent Transcriptome Deviations in Drug-Exposed Differentiating Stem Cells

    Information on the design principles governing transcriptome changes upon the transition from safe to hazardous drug concentrations, or from tolerated to cytotoxic drug levels, is important for the application of toxicogenomics data in developmental toxicology. Here, we tested the effect of eight concentrations of valproic acid (VPA; 25–1000 μM) in an assay that recapitulates the development of human embryonic stem cells to neuroectoderm. Cells were exposed to the drug during the entire differentiation process, and the number of differentially regulated genes increased continuously with concentration, from zero to about 3000. We identified overrepresented transcription factor binding sites (TFBS) as well as superordinate cell biological processes, and we developed a gene ontology (GO) activation profiler as well as a two-dimensional teratogenicity index. Analysis of the transcriptome data set by the above biostatistical and systems biology approaches yielded the following insights: (i) tolerated (≤25 μM), deregulated/teratogenic (150–550 μM), and cytotoxic (≥800 μM) concentrations could be differentiated. (ii) Biological signatures related to the mode of action of VPA, such as protein acetylation, developmental changes, and cell migration, emerged from the teratogenic concentration range. (iii) Cytotoxicity was not accompanied by signatures of newly emerging canonical cell death/stress indicators, but by catabolism and decreased expression of cell cycle-associated genes. (iv) Most, but not all, of the GO groups and TFBS seen at the highest concentrations were already overrepresented at 350–450 μM. (v) The teratogenicity index reflected this behavior and thus differed strongly from cytotoxicity.
Our findings suggest using the highest noncytotoxic drug concentration for gene array toxicogenomics studies, as higher concentrations may yield misleading information on the mode of action, and lower drug levels result in smaller gene expression changes and thus reduced statistical power.