2 research outputs found
Comparison and Evaluation of Clustering Algorithms for Tandem Mass Spectra
In
proteomics, liquid chromatography–tandem mass spectrometry
(LC–MS/MS) is established for identifying peptides and proteins.
Duplicated spectra, that is, multiple spectra of the same peptide,
occur both in single MS/MS runs and in large spectral libraries. Clustering
tandem mass spectra is used to find consensus spectra, with manifold
applications. First, it speeds up database searches, as performed
for instance by Mascot. Second, it helps to identify novel peptides
across species. Third, it is used for quality control to detect wrongly
annotated spectra. We compare different clustering algorithms based
on the cosine distance between spectra. CAST, MS-Cluster, and PRIDE
Cluster are popular algorithms to cluster tandem mass spectra. We
add well-known algorithms for large data sets, hierarchical clustering,
DBSCAN, and connected components of a graph, as well as the new method
N-Cluster. All algorithms are evaluated on real data with varied parameter
settings. Cluster results are compared with each other and with peptide
annotations based on validation measures such as purity. Quality control,
regarding the detection of wrongly (un)annotated spectra, is discussed
for exemplary resulting clusters. N-Cluster proves to be highly competitive.
All clustering results benefit from the so-called DISMS2 filter that
integrates additional information, for example, on precursor mass.
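The comparison above rests on three ingredients: a cosine distance between spectra, a clustering rule, and a validation measure such as purity. The following is a minimal sketch of how these fit together, not the implementation used in the study. It assumes spectra are given as lists of (m/z, intensity) peaks, that peaks are binned at a fixed width of 1.0 m/z, and that clusters are formed as connected components of a graph linking spectra whose distance is at most a chosen threshold. All function names, the bin width, the threshold of 0.1, and the toy peptide labels are hypothetical.

```python
from collections import Counter
import math

def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = Counter()
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two binned spectra (1.0 if either is empty)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def connected_components(spectra, threshold):
    """Label spectra by connected component, linking pairs with distance <= threshold."""
    parent = list(range(len(spectra)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(spectra)):
        for j in range(i + 1, len(spectra)):
            if cosine_distance(spectra[i], spectra[j]) <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(spectra))]

def purity(labels, annotations):
    """Fraction of spectra that carry the majority annotation of their cluster."""
    members = {}
    for label, annotation in zip(labels, annotations):
        members.setdefault(label, []).append(annotation)
    majority = sum(Counter(m).most_common(1)[0][1] for m in members.values())
    return majority / len(labels)

# Toy data (hypothetical): four spectra, the last one annotated inconsistently.
raw = [
    [(100.2, 10), (200.4, 5)],
    [(100.3, 9), (200.1, 6)],
    [(150.7, 8), (300.5, 4)],
    [(150.2, 7), (300.9, 5)],
]
annotations = ["PEPTIDEA", "PEPTIDEA", "PEPTIDEB", "PEPTIDEA"]

binned = [bin_spectrum(peaks) for peaks in raw]
labels = connected_components(binned, threshold=0.1)
score = purity(labels, annotations)
print(labels, score)
```

In this toy example the fourth spectrum shares a cluster with a differently annotated spectrum, which is the kind of disagreement a quality-control step would flag for inspection. The all-pairs loop is quadratic and only suitable for small inputs; large data sets need an indexed or filtered neighbor search.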
Design Principles of Concentration-Dependent Transcriptome Deviations in Drug-Exposed Differentiating Stem Cells
Information on design principles
governing transcriptome changes
upon transition from safe to hazardous drug concentrations or from
tolerated to cytotoxic drug levels is important for the application
of toxicogenomics data in developmental toxicology. Here, we tested
the effect of eight concentrations of valproic acid (VPA; 25–1000
μM) in an assay that recapitulates the development of human
embryonic stem cells to neuroectoderm. Cells were exposed to the drug
during the entire differentiation process, and the number of differentially
regulated genes increased continuously over the concentration range
from zero to about 3000. We identified overrepresented transcription
factor binding sites (TFBS) as well as superordinate cell biological
processes, and we developed a gene ontology (GO) activation profiler,
as well as a two-dimensional teratogenicity index. Analysis of the
transcriptome data set by the above biostatistical and systems biology
approaches yielded the following insights: (i) tolerated (≤25
μM), deregulated/teratogenic (150–550 μM), and
cytotoxic (≥800 μM) concentrations could be differentiated.
(ii) Biological signatures related to the mode of action of VPA, such
as protein acetylation, developmental changes, and cell migration,
emerged from the teratogenic concentration range. (iii) Cytotoxicity
was not accompanied by signatures of newly emerging canonical cell
death/stress indicators, but by catabolism and decreased expression
of cell cycle-associated genes. (iv) Most, but not all, of the GO groups
and TFBS seen at the highest concentrations were already overrepresented
at 350–450 μM. (v) The teratogenicity index reflected
this behavior, and thus differed strongly from cytotoxicity. Our findings
suggest the use of the highest noncytotoxic drug concentration for
gene array toxicogenomics studies, as higher concentrations possibly
yield wrong information on the mode of action, and lower drug levels
result in decreased gene expression changes and thus a reduced power
of the study.