Mass spectrometry (MS) based techniques have emerged as a standard forlarge-scale protein analysis. The ongoing progress in terms of more sensitive
machines and improved data analysis algorithms led to a constant expansion of
its fields of applications. Recently, MS was introduced into clinical proteomics
with the prospect of early disease detection using proteomic pattern matching.
Analyzing biological samples (e.g. blood) by mass spectrometry generates
mass spectra that represent the components (molecules) contained in a
sample as masses and their respective relative concentrations.
In this work, we are interested in those components that are constant within a
group of individuals but differ much between individuals of two distinct groups.
These distinguishing components that dependent on a particular medical condition
are generally called biomarkers. Since not all biomarkers found by the
algorithms are of equal (discriminating) quality we are only interested in a
small biomarker subset that - as a combination - can be used as a
fingerprint for a disease. Once a fingerprint for a particular disease
(or medical condition) is identified, it can be used in clinical diagnostics to
classify unknown spectra.
In this thesis we have developed new algorithms for automatic extraction of
disease specific fingerprints from mass spectrometry data. Special emphasis has
been put on designing highly sensitive methods with respect to signal detection.
Thanks to our statistically based approach our methods are able to
detect signals even below the noise level inherent in data acquired by common MS
machines, such as hormones.
To provide access to these new classes of algorithms to collaborating groups
we have created a web-based analysis platform that provides all necessary
interfaces for data transfer, data analysis and result inspection.
To prove the platform's practical relevance it has been utilized in several
clinical studies two of which are presented in this thesis. In these studies it
could be shown that our platform is superior to commercial systems with respect
to fingerprint identification. As an outcome of these studies several
fingerprints for different cancer types (bladder, kidney, testicle, pancreas,
colon and thyroid) have been detected and validated. The clinical partners in
fact emphasize that these results would be impossible with a less sensitive
analysis tool (such as the currently available systems).
In addition to the issue of reliably finding and handling signals in noise we
faced the problem to handle very large amounts of data, since an average dataset
of an individual is about 2.5 Gigabytes in size and we have data of hundreds to
thousands of persons. To cope with these large datasets, we developed a new
framework for a heterogeneous (quasi) ad-hoc Grid - an infrastructure that
allows to integrate thousands of computing resources (e.g. Desktop Computers,
Computing Clusters or specialized hardware, such as IBM's Cell Processor in a
Playstation 3)