Inferential stability in systems biology
The modern biological sciences are fraught with statistical difficulties. Biomolecular
stochasticity, experimental noise, and the “large p, small n” problem all contribute to
the challenge of data analysis. Nevertheless, we routinely seek to draw robust, meaningful
conclusions from observations. In this thesis, we explore methods for assessing
the effects of data variability upon downstream inference, in an attempt to quantify and
promote the stability of the inferences we make.
We start with a review of existing methods for addressing this problem, focusing upon the
bootstrap and similar methods. The key requirement for all such approaches is a statistical
model that approximates the data generating process.
We move on to consider biomarker discovery problems. We present a novel algorithm for
proposing putative biomarkers on the strength of both their predictive ability and the stability
with which they are selected. In a simulation study, we find our approach to perform
favourably in comparison to strategies that select on the basis of predictive performance
alone.
We then consider the real problem of identifying protein peak biomarkers for HAM/TSP,
an inflammatory condition of the central nervous system caused by HTLV-1 infection.
We apply our algorithm to a set of SELDI mass spectral data, and identify a number of
putative biomarkers. Additional experimental work, together with known results from the
literature, provides corroborating evidence for the validity of these putative biomarkers.
Having focused on static observations, we then make the natural progression to time
course data sets. We propose a (Bayesian) bootstrap approach for such data, and then
apply our method in the context of gene network inference and the estimation of parameters
in ordinary differential equation models. We find that the inferred gene networks
are relatively unstable, and demonstrate the importance of finding distributions of ODE
parameter estimates, rather than single point estimates.
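The idea of scoring candidate biomarkers by how stably they are selected can be illustrated with a small bootstrap sketch. This is not the algorithm from the thesis, which the abstract does not specify and which also weighs predictive ability. It is a minimal example that only counts how often each feature ranks among the top few under resampling. The function name `selection_frequencies`, the univariate score, the simulated data and all parameter values are assumptions made for illustration.

```python
import numpy as np

def selection_frequencies(X, y, n_boot=200, top_k=5, seed=0):
    """Fraction of bootstrap resamples in which each feature ranks in the top_k.

    The ranking score is a simple standardised difference in class means;
    it is a stand-in for whatever selection rule is actually used.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    used = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample rows with replacement
        Xb, yb = X[idx], y[idx]
        if yb.min() == yb.max():               # skip resamples containing one class only
            continue
        diff = Xb[yb == 1].mean(axis=0) - Xb[yb == 0].mean(axis=0)
        score = np.abs(diff) / (Xb.std(axis=0) + 1e-12)
        counts[np.argsort(score)[-top_k:]] += 1  # features selected in this resample
        used += 1
    return counts / used

# Simulated "large p, small n" data (hypothetical): 40 samples, 50 features,
# of which only the first three differ between the two classes.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 50))
X[y == 1, :3] += 2.0

freq = selection_frequencies(X, y)
print(np.round(freq[:6], 2))
```

Features with a selection frequency close to 1 are chosen in almost every resample, whereas features that enter the top set only occasionally are candidates for being artefacts of sampling variability.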
Human proteomic profiles in latent and active tuberculosis
Distinguishing patients with active tuberculosis (TB) from those with latent TB is an
important clinical problem. The SELDI-TOF MS (Surface Enhanced Laser
Desorption Ionisation – Time of Flight Mass Spectrometry) platform allows for high
throughput detection of multiple proteins in biological fluids. Proteomic patterns
reflecting host-pathogen interaction can be used as a tool to aid our understanding of
the Natural History of Tuberculosis.
Methods: Plasma samples were collected prospectively in a shanty town in Lima,
Peru. Latent and active TB status was defined using the Tuberculin Skin Test (TST),
Quantiferon (QFN) assay and TB culture. Crude plasma and fractionated plasma
samples were analysed on weak cationic CM10 chip surfaces using a Biomek 3000
Laboratory Automation Workstation. Spectra were generated using a ProteinChip
System 4000 Mass spectrometer. Data was analysed using a Support Vector Machine.
Results:
Samples were collected from 154 patients with active TB, 112 patients with
respiratory symptoms suggestive of TB and 151 healthy controls. Multiple peaks
differed significantly between active TB patients and unhealthy controls. Trained
optimal classifiers discriminate between:
i) active TB and unhealthy controls with 84% accuracy (87% sensitivity, 79%
specificity) in crude plasma and up to 89% accuracy (90% sensitivity, 88%
specificity) in fractionated plasma
ii) active TB and latent TB with 89% accuracy (90% sensitivity, 89% specificity)
iii) latent TB and no TB in healthy controls with 77% accuracy (67% sensitivity, 84%
specificity).
Conclusions:
SELDI-TOF MS proteomic profiles in combination with trained optimal classifiers
accurately discriminate active TB from other respiratory disorders. The classifier for
latent TB was not as accurate, but active TB could be discriminated from latent TB.
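The reported accuracy, sensitivity and specificity are linked by simple arithmetic: accuracy is the average of sensitivity and specificity weighted by group size. The sketch below checks this for comparison (i) in crude plasma, assuming (the abstract does not state this) that the figures were computed over all 154 active TB samples and all 112 symptomatic control samples. The function name `accuracy_from_rates` is illustrative.

```python
def accuracy_from_rates(sensitivity, specificity, n_pos, n_neg):
    """Overall accuracy implied by sensitivity and specificity for given group sizes."""
    true_pos = sensitivity * n_pos   # expected correctly classified positives
    true_neg = specificity * n_neg   # expected correctly classified negatives
    return (true_pos + true_neg) / (n_pos + n_neg)

# Comparison (i), crude plasma: 87% sensitivity, 79% specificity.
# Group sizes are an assumption taken from the sample counts in the Results.
acc = accuracy_from_rates(0.87, 0.79, n_pos=154, n_neg=112)
print(f"{acc:.3f}")  # prints 0.836
```

Under that assumption the implied accuracy rounds to the reported 84%.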
A Machine Learning Model for Discovery of Protein Isoforms as Biomarkers
Prostate cancer is the most common cancer in men. One in eight Canadian men will be diagnosed with prostate cancer in their lifetime. Accurate detection of the disease's subtypes is critical for providing adequate therapy and hence for increasing both survival rates and quality of life. Next generation sequencing can be beneficial when studying cancer. This technology generates a large amount of data that can be used to extract information about biomarkers. This thesis proposes a model that discovers protein isoforms for different stages of prostate cancer progression. A tool has been developed that utilizes RNA-Seq data to infer open reading frames (ORFs) corresponding to transcripts. These ORFs are used as features for classification. A quantification measurement, Adaptive Fragments Per Kilobase of transcript per Million mapped reads (AFPKM), is proposed to compute the expression level for ORFs. The new measurement considers the actual length of the ORF and the length of the transcript. Using these ORFs and the new expression measure, several classifiers were built using different machine learning techniques. That enabled the identification of some protein isoforms related to prostate cancer progression. The biomarkers have had a great impact on the discrimination of prostate cancer stages and are worth further investigation.
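For context, the standard FPKM measure that AFPKM adapts normalises a fragment count by feature length and sequencing depth. The abstract does not give the AFPKM formula, so the sketch below shows only conventional FPKM. The function name `fpkm` and the example numbers are hypothetical.

```python
def fpkm(fragments, length_bp, total_mapped_fragments):
    """Standard FPKM: fragments per kilobase of feature per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)

# Hypothetical example: 500 fragments on a 2,000 bp transcript
# in a library of 25 million mapped fragments.
value = fpkm(fragments=500, length_bp=2000, total_mapped_fragments=25_000_000)
print(value)  # prints 10.0
```

Which length is used in the denominator (transcript or ORF) changes the result, which is presumably the issue the adaptive measure is meant to address.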
The Use of Proteomic Technologies to Identify Serum Glycoproteins for the Early Detection of Liver and Prostate Cancers
The application of proteomic technologies to identify serum glycoproteins is an emerging technique to identify new biomarkers indicative of disease severity. Many of these newly evolving protein-profiling methodologies have evolved from previous global protein expression profiling studies such as those involving SELDI-TOF-MS technologies. Though the SELDI approach could distinguish disease from normal by utilizing protein patterns, as shown herein with the HCC study of chapter II, it was unable to offer sequence information on the selected peaks, and did not have the ability to analyze the entire dynamic range of the serum/plasma proteome. To address these deficiencies, new strategies that incorporate the use of differential lectin-based glycoprotein capture and targeted immuno-based assays have been developed. The carbohydrate binding specificities of different lectins offer a biological affinity approach that both complements existing mass spectrometer capabilities and retains automated throughput options. A prostate cancer study using disease stratified samples is utilized herein to determine whether lectin capture can identify glycoproteins that are indicative of different stages of prostate disease. By utilizing upfront lectin fractionation, we show here evidence of glycoproteins and glycoprotein isoforms that are specific to cancer progression. In addition, the incorporation of lectin fractionation followed by albumin depletion allows for a more in-depth analysis of the entire dynamic range of the human serum and plasma proteome. Taken together, we believe this approach is an attractive strategy for the discovery of proteins indicative of the early detection of liver and prostate cancers.
A multi-modality approach for enhancing the diagnosis of cholangiocarcinoma
Background: Cholangiocarcinoma (CC) is a malignancy of the bile ducts and mortality is high as
patients present too late for curative surgery. In most cases of CC the aetiology is
unknown, whilst diagnosis and staging are challenging. The hepatobiliary system
excretes carcinogenic toxins and genetic mutations in biliary transporters lead to
dysfunction and cholestasis, potentially contributing to cholangiocarcinogenesis.
Polymorphisms in the NKG2D receptor have previously been associated with CC in
primary sclerosing cholangitis (PSC). Such a role has not been investigated in sporadic
CC. CC is difficult to diagnose, particularly in those with PSC. The transition from
benign to malignant biliary disease is likely to be reflected in changes to the plasma
proteome. However, current plasma biomarkers do not reliably distinguish benign from
malignant biliary strictures. Elevation of neutrophil gelatinase-associated lipocalin
(NGAL) has been demonstrated in the bile of patients with CC but has not been
investigated as a plasma protein biomarker. Staging of CC is inaccurate, with only a
minority of operated patients cured. Higher resolution MRI would improve diagnosis
and staging. The work presented in this thesis represents a multimodality approach to
enhance the diagnosis of CC:
Genetic studies: Genetic variation in major biliary transporter proteins, and the NKG2D receptor, was
investigated. Single nucleotide polymorphisms (SNPs) in candidate genes were
selected using HapMap. DNA from 173 CC patients and 265 healthy controls was
genotyped. SNPs in ABCB11, MDR3 and ATP8B1 were nominally associated with
altered susceptibility to CC, suggesting a potential role in cholangiocarcinogenesis.
The previous association of NKG2D variation with CC in PSC was not replicated in
sporadic CC, suggesting a possible difference in pathogenesis.
Protein studies: Plasma from subjects with CC, benign disease, and from healthy controls was studied.
Two proteomic techniques, liquid chromatography-tandem mass spectrometry (LC-MS/MS)
and surface-enhanced laser desorption ionization time-of-flight MS (SELDI-TOF
MS), were utilised. Differentially expressed proteins were identified where
possible. LC-MS/MS fully identified six proteins that were differentially expressed in CC
compared to gall stone disease patients. SELDI-TOF MS identified seven m/z peaks
that showed significant utility in discriminating CC from PSC controls. An ELISA
approach was used to study plasma NGAL levels in CC. Although differentially
expressed between CC and healthy control groups, the utility of NGAL in discriminating
CC from PSC was limited.
Imaging studies: An endoscope-mounted MR coil and intraductal MR detector coil were developed.
Quantitative resolution and signal-to-noise-ratio (SNR) testing, and qualitative tissue
discrimination appraisal, were undertaken. Sub-0.7mm resolution and excellent SNRs
have been demonstrated. High-resolution has been demonstrated in imaged tissue.
Imaging with the new devices compares favourably with endoscopic ultrasound
imaging.
Ovarian cancer: can proteomics give new insights for therapy and diagnosis?
The study of the ovarian proteomic profile represents a new frontier in ovarian cancer research, since this approach can shed light on the wide variety of post-translational events (such as glycosylation and phosphorylation). Because thousands of proteins, which could be simultaneously altered, can be analyzed at once, comparative proteomics represents a promising model of biomarker discovery for ovarian cancer detection and monitoring. Moreover, defining signaling pathways in ovarian cancer cells through proteomic analysis offers the opportunity to design novel drugs and to optimize the use of molecularly targeted agents against crucial and biologically active pathways. Proteomic techniques provide more information about different histological types of ovarian cancer, cell growth and progression, genes related to tumor microenvironment and specific molecular targets predictive of response to chemotherapy than sequencing or microarrays. Estimates of specificity with proteomics are less consistent, but suggest a new role for combinations of biomarkers in early ovarian cancer diagnosis, such as the OVA1 test. Finally, the definition of the proteomic profiles in ovarian cancer would be accurate and effective in identifying which pathways are differentially altered, defining the most effective therapeutic regimen and eventually improving health outcomes.
New Statistical Algorithms for the Analysis of Mass Spectrometry Time-Of-Flight Mass Data with Applications in Clinical Diagnostics
Mass spectrometry (MS) based techniques have emerged as a standard for large-scale protein analysis. The ongoing progress in terms of more sensitive
machines and improved data analysis algorithms led to a constant expansion of
its fields of applications. Recently, MS was introduced into clinical proteomics
with the prospect of early disease detection using proteomic pattern matching.
Analyzing biological samples (e.g. blood) by mass spectrometry generates
mass spectra that represent the components (molecules) contained in a
sample as masses and their respective relative concentrations.
In this work, we are interested in those components that are constant within a
group of individuals but differ much between individuals of two distinct groups.
These distinguishing components that depend on a particular medical condition
are generally called biomarkers. Since not all biomarkers found by the
algorithms are of equal (discriminating) quality we are only interested in a
small biomarker subset that - as a combination - can be used as a
fingerprint for a disease. Once a fingerprint for a particular disease
(or medical condition) is identified, it can be used in clinical diagnostics to
classify unknown spectra.
In this thesis we have developed new algorithms for automatic extraction of
disease specific fingerprints from mass spectrometry data. Special emphasis has
been put on designing highly sensitive methods with respect to signal detection.
Thanks to our statistically based approach, our methods are able to
detect signals, such as those from hormones, even below the noise level inherent
in data acquired by common MS machines.
To provide access to these new classes of algorithms to collaborating groups
we have created a web-based analysis platform that provides all necessary
interfaces for data transfer, data analysis and result inspection.
To prove the platform's practical relevance it has been utilized in several
clinical studies two of which are presented in this thesis. In these studies it
could be shown that our platform is superior to commercial systems with respect
to fingerprint identification. As an outcome of these studies several
fingerprints for different cancer types (bladder, kidney, testicle, pancreas,
colon and thyroid) have been detected and validated. The clinical partners in
fact emphasize that these results would be impossible with a less sensitive
analysis tool (such as the currently available systems).
In addition to the issue of reliably finding and handling signals in noise we
faced the problem of handling very large amounts of data, since an average dataset
of an individual is about 2.5 Gigabytes in size and we have data of hundreds to
thousands of persons. To cope with these large datasets, we developed a new
framework for a heterogeneous (quasi) ad-hoc Grid - an infrastructure that
allows the integration of thousands of computing resources (e.g. Desktop Computers,
Computing Clusters or specialized hardware, such as IBM's Cell Processor in a
Playstation 3).
Investigation of novel urinary markers of hepatotoxicity.
The aim of this study was to identify novel, sensitive, and specific protein markers of hepatotoxicity in rat urine. Collection of urine is non-invasive compared to biopsy or serum analysis and therefore preferable when screening for toxicity. Carbon tetrachloride (CCl4) was used both acutely to produce hepatotoxicity and chronically to produce a rat model of liver fibrosis.
An optimal dose of CCl4, and the time post-dosing for maximum acute liver injury, were established by histopathological examination and by assaying serum enzyme markers of liver injury. Urine was analysed using Surface Enhanced Laser Desorption/Ionisation (SELDI) ProteinChip® technology. SELDI revealed the appearance of a protein peak at 15.7 kDa in response to CCl4 treatment, while one-dimensional sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE) identified an 18.4 kDa protein in urine from CCl4-treated rats. This protein was identified by in-gel digestion and tandem mass spectrometry as Cu/Zn superoxide dismutase (SOD-1). SOD-1 catalyses the destruction of the superoxide anion and acts as a defence against oxidative damage. SOD-1 exists as a 32.5 kDa homodimer, and it was concluded that the 15.7 kDa SELDI and 18.4 kDa SDS-PAGE proteins are the SOD-1 subunit, which runs at an anomalous MW on SDS-PAGE.
SOD-1 in the rat urine was confirmed by Western blotting with a commercial antibody and by measuring SOD activity. Further studies revealed that SOD-1 was increased in urine from CCl4-treated rats between 12 hours and 60 hours post-dosing, at dose levels as low as 0.4 ml/kg CCl4. Western blots of homogenates showed that SOD-1 was being lost from rat liver, presumably by necrosis. Although the enzyme SOD-1 is not specifically located in the liver, its appearance in the rat urine following hepatotoxicity is a novel finding. Since changes in SOD-1 levels were detected following low dose levels of CCl4, and the response was measured using non-invasive methods, SOD-1 is a potential marker of hepatotoxicity.
Liver fibrosis was induced by repeat dosing with CCl4 and confirmed by histopathological examination. Analysis of the urine samples by SDS-PAGE revealed an increase in SOD-1 in fibrotic rats but no other differences were evident. Examination of urine samples from rats with fibrosis, or acute hepatotoxicity, by two-dimensional gel electrophoresis revealed a number of proteins that were increased in these models. Two-dimensional gel electrophoresis of liver homogenates from the acute model showed a number of proteins that decreased. These results suggest that, although the urinary proteins have not yet been identified, it is highly probable that one or more will be a specific marker for the non-invasive identification of hepatotoxicity.
The presence of an acute phase response in lambs with experimental classical scrapie indicates a shift towards a pro-inflammatory state in the clinical end stage
Classical scrapie in sheep is a transmissible and fatal neurodegenerative disease caused by the self-replicating and infectious prion protein, PrPSc, which is a conformational variant of the normal cellular prion protein, PrPC. The prion protein is a highly conserved glycoprotein encoded by the PRNP gene; within the same host, therefore, PrPC and PrPSc have the same unique amino acid sequence and differ only in their three-dimensional folded structure. Specific mutations at codons 136, 154 and 171 of the PRNP gene lead to single amino acid substitutions, and the most common polymorphisms give rise to five possible alleles and 15 PRNP genotypes found in sheep. The different alleles are highly associated with levels of susceptibility to classical scrapie: the A136R154R171 allele provides high genetic resistance and the V136R154Q171 allele results in highly susceptible animals. On the basis of this association between PRNP genotype and susceptibility, many EU Member States have implemented national breeding-for-resistance programmes with the aim of increasing the distribution of the ARR allele and reducing the distribution of the VRQ allele. For almost 20 years, the EU TSE regulation has required surveillance within each country to establish the prevalence of prion diseases and of the different PRNP genotypes. Classical scrapie has a widespread distribution, the incidence rate fluctuates due to the complex interaction between prion and host factors, and prevalence can only be estimated by ante mortem testing through active and passive surveillance. Transmission between sheep occurs through direct and indirect contact, and PrPSc can remain infective in the environment for years. The most common route of infection is the oral route, and infected animals can excrete PrPSc through foetal membranes and fluids, saliva, urine, faeces, and milk.
Pathogenesis is highly influenced by PRNP genotype, as animals of the most susceptible genotypes have the most effective uptake of PrPSc across the small intestine, followed by an extensive dissemination and involvement of the SLOs, and an early neuroinvasion with spread of PrPSc within the CNS. The susceptible genotypes will contribute the most to spread of infectivity and environmental contamination.
This work describes the results from experimental classical scrapie where homozygous VRQ lambs were inoculated orally at birth with homogenated brain material from either healthy sheep or from natural cases of classical scrapie. This resulted in a worst-case scenario type of classical scrapie with sudden onset of severe clinical signs at 22 wpi followed by a rapid deterioration and euthanasia at 23 wpi. Serum samples were collected at regular intervals and tissue samples from brain and liver were sampled at post mortem examination. Proteomic examinations of serum revealed a downregulation of several protein peaks during the pre-symptomatic incubation period in the scrapie affected group compared to the control group, and a shift to upregulation of protein peaks onwards from 22 wpi. Genomic examinations of serum samples showed a slight
downregulation of IL1B and TLR4 at 16 wpi, followed by a change at 22 wpi with upregulation of genes encoding TLRs, C3 and APPs. Genomic examination of liver and brain tissues showed an alteration in gene expression of APPs in accordance with an APR. Serum analyses of different APPs showed increased levels of the positive APPs and a reduced concentration of negative APPs.
These findings are indicative of a shift from an anti-inflammatory to a pro-inflammatory systemic innate immune response that coincides with the onset of debilitating clinical disease. In neurodegenerative diseases, the innate immune response in the CNS has a key role in both onset and progression of disease and resolution of inflammation. The accumulation of PrPSc in the CNS has been associated with a chronic activation of the innate immune response, pro-inflammatory activation of microglia, neuroinflammation, and neurodegeneration.
The disease phenotype registered in this work is a result of PRNP genotype and of the time and dose of inoculation, a combination that can occur naturally if the right circumstances are in place. New-born homozygous VRQ lambs from an infected dam can be infected at birth. These cases could develop a disease progression similar to that described in this work, resulting in an efficient and fast uptake and widespread peripheral and central dissemination of PrPSc, and clinical disease at a young age. These cases would present a diagnostic challenge and could easily be missed as classical scrapie. Due to their young age, these cases would not be sampled through active surveillance. If the incubation period exceeds the commercial lifespan, these lambs would be slaughtered for human consumption, and due to their PRNP genotype, prions would enter the food chain.
Control of classical scrapie can probably not be achieved by absence of infectivity, but absence of clinical disease is possible through breeding for resistance, which will provide flock immunity to classical scrapie.
Research Council of Norwa
- …