Cohort profile: The Clinical and Multi-omic (CAMO) cohort, part of the Norwegian Women and Cancer (NOWAC) study
Introduction - Breast cancer is the most common cancer worldwide and the leading cause of cancer-related deaths among women. The high incidence and mortality of breast cancer call for improved prevention, diagnostics, and treatment, including the identification of new prognostic and predictive biomarkers for use in precision medicine.
Material and methods - With the aim of compiling a cohort amenable to integrative study designs, we collected detailed epidemiological and clinical data, blood samples, and tumor tissue from a subset of participants in the prospective, population-based Norwegian Women and Cancer (NOWAC) study. These participants were diagnosed with invasive breast cancer in North Norway before 2013, according to the Cancer Registry of Norway, and constitute the Clinical and Multi-omic (CAMO) cohort. Prospectively collected questionnaire data on lifestyle and reproductive factors, as well as blood samples, were extracted from the NOWAC study; clinical and histopathological data were manually curated from medical records; and archived tumor tissue was collected.
Results - The lifestyle and reproductive characteristics of the participants in the CAMO cohort (n = 388) were largely similar to those of the breast cancer patients in NOWAC (n = 10 356). Most of the cancers in the CAMO cohort were tumor grade 2 and of the luminal A subtype. Approximately 80% were estrogen receptor positive, 13% were HER2 positive, and 12% were triple-negative breast cancers. Lymph node metastases were present in 31% at diagnosis. The epidemiological dataset in the CAMO cohort is complemented by mRNA, miRNA, and metabolomics analyses in plasma, as well as miRNA profiling in tumor tissue. Additionally, histological analyses of proteins and miRNAs in tumor tissue are ongoing.
Conclusion - The CAMO cohort provides data suitable for epidemiological, clinical, molecular, and multi-omics investigations, thereby enabling a systems epidemiology approach to translational breast cancer research.
Reproducible Data Management and Analysis using R
Standardizing and documenting computational analyses is necessary to ensure reproducible results. We describe an R-based implementation of data management and preprocessing that is well integrated with the analysis tools typically used for statistical analysis of omics data. We have used these tools to organize data storage and documentation, and to standardize the analysis of gene expression data, in the Norwegian Women and Cancer study.
Local in Time Statistics for detecting weak gene expression signals in blood – illustrated for prediction of metastases in breast cancer in the NOWAC Post-genome Cohort
Small data: practical modeling issues in human-model -omic data
This thesis is based on the following articles:
Chapter 2: Holsbø, E., Perduca, V., Bongo, L.A., Lund, E. & Birmelé, E. (Manuscript). Stratified time-course gene preselection shows a pre-diagnostic transcriptomic signal for metastasis in blood cells: a proof of concept from the NOWAC study. Available at https://doi.org/10.1101/141325.
Chapter 3: Bøvelstad, H.M., Holsbø, E., Bongo, L.A. & Lund, E. (Manuscript). A Standard Operating Procedure For Outlier Removal In Large-Sample Epidemiological Transcriptomics Datasets. Available at https://doi.org/10.1101/144519.
Chapter 4: Holsbø, E. & Perduca, V. (2018). Shrinkage estimation of rate statistics. Case Studies in Business, Industry and Government Statistics 7(1), 14-25. Also available at http://hdl.handle.net/10037/14678.

Human-model data are very valuable in biomedical research. Ethical and economic constraints limit access to such data, and consequently these datasets rarely comprise more than a few hundred observations. Because measurements are comparatively cheap, the tendency is to measure as many things as possible for the few, valuable participants in a study; with -omics technologies it is cheap and simple to make hundreds of thousands of measurements simultaneously. In technical terms, this setting of few observations and many measurements is a high-dimensional problem. Most gene expression experiments measure the expression levels of 10 000–15 000 genes for fewer than 100 subjects. I refer to this as the small data setting.
This dissertation is an exercise in practical data analysis as it happens in a large epidemiological cohort study. It comprises three main projects: (i) predictive modeling of breast cancer metastasis from whole-blood transcriptomics measurements; (ii) standardizing microarray data quality assessment in the Norwegian Women and Cancer (NOWAC) postgenome cohort; and (iii) shrinkage estimation of rates. All three are small data analyses, for different reasons.
Predictive modeling in the small data setting is very challenging. There are several modern methods built to tackle high-dimensional data, but there is a need to evaluate these methods against one another when analyzing data in practice. Through the metastasis prediction work we learned first-hand that common practices in machine learning can be inefficient or harmful, especially for small data. I will outline some of the more important issues.
In a large project such as NOWAC there is a need to centralize and disseminate knowledge and procedures. The standardization of NOWAC quality assessment was a project born of necessity. The standard operating procedure for outlier removal was developed so that the NOWAC microarray material would be preprocessed the same way every time. We take this procedure from an archaic R script that resided in people's email inboxes to a well-documented, open-source R package, and present the NOWAC guidelines for microarray quality control. The procedure is built around the inherently high value of a single observation.
Small data are plagued by high variance. When working with small data it usually pays to bias models by shrinkage or by borrowing information from elsewhere. We present a pseudo-Bayesian estimator of rates in an informal crime rate study, show the value of such procedures in a small data setting, and present some novel considerations about the coverage properties of such a procedure.
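The abstract does not spell out the pseudo-Bayesian estimator, so the following is only an illustration of what "shrinking rates toward a common value" means, not the thesis's method. It is a minimal Python sketch of a standard gamma-Poisson empirical Bayes estimator whose prior mean and variance are estimated by the method of moments. The function name `shrink_rates` and its inputs (an event count and an exposure per unit, e.g. crimes and population per municipality) are hypothetical.

```python
from typing import List


def shrink_rates(counts: List[float], exposures: List[float]) -> List[float]:
    """Shrink raw rates y_i / n_i toward the pooled rate.

    Assumed model (illustrative only): counts are Poisson with unit-specific
    rates drawn from a common gamma prior whose mean and variance are estimated
    from the data by the method of moments. Exposures must be positive.
    """
    k = len(counts)
    if k == 0 or k != len(exposures):
        raise ValueError("counts and exposures must be non-empty and of equal length")

    total_n = sum(exposures)
    m = sum(counts) / total_n  # pooled rate, used as the prior mean
    raw = [y / n for y, n in zip(counts, exposures)]

    # Exposure-weighted spread of the raw rates around the pooled rate.
    s2 = sum(n * (r - m) ** 2 for n, r in zip(exposures, raw)) / total_n
    # Remove the part expected from Poisson noise alone -> between-unit (prior) variance.
    v = s2 - m / (total_n / k)
    if v <= 0:
        # No detectable between-unit variation: every unit gets the pooled rate.
        return [m] * k

    shrunk = []
    for n, r in zip(exposures, raw):
        w = v / (v + m / n)  # weight on the unit's own rate; grows with exposure
        shrunk.append(w * r + (1 - w) * m)
    return shrunk
```

For example, `shrink_rates([5, 300, 20], [10, 1000, 100])` pulls the unit with exposure 10 almost entirely to the pooled rate, while the unit with exposure 1000 keeps most of its own rate. The estimator accepts a little bias in exchange for a large reduction in variance, which is the trade-off the paragraph above describes for small data.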
In short, I gather some common practices in predictive modeling as applied to small data and assess their practical implications. I argue that, with more focus on human-based datasets in biomedicine, these data need particular consideration within a small data paradigm to allow for reliable analysis. I will present what I believe to be sensible guidelines.
Epithelial ovarian cancer. Population-based cohort studies. The NOWAC Study and Postgenome biobank
Important gaps in population-based epidemiological research on ovarian cancer include understanding how risk factors relate to cancer subtypes and anatomical sites, identifying safe and effective preventive measures, and getting a more detailed picture of the continuum of events during ovarian carcinogenesis. This thesis used prospective exposure information from the Norwegian Women and Cancer (NOWAC) Study and blood samples from the NOWAC Postgenome biobank to explore topics within these gaps.
On the topic of risk factors, subtypes and anatomical sites, previous studies have shown that serous carcinomas of the ovary and fallopian tube cancers have similar risk factors. This thesis compared risk factors between the ovary/fallopian tube and uterine corpus. One risk factor association separated serous carcinomas of these sites, while no differences in risk factor associations were found for endometrioid and clear cell carcinomas. Possible alternative explanations of this result include the small number of observations in the analysis of endometrioid and clear cell carcinomas, and histological misclassification of high-grade endometrioid carcinomas.
Among preventive measures, combined oral contraceptives reduce the risk of both ovarian and uterine carcinoma. Current trends in female contraception include an increase in use of progestin-only long-acting reversible contraceptives, such as the levonorgestrel-releasing intrauterine system (LNG-IUS). In the NOWAC cohort, ever use of LNG-IUS reduced the risk of ovarian carcinoma by 53% (95% CI: 22% – 68%) and the risk of uterine carcinoma by 78% (95% CI: 60% – 87%) compared to never use. These results extend current knowledge to include postmenopausal women in a sample of the general population. The association with breast cancer was also investigated and discussed.
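The percentages above are given as risk reductions. Assuming they are relative reductions, i.e. one minus the estimated ratio comparing ever use with never use (the abstract does not state which ratio measure was used), they translate into ratios as follows. Note that the confidence-interval bounds swap places.

$$\widehat{\mathrm{RR}}_{\text{ovarian}} = 1 - 0.53 = 0.47, \qquad 95\%\ \text{CI} = (1 - 0.68,\ 1 - 0.22) = (0.32,\ 0.78)$$

$$\widehat{\mathrm{RR}}_{\text{uterine}} = 1 - 0.78 = 0.22, \qquad 95\%\ \text{CI} = (1 - 0.87,\ 1 - 0.60) = (0.13,\ 0.40)$$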
To investigate the continuum of events during ovarian carcinogenesis, this thesis explores gene expression in peripheral blood in the years preceding ovarian cancer diagnosis. The presented study did not find strong associations. This could be because there is little association between ovarian cancer and prediagnostic gene expression in blood, but could also be due to a small sample size, or the analytic approach that was used.
Ovarian cancer is a relatively rare form of cancer with high mortality. This doctoral thesis is based on questionnaires from 172 000 women in the Norwegian Women and Cancer (Kvinner og kreft) Study, and on blood samples from 50 000 of the participants. The blood samples make up a unique biobank with preserved gene expression from the white blood cells.
It is already known that women who have used oral contraceptives have a lower risk of ovarian cancer. Today many women, including younger women, have started using the hormonal IUD. Little is known about whether women who use a hormonal IUD have a lower risk of ovarian cancer, as oral contraceptive users do. Among the somewhat older participants in Kvinner og kreft, women who had ever used a hormonal IUD had half the risk of ovarian cancer. Because there were few cases, the estimate has an uncertainty corresponding to between 10% and 70% lower risk.
The blood samples in the biobank make it possible to examine changes in gene expression in immune cells up to seven years before diagnosis, in the hope of understanding more about how the disease develops. We performed an exploratory analysis of blood samples from women who had developed ovarian cancer, but found no clear changes in gene expression.
Assessing new prognostic biomarkers in resected colon cancer patients
Colon cancer is a common malignancy. We retrospectively established a biobank including 452 patients who underwent surgery for stage I-III colon cancer in Northern Norway in the period 1998-2007. Tissue microarrays (TMAs) containing tumor tissue from these patients were constructed, and biomarker expression was subsequently assessed by digital pathology. Using in situ hybridization, we assessed the expression of miR-126, miR-17-5p and miR-20a-5p. High expression of each of these miRs was associated with improved disease-specific survival. For miR-17-5p and miR-20a-5p we investigated their functional aspects in selected colon cancer cell lines. Over-expression of miR-17-5p and miR-20a-5p did not affect viability or invasion but reduced migration, which supports our finding of improved survival upon high expression. Additionally, using immunohistochemistry, we investigated four proteins thought to be regulated by these miRs: IRS-1, IRS-2, RUNX3 and SMAD4. In our material, high expression of IRS-1, RUNX3 and SMAD4 was associated with favorable survival. These novel biomarkers could improve risk stratification in colon cancer patients by distinguishing patients with a high risk of recurrence from those with a low risk of recurrence within the same TNM stage. This could help oncologists choose the appropriate adjuvant treatment. Before this can be implemented in clinical practice, the results need to be verified in large prospective trials.