9 research outputs found

    Integrative and Comparative Analysis of Retinoblastoma and Osteosarcoma

    In the last one and a half decades, the widespread adoption of high-throughput methods in molecular biology has generated vast numbers of datasets that revealed the unfathomed complexity of the cell's regulatory mechanisms. The recently published results of the ENCODE project (ENCODE Project Consortium et al., 2012) demonstrated the extent of these mechanisms in the human genome, and more will certainly be discovered in the future. Already, this complexity within a single cell, without taking into account cell-cell interaction or micro-environment influences, cannot be abstracted by the human mind. However, understanding it is the key to devising adapted treatments for genetic diseases or disorders, among which is cancer. In mathematics, such complex problems are addressed using methods that reduce their complexity, so that they can be modeled in a solvable manner. In biology, this led researchers to develop the concept of systems biology as a means to abstract the complexity of the cell regulatory network. To date, most published studies using high-throughput technologies focus on only one kind of regulatory mechanism and hence cannot be used as such to investigate the interactions between these mechanisms. Moreover, distinguishing causative from confounding factors within such studies is difficult. These were my original motivations to develop analytical and statistical methods that control for the effects of confounding factors and allow the integrative and comparative analysis of different kinds of datasets. Ultimately, three different tools were developed to achieve this goal. First, "customCDF": a tool to redefine the Custom Definition File (CDF) of Affymetrix GeneChips. It increases the sensitivity of downstream analyses, as these benefit from the constantly evolving human genome reference and annotations. Second, "aSim": a tool to simulate microarray data, which was required to benchmark the developed algorithms. 
Third, for the integrative analysis, a set of combined statistical methods, and finally, for the comparative analysis, a modification of the integrative analysis approach. These were bundled in the "crossChip" R package. The "customCDF" and "aSim" tools were first validated on independent datasets. The developed analytical methods ("crossChip") were first validated on "aSim" simulated data and publicly available datasets and then used to answer two biological questions. First, using two retinoblastoma datasets, the effect of genomic copy number variations on gene expression was investigated. Then, motivated by the fact that retinoblastoma patients have a higher chance of developing osteosarcoma later in life than the average population, datasets of both tumors were comparatively analyzed to assess their similarities and differences. Despite a rather limited number of samples within the selected datasets, the developed approaches with their higher sensitivity and sensibility were successful and set the ground for larger-scale analyses. Indeed, the integrative analysis applied to retinoblastoma revealed the high importance of the chromosome 6 gain at a later stage of the disease, indicating that many genes on that chromosome are beneficial to cancerogenesis. Moreover, in comparison to standard microarray analyses, it demonstrated its efficacy at detecting the interplay of regulatory mechanisms: examples of positive and negative compensation of gene expression in lost and gained regions, respectively, as well as examples of antisense transcription and of pseudogene and snRNA regulation, were identified in this dataset. The comparative analysis, on the other hand, revealed the high similarity of the retinoblastoma and osteosarcoma tumors, while at the same time showing that each of them takes advantage of its distinct micro-environment and consequently appears to make use of different signaling pathways: PKC/calmodulin in retinoblastoma and GPCR/RAS in osteosarcoma. 
The developed tools and statistical methods have demonstrated their validity and utility by giving sensible answers to the two biological questions addressed. Moreover, they generated a large number of interesting hypotheses that need further investigation. And as they are not limited to microarray analysis but can be applied to any data generated by high-throughput methods, they demonstrated the usefulness of "systems biology" approaches to study cancerogenesis.

    Bioinformatic solutions for chromosomal copy number analysis in cancer

    Chromosomal copy number aberrations are one of the main mechanisms that give rise to the proliferative capabilities of cancer cells. These aberrations can be quantified with technologies that generate measurements genome-wide and with high resolution. Hence, they produce vast amounts of data, which require tailored bioinformatic solutions for analysis and management. Two such high-resolution and genome-wide technologies are DNA microarrays and the next-generation sequencing approaches that are progressively replacing them. This dissertation describes three novel bioinformatic solutions for copy number analysis in cancer with these technologies. CanGEM is a publicly accessible database solution for storage of raw and processed copy number data from cancer research experiments. The contents of the database can be queried based on clinical and copy number data. Clinical data are collected using appropriate controlled vocabularies. Copy number data are collected as raw microarray data, and automated analysis identifies the locations of chromosomal aberrations. In order to allow integration of data measured with different microarray platforms, a copy number status is derived for every known human gene. CGHpower is a statistical power calculator for copy number experiments that compare two groups. It estimates the genome complexity of the cancer type in question from a pilot data set of the sample series, and assesses the number of samples required to satisfy statistical requirements. It can be used either in the planning stages of experiments, including as a justification in grant applications, or to verify whether sufficient samples were included in past experiments. Performance of this bioinformatic solution is evaluated with real and simulated data sets. QDNAseq is a preprocessing solution to detect copy number aberrations from shallow whole-genome next-generation sequencing data. 
It corrects the observed sequencing coverage for known systematic biases and allows filtering of spurious regions in the genome. A new list of such problematic regions is derived from public data generated by the 1000 Genomes Project. Performance of the solution is evaluated relative to other similar published solutions and DNA microarrays, and also compared to theoretical statistical expectations. An application of the QDNAseq method is also presented in a translational research project with the aim to identify copy number aberrations in tumors of patients with low-grade glioma. Aberrations identified by shallow whole-genome next-generation sequencing and QDNAseq are used to evaluate associations with patient survival, and also to assess intratumoral heterogeneity and temporal evolution of these tumors. A loss in chromosome 10q is identified to be associated with poor prognosis, and the finding is validated in two independent data sets. From the assessment of intratumoral heterogeneity and temporal tumor evolution, the well-characterized co-deletion of 1p/19q is found to be the only chromosomal aberration that is consistently present or absent across the entire tumor and possible future recurrences. This is compatible with the present view of its role as an early event in the development of these tumors. The text concludes with a discussion of lessons learned from the development process and application of the three described bioinformatic solutions. Better awareness of and adherence to established best practices from the software development field would have been useful, and together with more careful consideration of implementation decisions could have resulted… Chromosomal copy number aberrations are one of the most important mechanisms in the development of cancer. Instead of one gene copy inherited from the mother and one from the father, part of the genome may be amplified into several copies, and for some parts one or both copies may have been lost. 
Copy number aberrations are detected with genome-wide techniques that have high resolution. They produce large amounts of data whose analysis and handling require tailored bioinformatic methods. These techniques include DNA microarrays and the next-generation sequencing methods that have in practice already displaced them. This dissertation describes three new bioinformatic software tools for analyzing copy number aberrations in cancer samples with these techniques. CanGEM is a public database for collecting raw and processed microarray data from individual cancer studies. The database contents can be searched by clinical variables or copy number aberrations. Appropriate classification systems are used to record the clinical variables. Copy number data are collected as raw microarray measurements, from which copy number aberrations are identified algorithmically. To make it possible to combine data measured on different microarray platforms, a copy number is determined separately for every known human gene. CGHpower is a method for performing statistical power analyses of copy number studies that compare two groups. The complexity of the copy number aberrations in the data is estimated from a pilot batch of samples, and the sample size needed to meet the statistical requirements is determined. The method can be used either in the planning stage of studies, for example in support of funding applications, or to assess whether a sufficient number of samples was used in experiments already performed. Performance is measured with both real and simulated data. QDNAseq is a preprocessing method for identifying copy number aberrations from low-coverage, genome-wide next-generation sequencing data. It corrects the observed read coverage for known sources of bias and allows regions of the genome that are problematic for copy number analysis to be filtered out of downstream processing. 
A new list of these problematic regions is described, derived from data published by the 1000 Genomes Project. The method's performance is evaluated against other comparable published methods and DNA microarrays, and relative to theoretical statistical expectations. In addition to the method itself, an application of QDNAseq to translational research and to the identification of copy number aberrations in low-grade gliomas is described. Loss of chromosome 10q is found to be associated with poor prognosis, and the finding is confirmed in two independent data sets. The identified copy number aberrations are also used to examine tumor heterogeneity and temporal evolution. The 1p/19q deletion, common in this cancer type, is found to be the only copy number aberration that is consistently either present or absent throughout both the entire original tumor and any recurrences. This observation fits the current view that the aberration arises at a very early stage in the development of this cancer type. Finally, lessons learned from the development and application of the described bioinformatic software are discussed. Better knowledge of established practices in the software development field would have been useful and, together with more careful consideration of implementation details, could have helped to produce more fit-for-purpose and more easily developed and maintained… Aberrations in the number of chromosomes, or of parts of chromosomes, are one of the mechanisms that give rise to the proliferative behavior of cancer cells. These chromosomal aberrations can be measured with high-resolution genomic techniques. These techniques generate very large amounts of data, which require tailored bioinformatic solutions for analysis and data management. The two most relevant high-resolution genomic techniques are microarrays and next-generation sequencing. 
Chapter 1 of this thesis reviews the literature on data analysis for chromosomal aberrations measured with microarrays or next-generation sequencing. It introduces relevant bioinformatic concepts and describes the analytical process from raw data to the identification of numerical chromosomal aberrations in individual tumors, as well as bioinformatic research into the significance of those aberrations in large tumor series. Chapters 2 through 4 describe three new bioinformatic implementations developed for the analysis of these chromosomal aberrations in cancer. CanGEM (Chapter 2) is a publicly accessible database for storing raw and processed copy number data from cancer research. The contents of the database can be searched on the basis of both clinical data and experimental copy number data. Clinical data are collected using controlled vocabularies. Copy number data are collected as raw microarray data, and the start and end positions of the aberrations are determined automatically anew each time. To further facilitate the integration of data measured with microarrays from different manufacturers, a copy number is derived for each of the approximately 19,000 to 20,000 human genes. CGHpower (Chapter 3) is a method for calculating how many tumor samples are statistically required to compare differences and similarities in chromosomal aberrations between two groups of tumors. The complexity of the aberrations in a given cancer type is estimated from a limited number of samples. The number of tumors needed to meet the statistical requirements is then estimated. 
CGHpower can be used in the planning phase of a grant application to justify the proposed sample numbers to a funder, or to check whether enough tumors were included in an experiment. CGHpower is evaluated with experimental and simulated data sets. QDNAseq (Chapter 4) is a method that provides a preprocessing step from next-generation sequencing data to copy numbers across the genome of a tumor, assuming sequencing at a depth of only 10% of the whole genome. QDNAseq corrects the observed genome-wide coverage for systematic errors and makes it possible to remove irregular regions of the genome. A list of such systematic errors and irregular regions is derived from public data released by the 1000 Genomes Project. QDNAseq is evaluated against microarray technology and other published software for analyzing numerical chromosomal aberrations using next-generation sequencing. Finally, the results of QDNAseq on next-generation sequencing data are compared with theoretically expected statistical results. In the penultimate chapter (Chapter 5), QDNAseq is applied in translational research that aims to identify aberrations in the number of chromosomes, or parts of them, in tumors of patients with low-grade glioma. Chromosomal aberrations identified by next-generation sequencing and QDNAseq are used to assess associations with patient survival, the intratumoral heterogeneity of the tumors, and their evolution over time. In this study, loss of the distal part of chromosome 10q is associated with poor prognosis. This finding was validated in two independent patient series. 
Finally, the assessment of intratumoral heterogeneity and tumor evolution shows that loss of chromosome 1p together with 19q is the only aberration that is consistently present or absent in the tumors. As with the three implementations described for the analysis of chromosomal aberrations in cancer, much bioinformatic research is carried out in academic groups. The discussion (Chapter 6) covers the lessons learned regarding the development process and application of bioinformatic solutions.
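
The per-gene copy number status that CanGEM derives to integrate data from different microarray platforms can be pictured with a minimal sketch. The abstract does not say how the status is assigned, so everything here is an illustrative assumption rather than CanGEM's actual procedure: the largest-overlap rule, the -1/0/1 coding for loss/normal/gain, the function name, and the made-up gene and segment coordinates.

```python
def gene_copy_number_status(genes, segments):
    """Assign each gene the call of the segment it overlaps most.

    genes:    {name: (chrom, start, end)}
    segments: list of (chrom, start, end, call), call in {-1, 0, 1}
    Returns {name: call}, with None for genes no segment covers.
    The largest-overlap rule is an assumption made for illustration.
    """
    status = {}
    for name, (g_chrom, g_start, g_end) in genes.items():
        best_call, best_overlap = None, 0
        for s_chrom, s_start, s_end, call in segments:
            if s_chrom != g_chrom:
                continue
            # Length of the shared interval; zero or negative means no overlap.
            overlap = min(g_end, s_end) - max(g_start, s_start)
            if overlap > best_overlap:
                best_call, best_overlap = call, overlap
        status[name] = best_call
    return status


# Hypothetical genes and platform-specific segments (coordinates invented).
genes = {
    "GENE_A": ("chr1", 100, 200),
    "GENE_B": ("chr1", 450, 550),
    "GENE_C": ("chr2", 10, 20),
}
segments = [
    ("chr1", 0, 300, 1),     # gain
    ("chr1", 300, 480, 0),   # normal
    ("chr1", 480, 900, -1),  # loss
]
status = gene_copy_number_status(genes, segments)
```

Because every platform is reduced to the same gene-level vocabulary, samples measured on arrays with different probe layouts can then be compared gene by gene.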

    Copy Number Variants in the human genome and their association with quantitative traits

    Copy number variants (CNVs), which comprise deletions, insertions and inversions of genomic sequence, are a main form of genetic variation between individual genomes. CNVs are commonly present in the genomes of humans and other species. However, they have not been extensively characterized, as their ascertainment is challenging. I reviewed current CNV studies and CNV discovery methods, especially the algorithms that infer CNVs from whole-genome Single Nucleotide Polymorphism (SNP) arrays, and compared the performance of three analytical tools in order to identify the best method of CNV identification. Then I applied this method to identify CNV events in three European population isolates (the island of Vis in Croatia, the islands of Orkney in Scotland and villages in South Tyrol in Italy) from Illumina genome-wide array data with more than 300,000 SNPs. I analyzed and compared CNV features across these three populations, including CNV frequencies, genome distribution, gene content, segmental duplication overlap and GC content. With the pedigree information for each population, I investigated the inheritance and segregation of CNVs in families. I also looked at the association between CNVs and quantitative traits measured in the study samples. CNVs were widely found in study samples and reference genomes. Discrepancies were found between sets of CNVs called by different analytical tools. I detected 4016 CNVs in 1964 individuals, out of a total of 2789 participants from the three population isolates, which clustered into 743 copy number variable regions (CNVRs). Features of these CNVRs, including frequency and distribution, were compared and were shown to differ significantly between the Orcadian, South Tyrolean and Dalmatian population samples. Consistent with the inference that this indicated population-specific CNVR identity and origin, it was also demonstrated that CNV variation within each population can be used to measure genetic relatedness. 
Finally, I discovered that individuals who had extreme values of some metabolic traits possessed rare CNVs which overlapped with known genes more often than in individuals with moderate trait values.
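
The step in which individual CNV calls are clustered into copy number variable regions (CNVRs) can be sketched as interval merging. The abstract does not state the clustering rule, so merging any two calls that overlap on the same chromosome is an assumption made here for illustration, as are the function name and the toy coordinates.

```python
from collections import defaultdict


def merge_cnvs_into_cnvrs(cnvs):
    """Merge overlapping CNV calls into CNVRs.

    cnvs: iterable of (chrom, start, end) calls pooled across individuals.
    Returns a list of (chrom, start, end) regions.
    Any-overlap merging is an assumption; the study's rule may differ.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end))

    cnvrs = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps the region being built: extend it.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current region and start a new one.
                cnvrs.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        cnvrs.append((chrom, cur_start, cur_end))
    return cnvrs


# Toy calls (coordinates invented): two overlap on chr1, two stand alone.
calls = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 500, 600),
    ("chr2", 100, 200),
]
regions = merge_cnvs_into_cnvrs(calls)
```

Under this rule the four toy calls collapse into three regions, in the same way that the 4016 calls reported above collapse into a smaller set of CNVRs.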

    Presented Abstracts from the Thirty Fourth Annual Education Conference of the National Society of Genetic Counselors (Pittsburgh, PA, October 2015)

    Peer Reviewed. https://deepblue.lib.umich.edu/bitstream/2027.42/147137/1/jgc41044.pd

    A longitudinal study of the experiences and psychological well-being of Indian surrogates

    Study question: What is the psychological well-being of Indian surrogates during and after the surrogacy pregnancy? Summary answer: Surrogates were similar to a matched group of expectant mothers on anxiety and stress. However, they scored higher on depression during and after pregnancy. What is known already: The recent ban on trans-national commercial surrogacy in India has led to urgent policy discussions regarding surrogacy. Whilst previous studies have reported the motivations and experiences of Indian surrogates, no studies have systematically examined the psychological well-being of Indian surrogates, especially from a longitudinal perspective. Previous research has shown that Indian surrogates are motivated by financial payment and may face criticism from their family and community due to the negative social stigma attached to surrogacy. Indian surrogates are often recruited by agencies and mainly live together in a “surrogacy house.” Study design, size, duration: A longitudinal study was conducted comparing surrogates to a matched group of expectant mothers over two time points: (a) during pregnancy (Phase 1: 50 surrogates, 70 expectant mothers) and (b) 4–6 months after delivery (Phase 2: 45 surrogates, 49 expectant mothers). The surrogates were recruited from a fertility clinic in Mumbai and the matched comparison group was recruited from four public hospitals in Mumbai and Delhi. Data collection was completed over 2 years. Participants/materials, setting, methods: Surrogates and expectant mothers were aged between 23 and 36 years. All participants were from a low socio-economic background and had left school before 12–13 years of age. In-depth face-to-face semi-structured interviews and a psychological questionnaire assessing anxiety, stress and depression were administered in Hindi to both groups. Interviews took place in a private setting. Audio recordings of surrogate interviews were later translated and transcribed into English. 
Main results and the role of chance: Stress and anxiety levels did not significantly differ between the two groups in either phase of the study. For depression, surrogates were found to be significantly more depressed than expectant mothers at Phase 1 (p = 0.012) and Phase 2 (p = 0.017). Within the surrogacy group, stress and depression did not change during and after pregnancy. However, a non-significant trend was found showing that anxiety decreased after delivery (p = 0.086). No participants reported being coerced into surrogacy; however, nearly all kept it a secret from their wider family and community and hence did not face criticism. Surrogates lived at the surrogate house for different durations. During pregnancy, 66% (N = 33/50) reported their experiences of the surrogate house as positive, 24% (N = 12/50) as negative and 10% (N = 5/50) as neutral. After delivery, most surrogates (66%, N = 30/45) reported their experiences of surrogacy to be positive, with the remainder viewing it as neutral (28%) or negative (4%). In addition, most (66%, N = 30/45) reported that they had felt “socially supported and loved” during the surrogacy arrangement by friends in the surrogate hostel, clinic staff or family. Most surrogates did not meet the intending parents (49%, N = 22/45) or the resultant child (75%, N = 34/45). Limitations, reasons for caution: Since the surrogates were recruited from only one clinic, the findings may not be representative of all Indian surrogates. Some were lost to follow-up, which may have produced sampling bias. Wider implications of the findings: This is the first study to examine the psychological well-being of surrogates in India. This research is of relevance to current policy discussions in India regarding legislation on surrogacy. Moreover, the findings are of relevance to clinicians, counselors and other professionals involved in surrogacy. Trial registration number: N/A