Haplotype phasing with ASP from long reads: a flexible optimization approach
Uncorrected version. A new version will be available by March 2023. Each chromosome of a di- or polyploid organism has several haplotypes, which are highly similar but diverge at a certain number of positions. However, most reference genomes provide only a single sequence for each chromosome, and therefore do not reflect the biological reality. Yet it is crucial to have access to this information, which is useful in medicine, agronomy and population studies. The recent development of third-generation technologies, especially PacBio and Oxford Nanopore Technologies sequencers, has allowed the production of long reads that facilitate haplotype sequence reconstruction. Bioinformatics methods exist for this task, but they provide only a single solution. This thesis introduces an approach for haplotype phasing based on the search for connected components in a read similarity graph to identify haplotypes. This method uses Answer Set Programming to work on the set of optimal solutions. This phasing algorithm has been used to reconstruct the haplotypes of the diploid rotifer Adineta vaga.
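As a rough illustration of the idea of finding haplotypes as connected components of a read similarity graph, here is a minimal Python sketch. It is not the thesis's method: the thesis uses Answer Set Programming to work on the set of optimal solutions, whereas this sketch only builds one graph and extracts its components. The read representation (a dictionary from variant position to observed allele), the scoring rule and the `min_score` threshold are all assumptions made for the example.

```python
from collections import defaultdict


def similarity(read_a, read_b):
    """Score two reads on shared positions: +1 per agreement, -1 per disagreement.

    Hypothetical scoring rule, chosen only for illustration.
    """
    shared = read_a.keys() & read_b.keys()
    return sum(1 if read_a[p] == read_b[p] else -1 for p in shared)


def phase_reads(reads, min_score=1):
    """Group reads into connected components of the similarity graph."""
    names = list(reads)
    adjacency = defaultdict(set)
    # Link two reads when their similarity reaches the (assumed) threshold.
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(reads[a], reads[b]) >= min_score:
                adjacency[a].add(b)
                adjacency[b].add(a)
    # Depth-first search to collect each connected component.
    seen = set()
    components = []
    for start in names:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return sorted(components)


# Toy input: each read maps a variant position to the allele it carries.
reads = {
    "r1": {100: "A", 250: "C"},
    "r2": {250: "C", 400: "G"},
    "r3": {100: "T", 250: "G"},
    "r4": {250: "G", 400: "T"},
}
haplotypes = phase_reads(reads)
print(haplotypes)
```

On this toy input, reads r1 and r2 agree at position 250 and fall into one component, while r3 and r4 form the other. Real long reads contain sequencing errors, so a fixed threshold is a simplification; choosing among competing groupings is where an optimization approach would come in.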
Whole genome amplification for PGD and PND; molecular and a-CGH diagnosis
Whole genome amplification amplifies the entire genome in a few hours from samples of
minimal DNA quantities, even from single cells. This may have many applications,
especially in prenatal diagnosis, PGD and PGS.
The hypothesis for chapter 3 was: Can multiple displacement amplification (MDA) be
used as a universal step prior to molecular analysis for PGD? WGA using MDA (Qiagen)
was used on single cells in order to overcome the problem of limited DNA in PGD. MDA
allows the diagnosis through haplotyping or a combination of direct and indirect mutation
analysis. Different cell types, including buccal cells, lymphocytes, fibroblasts and
blastomeres were examined. A modification of the cell lysis buffer was also tested in
order to achieve more accurate results. PGD seems to benefit from MDA when multiple
tests are performed for direct and indirect analysis. The modified lysis buffer (exclusion
of DTT) produced better results than the other lysis buffers, and buccal cells did not
produce results as accurate as other cell types. The hypothesis was met as the amount of
DNA produced by MDA can be used for direct and indirect testing and haplotyping.
The hypothesis for chapter 4 was: Is it possible to accurately assess the chromosomes of a
single cell by a-CGH? WGA was achieved by MDA and GenomePlex (Sigma) on single
lymphocytes, fibroblasts and blastomeres prior to a-CGH analysis. The difficulty of this
technique was the high background noise that was produced by WGA that makes
interpretation difficult. Different lysis buffers, modifications of the WGA reaction and
analysis software were examined for better results. A-CGH slides from different
companies and institutions were used. The results showed that GenomePlex produced
less background noise compared to MDA but the amplification efficiency of the
technique was less reliable. The BlueGnome Cytochip arrays produced the best results compared
to arrays from any other company or institution. More experiments would be necessary
to determine if the hypothesis was met as a number of chromosomal abnormalities
detected were not always confirmed by other experiments.
The hypothesis for chapter 5 was: Can aneuploidy be detected in coelomic fluid using
a-CGH? The possibility of using WGA and a-CGH on coelomic fluid was tested as this
could be used as an early form of prenatal diagnosis. Coelomic fluid was collected
between the 5th and 11th week of pregnancy from women undergoing termination of
pregnancy. MDA and GenomePlex were used to amplify the DNA prior to a-CGH
analysis. Both genomic (high resolution) and constitutional (low resolution) arrays were
tested. The results showed that aneuploidy can be detected by a-CGH. BlueGnome
Cytochip slides produced the best results. A triploid sample was detected as normal. The
hypothesis was met and even higher resolution could be achieved with the use of
GenomePlex and BlueGnome Cytochip arrays.
WGA may be very important for downstream genetic tests when the DNA is of very
low quality and quantity. Further optimisation of the technique is needed in order to
achieve similar results to those of good quality genomic DNA. Arrays from different
companies or institutions may produce very different results. In conclusion, the results
showed that WGA can benefit PGD and PND, and a-CGH offers great potential for PGS
and coelomic fluid diagnosis.
Spatial stochastic models for network analysis
This thesis proposes new stochastic interacting particle models for networks and studies some fundamental properties of these models. It considers two application areas of networking: engineering design questions in future wireless systems and algorithmic tasks in large-scale graph-structured data. The key innovation introduced in this thesis is to bring tools and ideas from stochastic geometry to bear on the problems in both these application domains. We identify certain fundamental questions in the design and engineering of both wireless systems and large-scale graph-structured data processing systems. Subsequently, we identify novel stochastic geometric models that capture the fundamental properties of these networks, which form the first research contribution. We then rigorously study these models, bringing to bear new tools from stochastic geometry, random graphs, percolation and Markov processes to establish structural results and fundamental phase transitions in these models. Using the mathematical methodology we develop, we then identify design insights and develop algorithms, which we demonstrate are instructive in many practical settings. In the setting of wireless systems, this thesis studies both ad-hoc and cellular networks. In the ad-hoc network setting, we aim to understand the fundamental limits of the simplest possible protocol to access the spectrum, namely one in which a link transmits whenever it has data to send, treating all interference as noise. Surprisingly, this basic question itself was not understood, as the system dynamics are coupled spatially, due to the interference that links cause one another, and temporally, due to randomness in traffic arrivals. We propose a novel interacting particle model called the spatial birth-death wireless network model to understand the stability properties of this simple spectrum access protocol. Using tools from Palm calculus and fluid limit theory, we establish a tight characterization of when this model is stable.
Furthermore, we show that whenever the model is stable, the links in steady state exhibit a form of clustering. Leveraging these structural results, we propose two mean-field heuristics to obtain formulas for key performance metrics such as the average delay experienced by a link. We empirically find that the proposed formulas for delay accurately predict the system behavior. We subsequently study scalability properties of this model by introducing an appropriate infinite-dimensional version of the model, which we call the Interference Queueing Networks model. The model consists of a queue located at each grid point of an infinite regular integer lattice, with the queues interacting with each other in a translation-invariant fashion. We then prove several structural properties of the model, namely tight conditions for the existence of stationary solutions and some sufficient conditions for their uniqueness. Remarkably, we obtain an exact formula for the mean delay in this model, unlike the continuum model, where we relied on mean-field type heuristics to obtain insights. In the setting of cellular networks, we study optimal association schemes by mobile phones in the case when there are several possible base station technologies operating on orthogonal bands. We show that this choice leads to a performance gain we term technology diversity. Interestingly, we show that the performance gain relies on the amount of instantaneous information a user has on the various base station technologies that it can leverage to make the association decision. We outline optimal association schemes under various information settings that a user may have on the network. Moreover, we propose simple heuristics for association that rely on a user obtaining minimal instantaneous information and are thus practical to implement.
We prove that in a certain natural asymptotic regime of parameters, our proposed heuristic policy is also optimal, thus quantifying the value of having fine-grained information at a user for association. We empirically observe that the asymptotic result is valid even at finite parameter regimes that are typical in today's networks. In the application of analyzing large-scale graph-structured data, we consider the graph clustering problem with side information. Graph clustering is a standard and widely used task that consists in partitioning the set of nodes of a graph into underlying clusters, where nodes in the same cluster are similar to each other and nodes in different clusters are different. Motivated by applications in social and biological networks, we consider the task of clustering the nodes of a graph when there is side information on the nodes other than that contained in the graph. For instance, in social networks, one has access to metadata about a person (a node in a social graph), such as age, location and income, along with the combinatorial data of who their friends are on the social graph. Similarly, in biological networks, there is often metadata about an experiment that provides additional contextual data about a node, in addition to the combinatorial data. In this thesis, we propose a generative model for such graph-structured data with side information, which is inspired by random graph models in stochastic geometry, such as the random connection model, and by generative models for networks with clusters without contexts, such as the stochastic block model or the planted partition model. We propose a novel graph model called the planted partition random connection model. Roughly speaking, in this model, each node has two labels: an observable R^d-valued (for some fixed d) feature label and an unobservable binary-valued community label.
Conditional on the node labels, edges are drawn at random in this graph depending on both the feature and community labels of the two end points. The clustering task consists in recovering the underlying partition of nodes corresponding to the respective community labels better than a random assignment, given an observation of the generated graph and the features of all nodes. We show that if the 'density of nodes', i.e., the average number of nodes having features in a unit volume of R^d, is small, then no algorithm can cluster the graph in a way that asymptotically beats a random assignment of community labels. In contrast, if the density of nodes is sufficiently high, we give a simple algorithm that recovers the true underlying partition strictly better than a random assignment. We then apply the proposed algorithm to a problem in computational biology called haplotype phasing and observe empirically that it obtains state-of-the-art results. This demonstrates both the validity of our generative model and that of our new algorithm.
Electrical and Computer Engineering
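To make the generative description above more concrete, here is a minimal Python sketch of sampling a graph in which every node carries an observable feature in R^d and a hidden binary community label, and edges depend on both. The abstract does not specify the connection functions, so everything beyond that two-label structure is an assumption made for illustration: features are drawn uniformly in a box (rather than from a point process), two nodes can only connect within distance `radius`, and they connect with probability `p_in` if they share a community and `p_out` otherwise. The function name `sample_graph` and all parameter names are hypothetical.

```python
import math
import random


def sample_graph(n, d, side, radius, p_in, p_out, seed=0):
    """Sample features, hidden communities and edges under the assumed model."""
    rng = random.Random(seed)
    # Observable feature labels: n points uniform in [0, side]^d (assumption).
    features = [tuple(rng.uniform(0, side) for _ in range(d)) for _ in range(n)]
    # Unobservable community labels: -1 or +1 with equal probability.
    communities = [rng.choice((-1, 1)) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed connection rule: no edge beyond distance `radius`.
            if math.dist(features[i], features[j]) > radius:
                continue
            p = p_in if communities[i] == communities[j] else p_out
            if rng.random() < p:
                edges.add((i, j))
    return features, communities, edges


features, communities, edges = sample_graph(
    n=200, d=2, side=1.0, radius=0.15, p_in=0.8, p_out=0.2, seed=42
)
print(len(edges), "edges sampled")
```

A clustering algorithm would see only `features` and `edges` and try to recover `communities` better than a random guess. In this sketch, increasing `n` for a fixed `side` raises the density of nodes per unit volume, which is the quantity the abstract ties to whether recovery is possible.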
Vibrio interactions with bivalve hemocytes and analysis of the Crassostrea gigas microbiota
My PhD project aimed to investigate the molecular mechanisms underlying the interaction between Vibrio bacteria and shellfish in the bivalve models Crassostrea gigas and Mytilus galloprovincialis, and to study the composition and dynamics of the bivalve microbiota.
Previous studies suggested that the persistence of entrapped bacteria inside bivalve tissues depends, at least in part, on their capacity to survive the bactericidal activity of the hemolymph, which is exerted by both hemocytes and serum soluble factors. In the first part of my PhD work, hemocytes of M. galloprovincialis were challenged with different pathogenic Vibrio strains (V. aestuarianus 01/032, V. aestuarianus 02/041, V. tasmaniensis LGP32, V. harveyi VH2, V. tapetis CECT 4600 and V. coralliilyticus ATCC BAA 450) in the presence or absence of the extrapallial protein present in M. galloprovincialis serum (MgEP), and of the whole hemolymph serum. In addition, C. gigas hemocytes were exposed to the bivalve pathogens V. aestuarianus 01/032 and V. aestuarianus 02/041 under the same conditions to better understand the molecular basis of bacteria-hemolymph interactions in oysters. We observed that MgEP promotes D-mannose-sensitive adhesion to and killing by hemocytes of the bivalve pathogens V. aestuarianus 01/032, V. aestuarianus 02/041, V. tasmaniensis LGP32 and V. coralliilyticus ATCC BAA 450. In addition, in the presence of MgEP, C. gigas hemocytes killed V. aestuarianus 01/032 and V. aestuarianus 02/041 almost as efficiently as mussel phagocytes. These findings suggest that the different sensitivity of Vibrio strains to the antibacterial activity of oyster (susceptible to Vibrio infection) and mussel (resistant to Vibrio infection) hemolymph might partly depend on the fact that C. gigas serum lacks MgEP-like opsonins. These results may have important implications for improving bivalve depuration strategies and preventing diseases affecting bivalve production worldwide.
In the second part of my thesis work, I studied the microbial communities associated with contrasting C. gigas samples collected during mortality episodes at different European sites. Real-time PCR targeting oyster pathogens (e.g. Ostreid herpesvirus 1 [OsHV-1] and V. aestuarianus) and 16S rRNA gene-based microbial profiling were applied to a large number of C. gigas samples (n=525 and n=101 for qPCR and 16S rRNA gene profiling analysis, respectively) to extensively investigate the patterns and dynamics of the oyster microbiota during mortality events. Comparative analysis of contrasting (e.g. infected vs not infected) C. gigas samples conducted using these methods revealed that oysters experiencing mortality outbreaks displayed signs of microbiota disruption associated with the presence of previously undetected, potentially pathogenic microbial species, mostly belonging to the genera Vibrio and Arcobacter. To our knowledge, this represents the largest study conducted so far to determine the composition and dynamics of the farmed oyster microbiota.
Diagnostics in Plant Breeding
“Diagnostics in Plant Breeding” systematically organizes cutting-edge research reviews on the development and application of molecular tools for the prediction of plant performance. Given its significance for mankind and the available research resources, medical science is leading the area of molecular diagnostics, where DNA-based risk assessments for various diseases and biomarkers to determine their onset are becoming increasingly available. So far, most research in plant genomics has been directed towards understanding the molecular basis of biological processes or phenotypic traits. From a plant breeding perspective, however, the main interest is in predicting optimal genotypes based on molecular information for more time- and cost-efficient breeding schemes. It is anticipated that recent progress in plant genomics, and in sequencing technology in particular, will shift the focus from “explanatory” to “predictive” in crop science. This book assembles chapters on all areas relevant to the development and application of predictive molecular tools in plant breeding, written by leading authorities in the respective areas.
The Peanut Genome
This book presents the current state of the art in peanut genomics, focusing particularly on the latest genomic findings, tools and strategies employed in genome sequencing, transcriptomes and analysis, availability of public and private genomic resources, and ways to maximize the use of this information in peanut breeding programs. Further, it demonstrates how advances in plant genomics can be used to improve crop breeding. The peanut or groundnut (Arachis hypogaea L. Millsp) is a globally important grain legume and oilseed crop, cultivated in over 100 countries and consumed in the form of roasted seeds, oil and confectionery in nearly every country on Earth. The peanut contributes towards achieving food and nutritional security, in addition to financial security through income generation; as such, it is also vital to the livelihood of the poor in the developing world. There have been significant advances in peanut research, especially in the last five years, including sequencing the genomes of both diploid progenitors, and the availability of tremendous transcriptome resources, large-scale genomic variations that can be used as genetic markers, genetic populations (bi- and multiparent populations and germplasm sets), marker-trait associations and molecular breeding products. The immediate availability of the genome sequence for tetraploid cultivated peanuts is the most essential genomic resource for achieving a deeper understanding of peanut traits and their use in breeding programs.
A longitudinal study of the experiences and psychological well-being of Indian surrogates
Study question: What is the psychological well-being of Indian surrogates during and after the surrogacy pregnancy?
Summary answer: Surrogates were similar to a matched group of expectant mothers on anxiety and stress. However, they scored higher on depression during and after pregnancy.
What is known already: The recent ban on trans-national commercial surrogacy in India has led to urgent policy discussions regarding surrogacy. Whilst previous studies have reported the motivations and experiences of Indian surrogates, no studies have systematically examined the psychological well-being of Indian surrogates, especially from a longitudinal perspective. Previous research has shown that Indian surrogates are motivated by financial payment and may face criticism from their family and community due to negative social stigma attached to surrogacy. Indian surrogates are often recruited by agencies and mainly live together in a “surrogacy house.”
Study design, size, duration: A longitudinal study was conducted comparing surrogates to a matched group of expectant mothers over two time points: (a) during pregnancy (Phase 1: 50 surrogates, 70 expectant mothers) and (b) 4–6 months after delivery (Phase 2: 45 surrogates, 49 expectant mothers). The surrogates were recruited from a fertility clinic in Mumbai and the matched comparison group was recruited from four public hospitals in Mumbai and Delhi. Data collection was completed over 2 years.
Participants/materials, setting, methods: Surrogates and expectant mothers were aged between 23 and 36 years. All participants were from a low socio-economic background and had left school before 12–13 years of age. In-depth face-to-face semi-structured interviews and a psychological questionnaire assessing anxiety, stress and depression were administered in Hindi to both groups. Interviews took place in a private setting. Audio recordings of surrogate interviews were later translated and transcribed into English.
Main results and the role of chance: Stress and anxiety levels did not significantly differ between the two groups in either phase of the study. For depression, surrogates were found to be significantly more depressed than expectant mothers at phase 1 (p = 0.012) and phase 2 (p = 0.017). Within the surrogacy group, stress and depression did not change during and after pregnancy. However, a non-significant trend was found showing that anxiety decreased after delivery (p = 0.086). No participants reported being coerced into surrogacy; however, nearly all kept it a secret from their wider family and community and hence did not face criticism. Surrogates lived at the surrogate house for different durations. During pregnancy, 66% (N = 33/50) reported their experiences of the surrogate house as positive, 24% (N = 12/50) as negative and 10% (N = 5/50) as neutral. After delivery, most surrogates (66%, N = 30/45) reported their experiences of surrogacy to be positive, with the remainder viewing it as neutral (28%) or negative (4%). In addition, most (66%, N = 30/45) reported that they had felt “socially supported and loved” during the surrogacy arrangement by friends in the surrogate hostel, clinic staff or family. Most surrogates did not meet the intending parents (49%, N = 22/45) or the resultant child (75%, N = 34/45).
Limitations, reasons for caution: Since the surrogates were recruited from only one clinic, the findings may not be representative of all Indian surrogates. Some were lost to follow-up, which may have produced sampling bias.
Wider implications of the findings: This is the first study to examine the psychological well-being of surrogates in India. This research is of relevance to current policy discussions in India regarding legislation on surrogacy. Moreover, the findings are of relevance to clinicians, counselors and other professionals involved in surrogacy.
Trial registration number: N/A