52 research outputs found
Haplotype phasing with ASP from long reads: a flexible optimization approach (Phasage d'haplotypes par ASP à partir de longues lectures : une approche d'optimisation flexible)
Uncorrected version. A new version will be available by March 2023. Each chromosome of a di- or polyploid organism has several haplotypes, which are highly similar but diverge at a certain number of positions. However, most reference genomes provide only a single sequence for each chromosome, and therefore do not reflect the biological reality. Yet it is crucial to have access to this information, which is useful in medicine, agronomy and population studies. The recent development of third-generation technologies, especially PacBio and Oxford Nanopore Technologies sequencers, has allowed for the production of long reads that facilitate haplotype sequence reconstruction. Bioinformatics methods exist for this task, but they provide only a single solution. This thesis introduces an approach to haplotype phasing based on the search for connected components in a read similarity graph to identify haplotypes. This method uses Answer Set Programming to work on the set of optimal solutions. This phasing algorithm has been used to reconstruct haplotypes of the diploid rotifer Adineta vaga.
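To make the idea of identifying haplotypes as connected components of a read similarity graph concrete, here is a minimal Python sketch. It is not the thesis's Answer Set Programming method and it returns a single grouping rather than the set of optimal solutions. The read representation (a dict from variant position to allele), the `similarity` score and the `min_score` threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def similarity(read_a, read_b):
    """Agreements minus disagreements over the variant positions both reads cover."""
    shared = read_a.keys() & read_b.keys()
    return sum(1 if read_a[pos] == read_b[pos] else -1 for pos in shared)


def phase_reads(reads, min_score=2):
    """Group read indices into connected components of the similarity graph.

    Two reads are linked when their similarity reaches min_score
    (a hypothetical threshold). Components are found with union-find.
    """
    parent = list(range(len(reads)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if similarity(reads[i], reads[j]) >= min_score:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(reads)):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Toy input: each read maps a variant position to the allele it carries.
reads = [
    {100: "A", 250: "C", 400: "G"},
    {250: "C", 400: "G", 620: "T"},
    {100: "T", 250: "G", 400: "A"},
    {250: "G", 400: "A", 620: "C"},
]
haplotype_groups = phase_reads(reads)
```

On this toy input the reads split into two groups, one per haplotype. Real long reads contain sequencing errors, so a fixed threshold like this would need tuning or replacing by an optimization step.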
Genetic factors affecting establishment during invasions : the introduction of the topmouth gudgeon (Pseudorasbora parva) and the rainbow trout (Oncorhynchus mykiss) in Europe
The study of biological invasions is a major research topic, both because of the ecological and economic damage caused by invasive species and because invasions offer a great natural experiment for studying the evolutionary responses of non-native populations to their new environment and the factors influencing invasions. Introduced species often evolve rapidly, despite the assumed loss of genetic variation associated with bottlenecks during the invasion process. In order to examine the processes and mechanisms affecting the outcome of invasions, I studied two non-native fish species. The topmouth gudgeon (Pseudorasbora parva) is an Asian cyprinid that is found in most European countries as a result of accidental introductions. The rainbow trout (Oncorhynchus mykiss) has been introduced from the United States for aquaculture and angling; however, despite numerous introductions, it has been able to establish in only a few European waters. I used mitochondrial DNA and microsatellite markers to understand the invasion history of these species and the factors that influence their establishment success or failure. Part of the cytochrome b gene was analysed in European and native Asian P. parva populations, and microsatellite markers were used to investigate the source populations of the species. The analyses elucidated the colonisation pattern of P. parva in Europe and supported the hypothesis that the species spread through long-distance and stepping-stone dispersal and originated from admixed source populations. In O. mykiss, part of the d-loop region of the mitochondrial genome was analysed to compare the phylogeographic structure of native US and introduced European populations, to examine the spread of the species outside its native range, and to find out whether the resistant Hofer strain is the source population of the European rainbow trout populations. I found that European populations are likely to originate from various sources, mainly from California. 
The Hofer strain is likely to have contributed to some of the wild European populations. Assessing the role of these processes is fundamental to understanding invasive species and finding suitable management practices to control them. From an evolutionary point of view, I was able to detect some of the processes that are important during invasions, in particular the role of multiple introductions and of introduction from genetically admixed source populations.
Spatial stochastic models for network analysis
This thesis proposes new stochastic interacting particle models for networks, and studies some fundamental properties of these models. It considers two application areas of networking: engineering design questions in future wireless systems and algorithmic tasks in large-scale graph-structured data. The key innovation introduced in this thesis is to bring tools and ideas from stochastic geometry to bear on the problems in both these application domains. We identify certain fundamental questions in the design and engineering of both wireless systems and large-scale graph-structured data processing systems. Subsequently, we identify novel stochastic geometric models that capture the fundamental properties of these networks, which forms the first research contribution. We then rigorously study these models, bringing to bear new tools from stochastic geometry, random graphs, percolation and Markov processes to establish structural results and fundamental phase transitions in these models. Using the mathematical methodology we develop, we then identify design insights and develop algorithms, which we demonstrate are instructive in many practical settings. In the setting of wireless systems, this thesis studies both ad-hoc and cellular networks. In the ad-hoc network setting, we aim to understand the fundamental limits of the simplest possible protocol to access the spectrum, namely one in which a link transmits whenever it has data to send, treating all interference as noise. Surprisingly, this basic question itself was not understood, as the system dynamics are coupled spatially, due to the interference that links cause one another, and temporally, due to randomness in traffic arrivals. We propose a novel interacting particle model called the spatial birth-death wireless network model to understand the stability properties of this simple spectrum access protocol. Using tools from Palm calculus and fluid limit theory, we establish a tight characterization of when this model is stable. 
Furthermore, we show that, whenever the model is stable, the links in steady state exhibit a form of clustering. Leveraging these structural results, we propose two mean-field heuristics to obtain formulas for key performance metrics such as the average delay experienced by a link. We find empirically that the proposed formulas for delay accurately predict the system behavior. We subsequently study the scalability properties of this model by introducing an appropriate infinite-dimensional version of it, which we call the Interference Queueing Networks model. The model consists of a queue located at each grid point of an infinite regular integer lattice, with the queues interacting with each other in a translation-invariant fashion. We then prove several structural properties of the model, namely tight conditions for the existence of stationary solutions and some sufficient conditions for the uniqueness of stationary solutions. Remarkably, we obtain an exact formula for the mean delay in this model, unlike the continuum model, where we relied on mean-field-type heuristics to obtain insights. In the setting of cellular networks, we study optimal association schemes for mobile phones in the case when there are several possible base station technologies operating on orthogonal bands. We show that this choice leads to a performance gain we term technology diversity. Interestingly, we show that the performance gain depends on the amount of instantaneous information a user has on the various base station technologies and can leverage to make the association decision. We outline optimal association schemes under various information settings that a user may have on the network. Moreover, we propose simple heuristics for association that rely on a user obtaining minimal instantaneous information and are thus practical to implement. 
We prove that, in a certain natural asymptotic regime of parameters, our proposed heuristic policy is also optimal, and thus quantify the value of having fine-grained information at a user for association. We observe empirically that the asymptotic result is valid even in finite parameter regimes that are typical of today's networks. In the application of analyzing large-scale graph-structured data, we consider the graph clustering problem with side information. Graph clustering is a standard and widely used task that consists of partitioning the set of nodes of a graph into underlying clusters, where nodes in the same cluster are similar to each other and nodes in different clusters are dissimilar. Motivated by applications in social and biological networks, we consider the task of clustering the nodes of a graph when there is side information on the nodes other than that contained in the graph. For instance, in social networks, one has access to metadata about a person (a node in a social graph), such as age, location and income, along with the combinatorial data of who their friends are on the social graph. Similarly, in biological networks, there is often metadata about an experiment that provides additional contextual data about a node, in addition to the combinatorial data. In this thesis, we propose a generative model for such graph-structured data with side information, inspired both by random graph models in stochastic geometry, such as the random connection model, and by generative models for networks with clusters but without contexts, such as the stochastic block model or the planted partition model. We call this novel graph model the planted partition random connection model. Roughly speaking, in this model each node has two labels: an observable R^d-valued (for some fixed d) feature label and an unobservable binary-valued community label. 
Conditional on the node labels, edges are drawn at random in this graph depending on both the feature and community labels of the two end points. The clustering task consists of recovering the underlying partition of nodes corresponding to the respective community labels better than a random assignment, given an observation of the generated graph and the features of all nodes. We show that if the 'density of nodes', i.e., the average number of nodes having features in a unit volume of R^d, is small, then no algorithm can cluster the graph asymptotically better than a random assignment of community labels. On the contrary, if the density of nodes is sufficiently high, we give a simple algorithm that recovers the true underlying partition strictly better than a random assignment. We then apply the proposed algorithm to a problem in computational biology called Haplotype Phasing and observe empirically that it obtains state-of-the-art results. This demonstrates both the validity of our generative model and the usefulness of our new algorithm. Electrical and Computer Engineering
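The generative model described in this abstract can be illustrated with a short Python sketch. The abstract does not give the exact point process or connection functions, so everything specific here is an assumption: a fixed number of nodes placed uniformly in a box (rather than, say, a Poisson point process), a hard connection radius, and the hypothetical edge probabilities `p_in` and `p_out` for same-community and different-community pairs.

```python
import math
import random


def sample_pprcm(n, side, d=2, radius=1.0, p_in=0.8, p_out=0.2, seed=0):
    """Sample one graph from a toy planted partition random connection model.

    Each node gets an observable feature in [0, side]^d and a hidden
    community label in {-1, +1}. Two nodes within `radius` of each other
    are joined with probability p_in (same community) or p_out (different).
    """
    rng = random.Random(seed)
    features = [tuple(rng.uniform(0, side) for _ in range(d)) for _ in range(n)]
    communities = [rng.choice((-1, 1)) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed connection function: no edge beyond the radius.
            if math.dist(features[i], features[j]) > radius:
                continue
            p = p_in if communities[i] == communities[j] else p_out
            if rng.random() < p:
                edges.add((i, j))
    return features, communities, edges


features, communities, edges = sample_pprcm(n=50, side=5.0)
```

In this sketch the density of nodes is `n / side**d`, so shrinking `side` at fixed `n` moves the sample toward the high-density regime the abstract refers to. A clustering algorithm would see only `features` and `edges` and would try to recover `communities`.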
Evolutionary patterns and processes in the genus Potentilla L. (Rosaceae)
Firstly, a reconstruction of phylogenetic relationships based on three chloroplast (cp) DNA markers, covering 98 species of the genus Potentilla and 15 additional genera from the tribe Potentilleae (Rosaceae), is presented. The phylogeny supported the current generic concept of two subtribes (Fragariinae and Potentillinae) and resolved major lineages within the subtribe Potentillinae, including the taxonomically highly diverse but molecularly little diverged core group of Potentilla. Age estimates of the phylogenetic splits resolved in the Potentilleae, obtained using Bayesian inference, suggested a diversification of the tribe in the Eocene and a radiation of the two major evolutionary lineages (subtribes) at approximately comparable times. Ancestral area reconstructions based on recent distribution ranges suggested an Asian origin for Potentilla s.str. and explained its arrival in Europe, and particularly in North America, by multiple dispersal events. The combination of phylogenetic, geographic and fossil record data with inferred time estimates and taxonomy revealed strongly contrasting evolutionary patterns: rapid speciation on a continental and worldwide scale accompanied by multiple intercontinental dispersals, as opposed to strongly diverged lineages of limited taxonomic diversity and vicariant geographic distribution. Furthermore, hybridisation and polyploidisation were identified as drivers of speciation in two case studies of restricted taxonomic and geographical coverage. A combined analysis of AFLPs, cpDNA sequences and ploidy levels, used in a case study of the Potentilla argentea group in Europe, identified four main lineages within the group and revealed two ploidy levels. An allopolyploid origin was confirmed for the hexaploid P. argentea, which appears to be apomictic. The diploid P. argentea is a self-pollinator with highly reduced genetic variability, and P. calabra reproduces sexually. 
A Late Quaternary migration route from the Iberian Peninsula through western Europe to Scandinavia, and probably also farther to the Baltic region, was suggested for the diploid P. argentea. No clear geographical patterns were detected for the hexaploid P. argentea, most probably due to independent immigration of genetically divergent lineages, which resulted in an overlap of several immigration routes. Finally, P. alpicola and P. collina populations in South Tyrol were examined. On the one hand, P. argentea and P. pusilla were identified as parental taxa of the apomictic P. alpicola. On the other hand, apomictic P. collina populations are regarded rather as recent derivatives of the hexaploid P. argentea. The studied populations seem to have evolved multiple times, separately at each locality; however, some populations share a similar evolutionary history.
Conservation Genetics for Management of Threatened Plant and Animal Species
This book focuses on conservation genetic (and genomic) papers that demonstrate applied outcomes that inform practical threatened species management. We cover a broad range of species and genetic approaches, but focus on how conservation genetic information is used to underpin management actions for species recovery. Through the exposition of a diversity of approaches, we aim to demonstrate to conservation managers and researchers how conservation genetics can inform on-ground species management.
Molecular Basis of Apomixis in Plants
Apomixis is the consequence of a concerted mechanism that harnesses the sexual machinery and coordinates developmental steps in the ovule to produce an asexual (clonal) seed. Altered sexual developments involve widely characterized functional and anatomical changes in meiosis, gametogenesis, and embryo and endosperm formation. The ovules of apomictic plants skip meiosis and form unreduced female gametophytes whose egg cells develop into a parthenogenetic embryo, and the central cells may or may not fuse with a sperm to develop the seed endosperm. Thus, functional apomixis involves at least three components (apomeiosis, parthenogenesis, and endosperm development), modified from sexual reproduction, that must be coordinated at the molecular level to progress through the developmental steps and form a clonal seed. Despite recent progress in uncovering specific genes related to apomixis-like phenotypes and the formation of clonal seeds, the molecular basis and regulatory network of apomixis are still unknown. This is a central problem underlying the current limitations of apomixis breeding. This book collates twelve publications addressing different topics around the molecular basis of apomixis, illustrating recent discoveries and advances toward understanding the genetic regulation of the trait, and discussing the possible origins of apomixis and the remaining challenges for its commercial deployment in plants.
Pacific Symposium on Biocomputing 2023
The Pacific Symposium on Biocomputing (PSB) 2023 is an international, multidisciplinary conference for the presentation and discussion of current research in the theory and application of computational methods in problems of biological significance. Presentations are rigorously peer reviewed and are published in an archival proceedings volume. PSB 2023 will be held on January 3-7, 2023 in Kohala Coast, Hawaii. Tutorials and workshops will be offered prior to the start of the conference. PSB 2023 will bring together top researchers from the US, the Asian Pacific nations, and around the world to exchange research results and address open issues in all aspects of computational biology. It is a forum for the presentation of work in databases, algorithms, interfaces, visualization, modeling, and other computational methods, as applied to biological problems, with emphasis on applications in data-rich areas of molecular biology. The PSB has been designed to be responsive to the need for critical mass in sub-disciplines within biocomputing. For that reason, it is the only meeting whose sessions are defined dynamically each year in response to specific proposals. PSB sessions are organized by leaders of research in biocomputing's 'hot topics.' In this way, the meeting provides an early forum for serious examination of emerging methods and approaches in this rapidly changing field.
- …