
    Controlling for contamination in re-sequencing studies with a reproducible web-based phylogenetic approach

    Polymorphism discovery is a routine application of next-generation sequencing technology in which multiple samples are sent to a service provider for library preparation, sequencing, and bioinformatic analysis. Decreasing costs and advances in multiplexing have made it possible to analyze hundreds of samples at a reasonable cost. However, because the initial processing of samples and the handling of sequencing equipment involve manual steps, cross-contamination remains a significant challenge. It is especially problematic where polymorphism frequencies do not follow diploid expectations, for example in heterogeneous tumor samples, organellar genomes, and bacterial and viral sequencing. In these instances, low levels of contamination may readily be mistaken for polymorphisms, leading to false results. Here we describe practical steps designed to reliably detect contamination and uncover its origin, and we also provide new, readily accessible Galaxy-based computational tools and workflows for quality control. All results described in this report can be reproduced interactively on the web as described at http://usegalaxy.org/contamination.
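
The basic intuition here is that low-frequency alleles in one sample may be the majority alleles of another sample processed alongside it. That intuition can be sketched as a simple matching heuristic. This is a simplified, non-phylogenetic illustration, not the authors' Galaxy workflow. The input layout (per-site allele counts per sample), the function names, and the 1% minimum minor-allele frequency are all hypothetical.

```python
"""Hypothetical sketch: attribute a sample's minor alleles to a likely contaminating sample.

Not the Galaxy workflow described above; input layout and threshold are assumptions.
"""
from typing import Optional

# Allele counts at one site, e.g. {"A": 980, "G": 20}; every sample lists the same sites in order.
AlleleCounts = dict[str, int]


def major_allele(site: AlleleCounts) -> str:
    """Most frequent base at a site (the first one wins on ties)."""
    return max(site, key=site.get)


def minor_alleles(site: AlleleCounts, min_freq: float) -> set[str]:
    """Non-major bases whose frequency at this site is at least min_freq."""
    total = sum(site.values())
    if total == 0:
        return set()
    major = major_allele(site)
    return {b for b, c in site.items() if b != major and c / total >= min_freq}


def likely_contaminant(
    counts: dict[str, list[AlleleCounts]], sample: str, min_freq: float = 0.01
) -> tuple[Optional[str], float]:
    """Return (other sample, score), where score is the share of `sample`'s minor
    alleles that match the other sample's major allele at the same site."""
    informative = [
        (i, allele)
        for i, site in enumerate(counts[sample])
        for allele in minor_alleles(site, min_freq)
    ]
    if not informative or len(counts) < 2:
        return None, 0.0
    scores = {
        other: sum(major_allele(sites[i]) == allele for i, allele in informative) / len(informative)
        for other, sites in counts.items()
        if other != sample
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy data: sample A carries low-level G and T alleles that are B's major alleles.
example_counts = {
    "A": [{"A": 980, "G": 20}, {"C": 985, "T": 15}, {"G": 1000}],
    "B": [{"G": 1000}, {"T": 998, "C": 2}, {"G": 1000}],
    "C": [{"A": 1000}, {"C": 1000}, {"A": 1000}],
}

if __name__ == "__main__":
    print(likely_contaminant(example_counts, "A"))  # ('B', 1.0)
```

Both of A's minor alleles are B's major alleles, so B is flagged as the likely source. A real analysis would also need to account for sequencing error, shared polymorphisms, and coverage, none of which this sketch handles.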

    Proactive Highly Ambulatory Sensor Routing (PHASeR) protocol for mobile wireless sensor networks

    This paper presents a novel multihop routing protocol for mobile wireless sensor networks called PHASeR (Proactive Highly Ambulatory Sensor Routing). The proposed protocol uses a simple hop-count metric to enable dynamic and robust routing of data towards the sink in mobile environments. It is motivated by the application of radiation mapping by unmanned vehicles, which requires the reliable and timely delivery of regular measurements to the sink. PHASeR maintains a gradient metric in mobile environments by using a global TDMA MAC layer. It also uses the technique of blind forwarding to pass messages through the network in a multipath manner. PHASeR is analysed mathematically in terms of packet delivery ratio, average packet delay, throughput and overhead. It is then simulated with varying mobility, scalability and traffic loads. The protocol performs well on all of these measures, which suggests that it may also be suitable for a wider range of emerging applications.
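
The two routing ideas named above are a hop-count gradient toward the sink and blind multipath forwarding. Both can be illustrated on a static snapshot of a network. This is a simplified sketch, not the PHASeR specification: it ignores the TDMA MAC layer and node mobility. The forwarding rule (a receiver rebroadcasts only if its hop count is strictly lower than the sender's), the example topology, and all names are assumptions made for illustration.

```python
from collections import deque

# Hypothetical snapshot of a small network: node -> radio neighbours ("S" is the sink).
EXAMPLE_ADJ = {
    "S": ["A", "B"],
    "A": ["S", "C"],
    "B": ["S", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}


def hop_counts(adj, sink):
    """Breadth-first search from the sink; each node's hop count forms the gradient."""
    hops = {sink: 0}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in hops:
                hops[nb] = hops[node] + 1
                queue.append(nb)
    return hops


def blind_forward(adj, hops, source, sink):
    """Broadcast a packet down the gradient without choosing a specific next hop.

    Assumed rule: a node that hears the packet rebroadcasts it only if its own
    hop count is strictly lower than the sender's. Several nodes may qualify,
    so the packet travels over multiple paths. Returns (delivered, transmitters).
    """
    received = {source}
    transmitters = []
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        if node == sink or node not in hops:
            continue  # the sink consumes the packet; disconnected nodes cannot forward
        transmitters.append(node)
        for nb in adj[node]:
            if nb in hops and hops[nb] < hops[node] and nb not in received:
                received.add(nb)
                frontier.append(nb)
    return sink in received, transmitters


if __name__ == "__main__":
    hops = hop_counts(EXAMPLE_ADJ, "S")
    print(hops)  # {'S': 0, 'A': 1, 'B': 1, 'C': 2, 'D': 3}
    print(blind_forward(EXAMPLE_ADJ, hops, "D", "S"))  # (True, ['D', 'C', 'A', 'B'])
```

In the example, the packet from D reaches the sink over two paths (via A and via B). In a mobile network the hop counts would have to be refreshed as nodes move, a job the abstract attributes to PHASeR's global TDMA MAC layer.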

    Molecular and Functional Properties of Transmitted HIV-1 Envelope Variants: A Dissertation

    In 2008 the Nobel Prize in Physiology or Medicine was awarded to the co-discoverers of the Human Immunodeficiency Virus Type 1 (HIV-1), the causative agent of Acquired Immunodeficiency Syndrome (AIDS). This award acknowledged the enormous worldwide impact of the HIV-1/AIDS pandemic and the importance of research aimed at halting its spread. Since the syndrome was first recognized, 25 million people have succumbed to AIDS and over 33 million are currently infected with HIV-1 (www.unaids.org). The most effective strategy for ending the pandemic is the creation of a prophylactic vaccine. Yet, to date, all efforts at HIV-1 vaccine design have met with very limited success. The consistent failures of vaccine candidates stem in large part from the unprecedented diversity of HIV-1. Among the novel theories of vaccine design put forward to address this diversity is the targeted vaccine approach. This proposal is based on the finding that mucosal transmission of HIV-1, the most prevalent form, occurs across a selective bottleneck such that typically only one or a few variants of the viral swarm present in a donor are passed to the recipient. While the mechanisms controlling this selection are largely unknown, the targeted vaccine approach postulates that once they are identified, this understanding can be used to design vaccines specifically targeted to the characteristics shared by the rare, mucosally transmissible HIV-1 variants. The studies described in this work were conducted to improve our understanding of the factors influencing viral variant selection during mother-to-child transmission of HIV-1, a route of mucosal transmission that has globally become the leading cause of infection in children. A unique panel was generated, consisting of nearly 300 HIV-1 envelope genes cloned from infected mother-infant pairs.
Extensive characterization of the genotypes, phenotypes and phylogeny of these clones was then performed to identify attributes differentiating early infant from maternal variants. Low genetic diversity of HIV-1 envelope variants was detected in early infant samples, suggesting a bottleneck and active selection of variants for transmission. Transmitted variants did not differ from non-transmitted variants in CD4 and CCR5 use. Infant isolates replicated poorly in macrophages, a cell type hypothesized to be important in the establishment of infection. The sensitivity of infant envelope variants to neutralization by a panel of monoclonal antibodies, heterologous and autologous plasmas, and HIV-1 entry inhibitors varied. Most intriguingly, envelopes cloned from infants infected during delivery exhibited a faster entry phenotype than maternal isolates. Together, these findings provide further insight into viral variant selection during mother-to-child transmission. Identification of properties shared by mucosally transmitted viral variants may allow them to be selectively targeted, resulting in improved methods for preventing HIV-1 transmission.
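
Low genetic diversity within a sample, as reported for the early infant envelope variants, is often summarized as the mean pairwise distance between aligned sequences. The sketch below averages uncorrected p-distances. The sequences are toy placeholders, not data from this dissertation, and the dissertation may well have used other measures (for example, model-corrected distances or phylogenetic summaries).

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y
    return differences / compared if compared else 0.0


def mean_pairwise_diversity(seqs: list[str]) -> float:
    """Average p-distance over all pairs of sequences in one sample."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy aligned fragments (placeholders, not real envelope sequences).
maternal = ["ACGTACGT", "ACGTTCGA", "TCGAACGT"]
infant = ["ACGTACGT", "ACGTACGA"]

if __name__ == "__main__":
    print(round(mean_pairwise_diversity(maternal), 3))  # 0.333
    print(mean_pairwise_diversity(infant))  # 0.125
```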

    Climbing Atop the Shoulders of Giants: The Impact of Institutions on Cumulative Research

    While the cumulative nature of knowledge is recognized as central to economic growth, the microeconomic foundations of cumulativeness are less well understood. This paper investigates the impact of a research-enhancing institution on cumulativeness, highlighting two effects. First, a selection effect may result in a high correlation between "high-quality" institutions and knowledge of high intrinsic quality. Second, an institution may have a marginal impact, that is, an incremental influence on cumulativeness, conditional on the type and quality of knowledge considered. This paper distinguishes these effects in the context of a specific institution, biological resource centers (BRCs). BRCs are "living libraries" that authenticate, preserve, and offer independent access to biological materials, such as cells, cultures, and specimens. BRCs may enhance the cumulativeness of knowledge by reducing the marginal cost to researchers of drawing on prior research efforts. We exploit three key aspects of the environment in which BRCs operate to evaluate how they affect the cumulativeness of knowledge: (a) the impact of scientific knowledge is reflected in future scientific citations, (b) deposit into BRCs often occurs with a substantial lag after initial research is completed and published, and (c) "lagged" deposits often result from shocks unrelated to the characteristics of the materials themselves. Employing a difference-in-differences estimator linking specific material deposits to journal articles, we find evidence for both selection effects and the marginal impact of BRCs on the cumulativeness of knowledge associated with deposited materials. Moreover, the marginal impact increases with time and varies with the economic and institutional conditions in which deposit occurs.
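
The simplest form of a difference-in-differences estimate compares two before/after changes in an outcome such as citations. One change is for a treated group (here, articles whose materials were deposited in a BRC). The other is the same change for a control group. The sketch below computes that 2x2 estimate on made-up numbers. The paper's actual estimator, which links individual deposits to articles and exploits lagged deposit timing, is richer than this and is not reproduced here.

```python
from statistics import mean


def diff_in_diff(rows):
    """2x2 difference-in-differences estimate.

    rows: iterable of (treated, post, outcome) tuples.
    DiD = (mean treated post - mean treated pre) - (mean control post - mean control pre)
    """
    rows = list(rows)

    def cell(treated: bool, post: bool) -> float:
        values = [y for t, p, y in rows if t == treated and p == post]
        if not values:
            raise ValueError(f"no observations for treated={treated}, post={post}")
        return mean(values)

    return (cell(True, True) - cell(True, False)) - (cell(False, True) - cell(False, False))


# Made-up yearly citation counts; "treated" means the materials were later deposited in a BRC.
example = [
    (True, False, 4), (True, False, 6),    # treated, before deposit
    (True, True, 9), (True, True, 11),     # treated, after deposit
    (False, False, 5), (False, False, 5),  # control, matching "before" period
    (False, True, 7), (False, True, 7),    # control, matching "after" period
]

if __name__ == "__main__":
    print(diff_in_diff(example))  # 3
```

Treated articles gain 5 citations per year while controls gain 2, so the estimate attributes the remaining 3 to deposit. The estimate is only credible if the two groups would otherwise have followed parallel trends, which is what the lagged, shock-driven deposits described above are used to argue.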

    Deep Learning-Based Robotic Perception for Adaptive Facility Disinfection

    Hospitals, schools, airports, and other environments built for mass gatherings can become hot spots for microbial pathogen colonization, transmission, and exposure, greatly accelerating the spread of infectious diseases across communities, cities, nations, and the world. Outbreaks of infectious diseases impose huge burdens on society. Mitigating the spread of infectious pathogens within mass-gathering facilities requires routine cleaning and disinfection, which under current practice are performed primarily by cleaning staff. However, manual disinfection is limited in both effectiveness and efficiency, as it is labor-intensive, time-consuming, and harmful to the health of the staff performing it. While existing studies have developed a variety of robotic systems for disinfecting contaminated surfaces, those systems are not adequate for intelligent, precise, and environmentally adaptive disinfection. They are also difficult to deploy in mass-gathering infrastructure facilities, given the high volume of occupants. Therefore, there is a critical need to develop an adaptive robot system capable of complete and efficient indoor disinfection. The overarching goal of this research is to develop an artificial intelligence (AI)-enabled robotic system that adapts to ambient environments and social contexts for precise and efficient disinfection. Such a system would maintain environmental hygiene and health, reduce unnecessary cleaning labor costs, and mitigate the opportunity costs incurred from infections. To these ends, this dissertation first develops a multi-classifier decision fusion method, which integrates scene graph and visual information, to recognize patterns of human activity in infrastructure facilities. Next, a deep-learning-based method is proposed for detecting and classifying indoor objects, and a new mechanism is developed to map detected objects into 3D maps.
A novel framework is then developed to detect and segment object affordances and project them into a 3D semantic map for precise disinfection. Subsequently, a novel deep-learning network that integrates multi-scale and multi-level features, together with an encoder network, is developed to recognize the materials of surfaces requiring disinfection. Finally, a novel computational method is developed to link the recognized object surface information to robot disinfection actions with optimal disinfection parameters.
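
One common way to fuse the outputs of several classifiers is weighted soft voting. For instance, one classifier might work on scene graphs and another on raw visual features. Each classifier's class probabilities are averaged with a per-classifier weight, and the class with the highest fused score wins. The abstract does not specify the dissertation's fusion rule, so this is only an illustrative sketch. The activity labels, probabilities, and weights are hypothetical.

```python
def fuse_predictions(prob_vectors, weights):
    """Weighted soft voting over per-classifier class-probability vectors.

    Returns (index of the winning class, fused probability vector).
    """
    if not prob_vectors or len(prob_vectors) != len(weights):
        raise ValueError("need one weight per classifier")
    n_classes = len(prob_vectors[0])
    if any(len(p) != n_classes for p in prob_vectors):
        raise ValueError("all classifiers must score the same classes")
    total_weight = sum(weights)
    fused = [
        sum(w * p[k] for w, p in zip(weights, prob_vectors)) / total_weight
        for k in range(n_classes)
    ]
    winner = max(range(n_classes), key=fused.__getitem__)
    return winner, fused


ACTIVITIES = ["walking", "sitting", "eating"]  # hypothetical activity labels
scene_graph_probs = [0.2, 0.5, 0.3]            # hypothetical scene-graph classifier output
visual_probs = [0.1, 0.3, 0.6]                 # hypothetical visual classifier output

winner, fused = fuse_predictions([scene_graph_probs, visual_probs], weights=[1.0, 2.0])

if __name__ == "__main__":
    print(ACTIVITIES[winner])  # eating
```

With these numbers the scene-graph classifier alone would choose "sitting". The more heavily weighted visual classifier shifts the fused decision to "eating".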

    Data Mining and Machine Learning in Astronomy

    We review the current state of data mining and machine learning in astronomy. 'Data Mining' can have a somewhat mixed connotation from the point of view of a researcher in this field. If used correctly, it can be a powerful approach, holding the potential to fully exploit the exponentially increasing amount of available data and promising great scientific advances. However, if misused, it can be little more than the black-box application of complex computing algorithms that may give little physical insight and provide questionable results. Here, we give an overview of the entire data mining process, from data collection through to the interpretation of results. We cover common machine learning algorithms, such as artificial neural networks and support vector machines; applications from a broad range of astronomy, emphasizing those where data mining techniques directly resulted in improved science; and important current and future directions, including probability density functions, parallel algorithms, petascale computing, and the time domain. We conclude that, so long as one carefully selects an appropriate algorithm and is guided by the astronomical problem at hand, data mining can be very much the powerful tool, and not the questionable black box.
    Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra figures, some minor additions to the text

    Culture-free genome-wide locus sequence typing (GLST) provides new perspectives on Trypanosoma cruzi dispersal and infection complexity

    Analysis of genetic polymorphism is a powerful tool for epidemiological surveillance and research. Powerful inference from pathogen genetic variation, however, is often constrained by limited access to representative target DNA, especially in the study of obligate parasitic species for which ex vivo culture is resource-intensive or bias-prone. Modern sequence capture methods enable pathogen genetic variation to be analyzed directly from host/vector material but are often too complex and expensive for resource-poor settings where infectious diseases prevail. This study proposes a simple, cost-effective ‘genome-wide locus sequence typing’ (GLST) tool based on massively parallel amplification of information hotspots throughout the target pathogen genome. The multiplexed polymerase chain reaction amplifies hundreds of different, user-defined genetic targets in a single reaction tube, and subsequent agarose gel-based clean-up and barcoding complete library preparation at under 4 USD per sample. Our study generates a flexible GLST primer panel design workflow for Trypanosoma cruzi, the parasitic agent of Chagas disease. We successfully apply our 203-target GLST panel to direct, culture-free metagenomic extracts from triatomine vectors containing a minimum of 3.69 pg/μl T. cruzi DNA, and further elaborate on method performance by sequencing GLST libraries from T. cruzi reference clones representing discrete typing units (DTUs) TcI, TcIII, TcIV, TcV and TcVI.
The 780 SNP sites we identify in the sample set repeatably distinguish parasites infecting sympatric vectors and detect correlations between genetic and geographic distances at regional (< 150 km) as well as continental scales. The markers also clearly separate TcI, TcIII, TcIV and TcV + TcVI and appear to distinguish multiclonal infections within TcI. We discuss the advantages, limitations and prospects of our method across a spectrum of epidemiological research.

    Genomics and spatial surveillance of Chagas disease and American visceral leishmaniasis

    The Trypanosomatidae are a family of parasitic protozoa that infect various animals and plants. Several species within the Trypanosoma and Leishmania genera also pose a major threat to human health. Among these are Trypanosoma cruzi and Leishmania infantum, the aetiological agents of Chagas disease and American visceral leishmaniasis, highly debilitating and often deadly vector-borne zoonoses. Current treatment options are far from safe, only partially effective and rarely available in the impoverished regions of Latin America where these ‘neglected tropical diseases’ prevail. Wider-reaching, sustainable protection against T. cruzi and L. infantum might best be achieved by intercepting key routes of zoonotic transmission, but this prophylactic approach requires a better understanding of how these parasites disperse and evolve at various spatiotemporal scales. This dissertation addresses key questions about trypanosomatid parasite biology and spatial epidemiology using high-resolution, geo-referenced DNA sequence datasets constructed from disease foci throughout Latin America: Which forms of genetic exchange occur in T. cruzi, and are exchange events frequent enough to significantly alter the distribution of important epidemiological traits? How do demographic histories, for example the recent invasive expansion of L. infantum into the Americas, affect parasite population structure, and do structural changes pose a threat to public health? Can environmental variables predict parasite dispersal patterns at the landscape scale? Following the first chapter’s review of population genetic and genomic approaches in the study of trypanosomatid diseases in Latin America, Chapter 2 describes how reproductive polymorphism segregates T. cruzi populations in southern Ecuador. The study is the first to clearly demonstrate meiotic sex in this species, which for decades was thought to exchange genetic material only very rarely, and only by non-Mendelian means.
T. cruzi subpopulations from the Ecuadorian study site exhibit all major hallmarks of sexual reproduction, including genome-wide Hardy-Weinberg allele frequencies, rapid decay of linkage disequilibrium with map distance, and genealogies that fluctuate among chromosomes. The presence of sex promotes the transfer and transformation of genotypes underlying important epidemiological traits, posing great challenges to disease surveillance and the development of diagnostics and drugs. Chapter 3 demonstrates that mating events are also pivotal to L. infantum population structure in Brazil, where introduction bottlenecks have led to striking genetic discontinuities between sympatric strains. Genetic hybridization occurs genome-wide, including at a recently identified ‘miltefosine sensitivity locus’ that appears to be deleted from the majority of Brazilian L. infantum genomes. The study combines an array of genomic and phenotypic analyses to determine whether rapid population expansion or strong purifying selection has driven this prominent >12 kb deletion to high abundance across Brazil. The results reveal deletion size differences that covary with phylogenetic structure and suggest that deletion-carrying strains do not form a private monophyletic clade. These observations are inconsistent with the hypothesis that the deletion genotype rose to high prevalence simply as the result of a founder effect. Enzymatic assays show that loss of ecto-3’-nucleotidase gene function within the deleted locus is coupled to increased ecto-ATPase activity, raising the possibility that alternative metabolic strategies enhance L. infantum fitness in its introduced range. The study also uses demographic simulation modelling to determine whether L. infantum populations in the Americas have expanded from one or multiple introduction events.
Comparison of observed vs. simulated summary statistics using random forests suggests a single introduction from the Old World, but better spatial sampling coverage is required to rule out other demographic scenarios in a pattern-process modelling approach. Further sampling is also necessary to substantiate the signs of convergent selection described above. Chapter 4 therefore develops a ‘genome-wide locus sequence typing’ (GLST) tool to summarize parasite genetic polymorphism at a fraction of the cost of genome sequencing. Applied directly to the infection source (e.g., vector or host tissue), the method also avoids bias from the cell purification and culturing steps typically required before sequencing trypanosomatid and other obligate parasite genomes. GLST scans genomic pilot data for hundreds of polymorphic sequence fragments whose thermodynamic properties permit simultaneous PCR amplification in a single reaction tube. As a proof of principle, GLST is applied to metagenomic DNA extracts from various Chagas disease vector species collected in Colombia, Venezuela, and Ecuador. Epimastigote DNA from several T. cruzi reference clones is also analyzed. The method distinguishes 387 single-nucleotide polymorphisms (SNPs) in T. cruzi sub-lineage TcI and an additional 393 SNPs in non-TcI clones. Genetic distances calculated from these SNPs correlate with geographic distances among samples but also distinguish parasites from triatomines collected at common collection sites. The method thereby appears suitable for the agent-based spatio-genetic (simulation) analyses called for in Chapter 3 and formulated further in Chapter 5. The ability to survey parasite genetic diversity extensively across landscapes calls for deeper, more systematic exploration of how environmental variables influence the spread of disease.
As environmental context is only marginally considered in the population genetic analyses of Chapters 2–4, Chapter 5 proposes a new, spatially explicit modelling framework to predict vector-borne parasite gene flow through heterogeneous environments. In this framework, remotely sensed environmental raster values are re-coded and merged into a composite ‘resistance surface’ that summarizes hypothesized effects of landscape features on parasite transmission among vectors and hosts. Parasite population genetic differentiation is then simulated on this surface and fitted to observed diversity patterns in order to evaluate the original hypotheses on how environmental variables modulate parasite gene flow. The chapter thereby takes a first step from standard population genetic to ‘landscape genomic’ approaches in understanding the ecology and evolution of vector-borne disease. In summary, this dissertation first demonstrates the power of population genetics and genomics to understand fundamental biological properties of important protist parasites, then identifies areas where analytical tools are missing and creates new technical and conceptual frameworks to help fill these gaps. The general discussion (Chapter 6) also outlines several follow-up projects on the key finding of meiotic genetic signatures in T. cruzi. Exploiting recently developed T. cruzi genome-editing systems to detect meiotic gene expression and heterozygosis will help explain why, and in which life-cycle stage, some parasite populations use sex and others do not. Long-read sequencing of parental and recombinant genomes will help reveal the extent to which sex diversifies T. cruzi phenotypes, especially virulence and drug resistance properties conferred by surface molecules whose repetitive genetic bases are intractable to short-read analysis. Chapter 6 also provides follow-up plans for all other research chapters.
Emphasis is placed on advancing the complementarity, transferability and public health benefit of the many different methods and concepts employed in this work.
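
Chapter 5's pipeline can be sketched on toy grids in three steps: re-code raster layers, merge them into a composite resistance surface, then measure how costly movement is across that surface. The weighted-sum combination, the 4-neighbour least-cost path (Dijkstra's algorithm), and the layer values below are assumptions for illustration. The dissertation goes further by simulating population genetic differentiation on the surface, which this sketch does not attempt.

```python
import heapq


def resistance_surface(layers, weights):
    """Combine re-coded raster layers (equal-shaped grids) into one weighted-sum resistance grid."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [sum(w * layer[r][c] for layer, w in zip(layers, weights)) for c in range(cols)]
        for r in range(rows)
    ]


def least_cost_distance(surface, start, goal):
    """Dijkstra's algorithm over 4-connected cells.

    Stepping between neighbouring cells costs the mean of their two resistance values.
    """
    rows, cols = len(surface), len(surface[0])
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        cost, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return cost
        if cost > best[(r, c)]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + (surface[r][c] + surface[nr][nc]) / 2
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(heap, (new_cost, (nr, nc)))
    return float("inf")


# Hypothetical re-coded layers: 1 = easy to cross, 20 = strong barrier.
land_cover = [[1, 1, 1], [20, 20, 1], [1, 1, 1]]
elevation = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
surface = resistance_surface([land_cover, elevation], weights=[1.0, 1.0])

if __name__ == "__main__":
    print(surface)  # [[2.0, 2.0, 2.0], [21.0, 21.0, 2.0], [2.0, 2.0, 2.0]]
    print(least_cost_distance(surface, (0, 0), (2, 0)))  # 12.0
```

The high-resistance band in the middle row makes the cheapest route go around the barrier (cost 12) rather than straight through it (cost 23). Pairwise least-cost distances of this kind are what a landscape genetic analysis would then compare against observed genetic differentiation.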

    2011 Conference Abstracts: Annual Undergraduate Research Conference at the Interface of Biology and Mathematics

    Abstract book for the Third Annual Undergraduate Research Conference at the Interface of Biology and Mathematics.
    Date: October 21-22, 2011
    Plenary speaker: J. Carl Panetta, Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital
    Featured speaker: John Jungck, Mead Chair of the Sciences and Professor of Biology, Beloit College