    Faster algorithms for 1-mappability of a sequence

    In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k = 1. The fastest known algorithm for k = 1 requires time O(mn log n / log log n) and space O(n). We present two algorithms that require worst-case time O(mn) and O(n log^2 n), respectively, and space O(n), thus greatly improving the state of the art. Moreover, we present an algorithm that requires average-case time and space O(n) for integer alphabets if m = Ω(log n / log σ), where σ is the alphabet size.
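
To make the problem statement concrete, here is a minimal brute-force sketch of k-mappability. It is not one of the algorithms the abstract describes: it runs in O(n^2 m) time and only illustrates the definition. The function name `k_mappability` is hypothetical, and the sketch assumes that "other factors" means factors starting at other positions, so repeated occurrences of the same substring are counted separately.

```python
def k_mappability(x: str, m: int, k: int = 1) -> list[int]:
    """For each length-m factor of x, count the factors at other start
    positions whose Hamming distance to it is at most k (brute force)."""
    num_factors = len(x) - m + 1
    if m <= 0 or num_factors <= 0:
        return []
    counts = [0] * num_factors
    for i in range(num_factors):
        for j in range(num_factors):
            if i == j:
                continue  # a factor is not compared with itself
            mismatches = 0
            for a, b in zip(x[i:i + m], x[j:j + m]):
                if a != b:
                    mismatches += 1
                    if mismatches > k:
                        break  # already too far apart, stop early
            if mismatches <= k:
                counts[i] += 1
    return counts


if __name__ == "__main__":
    # Factors of "aabaaabb" with m = 3: aab, aba, baa, aaa, aab, abb
    print(k_mappability("aabaaabb", m=3, k=1))
```

The quadratic comparison of all factor pairs is exactly the cost that the faster algorithms mentioned in the abstract are designed to avoid.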

    SEAL: a distributed short read mapping and duplicate removal tool

    Summary: SEAL is a scalable tool for short read pair mapping and duplicate removal. It computes mappings that are consistent with those produced by BWA and removes duplicates according to the same criteria employed by Picard MarkDuplicates. On a 16-node Hadoop cluster, it is capable of processing about 13 GB per hour in map+rmdup mode, while reaching a throughput of 19 GB per hour in mapping-only mode.

    Democratically engaged assessment: Reimagining the purposes and practices of assessment in community engagement

    This document is a project of reclamation and transformation, one that is both ongoing and rooted in years of dialogue within Imagining America and the work of its Assessing Practices of Public Scholarship research group (APPS). It emerges from our own experiences with assessment related to community engagement and from those of many other colleagues on campuses and in diverse communities. It is intended to bring together those who wish to reimagine assessment in light of its civic potential, to develop what we refer to as Democratically Engaged Assessment (DEA).

    A Border-friendly, Non-overlay Mechanism for Inter-domain QoS Support in the Internet

    Many services provided over the Internet, like voice over IP and video on demand, increase the demand for assurances concerning the quality of the underlying network. A score of techniques for assurance of quality of service (QoS) have been devised for use within administrative domains. However, when paths cross the border of autonomous systems, assurance of end-to-end QoS remains an unsolved issue. The key challenge here is the establishment of connection-oriented communication flows. We introduce a technique to establish ISO/OSI Layer 3 multi-domain communication paths. The proposed solution does not stress border routers and is independent of domain-internal policies, while relying on the common forwarding mechanisms.

    Enriching rare variants using family-specific linkage information

    Genome-wide association studies have been successful in identifying common variants for common complex traits in recent years. However, common variants have generally failed to explain substantial proportions of the trait heritabilities. Rare variants, structural variations, and gene-gene and gene-environment interactions, among others, have been suggested as potential sources of the so-called missing heritability. With the advent of exome-wide and whole-genome next-generation sequencing technologies, finding rare variants in functionally important sites (e.g., protein-coding regions) becomes feasible. We investigate the role of linkage information to select families enriched for rare variants using the simulated Genetic Analysis Workshop 17 data. In each replicate of simulated phenotypes Q1 and Q2 on 697 subjects in 8 extended pedigrees, we select one pedigree with the largest family-specific LOD score. Across all 200 replications, we compare the probability that rare causal alleles will be carried in the selected pedigree versus a randomly chosen pedigree. One example of successful enrichment was exhibited for gene VEGFC. The causal variant had a minor allele frequency of 0.0717% in the simulated unrelated individuals and explained about 0.1% of the phenotypic variance. However, it explained 7.9% of the phenotypic variance in the eight simulated pedigrees and 23.8% in the family that carried the minor allele. The carrier’s family was selected in all 200 replications. Our results show that family-specific linkage information is useful for selecting families for sequencing, thus ensuring that rare functional variants are segregating in the sequencing samples.

    Midgut microbiota of the malaria mosquito vector Anopheles gambiae and interactions with Plasmodium falciparum infection

    The susceptibility of Anopheles mosquitoes to Plasmodium infections relies on complex interactions between the insect vector and the malaria parasite. A number of studies have shown that the mosquito innate immune responses play an important role in controlling the malaria infection and that the strength of parasite clearance is under genetic control, but little is known about the influence of environmental factors on the transmission success. We present here evidence that the composition of the vector gut microbiota is one of the major components that determine the outcome of mosquito infections. A. gambiae mosquitoes collected in natural breeding sites from Cameroon were experimentally challenged with a wild P. falciparum isolate, and their gut bacterial content was submitted for pyrosequencing analysis. The meta-taxogenomic approach revealed a broader richness of the midgut bacterial flora than previously described. Unexpectedly, the majority of bacterial species were found in only a small proportion of mosquitoes, and only 20 genera were shared by 80% of individuals. We show that observed differences in gut bacterial flora of adult mosquitoes are a result of breeding in distinct sites, suggesting that the native aquatic source where larvae were grown determines the composition of the midgut microbiota. Importantly, the abundance of Enterobacteriaceae in the mosquito midgut correlates significantly with the Plasmodium infection status. This striking relationship highlights the role of natural gut environment in parasite transmission. Deciphering microbe-pathogen interactions offers new perspectives to control disease transmission. Funding: Institut de Recherche pour le Developpement (IRD); French Agence Nationale pour la Recherche [ANR-11-BSV7-009-01]; European Community [242095, 223601].

    Next-generation sequencing of common osteogenesis imperfecta-related genes in clinical practice

    Next generation sequencing (NGS) is a rapidly developing area in genetics. Utilizing this technology in the management of disorders with a complex genetic background and no recurrent mutation hot spots can be extremely useful. In this study, we applied NGS, namely semiconductor sequencing, to determine the most significant osteogenesis imperfecta-related genetic variants in clinical practice. We selected the genes coding collagen type I alpha-1 and alpha-2 (COL1A1, COL1A2), which are responsible for more than 90% of all cases. The CRTAP and LEPRE1/P3H1 genes, involved in the background of the recessive forms with relatively high frequency (type VII and VIII), represent less than 10% of the disease. In our six patients (1-41 years), we identified 23 different variants. We found a total of 14 single nucleotide variants (SNV) in COL1A1 and COL1A2, 5 in CRTAP and 4 in LEPRE1. Two novel and two already well-established pathogenic SNVs have been identified. Among the newly recognized mutations, one results in an amino acid change and the other is a stop codon. We have shown that a new full-scale cost-effective NGS method can be developed and utilized to supplement the diagnostic process of osteogenesis imperfecta with molecular genetic data in clinical practice.

    Chromosome Size in Diploid Eukaryotic Species Centers on the Average Length with a Conserved Boundary

    Understanding genome and chromosome evolution is important for understanding genetic inheritance and evolution. Universal events comprising DNA replication, transcription, repair, mobile genetic element transposition, chromosome rearrangements, mitosis, and meiosis underlie inheritance and variation of living organisms. Although the genome of a species as a whole is important, chromosomes are the basic units subjected to genetic events that shape evolution to a large extent. Now that many complete genome sequences are available, we can address evolution and variation of individual chromosomes across species. For example, “How are the repeat and nonrepeat proportions of genetic codes distributed among different chromosomes in a multichromosome species?” “Is there a general rule behind the intuitive observation that chromosome lengths tend to be similar in a species, and if so, can we generalize any findings in chromosome content and size across different taxonomic groups?” Here, we show that chromosomes within a species do not show dramatic fluctuation in their content of mobile genetic elements as the proliferation of these elements increases from unicellular eukaryotes to vertebrates. Furthermore, we demonstrate that, notwithstanding the remarkable plasticity, there is an upper limit to chromosome-size variation in diploid eukaryotes with linear chromosomes. Strikingly, variation in chromosome size for 886 chromosomes in 68 eukaryotic genomes (including 22 human autosomes) can be viably captured by a single model, which predicts that the vast majority of the chromosomes in a species are expected to have a base pair length between 0.4035 and 1.8626 times the average chromosome length. This conserved boundary of chromosome-size variation, which prevails across a wide taxonomic range with few exceptions, indicates that cellular, molecular, and evolutionary mechanisms, possibly together, confine the chromosome lengths around a species-specific average chromosome length.
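
The size boundary quoted in the abstract can be applied as simple arithmetic. The sketch below assumes only the two factors stated there (0.4035 and 1.8626 times the average chromosome length); the function name `size_bounds_check` and the example lengths are hypothetical and are not data from the paper.

```python
LOWER_FACTOR = 0.4035  # lower bound factor quoted in the abstract
UPPER_FACTOR = 1.8626  # upper bound factor quoted in the abstract


def size_bounds_check(lengths: list[float]) -> tuple[float, float, list[float]]:
    """Return (lower bound, upper bound, lengths outside the bounds)
    for one species, using the average of the given chromosome lengths."""
    if not lengths:
        raise ValueError("at least one chromosome length is required")
    average = sum(lengths) / len(lengths)
    lower = LOWER_FACTOR * average
    upper = UPPER_FACTOR * average
    outside = [length for length in lengths if not lower <= length <= upper]
    return lower, upper, outside


if __name__ == "__main__":
    # Invented chromosome lengths (arbitrary units), for illustration only.
    lower, upper, outside = size_bounds_check([10, 100, 100, 100, 190])
    print(f"expected range: {lower:.2f} to {upper:.2f}; outside: {outside}")
```

With an average length of 100 units, the predicted range is roughly 40.35 to 186.26 units, so the invented lengths 10 and 190 fall outside it.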