
    Faster algorithms for 1-mappability of a sequence

    In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are at Hamming distance at most k from y. We focus here on the version of the problem where k = 1. The fastest known algorithm for k = 1 requires time O(mn log n / log log n) and space O(n). We present two algorithms that require worst-case time O(mn) and O(n log^2 n), respectively, and space O(n), thus greatly improving the state of the art. Moreover, we present an algorithm that requires average-case time and space O(n) for integer alphabets if m = Ω(log n / log σ), where σ is the alphabet size.
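    The problem definition above can be made concrete with a brute-force reference. This is only a sketch of the definition, not any of the algorithms the abstract describes: it runs in O(n^2 m) time, far slower than the stated bounds, and the function name `k_mappability` is an illustrative assumption.

```python
def k_mappability(x: str, m: int, k: int = 1) -> list[int]:
    """Brute-force reference for the k-mappability definition.

    For each start position i of a length-m factor of x, count the other
    start positions j != i whose length-m factor is within Hamming
    distance k. Runs in O(n^2 * m) time; intended only to illustrate
    the problem statement, not the faster algorithms.
    """
    n = len(x)
    num_factors = max(n - m + 1, 0)
    counts = [0] * num_factors
    for i in range(num_factors):
        for j in range(num_factors):
            if i == j:
                continue
            mismatches = 0
            for a, b in zip(x[i:i + m], x[j:j + m]):
                if a != b:
                    mismatches += 1
                    if mismatches > k:
                        break  # already too far apart, stop comparing
            if mismatches <= k:
                counts[i] += 1
    return counts


if __name__ == "__main__":
    # Factors of "aaab" with m = 2 are "aa", "aa", "ab".
    print(k_mappability("aaab", 2, 1))
```

    A reference like this is mainly useful for checking a faster implementation on small inputs.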

    Transkingdom Networks: A Systems Biology Approach to Identify Causal Members of Host-Microbiota Interactions

    Improvements in sequencing technologies and reduced experimental costs have resulted in a vast number of studies generating high-throughput data. Although the number of methods to analyze these "omics" data has also increased, computational complexity and lack of documentation hinder researchers from analyzing their high-throughput data to its true potential. In this chapter we detail our data-driven, transkingdom network (TransNet) analysis protocol to integrate and interrogate multi-omics data. This systems biology approach has allowed us to successfully identify important causal relationships between different taxonomic kingdoms (e.g., mammals and microbes) using diverse types of data.

    Midgut microbiota of the malaria mosquito vector Anopheles gambiae and interactions with Plasmodium falciparum infection

    The susceptibility of Anopheles mosquitoes to Plasmodium infections relies on complex interactions between the insect vector and the malaria parasite. A number of studies have shown that the mosquito innate immune responses play an important role in controlling the malaria infection and that the strength of parasite clearance is under genetic control, but little is known about the influence of environmental factors on the transmission success. We present here evidence that the composition of the vector gut microbiota is one of the major components that determine the outcome of mosquito infections. A. gambiae mosquitoes collected in natural breeding sites from Cameroon were experimentally challenged with a wild P. falciparum isolate, and their gut bacterial content was submitted for pyrosequencing analysis. The meta-taxogenomic approach revealed a broader richness of the midgut bacterial flora than previously described. Unexpectedly, the majority of bacterial species were found in only a small proportion of mosquitoes, and only 20 genera were shared by 80% of individuals. We show that the observed differences in the gut bacterial flora of adult mosquitoes are a result of breeding in distinct sites, suggesting that the native aquatic source where larvae were grown determines the composition of the midgut microbiota. Importantly, the abundance of Enterobacteriaceae in the mosquito midgut correlates significantly with the Plasmodium infection status. This striking relationship highlights the role of the natural gut environment in parasite transmission. Deciphering microbe-pathogen interactions offers new perspectives to control disease transmission. Funding: Institut de Recherche pour le Developpement (IRD); French Agence Nationale pour la Recherche [ANR-11-BSV7-009-01]; European Community [242095, 223601].

    Investigation into the annotation of protocol sequencing steps in the sequence read archive

    BACKGROUND: The workflow for the production of high-throughput sequencing data from nucleic acid samples is complex. There is a series of protocol steps to be followed in the preparation of samples for next-generation sequencing. The quantification of bias in a number of protocol steps, namely DNA fractionation, blunting, phosphorylation, adapter ligation and library enrichment, remains to be determined. RESULTS: We examined the experimental metadata of the public repository Sequence Read Archive (SRA) in order to ascertain the level of annotation of important sequencing steps in submissions to the database. Using SQL relational database queries (using the SRAdb SQLite database generated by the Bioconductor consortium) to search for keywords commonly occurring in key preparatory protocol steps partitioned over studies, we found that 7.10%, 5.84% and 7.57% of all records (fragmentation, ligation and enrichment, respectively) had at least one keyword corresponding to one of the three protocol steps. Only 4.06% of all records, partitioned over studies, had keywords for all three steps in the protocol (5.58% of all SRA records). CONCLUSIONS: The current level of annotation in the SRA inhibits systematic studies of bias due to these protocol steps. Downstream from this, meta-analyses and comparative studies based on these data will have a source of bias that cannot be quantified at present.
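    The kind of keyword query the abstract describes can be sketched with Python's built-in `sqlite3` module. Everything specific here is an assumption made for illustration: the table `records`, its columns `study` and `protocol`, the keyword lists, and the demo rows are hypothetical and do not reflect the actual SRAdb schema or the authors' keyword set. The sketch also computes plain per-record percentages and does not reproduce the per-study partitioning mentioned in the abstract.

```python
import sqlite3

# Hypothetical keyword stems for each protocol step (assumed, not from the paper).
KEYWORDS = {
    "fragmentation": ["fragment", "sonicat", "shear"],
    "ligation": ["ligat"],
    "enrichment": ["enrich", "pcr"],
}


def demo_connection() -> sqlite3.Connection:
    """Build a tiny in-memory table standing in for the metadata database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (study TEXT, protocol TEXT)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?)",
        [
            ("S1", "DNA was sheared by sonication, adapters ligated, PCR enriched"),
            ("S1", "Fragmented with Covaris"),
            ("S2", None),  # record with no protocol annotation at all
            ("S2", "Adapter ligation followed by enrichment"),
        ],
    )
    return conn


def annotation_rates(conn: sqlite3.Connection) -> dict[str, float]:
    """Percentage of records mentioning each step, and all three steps."""
    total = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
    if total == 0:
        return {}
    rates, clauses, all_params = {}, [], []
    for step, stems in KEYWORDS.items():
        # One case-insensitive LIKE test per keyword stem, OR-ed together.
        clause = "(" + " OR ".join("LOWER(protocol) LIKE ?" for _ in stems) + ")"
        params = [f"%{stem}%" for stem in stems]
        hits = conn.execute(
            f"SELECT COUNT(*) FROM records WHERE {clause}", params
        ).fetchone()[0]
        rates[step] = 100.0 * hits / total
        clauses.append(clause)
        all_params.extend(params)
    # Records that match at least one keyword for every step.
    hits = conn.execute(
        "SELECT COUNT(*) FROM records WHERE " + " AND ".join(clauses), all_params
    ).fetchone()[0]
    rates["all_three"] = 100.0 * hits / total
    return rates


if __name__ == "__main__":
    print(annotation_rates(demo_connection()))
```

    Rows with a NULL protocol never match a `LIKE` test, so unannotated records count toward the total but not toward any step.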

    Discovery of Molecular Markers to Discriminate Corneal Endothelial Cells in the Human Body

    The corneal endothelium is a monolayer of hexagonal corneal endothelial cells (CECs) on the inner surface of the cornea. CECs are critical in maintaining corneal transparency through their barrier and pump functions. CECs in vivo have a limited capacity for proliferation, and loss of a significant number of CECs results in corneal edema called bullous keratopathy, which can lead to severe visual loss. Corneal transplantation is the most effective method to treat corneal endothelial dysfunction, but it suffers from donor shortage. Therefore, regeneration of CECs from other cell types is attracting increasing interest, and specific markers of CECs are crucial to identify actual CECs. However, the currently used markers are far from satisfactory because of their non-specific expression in other cell types. Here, we explored molecular markers to discriminate CECs from other cell types in the human body by integrating the published RNA-seq data of CECs and the FANTOM5 atlas, which represents a diverse range of cell types, based on expression patterns. We identified five genes, CLRN1, MRGPRX3, HTR1D, GRIP1 and ZP4, as novel markers of CECs, and the specificities of these genes were successfully confirmed by independent experiments at both the RNA and protein levels. Notably, none of them has been documented in the context of CEC function. These markers could be useful for the purification of actual CECs, and also for the evaluation of the products derived from other cell types. Our results demonstrate an effective approach to identify molecular markers for CECs and open the door for the regeneration of CECs in vitro.

    PathogenFinder - Distinguishing Friend from Foe Using Bacterial Whole Genome Sequence Data.

    Although the majority of bacteria are harmless or even beneficial to their host, others are highly virulent and can cause serious diseases, and even death. Due to the constantly decreasing cost of high-throughput sequencing, there are now many completely sequenced genomes available from both human pathogenic and innocuous strains. The data can be used to identify gene families that correlate with pathogenicity and to develop tools to predict the pathogenicity of newly sequenced strains, investigations that previously were mainly done by means of more expensive and time-consuming experimental approaches. We describe PathogenFinder

    New challenges for BRCA testing: a view from the diagnostic laboratory

    Increased demand for BRCA testing is placing pressures on diagnostic laboratories to raise their mutation screening capacity and handle the challenges associated with classifying BRCA sequence variants for clinical significance, for example interpretation of pathogenic mutations or variants of unknown significance, accurate determination of large genomic rearrangements and detection of somatic mutations in DNA extracted from formalin-fixed, paraffin-embedded tumour samples. Many diagnostic laboratories are adopting next-generation sequencing (NGS) technology to increase their screening capacity and reduce processing time and unit costs. However, migration to NGS introduces complexities arising from choice of components of the BRCA testing workflow, such as NGS platform, enrichment method and bioinformatics analysis process. An efficient, cost-effective, accurate mutation detection strategy and a standardised, systematic approach to the reporting of BRCA test results are imperative for diagnostic laboratories. This review covers the challenges of BRCA testing from the perspective of a diagnostics laboratory.

    Improved annotation with de novo transcriptome assembly in four social amoeba species

    Background: Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can aid the annotation with empirical data. Here we present de novo transcriptome assemblies generated from RNA-seq data in four Dictyostelid species: D. discoideum, P. pallidum, D. fasciculatum and D. lacteum. The assemblies were incorporated with existing gene models to determine corrections and improvement on a whole-genome scale. This is the first time this has been performed in these eukaryotic species. Results: An initial de novo transcriptome assembly was generated by Trinity for each species and then refined with Program to Assemble Spliced Alignments (PASA). The completeness and quality were assessed with the Benchmarking Universal Single-Copy Orthologs (BUSCO) and Transrate tools at each stage of the assemblies. The final datasets of 11,315-12,849 transcripts contained 5,610-7,712 updates and corrections to >50% of existing gene models, including changes to hundreds or thousands of protein products. Putative novel genes were also identified, and alternative splice isoforms were observed for the first time in P. pallidum, D. lacteum and D. fasciculatum. Conclusions: In taking a whole transcriptome approach to genome annotation with empirical data, we have been able to enrich the annotations of four existing genome sequencing projects. In doing so we have identified updates to the majority of the gene annotations across all four species under study and found putative novel genes and transcripts which could be worthy of follow-up. The new transcriptome data we present here will be a valuable resource for genome curators in the Dictyostelia, and we propose this effective methodology for use in other genome annotation projects.

    Evaluation of next-generation sequencing software in mapping and assembly

    Next-generation high-throughput DNA sequencing technologies have advanced progressively in sequence-based genomic research and novel biological applications with the promise of sequencing DNA at unprecedented speed. These new non-Sanger-based technologies feature several advantages when compared with traditional sequencing methods in terms of higher sequencing speed, lower per run cost and higher accuracy. However, reads from next-generation sequencing (NGS) platforms, such as 454/Roche, ABI/SOLiD and Illumina/Solexa, are usually short, thereby restricting the applications of NGS platforms in genome assembly and annotation. We presented an overview of the challenges that these novel technologies meet and particularly illustrated various bioinformatics attempts on mapping and assembly for problem solving. We then compared the performance of several programs in these two fields, and further provided advice on selecting suitable tools for specific biological applications.