3,284 research outputs found

    SNPredict: A Machine Learning Approach for Detecting Low Frequency Variants in Cancer

    Cancer is a genetic disease caused by the accumulation of DNA variants such as single nucleotide changes or insertions/deletions. DNA variants can silence tumor suppressor genes or increase the activity of oncogenes. To develop successful therapies for cancer patients, these DNA variants need to be identified accurately. They can be identified by comparing the DNA sequence of tumor tissue to that of non-tumor tissue using Next Generation Sequencing (NGS) technology. However, detecting variants in cancer is hard because many of these variants occur only in a small subpopulation of the tumor tissue. It is therefore a challenge to distinguish these low frequency variants from sequencing errors, which are common in today's NGS methods. Several algorithms have been developed and implemented as tools to identify such variants in cancer. However, it has previously been shown that there is low concordance among the results produced by these tools. Moreover, the number of false positives tends to increase significantly when these tools are faced with low frequency variants. This study presents SNPredict, a single nucleotide polymorphism (SNP) detection pipeline that uses machine learning techniques to combine the results of multiple variant callers into a consensus output with higher accuracy than any of the individual tools. By extracting features from the consensus output that describe traits associated with an individual variant call, it creates binary classifiers that predict a SNP's true state and therefore help in distinguishing a sequencing error from a true variant.
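
    The consensus step described in this abstract can be illustrated with a minimal sketch. It is not the SNPredict implementation: the caller names, the call sets, and the chosen features (per-caller flags, caller count, mean allele frequency) are hypothetical, and a simple majority vote stands in for the trained binary classifiers.

```python
from collections import defaultdict

# Hypothetical call sets: caller name -> {(chrom, pos, ref, alt): allele frequency}
calls = {
    "caller_a": {("chr1", 100, "A", "T"): 0.04, ("chr2", 250, "G", "C"): 0.31},
    "caller_b": {("chr1", 100, "A", "T"): 0.05},
    "caller_c": {("chr1", 100, "A", "T"): 0.03, ("chr3", 77, "C", "A"): 0.02},
}


def build_features(calls):
    """Merge per-caller calls into one feature row per variant."""
    callers = sorted(calls)
    merged = defaultdict(dict)
    for caller, variants in calls.items():
        for key, allele_freq in variants.items():
            merged[key][caller] = allele_freq
    rows = {}
    for key, per_caller in merged.items():
        # One 0/1 flag per caller, in sorted caller order
        flags = [1 if c in per_caller else 0 for c in callers]
        mean_af = sum(per_caller.values()) / len(per_caller)
        rows[key] = {"called_by": flags, "n_callers": sum(flags), "mean_af": mean_af}
    return rows


def majority_vote(rows, n_total):
    """Baseline stand-in for a classifier: keep variants reported by more than half of the callers."""
    return {key for key, row in rows.items() if row["n_callers"] * 2 > n_total}


features = build_features(calls)
consensus = majority_vote(features, len(calls))
```

    In a pipeline of the kind the abstract describes, rows like these would be paired with validated labels and passed to a binary classifier in place of the majority vote.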

    Multidimensional chemical control of CRISPR–Cas9

    Cas9-based technologies have transformed genome engineering and the interrogation of genomic functions, but methods to control such technologies across numerous dimensions (including dose, time, specificity, and mutually exclusive modulation of multiple genes) are still lacking. We conferred such multidimensional controls to diverse Cas9 systems by leveraging small-molecule-regulated protein degron domains. Application of our strategy to both Cas9-mediated genome editing and transcriptional activities opens new avenues for systematic genome interrogation.

    CRISPR-TSKO: a technique for efficient mutagenesis in specific cell types, tissues, or organs in Arabidopsis

    Detailed functional analyses of many fundamentally important plant genes via conventional loss-of-function approaches are impeded by the severe pleiotropic phenotypes resulting from these losses. In particular, mutations in genes that are required for basic cellular functions and/or reproduction often interfere with the generation of homozygous mutant plants, precluding further functional studies. To overcome this limitation, we devised a clustered regularly interspaced short palindromic repeats (CRISPR)-based tissue-specific knockout system, CRISPR-TSKO, enabling the generation of somatic mutations in particular plant cell types, tissues, and organs. In Arabidopsis (Arabidopsis thaliana), CRISPR-TSKO mutations in essential genes caused well-defined, localized phenotypes in the root cap, stomatal lineage, or entire lateral roots. The modular cloning system developed in this study allows for the efficient selection, identification, and functional analysis of mutant lines directly in the first transgenic generation. The efficacy of CRISPR-TSKO opens avenues for discovering and analyzing gene functions in the spatial and temporal contexts of plant life while avoiding the pleiotropic effects of system-wide losses of gene function.

    Evolution of two distinct phylogenetic lineages of the emerging human pathogen Mycobacterium ulcerans

    Background: Comparative genomics has greatly improved our understanding of the evolution of pathogenic mycobacteria such as Mycobacterium tuberculosis. Here we have used data from a genome microarray analysis to explore insertion-deletion (InDel) polymorphism among a diverse strain collection of Mycobacterium ulcerans, the causative agent of the devastating skin disease, Buruli ulcer. Detailed analysis of large sequence polymorphisms in twelve regions of difference (RDs), comprising irreversible genetic markers, enabled us to refine the phylogenetic succession within M. ulcerans, to define features of a hypothetical M. ulcerans most recent common ancestor and to confirm its origin from Mycobacterium marinum. Results: M. ulcerans has evolved into five InDel haplotypes that separate into two distinct lineages: (i) the "classical" lineage including the most pathogenic genotypes, those that come from Africa, Australia and South East Asia; and (ii) an "ancestral" M. ulcerans lineage comprising strains from Asia (China/Japan), South America and Mexico. The ancestral lineage is genetically closer to the progenitor M. marinum in both RD composition and DNA sequence identity, whereas the classical lineage has undergone major genomic rearrangements. Conclusion: Results of the InDel analysis are in complete accord with recent multi-locus sequence analysis and indicate that M. ulcerans has passed through at least two major evolutionary bottlenecks since divergence from M. marinum. The classical lineage shows more pronounced reductive evolution than the ancestral lineage, suggesting that there may be differences in the ecology between the two lineages. These findings improve the understanding of the adaptive evolution and virulence of M. ulcerans and pathogenic mycobacteria in general and will facilitate the development of new tools for improved diagnostics and molecular epidemiology.

    Development of a RAD-Seq Based DNA Polymorphism Identification Software, AgroMarker Finder, and Its Application in Rice Marker-Assisted Breeding

    Rapid and accurate genome-wide marker detection is essential to marker-assisted breeding and functional genomics studies. In this work, we developed an integrated software tool, AgroMarker Finder (AMF: http://erp.novelbio.com/AMF), which provides a graphical user interface (GUI) to facilitate analysis of data from the recently developed restriction-site associated DNA (RAD) sequencing method in rice. Using AMF, a total of 90,743 high-quality markers (82,878 SNPs and 7,865 InDels) were detected between rice varieties JP69 and Jiaoyuan5A. The density of the identified markers is 0.2 per Kb for SNP markers and 0.02 per Kb for InDel markers. Sequencing validation revealed that the accuracy of genome-wide marker detection by AMF is 93%. In addition, a validated subset of 82 SNPs and 31 InDels was found to be closely linked to 117 important agronomic trait genes, providing a basis for subsequent marker-assisted selection (MAS) and variety identification. Furthermore, we selected 12 of the 31 validated InDel markers to verify the seed authenticity of variety Jiaoyuanyou69, and we also identified 10 markers closely linked to the fragrance gene BADH2 to minimize linkage drag in the selection of Wuxiang075 (BADH2 donor)/Jiachang1 recombinants. This software therefore provides an efficient approach for marker identification from RAD-seq data, and it would be a valuable tool for plant MAS and variety protection.

    Use of tiling array data and RNA secondary structure predictions to identify noncoding RNA genes

    Background: Within the last decade a large number of noncoding RNA genes have been identified, but this may only be the tip of the iceberg. Using comparative genomics, a large number of sequences that have signals concordant with conserved RNA secondary structures have been discovered in the human genome. Moreover, genome-wide transcription profiling with tiling arrays indicates that the majority of the genome is transcribed. Results: We have combined tiling array data with genome-wide structural RNA predictions to search for novel noncoding and structural RNA genes that are expressed in the human neuroblastoma cell line SK-N-AS. Using this strategy, we identify thousands of human candidate RNA genes. To further verify the expression of these genes, we focused on candidate genes that had stable hairpin structures or a high level of covariance. Using northern blotting, we verify the expression of 2 out of 3 of the hairpin structures and 3 out of 9 high covariance structures in SK-N-AS cells. Conclusion: Our results demonstrate that many human noncoding, structured and conserved RNA genes remain to be discovered and that tissue-specific tiling array data can be used in combination with computational predictions of sequences encoding structural RNAs to improve the search for such genes.

    Gene Expression Profiles Deciphering Rice Phenotypic Variation between Nipponbare (Japonica) and 93-11 (Indica) during Oxidative Stress

    Rice is a very important food staple that feeds more than half the world's population. Two major Asian cultivated rice (Oryza sativa L.) subspecies, japonica and indica, show significant phenotypic variation in their stress responses. However, the molecular mechanisms underlying this phenotypic variation are still largely unknown. A common link among different stresses is that they produce an oxidative burst and result in an increase of reactive oxygen species (ROS). In this study, methyl viologen (MV) was applied as a ROS agent to investigate the rice oxidative stress response. We observed that 93-11 (indica) seedlings exhibited leaf senescence with severe lesions under MV treatment compared to Nipponbare (japonica). Whole-genome microarray experiments were conducted, and 1,062 probe sets were identified that showed gene expression level polymorphisms between the two rice cultivars in addition to differential expression under MV treatment; these were designated Core Intersectional Probesets (CIPs). Gene ontology (GO) analysis of these CIPs highlighted enriched GO terms related to toxin and oxidative stress responses as well as other responses. The GO term-enriched genes among the CIPs include glutathione S-transferases (GSTs), P450, plant defense genes, and secondary metabolism related genes such as chalcone synthase (CHS). Further insertion/deletion (InDel) and regulatory element analyses of the identified CIPs suggested that there may be some eQTL hotspots related to oxidative stress in the rice genome, such as GST genes encoded on chromosome 10. In addition, we identified a group of marker genes that distinguish the japonica and indica subspecies. In summary, we developed a new strategy combining biological experiments and data mining to study the possible molecular mechanism of phenotypic variation during oxidative stress between Nipponbare and 93-11. This study will aid in the analysis of the molecular basis of quantitative traits.