
    Discovery of Novel Human Breast Cancer MicroRNAs from Deep Sequencing Data by Analysis of Pri-MicroRNA Secondary Structures

    MicroRNAs (miRNAs) are key regulators of gene expression and contribute to a variety of biological processes. Abnormal miRNA expression has been reported in various diseases, including breast cancer, where miRNAs regulate protumorigenic processes such as vascular invasiveness, estrogen receptor status, chemotherapy resistance, invasion and metastasis. The miRBase sequence database, a public repository for newly discovered miRNAs, has grown rapidly, with more than 10,000 entries to date. Despite this rapid growth, many miRNAs have not yet been validated, and several others are yet to be identified. The lack of a full complement of miRNAs has limited recognition of their important roles in cancer, including breast cancer. Using deep sequencing technology, we have identified 189 candidate novel microRNAs in human breast cancer cell lines with diverse tumorigenic potential. We further show that analysis of 500-nucleotide pri-microRNA secondary structure constitutes a reliable method to predict bona fide miRNAs, as judged by experimental validation. Candidate novel breast cancer miRNAs with stem lengths greater than 30 bp generated precursor and mature sequences in vivo. In contrast, candidates with stem lengths less than 30 bp were less efficient in producing mature miRNA. This approach may be used to predict, with approximately 90% accuracy, which candidate novel miRNAs from deep sequencing data would qualify as bona fide miRNAs.

    Detailed Analysis of a Contiguous 22-Mb Region of the Maize Genome

    Most of our understanding of plant genome structure and evolution has come from the careful annotation of small (e.g., 100 kb) sequenced genomic regions or from automated annotation of complete genome sequences. Here, we sequenced and carefully annotated a contiguous 22 Mb region of maize chromosome 4 using an improved pseudomolecule for annotation. The sequence segment was comprehensively ordered, oriented, and confirmed using the maize optical map. Nearly 84% of the sequence is composed of transposable elements (TEs) that are mostly nested within each other, of which most families are low-copy. We identified 544 gene models using multiple levels of evidence, as well as five miRNA genes. Gene fragments, many captured by TEs, are prevalent within this region. Elimination of gene redundancy from a tetraploid maize ancestor that originated a few million years ago is responsible in this region for most disruptions of synteny with sorghum and rice. Consistent with other sub-genomic analyses in maize, small RNA mapping showed that many small RNAs match TEs and that most TEs match small RNAs. These analyses, performed on ∼1% of the maize genome, demonstrate the feasibility of refining the B73 RefGen_v1 genome assembly by incorporating optical map, high-resolution genetic map, and comparative genomic data sets. Such improvements, along with those of gene and repeat annotation, will serve to promote future functional genomic and phylogenomic research in maize and other grasses.

    Population‐based identity‐by‐descent mapping combined with exome sequencing to detect rare risk variants for schizophrenia

    Genome‐wide association studies (GWASs) are highly effective at identifying common risk variants for schizophrenia. Rare risk variants are also important contributors to schizophrenia etiology but, with the exception of large copy number variants, are difficult to detect with GWAS. Exome and genome sequencing, which have accelerated the study of rare variants, are expensive, so alternative methods are needed to aid detection of rare variants. Here we re‐analyze an Irish schizophrenia GWAS dataset (n = 3,473) by performing identity‐by‐descent (IBD) mapping followed by exome sequencing of individuals identified as sharing risk haplotypes to search for rare risk variants in coding regions. We identified 45 rare haplotypes (>1 cM) that were significantly more common in cases than controls. By exome sequencing 105 haplotype carriers, we investigated these haplotypes for functional coding variants that could be tested for association in independent GWAS samples. We identified one rare missense variant in PCNT but did not find statistical support for an association with schizophrenia in a replication analysis. Nevertheless, IBD mapping can prioritize both individual samples and genomic regions for follow‐up analysis, although genome rather than exome sequencing may be more effective at detecting risk variants on rare haplotypes. Funding: National Institutes of Health, Grant/Award Numbers: R01‐MH041953, R01‐MH083094; Science Foundation Ireland, Grant/Award Numbers: 08/IN.1/B1916, 12/IP/1359, 12/IP/1670; Wellcome Trust, Grant/Award Number: 085475/B/08/

    Familial long-read sequencing increases yield of de novo mutations

    Studies of de novo mutation (DNM) have typically excluded some of the most repetitive and complex regions of the genome because these regions cannot be unambiguously mapped with short-read sequencing data. To better understand the genome-wide pattern of DNM, we generated long-read sequence data from an autism parent-child quad with an affected female where no pathogenic variant had been discovered in short-read Illumina sequence data. We deeply sequenced all four individuals by using three sequencing platforms (Illumina, Oxford Nanopore, and Pacific Biosciences) and three complementary technologies (Strand-seq, optical mapping, and 10X Genomics). Using long-read sequencing, we initially discovered and validated 171 DNMs across two children, a 20% increase in the number of de novo single-nucleotide variants (SNVs) and indels when compared to short-read callsets. The number of DNMs further increased by 5% when considering a more complete human reference (T2T-CHM13) because of the recovery of events in regions absent from GRCh38 (e.g., three DNMs in heterochromatic satellites). In total, we validated 195 de novo germline mutations and 23 potential post-zygotic mosaic mutations across both children; the overall true substitution rate based on this integrated callset is at least 1.41 × 10⁻⁸ substitutions per nucleotide per generation. We also identified six de novo insertions and deletions in tandem repeats, two of which represent structural variants. We demonstrate that long-read sequencing and assembly, especially when combined with a more complete reference genome, increase the number of DNMs by >25% compared to previous studies, providing a more complete catalog of DNM than short-read data alone.

    Temperature-Dependent Development, Cold Tolerance, and Potential Distribution of Cricotopus lebetis (Diptera: Chironomidae), a Tip Miner of Hydrilla verticillata (Hydrocharitaceae)

    © The Author 2014. Published by Oxford University Press on behalf of the Entomological Society of America. A chironomid midge, Cricotopus lebetis (Sublette) (Diptera: Chironomidae), was discovered attacking the apical meristems of Hydrilla verticillata (L.f.) Royle in Crystal River, Citrus Co., Florida in 1992. The larvae mine the stems of H. verticillata and cause basal branching and stunting of the plant. Temperature-dependent development, cold tolerance, and the potential distribution of the midge were investigated. The results of the temperature-dependent development study showed that optimal temperatures for larval development were between 20 and 30°C, and these data were used to construct a map of the potential number of generations per year of C. lebetis in Florida. Data from the cold tolerance study, in conjunction with historical weather data, were used to generate a predicted distribution of C. lebetis in the United States. A distribution was also predicted using an ecological niche modeling approach by characterizing the climate at locations where C. lebetis is known to occur and then finding other locations with similar climate. The distributions predicted using the two modeling approaches were not significantly different and suggested that much of the southeastern United States was climatically suitable for C. lebetis.