23 research outputs found

    NLP@VCU: Crop Characteristic Extraction Framework

    We developed a crop characteristic extraction framework. Starting from a custom SpaCy named entity recognition model, we added pre-trained word embeddings and a part-of-speech-based entity expansion post-processing step. We then implemented an evaluation framework that functions as a 5-fold cross-validation wrapper for SpaCy custom training. Preliminary results showed that these additions improved the extraction framework.
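    The 5-fold cross-validation wrapper mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' code: the names `k_fold_splits`, `cross_validate`, and `train_and_score` are hypothetical, and the training step is left as a caller-supplied function so the sketch does not assume any particular SpaCy API.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(
    items: Sequence[T], k: int = 5, seed: int = 0
) -> List[Tuple[List[T], List[T]]]:
    """Return k (train, test) pairs; every item lands in exactly one test fold."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and len(items)")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for repeatability
    folds = [shuffled[i::k] for i in range(k)]  # k near-equal, disjoint folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        splits.append((train, test))
    return splits


def cross_validate(
    items: Sequence[T],
    train_and_score: Callable[[List[T], List[T]], float],
    k: int = 5,
    seed: int = 0,
) -> float:
    """Call train_and_score(train, test) once per fold and average the scores.

    train_and_score is a hypothetical hook: in an NER setting it would train
    a fresh model on `train` and return a metric such as F1 on `test`.
    """
    scores = [train_and_score(tr, te) for tr, te in k_fold_splits(items, k, seed)]
    return mean(scores)
```

    Passing the training routine in as a function keeps the fold logic independent of the model library, so the same wrapper could score the baseline model and each added component on identical splits.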

    An accessible proteogenomics informatics resource for cancer researchers

    Proteogenomics has emerged as a valuable approach in cancer research. It integrates genomic and transcriptomic data with mass spectrometry–based proteomics data to directly identify expressed, variant protein sequences that may have functional roles in cancer. This approach is computationally intensive and requires integrating disparate software tools into sophisticated workflows, which hinders its adoption by nonexpert bench scientists. To address this need, we have developed an extensible, Galaxy-based resource aimed at giving more researchers access to, and training in, proteogenomic informatics. Our resource brings together software from several leading research groups to address two foundational aspects of proteogenomics: (i) generation of customized, annotated protein sequence databases from RNA-Seq data; and (ii) accurate matching of tandem mass spectrometry data to putative variants, followed by filtering to confirm their novelty. Directions for accessing software tools and workflows, along with instructional documentation, can be found at z.umn.edu/canresgithub.

    Quantitative Proteomics Reveals Myosin and Actin as Promising Saliva Biomarkers for Distinguishing Pre-Malignant and Malignant Oral Lesions

    Oral cancer survival rates increase significantly when the disease is detected and treated early. Unfortunately, clinicians currently lack tests that easily and reliably distinguish pre-malignant oral lesions from those that have already transitioned to malignancy. A test for proteins found in non-invasively collected whole saliva, whose abundances distinguish these lesion types, would meet this critical need. To discover such proteins, in a first-of-its-kind study we used advanced mass spectrometry-based quantitative proteomics analysis of the pooled soluble fraction of whole saliva from four subjects with pre-malignant lesions and four with malignant lesions. We prioritized candidate biomarkers via bioinformatics and validated selected proteins by western blotting. Bioinformatic analysis of differentially abundant proteins and initial western blotting revealed increased abundance of myosin and actin in patients with malignant lesions. We validated those results by additional western blotting of individual whole saliva samples from twelve other subjects with pre-malignant oral lesions and twelve with malignant oral lesions. Sensitivity/specificity values for distinguishing between different lesion types were 100%/75% (p = 0.002) for actin, and 67%/83% (p < 0.00001) for myosin in soluble saliva. Exfoliated epithelial cells from subjects' saliva also showed increased myosin and actin abundance in those with malignant lesions, linking our observations in soluble saliva to abundance differences between pre-malignant and malignant cells. Salivary actin and myosin abundances distinguish oral lesion types with sensitivity and specificity rivaling other non-invasive oral cancer tests. Our findings provide a promising starting point for the development of non-invasive and inexpensive salivary tests to reliably detect oral cancer early.

    Clinical Validation of Targeted Next-Generation Sequencing for Inherited Disorders

    Context: Although next-generation sequencing (NGS) can revolutionize molecular diagnostics, several hurdles remain in the implementation of this technology in clinical laboratories. Objectives: To validate and implement an NGS panel for genetic diagnosis of more than 100 inherited diseases, such as neurologic conditions, congenital hearing loss and eye disorders, developmental disorders, nonmalignant diseases treated by hematopoietic cell transplantation, familial cancers, connective tissue disorders, metabolic disorders, disorders of sexual development, and cardiac disorders. The diagnostic gene panels ranged from 1 to 54 genes, with most panels containing 10 genes or fewer. Design: We used a liquid hybridization-based, target-enrichment strategy to enrich 10 067 exons in 568 genes, followed by NGS with a HiSeq 2000 sequencing system (Illumina, San Diego, California). Results: We successfully sequenced 97.6% (9825 of 10 067) of the targeted exons to obtain a minimum coverage of 20× at all bases. We demonstrated 100% concordance in detecting 19 pathogenic single-nucleotide variations and 11 pathogenic insertion-deletion mutations ranging in size from 1 to 18 base pairs across 18 samples that were previously characterized by Sanger sequencing. Using 4 pairs of blinded, duplicate samples, we demonstrated a high degree of concordance (>99%) among the blinded, duplicate pairs. Conclusions: We have successfully demonstrated the feasibility of using the NGS platform to multiplex genetic tests for several rare diseases and the use of cloud computing for bioinformatics analysis as a relatively low-cost solution for implementing NGS in clinical laboratories.