27 research outputs found

    Knowledge-Based Analysis of Genomic Expression Data by Using Different Machine Learning Algorithms for the Purpose of Diagnostic, Prognostic or Therapeutic Application

    With ever more biological data being generated, the most pressing task in bioinformatics has become the analysis and interpretation of many types of data, including nucleotide and amino acid sequences, protein structures, and gene expression profiles. In this dissertation, we apply the data mining techniques of feature generation, feature selection, and feature integration, together with learning algorithms, to the problems of disease phenotype classification, clinical outcome prediction, and patient survival prediction from gene expression profiles. We analyzed the effect of batch noise in microarray data on classification performance. Batchmatch, a batch-adjustment algorithm based on a double-scaling method, proved advantageous over ComBat, a batch-correction algorithm based on the empirical Bayes framework. To identify genes associated with disease phenotype classification or patient survival prediction from gene expression data, we compared the performance of five feature selection algorithms. Our observations indicated that the Gainratio algorithm performs better and more consistently than the other algorithms studied. As a performance metric for choosing the best classifiers, the Matthews correlation coefficient (MCC) gives a less biased assessment than accuracy at endpoints where class imbalance is greater. Among classification algorithms, no single algorithm is superior to all others, though SVM achieved fairly good results at most endpoints, and the Naive Bayes algorithm also performed well at some endpoints. Overall, of the 65 models we reported (the top 5 models for each of 13 endpoints), SVM and SMO (a variant of SVM) dominate, and the linear kernel outperformed the RBF kernel in our binary classifications.
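
The abstract's point about MCC and accuracy under class imbalance can be illustrated with a minimal sketch. The labels below are hypothetical and not taken from the dissertation: a classifier that always predicts the majority class reaches high accuracy, while its MCC is zero. The function names are illustrative, and returning 0.0 when the MCC denominator vanishes is a common convention assumed here.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 by convention if the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical imbalanced endpoint: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority (negative) class.
y_pred = [0] * 100

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", mcc(*counts))
```

Here accuracy rewards the trivial classifier for the 95 easy negatives, whereas MCC reflects that it never identifies a positive case.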

    RiboaptDB: A Comprehensive Database of Ribozymes and Aptamers

    BACKGROUND: Catalytic RNA molecules are called ribozymes. Aptamers are DNA or RNA molecules that have been selected from vast populations of random sequences through a combinatorial approach known as SELEX. The selected oligonucleotide sequences (~200 bp in length) can recognize a broad range of specific ligands by forming binding pockets. These novel aptamer sequences can bind to nucleic acids, proteins, or small organic and inorganic chemical compounds, and have many potential uses in medicine and technology. RESULTS: RiboaptDB contains comprehensive sequence information on aptamers and ribozymes generated by in vitro selection methods. Such unnatural sequence data are not available in the public 'natural' sequence databases such as GenBank and EMBL. The amount of sequence data generated by in vitro selection experiments has been accumulating exponentially. RiboaptDB holds 4212 sequences in total, comprising 370 artificial ribozyme sequences and 3842 aptamer sequences, drawn from 423 citations. The database provides a general search, searches on individual features, an online submission form for new data, and a local BLAST search. CONCLUSION: This database, besides serving as a storehouse of sequences that may have diagnostic or therapeutic utility in medicine, provides valuable information for computational and theoretical biologists. RiboaptDB is useful for gathering information about in vitro selection experiments as a whole and for better understanding the distribution of functional nucleic acids in sequence space. The database is updated regularly and is publicly available at

    Analysis and functional annotation of expressed sequence tags from the fall armyworm Spodoptera frugiperda

    BACKGROUND: Little is known about the genome sequences of lepidopteran insects, although this group of insects has been studied extensively in the fields of endocrinology, development, immunity, and pathogen-host interactions. In addition, cell lines derived from Spodoptera frugiperda and other lepidopteran insects are routinely used for baculovirus foreign gene expression. This study reports the results of an expressed sequence tag (EST) sequencing project in cells from the lepidopteran insect S. frugiperda, the fall armyworm. RESULTS: We have constructed an EST database using two cDNA libraries from the S. frugiperda-derived cell line, SF-21. The database consists of 2,367 ESTs which were assembled into 244 contigs and 951 singlets for a total of 1,195 unique sequences. CONCLUSION: S. frugiperda is an agriculturally important pest insect and genomic information will be instrumental for establishing initial transcriptional profiling and gene function studies, and for obtaining information about genes manipulated during infections by insect pathogens such as baculoviruses


    Aptamer base: a collaborative knowledge base to describe aptamers and SELEX experiments

    Over the past several decades, rapid developments in both molecular and information technology have collectively increased our ability to understand molecular recognition. One emerging area of interest in molecular recognition research is the isolation of aptamers. Aptamers are single-stranded nucleic acid or amino acid polymers that recognize and bind to targets with high affinity and selectivity. While research has focused on collecting aptamers and their interactions, most of the information regarding experimental methods remains in the unstructured, textual format of peer-reviewed publications. To address this, we present the Aptamer Base, a database that provides detailed, structured information about the experimental conditions under which aptamers were selected and their binding affinity quantified. The open, collaborative nature of the Aptamer Base provides the community with a unique resource that can be updated and curated in a decentralized manner, thereby accommodating the ever-evolving field of aptamer research.

    Cross-oncopanel study reveals high sensitivity and accuracy with overall analytical performance depending on genomic regions

    BACKGROUND: Targeted sequencing using oncopanels requires comprehensive assessments of accuracy and detection sensitivity to ensure analytical validity. By employing reference materials characterized by the U.S. Food and Drug Administration-led SEquence Quality Control project phase 2 (SEQC2) effort, we perform a cross-platform multi-lab evaluation of eight Pan-Cancer panels to assess best practices for oncopanel sequencing. RESULTS: All panels demonstrate high sensitivity across targeted high-confidence coding regions and variant types for the variants previously verified to have variant allele frequency (VAF) in the 5-20% range. Sensitivity is reduced by utilizing VAF thresholds due to inherent variability in VAF measurements. Enforcing a VAF threshold for reporting has a positive impact on reducing false positive calls. Importantly, the false positive rate is found to be significantly higher outside the high-confidence coding regions, resulting in lower reproducibility. Thus, region restriction and VAF thresholds lead to low relative technical variability in estimating promising biomarkers and tumor mutational burden. CONCLUSION: This comprehensive study provides actionable guidelines for oncopanel sequencing and clear evidence that supports a simplified approach to assess the analytical performance of oncopanels. It will facilitate the rapid implementation, validation, and quality control of oncopanels in clinical use.

    Transcriptome Analysis of Frog Virus 3, the Type Species of the Genus Ranavirus, Family Iridoviridae

    Frog virus 3 (FV3) is the best characterized species within the genus Ranavirus, family Iridoviridae. FV3's large (~105 kbp) dsDNA genome encodes 98 putative open reading frames (ORFs) that are expressed in a coordinated fashion, leading to the sequential appearance of immediate early (IE), delayed early (DE) and late (L) viral transcripts. As a step toward elucidating molecular events in FV3 replication, we sought to identify the temporal class of viral messages. To accomplish this objective, an oligonucleotide microarray containing 70-mer probes corresponding to each of the 98 FV3 ORFs was designed and used to examine viral gene expression. Viral transcription was initially monitored during the course of a productive replication cycle at 2, 4 and 9 h after infection. To confirm the results of the time course assay, viral gene expression was also monitored in the presence of cycloheximide (CHX), which limits expression to only IE genes, and following infection with a temperature-sensitive (ts) mutant which at non-permissive temperatures is defective in viral DNA synthesis and blocked in late gene expression. Subsequently, microarray analyses were validated by RT-PCR and qRT-PCR. Using these approaches we identified 33 IE genes, 22 DE genes and 36 L viral genes. The temporal class of the 7 remaining genes could not be determined. Comparison of protein function with temporal class indicated that, in general, genes encoding putative regulatory factors, or proteins that played a part in nucleic acid metabolism and immune evasion, were classified as IE and DE genes, whereas those involved in DNA packaging and virion assembly were considered L genes. Information on temporal class will provide the basis for determining whether members of the same temporal class contain common upstream regulatory regions and perhaps allow us to identify virion-associated and virus-induced proteins that control viral gene expression. © 2009 Elsevier Inc. All rights reserved.

    Identification and Expression Analyses of Poly [I:C]-Stimulated Genes in Channel Catfish (Ictalurus punctatus)

    Channel catfish (Ictalurus punctatus) have proven to be an excellent model with which to study immune responses of lower vertebrates. Identification of anti-viral antibodies and cytotoxic cells, as well as both type I and II interferon (IFN), demonstrates that catfish likely mount a vigorous anti-viral immune response. In this report, we focus on other elements of the anti-viral response, and identify more than two dozen genes that are induced following treatment of catfish cells with poly [I:C]. We showed that poly [I:C] induced type I interferon within 2 h of treatment, and that characteristic interferon stimulated genes (ISGs) appeared 6–12 h after exposure. Among the ISGs detected by RT-PCR assay were homologs of ISG15, Mx1, IFN regulatory factor 1 (IRF-1), inhibitor of apoptosis protein-1 (IAP-1) and the chemokine CXCL10. Microarray analyses showed that 13 and 24 cellular genes, respectively, were upregulated in poly [I:C]-treated B cell and fibroblast cultures. Although many of these genes were novel and did not fit the profile of mammalian ISGs, there were several (ISG-15, ubiquitin-conjugating enzyme E2G1, integrin-linked kinase, and clathrin-associated protein 47) that were identified as ISGs in mammalian systems. Taken together, these results suggest that dsRNA, either directly or through the prior induction of IFN, upregulates catfish gene products that function individually and/or collectively to inhibit virus replication