
    Semantically linking and browsing PubMed abstracts with gene ontology

    Background: Technological advances over the past decade have led to massive progress in biotechnology, and that progress is documented in research articles. PubMed is currently the most widely used repository of biomedical literature. As of 2007 it held about 17 million abstracts, a volume that calls for efficient methods to retrieve and browse relevant information. State-of-the-art tools such as GOPubmed use simple keyword-based techniques to retrieve abstracts from PubMed and link them to the Gene Ontology (GO). This paper changes that paradigm by introducing SEGOPubmed, a semantics-enabled technique that links PubMed to the Gene Ontology for ontology-based browsing. A Latent Semantic Analysis (LSA) framework is used to interface PubMed abstracts semantically with the Gene Ontology. Results: An empirical analysis compares the performance of SEGOPubmed with that of GOPubmed, first using a few well-referenced query words and then through a statistical analysis that uses a GO curated dataset as ground truth. The analysis suggests that SEGOPubmed performs better than the classic GOPubmed because it incorporates semantics. Conclusions: The LSA technique is applied to the PubMed abstracts retrieved for a user query to compute the semantic similarity between the query and those abstracts. The analyses using well-referenced keywords show that the proposed semantics-sensitive technique outperformed string-comparison-based techniques in associating relevant abstracts with GO terms. SEGOPubmed also extracted abstracts in which the keywords do not appear in isolation (i.e., they appear in combination with other terms), which simple term-matching techniques could not retrieve.
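    A minimal sketch of the LSA idea in Python, assuming raw term counts, whitespace tokenization, a rank of k = 2 and three toy abstracts (all hypothetical; this is not SEGOPubmed's implementation, weighting scheme or GO-linking step). To stay dependency-free it obtains the truncated SVD by power iteration on A^T A instead of a library SVD routine. Abstracts and query are both projected onto the top-k left singular vectors U and compared by cosine similarity.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def term_doc_matrix(docs):
    """Raw term counts: one row per vocabulary term, one column per document."""
    vocab = sorted({t for d in docs for t in d.lower().split()})
    index = {t: i for i, t in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for term, count in Counter(d.lower().split()).items():
            A[index[term]][j] = float(count)
    return index, A


def top_eigenpairs(C, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix by power iteration with deflation."""
    n = len(C)
    C = [row[:] for row in C]
    pairs = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(n)]  # fixed, non-uniform start vector
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(n)) for i in range(n))
        if lam <= 1e-9:
            break  # remaining spectrum is numerically zero
        pairs.append((lam, v))
        for i in range(n):  # deflate: C <- C - lam * v v^T
            for j in range(n):
                C[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_similarities(docs, query, k=2):
    """Cosine similarity between a query and each document in a rank-k LSA space."""
    index, A = term_doc_matrix(docs)
    m, n = len(A), len(docs)
    # Eigenvectors of C = A^T A are the right singular vectors V; eigenvalues are sigma^2.
    C = [[sum(A[t][i] * A[t][j] for t in range(m)) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(C, k)
    sigmas = [math.sqrt(lam) for lam, _ in pairs]
    # Document j projected onto U: U^T d_j = (sigma_1 V[j,1], ..., sigma_k V[j,k]).
    doc_vecs = [[s * v[j] for s, (_, v) in zip(sigmas, pairs)] for j in range(n)]
    # Query projected onto U: U^T q = Sigma^{-1} V^T A^T q, since U = A V Sigma^{-1}.
    q = [0.0] * m
    for term in query.lower().split():
        if term in index:
            q[index[term]] += 1.0
    atq = [sum(A[t][j] * q[t] for t in range(m)) for j in range(n)]
    q_vec = [sum(v[j] * atq[j] for j in range(n)) / s for s, (_, v) in zip(sigmas, pairs)]
    return [cosine(q_vec, d) for d in doc_vecs]


docs = [
    "apoptosis caspase cell death",
    "caspase cell death pathway",
    "ribosome translation protein synthesis",
]
sims = lsa_similarities(docs, "apoptosis")
```

    With these toy abstracts the second one scores about as high as the first for the query "apoptosis" even though it never contains that word, because it shares "caspase", "cell" and "death" with the first; the unrelated third abstract scores near zero. That is the kind of association plain term matching misses.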

    REDHORSE-REcombination and Double crossover detection in Haploid Organisms using next-geneRation SEquencing data

    BACKGROUND: Next-generation sequencing (NGS) technology makes it possible to study genetic exchange at a higher resolution than earlier technologies allowed. This improvement also presents challenges: alignments of NGS data to a reference genome cannot be used directly as input to existing recombination detection algorithms, which typically take multiple sequence alignments as input. We therefore designed a software suite called REDHORSE that uses genomic alignments, extracts genetic markers, and generates multiple sequence alignments that can be used as input to existing recombination detection algorithms. In addition, REDHORSE implements a custom recombination detection algorithm that uses sequence information and genomic positions to detect crossovers accurately. REDHORSE is a portable, platform-independent suite that provides efficient analysis of genetic crosses based on NGS data. RESULTS: We demonstrated the utility of REDHORSE using both simulated and real NGS data. The simulated dataset mimicked recombination between two known haploid parental strains, which allowed detected breakpoints to be compared against known true breakpoints to assess the performance of recombination detection algorithms. A newly generated NGS dataset from a genetic cross of Toxoplasma gondii allowed us to demonstrate the pipeline. REDHORSE successfully extracted the relevant genetic markers and transformed the NGS read alignments to the genome into multiple sequence alignments. The recombination detection algorithm in REDHORSE detected conventional crossovers as well as the double crossovers typically associated with gene conversions, while filtering out artifacts that might have been introduced during sequencing or alignment. REDHORSE outperformed other commonly used recombination detection algorithms in finding conventional crossovers. In addition, REDHORSE was the only algorithm able to detect double crossovers. CONCLUSION: REDHORSE is an efficient analytical pipeline that bridges genomic alignments and existing recombination detection algorithms. Moreover, REDHORSE is equipped with a recombination detection algorithm designed specifically for NGS data. REDHORSE is a portable, platform-independent, Java-based utility that provides efficient analysis of genetic crosses based on NGS data. REDHORSE is available at http://redhorse.sourceforge.net/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1309-7) contains supplementary material, which is available to authorized users.
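    The abstract does not describe REDHORSE's own detection algorithm, so the following is only a hypothetical Python sketch of the general idea for a haploid progeny: each marker is labelled with the parent whose allele it carries, runs supported by fewer than `min_markers` markers are discarded as likely sequencing or alignment artifacts, and a short internal run flanked by the other parent is reported as a double crossover. The thresholds `min_markers` and `max_dco_bp`, the function names and the two-parent assumption are all illustrative choices, not REDHORSE parameters. Note the trade-off: a single-marker gene conversion tract would be filtered out as noise.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    parent: str     # parental label shared by consecutive markers
    start: int      # position of the first marker in the run
    end: int        # position of the last marker in the run
    n_markers: int


def _runs(markers):
    """Collapse sorted (position, parent) calls into runs of identical parental origin."""
    segs = []
    for pos, parent in markers:
        if segs and segs[-1].parent == parent:
            segs[-1].end = pos
            segs[-1].n_markers += 1
        else:
            segs.append(Segment(parent, pos, pos, 1))
    return segs


def _merge(segs):
    """Join neighbouring runs that share a parent once short runs have been dropped."""
    merged = []
    for s in segs:
        if merged and merged[-1].parent == s.parent:
            merged[-1].end = s.end
            merged[-1].n_markers += s.n_markers
        else:
            merged.append(Segment(s.parent, s.start, s.end, s.n_markers))
    return merged


def detect_crossovers(markers, min_markers=2, max_dco_bp=10_000):
    """Return ("single", left, right) breakpoint intervals and ("double", start, end) tracts."""
    runs = [s for s in _runs(sorted(markers)) if s.n_markers >= min_markers]
    segs = _merge(runs)
    events = []
    i = 1
    while i < len(segs):
        inner = segs[i]
        if i + 1 < len(segs) and inner.end - inner.start <= max_dco_bp:
            # Short internal tract flanked by the other parent: two switches close together.
            events.append(("double", inner.start, inner.end))
            i += 2
        else:
            # Conventional crossover between the last and first supporting markers.
            events.append(("single", segs[i - 1].end, inner.start))
            i += 1
    return events


# Toy marker calls for one chromosome (positions in bp; parents "A" and "B").
markers = [
    (1000, "A"), (2000, "A"), (3000, "A"), (4000, "B"),  # lone "B" call: likely artifact
    (5000, "A"), (6000, "A"), (50000, "A"),
    (60000, "B"), (61000, "B"),                          # short "B" tract
    (62000, "A"), (63000, "A"), (200000, "A"),
    (300000, "B"), (310000, "B"),
]
events = detect_crossovers(markers)
```

    On these toy calls the lone "B" marker at 4,000 bp is discarded, the 60,000 to 61,000 bp tract is reported as a double crossover, and a conventional crossover is placed between 200,000 and 300,000 bp.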

    Identification of transcriptional regulatory networks specific to pilocytic astrocytoma.

    Background: Pilocytic astrocytomas (PAs) are common low-grade central nervous system malignancies for which few recurrent and specific genetic alterations have been identified. To better understand the molecular biology underlying the pathogenesis of these pediatric brain tumors, we performed higher-order transcriptional network analysis of a large gene expression dataset to identify gene regulatory pathways that are specific to this tumor type relative to other, more aggressive glial or histologically distinct brain tumors. Methods: RNA derived from frozen human PA tumors was subjected to microarray-based gene expression profiling using Affymetrix U133Plus2 GeneChip microarrays. After appropriate normalization, this dataset was compared with similar datasets previously generated from non-malignant human brain tissue and other brain tumor types. Results: We examined gene expression in 66 PA tumors compared with 15 non-malignant cortical brain tissues and identified 792 genes that showed consistent differential expression between independent sets of PA and non-malignant specimens. From this 792-gene set, we used the previously described PAP tool to assemble a core transcriptional regulatory network of 6 transcription factor (TF) genes and 24 target genes, with a total of 55 interactions. A similar analysis of oligodendroglioma and glioblastoma multiforme (GBM) gene expression datasets identified distinct but overlapping networks. Most importantly, comparing the tumor-type-specific networks revealed a network unique to PA that included repressed expression of ONECUT2, a gene frequently methylated in other tumor types, and 13 other uniquely predicted TF-gene interactions. Conclusions: These results suggest specific transcriptional pathways that may operate to create the unique molecular phenotype of PA, and thus opportunities for corresponding targeted therapeutic intervention. Moreover, this study demonstrates that integrating gene expression data with TF-gene and TF-TF interaction data is a powerful approach for generating testable hypotheses about cell-type-specific genetic programs relevant to cancer.
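    The comparison step, finding interactions present in one tumor type's network but in no other, can be sketched in Python with plain set operations, assuming each network is stored as a set of (TF, target, sign) edges. The PAP tool itself is not reproduced here, and every TF and gene name below is a hypothetical placeholder rather than data from the study.

```python
def tumor_specific_edges(networks, focus):
    """Edges present in networks[focus] and absent from every other network."""
    others = set()
    for name, edges in networks.items():
        if name != focus:
            others |= edges
    return networks[focus] - others


def pairwise_overlap(networks):
    """Number of shared edges for each unordered pair of networks."""
    names = sorted(networks)
    return {
        (a, b): len(networks[a] & networks[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical toy networks: "+" marks activation, "-" marks repression.
networks = {
    "PA": {("TF_A", "GENE_1", "+"), ("TF_A", "GENE_2", "-"), ("TF_B", "GENE_3", "+")},
    "GBM": {("TF_A", "GENE_1", "+"), ("TF_C", "GENE_4", "+")},
    "oligodendroglioma": {("TF_B", "GENE_3", "+"), ("TF_C", "GENE_5", "-")},
}
pa_only = tumor_specific_edges(networks, "PA")
overlap = pairwise_overlap(networks)
```

    Keeping the sign in each edge tuple means that an interaction predicted as activating in one tumor type and repressing in another is counted as distinct, which is one possible design choice among several.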

    Self-hybridization in Leishmania major

    Genetic exchange between differen

    Whole genome sequencing of experimental hybrids supports meiosis-like sexual recombination in Leishmania

    Hybrid genotypes have been repeatedly described among natural isolates of Leishmania, and the recovery of experimental hybrids from sand flies co-infected with different strains or species of Leishmania has formally demonstrated that members of the genus possess the machinery for genetic exchange. Because neither gamete stages nor cell fusion events have been directly observed during parasite development in the vector, we relied on classical genetic analysis to determine whether Leishmania has a true sexual cycle. Here, we used whole genome sequencing to follow the chromosomal inheritance patterns of experimental hybrids generated within and between different strains of L. major and L. infantum. We also generated and sequenced the first experimental hybrids in L. tropica. We found that, in each case, the parental somy and allele contributions matched the inheritance patterns expected under meiosis 97–99% of the time. The hybrids were equivalent to F1 progeny: heterozygous throughout most of the genome at markers that were homozygous and different between the parents. Rare, non-Mendelian patterns of chromosomal inheritance were observed, including gain or loss of somy and loss of heterozygosity, which likely arose during meiosis or during mitotic divisions of the progeny clones in the fly or in culture. While the interspecies hybrids appeared to be sterile, the intraspecies hybrids were able to produce backcross and outcross progeny. Analysis of 5 backcross and outcross progeny clones generated from an L. major F1 hybrid, together with 17 progeny clones generated from backcrosses involving a natural hybrid of L. tropica, revealed genome-wide patterns of recombination, demonstrated that classical crossing over occurs at meiosis, and allowed us to construct the first physical and genetic maps in Leishmania. Altogether, the findings provide strong evidence for meiosis-like sexual recombination in Leishmania, presenting clear opportunities for forward genetic analysis and positional cloning of important genes.
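    The F1 test described above, heterozygosity at markers that are homozygous and different between the parents, can be sketched in Python as follows. The function name, the genotype encoding (a tuple of allele calls per locus, whose length reflects somy) and the toy genotypes are hypothetical; the sketch ignores allele dosage, so it checks which parental alleles are present but not the somy-specific allele ratios the study also examined.

```python
def classify_informative_loci(parent1, parent2, hybrid):
    """Classify hybrid genotypes at loci where the parents are homozygous for different alleles.

    Each argument is a list with one genotype per locus; a genotype is a tuple of allele
    calls. Allele dosage (somy) is ignored: only the set of alleles present is compared.
    """
    counts = {"biparental": 0, "uniparental_p1": 0, "uniparental_p2": 0, "other": 0}
    for g1, g2, h in zip(parent1, parent2, hybrid):
        a1, a2 = set(g1), set(g2)
        if len(a1) != 1 or len(a2) != 1 or a1 == a2:
            continue  # not informative for tracing parental contributions
        present = set(h)
        if present == a1 | a2:
            counts["biparental"] += 1      # F1-like: carries both parental alleles
        elif present == a1:
            counts["uniparental_p1"] += 1  # loss of heterozygosity toward parent 1
        elif present == a2:
            counts["uniparental_p2"] += 1  # loss of heterozygosity toward parent 2
        else:
            counts["other"] += 1           # e.g. an allele seen in neither parent
    return counts


parent1 = [("A", "A"), ("C", "C"), ("G", "G"), ("T", "T")]
parent2 = [("G", "G"), ("C", "C"), ("T", "T"), ("A", "A")]
hybrid = [("A", "G"), ("C", "C"), ("T", "T"), ("T", "A", "A")]  # last locus trisomic
counts = classify_informative_loci(parent1, parent2, hybrid)
```

    In this toy input the second locus is skipped because both parents carry the same allele, the third shows loss of heterozygosity toward parent 2, and the trisomic fourth locus still counts as biparental.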

    A unified framework for finding differentially expressed genes from microarray experiments

    Background: This paper presents a unified framework for finding differentially expressed genes (DEGs) from microarray data. The framework has three interrelated modules: (i) gene ranking, (ii) significance analysis of genes, and (iii) validation. The first module ranks the genes using two gene selection algorithms, namely (a) two-way clustering and (b) combined adaptive ranking. The second module converts the gene ranks into p-values using an R-test and fuses the two sets of p-values using Fisher's omnibus criterion; the DEGs are then selected using false discovery rate (FDR) analysis. The third module performs a threefold validation of the resulting DEGs. First, the robustness of the framework in gene selection is illustrated using FDR analysis. Second, clustering-based validation of the DEGs is performed by applying an adaptive subspace-based clustering algorithm to the training and test datasets. Finally, a projection-based visualization is used to validate the DEGs obtained with the unified framework. Results: The performance of the unified framework is compared with that of well-known ranking algorithms such as t-statistics, Significance Analysis of Microarrays (SAM), Adaptive Ranking, Combined Adaptive Ranking and Two-way Clustering. Performance curves obtained on 50 simulated microarray datasets following two different distributions indicate the superiority of the unified framework over the other reported algorithms. Further analyses of 3 real cancer datasets and 3 Parkinson's datasets show a similar improvement in performance. First, the threefold validation process is applied to the two-sample cancer datasets. In addition, the analysis of 3 sets of Parkinson's data demonstrates the scalability of the proposed method to multi-sample microarray datasets. Conclusion: This paper presents a unified framework for the robust selection of genes from two-sample as well as multi-sample microarray experiments. The two different ranking methods used in module 1 bring diversity to gene selection. Converting ranks to p-values, fusing the p-values and applying FDR analysis help identify significant genes that cannot be judged from gene ranking alone. The threefold validation, namely robustness of gene selection under FDR analysis, clustering, and visualization, demonstrates the relevance of the DEGs. Empirical analyses on 50 artificial datasets and 6 real microarray datasets illustrate the efficacy of the proposed approach. The analyses on 3 cancer datasets demonstrate its utility on microarray datasets with two classes of samples, and its scalability to multi-sample (more than two sample classes) microarray datasets is addressed using three sets of Parkinson's data. Empirical analyses show that the unified framework outperformed other gene selection methods in selecting differentially expressed genes from microarray data.
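    The fusion and selection steps of module 2 can be sketched in Python, assuming that each gene already has one p-value per ranking method (the R-test that turns ranks into p-values is not reproduced) and taking the Benjamini-Hochberg step-up procedure as the FDR analysis, which is an assumption since the abstract does not name the procedure. Fisher's statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null hypothesis, and for even degrees of freedom its tail probability has a closed form, so no statistics library is needed. The function and argument names are hypothetical.

```python
import math


def fisher_combine(pvals):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-square with 2k df under H0."""
    k = len(pvals)
    half_x = -sum(math.log(max(p, 1e-300)) for p in pvals)  # X / 2, guarding p = 0
    # Chi-square survival function for even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return min(1.0, math.exp(-half_x) * total)


def benjamini_hochberg(pvals, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k_max = rank  # largest rank satisfying the step-up condition
    return sorted(order[:k_max])


def select_degs(p_rank1, p_rank2, alpha=0.05):
    """Fuse two per-gene p-values (one per ranking method) and apply BH FDR control."""
    fused = [fisher_combine([a, b]) for a, b in zip(p_rank1, p_rank2)]
    return benjamini_hochberg(fused, alpha)


degs = select_degs([0.001, 0.5, 0.01], [0.002, 0.6, 0.02])
```

    With a single p-value, `fisher_combine` returns that p-value unchanged, which is a quick sanity check on the closed-form tail. In the toy call, genes 0 and 2 are selected and gene 1 is not.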

    The mating competence of geographically diverse Leishmania major strains in their natural and unnatural sand fly vectors

    Invertebrate stages of Leishmania are capable of genetic exchange during their extracellular growth and development in the sand fly vector. Here we explore two variables: the ability of diverse L. major strains from across its natural range to mate in pairwise tests, and the timing of the appearance of hybrids and their association with developmental stages within both natural (Phlebotomus duboscqi) and unnatural (Lutzomyia longipalpis) sand fly vectors. Following co-infection of flies with parental lines bearing independent drug markers, doubly drug-resistant hybrid progeny were selected, from which 96 clonal lines were analyzed for DNA content and genotyped for parental alleles at 4–6 unlinked nuclear loci as well as the maxicircle DNA. As seen previously, most hybrids showed '2n' DNA contents, but a significant number were '3n' and one was '4n'. In the natural vector, 97% of the nuclear loci showed both parental alleles; however, 3% (4/150) showed only one parental allele. In the unnatural vector, the frequency of uniparental inheritance rose to 10% (27/275). We attribute this to loss of heterozygosity after mating, most likely arising from aneuploidy, which is both common and temporally variable in Leishmania. As seen previously, only uniparental inheritance of maxicircle kDNA was observed. Hybrids were recovered at similar efficiencies in all pairwise crosses tested, suggesting that L. major lacks detectable 'mating types' that limit free genetic exchange. In the natural vector, comparing the timing of hybrid formation with the presence of developmental stages suggests that nectomonads are the most likely sexually competent stage, with hybrids emerging well before the first appearance of metacyclic promastigotes. These studies provide an important perspective on the prevalence of genetic exchange in natural populations of L. major and a guide for experimental studies to understand the biology of mating.
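    The rounded percentages follow directly from the reported locus counts (natural vector: 4 of 150 loci uniparental; unnatural vector: 27 of 275):

```latex
\frac{146}{150} \approx 97.3\%, \qquad \frac{4}{150} \approx 2.7\%, \qquad \frac{27}{275} \approx 9.8\%
```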

    NextGen sequencing reveals short double crossovers contribute disproportionately to genetic diversity in Toxoplasma gondii

    BACKGROUND: Toxoplasma gondii is a widespread protozoan parasite of animals that causes zoonotic disease in humans. Three clonal variants predominate in North America and Europe, whereas South American strains are genetically diverse and undergo more frequent recombination. All three northern clonal variants share a monomorphic version of chromosome Ia (ChrIa), which is also found in unrelated but successful southern lineages. Although this pattern could reflect a selective advantage, it might also arise from non-Mendelian segregation during meiosis. To understand the inheritance of ChrIa, we performed a genetic cross between the northern clonal type 2 strain ME49 and a divergent southern type 10 strain called VAND, which harbors a divergent ChrIa. RESULTS: NextGen sequencing of haploid F1 progeny was used to generate a genetic map, which revealed a low level of conventional recombination together with an unexpectedly high frequency of short double crossovers. Notably, the monomorphic and divergent versions of ChrIa were isolated with equal frequency. In addition, ChrIa showed no evidence of being a sex chromosome, of harboring an inversion, or of distorting segregation patterns. Although VAND was unable to self-fertilize in the cat, it outcrossed successfully with ME49, and hybrid survival was strongly associated with inheritance of ChrIII from ME49 and ChrIb from VAND. CONCLUSIONS: Our findings suggest that the successful spread of the monomorphic ChrIa in the wild has not been driven by meiotic drive or related processes, but rather reflects a fitness advantage. In addition, the high frequency of short double crossovers is expected to greatly increase genetic diversity among progeny from genetic crosses, providing an unexpected and likely important source of diversity. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-1168) contains supplementary material, which is available to authorized users.
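    A standard way to check whether two versions of a chromosome are inherited with equal frequency among haploid progeny is a chi-square goodness-of-fit test against a 1:1 ratio. The Python sketch below is illustrative only: the abstract does not report the progeny counts or name the test used, so the function name and the counts in the usage are hypothetical. For very small numbers of progeny, an exact binomial test would be the safer choice.

```python
import math


def segregation_chi2(n_first, n_second):
    """Chi-square goodness-of-fit test of a 1:1 segregation ratio (1 degree of freedom)."""
    n = n_first + n_second
    expected = n / 2
    chi2 = ((n_first - expected) ** 2 + (n_second - expected) ** 2) / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts of progeny inheriting the monomorphic vs. divergent ChrIa.
balanced = segregation_chi2(10, 10)
skewed = segregation_chi2(15, 5)
```

    A perfectly balanced split gives a statistic of 0 and a p-value of 1, whereas a 15:5 split gives a statistic of 5.0 and a p-value of roughly 0.025, which would point to distorted segregation.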

    Global selective sweep of a highly inbred genome of the cattle parasite Neospora caninum

    Neospora caninum, a cyst-forming apicomplexan parasite, is a leading cause of neuromuscular disease in dogs and of fetal abortion in cattle worldwide. The importance of the domestic and sylvatic life cycles of Neospora, and the role of vertical transmission in the expansion and transmission of infection in cattle, are not sufficiently understood. To elucidate the population genomics of Neospora, we genotyped 50 isolates collected worldwide from a wide range of hosts using 19 linked and unlinked genetic markers. Phylogenetic analysis and genetic distance indices resolved a single genotype of N. caninum. Whole-genome sequencing of 7 isolates from 2 continents identified high linkage disequilibrium and significant structural variation, but only limited genome-wide polymorphism, with just 5,766 biallelic single nucleotide polymorphisms (SNPs) in total. More than half of these SNPs (∼3,000) clustered into 6 distinct haploblocks, each with limited allelic diversity (only 4 to 6 haplotypes were resolved per block). Importantly, the alleles at each haploblock had segregated independently across the sequenced strains, supporting a unisexual expansion model that is mosaic at 6 genomic blocks. Combined with seroprevalence data from African cattle, our data support a global selective sweep of a highly inbred livestock pathogen that originated within European dairy stock and expanded transcontinentally via unisexual mating and vertical transmission very recently, likely as a result of human activities, including recurrent migration, domestication, and breed development of bovid and canid hosts living in close proximity.
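    Two quantities in this abstract, linkage disequilibrium between SNPs and the number of haplotypes resolved within a block, can be illustrated with a short Python sketch. It assumes haploid isolates, so each isolate's allele calls form a haplotype directly, and biallelic SNPs coded 0/1; the function names and toy genotypes are hypothetical and not taken from the study. The LD measure is the standard r² = D² / (p_A(1 - p_A) p_B(1 - p_B)) with D = p_AB - p_A p_B.

```python
def ld_r2(snp_a, snp_b):
    """r^2 between two biallelic SNPs, coded 0/1, one call per haploid isolate."""
    n = len(snp_a)
    p_a = sum(snp_a) / n
    p_b = sum(snp_b) / n
    p_ab = sum(1 for a, b in zip(snp_a, snp_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom  # undefined for monomorphic SNPs


def count_haplotypes(block):
    """Number of distinct haplotypes in a block given as one allele tuple per isolate."""
    return len(set(map(tuple, block)))


# Toy data: four haploid isolates.
perfect_ld = ld_r2([0, 0, 1, 1], [0, 0, 1, 1])
no_ld = ld_r2([0, 0, 1, 1], [0, 1, 0, 1])
n_haps = count_haplotypes([(0, 1, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)])
```

    SNPs that always co-occur give r² = 1, while SNPs whose alleles are shuffled independently across isolates give r² = 0; independent segregation of whole haploblocks corresponds to the latter pattern between blocks.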