Search CORE

31 research outputs found

Semantically linking and browsing PubMed abstracts with gene ontology

Author: Shaik Jahangheer S
Vanteru Bhanu C
Yeasin Mohammed
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: Technological advances in the past decade have led to massive progress in the field of biotechnology. This progress is documented in research articles. PubMed is currently the most used repository for bio-literature. As of 2007, PubMed contained about 17 million abstracts, which calls for methods to efficiently retrieve and browse large volumes of relevant information. State-of-the-art technologies such as GOPubmed use simple keyword-based techniques to retrieve abstracts from PubMed and link them to the Gene Ontology (GO). This paper changes the paradigm by introducing a semantics-enabled technique, called SEGOPubmed, that links PubMed to the Gene Ontology for ontology-based browsing. A Latent Semantic Analysis (LSA) framework is used to semantically interface PubMed abstracts to the Gene Ontology. Results: An empirical analysis is performed to compare the performance of SEGOPubmed with that of GOPubmed. The analysis is initially performed using a few well-referenced query words. Further, statistical analysis is performed using a GO curated dataset as ground truth. The analysis suggests that SEGOPubmed performs better than the classic GOPubmed because it incorporates semantics. Conclusions: The LSA technique is applied to the PubMed abstracts obtained based on the user query and the semantic similarity between the query and the abstracts. The analyses using well-referenced keywords show that the proposed semantic-sensitive technique outperformed string-comparison-based techniques in associating the relevant abstracts with GO terms. SEGOPubmed also extracted abstracts in which the keywords do not appear in isolation (i.e., they appear in combination with other terms) and that could not be retrieved by simple term-matching techniques.
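The contrast between keyword matching and LSA-based retrieval described in this abstract can be illustrated with a toy example. The sketch below is not the SEGOPubmed implementation: the four "abstracts", the function names, and the choice of two latent dimensions are all assumptions made for illustration, and a small pure-Python power iteration stands in for the library SVD that a real system would use on a weighted term-document matrix.

```python
import math

def term_document_matrix(docs):
    """Return (vocab, A) where A[t][j] counts term t in document j."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w]][j] += 1.0
    return vocab, A

def top_eigenpairs(M, k, iters=300):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(M)
    M = [row[:] for row in M]  # work on a copy
    pairs = []
    for _ in range(k):
        v = [float(i + 1) for i in range(n)]  # fixed start vector, deterministic
        for _ in range(iters):
            w = [sum(M[r][c] * v[c] for c in range(n)) for r in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[r] * sum(M[r][c] * v[c] for c in range(n)) for r in range(n))
        pairs.append((lam, v))
        for r in range(n):  # deflate: remove the component just found
            for c in range(n):
                M[r][c] -= lam * v[r] * v[c]
    return pairs

def lsa(docs, k):
    """Rank-k LSA; assumes k does not exceed the rank of the term-document matrix."""
    vocab, A = term_document_matrix(docs)
    n_terms, n_docs = len(A), len(docs)
    # Eigenvectors of A^T A are the right singular vectors of A.
    gram = [[sum(A[t][i] * A[t][j] for t in range(n_terms))
             for j in range(n_docs)] for i in range(n_docs)]
    pairs = top_eigenpairs(gram, k)
    sigmas = [math.sqrt(lam) for lam, _ in pairs]
    V = [v for _, v in pairs]
    # Left singular vectors: u_c = A v_c / sigma_c
    U = [[sum(A[t][j] * V[c][j] for j in range(n_docs)) / sigmas[c]
          for t in range(n_terms)] for c in range(k)]
    # Document coordinates in the latent space: U^T a_j = sigma_c * V[c][j]
    doc_vecs = [[sigmas[c] * V[c][j] for c in range(k)] for j in range(n_docs)]
    return vocab, U, doc_vecs

def embed_query(query, vocab, U):
    """Project a bag-of-words query into the same latent space (U^T q)."""
    index = {w: i for i, w in enumerate(vocab)}
    counts = [0.0] * len(vocab)
    for w in query.split():
        if w in index:
            counts[index[w]] += 1.0
    return [sum(U[c][t] * counts[t] for t in range(len(vocab)))
            for c in range(len(U))]

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

# Hypothetical mini-corpus: two documents about cell death, two about kinases.
docs = [
    "apoptosis cell death",
    "programmed cell death",
    "kinase phosphorylation",
    "kinase signaling",
]
vocab, U, doc_vecs = lsa(docs, k=2)
query_vec = embed_query("apoptosis", vocab, U)
for text, vec in zip(docs, doc_vecs):
    print(f"{cosine(query_vec, vec):6.3f}  {text}")
```

In this toy corpus the second document never contains the word "apoptosis", so plain term matching would skip it, yet it lands next to the first document in the latent space because the two share "cell" and "death". The kinase documents score near zero for the same query.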

University of Memphis Digital Commons

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

REDHORSE-REcombination and Double crossover detection in Haploid Organisms using next-geneRation SEquencing data

Author: Beverley Stephen M
Khan Asis
Shaik Jahangheer S
Sibley L. David
Publication venue: Digital Commons@Becker
Publication date: 01/01/2015

BACKGROUND: Next-generation sequencing technology provides a means to study genetic exchange at a higher resolution than was possible using earlier technologies. However, this improvement presents challenges, as the alignments of next-generation sequence data to a reference genome cannot be directly used as input to existing detection algorithms, which instead typically use multiple sequence alignments as input. We therefore designed a software suite called REDHORSE that uses genomic alignments, extracts genetic markers, and generates multiple sequence alignments that can be used as input to existing recombination detection algorithms. In addition, REDHORSE implements a custom recombination detection algorithm that makes use of sequence information and genomic positions to accurately detect crossovers. REDHORSE is a portable and platform-independent suite that provides efficient analysis of genetic crosses based on next-generation sequencing data. RESULTS: We demonstrated the utility of REDHORSE using simulated data and real next-generation sequencing (NGS) data. The simulated dataset mimicked recombination between two known haploid parental strains and allowed comparison of detected break points against known true break points to assess the performance of recombination detection algorithms. A newly generated NGS dataset from a genetic cross of Toxoplasma gondii allowed us to demonstrate our pipeline. REDHORSE successfully extracted the relevant genetic markers and was able to transform the NGS read alignments to the genome into multiple sequence alignments. The recombination detection algorithm in REDHORSE was able to detect conventional crossovers and double crossovers typically associated with gene conversions, whilst filtering out artifacts that might have been introduced during sequencing or alignment. REDHORSE outperformed other commonly used recombination detection algorithms in finding conventional crossovers. In addition, REDHORSE was the only algorithm that was able to detect double crossovers. CONCLUSION: REDHORSE is an efficient analytical pipeline that serves as a bridge between genomic alignments and existing recombination detection algorithms. Moreover, REDHORSE is equipped with a recombination detection algorithm specifically designed for next-generation sequencing data. REDHORSE is a portable, platform-independent, Java-based utility that provides efficient analysis of genetic crosses based on next-generation sequencing data. REDHORSE is available at http://redhorse.sourceforge.net/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-1309-7) contains supplementary material, which is available to authorized users.
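To make the distinction between conventional and double crossovers concrete, here is a minimal sketch of one way to classify parental switches along a haploid progeny chromosome. It is not the REDHORSE algorithm (which is a Java suite whose internals are not described in this abstract). The marker representation, the function names, and the two thresholds `min_markers` and `max_double_len` are hypothetical choices made for illustration.

```python
def segments(markers):
    """Group position-sorted (position, parent) markers into runs of one parent."""
    runs = []
    for pos, parent in markers:
        if runs and runs[-1]["parent"] == parent:
            runs[-1]["markers"].append((pos, parent))
        else:
            runs.append({"parent": parent, "markers": [(pos, parent)]})
    return runs

def detect_events(markers, min_markers=2, max_double_len=2000):
    """Return (conventional_crossovers, double_crossovers).

    Assumptions of this sketch (not taken from the paper):
    - runs supported by fewer than `min_markers` markers are treated as
      sequencing or alignment artifacts and dropped;
    - an interior run spanning at most `max_double_len` bases is reported
      as a double crossover; every other parental switch is conventional.
    """
    # Step 1: drop thinly supported runs, then regroup the remaining markers.
    clean = [m for run in segments(markers)
             if len(run["markers"]) >= min_markers
             for m in run["markers"]]
    runs = segments(clean)

    def start(run):
        return run["markers"][0][0]

    def end(run):
        return run["markers"][-1][0]

    singles, doubles = [], []
    used = set()  # indices of run boundaries consumed by a double crossover
    # Step 2: short interior runs consume both of their boundaries.
    for i in range(1, len(runs) - 1):
        if end(runs[i]) - start(runs[i]) <= max_double_len and (i - 1) not in used:
            doubles.append((start(runs[i]), end(runs[i])))
            used.update({i - 1, i})
    # Step 3: every remaining boundary is a conventional crossover, reported
    # as the interval between the two flanking markers.
    for b in range(len(runs) - 1):
        if b not in used:
            singles.append((end(runs[b]), start(runs[b + 1])))
    return singles, doubles

# Hypothetical progeny: parent 'A' or 'B' called at each marker position.
progeny = [
    (1000, "A"), (2000, "A"),
    (3000, "B"),                      # lone marker, treated as an artifact
    (4000, "A"), (5000, "A"),
    (6000, "B"), (6500, "B"),         # short B tract inside A background
    (7000, "A"), (9000, "A"), (12000, "A"),
    (20000, "B"), (30000, "B"), (40000, "B"),
]
singles, doubles = detect_events(progeny)
print("conventional crossovers:", singles)
print("double crossovers:", doubles)
```

With these thresholds the lone `B` call at position 3000 is discarded, the short `B` tract between 6000 and 6500 is reported as a double crossover, and the switch between 12000 and 20000 is reported as a conventional crossover. Real data would need thresholds tuned to marker density and read depth.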

Crossref

Springer - Publisher Connector

Digital Commons@Becker

PubMed Central

Leishmania sexual reproductive strategies as resolved through computational methods designed for aneuploid genomes

Author: Beverley Stephen M
Dobson Deborah E
Sacks David L
Shaik Jahangheer S
Publication venue: Digital Commons@Becker
Publication date: 01/01/2021

A cryptic sexual reproductive cycle i

Directory of Open Access Journals

Digital Commons@Becker

A unified framework for finding differentially expressed genes from microarray experiments

Author: C Tang
C Zhang
D Stekel
DL Davies
G Casella
G Getz
GJ McLachlan
H Hui-Huang
H Sahai
I Guyon
I Lonnstedt
IB Jeffery
J Shaik
Jahangheer S Shaik
JD Storey
Mohammed Yeasin
P Tamayo
RA Fisher
RL Fernando
RM Miller
RO Duda
S Mukherjee
S Tavazoie
T Li
TR Golub
U Alon
VG Tusher
X Chen
Y Benjamini
Y Su
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: This paper presents a unified framework for finding differentially expressed genes (DEGs) from microarray data. The proposed framework has three interrelated modules: (i) gene ranking, (ii) significance analysis of genes, and (iii) validation. The first module uses two gene selection algorithms, namely (a) two-way clustering and (b) combined adaptive ranking, to rank the genes. The second module converts the gene ranks into p-values using an R-test and fuses the two sets of p-values using Fisher's omnibus criterion. The DEGs are selected using FDR analysis. The third module performs a three-fold validation of the obtained DEGs. The robustness of the proposed unified framework in gene selection is first illustrated using false discovery rate analysis. In addition, clustering-based validation of the DEGs is performed by employing an adaptive subspace-based clustering algorithm on the training and the test datasets. Finally, a projection-based visualization is performed to validate the DEGs obtained using the unified framework. Results: The performance of the unified framework is compared with well-known ranking algorithms such as t-statistics, Significance Analysis of Microarrays (SAM), Adaptive Ranking, Combined Adaptive Ranking, and Two-way Clustering. The performance curves obtained using 50 simulated microarray datasets, each following two different distributions, indicate the superiority of the unified framework over the other reported algorithms. Further analyses on 3 real cancer datasets and 3 Parkinson's datasets show a similar improvement in performance. First, a three-fold validation process is provided for the two-sample cancer datasets. In addition, the analysis on 3 sets of Parkinson's data is performed to demonstrate the scalability of the proposed method to multi-sample microarray datasets. Conclusion: This paper presents a unified framework for the robust selection of genes from two-sample as well as multi-sample microarray experiments. The two different ranking methods used in module 1 bring diversity to the selection of genes. The conversion of ranks to p-values, the fusion of p-values, and FDR analysis aid in the identification of significant genes, which cannot be judged based on gene ranking alone. The three-fold validation, namely robustness in the selection of genes using FDR analysis, clustering, and visualization, demonstrates the relevance of the DEGs. Empirical analyses on 50 artificial datasets and 6 real microarray datasets illustrate the efficacy of the proposed approach. The analyses on 3 cancer datasets demonstrate the utility of the proposed approach on microarray datasets with two classes of samples. The scalability of the proposed unified approach to multi-sample (more than two sample classes) microarray datasets is addressed using three sets of Parkinson's data. Empirical analyses show that the unified framework outperformed other gene selection methods in selecting differentially expressed genes from microarray data.
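The second module described above (fusing two sets of p-values, then selecting genes by FDR analysis) can be sketched as follows. This assumes that "Fisher's omnibus criterion" refers to the standard Fisher combined probability test and that the FDR step is a Benjamini-Hochberg procedure. The abstract does not spell out either detail, and the function names and example p-values are hypothetical. Fisher's method also assumes the combined p-values are independent, which may not hold for two rankings of the same data.

```python
import math

def fisher_combine(pvalues):
    """Combine p-values (each in (0, 1]) with Fisher's method.

    The statistic X = -2 * sum(ln p_i) is chi-square distributed with 2k
    degrees of freedom under the null hypothesis, where k = len(pvalues).
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # this is X / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # sf(X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total

def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    # Find the largest rank whose p-value is below its step-up threshold.
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank
    selected = [False] * m
    for i in order[:cutoff]:
        selected[i] = True
    return selected

# Hypothetical p-values for four genes from two ranking methods.
p_two_way_clustering = [0.001, 0.20, 0.04, 0.70]
p_adaptive_ranking = [0.002, 0.30, 0.03, 0.90]

fused = [fisher_combine([a, b])
         for a, b in zip(p_two_way_clustering, p_adaptive_ranking)]
is_deg = benjamini_hochberg(fused, alpha=0.05)
for gene, (p, flag) in enumerate(zip(fused, is_deg)):
    print(f"gene {gene}: fused p = {p:.5f}, selected = {flag}")
```

The closed-form survival function keeps the sketch dependency-free. In practice a statistics library would normally supply both the combination and the multiple-testing correction.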

University of Memphis Digital Commons

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

NextGen sequencing reveals short double crossovers contribute disproportionately to genetic diversity in Toxoplasma gondii

Author: Asis Khan
Benjamin M Rosenthal
Hernan A Lorenzi
Jahangheer S Shaik
James W Ajioka
Jitender P Dubey
L Sibley
Michael Behnke
Qiuling Wang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

BACKGROUND: Toxoplasma gondii is a widespread protozoan parasite of animals that causes zoonotic disease in humans. Three clonal variants predominate in North America and Europe, while South American strains are genetically diverse, and undergo more frequent recombination. All three northern clonal variants share a monomorphic version of chromosome Ia (ChrIa), which is also found in unrelated, but successful southern lineages. Although this pattern could reflect a selective advantage, it might also arise from non-Mendelian segregation during meiosis. To understand the inheritance of ChrIa, we performed a genetic cross between the northern clonal type 2 ME49 strain and a divergent southern type 10 strain called VAND, which harbors a divergent ChrIa. RESULTS: NextGen sequencing of haploid F1 progeny was used to generate a genetic map revealing a low level of conventional recombination, with an unexpectedly high frequency of short, double crossovers. Notably, both the monomorphic and divergent versions of ChrIa were isolated with equal frequency. As well, ChrIa showed no evidence of being a sex chromosome, of harboring an inversion, or distorting patterns of segregation. Although VAND was unable to self fertilize in the cat, it underwent successful out-crossing with ME49 and hybrid survival was strongly associated with inheritance of ChrIII from ME49 and ChrIb from VAND. CONCLUSIONS: Our findings suggest that the successful spread of the monomorphic ChrIa in the wild has not been driven by meiotic drive or related processes, but rather is due to a fitness advantage. As well, the high frequency of short double crossovers is expected to greatly increase genetic diversity among progeny from genetic crosses, thereby providing an unexpected and likely important source of diversity. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-1168) contains supplementary material, which is available to authorized users

Crossref

Springer - Publisher Connector

Digital Commons@Becker

PubMed Central

The mating competence of geographically diverse Leishmania major strains in their natural and unnatural sand fly vectors

Author: Akopyants Natalia S
Barhoumi Mourad
Beverley Stephen M
Charmoy Melanie
Dobson Deborah E
Elnaiem Dia-Eldin A
Fay Michael
Grigg Michael
Inbar Ehud
Kauffmann Florence
Lawyer Phillip
Owens Katherine
Romano Audrey
Sacks David
Shaik Jahangheer
Publication venue: Digital Commons@Becker
Publication date: 01/01/2013

Invertebrate stages of Leishmania are capable of genetic exchange during their extracellular growth and development in the sand fly vector. Here we explore two variables: the ability of diverse L. major strains from across its natural range to undergo mating in pairwise tests; and the timing of the appearance of hybrids and their developmental stage associations within both natural (Phlebotomus duboscqi) and unnatural (Lutzomyia longipalpis) sand fly vectors. Following co-infection of flies with parental lines bearing independent drug markers, doubly-drug resistant hybrid progeny were selected, from which 96 clonal lines were analyzed for DNA content and genotyped for parent alleles at 4-6 unlinked nuclear loci as well as the maxicircle DNA. As seen previously, the majority of hybrids showed '2n' DNA contents, but with a significant number of '3n' and one '4n' offspring. In the natural vector, 97% of the nuclear loci showed both parental alleles; however, 3% (4/150) showed only one parental allele. In the unnatural vector, the frequency of uniparental inheritance rose to 10% (27/275). We attribute this to loss of heterozygosity after mating, most likely arising from aneuploidy which is both common and temporally variable in Leishmania. As seen previously, only uniparental inheritance of maxicircle kDNA was observed. Hybrids were recovered at similar efficiencies in all pairwise crosses tested, suggesting that L. major lacks detectable 'mating types' that limit free genetic exchange. In the natural vector, comparisons of the timing of hybrid formation with the presence of developmental stages suggest nectomonads as the most likely sexually competent stage, with hybrids emerging well before the first appearance of metacyclic promastigotes. These studies provide an important perspective on the prevalence of genetic exchange in natural populations of L. major and a guide for experimental studies to understand the biology of mating

Crossref

Directory of Open Access Journals

Digital Commons@Becker

PubMed Central

FigShare

Global selective sweep of a highly inbred genome of the cattle parasite Neospora caninum

Author: Akanmori Bartholomew D
Cleaveland Sarah
Dubey Jitender P
Fujita Ayako Wendy
Grigg Michael E
Innes Elizabeth A
Khan Asis
Latham Sophia M
Oler Andrew J
Ortega-Mora Luis M
Quinones Mariam
Randle Nadine
Regidor-Cerrillo Javier
Ryan Una
Schares Gereon
Shaik Jahangheer S
Shen Kui
Slapeta Jan
Wastling Johnathan M
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 01/01/2019

Neospora caninum, a cyst-forming apicomplexan parasite, is a leading cause of neuromuscular diseases in dogs as well as fetal abortion in cattle worldwide. The importance of the domestic and sylvatic life cycles of Neospora, and the role of vertical transmission in the expansion and transmission of infection in cattle, is not sufficiently understood. To elucidate the population genomics of Neospora, we genotyped 50 isolates collected worldwide from a wide range of hosts using 19 linked and unlinked genetic markers. Phylogenetic analysis and genetic distance indices resolved a single genotype of N. caninum. Whole-genome sequencing of 7 isolates from 2 different continents identified high linkage disequilibrium, significant structural variation, but only limited polymorphism genome-wide, with only 5,766 biallelic single nucleotide polymorphisms (SNPs) total. Greater than half of these SNPs (∼3,000) clustered into 6 distinct haploblocks and each block possessed limited allelic diversity (with only 4 to 6 haplotypes resolved at each cluster). Importantly, the alleles at each haploblock had independently segregated across the strains sequenced, supporting a unisexual expansion model that is mosaic at 6 genomic blocks. Integrating seroprevalence data from African cattle, our data support a global selective sweep of a highly inbred livestock pathogen that originated within European dairy stock and expanded transcontinentally via unisexual mating and vertical transmission very recently, likely the result of human activities, including recurrent migration, domestication, and breed development of bovid and canid hosts within similar proximities

University of Liverpool Repository

Research Repository

Enlighten

REDHORSE-REcombination and Double crossover detection in Haploid Organisms using next-geneRation SEquencing data

Author: A Khan
Asis Khan
B Langmead
C Amos
D Ajzenberg
D Posada
E Lander
H Li
J Wang
Jahangheer S Shaik
JM Smith
KW Broman
L David Sibley
M Padidam
MF Boni
MJ Armstrong
MS Behnke
PM McKeique
SR Browning
Stephen M Beverley
T Jombart
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A progressive framework for two-way clustering using adaptive subspace iteration for functionally classifying genes

Author: Shaik Jahangheer S.
Yeasin Mohammed
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

This paper presents an adaptive subspace-based two-way clustering of microarray data. To analyze the data at various scales, a progressive framework is introduced. The goals are to functionally classify genes and to find differentially expressed genes in microarray expression profiles. Empirical analysis on a colon cancer dataset shows that adaptive subspace iteration (ASI) performs favorably in grouping genes with similar functions and in finding genes that may have been involved in the formation of colon cancer. It was also observed that the proposed algorithm is robust to the ordering of samples and yields results consistent with ground-truth information. © 2006 IEEE

University of Memphis Digital Commons

Performance evaluation of subspace-based algorithm in selecting differentially expressed genes and classification of tissue types from microarray data

Author: Shaik Jahangheer S.
Yeasin Mohammed
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

This paper presents the implementation and evaluation of a subspace-based clustering algorithm for robust selection of differentially expressed genes and for the classification of tissue types from microarray data. The performance of the proposed algorithm is compared against other well-known clustering algorithms, and the quality of the clusters is evaluated using a number of cluster validation indices. Empirical analyses on a number of synthetic and real microarray data sets suggest that the proposed subspace-based algorithm is robust in selecting differentially expressed genes and performs significantly better than popular clustering algorithms in both gene selection and tissue-type classification. © 2006 IEEE

University of Memphis Digital Commons