17 research outputs found
Uncovering functional mechanisms in cancer through integrative genomics
Ph.D., Doctor of Philosophy
Human and mouse oligonucleotide-based array CGH
Array-based comparative genomic hybridization (array CGH) is a high-resolution method for measuring chromosomal copy number changes. Here we present a validated protocol for array CGH using in-house spotted oligonucleotide libraries. This oligo array CGH platform yields reproducible results and can detect single-copy gains, multi-copy amplifications, and homozygous and heterozygous deletions as small as 100 kb. A human oligonucleotide library was printed on amine-binding slides. Arrays were hybridized using a hybstation and analysed using BlueFuse feature extraction software, with >95% of spots passing quality control. The protocol works with as little as 300 ng of input DNA and a 90% reduction of Cot-1 DNA without compromising quality. High-quality results have also been obtained with DNA from archival tissue. Finally, in addition to human oligo arrays, we have applied the protocol successfully to mouse oligo arrays. We believe that this oligo-based platform using ‘off-the-shelf’ oligo libraries provides an easily accessible alternative to BAC arrays for CGH that is cost-effective, available at high resolution and easily implemented for any sequenced organism without compromising the quality of the results.
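As a rough illustration of the copy number states named in the abstract above, the sketch below computes the idealized log2(test/reference) ratio that an array CGH probe would show for each state. It assumes a diploid reference and a pure, clonal test sample; the function name `expected_log2_ratio` is hypothetical and is not part of the published protocol or of BlueFuse.

```python
import math


def expected_log2_ratio(test_copies: int, reference_copies: int = 2) -> float:
    """Idealized log2(test/reference) ratio for a pure, clonal sample.

    Assumption: the reference genome is diploid unless stated otherwise.
    """
    if reference_copies <= 0:
        raise ValueError("reference_copies must be positive")
    if test_copies < 0:
        raise ValueError("test_copies must be non-negative")
    if test_copies == 0:
        # Homozygous deletion: no test signal, so the ideal ratio is unbounded.
        return float("-inf")
    return math.log2(test_copies / reference_copies)


# Copy number states mentioned in the abstract (labels are illustrative).
states = [
    ("homozygous deletion", 0),
    ("heterozygous deletion", 1),
    ("normal", 2),
    ("single-copy gain", 3),
    ("multi-copy amplification (8 copies)", 8),
]
for label, copies in states:
    print(f"{label}: {expected_log2_ratio(copies):.3f}")
```

Measured ratios are usually compressed toward zero relative to these ideal values, for example by normal-cell contamination, so thresholds for calling gains and losses are normally set empirically.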
Comparative assembly and analysis of different sized genomes using Pacbio sequencing technology
PacBio is a third-generation sequencing technology based on the single-molecule real-time (SMRT) sequencing platform, which uses zero-mode waveguides (ZMWs). It generates very long reads that are well suited to applications such as de novo genome assembly, detection of structural variations, full-length transcriptome sequencing and direct detection of base modifications. PacBio data can be used either alone or in combination with shorter Illumina reads to produce a good assembly. Different algorithms are available to construct the genome from PacBio-only or hybrid datasets. To identify the best possible approach, we carried out a comparative study applying widely accepted assembly tools to E. coli, C. elegans and A. thaliana datasets (PacBio and Illumina paired-end (PE) and mate-pair (MP)). We performed de novo genome assembly, gene prediction and gene annotation for all possible combinations of datasets (PacBio, Illumina PE and MP) and tools. The study identified the best method, which assembled the 4.6 Mb E. coli genome into a single contig covering ~97% of BUSCO-represented genes. For C. elegans and A. thaliana we achieved assemblies of 109 Mb and 123 Mb, respectively, with ~80% of BUSCO-represented genes.
Pipeline to upgrade the genome annotations
The current era of functional genomics is enriched with good-quality draft genomes and annotations for many thousands of species and varieties, supported by advances in next-generation sequencing (NGS) technologies. Around 25,250 genomes of organisms from various kingdoms have been submitted to the NCBI genome resource to date. Each of these genomes was annotated using the tools and knowledge bases available at the time of annotation. These annotations can be expected to improve if the same genome is re-annotated using improved tools and knowledge bases. Here we present a new genome annotation pipeline that combines various tools and knowledge bases and produces better-quality annotations from the consensus of the predictions of different tools. Apart from the usual gene predictions and functional annotations, this resource also performs various additional annotations, covering SSRs, novel repeats, paralogs, proteins with transmembrane helices, signal peptides, etc. The resource is trained to evaluate and integrate all the predictions to resolve overlaps and ambiguities in feature boundaries. One of its important highlights is the ability to predict the phylogenetic relations of the repeats using evolutionary trace analysis and orthologous gene clusters. We also present a case study of the pipeline in which we upgrade the genome annotation of Nelumbo nucifera (sacred lotus). We demonstrate that this resource can produce an improved annotation for a better understanding of the biology of various organisms.
Re-analysis of RNA-Sequencing Data on Apple Stem Grooving Virus infected Apple reveals more significant differentially expressed genes
RNA sequencing (RNA-Seq) technology has enabled researchers to investigate global changes in host gene expression in plant-virus interactions, which has helped to clarify the molecular basis of virus diseases. Re-analysing RNA-Seq studies with the most recent genome version and the best available analysis pipeline can be expected to produce more accurate results. In this study, we re-analysed data from Apple stem grooving virus (ASGV)-infected apple shoots in comparison with virus-free in vitro shoots [1], using the most recent Malus x domestica genome downloaded from the Phytozome database. Reads were aligned with HISAT2, and Cufflinks was used to identify differentially expressed genes. We found that ~20% more reads were mapped to the latest genome using the updated pipeline, which demonstrates the value of such re-analysis. The updated results were compared with the previous ones. In addition, we performed protein-protein interaction (PPI) analysis to investigate the proteins affected by ASGV infection.
An integrated computational validation approach for potential novel miRNA prediction
MicroRNAs (miRNAs) are short, non-coding RNAs, 17 to 24 bp in length, that regulate gene expression by targeting mRNA molecules. The regulatory functions of miRNAs are known to be closely associated with disease phenotypes such as cancer and with processes such as cell signaling, cell division, growth and metabolism. Novel miRNAs are defined as sequences that have no similarity to existing known sequences and lack any experimental evidence. In recent decades, the advent of next-generation sequencing has allowed us to capture small RNA molecules from cells and to develop methods to estimate their expression levels. Several computational algorithms are available to predict novel miRNAs from deep sequencing data. In this work, we integrated three novel miRNA prediction programs, miRDeep, miRanalyzer and miRPRo, to compare and validate their prediction efficiency. Dicer cleavage sites, alignment density, seed conservation, minimum free energy, AU-GC percentage, secondary loop scores, false discovery rates and confidence scores were considered for comparison and evaluation. The efficiency of identifying isomiRs and base pair mismatches in a strand-specific manner was also considered for the computational validation. Further, the criteria and parameters for identifying the best possible novel miRNAs with minimal false positive rates were deduced.
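One of the comparison criteria listed above, the AU-GC percentage, is simple enough to sketch directly. The snippet below is only an illustration of how such a value could be computed for a candidate sequence; the function name `au_gc_percent` and the example sequence are hypothetical and are not taken from miRDeep, miRanalyzer or miRPRo.

```python
def au_gc_percent(sequence: str) -> tuple[float, float]:
    """Return (AU %, GC %) for an RNA sequence.

    DNA-style input is accepted: T is treated as U. Any other
    non-ACGU character is rejected rather than silently ignored.
    """
    rna = sequence.upper().replace("T", "U")
    if not rna:
        raise ValueError("sequence is empty")
    unexpected = set(rna) - set("ACGU")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    gc = rna.count("G") + rna.count("C")
    au = rna.count("A") + rna.count("U")
    return 100.0 * au / len(rna), 100.0 * gc / len(rna)


# Hypothetical 22-base candidate sequence, used only for illustration.
au, gc = au_gc_percent("UGAGGUAGUAGGUUGUAUAGUU")
print(f"AU: {au:.1f}%  GC: {gc:.1f}%")
```

In practice this value would be one feature among several; the abstract combines it with structural criteria such as minimum free energy, which require a dedicated RNA folding tool.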
A comparative study of Bisulphite-seq analysis pipeline
Recent advances in next-generation sequencing (NGS) technology make it possible to profile whole-genome methylation rapidly. However, handling and interpreting methylation sequence data is challenging because of its large volume and the consequences of bisulphite modification. Most current pipelines include a specific aligner to decode and quantify the fraction of methylated cytosines per base; this quantitative data is then analysed for differential methylation and annotated with genomic features. We examined the performance of three pipelines for alignment and differential methylation profiling using published data from plants and animals. We compared the consistency across these tools and explored various visualization features. We also illustrate our in-house visualization-based analytic tool for better comprehension of the whole-genome methylation profile. Our comparative study showcases the performance of widely accepted tools and can guide the scientific community in choosing the appropriate method for their methylation data analysis.
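The per-base quantification described above can be sketched as follows. After bisulphite treatment, unmethylated cytosines are read as T while methylated cytosines remain C, so the methylated fraction at a cytosine is the number of reads supporting C divided by the total reads supporting C or T. This is a minimal sketch, not the code of any of the three pipelines compared in the study; the function name and the `min_coverage` parameter are assumptions.

```python
from typing import Optional


def methylation_fraction(
    methylated_reads: int,
    unmethylated_reads: int,
    min_coverage: int = 1,
) -> Optional[float]:
    """Fraction of reads supporting methylation at one cytosine.

    methylated_reads:   reads showing C (protected from conversion)
    unmethylated_reads: reads showing T (converted by bisulphite)
    Returns None when coverage is below min_coverage, because the
    fraction is then undefined or too unreliable to report.
    """
    if methylated_reads < 0 or unmethylated_reads < 0:
        raise ValueError("read counts must be non-negative")
    coverage = methylated_reads + unmethylated_reads
    if coverage == 0 or coverage < min_coverage:
        return None
    return methylated_reads / coverage


# Hypothetical counts at a single cytosine position.
print(methylation_fraction(15, 5))                   # 15 of 20 reads methylated
print(methylation_fraction(2, 1, min_coverage=10))   # too shallow to call
```

A coverage threshold like this is a common design choice because fractions computed from only a few reads are noisy; the appropriate cutoff depends on sequencing depth and is not specified in the abstract.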
Next-generation sequencing analysis reveals high bacterial diversity in wild venomous and non-venomous snakes from India.
Background: The oral cavities of snakes are replete with various types of bacterial flora. Culture-dependent studies suggest that some of the bacterial species are responsible for secondary bacterial infection associated with snakebite. A complete profile of the ophidian oral bacterial community has not been reported until now. Therefore, in the present study, we determined the complete bacterial composition of the oral cavity of some snakes from India. Methods: Total DNA was isolated from oral swabs collected from three wild snake species (Indian Cobra, King Cobra and Indian Python). Next, the DNA was subjected to PCR amplification of the microbial 16S rRNA gene using V3-region-specific primers. The amplicons were used to prepare DNA libraries that were sequenced on an Illumina MiSeq platform. Results: The cluster-based taxonomy analysis revealed that Proteobacteria and Actinobacteria were the most predominant phyla present in the oral cavities of snakes. This result indicates that the oral bacterial communities of snakes are more similar to those of birds than to those of mammals. Furthermore, our study reports all the unique and common bacterial species (total: 147) found among the oral microbes of the snakes studied; the majority of the commonly abundant species were pathogens or opportunistic pathogens of humans. The wide differences in ophidian oral bacterial flora suggest variation by individual, species and geographical region. Conclusion: The present study provides a foundation for further research on snakes to identify potential drugs/antibiotics for the different infectious diseases.
Akkermansia, a Possible Microbial Marker for Poor Glycemic Control in Qataris Children Consuming Arabic Diet: A Pilot Study on Pediatric T1DM in Qatar
In Qatar, type 1 diabetes mellitus (T1DM) is one of the most prevalent disorders. This study aimed to explore the relation of the gut microbiome to continuous subcutaneous insulin infusion (CSII) therapy, dietary habits and HbA1c level in pediatric T1DM subjects in Qatar. We recruited 28 T1DM subjects with an average age of 10.5 ± 3.53 years. Stool samples were used to measure microbial composition by 16S rDNA sequencing. The results revealed that subjects who had undergone CSII therapy had increased microbial diversity and that the genus Akkermansia was significantly enriched in subjects without CSII therapy. Moreover, the genus Akkermansia was more abundant in subjects with poor glycemic control (HbA1c > 7.5%). When we classified the subjects by dietary pattern and nationality, Akkermansia was significantly enriched in Qatari subjects without CSII therapy consuming an Arabic diet compared with expatriates living in Qatar and eating a Western/mixed diet. Thus, this pilot study showed that the abundance of Akkermansia depends on the Arabic diet only in poorly controlled Qatari T1DM patients, opening new routes to personalized treatment for T1DM in Qatari pediatric subjects. Further comprehensive studies on the relation between the Arabic diet, ethnicity and Akkermansia are warranted to confirm this preliminary finding.