Novel techniques for protein structure characterization using graph representation of proteins
Proteins exhibit an infinite variety of structures. Around 50K 3D structures of proteins exist in the PDB database among unlimited possibilities. The three-dimensional structure of a protein is crucial to its function. Even within a common structure family, proteins vary in length, size, and sequence. This variation is the reflection of evolution on protein sequences. The intrinsic information in protein structures can be captured by their graph representations. The structural similarities between protein families can be deduced using their structural features such as connectivity, betweenness, and cliquishness. Most structure comparison and alignment methods use all atom coordinates, so they need a reliable full-atom representation of proteins, which is difficult to obtain using experimental methods. These methods can be used for a variety of problems in bioinformatics such as protein fold prediction, function annotation, domain prediction, and fold classification. Our approach can capture the same knowledge by using much less information from the actual structure. In this thesis, we used graph representations of proteins and graph theoretical properties to discriminate native and non-native proteins. Then we used these methods to find the overall and local similarity of protein structures by using dynamic programming. Afterward, local alignment using dynamic programming was used to determine the function of a protein. Moreover, subgraph matching algorithms were employed for domain prediction. In order to find the correct fold we also developed a genetic algorithm based threading approach. All these applications gave better or comparable results to the state of the art
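As a rough illustration of the graph features named in the abstract above, the following Python sketch builds a residue contact graph and computes connectivity (node degree) and cliquishness (local clustering coefficient). It is not the thesis's implementation: the one-point-per-residue representation, the 7.0 Å cutoff, the function names, and the toy coordinates are all assumptions made for the example, and betweenness is omitted for brevity.

```python
import math
from itertools import combinations


def contact_graph(coords, cutoff=7.0):
    """Link residues i and j when their representative points lie within `cutoff`.

    `coords` holds one (x, y, z) point per residue; the cutoff value is an
    assumption chosen for illustration, not a value taken from the thesis.
    """
    adjacency = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def connectivity(adjacency):
    """Connectivity of each residue: the number of contacts it makes."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


def cliquishness(adjacency):
    """Cliquishness of each residue: the fraction of neighbor pairs in contact."""
    scores = {}
    for node, neighbors in adjacency.items():
        k = len(neighbors)
        if k < 2:
            scores[node] = 0.0  # undefined for fewer than two neighbors; use 0
            continue
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
        scores[node] = 2.0 * links / (k * (k - 1))
    return scores


# Hypothetical toy coordinates: four nearby residues and one distant residue.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (3.8, 3.8, 0.0),
    (20.0, 0.0, 0.0),
]
graph = contact_graph(toy_coords)
degrees = connectivity(graph)
clustering = cliquishness(graph)
```

Per-residue feature vectors of this kind could then be compared between two proteins, for example with a dynamic programming alignment, without requiring a full-atom model.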
DolphinNext: a distributed data processing platform for high throughput genomics
BACKGROUND: The emergence of high throughput technologies that produce vast amounts of genomic data, such as next-generation sequencing (NGS) is transforming biological research. The dramatic increase in the volume of data, the variety and continuous change of data processing tools, algorithms and databases make analysis the main bottleneck for scientific discovery. The processing of high throughput datasets typically involves many different computational programs, each of which performs a specific step in a pipeline. Given the wide range of applications and organizational infrastructures, there is a great need for highly parallel, flexible, portable, and reproducible data processing frameworks. Several platforms currently exist for the design and execution of complex pipelines. Unfortunately, current platforms lack the necessary combination of parallelism, portability, flexibility and/or reproducibility that are required by the current research environment. To address these shortcomings, workflow frameworks that provide a platform to develop and share portable pipelines have recently arisen. We complement these new platforms by providing a graphical user interface to create, maintain, and execute complex pipelines. Such a platform will simplify robust and reproducible workflow creation for non-technical users as well as provide a robust platform to maintain pipelines for large organizations.
RESULTS: To simplify development, maintenance, and execution of complex pipelines we created DolphinNext. DolphinNext facilitates building and deployment of complex pipelines using a modular approach implemented in a graphical interface that relies on the powerful Nextflow workflow framework by providing: (1) a drag-and-drop user interface that visualizes pipelines and allows users to create pipelines without familiarity with the underlying programming languages; (2) modules to execute and monitor pipelines in distributed computing environments such as high-performance clusters and/or the cloud; (3) reproducible pipelines with version tracking and stand-alone versions that can be run independently; (4) modular process design with process revisioning support to increase reusability and pipeline development efficiency; (5) pipeline sharing with GitHub and automated testing; and (6) extensive reports with R-markdown and shiny support for interactive data visualization and analysis.
CONCLUSION: DolphinNext is a flexible, intuitive, web-based data processing and analysis platform that enables creating, deploying, sharing, and executing complex Nextflow pipelines with extensive revisioning and interactive reporting to enhance reproducible results
Dietary suppression of MHC-II expression in intestinal stem cells enhances intestinal tumorigenesis [preprint]
Little is known about how interactions between diet, immune recognition, and intestinal stem cells (ISCs) impact the early steps of intestinal tumorigenesis. Here, we show that a high fat diet (HFD) reduces the expression of the major histocompatibility complex II (MHC-II) genes in ISCs. This decline in ISC MHC-II expression in a HFD correlates with an altered intestinal microbiome composition and is recapitulated in antibiotic treated and germ-free mice on a control diet. Mechanistically, pattern recognition receptor and IFNg signaling regulate MHC-II expression in ISCs. Although MHC-II expression on ISCs is dispensable for stem cell function in organoid cultures in vitro, upon loss of the tumor suppressor gene Apc in a HFD, MHC-II- ISCs harbor greater in vivo tumor-initiating capacity than their MHC-II+ counterparts, thus implicating a role for epithelial MHC-II in suppressing tumorigenesis. Finally, ISC-specific genetic ablation of MHC-II in engineered Apc-mediated intestinal tumor models increases tumor burden in a cell autonomous manner. These findings highlight how a HFD alters the immune recognition properties of ISCs through the regulation of MHC-II expression in a manner that could contribute to intestinal tumorigenesis
GUIDEseq: a bioconductor package to analyze GUIDE-Seq datasets for CRISPR-Cas nucleases
BACKGROUND: Genome editing technologies developed around the CRISPR-Cas9 nuclease system have facilitated the investigation of a broad range of biological questions. These nucleases also hold tremendous promise for treating a variety of genetic disorders. In the context of their therapeutic application, it is important to identify the spectrum of genomic sequences that are cleaved by a candidate nuclease when programmed with a particular guide RNA, as well as the cleavage efficiency of these sites. Powerful new experimental approaches, such as GUIDE-seq, facilitate the sensitive, unbiased genome-wide detection of nuclease cleavage sites within the genome. Flexible bioinformatics analysis tools for processing GUIDE-seq data are needed.
RESULTS: Here, we describe an open source, open development software suite, GUIDEseq, for GUIDE-seq data analysis and annotation as a Bioconductor package in R. The GUIDEseq package provides a flexible platform with more than 60 adjustable parameters for the analysis of datasets associated with custom nuclease applications. These parameters allow data analysis to be tailored to different nuclease platforms with different length and complexity in their guide and PAM recognition sequences or their DNA cleavage position. They also enable users to customize sequence aggregation criteria, and vary peak calling thresholds that can influence the number of potential off-target sites recovered. GUIDEseq also annotates potential off-target sites that overlap with genes based on genome annotation information, as these may be the most important off-target sites for further characterization. In addition, GUIDEseq enables the comparison and visualization of off-target site overlap between different datasets for a rapid comparison of different nuclease configurations or experimental conditions. For each identified off-target, the GUIDEseq package outputs mapped GUIDE-Seq read count as well as cleavage score from a user specified off-target cleavage score prediction algorithm permitting the identification of genomic sequences with unexpected cleavage activity.
CONCLUSION: The GUIDEseq package enables analysis of GUIDE-seq data from various nuclease platforms for any species with a defined genomic sequence. This software package has been used successfully to analyze several GUIDE-seq datasets. The software, source code and documentation are freely available at http://www.bioconductor.org/packages/release/bioc/html/GUIDEseq.html
An atlas of cell types in the mouse epididymis and vas deferens
Following testicular spermatogenesis, mammalian sperm continue to mature in a long epithelial tube known as the epididymis, which plays key roles in remodeling sperm protein, lipid, and RNA composition. To understand the roles for the epididymis in reproductive biology, we generated a single-cell atlas of the murine epididymis and vas deferens. We recovered key epithelial cell types including principal cells, clear cells, and basal cells, along with associated support cells that include fibroblasts, smooth muscle, macrophages and other immune cells. Moreover, our data illuminate extensive regional specialization of principal cell populations across the length of the epididymis. In addition to region-specific specialization of principal cells, we find evidence for functionally specialized subpopulations of stromal cells, and, most notably, two distinct populations of clear cells. Our dataset extends on existing knowledge of epididymal biology, and provides a wealth of information on potential regulatory and signaling factors that bear future investigation
HIV-1 unmasks the plasticity of innate lymphoid cells [preprint]
Pharmaceuticals that suppress HIV-1 viremia preserve CD4+ T cells and prevent AIDS. Nonetheless, HIV-1 infected people taking these drugs have chronic inflammation attributable to persistent disruption of intestinal barrier function with increased rates of cardiovascular mortality. To better understand the etiology of this inflammation we examined the effect of HIV-1 infection on innate lymphoid cells (ILCs). These innate immune counterparts of T cells lack clonotypic antigen receptors, classify according to signature transcription factors and cytokines, and maintain homeostasis in inflamed tissues. ILCs have been defined, in part, by the IL-7Rα, CD127. Here we report that the vast majority of type 1 and 3 ILCs in human adult and placental cord blood are in fact CD127-, as are colon lamina propria ILC1s and many ILC3s. Among ILCs, CD127-ILC1s were the major producer of inflammatory cytokines. In contrast to CD127+ILC3s, CD127-ILC3s did not produce IL-22, a cytokine that maintains epithelial barrier function. In HIV-1+ people taking antivirals that preserve CD4+ T cells, CD127-ILC1s and all homeostatic cytokine-producing CD127+ILCs were decreased in blood and colon. Common γ-chain cytokines that are reported to be elevated in response to HIV-1 infection caused JAK3-dependent downregulation of CD127 and converted CD127-ILC1s into NK cells with heightened cytolytic activity. Consistent with the recent report that human blood CD117+ILCs give rise to both ILC1s and NK cells, pseudotemporal clustering of transcriptomes from thousands of individual cells identified a developmental trajectory from CD127-ILC1s to memory NK cells that was defined by WNT-transcription factor TCF7. WNT inhibition prevented the cytokine-induced transition of CD127-ILC1 cells into memory NK cells. In HIV-1+ people, effector NK cells and TCF7+ memory NK cells were elevated, concomitant with reduction in CD127-ILC1s.
These studies describe previously overlooked human ILC subsets that are significant in number and function, identify profound abnormalities in homeostatic ILCs that likely contribute to ongoing inflammation in HIV-1 infection despite control of viremia, provide explanation for increased memory NK cells in HIV-1 infection, and reveal functional plasticity of ILCs
An improved zebrafish transcriptome annotation for sensitive and comprehensive detection of cell type-specific genes
The zebrafish is ideal for studying embryogenesis and is increasingly applied to model human disease. In these contexts, RNA-sequencing (RNA-seq) provides mechanistic insights by identifying transcriptome changes between experimental conditions. Application of RNA-seq relies on accurate transcript annotation for a genome of interest. Here, we find discrepancies in analysis from RNA-seq datasets quantified using Ensembl and RefSeq zebrafish annotations. These issues were due, in part, to variably annotated 3' untranslated regions and thousands of gene models missing from each annotation. Since these discrepancies could compromise downstream analyses and biological reproducibility, we built a more comprehensive zebrafish transcriptome annotation that addresses these deficiencies. Our annotation improves detection of cell type-specific genes in both bulk and single cell RNA-seq datasets, where it also improves resolution of cell clustering. Thus, we demonstrate that our new transcriptome annotation can outperform existing annotations, providing an important resource for zebrafish researchers
Generation and Analysis of Expressed Sequence Tags from Olea europaea L.
Olive (Olea europaea L.), an important source of edible oil, originated in the Near East region. In this study, two cDNA libraries were constructed from young olive leaves and immature olive fruits to generate ESTs, with the aim of discovering novel genes and investigating the function of unknown olive genes. A total of 3840 randomly selected colonies from both libraries were sequenced for EST collection. The 2228 readable sequences from olive leaf and 1506 from olive fruit were assembled into 205 and 69 contigs, respectively, whereas 2478 were singletons. Putative functions of all 2752 differentially expressed unique sequences were assigned by gene homology based on BLAST and annotated using BLAST2GO. While 1339 ESTs show no homology to the database, 2024 ESTs have homology (under 80%) with hypothetical proteins, putative proteins, expressed proteins, and unknown proteins in NCBI-GenBank. Unique gene sequences of 635 ESTs were identified by over 80% homology to genes of known function in other species that had not previously been described in the Olea family. Only 3.1% of the total ESTs showed similarity with the olive database existing in NCBI. The generated EST data and consensus sequences were submitted to NCBI as a valuable resource for functional genome studies of olive
Type I IFN-Driven Immune Cell Dysregulation in Rat Autoimmune Diabetes
Type 1 diabetes is a chronic autoimmune disease, characterized by the immune-mediated destruction of insulin-producing beta cells of pancreatic islets. Essential components of the innate immune antiviral response, including type I IFN and IFN receptor (IFNAR)-mediated signaling pathways, likely contribute to human type 1 diabetes susceptibility. We previously showed that LEW.1WR1 Ifnar1 (-/-) rats have a significant reduction in diabetes frequency following Kilham rat virus (KRV) infection. To delineate the impact of IFNAR loss on immune cell populations in KRV-induced diabetes, we performed flow cytometric analysis in spleens from LEW.1WR1 wild-type (WT) and Ifnar1 (-/-) rats after viral infection but before the onset of insulitis and diabetes. We found a relative decrease in CD8(+) T cells and NK cells in KRV-infected LEW.1WR1 Ifnar1 (-/-) rats compared with KRV-infected WT rats; splenic regulatory T cells were diminished in WT but not Ifnar1 (-/-) rats. In contrast, splenic neutrophils were increased in KRV-infected Ifnar1 (-/-) rats compared with KRV-infected WT rats. Transcriptional analysis of splenic cells from KRV-infected rats confirmed a reduction in IFN-stimulated genes in Ifnar1 (-/-) compared with WT rats and revealed an increase in transcripts related to neutrophil chemotaxis and MHC class II. Single-cell RNA sequencing confirmed that MHC class II transcripts are increased in monocytes and macrophages and that numerous types of splenic cells harbor KRV. Collectively, these findings identify dynamic shifts in innate and adaptive immune cells following IFNAR disruption in a rat model of autoimmune diabetes, providing insights toward the role of type I IFNs in autoimmunity
Simultaneous generation of many RNA-seq libraries in a single reaction
Although RNA-seq is a powerful tool, the considerable time and cost associated with library construction has limited its utilization for various applications. RNAtag-Seq, an approach to generate multiple RNA-seq libraries in a single reaction, lowers time and cost per sample, and it produces data on prokaryotic and eukaryotic samples that are comparable to those generated by traditional strand-specific RNA-seq approaches