14 research outputs found

    Network-based visualisation and analysis of next-generation sequencing (NGS) data

    Next-generation sequencing (NGS) technologies have revolutionised research into the nature and diversity of genomes and transcriptomes. Since the initial description of these technology platforms over a decade ago, massively parallel RNA sequencing (RNA-seq) has driven many advances in the characterization and quantification of transcriptomes. RNA-seq is a powerful gene expression profiling technology that enables transcript discovery and provides a far more precise measure of the levels of transcripts and their isoforms than other methods such as microarrays. However, the analysis of RNA-seq data remains a significant challenge for many biologists. The data generated is large and the tools for its assembly, analysis and visualisation are still under development. Assemblies of reads can be inspected using tools such as the Integrative Genomics Viewer (IGV), where visualisation of results involves ‘stacking’ the reads onto a reference genome. Whilst sufficient for many needs, when the underlying variance of the genome or transcript assemblies is complex, this visualisation method can be limiting; errors in assembly can be difficult to spot and visualisation of splicing events may be challenging. Data visualisation is increasingly recognised as an essential component of genomic and transcriptomic data analysis, enabling large and complex datasets to be better understood. An approach that has been gaining traction in biological research is based on the application of network visualisation and analysis methods. Networks consist of nodes connected by edges (lines), where nodes usually represent entities and edges the relationships between them. These are now widely used for plotting experimentally or computationally derived relationships between genes and proteins. The overall aim of this PhD project was to explore the use of network-based visualisation in the analysis and interpretation of RNA-seq data. 
In chapter 2, I describe the development of a data pipeline that has been designed to go from ‘raw’ RNA-seq data to a file format which supports data visualisation as a ‘DNA assembly graph’. In DNA assembly graphs, nodes represent sequence reads and edges denote a homology between reads above a defined threshold. Following the mapping of reads to a reference sequence and defining which reads map to a given locus, pairwise sequence alignments are performed between reads using MegaBLAST. This provides a weighted similarity score that is used to define edges between reads. Visualisation of the resulting networks is then carried out using BioLayout Express3D, which can render large networks in 3-D, thereby allowing a better appreciation of the often-complex network structure. This pipeline has formed the basis for my subsequent work on exploring and analysing alternative splicing in human RNA-seq data. In the second half of this chapter, I provide a series of tutorials aimed at different types of users, allowing them to perform such analyses. The first tutorial is aimed at computational novices who might want to generate networks using a web browser and pre-prepared data. Other tutorials are designed for use by more advanced users who can access the code for the pipeline through GitHub or via an Amazon Machine Image (AMI). In chapter 3, the utility of network-based visualisations of RNA-seq data is explored using data processed through the pipeline described in chapter 2. The aim of the work described in this chapter was to better understand the basic principles and challenges associated with network visualisation of RNA-seq data, in particular how it could be used to visualise transcript structure and splice variation. These analyses were performed on data generated from four samples of human fibroblasts taken at different time points during their entry into cell division. 
One of the first challenges encountered was the fact that the existing network layout algorithm (Fruchterman-Reingold) implemented within BioLayout Express3D did not result in an optimal layout of the unusual graph structures produced by these analyses. Following the implementation of the more advanced layout algorithm FMMM within the tool, network structure could be far better appreciated. Using this layout method, the majority of genes sequenced to an adequate depth assemble into networks with a linear ‘corkscrew’ appearance and, when representing single-isoform transcripts, add little to existing views of these data. However, in a small number of cases (~5%), the networks generated from transcripts expressed in human fibroblasts possess more complex structures, with ‘loops’, ‘knots’ and multiple ends being observed. In the majority of cases examined, these loops were associated with alternative splicing events, a fact confirmed by RT-PCR analyses. Other DNA assembly networks representing the mRNAs for genes such as MKI67 showed knot-like structures, which were found to be due to the presence of repetitive sequence within an exon of the gene. In another case, CENPO, the unusual structure observed was due to reads derived from the overlapping gene ADCY3, present on the opposite strand, being wrongly mapped to CENPO. Finally, I explored the use of a network reduction strategy as an approach to visualising highly expressed genes such as GAPDH and TUBA1C. Having successfully demonstrated the utility of networks in analysing transcript isoforms in data derived from a single cell type, I set out to explore their utility in analysing transcript variation in tissue data, where multiple isoforms expressed by different cells within the tissue might be present in a given sample. In chapter 4, I explore the analysis of transcript variation in an RNA-seq dataset derived from human tissue. 
The first half of this chapter describes the quality control of these data, again using a network-based approach but this time based on the correlation in expression between genes and samples. Of the 95 samples derived from 27 human tissues, 77 passed the quality control. A network comprising 6,109 nodes (genes) and 1,091,477 edges (correlations) was constructed using a correlation threshold of r ≥ 0.9 and then clustered. Subsequently, the profile and gene content of each cluster was examined and enrichment of GO terms analysed. In the second half of this chapter, the aim was to detect and analyse alternative splicing events between different tissues using the rMATS tool. Using a false-discovery rate (FDR) cut-off of < 0.01, I found that in comparisons of brain vs. heart, brain vs. liver and heart vs. liver, the program reported 4,992, 4,804 and 3,990 splicing events, respectively. Of these events, only 78 (in 52 genes) had an exon inclusion level of more than 50% and an expression level of more than 30 FPKM. To further explore the sometimes-complex structure of transcript diversity derived from tissue, RNA-seq assembly networks for KLC1, SORBS2, GUK1, and TPM1 were explored. Each of these networks showed different types of alternative splicing events, and it was sometimes difficult to determine the isoforms expressed between tissues using other approaches. For instance, visualising the read assembly of long genes such as KLC1 and SORBS2 using Sashimi plots or even Vials is problematic simply because of the number of exons and the size of their genomic loci. In the case of GUK1, tissue-specific isoform expression was observed when the networks of three tissues were combined. Arguably the most complex analysis is the network of TPM1, where the uniquification step was employed for this highly expressed gene. 
In chapter 5, I perform usability testing of the NGS Graph Generator web application and of the visualisation of RNA-seq assemblies as networks using BioLayout Express3D. This test was important to ensure that the application is well received and utilised by users. Almost all participants in the usability test agreed that this application would encourage biologists to visualise and understand alternative splicing alongside existing tools. The participants agreed that Sashimi plots are rather difficult to view and may miss some interesting features. However, participants also identified areas needing improvement, such as the capability to analyse big networks in a short time and side-by-side analysis of networks with Sashimi plots and Ensembl. Additional information about the network would also be necessary to improve the understanding of alternative splicing. In conclusion, this work demonstrates the utility of network visualisation of RNA-seq data, where the unusual structure of these networks can be used to identify issues in assembly, repetitive sequences within transcripts and splice variation. As such, this approach has the potential to significantly improve our understanding of transcript complexity. Overall, this thesis demonstrates that network-based visualisation provides a new and complementary approach to characterising alternative splicing from RNA-seq data and has the potential to be useful for the analysis and interpretation of other kinds of sequencing data
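
The graph construction described in this abstract (reads as nodes, edges wherever a pairwise similarity score exceeds a threshold) can be illustrated with a minimal Python sketch. It is not the thesis pipeline: the function names, the read identifiers, the scores and the threshold of 50 are all hypothetical, and parsing of MegaBLAST output is assumed to have happened already.

```python
from collections import defaultdict


def build_read_graph(alignments, min_score):
    """Build an undirected, weighted read-similarity graph.

    alignments: iterable of (read_a, read_b, score) tuples (hypothetical input,
                e.g. parsed from pairwise alignment results).
    min_score:  edges are kept only when score >= min_score.
    Reads with no edge above the threshold do not appear in the graph.
    """
    graph = defaultdict(dict)
    for read_a, read_b, score in alignments:
        # Skip self-hits and alignments below the similarity threshold.
        if read_a == read_b or score < min_score:
            continue
        # Keep the best score if a pair is reported more than once.
        best = max(score, graph[read_a].get(read_b, 0.0))
        graph[read_a][read_b] = best
        graph[read_b][read_a] = best
    return dict(graph)


def connected_components(graph):
    """Return the connected components as sorted lists of read identifiers."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Illustrative data only: read names and scores are made up.
example_alignments = [
    ("r1", "r2", 95.0),
    ("r2", "r3", 80.0),
    ("r3", "r4", 40.0),   # below threshold, so no edge
    ("r5", "r6", 70.0),
    ("r1", "r1", 100.0),  # self-hit, ignored
]
example_graph = build_read_graph(example_alignments, min_score=50.0)
example_components = connected_components(example_graph)
```

Each component here would correspond to one group of mutually similar reads; laying such a graph out in 2-D or 3-D is a separate step that this sketch does not attempt.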

    Two sides of the same coin: The roles of KLF6 in physiology and pathophysiology

    The Krüppel-like factor (KLF) family of proteins controls several key biological processes that include proliferation, differentiation, metabolism, apoptosis and inflammation. Dysregulation of KLF functions has been shown to disrupt cellular homeostasis and contribute to disease development. KLF6 is a relevant example; a range of functional and expression assays suggested that the dysregulation of KLF6 contributes to the onset of cancer, inflammation-associated diseases as well as cardiovascular diseases. KLF6 expression is either suppressed or elevated depending on the disease, and this is largely due to alternative splicing events producing KLF6 isoforms with specialised functions. Hence, the aim of this review is to discuss the known aspects of KLF6 biology, covering the gene and protein architecture, gene regulation, post-translational modifications and functions of KLF6 in health and disease. We put special emphasis on the equivocal roles of its full-length and spliced variants. We also deliberate on the therapeutic strategies targeting KLF6 and its associated signalling pathways. Finally, we pose compelling basic and clinical questions to advance knowledge and research on the roles of KLF6 in physiological and pathophysiological processes. © 2020 by the authors. Licensee MDPI, Basel, Switzerland

    An investigation of cytokines and chemokines interaction using network analysis in covid-19

    The emergence of the coronavirus disease 2019 (COVID-19) pandemic, caused by infection with SARS-CoV-2, has become a threat to humanity. In terms of its physiopathological pathways, a wide variety of biomolecules are activated as part of the immunological response. It is therefore relevant to compare the responses of human respiratory cell lines to infection with SARS-CoV-2 and with other respiratory viruses. In this study, gene expression profiles from GSE147507 in the Gene Expression Omnibus (GEO) were used to compare the transcriptional response to SARS-CoV-2 with that to other respiratory viruses, including human parainfluenza virus 3 (HPIV3), respiratory syncytial virus (RSV), and influenza A virus (IAV), in human respiratory cell lines. NetworkAnalyst 3.0 was used to analyse these gene expression data via its intuitive web interface. Through its well-established statistical procedures and state-of-the-art data visualization techniques, it allows us to navigate large, complex gene expression data sets to determine important features, patterns, functions and connections that could lead to new biological hypotheses. The raw RNA-sequencing data underwent processing, including filtering, quality checking and normalization, before data analysis and interpretation. The edgeR package was used to identify differentially expressed genes (DEGs) in virus-infected human lung epithelium-derived cell lines, such as lung alveolar cells (A549), A549 cells expressing ACE2 (A549-ACE2) and cultured human airway epithelial cells (NHBE). P |2| were set as thresholds for identifying DEGs. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were applied for the functional annotation and pathway analysis. 
Gene expression data were interpreted through volcano plot visualization and protein-protein interaction (PPI) analysis using the STRING interactome, to reveal the functional associations between proteins on a genome-wide scale

    Draft genome sequence of Pasteurella multocida subsp. multocida strain PMTB, isolated from a buffalo

    Pasteurella multocida serotypes B:2 and E:2 are the main causative agents of ruminant hemorrhagic septicemia in Asia and Africa, respectively. Pasteurella multocida strain PMTB was isolated from a buffalo with hemorrhagic septicemia and has been determined to be serotype B:2. Here we report the draft genome sequence of strain PMTB

    Integration of RNA-Seq and proteomics data identifies glioblastoma multiforme surfaceome signature

    Background: Glioblastoma multiforme (GBM) is a highly lethal, stage IV brain tumour with a prevalence of approximately 2 per 10,000 people globally. The cell surface proteins, or surfaceome, serve as an information gateway in many oncogenic signalling pathways and are important in modulating cancer phenotypes. Dysregulation of surfaceome expression and activity has been shown to promote tumorigenesis. The expression of the GBM surfaceome is a case in point; OMICS screening in a cell-based system identified that this sub-proteome is largely perturbed in GBM. Additionally, since these cell surface proteins have ‘direct’ access to drugs, they are appealing targets for cancer therapy. However, a comprehensive GBM surfaceome landscape has not yet been fully defined. Thus, this study aimed to define GBM-associated surfaceome genes and identify key cell-surface genes that could potentially be developed as novel GBM biomarkers for therapeutic purposes. Methods: We integrated the RNA-Seq data from the TCGA GBM (n = 166) and GTEx normal brain cortex (n = 408) databases to identify the significantly dysregulated surfaceome in GBM. This was followed by an integrative analysis that combined transcriptomics, proteomics and protein-protein interaction network data to prioritize the high-confidence GBM surfaceome signature. Results: Of the 2381 significantly dysregulated genes in GBM, 395 genes were classified as surfaceome. Via the integrative analysis, we identified a high-confidence GBM molecular signature of six genes, HLA-DRA, CD44, SLC1A5, EGFR, ITGB2 and PTPRJ, which were significantly upregulated in GBM. The expression of these genes was validated in an independent transcriptomics database, which confirmed their upregulated expression in GBM. Importantly, high expression of CD44, PTPRJ and HLA-DRA is significantly associated with poor disease-free survival. 
Last, using the Drugbank database, we identified several clinically-approved drugs targeting the GBM molecular signature suggesting potential drug repurposing. Conclusions: In summary, we identified and highlighted the key GBM surface-enriched repertoires that could be biologically relevant in supporting GBM pathogenesis. These genes could be further interrogated experimentally in future studies that could lead to efficient diagnostic/prognostic markers or potential treatment options for GBM

    Deep Transcriptome Sequencing of Pediatric Acute Myeloid Leukemia Patients at Diagnosis, Remission and Relapse: Experience in 3 Malaysian Children in a Single Center Study

    Among the many types of leukemia, acute myeloid leukemia (AML) accounts for 20% of diagnosed hematological malignancies in pediatric patients (Meshinchi and Arceci, 2007; de Rooij et al., 2015). The standard chemotherapy regimen remains the first-line treatment for pediatric AML; however, nearly 40% of AML patients may suffer from relapse and eventually die from the disease (de Rooij et al., 2015). Similarly, it has been reported that 50% of pediatric AML cases relapsed within 12–18 months of diagnosis and 45% of those who relapsed were not expected to survive (Creutzig et al., 2014). Despite advances in cytogenetic analysis through fluorescence in situ hybridization and multiplex PCR, there is still a need for better and more comprehensive molecular profiling. For instance, microarrays have long been used to study the gene expression profiles of AML patients. Differences in gene expression profiles have enabled clinicians to tailor better treatment for patients and predict whether patients have a tendency to relapse (Goswami et al., 2009). In a recent study, Handschuh et al. reported that three genes, ANXA3, S100A9, and WT1, can differentiate between different prognostic types of AML (Handschuh et al., 2018). This outcome was in agreement with another study by Shimada et al. (2012), in which high expression of the WT1 gene showed prognostic impact in pediatric AML. Another study by Jo et al. (2015) reported that high expression of EVI1 and MEL1 could predict the prognosis of pediatric AML. However, none of the biomarkers identified from these studies have been translated into clinical use. Therefore, the search continues for additional promising biomarkers, notably novel transcripts, novel fusion genes and non-coding RNAs which are not represented in the microarray platform. 
Transcriptome sequencing through next generation sequencing represents an effective approach to discover new genetic information on gene expression which may contribute to tumorigenesis. Notably, several novel and rare fusion transcripts have been identified from AML patients via RNA-sequencing (Padella et al., 2015). A recent study combining whole genome sequencing, whole exome sequencing and RNA sequencing in pediatric cancers has identified 240 pathogenic variants with increased sensitivity (Rusch et al., 2018). Previous studies in relapsed AML have shown that the cells acquired additional genetic mutations that were either different or evolved from subclones of diagnostic blasts cells (Padella et al., 2015; Rusch et al., 2018). Nevertheless, little is known about the genetic changes at the transcriptomic level at diagnostic, remission and relapse stages of the same patients, especially in the Malaysian population

    Genome Sequencing and Bioinformatic Analysis of Pasteurella multocida Serotype B:2 Strain PMTB

    Pasteurella multocida is a Gram-negative bacterium that is the causative agent of a wide range of diseases in animals. This organism usually resides in the mucous membranes of the intestinal, genital and respiratory tissues and is an opportunistic pathogen that causes fowl cholera, bovine haemorrhagic septicaemia and porcine atrophic rhinitis. So far, only the complete genome of P. multocida serotype A:3 strain Pm70 has been elucidated. This study was conducted to sequence the genome of P. multocida serotype B:2 strain PMTB and to compare it with the complete genome of Pm70. A total of 7.2 million sequence reads were generated using the Illumina Genome Analyzer. De novo sequence assembly followed by comparison with the Pm70 reference sequence produced a partial, near-complete genome of PMTB with missing nucleotide sequences located in 81 gaps. The partial genome of P. multocida strain PMTB is 97.78% identical to Pm70. The estimated size of the partial genome of PMTB is 2,208,894 bp, while the Pm70 genome is 2,257,487 bp. In addition, both genomes have a similar GC content of 40 to 41%. Analysis using GeneMark software indicated that the partial genome of PMTB contains 2078 genes, while the reference genome Pm70 has 2014 genes. Gene comparison between PMTB and Pm70, using the PMTB sequences as a database for BLAST against the Pm70 sequences, showed 223 unique genes that are found in PMTB but absent from Pm70. These unique genes are probably specific to serotype B:2, or their counterparts in Pm70 were not detected in the sequence analysis. On the other hand, a total of 49 genes present in Pm70 were not detected in the partial PMTB genome, possibly because they are located in the missing sequences in the gaps. Sequence analysis also showed the presence of genes with high similarity (99 to 100%) to genes from previously characterized serotype B:2 strains, genes that are also found in other P. multocida serotypes and genes found in other bacteria, especially Haemophilus influenzae, Actinobacillus minor and Vibrio cholerae. 
Based on the partial genome sequence analysis, there are probably several virulence genes and virulence-associated genes in the P. multocida PMTB genome, which include adhesin proteins [type 4 fimbria (ptfA)], serotype-specific capsular polysaccharide, lipopolysaccharide, iron acquisition related genes such as exbD and tonB, the gene for hemoglobin-binding protein (HgbA), the gene encoding transferrin-binding protein (tbpA), and several uncharacterized secreted enzymes and proteins that play an important role in the pathogenicity of the disease. Complete genome sequencing and genome-wide functional genomics studies on the P. multocida PMTB genome will be able to provide valuable information on the pathogenicity of haemorrhagic septicaemia in ruminants
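
The GC content comparison mentioned in this abstract rests on a simple calculation, sketched below in Python. This is an illustrative helper only, not the software used in the study; the function name and the choice to ignore ambiguous bases such as N are assumptions.

```python
def gc_content(sequence):
    """Return the GC content of a nucleotide sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted; anything else,
    such as N or gap characters, is ignored (an assumption of this sketch).
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc_bases = sum(1 for base in counted if base in "GC")
    return 100.0 * gc_bases / len(counted)
```

Applied to each assembled contig and weighted by contig length, a function of this kind would give a genome-wide figure comparable to the 40 to 41% reported above.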

    An In Silico Design of Peptides Targeting the S1/S2 Cleavage Site of the SARS-CoV-2 Spike Protein

    SARS-CoV-2, responsible for the COVID-19 pandemic, invades host cells via its spike protein, which includes critical binding regions, such as the receptor-binding domain (RBD), the S1/S2 cleavage site, the S2 cleavage site, and heptad-repeat (HR) sections. Peptides targeting the RBD and HR1 inhibit binding to host ACE2 receptors and the formation of the fusion core. Other peptides target proteases, such as TMPRSS2 and cathepsin L, to prevent the cleavage of the S protein. However, research has largely ignored peptides targeting the S1/S2 cleavage site. In this study, bioinformatics was used to investigate the binding of the S1/S2 cleavage site to host proteases, including furin, trypsin, TMPRSS2, matriptase, cathepsin B, and cathepsin L. Peptides targeting the S1/S2 site were designed by identifying binding residues. Peptides were docked to the S1/S2 site using HADDOCK (High-Ambiguity-Driven protein–protein DOCKing). Nine peptides with the lowest HADDOCK scores and strong binding affinities were selected, which was followed by molecular dynamics simulations (MDSs) for further investigation. Among these peptides, BR582 and BR599 stand out. They exhibited relatively high interaction energies with the S protein at −1004.769 ± 21.2 kJ/mol and −1040.334 ± 24.1 kJ/mol, respectively. It is noteworthy that the binding of these peptides to the S protein remained stable during the MDSs. In conclusion, this research highlights the potential of peptides targeting the S1/S2 cleavage site as a means to prevent SARS-CoV-2 from entering cells, and contributes to the development of therapeutic interventions against COVID-19

    Biosensors based on sphyraena barracuda muscle cholinesterase inhibition for insecticides (chlorpyrifos, malathion, diazinon and dimethoate) detection

    Fish can be used as sensitive biomarkers, while their enzymes, such as cholinesterase (ChE), can act as biosensors able to detect the presence of organophosphates in water. This study was carried out to extract and purify the ChE enzyme from the muscle tissue of Sphyraena barracuda. The muscle tissue was homogenized and the enzyme was then purified by ion exchange chromatography using DEAE cellulose as the column matrix. The Ellman and Bradford assays were carried out to determine the enzyme activity and protein content, respectively. The Sphyraena barracuda ChE was successfully purified with a total recovery of 32.74 % and a purification fold of 2.26. The optimal conditions for ChE activity were pH 9 in Tris-HCl buffer and a temperature of 25 °C. The enzyme was also specific to the ATC substrate, for which it showed high catalytic efficiency. All organophosphates tested at a concentration of 1 mg/L inhibited the activity of ChE: chlorpyrifos (100 %), malathion (100 %), diazinon (91.5 %) and dimethoate (35.8 %). These findings show that the partially purified ChE from the muscle tissue of Sphyraena barracuda is another candidate bioindicator for the detection of organophosphates in water
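
The recovery, purification fold and inhibition percentages quoted in this abstract follow from standard enzyme-purification arithmetic, sketched below in Python. The function names are hypothetical and the abstract does not give the underlying activity measurements, so any inputs passed to these functions are the reader's own data.

```python
def percent_recovery(total_activity_purified, total_activity_crude):
    """Total enzyme activity retained after purification, as a percentage."""
    return 100.0 * total_activity_purified / total_activity_crude


def purification_fold(specific_activity_purified, specific_activity_crude):
    """Ratio of specific activity (activity per mg protein) after vs. before purification."""
    return specific_activity_purified / specific_activity_crude


def percent_inhibition(control_activity, treated_activity):
    """Percentage loss of enzyme activity in the presence of an inhibitor."""
    if control_activity <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * (1.0 - treated_activity / control_activity)
```

Under these definitions, an inhibition of 100 % means no measurable activity remained in the treated sample relative to the untreated control.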

    Antioxidant screening of Garcinia forbesii originated from Sabah

    Garcinia forbesii is a wild plant that has long been used traditionally, with broad utility in fields such as medicine, cosmetics, food and nutraceuticals. Increasing worldwide awareness of the use of phytochemicals and other plant-derived products has broadened the study of bioactivities across several industrial sectors. Therefore, the present study aims to screen the antioxidant and antimicrobial properties of the fruits and leaves of Garcinia forbesii. The methanolic, hexanic and ethyl acetate extracts were obtained by maceration, then isolated and purified by thin-layer chromatography (TLC) and column chromatography (CC). The highest extraction yields came from the methanol extracts of both fruits and leaves, at 9.26 ± 0.34 g and 7.04 ± 0.21 g, respectively. The antimicrobial activity of the extracts was determined against Escherichia coli using the Kirby-Bauer method. The disc diffusion assay showed that growth inhibition of E. coli was attributable entirely to the hexanic extracts of the fruits and leaves of G. forbesii, while all other extracts displayed minimal or no inhibition zones. Among all the fractions, the hexanic extract of the leaves produced the largest inhibition zone (11.83 ± 1.04 mm), exceeding that of the fruits (9.33 ± 1.53 mm). The antioxidant activities of the leaf extracts of G. forbesii in the reducing power, FRAP (ferric reducing antioxidant power) and DPPH assays followed the same order of methanolic > ethyl acetate > hexanic. Meanwhile, the hexane, methanol and ethyl acetate extracts of the fruits of G. forbesii showed IC50 values of 1.05%, 3.35% and 4.44%, while those of the leaves showed 1.19%, 2.4% and 8.94%, respectively. Methanol is therefore a better solvent for extracting most of the antioxidant components from G. forbesii leaves