23 research outputs found
Recommended from our members
In silico identification of novel open reading frames in Plasmodium falciparum oocyte and salivary gland sporozoites using proteogenomics framework.
BACKGROUND: Plasmodium falciparum causes the deadliest form of malaria, which remains one of the most prevalent infectious diseases. Unfortunately, the only licensed vaccine showed limited protection and resistance to anti-malarial drugs is increasing, which can be largely attributed to the biological complexity of the parasite's life cycle. The progression from one developmental stage to another in P. falciparum involves drastic changes in gene expression, and its infectivity to human hosts varies greatly depending on the stage. Approaches to identify candidate genes that are responsible for the development of infectivity to human hosts typically involve differential gene expression analysis between stages. However, the detection may be limited to annotated proteins and open reading frames (ORFs) predicted using restrictive criteria. METHODS: The above problem is particularly relevant for P. falciparum, whose genome annotation is relatively incomplete given its clinical significance. In this work, a systems proteogenomics approach was used to address this challenge, as it allows computational detection of unannotated, novel Open Reading Frames (nORFs), which are neglected by conventional analyses. Two transcriptome/proteome pairs were obtained from a previous study: one was collected in the mosquito-infectious oocyst sporozoite stage, and the other in the salivary gland sporozoite stage with human infectivity. They were then re-analysed using the proteogenomics framework to identify nORFs in each stage. RESULTS: Translational products of nORFs that map to antisense, intergenic, intronic, 3' UTR and 5' UTR regions, as well as alternative reading frames of canonical proteins, were detected. Some of these nORFs also showed differential expression between the two life cycle stages studied. Their regulatory roles were explored through further bioinformatics analyses, including the expression regulation on the parent reference genes, in silico structure prediction, and gene ontology term enrichment analysis. CONCLUSION: The identification of nORFs in P. falciparum sporozoites highlights the biological complexity of the parasite. Although the analyses are solely computational, these results provide a starting point for further experimental validation of the existence and functional roles of these nORFs.
PhosphoEffect: Prioritizing Variants On or Adjacent to Phosphorylation Sites through Their Effect on Kinase Recognition Motifs.
Phosphorylation sites often have key regulatory functions and are central to many cellular signaling pathways, so mutations that modify them have the potential to contribute to pathological states such as cancer. Although many classifiers exist for prioritization of coding genomic variants, to our knowledge none of them explicitly account for the alteration or creation of kinase recognition motifs that alter protein structure, function, regulation of activity, and interaction networks through modifying the pattern of phosphorylation. We present a novel computational pipeline that uses a random forest classifier to predict the pathogenicity of a variant, according to its direct or indirect effect on local phosphorylation sites and the predicted functional impact of perturbing a phosphorylation event. We call this classifier PhosphoEffect and find that it compares favorably and with increased accuracy to the existing classifier PolyPhen 2.2.2 when tested on a dataset of known variants enriched for phosphorylation sites and their neighbors
Identification and Prioritisation of Variants in the Short Open-Reading Frame Regions of the Human Genome
As whole-genome sequencing technologies improve and accurate maps of the entire genome are assembled, short open-reading frames (sORFs) are garnering interest as functionally important regions that were previously overlooked. However, there is a paucity of tools available to investigate variants in sORF regions of the genome. Here we investigate the performance of commonly used tools for variant calling and variant prioritisation in these regions, and present a framework for optimising these processes. First, the performance of four widely used germline variant calling algorithms is systematically compared. Haplotype Caller is found to perform best across the whole genome, but FreeBayes is shown to produce the most accurate variant set in sORF regions. An accurate set of variants is found by taking the intersection of called variants. The potential deleteriousness of each variant is then predicted using a pathogenicity scoring algorithm developed here, called sORF-c. This algorithm uses supervised machine-learning to predict the pathogenicity of each variant, based on a holistic range of functional, conservation-based and region-based scores defined for each variant. By training on a dataset of over 130,000 variants, sORF-c outperforms other comparable pathogenicity scoring algorithms on a test set of variants in sORF regions of the human genome. 
List of Abbreviations: AUPRC, Area under the precision-recall curve; BED, Browser Extensible Data; CADD, Combined annotation-dependent depletion; DANN, Deleterious annotation of genetic variants using neural networks; EPO, Enredo, Pecan, Ortheus pipeline; GATK, Genome analysis toolkit; GIAB, Genome in a bottle; HGMD, Human gene mutation database; Indels, Insertions and deletions; MS, Mass spectrometry; ORF, Open reading frame; RF, Random Forests; ROC, Receiver Operating Characteristics; SEP, sORF encoded peptide; sklearn, Scikit-learn package; SNVs, Single nucleotide variants; sORF, Short open-reading frame; TF, Transcription factor; TSS, Transcription start site; VCF, Variant Call Format fil
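The consensus step mentioned in the abstract above (taking the intersection of the variants called by several callers) might be sketched as follows. This is only an illustration, assuming each caller's output has already been reduced to (chrom, pos, ref, alt) tuples; the caller names and variants are hypothetical placeholders, not data or code from the thesis.

```python
from functools import reduce

# Hypothetical call sets keyed by caller name (placeholders, not real data).
# Each variant is represented as a (chrom, pos, ref, alt) tuple.
calls = {
    "caller_a": {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")},
    "caller_b": {("chr1", 1000, "A", "G"), ("chr2", 500, "G", "A")},
    "caller_c": {("chr1", 1000, "A", "G"), ("chr2", 500, "G", "A"), ("chr3", 42, "T", "C")},
}


def consensus_variants(call_sets):
    """Return only the variants reported by every caller (set intersection)."""
    sets = list(call_sets.values())
    if not sets:
        return set()
    return reduce(set.intersection, sets)


consensus = sorted(consensus_variants(calls))
print(consensus)
```

Requiring agreement between all callers trades sensitivity for precision: a variant missed by any one caller is dropped from the consensus set.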
Big data in digital healthcare: lessons learnt and recommendations for general practice
Abstract: Big Data will be an integral part of the next generation of technological developments, allowing us to gain new insights from the vast quantities of data being produced by modern life. There is significant potential for the application of Big Data to healthcare, but there are still some impediments to overcome, such as fragmentation, high costs, and questions around data ownership. Envisioning a future role for Big Data within the digital healthcare context means balancing the benefits of improving patient outcomes with the potential pitfalls of increasing physician burnout due to poor implementation leading to added complexity. Oncology, the field where Big Data collection and utilization got a head start with programs like TCGA and the Cancer Moon Shot, provides an instructive example, as we see different perspectives provided by the United States (US), the United Kingdom (UK) and other nations in the implementation of Big Data in patient care with regard to their centralization and regulatory approach to data. By drawing upon global approaches, we propose recommendations for guidelines and regulations of data use in healthcare centering on the creation of a unique global patient ID that can integrate data from a variety of healthcare providers. In addition, we expand upon the topic by discussing potential pitfalls of Big Data, such as the lack of diversity in Big Data research and the security and transparency risks posed by machine learning algorithms.
A Complex Hierarchy of Avoidance Behaviors in a Single-Cell Eukaryote.
Complex behavior is associated with animals with nervous systems, but decision-making and learning also occur in non-neural organisms [1], including singly nucleated cells [2-5] and multi-nucleate syncytia [6-8]. Ciliates are single-cell eukaryotes, widely dispersed in aquatic habitats [9], with an extensive behavioral repertoire [10-13]. In 1906, Herbert Spencer Jennings [14, 15] described in the sessile ciliate Stentor roeseli a hierarchy of responses to repeated stimulation, which are among the most complex behaviors reported for a singly nucleated cell [16, 17]. These results attracted widespread interest [18, 19] and exert continuing fascination [7, 20-22] but were discredited during the behaviorist orthodoxy by claims of non-reproducibility [23]. These claims were based on experiments with the motile ciliate Stentor coeruleus. We acquired and maintained the correct organism in laboratory culture and used micromanipulation and video microscopy to confirm Jennings' observations. Despite significant individual variation, not addressed by Jennings, S. roeseli exhibits avoidance behaviors in a characteristic hierarchy of bending, ciliary alteration, contractions, and detachment, which is distinct from habituation or conditioning. Remarkably, the choice of contraction versus detachment is consistent with a fair coin toss. Such behavioral complexity may have had an evolutionary advantage in protist ecosystems, and the ciliate cortex may have provided mechanisms for implementing such behavior prior to the emergence of multicellularity. Our work resurrects Jennings' pioneering insights and adds to the list of exceptional features, including regeneration [24], genome rearrangement [25], codon reassignment [26], and cortical inheritance [27], for which the ciliate clade is renowned. DBT-Cambridge Lectureshi
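One way to check whether a binary choice is "consistent with a fair coin toss", as the abstract above puts it, is an exact two-sided binomial test. The sketch below is illustrative only: the abstract does not say which statistical test the authors used, and the counts are hypothetical, not taken from the paper.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k under the null probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


# Hypothetical counts (assumed for illustration): 11 detachments in 20 choices.
n_trials = 20
n_detach = 11
p_value = binomial_two_sided_p(n_detach, n_trials)
print(f"two-sided p-value = {p_value:.4f}")
```

A large p-value means the observed split gives no evidence against a 50:50 choice; it does not by itself prove that the underlying process is a fair coin.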
Behavioural analysis of single-cell aneural ciliate, Stentor roeseli, using machine learning approaches.
There is still a significant gap between our understanding of neural circuits and the behaviours they compute, i.e. the computations performed by these neural networks (Carandini 2012 Nat. Neurosci. 15, 507-509. (doi:10.1038/nn.3043)). Cellular decision-making processes, learning, behaviour and memory formation, all of which have been associated only with animals with neural systems, have also been observed in many unicellular aneural organisms, namely Physarum, Paramecium and Stentor (Tang & Marshall 2018 Curr. Biol. 28, R1180-R1184. (doi:10.1016/j.cub.2018.09.015)). As these are fully functioning yet unicellular organisms, there is a much better chance of elucidating the detailed mechanisms underlying these learning processes without the complications of highly interconnected neural circuits. An intriguing learning behaviour observed in Stentor roeseli (Jennings 1902 Am. J. Physiol. Legacy Content 8, 23-60. (doi:10.1152/ajplegacy.1902.8.1.23)) when stimulated with carmine has left scientists puzzled for more than a century. So far, none of the existing learning paradigms can fully encapsulate this particular series of five characteristic avoidance reactions. Although we were able to observe all responses described in the literature and in a previous study (Dexter et al. 2019), they do not conform to any particular learning model. We then investigated whether models inferred from machine learning approaches, including decision tree, random forest and feed-forward artificial neural networks, could infer and predict the behaviour of S. roeseli. Our results showed that an artificial neural network with multiple 'computational' neurons is inefficient at modelling the single-celled ciliate's avoidance reactions. This has highlighted the complexity of behaviours in aneural organisms. Additionally, this report will also discuss the significance of elucidating molecular details underlying learning and decision-making processes in these unicellular organisms, which could offer valuable insights that are applicable to higher animals. KMT is funded by Cambridge Trust Scholarship and Trinity Overseas Bursaries; SP is funded by the Cambridge-DBT lectureship.
Comparative analysis of Erk phosphorylation suggests a mixed strategy for measuring phospho-form distributions.
The functional impact of multisite protein phosphorylation can depend on both the numbers and the positions of phosphorylated sites (the global pattern of phosphorylation or 'phospho-form'), giving biological systems profound capabilities for dynamic information processing. A central problem in quantitative systems biology, therefore, is to measure the 'phospho-form distribution': the relative amount of each of the 2^n phospho-forms of a protein with n phosphorylation sites. We compared four potential methods, namely western blots with phospho-specific antibodies, peptide-based liquid chromatography (LC) and mass spectrometry (MS; pepMS), protein-based LC/MS (proMS) and nuclear magnetic resonance spectroscopy (NMR), on differentially phosphorylated samples of the well-studied mitogen-activated protein kinase Erk2, with two phosphorylation sites. The MS methods were quantitatively consistent with each other and with NMR to within 10%, but western blots, while highly sensitive, showed significant discrepancies with MS. NMR also uncovered two additional phosphorylations, for which a combination of pepMS and proMS yielded an estimate of the 16-member phospho-form distribution. This combined MS strategy provides an optimal mixture of accuracy and coverage for quantifying distributions, but positional isomers remain a challenging problem.
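The counting behind the phospho-form distribution described above can be made concrete with a short sketch: a protein with n sites has 2^n phospho-forms, so two sites give 4 forms and four sites give the 16-member distribution mentioned in the abstract. The site labels and measured amounts below are hypothetical placeholders, not values from the paper.

```python
from itertools import product


def phospho_forms(sites):
    """Enumerate all 2**n phospho-forms as tuples of (site, is_phosphorylated)."""
    return [
        tuple(zip(sites, states))
        for states in product((False, True), repeat=len(sites))
    ]


def distribution(amounts):
    """Normalise raw amounts per phospho-form into relative fractions."""
    total = sum(amounts.values())
    return {form: amount / total for form, amount in amounts.items()}


# Hypothetical site labels (the abstract does not name the sites).
forms2 = phospho_forms(["site1", "site2"])
forms4 = phospho_forms(["site1", "site2", "site3", "site4"])
print(len(forms2), len(forms4))

# Hypothetical raw amounts for the four two-site forms, in arbitrary units.
fractions = distribution(dict(zip(forms2, [10.0, 20.0, 30.0, 40.0])))
for form, fraction in fractions.items():
    print(form, fraction)
```

Representing each form as a tuple of (site, state) pairs keeps positional isomers distinct, which matters because forms with the same number of phosphates but at different sites are separate members of the distribution.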