
    Computational approaches to discovering differentiation genes in the peripheral nervous system of Drosophila melanogaster

    In the common fruit fly, Drosophila melanogaster, neural cell fate specification is triggered by a group of conserved transcriptional regulators known as proneural factors. Proneural factors induce neural fate in uncommitted neuroectodermal progenitor cells, in a process that culminates in sensory neuron differentiation. While the role of proneural factors in early fate specification has been described, less is known about the transition between neural specification and neural differentiation. The aim of this thesis is to use computational methods to improve the understanding of terminal neural differentiation in the Peripheral Nervous System (PNS) of Drosophila. To provide an insight into how proneural factors coordinate the developmental programme leading to neural differentiation, expression profiling covering the first 3 hours of PNS development in Drosophila embryos had been previously carried out by Cachero et al. [2011]. The study revealed a time-course of gene expression changes from specification to differentiation and suggested a cascade model, whereby proneural factors regulate a group of intermediate transcriptional regulators which are in turn responsible for the activation of specific differentiation target genes. In this thesis, I propose to select potentially important differentiation genes from the transcriptional data in Cachero et al. [2011] using a novel approach centred on protein interaction network-driven prioritisation. This is based on the insight that biological hypotheses supported by diverse data sources can represent stronger candidates for follow-up studies. Specifically, I propose the usage of protein interaction network data because of documented transcriptome-interactome correlations, which suggest that differentially expressed genes encode products that tend to belong to functionally related protein interaction clusters. Experimental protein interaction data is, however, remarkably sparse. 
To increase the informative power of protein-level analyses, I develop a novel approach to augment publicly available protein interaction datasets using functional conservation between orthologous proteins across different genomes, to predict interologs (interacting orthologs). I implement this interolog retrieval methodology in a collection of open-source software modules called Bio::Homology::InterologWalk, the first generalised framework using web-services for “on-the-fly” interolog projection. Bio::Homology::InterologWalk works with homology data for any of the hundreds of genomes in Ensembl and Ensembl Genomes Metazoa, and with experimental protein interaction data curated by EBI IntAct. It generates putative protein interactions and optionally collates meta-data into a prioritisation index that can be used to help select interologs with high experimental support. The methodology proposed represents a significant advance over existing interolog data sources, which are restricted to specific biological domains with fixed underlying data sources often only accessible through basic web-interfaces. Using Bio::Homology::InterologWalk, I build interolog models in Drosophila sensory neurons and, guided by the transcriptome data, find evidence implicating a small set of genes in a conserved sensory neuronal specialisation dynamic, the assembly of the ciliary dendrite in mechanosensory neurons. Using network community-finding algorithms I obtain functionally enriched communities, which I analyse using an array of novel computational techniques. The ensuing datasets lead to the elucidation of a cluster of interacting proteins encoded by the target genes of one of the intermediate transcriptional regulators of neurogenesis and ciliogenesis, fd3F.
These targets are validated in vivo and result in improved knowledge of the important target genes activated by the transcriptional cascade, suggesting a scenario for the mechanisms orchestrating the ordered assembly of the cilium during differentiation.
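
The interolog idea in this abstract (interacting proteins in one species suggest an interaction between their orthologs in another) can be illustrated with a minimal Python sketch. This is not the Bio::Homology::InterologWalk interface; the function name, the toy identifiers and the in-memory inputs are all assumptions made for illustration, and a real pipeline would pull orthology and interaction data from external services.

```python
from itertools import product


def project_interologs(orthologs, interactions):
    """Project reference-species interactions onto query-species genes.

    orthologs: dict mapping each query gene to a set of reference orthologs.
    interactions: iterable of (a, b) pairs of interacting reference proteins.
    Returns a set of frozensets, each a putative interacting query-gene pair.
    """
    # Reverse map: reference protein -> query genes that are its orthologs.
    reverse = {}
    for gene, refs in orthologs.items():
        for ref in refs:
            reverse.setdefault(ref, set()).add(gene)

    predicted = set()
    for a, b in interactions:
        # Every pairing of orthologs of a and b is a putative interaction.
        for gene_a, gene_b in product(reverse.get(a, ()), reverse.get(b, ())):
            if gene_a != gene_b:  # self-interactions are skipped in this sketch
                predicted.add(frozenset((gene_a, gene_b)))
    return predicted


# Hypothetical toy data: three fly genes and their human orthologs.
toy_orthologs = {"fly_A": {"hs_X"}, "fly_B": {"hs_Y"}, "fly_C": {"hs_Z"}}
toy_interactions = [("hs_X", "hs_Y"), ("hs_Y", "hs_W")]
putative = project_interologs(toy_orthologs, toy_interactions)
```

Only the first human interaction has orthologs on both sides, so a single fly pair is predicted. Ranking such predictions by experimental support, as the abstract describes, would need additional metadata that this sketch leaves out.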

    2008 GREAT Day Program

    SUNY Geneseo’s Second Annual GREAT Day.

    Incorporation of Knowledge for Network-based Candidate Gene Prioritization

    In order to identify the genes associated with a given disease, a number of different high-throughput techniques are available such as gene expression profiles. However, these high-throughput approaches often result in hundreds of different candidate genes, and it is thus very difficult for biomedical researchers to narrow their focus to a few candidate genes when studying a given disease. In order to assist in this challenge, a process called gene prioritization can be utilized. Gene prioritization is the process of identifying and ranking new genes as being associated with a given disease. Candidate genes which rank high are deemed more likely to be associated with the disease than those that rank low. This dissertation focuses on a specific kind of gene prioritization method called network-based gene prioritization. Network-based methods utilize a biological network such as a protein-protein interaction network to rank the candidate genes. In a biological network, a node represents a protein (or gene), and a link represents a biological relationship between two proteins such as a physical interaction. The purpose of this dissertation was to investigate if the incorporation of biological knowledge into the network-based gene prioritization process can provide a significant benefit. The biological knowledge consisted of a variety of information about a given gene including gene ontology (GO) functional terms, MEDLINE articles, gene co-expression measurements, and protein domains to name just a few. The biological knowledge was incorporated into the network’s links and nodes as link and node knowledge respectively. An example of link knowledge is the degree of functional similarity between two proteins, and an example of node knowledge is the number of GO terms associated with a given protein. 
Since there were no existing network-based inference algorithms which could incorporate node knowledge, I developed a new network-based inference algorithm to incorporate both link and node knowledge called the Knowledge Network Gene Prioritization (KNGP) algorithm. The results showed that the incorporation of biological knowledge via link and node knowledge can provide a significant benefit for network-based gene prioritization. The KNGP algorithm was utilized to combine the link and node knowledge
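
One common family of network-based prioritisation methods is the random walk with restart. The sketch below is a generic illustration of how link and node knowledge could enter such a method; it is not the KNGP algorithm, whose details this abstract does not give. It assumes that link knowledge is supplied as edge weights and node knowledge as the restart distribution, and the function name and parameters are hypothetical.

```python
def random_walk_with_restart(edges, prior, restart=0.3, iterations=200):
    """Score nodes by a random walk that restarts according to `prior`.

    edges: dict mapping (u, v) pairs to non-negative weights (undirected).
    prior: dict mapping nodes to non-negative weights; normalised internally.
    """
    nodes = sorted(set(prior) | {n for pair in edges for n in pair})
    neighbours = {n: {} for n in nodes}
    for (u, v), weight in edges.items():
        neighbours[u][v] = weight
        neighbours[v][u] = weight

    total = sum(prior.get(n, 0.0) for n in nodes)
    start = {n: prior.get(n, 0.0) / total for n in nodes}
    score = dict(start)

    for _ in range(iterations):
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            out = sum(neighbours[u].values())
            if out == 0:
                # Isolated node: hand its mass back to the restart distribution.
                for n in nodes:
                    nxt[n] += (1 - restart) * score[u] * start[n]
                continue
            for v, weight in neighbours[u].items():
                # Spread the remaining mass in proportion to edge weight.
                nxt[v] += (1 - restart) * score[u] * weight / out
        score = nxt
    return score


# Hypothetical path network A - B - C with all prior knowledge on A.
scores = random_walk_with_restart({("A", "B"): 1.0, ("B", "C"): 1.0}, {"A": 1.0})
```

In this toy network the scores fall with distance from the seed node A, which is the behaviour a prioritisation method relies on when ranking candidates near known disease genes.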

    IN SILICO METHODS FOR DRUG DESIGN AND DISCOVERY

    Computer-aided drug design (CADD) methodologies play an ever-increasing role in drug discovery and are critical to the cost-effective identification of promising drug candidates. These computational methods help limit the use of animal models in pharmacological research, aid the rational design of novel and safe drug candidates, and support the repositioning of marketed drugs, assisting medicinal chemists and pharmacologists throughout the drug discovery trajectory. Within this field of research, we launched a Research Topic in Frontiers in Chemistry in March 2019 entitled “In silico Methods for Drug Design and Discovery,” which involved two sections of the journal: Medicinal and Pharmaceutical Chemistry and Theoretical and Computational Chemistry. For the reasons mentioned, this Research Topic attracted the attention of scientists and received a large number of submitted manuscripts. Among them, 27 Original Research articles, five Review articles, and two Perspective articles have been published within the Research Topic. The Original Research articles cover most of the topics in CADD, reporting advanced in silico methods in drug discovery, while the Review articles offer a point of view on some computer-driven techniques applied to drug research. Finally, the Perspective articles provide a vision of specific computational approaches with an outlook on the modern era of CADD.

    Network Compression as a Quality Measure for Protein Interaction Networks

    With the advent of large-scale protein interaction studies, there is much debate about data quality. Can different noise levels in the measurements be assessed by analyzing network structure? Because proteomic regulation is inherently co-operative, modular and redundant, it is inherently compressible when represented as a network. Here we propose that network compression can be used to compare false positive and false negative noise levels in protein interaction networks. We validate this hypothesis by first confirming the detrimental effect of false positives and false negatives. Second, we show that gold standard networks are more compressible. Third, we show that compressibility correlates with co-expression, co-localization, and shared function. Fourth, we also observe correlation with better protein tagging methods, physiological expression in contrast to over-expression of tagged proteins, and smart pooling approaches for yeast two-hybrid screens. Overall, this new measure is a proxy for both sensitivity and specificity and gives complementary information to standard measures such as average degree and clustering coefficients.
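
To make the notion of a compressible network concrete, the Python sketch below uses a deliberately simple stand-in for compression: nodes with identical neighbour sets are merged into one module, and the compression rate is the fraction of edges saved. This is an assumption chosen for illustration and not the measure used in the paper, which the abstract does not specify; the function name is hypothetical, and the input is assumed to be a non-empty edge list without self-loops.

```python
def compression_rate(edges):
    """Fraction of edges saved by merging nodes with identical neighbours."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    # Nodes with identical neighbour sets collapse into one module,
    # identified here by that shared neighbour set.
    module_of = {node: frozenset(ns) for node, ns in neighbours.items()}

    original = {frozenset(edge) for edge in edges}
    compressed = {frozenset(module_of[n] for n in edge) for edge in original}
    return 1 - len(compressed) / len(original)


# A hub with four interchangeable partners compresses to a single module edge.
star = [("hub", "p1"), ("hub", "p2"), ("hub", "p3"), ("hub", "p4")]
# A simple chain has no redundancy to exploit.
chain = [("a", "b"), ("b", "c"), ("c", "d")]
star_rate = compression_rate(star)
chain_rate = compression_rate(chain)
```

Under this toy measure, a redundant, modular structure such as the star scores high while the chain scores zero. Randomly added or removed edges would tend to break up identical neighbourhoods and lower the rate, which is the intuition behind using compressibility as a noise indicator.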

    Computational techniques for cell signaling

    Cells can be viewed as sophisticated machines that organize their constituent components and molecules to receive, process, and respond to signals. The goal of the scientist is to uncover both the individual operations underlying these processes and the mechanism of the emergent properties of interest that give rise to the various phenomena such as disease, development, recovery or aging. Cell signaling plays a crucial role in all of these areas. The complexity of biological processes coupled with the physical limitations of experiments to observe individual molecular components across small to large scales limits the knowledge that can be gleaned from direct observations. Mathematical modeling can be used to estimate parameters that are hidden or too difficult to observe in experiments, and it can make qualitative predictions that can distinguish between hypotheses of interest. Statistical analysis can be employed to explore the large amounts of data generated by modern experimental techniques such as sequencing and high-throughput screening, and it can integrate the observations from many individual experiments or even separate studies to generate new hypotheses. This dissertation employs mathematical and statistical analyses for three prominent aspects of cell signaling: the physical transfer of signaling molecules between cells, the intracellular protein machinery that organizes into pathways to process these signals, and changes in gene expression in response to cell signaling. Computational biology can be described as an applied discipline in that it aims to further the knowledge of a discipline that is distinct from itself. However, the richness of the problems encountered in biology requires continuous development of better methods equipped to handle the complexity, size, or uncertainty of the data, and to build in constraints motivated by the reality of the underlying biological system.
In addition, better computational and mathematical methods are also needed to model the emergent behavior that arises from many components. The work presented in this dissertation fulfills both of these roles. We apply known and existing techniques to analyse experimental data and provide biological meaning, and we also develop new statistical and mathematical models that add to the knowledge and practice of computational biology. Much of cell signaling is initiated by signal transduction from the exterior, either by sensing the environmental conditions or the reception of specific signals from other cells. The phenomena of most immediate concern to our species, that of human health and disease, are usually also generated from, and manifest in, our tissues and organs due to the interaction and signaling between cells. A modality of inter-cellular communication that was regarded earlier as an obscure phenomenon but has more recently come to the attention of the scientific community is that of tunneling nanotubes (TNs). TNs have been observed as thin (of the order of 100 nanometers) extensions from one cell to another closely located one. The formation of such structures along with the intercellular exchange of molecules through them, and their interaction with the cytoskeleton, could be involved in many important processes, such as tissue formation and cancer growth. We describe a simple model of passive transport of molecules between cells due to TNs. Building on a few basic assumptions, we derive parametrized, closed-form expressions to describe the concentration of transported molecules as a function of distance from a population of TN-forming cells. Our model predicts how the perfusion of molecules through the TNs is affected by the size of the transferred molecules, the length and stability of nanotube formation, and the differences between membrane-bound and cytosolic proteins.
To our knowledge, this is the first published mathematical model of intercellular transfer through tunneling nanotubes. We envision that experimental observations will be able to confirm or improve the assumptions made in our model. Furthermore, quantifying the form of inter-cellular communication in the basic scenario envisioned in our model can help suggest ways to measure and investigate cases of possible regulation of either formation of tunneling nanotubes or transport through them. The next problem we focus on is uncovering how the interactions between the genes and proteins in a cell organize into pathways to process cell signals or perform other tasks. The ability to accurately model and deeply understand gene and protein interaction networks of various kinds can be very powerful for prioritizing candidate genes and predicting their role in various signaling pathways and processes. A popular technique for gene prioritization and function prediction is the graph diffusion kernel. We show how the graph diffusion kernel is mathematically similar to the Ising spin graph, a model popular in statistical physics but not usually employed on biological interaction networks. We develop a new method for calculating gene association based on the Ising spin model which is different from the methods common in either bioinformatics or statistical physics. We show that our method performs better than both the graph diffusion kernel and its commonly used equivalent in the Ising model. We present a theoretical argument for understanding its performance based on ideas of phase transitions on networks. We measure its performance by applying our method to link prediction on protein interaction networks. Unlike candidate gene prioritization or function prediction, link prediction does not depend on the existing annotation or characterization of genes for ground truth. It helps us to avoid the confounding noise and uncertainty in the network and annotation data.
As a purely network analysis problem, it is well suited for comparing network analysis methods. Once we know that we are accurately modeling the interaction network, we can employ our model to solve other problems like gene prioritization using interaction data. We also apply statistical analysis for a specific instance of a cell signaling process: the drought response in Brassica napus, a plant of scientific and economic importance. Important changes in the cell physiology of guard cells are initiated by abscisic acid, an important phytohormone that signals water deficit stress. We analyse RNA-seq reads resulting from the sequencing of mRNA extracted from protoplasts treated with abscisic acid. We employ sequence analysis, statistical modeling, and the integration of cross-species network data to uncover genes, pathways, and interactions important in this process. We confirm what is known from other species and generate new gene and interaction candidates. By associating functional and sequence modification, we are also able to uncover evidence of evolution of gene specialization, a process that is likely widespread in polyploid genomes. This work has developed new computational methods and applied existing tools for understanding cellular signaling and pathways. We have applied statistical analysis to integrate expression, interactome, pathway, regulatory elements, and homology data to infer Brassica napus genes and their roles involved in drought response. For many of these findings, we find support in previous literature from other species based on independent experiments. By relating the changes in regulatory elements, our RNA-seq results and common gene ancestry, we present evidence of its evolution in the context of polyploidy. Our work can provide a scientific basis for the pursuit of certain genes as targets of breeding and genetic engineering efforts for the development of drought tolerant oil crops.
Building on ideas from statistical physics, we developed a new model of gene associations in networks. Using link prediction as a metric for the accuracy of modeling the underlying structure of a real network, we show that our model shows improved performance on real protein interaction networks. Our model of gene associations can be used to prioritize candidate genes for a disease or phenotype of interest. We also develop a mathematical model for a novel inter-cellular mode of biomolecule transfer. We relate hypotheses about the dynamics of TN formation, stability, and nature of molecular transport to quantitative predictions that may be tested by suitable experiments. In summary, this work demonstrates the application and development of computational analysis of cell signaling at the level of the transcriptome, the interactome, and physical transport.
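
The graph diffusion kernel named in this abstract is commonly defined as K = exp(-beta * L), where L is the graph Laplacian and beta controls how far association spreads. The Python sketch below assumes that standard definition, since the abstract does not state the exact form used in the dissertation, and it does not attempt the Ising-based method. It approximates the matrix exponential with a truncated Taylor series, which is adequate only for small toy graphs; the function names and the `beta` and `terms` parameters are illustrative.

```python
def laplacian(n, edges):
    """Graph Laplacian L = D - A of an undirected graph on nodes 0..n-1."""
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0
        lap[u][u] += 1.0
        lap[v][v] += 1.0
    return lap


def matmul(a, b):
    """Plain nested-list matrix product for square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def diffusion_kernel(n, edges, beta=1.0, terms=40):
    """Approximate K = exp(-beta * L) by summing the first `terms` Taylor terms."""
    m = [[-beta * x for x in row] for row in laplacian(n, edges)]
    kernel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in kernel]  # current term, starting at the identity
    for k in range(1, terms):
        # term_k = term_{k-1} * M / k, so term_k = M^k / k!
        term = [[x / k for x in row] for row in matmul(term, m)]
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return kernel


# Path graph 0 - 1 - 2: node 0 should be closer to node 1 than to node 2.
K = diffusion_kernel(3, [(0, 1), (1, 2)], beta=1.0)
```

Entry K[i][j] can be read as an association score between genes i and j. For link prediction, as described in the abstract, one would hide some edges, compute such scores on the remaining network, and check whether the hidden pairs rank highly.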

    Book of abstracts


    Determining the potential of wearable technologies within the disease landscape of sub-Saharan Africa

    Thesis (MEng)--Stellenbosch University, 2019. ENGLISH ABSTRACT: Please refer to full text for abstract. AFRIKAANS SUMMARY: Please refer to full text for summary.

    Adapting Community Detection Approaches to Large, Multilayer, and Attributed Networks

    Networks have become a common data mining tool to encode relational definitions between a set of entities. Whether studying biological correlations, or communication between individuals in a social network, network analysis tools enable interpretation, prediction, and visualization of patterns in the data. Community detection is a well-developed subfield of network analysis, where the objective is to cluster nodes into 'communities' based on their connectivity patterns. There are many useful and robust approaches for identifying communities in a single, moderately-sized network, but the ability to work with more complicated types of networks containing extra or a large amount of information poses challenges. In this thesis, we address three types of challenging network data and how to adapt standard community detection approaches to handle these situations. In particular, we focus on networks that are large, attributed, and multilayer. First, we present a method for identifying communities in multilayer networks, where there exist multiple relational definitions between a set of nodes. Next, we provide a pre-processing technique for reducing the size of large networks, where standard community detection approaches might have inconsistent results or be prohibitively slow. We then introduce an extension to a probabilistic model for community structure to take into account node attribute information and develop a test to quantify the extent to which connectivity and attribute information align. Finally, we demonstrate example applications of these methods in biological and social networks. This work helps to advance the understanding of network clustering, network compression, and the joint modeling of node attributes and network connectivity. Doctor of Philosoph
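
Many standard community detection approaches score a candidate partition with Newman's modularity, which compares the number of edges inside each community with the number expected from node degrees alone. The Python sketch below computes that score for an unweighted, undirected, single-layer network. It is offered only as background for the "standard approaches" mentioned in the abstract, not as one of the thesis's methods, and the function name and toy data are assumptions.

```python
def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    internal = {}  # edges with both endpoints in the community
    degree = {}    # summed node degree per community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Q = sum over communities of (internal edge share - expected share).
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
lumped = {node: "all" for node in range(6)}
q_split = modularity(toy_edges, split)
q_lumped = modularity(toy_edges, lumped)
```

Splitting the toy network into its two triangles scores higher than lumping all nodes together, which always scores zero. Extending a score like this to multilayer or attributed networks is the kind of adaptation the abstract describes.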