20 research outputs found

    KOnezumi: a web application for automating gene disruption strategies to generate knockout mice

    Summary: Although gene editing using the CRISPR/Cas9 system enables the rapid generation of knockout mice, constructing an optimal gene disruption strategy is still laborious. Here, we propose KOnezumi, a simple and user-friendly web application, for use in automating the design of knockout strategies for multiple genes. Users only need to input gene symbols, and then KOnezumi returns target exons, gRNA candidates to delete the target exons, genotyping PCR primers, nucleotide sequences of the target exons and coding sequences of expected deletion products. KOnezumi enables users to easily and rapidly apply a rational strategy to accelerate the generation of KO mice.

    Development of the RNAseq analysis pipeline "ikra" and establishment of an RNAseq meta-analysis method using "ikra"

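    Lead applicant: 山田 航暉 (2nd year, School of Medicine, Faculty of Medicine). Faculty advisor: 鈴木 顕 (Statistical Genetics, Graduate School of Medicine). Co-investigators: 石川海斗, 松本康成 (1st year, School of Medicine, Faculty of Medicine). Acceptance number: 医-2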

    Enhancing Retinal Scan Classification: A Comparative Study of Transfer Learning and Ensemble Techniques

    Ophthalmic diseases are a significant health concern globally, causing visual impairment and blindness in millions of people, particularly in dispersed populations. Among these diseases, retinal fundus diseases are a leading cause of irreversible vision loss, and early diagnosis and treatment can prevent this outcome. Retinal fundus scans have become an indispensable tool for doctors to diagnose multiple ocular diseases simultaneously. In this paper, the results of a variety of deep learning models (DenseNet-201, ResNet125V2, XceptionNet, EfficientNet-B7, MobileNetV2, and EfficientNetV2M) and ensemble learning approaches are presented, which can accurately detect 20 common fundus diseases by analyzing retinal fundus scan images. The proposed model achieves an accuracy of 96.98% for risk classification and 76.92% for multi-disease detection, demonstrating its potential for use in clinical settings. By utilizing the proposed model, doctors can provide swift and accurate diagnoses to patients, improving their chances of receiving timely treatment and preserving their vision.

    NeuroCADR: Drug Repurposing to Reveal Novel Anti-Epileptic Drug Candidates Through an Integrated Computational Approach

    Drug repurposing is an emerging approach for drug discovery involving the reassignment of existing drugs for novel purposes. An alternative to the traditional de novo process of drug development, repurposed drugs are faster, cheaper, and less failure prone than drugs developed from traditional methods. Recently, drug repurposing has been performed in silico, in which databases of drugs and chemical information are used to determine interactions between target proteins and drug molecules to identify potential drug candidates. The proposed algorithm, NeuroCADR, is a novel system for drug repurposing via a multi-pronged approach consisting of k-nearest neighbor algorithms (KNN), random forest classification, and decision trees. Data was sourced from several databases consisting of interactions between diseases, symptoms, genes, and affiliated drug molecules, which were then compiled into datasets expressed in binary. The proposed method displayed a high level of accuracy, outperforming nearly all in silico approaches. NeuroCADR was applied to epilepsy, a condition characterized by seizures, periods of time with bursts of uncontrolled electrical activity in brain cells. Existing drugs for epilepsy can be ineffective and expensive, revealing a need for new antiepileptic drugs. NeuroCADR identified novel drug candidates for epilepsy that can be further approved through clinical trials. The algorithm has the potential to determine possible drug combinations to prescribe a patient based on a patient's prior medical history. This project examines NeuroCADR, a novel approach to computational drug repurposing capable of revealing potential drug candidates in neurological diseases such as epilepsy. Comment: 8 pages, 5 figures

    Function Prediction

    While many good textbooks are available on Protein Structure, Molecular Simulations, Thermodynamics and Bioinformatics methods in general, there is no good introductory level book for the field of Structural Bioinformatics. This book aims to give an introduction into Structural Bioinformatics, which is where the previous topics meet to explore three dimensional protein structures through computational analysis. We provide an overview of existing computational techniques, to validate, simulate, predict and analyse protein structures. More importantly, it will aim to provide practical knowledge about how and when to use such techniques. We will consider proteins from three major vantage points: Protein structure quantification, Protein structure prediction, and Protein simulation & dynamics. There are still huge gaps in understanding the molecular function of proteins. This raises the question of how we may predict protein function, when little to no knowledge from direct experiments is available. Protein function is a broad concept which spans different scales: from quantum scale effects for catalyzing enzymatic reactions, to phenotypes that manifest at the organism level. In fact, many of these functional scales are entirely different research areas. Here, we will consider prediction of a smaller range of functions, roughly spanning the protein residue-level up to the pathway level. We will give a conceptual overview of which functional aspects of proteins we can predict, which methods are currently available, and how well they work in practice. Comment: editorial responsibility: K. Anton Feenstra, Sanne Abeln. This chapter is part of the book "Introduction to Protein Structural Bioinformatics". The Preface arXiv:1801.09442 contains links to all the (published) chapters. The update adds available arxiv hyperlinks for the chapter

    Functional geometry of protein interactomes

    Motivation: Protein–protein interactions (PPIs) are usually modeled as networks. These networks have extensively been studied using graphlets, small induced subgraphs capturing the local wiring patterns around nodes in networks. They revealed that proteins involved in similar functions tend to be similarly wired. However, such simple models can only represent pairwise relationships and cannot fully capture the higher-order organization of protein interactomes, including protein complexes. Results: To model the multi-scale organization of these complex biological systems, we utilize simplicial complexes from computational geometry. The question is how to mine these new representations of protein interactomes to reveal additional biological information. To address this, we define simplets, a generalization of graphlets to simplicial complexes. By using simplets, we define a sensitive measure of similarity between simplicial complex representations that allows for clustering them according to their data types better than clustering them by using other state-of-the-art measures, e.g. spectral distance, or facet distribution distance. We model human and baker’s yeast protein interactomes as simplicial complexes that capture PPIs and protein complexes as simplices. On these models, we show that our newly introduced simplet-based methods cluster proteins by function better than the clustering methods that use the standard PPI networks, uncovering the new underlying functional organization of the cell.
We demonstrate the existence of the functional geometry in the protein interactome data and the superiority of our simplet-based methods to effectively mine for new biological information hidden in the complexity of the higher-order organization of protein interactomes. This work was supported by the European Research Council (ERC) Starting Independent Researcher Grant 278212, the European Research Council (ERC) Consolidator Grant 770827, the Serbian Ministry of Education and Science Project III44006, the Slovenian Research Agency project J1-8155 and the awards to establish the Farr Institute of Health Informatics Research, London, from the Medical Research Council, Arthritis Research UK, British Heart Foundation, Cancer Research UK, Chief Scientist Office, Economic and Social Research Council, Engineering and Physical Sciences Research Council, National Institute for Health Research, National Institute for Social Care and Health Research, and Wellcome Trust (grant MR/K006584/1). Peer Reviewed. Postprint (author's final draft)
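
As a rough illustration of the representation this abstract describes, the sketch below builds a simplicial complex in which each PPI contributes an edge and each protein complex contributes a higher-dimensional simplex together with all of its faces. It does not implement simplets or any of the similarity measures from the paper; the function names and the toy protein labels are hypothetical, and enumerating every face is only practical for small complexes.

```python
from itertools import combinations


def simplicial_complex(ppi_edges, protein_complexes):
    """Return all simplices (as frozensets) generated by PPIs and protein complexes.

    Every PPI is treated as a 1-simplex and every protein complex as one
    simplex on its members; all faces are added so the result is closed
    under taking subsets. Illustrative assumption, not the paper's code.
    """
    simplices = set()
    for facet in list(ppi_edges) + list(protein_complexes):
        members = sorted(set(facet))
        for size in range(1, len(members) + 1):
            for face in combinations(members, size):
                simplices.add(frozenset(face))
    return simplices


def count_by_dimension(simplices):
    """Count simplices per dimension (a simplex on k proteins has dimension k - 1)."""
    counts = {}
    for simplex in simplices:
        dim = len(simplex) - 1
        counts[dim] = counts.get(dim, 0) + 1
    return counts


# Toy data: two pairwise interactions and one three-protein complex.
edges = [("A", "B"), ("C", "D")]
complexes = [("A", "B", "C")]
toy = simplicial_complex(edges, complexes)
toy_counts = count_by_dimension(toy)
```

In this toy case the complex on A, B and C adds the triangle plus the edges A-C and B-C, which a plain PPI network built from the two listed interactions would not contain.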

    Comparison of Software Packages for Detecting Differentially Expressed Genes from Single-Sample RNA-Seq Data

    RNA-sequencing (RNA-seq) has rapidly become the tool of choice in many genome-wide transcriptomic studies. It provides a way to understand the RNA environment of cells in different physiological or pathological states to determine how cells respond to these changes. RNA-seq provides quantitative information about the abundance of different RNA species present in a given sample. If the difference or change observed in the read counts or expression level between two experimental conditions is statistically significant, the gene is declared as differentially expressed. A large number of methods for detecting differentially expressed genes (DEGs) with RNA-seq have been developed, such as the methods based on negative binomial models (edgeR, DESeq and baySeq), non-parametric approaches (NOIseq and SAMseq), transformations of gene-level read counts for linear modeling with Limma, as well as transcript-based detection methods that also enable gene-level differential expression reports (Cuffdiff 2, EBSeq and TSPM). Recently, there have been several studies on the comparison of software packages for detecting differential expression. Some of them can be used to detect DEGs by comparing a single sample with a control. It is necessary to compare these methods in order to find a more efficient and accurate method. S. R. Zaim, C. Kenost, J. Berghout, et al. proposed an “all-against-one” framework and compared it with eight single-subject methods (NOISeq, DEGseq, edgeR, mixture model, DESeq, DESeq2, iDEG, and ensemble) for identifying DEGs from the single-subject RNA-seq data. They claimed that different methods had different performance under different conditions, and that it remained difficult for a single method to obtain both high precision and recall. Differential expression analysis requires a comparison of gene expression values between samples. However, sometimes it is hard to obtain replicates, for example when only a single sample can be obtained from a cancer patient.
Hence it is necessary to study methods for detecting DEGs without replicates. We focused on comparing the log fold change, edgeR, NOISeq, iDEG and ACDtool methods. The log fold change method can directly obtain the differential change value when detecting DEGs, so it has advantages in research related to the absolute value of a differential expression. However, it is more difficult to select the required threshold. The edgeR method uses empirical Bayesian estimation and exact tests based on the negative binomial distribution to determine differential genes. It moderates the degree of over-dispersion across genes and uses an exact test similar to Fisher's exact test, but adapted to over-dispersed data, to assess the differential expression of each gene. The NOISeq method contains various diagnostic plots to identify sources of bias in RNA-seq data and apply appropriate normalization procedures in each case. It is more effective in avoiding false positive detection at the cost of certain sensitivity. The iDEG method uses an algorithm based on modeling read counts via a re-parameterized negative binomial distribution. It applies the Variance Stabilizing Transformation for each gene in order to detect the identified DEG set. It is a method for assessing single-subject gene differential expression. The ACDtool is a fully revamped version of the Audic-Claverie (AC) test adapted to the diverse and much larger datasets produced by contemporary omics techniques. Under the null hypothesis that the tag counts are generated from Poisson distributions with equal means (or proportional to the respective sample sizes), this approach returns the probability that the compared samples contain the same proportion of the event.
We used the data set from the SEQC project, together with the gene expression levels of the samples measured by RT-PCR, to compare several methods for detecting single-sample differentially expressed genes by their performance on receiver operating characteristic curves: 1) with the differentially expressed genes obtained by Limma applied to genes with RT-PCR data; 2) with the differentially expressed genes obtained by the DESeq2 method on all genes; 3) applying an experimental method to compare the false positive rates. We conclude that the iDEG method gives the lowest false positive rate at the cost of sensitivity. Although the edgeR and simple fold change methods give a higher false positive rate than the iDEG method, they obtain the best trade-off and hence are the most reliable and efficient of the methods we studied for single-sample RNA-seq data.
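
The log fold change method mentioned in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the study's implementation: the counts are assumed to be already normalized for library size, and the pseudocount of 1, the threshold of 1.0 and the function names are hypothetical choices.

```python
import math


def log2_fold_change(case_count, control_count, pseudocount=1.0):
    """log2 ratio of case to control counts.

    The pseudocount (an assumption of this sketch) avoids division by
    zero and taking the log of zero for unexpressed genes.
    """
    return math.log2((case_count + pseudocount) / (control_count + pseudocount))


def call_degs(case, control, threshold=1.0, pseudocount=1.0):
    """Return the sorted gene names whose |log2 fold change| meets the threshold."""
    degs = []
    for gene in case:
        lfc = log2_fold_change(case[gene], control[gene], pseudocount)
        if abs(lfc) >= threshold:
            degs.append(gene)
    return sorted(degs)


# Toy counts for one case sample and one control sample (no replicates).
case_counts = {"geneA": 7, "geneB": 10, "geneC": 0}
control_counts = {"geneA": 1, "geneB": 10, "geneC": 3}
toy_degs = call_degs(case_counts, control_counts, threshold=1.0)
```

The `threshold` argument is the free parameter the abstract describes as difficult to select: raising it lowers the false positive rate but also reduces sensitivity.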

    Towards Open Domain Literature Based Discovery

    Appeared in: Open Search Symposium 2021, 11-13 October 2021, CERN, Geneva, Switzerland