8 research outputs found

    Application of Information Theory to RNA-sequencing Data Sets for Better Understanding of Human Cancers

    Full text link
    University of Technology Sydney. Faculty of Engineering and Information Technology.
    This research uses information theory to study the regulatory roles of non-coding RNAs in human cancers. MicroRNAs (miRNAs) are small non-coding RNAs that bind to mRNAs to suppress protein expression. Long non-coding RNAs (lncRNAs) can act as competing endogenous RNAs (ceRNAs), competing with mRNAs for miRNA binding. LncRNAs, miRNAs, and mRNAs together form ceRNA networks, which play a vital role in regulating molecular pathways in human cancers. Furthermore, miRNA isoforms, called isomiRs, are also able to regulate gene expression and could be used to distinguish cancer subtypes. Constructing ceRNA regulatory networks and identifying isomiRs as cancer subtype biomarkers are therefore important for understanding the regulatory role of non-coding RNAs in cancers. Current methods for constructing ceRNA networks and discovering biomarkers that faithfully classify different cancer subtypes have some limitations. Information theory is a powerful tool for better understanding the regulatory role of non-coding RNAs in human cancer. This thesis uses information theory to construct ceRNA networks and to discover human cancer subtype biomarkers. The novel contributions of this thesis to the research field are listed below:
    • A competition rule-based pointwise mutual information is proposed to construct ceRNA networks.
    • An improved mutual information and an information gain are developed to identify isomiRs as biomarkers for classifying different cancer subtypes.
    • A distribution-based method is proposed to filter out noisy data in RNA-seq data.
    Three case studies have been performed to study the regulatory roles of non-coding RNAs in human cancers.
    (1) The first case study constructs the competition relationships between lncRNA, miRNA, and mRNA in breast cancer by using pointwise mutual information.
    (2) The second case study uses the improved mutual information to discover isomiR biomarkers for classifying different breast cancer subtypes.
    (3) The third case study applies the improved information gain to detect isomiR-based biomarkers for classifying different glioma subtypes.
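
    The abstract above names pointwise mutual information (PMI) as the basis for scoring lncRNA, miRNA, and mRNA relationships. As a rough illustration only, the sketch below computes the standard PMI, log2(p(x, y) / (p(x) p(y))), from paired discrete observations. It is not the thesis's competition rule-based variant, whose details are not given here. The function name, the "high"/"low" expression states, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def pointwise_mutual_information(pairs):
    """Return the standard PMI (in bits) for each observed (x, y) pair.

    `pairs` is a list of (x, y) tuples, one per sample. This is the textbook
    definition, not the competition rule-based variant from the thesis.
    """
    n = len(pairs)
    joint = Counter(pairs)                      # counts of (x, y) together
    x_counts = Counter(x for x, _ in pairs)     # marginal counts of x
    y_counts = Counter(y for _, y in pairs)     # marginal counts of y

    pmi = {}
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = x_counts[x] / n
        p_y = y_counts[y] / n
        pmi[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return pmi


# Hypothetical discretized expression states of one lncRNA and one mRNA
# across four samples (illustrative values, not data from the thesis).
samples = [("high", "high"), ("high", "high"), ("low", "low"), ("low", "low")]
scores = pointwise_mutual_information(samples)
```

    A positive score means the two states co-occur more often than expected under independence; a score near zero means the observations carry little evidence of association.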

    Exploring Consensus RNA Substructural Patterns Using Subgraph Mining

    No full text

    Extra spuisluis in de Afsluitdijk: Effect op onderhoud havens (Extra discharge sluice in the Afsluitdijk: effect on harbour maintenance)

    No full text
    Datasets for constructing the miRNA-disease association prediction models. Four datasets are stored: three positive sample sets (“positive_miR”, “positive_HMDD” and “positive_miRcancer”) and one negative sample set (“negative_expression”). The three positive sample sets were retrieved from three existing databases (miR2Disease, HMDD v2 and miRCancer), while the negative sample set was obtained by analyzing the expression of the miRNAs. This file can also be downloaded from: https://drive.google.com/open?id=0B6lH3mKdA9CSTkg2OVBPS0ZfVnM (XLS 947 kb)

    Additional file 1 of Imbalance learning for the prediction of N6-Methylation sites in mRNAs

    No full text
    Data set of human mature mRNA N6-Methylation. The training and testing data used in this paper are available in this file. For each sample, the transcript ID, site position, transcript length and flanking sequence with a size of 26 nts are given. (XLSX 15155 kb)

    Additional file 2 of Imbalance learning for the prediction of N6-Methylation sites in mRNAs

    No full text
    Supplementary Tables, Algorithm and Figure. Table S1: The result of Fisher’s exact test on the training data. The SNP variant states of positive and negative samples are counted separately at every position in the window sequence. The P-value is computed with the Fisher’s exact test function from the Python SciPy package. Table S2: Complete SNP specificity ranking for all positions. Table S3: The feature distribution in the HMpre feature space. Algorithm S1: SNP Specificity Identification Algorithm. Figure S1: Distribution of feature importance scores in the XGBoost classifier learning stage. (PDF 323 kb)
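
    The description above says the P-values come from a Fisher's exact test over counts of SNP variant states in positive and negative samples. As a hedged illustration of what that test computes for a single position, the sketch below evaluates the two-sided P-value of a 2x2 table using only the standard library. The paper itself reports using SciPy, where the corresponding function is `scipy.stats.fisher_exact`. The function name and the counts here are hypothetical, and the row/column layout is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical): rows = positive / negative samples,
    columns = variant present / variant absent at one window position.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # with all row and column totals held fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum the probabilities of every table that is no more likely than
    # the observed one (small tolerance guards against rounding).
    return sum(
        p for p in (prob(k) for k in range(k_min, k_max + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Hypothetical counts for one position (not data from the paper).
p_value = fisher_exact_two_sided(3, 0, 0, 3)
```

    Repeating this per window position and ranking positions by P-value is one plausible reading of how a specificity ranking could be built, but the actual procedure is defined in the paper's Algorithm S1, not here.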