    Genetic Algorithm based Convolutional Neural Network for Few Shot Learning in Disease type prediction on RNA-Seq Data

    Diagnosing the correct type of a disease is essential to effective treatment. The diagnosis is not always straightforward from biological tests, especially during the early stages of the disease. The human body responds to disease by producing certain proteins. If we know which genes are active, that is, which proteins are being produced, we can classify disease subtypes more accurately. This study uses genetic information extracted from patients' biological samples to classify cancer subtypes. Among the different types of genetic data, this thesis considers RNA-seq data. Studies based on genetic information often suffer from very limited sample sizes, and few-shot learning has recently been studied for disease classification. Neural networks have been successful in data analysis, but mostly when large amounts of data are available; we therefore perform few-shot learning by retraining the neural networks with genetic algorithmic processes. Following the proposal of the Human Genome Organization (HUGO), we group genes based on their chemical composition and apply genetic algorithms to the HUGO gene groups to help retrain the neural networks. We implemented the proposed approach and compared its performance with a wide variety of existing machine learning and neural network methods on three cancer datasets. In our experiments, the proposed approach performs similarly to other methods when a relatively large amount of data is available, and outperforms AffinityNet by an average of 4 percent in few-shot learning with small datasets.
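    The abstract does not specify how the genetic algorithm encodes or scores candidates. A minimal Python sketch, assuming each individual is a bitmask that switches HUGO gene groups on or off and that fitness is a caller-supplied score; `evolve_group_masks`, `toy_fitness` and the set of "informative" groups are hypothetical stand-ins for retraining and validating the network:

```python
import random
from typing import Callable, List

Mask = List[int]  # 1 = HUGO gene group used, 0 = group left out (assumed encoding)

def evolve_group_masks(
    n_groups: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 20,
    generations: int = 30,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Mask:
    """Return the fittest gene-group mask found by a simple generational GA."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_groups)] for _ in range(pop_size)]

    def tournament() -> Mask:
        # Binary tournament: keep the fitter of two randomly chosen masks.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)
    for _ in range(generations):
        next_pop = [best[:]]  # elitism: the best mask survives unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_groups)  # single-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation toggles each group with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop
        best = max(population, key=fitness)
    return best

# Hypothetical fitness: reward informative groups, penalize the rest.
informative = {0, 2, 5}

def toy_fitness(mask: Mask) -> float:
    chosen = {i for i, g in enumerate(mask) if g}
    return len(chosen & informative) - 0.1 * len(chosen - informative)

best = evolve_group_masks(8, toy_fitness)
```

    With elitism, the best fitness never decreases from one generation to the next. In a real pipeline each fitness call would retrain and evaluate the network, so caching scores per mask would likely matter more than the choice of operators.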

    Transient myeloproliferative disorder: A pointer to underlying trisomy 21

    A 19-day-old male neonate presented with abdominal distension, refusal to feed, and high-grade fever, suggestive of late-onset sepsis. Apart from suspected clinodactyly, no dysmorphism was present. Hemograms showed leukocytosis with 29% blasts, and flow cytometry revealed acute myeloid leukemia. Because congenital leukemia was present, the child was investigated for dysmorphism; a karyotype revealed trisomy 21, and a diagnosis of transient myeloproliferative disorder (TMD) was made. The child developed significant bleeding, impending congestive cardiac failure, and significant weight loss, prompting initiation of low-dose chemotherapy with cytarabine. The child improved following therapy but developed fungal sepsis and multiple-joint osteomyelitis secondary to chemotherapy-induced myelosuppression, which were managed with antibiotics. The child was discharged and is on close three-monthly follow-up to screen for acute megakaryoblastic leukemia, as babies with TMD are prone to developing acute megakaryoblastic leukemia in early childhood.

    A machine learning pipeline for discriminant pathways identification

    Motivation: Identifying the molecular pathways most prone to disruption during a pathological process is a key task in network medicine and, more generally, in systems biology. Results: In this work we propose a pipeline that couples a machine learning solution for molecular profiling with a recent network comparison method. The pipeline can identify changes occurring between specific sub-modules of networks built in a case-control biomarker study, discriminating key groups of genes whose interactions are modified by an underlying condition. The proposal is independent of the classification algorithm used. Three applications on genome-wide data are presented, regarding children's susceptibility to air pollution and two neurodegenerative diseases: Parkinson's and Alzheimer's. Availability: Details about the software used for the experiments discussed in this paper are provided in the Appendix.
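    The abstract does not name the network comparison method. As a rough illustration only, the sketch below swaps in a simple stand-in: for each gene group it builds thresholded |Pearson r| co-expression networks from case and from control samples, then scores the group by the normalized Hamming distance between the two edge sets. The function names, the 0.5 threshold and the toy data are assumptions for illustration:

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Expr = Dict[str, Sequence[float]]  # gene -> expression across one condition's samples

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation
    return sxy / math.sqrt(sxx * syy)

def edge_set(expr: Expr, genes: List[str], threshold: float) -> Set[Tuple[str, str]]:
    """Gene pairs whose |Pearson r| across samples exceeds the threshold."""
    return {
        (g, h)
        for g, h in combinations(sorted(genes), 2)
        if abs(pearson(expr[g], expr[h])) > threshold
    }

def hamming_distance(case: Expr, control: Expr, genes: List[str], threshold: float = 0.5) -> float:
    """Fraction of possible edges present in exactly one of the two networks."""
    n_pairs = len(genes) * (len(genes) - 1) // 2
    diff = edge_set(case, genes, threshold) ^ edge_set(control, genes, threshold)
    return len(diff) / n_pairs if n_pairs else 0.0

def rank_groups(case: Expr, control: Expr, groups: Dict[str, List[str]], threshold: float = 0.5):
    scores = {name: hamming_distance(case, control, genes, threshold) for name, genes in groups.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data: the A-B interaction is lost in controls, C-D is unchanged.
case = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 4], "C": [1, 2, 3, 4], "D": [4, 3, 2, 1]}
control = {"A": [1, 2, 3, 4], "B": [2, 1, 2, 1], "C": [1, 2, 3, 4], "D": [4, 3, 2, 1]}
groups = {"P1": ["A", "B"], "P2": ["C", "D"]}
ranking = rank_groups(case, control, groups)
```

    Groups at the top of the ranking are those whose co-expression pattern differs most between the two conditions.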

    Positional distribution of human transcription factor binding sites

    We developed a method for estimating the positional distribution of transcription factor (TF) binding sites using ChIP-chip data, and applied it to recently published experiments on the binding sites of nine TFs: OCT4, SOX2, NANOG, HNF1A, HNF4A, HNF6, FOXA2, USF1 and CREB1. The data were obtained from genome-wide coverage of promoter regions from 8 kb upstream of the transcription start site (TSS) to 2 kb downstream. The number of target genes of each TF ranges from a few hundred to several thousand. We found that for each of the nine TFs the estimated binding site distribution is closely approximated by a mixture of two components: a narrow peak, localized within 300 base pairs upstream of the TSS, and a distribution of almost uniform density within the tested region. Using Gene Ontology (GO) enrichment analysis, we were able to associate (for each of the TFs studied) the target genes of both types of binding with known biological processes. Most GO terms were enriched either among the proximal targets or among those with a uniform distribution of binding sites. For example, the three stemness-related TFs have several hundred target genes that belong to "development" and "morphogenesis" and whose binding sites belong to the uniform distribution.
    Comment: 27 pages, 8 figures (already embedded in file). To appear in Nucleic Acids Research.
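    One way to fit the two-component model described above is expectation-maximization (EM). The sketch assumes the narrow peak is Gaussian, the background is uniform over the tested region (-8000 to +2000 bp relative to the TSS, negative meaning upstream), and the input is a list of estimated binding-site positions. The paper's own estimator may differ, and the synthetic data below are hypothetical:

```python
import math
import random
from typing import List, Tuple

def fit_peak_plus_uniform(
    positions: List[float],
    lo: float = -8000.0,
    hi: float = 2000.0,
    iters: int = 200,
) -> Tuple[float, float, float]:
    """EM for a Gaussian peak plus a uniform background on [lo, hi].

    Returns (peak_weight, peak_mean, peak_sd)."""
    uniform_density = 1.0 / (hi - lo)
    w, mu, sd = 0.5, 0.0, 500.0  # rough starting values
    for _ in range(iters):
        # E-step: posterior probability that each site came from the peak.
        resp = []
        for x in positions:
            peak = w * math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
            flat = (1 - w) * uniform_density
            resp.append(peak / (peak + flat))
        # M-step: re-estimate the peak weight, mean and standard deviation.
        total = sum(resp)
        w = total / len(positions)
        mu = sum(r * x for r, x in zip(resp, positions)) / total
        var = sum(r * (x - mu) ** 2 for r, x in zip(resp, positions)) / total
        sd = max(math.sqrt(var), 1.0)  # floor keeps the peak from collapsing
    return w, mu, sd

# Synthetic positions: half near -150 bp, half spread over the tested region.
rng = random.Random(1)
positions = [rng.gauss(-150, 80) for _ in range(1000)]
positions += [rng.uniform(-8000, 2000) for _ in range(1000)]
weight, peak_mean, peak_sd = fit_peak_plus_uniform(positions)
```

    The fitted peak weight estimates the share of proximal binding, and the remaining weight the share attributed to the near-uniform component.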

    Report of the Committee on Amendments to Criminal Law

    "Dear Prime Minister, This Committee was constituted by GOI Notification No. SA (3003)E, dated December 23, 2012 to look into possible amendments of the Criminal Law to provide for quicker trial and enhanced punishment for criminals committing sexual assault of extreme nature against women. In view of the significance and urgency of the task, the Committee undertook to perform it within 30 days, which task has been completed. Accordingly, the Committee has prepared its Report, which is enclosed herewith. It is the Committee's hope that the promptitude with which this Committee was constituted within a few days of the brutal gang rape in Delhi on December 16, 2012, will continue to accomplish the task by speedy implementation of its Recommendations to retain public confidence in good governance. With regards, Yours Sincerely, J.S. Verma [Committee Chair]

    Wide-Scale Analysis of Human Functional Transcription Factor Binding Reveals a Strong Bias towards the Transcription Start Site

    We introduce a novel method to screen the promoters of a set of genes with shared biological function against a precompiled library of motifs, and to find the motifs that are statistically over-represented in the gene set. The gene sets were obtained from the functional Gene Ontology (GO) classification; for each set and motif we optimized the sequence similarity score threshold independently for every location window (measured with respect to the transcription start site, TSS), taking into account the location-dependent nucleotide heterogeneity along the promoters of the target genes. We performed a high-throughput analysis, searching the promoters (from 200 bp downstream to 1000 bp upstream of the TSS) of more than 8000 human and 23,000 mouse genes, for 134 functional GO classes and 412 known DNA motifs. When combined with binding site and location conservation between human and mouse, the method identifies, with high probability, functional binding sites that regulate groups of biologically related genes. We found many location-sensitive functional binding events and showed that they cluster close to the TSS. Our method and findings were put to several experimental tests. By allowing a "flexible" threshold and combining our functional-class- and location-specific search method with conservation between human and mouse, we are able to reliably identify functional TF binding sites. This is an essential step towards constructing regulatory networks and elucidating the design principles that govern the transcriptional regulation of expression. The promoter region proximal to the TSS appears to be of central importance for regulation of transcription in human and mouse, just as it is in bacteria and yeast.
    Comment: 31 pages, including Supplementary Information and figure
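    The threshold optimization described above can be illustrated with a hypergeometric over-representation test scanned over candidate thresholds for one motif in one location window. This is a sketch under assumptions: each promoter is summarized by its best motif score in the window, the background is the remaining genes, and the names and numbers are hypothetical (`math.comb` needs Python 3.8+):

```python
from math import comb
from typing import List, Tuple

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K hits, n draws)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)

def best_threshold(
    set_scores: List[float],
    background_scores: List[float],
    thresholds: List[float],
) -> Tuple[float, float]:
    """Scan score thresholds for one motif in one position window.

    set_scores: best motif score per promoter in the gene set.
    background_scores: best motif score per promoter outside the set.
    Returns (threshold, p_value) with the smallest over-representation p-value."""
    N = len(set_scores) + len(background_scores)
    n = len(set_scores)
    best = (thresholds[0], 1.0)
    for t in thresholds:
        k = sum(s >= t for s in set_scores)             # hits inside the gene set
        K = k + sum(s >= t for s in background_scores)  # hits overall
        p = hypergeom_sf(k, N, K, n)
        if p < best[1]:
            best = (t, p)
    return best

# Hypothetical scores: three gene-set promoters vs four background promoters.
threshold, p_value = best_threshold([0.9, 0.8, 0.85], [0.1, 0.2, 0.3, 0.9], [0.5, 0.95])
```

    Because the minimum p-value is taken over many thresholds, windows and motifs, it is optimistic and would need correction (for example by permutation) before being read as significance.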

    The nucleotide composition of microsatellites impacts both replication fidelity and mismatch repair in human colorectal cells

    Microsatellite instability is a key mechanism of colon carcinogenesis. We have previously studied mutations within a (CA)13 microsatellite using an enhanced green fluorescent protein (EGFP)-based reporter assay that allows replication errors and mismatch repair (MMR) activity to be distinguished. Here we use this assay to compare mutations of mono- and dinucleotide repeats in human colorectal cells. HCT116 and HCT116+chr3 cells were stably transfected with EGFP-based plasmids harboring A10, G10, G16, (CA)13 and (CA)26 repeats. EGFP-positive mutant fractions were quantified by flow cytometry, mutation rates were calculated, and the mutant spectrum was analyzed by cycle sequencing. The EGFP fluorescence pattern changed with the microsatellite's nucleotide sequence and with cell type, and clonal variations were observed in mononucleotide repeats. Replication errors (as calculated in HCT116) were 5–10-fold higher at A10 than at G10 repeats, 30-fold higher at G16 than at G10, and 10-fold higher at (CA)26 than at (CA)13. Mutation rates in hMLH1-proficient HCT116+chr3 cells were 30–230-fold lower than in HCT116. MMR was more efficient in G16 than in A10 clones, leading to higher stability of poly-G tracts. Mutation spectra revealed predominantly 1-unit deletions in A10, (CA)13 and G10, and 2-unit deletions or 1-unit insertions in (CA)26. These findings indicate that both replication fidelity and MMR are affected by the microsatellite's nucleotide composition.
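    The abstract does not state how mutation rates were derived from the EGFP-positive fractions. A common simple estimator divides the increase in mutant fraction by the number of cell generations; the sketch below assumes that estimator, and the fractions and cell counts in it are hypothetical, not values from the study:

```python
import math

def generations(start_cells: float, end_cells: float) -> float:
    """Number of population doublings between two cell counts."""
    return math.log2(end_cells / start_cells)

def mutation_rate(f_start: float, f_end: float, n_generations: float) -> float:
    """Mutations per cell per generation, estimated (by assumption) as the
    increase in EGFP-positive mutant fraction divided by generations elapsed."""
    return (f_end - f_start) / n_generations

def fold_difference(rate_a: float, rate_b: float) -> float:
    """How many times higher rate_a is than rate_b."""
    return rate_a / rate_b

# Hypothetical example: mutant fraction rises from 0.1% to 0.5% over 4 doublings.
g = generations(1e5, 1.6e6)
rate = mutation_rate(0.001, 0.005, g)
```

    Fold comparisons such as those between repeat types, or between HCT116 and HCT116+chr3, are then ratios of two such rates, which `fold_difference` computes.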

    Relating gene expression data on two-component systems to functional annotations in Escherichia coli

    Background: Obtaining physiological insights from microarray experiments requires computational techniques that relate gene expression data to functional information. Traditionally, this has been done in two consecutive steps. The first step identifies important genes through clustering or statistical techniques, while the second step assigns biological functions to the identified groups. Recently, techniques have been developed that identify such relationships in a single step. Results: We have developed an algorithm that relates patterns of gene expression in a set of microarray experiments to functional groups in one step. Our only assumption is that patterns co-occur frequently. The effectiveness of the algorithm is demonstrated as part of a study of regulation by two-component systems in Escherichia coli. The significance of the relationships between expression data and functional annotations is evaluated based on density histograms that are constructed using product similarity among expression vectors. We present a biological analysis of three of the resulting functional groups of proteins, develop hypotheses for further biological studies, and test one of these hypotheses experimentally. A comparison with other algorithms and a different data set is presented. Conclusion: Our new algorithm is able to find interesting and biologically meaningful relationships, not found by other algorithms, in previously analyzed data sets. Scaling of the algorithm to large data sets can be achieved based on a theoretical model.
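    The significance step can be approximated by comparing a functional group's pairwise product (dot-product) similarity with that of random gene sets of the same size. The sketch below summarizes the resulting null distribution as an empirical p-value rather than a density histogram; it assumes expression vectors are log-ratios across the microarray experiments, and the function names and toy data are hypothetical:

```python
import random
from itertools import combinations
from statistics import mean
from typing import Dict, List, Sequence

def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

def mean_pairwise_similarity(vectors: List[Sequence[float]]) -> float:
    """Average product similarity over all pairs of expression vectors."""
    return mean(dot(u, v) for u, v in combinations(vectors, 2))

def group_similarity_pvalue(
    expr: Dict[str, Sequence[float]],
    group: List[str],
    n_perm: int = 1000,
    seed: int = 0,
) -> float:
    """Share of random same-size gene sets at least as similar as the group
    (with a +1 correction so the estimate is never exactly zero)."""
    rng = random.Random(seed)
    observed = mean_pairwise_similarity([expr[g] for g in group])
    genes = list(expr)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(group))
        if mean_pairwise_similarity([expr[g] for g in sample]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy data: five co-regulated genes plus 45 weakly varying genes.
expr = {f"g{i}": [1.0, 1.0, 1.0, 1.0] for i in range(5)}
expr.update({f"o{i}": [0.1, -0.1, 0.1, -0.1] for i in range(45)})
p_coregulated = group_similarity_pvalue(expr, [f"g{i}" for i in range(5)])
```

    Collecting the permuted similarity scores instead of only counting hits would give the null density that a histogram-based comparison plots.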

    Inferring Pathway Activity toward Precise Disease Classification

    The advent of microarray technology has made it possible to classify disease states based on patients' gene expression profiles. Typically, marker genes are selected by measuring the power of their expression profiles to discriminate among patients with different disease states. However, expression-based classification can be challenging in complex diseases because of factors such as cellular heterogeneity within a tissue sample and genetic heterogeneity across patients. A promising technique for coping with these challenges is to incorporate pathway information into the disease classification procedure, classifying disease based on the activity of entire signaling pathways or protein complexes rather than on the expression levels of individual genes or proteins. We propose a new classification method based on pathway activities inferred for each patient. For each pathway, an activity level is summarized from the gene expression levels of its condition-responsive genes (CORGs), defined as the subset of genes in the pathway whose combined expression delivers optimal discriminative power for the disease phenotype. We show that classifiers using pathway activity achieve better performance than classifiers based on individual gene expression, for both simple and complex case-control studies, including differentiation of perturbed from non-perturbed cells and subtyping of several different kinds of cancer. Moreover, the new method outperforms several previous approaches that use a static (i.e., non-conditional) definition of pathways. Within a pathway, the identified CORGs may facilitate the development of better diagnostic markers and the discovery of core alterations in human disease.
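    The CORG search can be sketched as a greedy procedure. The version below follows the general idea in the abstract under stated assumptions: expression is z-scored per gene, genes are ranked by a Welch t statistic in the pathway's dominant direction, the activity of k genes is their summed z-scores divided by sqrt(k), and genes are added until the activity's |t| stops improving. Labels (1 = case, 0 = control), helper names and toy data are hypothetical, and the published procedure may differ in detail:

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

def zscore(values: Sequence[float]) -> List[float]:
    """Standardize one gene's expression across all samples."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s if s > 0 else 0.0 for v in values]

def t_stat(values: Sequence[float], labels: Sequence[int]) -> float:
    """Welch t statistic, cases (label 1) minus controls (label 0).
    Returns 0.0 when both groups have zero variance (a simplification)."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def find_corgs(
    expr: Dict[str, Sequence[float]], pathway: List[str], labels: Sequence[int]
) -> Tuple[List[str], List[float]]:
    z = {g: zscore(expr[g]) for g in pathway}
    t = {g: t_stat(z[g], labels) for g in pathway}
    # Rank genes in the pathway's dominant direction (up- or down-regulated).
    direction = 1 if mean(list(t.values())) >= 0 else -1
    ranked = sorted(pathway, key=lambda g: direction * t[g], reverse=True)

    def activity(genes: List[str]) -> List[float]:
        # Summed z-scores scaled by sqrt(k), one activity value per sample.
        return [sum(z[g][i] for g in genes) / math.sqrt(len(genes)) for i in range(len(labels))]

    corgs = [ranked[0]]
    best = abs(t_stat(activity(corgs), labels))
    for g in ranked[1:]:
        score = abs(t_stat(activity(corgs + [g]), labels))
        if score <= best:
            break  # the next-ranked gene no longer improves discrimination
        corgs.append(g)
        best = score
    return corgs, activity(corgs)

# Hypothetical pathway: G1 and G2 respond to the condition, G3 is noise.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "G1": [1, 2, 3, 7, 8, 9],
    "G2": [2, 1, 3, 6, 8, 7],
    "G3": [5, 1, 4, 2, 6, 3],
}
corgs, pathway_activity = find_corgs(expr, ["G1", "G2", "G3"], labels)
```

    The per-sample activity vector, rather than the individual genes, would then serve as the feature for the downstream classifier.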