
    NANOCONTROLLER PROGRAM OPTIMIZATION USING ITE DAGS

    Kentucky Architecture nanocontrollers employ a bit-serial SIMD-parallel hardware design to execute MIMD control programs. A MIMD program is transformed into equivalent SIMD code by a process called Meta-State Conversion (MSC), which makes heavy use of enable masking to distinguish which code should be executed by each processing element. Both the bit-serial operations and the enable masking imposed on them are expressed in terms of if-then-else (ITE) operations implemented by a 1-of-2 multiplexor, greatly simplifying the hardware. However, it takes a lot of ITEs to implement even a small program fragment. Traditionally, bit-serial SIMD machines had been programmed by expanding a fixed bit-serial pattern for each word-level operation. Instead, nanocontrollers can make use of the fact that ITEs are equivalent to the operations in Binary Decision Diagrams (BDDs), and can apply BDD analysis to optimize the ITEs. This thesis proposes and experimentally evaluates a number of techniques for minimizing the complexity of the BDDs, primarily by manipulating normalization ordering constraints. The best method found is a new approach in which a simple set of optimization transformations is followed by normalization using an ordering determined by a Genetic Algorithm (GA).
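
To illustrate how bit-serial operations and enable masking can both be written purely in terms of ITE, here is a minimal Python sketch. It is not the thesis's implementation; every function name (`ite`, `full_adder`, `bit_serial_add`, `masked_store`) is hypothetical and chosen only for this example. It shows that one 1-of-2 multiplexer primitive is enough to build the usual logic gates, a bit-serial adder, and a masked update.

```python
def ite(cond, then_bit, else_bit):
    """1-of-2 multiplexer on single bits: then_bit if cond is 1, else else_bit."""
    return then_bit if cond else else_bit

# Basic gates expressed only through ite (illustrative helpers, not from the source).
def bit_not(a):
    return ite(a, 0, 1)

def bit_and(a, b):
    return ite(a, b, 0)

def bit_or(a, b):
    return ite(a, 1, b)

def bit_xor(a, b):
    return ite(a, bit_not(b), b)

def full_adder(a, b, carry_in):
    """One bit-serial step: returns (sum_bit, carry_out) built only from ite."""
    a_xor_b = bit_xor(a, b)
    sum_bit = bit_xor(a_xor_b, carry_in)
    # If a and b differ, the carry propagates; otherwise it equals a (and b).
    carry_out = ite(a_xor_b, carry_in, a)
    return sum_bit, carry_out

def bit_serial_add(x_bits, y_bits):
    """Add two equal-length bit lists, least significant bit first."""
    carry = 0
    result = []
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    result.append(carry)
    return result

def masked_store(enable, new_bit, old_bit):
    """Enable masking: a processing element keeps old_bit unless enabled."""
    return ite(enable, new_bit, old_bit)
```

Even this tiny adder uses several ITEs per bit, which is consistent with the abstract's remark that small program fragments expand into many ITE operations.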

    Acute Myeloid Leukemia

    Acute myeloid leukemia (AML) is the most common type of leukemia. The Cancer Genome Atlas Research Network has demonstrated the increasing genomic complexity of AML. In addition, the network has facilitated our understanding of the molecular events leading to this deadly form of malignancy, for which the prognosis has not improved over the past decades. AML is a highly heterogeneous disease, and cytogenetics and molecular analysis of the various chromosome aberrations, including deletions, duplications, aneuploidy, balanced reciprocal translocations and fusion of transcription factor genes and tyrosine kinases, has led to better understanding and identification of subgroups of AML with different prognoses. Furthermore, molecular classification based on mRNA expression profiling has facilitated identification of novel subclasses and defined high-, poor-risk AML based on specific molecular signatures. However, despite increased understanding of AML genetics, the outcome for AML patients, whose number is likely to rise as the population ages, has not changed significantly. Until it does, further investigation of the genomic complexity of the disease and advances in drug development are needed. In this review, leading AML clinicians and research investigators provide an up-to-date understanding of the molecular biology of the disease, addressing advances in diagnosis, classification, prognostication and therapeutic strategies that may have significant promise and impact on overall patient survival.

    Microarray Data Mining and Gene Regulatory Network Analysis

    Microarray, a novel molecular biology technology, makes it feasible to obtain quantitative measurements of the expression of thousands of genes present in a biological sample simultaneously. Genome-wide expression data generated by this technology promise to uncover implicit, previously unknown biological knowledge. In this study, several problems in microarray data mining were investigated, including feature (gene) selection, classifier gene identification, generation of reference genetic interaction networks for non-model organisms, and gene regulatory network (GRN) reconstruction using time-series gene expression data. Most existing computational models employed to infer gene regulatory networks suffer from either low accuracy or high computational complexity. To overcome such limitations, the following strategies were proposed to integrate bioinformatics data mining techniques with existing GRN inference algorithms, which enables the discovery of novel biological knowledge. An integrated statistical and machine learning (ISML) pipeline was developed for feature selection and classifier gene identification to address the curse of dimensionality as well as the huge search space. Using the selected classifier genes as seeds, a scale-up technique is applied to search through major databases of genetic interaction networks, metabolic pathways, etc. By curating relevant genes and blasting genomic sequences of non-model organisms against well-studied genetic model organisms, a reference gene regulatory network for less-studied organisms was built and used both as prior knowledge and for model validation in GRN reconstruction. Networks of gene interactions were inferred using a Dynamic Bayesian Network (DBN) approach and were analyzed to elucidate the dynamics caused by perturbations.
Our proposed pipelines were applied to investigate molecular mechanisms of chemical-induced reversible neurotoxicity.

    STATISTICAL ISSUES IN NEXT-GENERATION SEQUENCING

    High-throughput deep sequencing, or next-generation sequencing, has emerged as an exciting new tool in a great number of applications (e.g., variant discovery, profiling of histone modifications, identifying transcription factor binding sites, resequencing, and transcriptome characterization). Even though this technology has generated unprecedented amounts of data in the scientific community, few studies have looked carefully at its inherent variability. Recent studies of mRNA expression levels found little appreciable technical variation in Illumina’s Solexa sequencing platform (a next-generation sequencing device). Although these results are encouraging, they are limited to a specific platform and application, and have been made without any attention to experimental design. This paper provides an overview of some key issues in data management and experimental design related to Illumina’s Solexa Genome Analyzer technology.

    Machine Learning Models for Deciphering Regulatory Mechanisms and Morphological Variations in Cancer

    The exponential growth of multi-omics biological datasets is resulting in a paradigm shift in fundamental biological research. In recent years, imaging and transcriptomics datasets have been increasingly incorporated into biological studies, pushing biology further into the domain of data-intensive sciences. New approaches and tools from statistics, computer science, and data engineering are profoundly influencing biological research. Harnessing this ever-growing deluge of multi-omics biological data requires the development of novel and creative computational approaches. In parallel, fundamental research in data sciences and Artificial Intelligence (AI) has advanced tremendously, allowing the scientific community to generate a massive amount of knowledge from data. Advances in Deep Learning (DL), in particular, are transforming many branches of engineering, science, and technology. Several of these methodologies have already been adapted for harnessing biological datasets; however, there is still a need to further adapt and tailor these techniques to new and emerging technologies. In this dissertation, we present computational algorithms and tools that we have developed to study gene regulation and cellular morphology in cancer. The models and platforms that we have developed are general and widely applicable to several problems relating to dysregulation of gene expression in diseases. Our pipelines and software packages are disseminated in public repositories for use by the larger scientific community. This dissertation is organized into three main projects. In the first project, we present Causal Inference Engine (CIE), an integrated platform for the identification and interpretation of active regulators of transcriptional response. The platform offers visualization tools and pathway enrichment analysis to map predicted regulators to Reactome pathways.
We provide a parallelized R package for fast and flexible directional enrichment analysis to run the inference on custom regulatory networks. Next, we designed and developed MODEX, a fully automated text-mining system to extract and annotate causal regulatory interactions between Transcription Factors (TFs) and genes from the biomedical literature. MODEX uses putative TF-gene interactions derived from high-throughput ChIP-Seq or other experiments and seeks to collect evidence and metadata in the biomedical literature to validate and annotate the interactions. MODEX is a complementary platform to CIE that provides auxiliary information on CIE-inferred interactions by mining the literature. In the second project, we present a Convolutional Neural Network (CNN) classifier to perform a pan-cancer analysis of tumor morphology and predict mutations in key genes. The main challenges were to determine morphological features underlying a genetic status and assess whether these features were common in other cancer types. We trained an Inception-v3-based model to predict TP53 mutation in the five cancer types with the highest rate of TP53 mutations. We also performed a cross-classification analysis to assess shared morphological features across multiple cancer types. Further, we applied a similar methodology to classify HER2 status in breast cancer and predict response to treatment in HER2-positive samples. For this study, our training slides were manually annotated by expert pathologists to highlight Regions of Interest (ROIs) associated with the HER2+/- tumor microenvironment. Our results indicated that there are strong morphological features associated with each tumor type. Moreover, our predictions agree closely with manual annotations in the test set, indicating the feasibility of our approach in devising an image-based diagnostic tool for HER2 status and treatment response prediction.
We have validated our model using samples from an independent cohort, which demonstrates the generalizability of our approach. Finally, in the third project, we present an approach to use spatial transcriptomics data to predict spatially resolved active gene regulatory mechanisms in tissues. Using spatial transcriptomics, we identified tissue regions with differentially expressed genes and applied our CIE methodology to predict active TFs that can potentially regulate the marker genes in the region. This project bridged the gap between inference of active regulators using molecular data and morphological studies using images. The results demonstrate a significant local pattern in TF activity across the tissue, indicating differential spatial regulation in tissues. The results suggest that the integrative analysis of spatial transcriptomics data with CIE can capture discriminant features and identify localized TF-target links in the tissue.

    Ontology-Based Meta-Analysis of Global Collections of High-Throughput Public Data

    The investigation of the interconnections between the molecular and genetic events that govern biological systems is essential if we are to understand the development of disease and design effective novel treatments. Microarray and next-generation sequencing technologies have the potential to provide this information. However, taking full advantage of these approaches requires that biological connections be made across large quantities of highly heterogeneous genomic datasets. Leveraging the increasingly huge quantities of genomic data in the public domain is fast becoming one of the key challenges in the research community today. We have developed a novel data mining framework that enables researchers to use this growing collection of public high-throughput data to investigate any set of genes or proteins. The connectivity between molecular states across thousands of heterogeneous datasets from microarrays and other genomic platforms is determined through a combination of rank-based enrichment statistics, meta-analyses, and biomedical ontologies. We address data quality concerns through dataset replication and meta-analysis and ensure that the majority of the findings are derived using multiple lines of evidence. As an example of our strategy and the utility of this framework, we apply our data mining approach to explore the biology of brown fat within the context of the thousands of publicly available gene expression datasets. Our work presents a practical strategy for organizing, mining, and correlating global collections of large-scale genomic data to explore normal and disease biology. Using a hypothesis-free approach, we demonstrate how a data-driven analysis across very large collections of genomic data can reveal novel discoveries and evidence to support existing hypotheses.

    Analysis of the transcriptional program governing meiosis and gametogenesis in yeast and mammals

    During meiosis, a competent diploid cell replicates its DNA once and then undergoes two consecutive divisions followed by haploid gamete differentiation. Important aspects of meiotic development that distinguish it from mitotic growth include a highly increased rate of recombination, formation of the synaptonemal complex that aligns the homologous chromosomes, as well as separation of the homologues and sister chromatids during meiosis I and II without an intervening S-phase. Budding yeast is an excellent model organism to study meiosis and gametogenesis, and accordingly it is among the best-studied eukaryotic systems in this context to date. Knowledge coming from these studies has provided important insights into meiotic development in higher eukaryotes. This was possible because sporulation in yeast and spermatogenesis in higher eukaryotes are analogous developmental pathways that involve conserved genes. For budding yeast, a huge amount of data from numerous genome-scale studies on gene expression and deletion phenotypes of meiotic development and sporulation is available. In contrast, mammalian gametogenesis had not been studied on a large scale until recently. It was unclear if an expression profiling study using germ cells and testicular somatic control cells that underwent lengthy purification procedures would yield interpretable results. We have therefore carried out a pioneering expression profiling study of male germ cells from Rattus norvegicus using Affymetrix U34A and B GeneChips. This work resulted in the first comprehensive large-scale expression profiling analysis of mammalian male germ cells undergoing mitotic growth, meiosis and gametogenesis. We have identified 1268 differentially expressed genes in germ cells at different developmental stages, which were organized into four distinct expression clusters that reflect somatic, mitotic, meiotic and post-meiotic cell types.
This included 293 as yet uncharacterized transcripts whose expression pattern suggests that they are involved in spermatogenesis and fertility. A group of 121 transcripts were expressed only in meiotic (spermatocytes) and postmeiotic germ cells (round spermatids) but not in dividing germ cells (spermatogonia), Sertoli cells or two somatic control tissues (brain and skeletal muscle). Functional analysis reveals that most of the known genes in this group fulfill essential functions during meiosis, spermiogenesis (the process of sperm maturation) and fertility. Therefore it is highly possible that some of the ~30 uncharacterized transcripts in this group also contribute to these processes. A web-accessible database (called reXbase, which was later integrated into GermOnline) has been developed for our expression profiling study of mammalian male meiosis; it summarizes annotation information and shows a graphical display of expression profiles of every gene covered in our study. In the budding yeast Saccharomyces cerevisiae, entry into meiosis and subsequent progression through sporulation and gametogenesis are driven by a highly regulated transcriptional program activated by signal pathways responding to nutritional and cell-type cues. Abf1p, which is a general transcription factor, has previously been demonstrated to participate in the induction of numerous mitotic as well as early and middle meiotic genes. In the current study we have addressed the question of how Abf1p transcriptionally coordinates mitotic growth and meiotic development on a genome-wide level. Because ABF1 is an essential gene, we used the temperature-sensitive allele abf1-1. A phenotypical analysis of mutant cells revealed that ABF1 plays an important role in cell separation during mitosis, meiotic development, and spore formation.
In order to identify genes whose expression depends on Abf1p in growing and sporulating cells, we have performed expression profiling experiments using Affymetrix S98 GeneChips comparing wild-type and abf1-1 mutant cells at both permissive and restrictive temperature. We have identified 504 genes whose normal expression depends on functional ABF1. By combining the expression profiling data with data from genome-wide DNA binding assays (ChIP-chip) and in silico predictions of potential Abf1p-binding sites in the yeast genome, we were able to define direct target genes. These are genes whose expression decreases in the absence of functional ABF1 and whose promoters are bound by Abf1p and/or contain a predicted binding site. Among 352 such bona fide direct target genes we found many involved in ribosome biogenesis, translation, vegetative growth and meiotic development, which could therefore account for the observed growth and sporulation defects of abf1-1 mutant cells. Furthermore, the fact that two members of the septin family (CDC3 and CDC10) were found to be direct target genes suggests a novel role for Abf1p in cytokinesis. This was further substantiated by the observation that chitin localization and septin ring formation are perturbed in abf1-1 mutant cells.

    Clinical Utility of Microarrays: Current Status, Existing Challenges and Future Outlook

    Microarray-based clinical tests have become powerful tools in the diagnosis and treatment of diseases. In contrast to traditional DNA-based tests that largely focus on single genes associated with rare conditions, microarray-based tests are ideal for the study of diseases with underlying complex genetic causes. Several microarray-based tests have been translated into clinical practice, such as MammaPrint and AmpliChip CYP450. Additional cancer-related microarray-based tests are either in the process of FDA review or under active development, including Tissue of Tumor Origin and AmpliChip p53. All diagnostic microarray testing is ordered by physicians and performed by a Clinical Laboratory Improvement Amendments (CLIA)-certified reference laboratory. Recently, companies offering consumer-based microarray testing have emerged. Individuals can order tests online, and service providers deliver the results directly to the clients via a password-protected secure website. Navigenics, 23andMe and deCODE Genetics represent pioneering companies in this field. Although the progress of these microarray-based tests is extremely encouraging, with the potential to revolutionize the recognition and treatment of common diseases, these tests are still in their infancy and face technical, clinical and marketing challenges. In this article, we review microarray-based tests that are currently approved or under review by the FDA, as well as consumer-based testing. We also provide a summary of the challenges and strategic solutions in the development and clinical use of microarray-based tests. Finally, we present a brief outlook for the future of microarray-based clinical applications.