9 research outputs found

    WHIDE—a web tool for visual data mining colocation patterns in multivariate bioimages

    Motivation: Bioimaging techniques are rapidly developing toward higher resolution and dimensionality. The increase in dimensionality is achieved by techniques such as multi-tag fluorescence imaging, Matrix-Assisted Laser Desorption/Ionization (MALDI) imaging or Raman imaging, which record for each pixel an N-dimensional intensity array representing local abundances of molecules, residues or interaction patterns. The analysis of such multivariate bioimages (MBIs) calls for new approaches to support users in the analysis of both feature domains: space (i.e. sample morphology) and molecular colocation or interaction. In this article, we present our approach WHIDE (Web-based Hyperbolic Image Data Explorer), which combines principles from computational learning, dimension reduction and visualization in a free web application.

    Spatio-temporal analysis of metabolite profiles during barley germination

    Kölling J, Gorzolka K, Niehaus K, Nattkemper TW. Spatio-temporal analysis of metabolite profiles during barley germination. Presented at the German Conference on Bioinformatics (GCB), Bielefeld, Germany.

    Robust normalization protocols for multiplexed fluorescence bioimage analysis

    The study of the mapping and interaction of co-localized proteins at a sub-cellular level is important for understanding complex biological phenomena. One of the recent techniques to map co-localized proteins is to use standard immunofluorescence microscopy in a cyclic manner (Nat Biotechnol 24:1270–8, 2006; Proc Natl Acad Sci 110:11982–7, 2013). Unfortunately, these techniques suffer from variability in the intensity and positioning of signals from protein markers within a run and across different runs. It is therefore necessary to standardize protocols for preprocessing multiplexed bioimaging (MBI) data from multiple runs to a comparable scale before any further analysis can be performed. In this paper, we compare various normalization protocols and propose, on the basis of the obtained results, a robust normalization technique that produces consistent results on MBI data collected from different runs using the Toponome Imaging System (TIS). Normalization results produced by the proposed method on a sample TIS data set from colorectal cancer patients were ranked favorably by two pathologists and two biologists. We show that the proposed method produces higher between-class Kullback-Leibler (KL) divergence and lower within-class KL divergence on distributions of cell phenotypes from colorectal cancer and histologically normal samples.
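    The between-class/within-class comparison described above can be sketched with a minimal Kullback-Leibler divergence computation on discrete phenotype distributions. The labels, counts, and the `phenotype_distribution` helper below are invented for illustration, not taken from the paper:

```python
import math
from collections import Counter

def kl_divergence(p, q, eps=1e-12):
    """Kullback-Leibler divergence D(P || Q) for two aligned discrete
    probability distributions; eps guards against zero-probability bins."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def phenotype_distribution(labels, categories):
    """Normalised frequency of each phenotype label."""
    counts = Counter(labels)
    return [counts.get(c, 0) / len(labels) for c in categories]

# Hypothetical phenotype labels from a cancerous and a normal sample.
categories = ["A", "B", "C"]
cancer = ["A", "A", "B", "A", "C", "A"]
normal = ["B", "C", "B", "C", "B", "A"]

p = phenotype_distribution(cancer, categories)
q = phenotype_distribution(normal, categories)
divergence = kl_divergence(p, q)  # large when the two distributions differ
```

    A good normalization in the paper's sense would keep this divergence high between classes (cancer vs. normal) while keeping it low between replicate runs of the same class.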

    Bioinformatics Solutions for Image Data Processing

    In recent years, the increasing use of medical devices has led to the generation of large amounts of data, including image data. Bioinformatics solutions provide an effective approach to image data processing, both for retrieving information of interest and for integrating several data sources for knowledge extraction; furthermore, image processing techniques support scientists and physicians in diagnosis and therapy. Bioinformatics image analysis also extends to other scenarios: in cyber-security, for instance, biometric recognition systems are applied to unlock devices and restricted areas, as well as to access sensitive data. In medicine, computational platforms generate large amounts of data from medical devices such as Computed Tomography (CT) and Magnetic Resonance Imaging (MRI); this chapter surveys bioinformatics solutions and toolkits for medical imaging, offering an overview of techniques and methods that can be applied to image analysis in medicine.

    Cluster analysis of student activity in a web-based intelligent tutoring system

    In this paper we present a model of a system for integrating an intelligent tutoring system with data mining tools. The purpose of the integration is twofold: a) to power the system's adaptability based on clustering and sequential pattern mining, and b) to enable teachers (non-experts in data mining) to use data mining techniques in their web browser on a daily basis and get useful visualizations that provide insights into the learning progress of their students. We also present an approach to evaluating clustering results, developed so that the system can independently deduce the best number of clusters for the k-means algorithm as well as order the clusters in terms of the learning efficiency of cluster members (students).
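    The automatic choice of the number of clusters that the abstract mentions can be sketched by pairing a k-means run with a cluster-quality score and keeping the k that maximizes it. The 1-D activity data, the silhouette criterion, and all names below are illustrative assumptions, not the system's actual implementation:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal 1-D k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: (p - centroids[c]) ** 2)
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels

def silhouette(points, labels, k):
    """Mean silhouette coefficient; higher means better-separated clusters."""
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        a = sum(abs(p - q) for q in own) / len(own)
        others = []
        for c in range(k):
            if c == labels[i]:
                continue
            member = [q for j, q in enumerate(points) if labels[j] == c]
            if member:
                others.append(sum(abs(p - q) for q in member) / len(member))
        b = min(others) if others else a
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Toy per-student activity scores forming two well-separated groups.
points = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
best_k = max(range(2, 5),
             key=lambda k: silhouette(points, kmeans(points, k)[1], k))
```

    The same pattern extends to multi-dimensional activity features by swapping the absolute difference for a Euclidean distance.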

    Modelling and analysis of the tumour microenvironment of colorectal cancer

    New bioimaging techniques have recently been proposed to visualise the colocation or interaction of several proteins within individual cells, displaying the heterogeneity of neighbouring cells within the same tissue specimen. Such techniques could hold the key to understanding complex biological systems such as the protein interactions involved in cancer. However, there is a need for new algorithmic approaches that analyse the large amounts of multi-tag bioimage data from cancerous and normal tissue specimens in order to begin to infer protein networks and unravel cellular heterogeneity at a molecular level. In the first part of the thesis, we propose an approach to analyse cell phenotypes in normal and cancerous colon tissue imaged using the robotically controlled Toponome Imaging System (TIS) microscope. It involves segmenting the DAPI-labelled image into cells and determining the cell phenotypes according to their protein-protein dependence profiles. These were analysed using two new measures, Difference in Sums of Weighted cO-dependence/Anti-co-dependence profiles (DiSWOP and DiSWAP), for overall co-expression and anti-co-expression, respectively. This approach enables one to easily identify protein pairs that have significantly higher or lower co-dependence levels in cancerous tissue samples than in normal colon tissue. The proposed approach could identify potentially functional protein complexes active in cancer progression and cell differentiation. Due to the lack of ground truth data for bioimages, the objective evaluation of methods developed for their analysis can be very challenging. To that end, in the second part of the thesis we propose a model of the healthy and cancerous colonic crypt microenvironments.
Our model is designed to generate realistic synthetic fluorescence and histology image data, with parameters that allow control over the differentiation grade of cancer, crypt morphology, cellularity, cell overlap ratio, image resolution, and objective level. The model learns some of its parameters from real histology image data stained with standard Hematoxylin and Eosin (H&E) dyes in order to generate realistic chromatin texture, nuclei morphology, and crypt architecture. To the best of our knowledge, ours is the first model to simulate image data at the subcellular level for healthy and cancerous colon tissue in which the cells are organised to mimic the microenvironment of tissue in situ rather than dispersed cells in a cultured environment. The simulated data could be used to validate techniques such as image restoration, cell segmentation, cell phenotyping, crypt segmentation, and differentiation grading, to name only a few. In addition, developing a detailed model of the tumour microenvironment can aid understanding of the underpinning laws of tumour heterogeneity. In the third part of the thesis, we extend the model to include detailed models of protein expression in order to generate synthetic multi-tag fluorescence data. As a first step, we have developed models for various cell organelles, learned from real immunofluorescence data. We then develop models for five proteins associated with microsatellite instability, namely MLH1, PMS2, MSH2, MSH6 and p53. The protein models include subcellular location, which cells express the protein, and under what conditions.
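The idea of scoring protein pairs by co-expression versus anti-co-expression can be illustrated with a much simpler statistic than DiSWOP/DiSWAP: observed co-occurrence minus the rate expected under independence, computed from binary per-cell expression profiles. Everything below (marker names, profiles, the score itself) is an invented toy, not the thesis's measures:

```python
from itertools import combinations

def coexpression_scores(profiles, names):
    """Observed minus expected co-occurrence rate for each protein pair,
    given binary per-cell expression profiles. Positive values suggest
    co-expression, negative values anti-co-expression. (Illustrative
    statistic only; not the DiSWOP/DiSWAP measures.)"""
    n = len(profiles)
    freq = [sum(row[j] for row in profiles) / n for j in range(len(names))]
    scores = {}
    for a, b in combinations(range(len(names)), 2):
        observed = sum(row[a] and row[b] for row in profiles) / n
        expected = freq[a] * freq[b]  # rate if the two were independent
        scores[(names[a], names[b])] = observed - expected
    return scores

# Toy binary profiles for three hypothetical markers in six cells.
names = ["P1", "P2", "P3"]
profiles = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 1],
]
scores = coexpression_scores(profiles, names)
```

Here P1 and P2 score positive (co-expressed) while P2 and P3 score negative (anti-co-expressed), which is the kind of contrast the thesis quantifies per tissue class.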

    Preprocessing algorithms for the digital histology of colorectal cancer

    Pre-processing techniques were developed for cell identification algorithms. These algorithms, which locate and classify cells in digital microscopy images, are important in digital pathology. The pre-processing methods included image sampling and colour normalisation for standard Haematoxylin and Eosin (H&E) images, and co-localisation algorithms for multiplexed images. Data studied in the thesis came from patients with colorectal cancer. Patient histology images came from `The Cancer Genome Atlas' (TCGA), a repository with contributions from many different institutional sites. The multiplexed images were created by TIS, the Toponome Imaging System. Experiments with image sampling were applied to TCGA diagnostic images, and the effects of sample size and sampling policy were evaluated. TCGA images were also used in experiments with colour normalisation algorithms. For TIS multiplexed images, probabilistic graphical models were developed, as well as clustering applications. NW-BHC, an extension to Bayesian Hierarchical Clustering, was developed and, for TIS antibodies, applied to TCGA expression data. Using image sampling with a sample size of 100 tiles gave accurate prediction results while being seven to nine times faster than processing the entire image. The two most accurate colour normalisation methods were that of Macenko and a `Naive' algorithm. Accuracy varied by TCGA site, indicating that researchers should use several independent data sets when evaluating colour normalisation algorithms. Probabilistic graphical models, applied to multiplexed images, calculated links between pairs of antibodies. The application of clustering to cell nuclei resulted in two main groups, one associated with epithelial cells and the second with the stromal environment. For TCGA expression data and several clustering metrics, NW-BHC improved on the standard EM algorithm.
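    The image-sampling experiment (drawing a fixed number of tiles from a whole-slide image instead of processing all of it) can be sketched as random sampling of tile coordinates on a regular grid. The image and tile dimensions below are assumed for illustration, not taken from the thesis:

```python
import random

def sample_tiles(width, height, tile, n, seed=0):
    """Randomly sample n distinct tile coordinates from a (width x height)
    image laid out on a regular, non-overlapping tile grid."""
    cols, rows = width // tile, height // tile
    rng = random.Random(seed)
    cells = rng.sample([(c, r) for c in range(cols) for r in range(rows)],
                       min(n, cols * rows))
    # Convert grid cells to pixel-space (x, y, width, height) regions.
    return [(c * tile, r * tile, tile, tile) for c, r in cells]

# E.g. 100 tiles of 512x512 pixels from a hypothetical 40000x30000 slide.
tiles = sample_tiles(40_000, 30_000, 512, 100)
```

    Each returned region can be cropped and classified independently, which is what makes a 100-tile sample so much cheaper than processing the full slide.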

    A Graph Analytics Framework for Knowledge Discovery

    Title from PDF of title page, viewed on June 20, 2016. Dissertation advisor: Yugyung Lee. Vita. Includes bibliographical references (pages 203-222). Thesis (Ph.D.)--School of Computing and Engineering, University of Missouri--Kansas City, 2016.

    In the current data movement, numerous efforts have been made to convert and normalize large amounts of traditionally structured and unstructured data into semi-structured data (e.g., RDF, OWL). With the increasing amount of semi-structured data coming into the big data community, data integration and knowledge discovery across heterogeneous domains have become important research problems. At the application level, detection of related concepts among ontologies shows huge potential for knowledge discovery with big data. In an RDF graph, concepts represent entities and predicates indicate properties that connect different entities. It is crucial to figure out how different concepts are related within a single ontology or across multiple ontologies by analyzing predicates in different knowledge bases. However, the world today is one of information explosion, and it is extremely difficult for researchers to find existing or potential predicates to perform linking among cross-domain concepts without any support from schema pattern analysis. Therefore, there is a need for a mechanism to perform predicate-oriented pattern analysis, partition heterogeneous ontologies into smaller, closely related topics, and generate queries to discover cross-domain knowledge from each topic. In this work, we present such a model: it conducts predicate-oriented pattern analysis based on close relationships among predicates and generates a similarity matrix. Based on this similarity matrix, we apply an unsupervised learning algorithm to partition large data sets into smaller, closely related topics that yield meaningful queries to fully discover knowledge over a set of interlinked data sources.
    In this dissertation, we present a graph analytics framework that aims at providing semantic methods for analysis and pattern discovery from cross-domain graph data. Our contributions can be summarized as follows:

    • The definition of predicate-oriented neighborhood measures to determine the neighborhood relationships among different RDF predicates of linked data across domains;

    • The design of global and local optimization of clustering and retrieval algorithms to maximize knowledge discovery from large linked data: i) top-down clustering, called Hierarchical Predicate-oriented K-means Clustering; ii) bottom-up clustering, called Predicate-oriented Hierarchical Agglomerative Clustering; iii) automatic topic discovery and query generation, with context-aware topic path finding for a given source and target pair;

    • The implementation of an interactive tool and endpoints for knowledge discovery and visualization, with integrated query design and query processing across domains;

    • Experimental evaluations conducted to validate the proposed methodologies of the framework using DBpedia, YAGO, and Bio2RDF datasets, and comparison of the proposed methods with existing graph partition and topic discovery methods.

    We propose a framework called GraphKDD. GraphKDD is able to analyze and quantify close relationships among predicates based on Predicate Oriented Neighbor Patterns (PONP). Based on PONP, GraphKDD conducts a Hierarchical Predicate-oriented K-Means clustering (HPKM) algorithm and a Predicate-oriented Hierarchical Agglomerative clustering (PHAL) algorithm to partition graphs into semantically related sub-graphs. In addition, at the application level, GraphKDD is capable of generating queries dynamically from topic discovery results and testing reachability between source and target nodes.
    We validate the proposed GraphKDD framework through comprehensive evaluations using DBpedia, YAGO, and Bio2RDF datasets.

    Contents: Introduction -- Predicate oriented neighborhood patterns -- Unsupervised learning on PONP Association Measurement -- Query generation and topic aware link discovery -- The GraphKDD ontology learning framework -- Conclusion and future work
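    As a rough illustration of predicate-oriented neighborhood analysis, the sketch below scores predicate pairs by the Jaccard overlap of the subjects and objects they touch; such a similarity matrix could then feed a clustering step. This is a toy stand-in for PONP, whose actual definition in the dissertation is more elaborate, and the triples are invented:

```python
from itertools import combinations

def predicate_similarity(triples):
    """Jaccard overlap of the subject/object neighborhoods of each
    predicate pair in a list of (subject, predicate, object) triples.
    (Toy stand-in for PONP, not the dissertation's measure.)"""
    neighbors = {}
    for s, p, o in triples:
        neighbors.setdefault(p, set()).update([s, o])
    sims = {}
    for a, b in combinations(sorted(neighbors), 2):
        na, nb = neighbors[a], neighbors[b]
        sims[(a, b)] = len(na & nb) / len(na | nb)
    return sims

# Invented cross-domain triples mixing drug and disease entities.
triples = [
    ("Aspirin", "treats", "Headache"),
    ("Aspirin", "interactsWith", "Warfarin"),
    ("Ibuprofen", "treats", "Fever"),
    ("Warfarin", "prevents", "Clotting"),
]
sims = predicate_similarity(triples)
```

    Predicates sharing many entities score high and would land in the same topic partition; unrelated predicates score zero.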