430 research outputs found
Visual analytics for relationships in scientific data
Domain scientists hope to address grand scientific challenges by exploring the abundance of data generated and made available through modern high-throughput techniques. Typical scientific investigations can make use of novel visualization tools that enable dynamic formulation and fine-tuning of hypotheses to aid the process of evaluating sensitivity of key parameters. These general tools should be applicable to many disciplines: allowing biologists to develop an intuitive understanding of the structure of coexpression networks and discover genes that reside in critical positions of biological pathways, intelligence analysts to decompose social networks, and climate scientists to model extrapolate future climate conditions. By using a graph as a universal data representation of correlation, our novel visualization tool employs several techniques that when used in an integrated manner provide innovative analytical capabilities. Our tool integrates techniques such as graph layout, qualitative subgraph extraction through a novel 2D user interface, quantitative subgraph extraction using graph-theoretic algorithms or by querying an optimized B-tree, dynamic level-of-detail graph abstraction, and template-based fuzzy classification using neural networks. We demonstrate our system using real-world workflows from several large-scale studies. Parallel coordinates has proven to be a scalable visualization and navigation framework for multivariate data. However, when data with thousands of variables are at hand, we do not have a comprehensive solution to select the right set of variables and order them to uncover important or potentially insightful patterns. We present algorithms to rank axes based upon the importance of bivariate relationships among the variables and showcase the efficacy of the proposed system by demonstrating autonomous detection of patterns in a modern large-scale dataset of time-varying climate simulation
Recommended from our members
Constraint based approaches to interpretable and semi-supervised machine learning
Interpretability and Explainability of machine learning algorithms are becoming increasingly important as Machine Learning (ML) systems get widely applied to domains like clinical healthcare, social media and governance. A related major challenge in deploying ML systems pertains to reliable learning when expert annotation is severely limited. This dissertation prescribes a common framework to address these challenges, based on the use of constraints that can make an ML model more interpretable, lead to novel methods for explaining ML models, or help to learn reliably with limited supervision.
In particular, we focus on the class of latent variable models and develop a general learning framework by constraining realizations of latent variables and/or model parameters. We propose specific constraints that can be used to develop identifiable latent variable models, that in turn learn interpretable outcomes. The proposed framework is first used in Nonānegative Matrix Factorization and Probabilistic Graphical Models. For both models, algorithms are proposed to incorporate such constraints with seamless and tractable augmentation of the associated learning and inference procedures. The utility of the proposed methods is demonstrated for our working application domain ā identifiable phenotyping using Electronic Health Records (EHRs). Evaluation by domain experts reveals that the proposed models are indeed more clinically relevant (and hence more interpretable) than existing counterparts. The work also demonstrates that while there may be inherent tradeāoffs between constraining models to encourage interpretability, the quantitative performance of downstream tasks remains competitive.
We then focus on constraint based mechanisms to explain decisions or outcomes of supervised black-box models. We propose an explanation model based on generating examples where the nature of the examples is constrained i.e. they have to be sampled from the underlying data domain. To do so, we train a generative model to characterize the data manifold in a high dimensional ambient space. Constrained sampling then allows us to generate naturalistic examples that lie along the data manifold. We propose ways to summarize model behavior using such constrained examples.
In the last part of the contributions, we argue that heterogeneity of data sources is useful in situations where very little to no supervision is available. This thesis leverages such heterogeneity (via constraints) for two critical but widely different machine learning algorithms. In each case, a novel algorithm in the sub-class of coāregularization is developed to combine information from heterogeneous sources. Coāregularization is a framework of constraining latent variables and/or latent distributions in order to leverage heterogeneity. The proposed algorithms are utilized for clustering, where the intent is to generate a partition or grouping of observed samples, and for Learning to Rank algorithms ā used to rank a set of observed samples in order of preference with respect to a specific search query. The proposed methods are evaluated on clustering web documents, social network users, and information retrieval applications for ranking search queries.Electrical and Computer Engineerin
A comprehensive survey on deep active learning and its applications in medical image analysis
Deep learning has achieved widespread success in medical image analysis,
leading to an increasing demand for large-scale expert-annotated medical image
datasets. Yet, the high cost of annotating medical images severely hampers the
development of deep learning in this field. To reduce annotation costs, active
learning aims to select the most informative samples for annotation and train
high-performance models with as few labeled samples as possible. In this
survey, we review the core methods of active learning, including the evaluation
of informativeness and sampling strategy. For the first time, we provide a
detailed summary of the integration of active learning with other
label-efficient techniques, such as semi-supervised, self-supervised learning,
and so on. Additionally, we also highlight active learning works that are
specifically tailored to medical image analysis. In the end, we offer our
perspectives on the future trends and challenges of active learning and its
applications in medical image analysis.Comment: Paper List on Github:
https://github.com/LightersWang/Awesome-Active-Learning-for-Medical-Image-Analysi
AI in Medical Imaging Informatics: Current Challenges and Future Directions
This paper reviews state-of-the-art research solutions across the spectrum of medical imaging informatics, discusses clinical translation, and provides future directions for advancing clinical practice. More specifically, it summarizes advances in medical imaging acquisition technologies for different modalities, highlighting the necessity for efficient medical data management strategies in the context of AI in big healthcare data analytics. It then provides a synopsis of contemporary and emerging algorithmic methods for disease classification and organ/ tissue segmentation, focusing on AI and deep learning architectures that have already become the de facto approach. The clinical benefits of in-silico modelling advances linked with evolving 3D reconstruction and visualization applications are further documented. Concluding, integrative analytics approaches driven by associate research branches highlighted in this study promise to revolutionize imaging informatics as known today across the healthcare continuum for both radiology and digital pathology applications. The latter, is projected to enable informed, more accurate diagnosis, timely prognosis, and effective treatment planning, underpinning precision medicine
Automated Semantic Content Extraction from Images
In this study, an automatic semantic segmentation and object recognition methodology is implemented which bridges the semantic gap between low level features of image content and high level conceptual meaning. Semantically understanding an image is essential in modeling autonomous robots, targeting customers in marketing or reverse engineering of building information modeling in the construction industry. To achieve an understanding of a room from a single image we proposed a new object recognition framework which has four major components: segmentation, scene detection, conceptual cueing and object recognition. The new segmentation methodology developed in this research extends Felzenswalb\u27s cost function to include new surface index and depth features as well as color, texture and normal features to overcome issues of occlusion and shadowing commonly found in images. Adding depth allows capturing new features for object recognition stage to achieve high accuracy compared to the current state of the art. The goal was to develop an approach to capture and label perceptually important regions which often reflect global representation and understanding of the image. We developed a system by using contextual and common sense information for improving object recognition and scene detection, and fused the information from scene and objects to reduce the level of uncertainty. This study in addition to improving segmentation, scene detection and object recognition, can be used in applications that require physical parsing of the image into objects, surfaces and their relations. The applications include robotics, social networking, intelligence and anti-terrorism efforts, criminal investigations and security, marketing, and building information modeling in the construction industry. In this dissertation a structural framework (ontology) is developed that generates text descriptions based on understanding of objects, structures and the attributes of an image
Evidence combination for incremental decision-making processes
The establishment of a medical diagnosis is an incremental process highly fraught with uncertainty. At each step of this painstaking process, it may be beneficial to be able to quantify the uncertainty linked to the diagnosis and steadily update the uncertainty estimation using available sources of information, for example user feedback, as they become available. Using the example of medical data in general and EEG data in particular, we show what types of evidence can affect discrete variables such as a medical diagnosis and build a simple and computationally efficient evidence combination model based on the Dempster-Shafer theory
Symbiotic deep learning for medical image analysis with applications in real-time diagnosis for fetal ultrasound screening
The last hundred years have seen a monumental rise in the power and capability of machines to
perform intelligent tasks in the stead of previously human operators. This rise is not expected
to slow down any time soon and what this means for society and humanity as a whole remains
to be seen. The overwhelming notion is that with the right goals in mind, the growing influence
of machines on our every day tasks will enable humanity to give more attention to the truly
groundbreaking challenges that we all face together. This will usher in a new age of human
machine collaboration in which humans and machines may work side by side to achieve greater
heights for all of humanity. Intelligent systems are useful in isolation, but the true benefits of
intelligent systems come to the fore in complex systems where the interaction between humans
and machines can be made seamless, and it is this goal of symbiosis between human and machine
that may democratise complex knowledge, which motivates this thesis. In the recent past, datadriven
methods have come to the fore and now represent the state-of-the-art in many different
fields. Alongside the shift from rule-based towards data-driven methods we have also seen a
shift in how humans interact with these technologies. Human computer interaction is changing
in response to data-driven methods and new techniques must be developed to enable the same
symbiosis between man and machine for data-driven methods as for previous formula-driven
technology.
We address five key challenges which need to be overcome for data-driven human-in-the-loop
computing to reach maturity. These are (1) the āCategorisation Challengeā where we examine
existing work and form a taxonomy of the different methods being utilised for data-driven
human-in-the-loop computing; (2) the āConfidence Challengeā, where data-driven methods must
communicate interpretable beliefs in how confident their predictions are; (3) the āComplexity
Challengeā where the aim of reasoned communication becomes increasingly important as the
complexity of tasks and methods to solve also increases; (4) the āClassification Challengeā in
which we look at how complex methods can be separated in order to provide greater reasoning
in complex classification tasks; and finally (5) the āCuration Challengeā where we challenge the
assumptions around bottleneck creation for the development of supervised learning methods.Open Acces
- ā¦