719 research outputs found

    Partitioning Clustering Based on Support Vector Ranking

    Get PDF
    Postprint

    Support Vector Motion Clustering

    Get PDF
    This work was supported in part by the Erasmus Mundus Joint Doctorate in Interactive and Cognitive Environments (funded by the EACEA Agency of the European Commission under EMJD ICE FPA n. 2010-0012) and by the Artemis JU and the UK Technology Strategy Board through the COPCAMS Project under Grant 332913.

    Zebrafish differentially process colour across visual space to match natural scenes

    Get PDF
    Animal eyes have evolved to process behaviourally important visual information, but how retinas deal with statistical asymmetries in visual space remains poorly understood. Using hyperspectral imaging in the field, in-vivo 2-photon imaging of retinal neurons and anatomy, here we show that larval zebrafish use a highly anisotropic retina to asymmetrically survey their natural visual world. First, different neurons dominate different parts of the eye and are linked to a systematic shift in inner retinal function: above the animal, there is little colour in nature, and retinal circuits there are largely achromatic. Conversely, the lower visual field and horizon are colour-rich and are predominantly surveyed by chromatic and colour-opponent circuits that are spectrally matched to the dominant chromatic axes in nature. Second, in the horizontal and lower visual field, bipolar cell terminals encoding achromatic and colour-opponent visual features are systematically arranged into distinct layers of the inner retina. Third, above the frontal horizon, a high-gain ultraviolet system piggy-backs onto retinal circuits, likely to support prey capture.

    Image Based Biomarkers from Magnetic Resonance Modalities: Blending Multiple Modalities, Dimensions and Scales.

    Get PDF
    The successful analysis and processing of medical imaging data is multidisciplinary work that requires the application and combination of knowledge from diverse fields, such as medical engineering, medicine, computer science and pattern classification. Imaging biomarkers are biological features detectable by imaging modalities, and their use offers the prospect of more efficient clinical studies and improvements in both diagnosis and therapy assessment. The use of Dynamic Contrast Enhanced Magnetic Resonance Imaging (DCE-MRI) and its application to diagnosis and therapy have been extensively validated; nevertheless, the issue of appropriate or optimal processing of the data to extract relevant biomarkers that highlight differences between heterogeneous tissues remains open. Together with DCE-MRI, the data extracted from Diffusion MRI (DWI-MR and DTI-MR) represent a promising and complementary tool.
    This project initially proposes the exploration of diverse techniques and methodologies for the characterization of tissue, following an analysis and classification of voxel-level time-intensity curves from DCE-MRI data, mainly through the exploration of dissimilarity-based representations and models. We will explore metrics and representations to correlate the multidimensional data acquired through diverse imaging modalities, a line of work that starts with an appropriate elastic registration methodology between DCE-MRI and DWI-MR on the breast and its corresponding validation. It has been shown that the combination of multi-modal MRI images improves the discrimination of diseased tissue. However, the fusion of dissimilar imaging data for classification and segmentation purposes is not a trivial task: there are inherent differences in information domains, dimensionality and scales. This work also proposes a multi-view consensus clustering methodology for the integration of multi-modal MR images into a unified segmentation of tumoral lesions for heterogeneity assessment. Using a variety of metrics and distance functions, this multi-view imaging approach calculates multiple vectorial dissimilarity spaces for each of the MRI modalities and makes use of the concepts behind cluster ensembles to combine a set of base unsupervised segmentations into a unified partition of the voxel-based data. The methodology is specifically designed for combining DCE-MRI and DTI-MR, for which a manifold learning step is implemented in order to account for the geometric constraints of the high-dimensional diffusion information.
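
    The cluster-ensemble step described above can be made concrete with a short sketch. The following is a minimal, hypothetical Python illustration, not the project's actual pipeline: it assumes each modality has already been reduced to a per-voxel feature matrix (the views list is an invented input), builds several base k-means partitions per modality, accumulates a co-association matrix, and extracts a consensus partition with average-linkage clustering, one common way of realizing a cluster ensemble.

    import numpy as np
    from sklearn.cluster import KMeans
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    def consensus_cluster(views, n_clusters, n_base=10, seed=0):
        """Combine per-modality k-means partitions via a co-association matrix."""
        rng = np.random.RandomState(seed)
        n = views[0].shape[0]              # number of voxels, shared across views
        coassoc = np.zeros((n, n))
        n_runs = 0
        for X in views:                    # one (n_voxels, n_features) matrix per modality
            for _ in range(n_base):        # base partitions from random restarts
                labels = KMeans(n_clusters, n_init=1,
                                random_state=rng.randint(1 << 30)).fit_predict(X)
                coassoc += labels[:, None] == labels[None, :]
                n_runs += 1
        coassoc /= n_runs                  # fraction of runs in which two voxels co-cluster
        dist = squareform(1.0 - coassoc, checks=False)
        return fcluster(linkage(dist, method='average'),
                        t=n_clusters, criterion='maxclust')

    The co-association matrix records how often two voxels land in the same base cluster; treating one minus that frequency as a distance lets any hierarchical method produce the unified partition.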

    A Sensitivity and Array-Configuration Study for Measuring the Power Spectrum of 21cm Emission from Reionization

    Full text link
    Telescopes aiming to measure 21cm emission from the Epoch of Reionization must toe a careful line, balancing the need for raw sensitivity against the stringent calibration requirements for removing bright foregrounds. It is unclear what the optimal design is for achieving both of these goals. Via a pedagogical derivation of an interferometer's response to the power spectrum of 21cm reionization fluctuations, we show that even under optimistic scenarios, first-generation arrays will yield low-SNR detections, and that different compact array configurations can substantially alter sensitivity. We explore the sensitivity gains of array configurations that yield high redundancy in the uv-plane -- configurations that have been largely ignored since the advent of self-calibration for high-dynamic-range imaging. We first introduce a mathematical framework to generate optimal minimum-redundancy configurations for imaging. We contrast the sensitivity of such configurations with high-redundancy configurations, finding that high-redundancy configurations can improve power-spectrum sensitivity by more than an order of magnitude. We explore how high-redundancy array configurations can be tuned to various angular scales, enabling array sensitivity to be directed away from regions of the uv-plane (such as the origin) where foregrounds are brighter and where instrumental systematics are more problematic. We demonstrate that a 132-antenna deployment of the Precision Array for Probing the Epoch of Reionization (PAPER) observing for 120 days in a high-redundancy configuration will, under ideal conditions, have the requisite sensitivity to detect the power spectrum of the 21cm signal from reionization at a 3σ level at k < 0.25 h Mpc^-1 in a bin of Δ ln k = 1. We discuss the tradeoffs of low- versus high-redundancy configurations.
    Comment: 34 pages, 5 figures, 2 appendices. Version accepted to ApJ.
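
    The redundancy argument can be illustrated with a toy calculation (a sketch only; the antenna count and spacing below are invented, not PAPER's actual layout). On a compact square grid, many antenna pairs share the same separation vector, so they repeatedly sample the same uv mode:

    import numpy as np
    from collections import Counter

    def baseline_redundancy(positions):
        """Count antenna pairs sharing each separation (uv) vector."""
        counts = Counter()
        for i in range(len(positions)):
            for j in range(i + 1, len(positions)):
                sep = positions[j] - positions[i]
                # A baseline and its negative sample the same uv mode.
                if sep[0] < 0 or (sep[0] == 0 and sep[1] < 0):
                    sep = -sep
                counts[tuple(np.round(sep, 6))] += 1
        return counts

    # Hypothetical 4x4 grid with 10 m spacing: 120 pairs, few unique baselines.
    grid = np.array([(10.0 * x, 10.0 * y) for x in range(4) for y in range(4)])
    red = baseline_redundancy(grid)
    print(len(red), "unique baselines from", len(grid) * (len(grid) - 1) // 2, "pairs")

    Because identical baselines can be averaged coherently in the visibility before the power spectrum is formed, the noise power in that uv cell falls as 1/N_red rather than the 1/sqrt(N_red) obtained from incoherently combining distinct baselines, which is the origin of the order-of-magnitude gains quoted above.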

    Cost-Quality Trade-Offs in One-Class Active Learning

    Get PDF
    Active learning is a paradigm for involving users in a machine learning process. The core idea of active learning is to ask a user to annotate a specific observation in order to improve classification performance. One important application of active learning is detecting outliers, i.e., unusual observations that deviate from the regular ones in a data set. Applying active learning to outlier detection in practice requires designing a system that consists of several components: the data, the classifier that discerns between inliers and outliers, the query strategy that selects the observations for feedback collection, and an oracle, e.g., the human expert who annotates the queries. Each of these components, and their interplay, influences the classification quality. Naturally, there are cost budgets limiting certain parts of the system, e.g., the number of queries one can ask a human. Thus, to configure efficient active learning systems, one must decide on several trade-offs between costs and quality. The existing literature on active learning systems provides neither an overview nor a formal description of the cost-quality trade-offs of active learning. All this makes the configuration of efficient active learning systems in practice difficult. In this thesis, we study different cost-quality trade-offs that are pivotal for configuring an active learning system for outlier detection. We first provide an overview of the costs of an active learning system. Then, we analyze three important trade-offs and propose ways to model and quantify them.
    In our first contribution, we study how one can reduce classifier training costs by training only on a sample of the data set. We formalize the sampling trade-off between classifier training costs and resulting quality as an optimization problem and propose an efficient algorithm to solve it. Compared to the existing sampling methods in the literature, our approach guarantees that a classifier trained on our sample makes the same predictions as if it had been trained on the complete data set. We can therefore reduce the classifier training costs without a loss of classification quality.
    In our second contribution, we investigate how selecting multiple queries allows trading off costs against quality. So-called batch queries reduce classifier training costs because the system only updates the classifier once for each batch. But the annotation of a batch may give redundant information, which reduces the achievable quality with a fixed query budget. We are the first to consider batch queries for outlier detection, a generalization of the more common sequential querying. We formalize batch active learning and propose several strategies to construct batches by modeling the expected utility of a batch.
    In our third contribution, we propose query synthesis for outlier detection. Query synthesis makes it possible to generate queries artificially at any point in the data space, without being restricted to a pool of query candidates. We propose a framework to efficiently synthesize queries and develop a novel query strategy to improve the generalization of a classifier beyond a biased data set with active learning. For all contributions, we derive recommendations for the cost-quality trade-offs from formal investigations and empirical studies to facilitate the configuration of robust and efficient active learning systems for outlier detection.
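
    As a concrete (and deliberately simplified) illustration of the system components listed above, the sketch below wires together a one-class classifier, a nearest-to-boundary query strategy, and an oracle callable in a sequential loop. The oracle function and the feedback rule (drop confirmed outliers and refit) are invented stand-ins, not the strategies developed in the thesis:

    import numpy as np
    from sklearn.svm import OneClassSVM

    def active_outlier_detection(X, oracle, budget=10):
        """Sequential active learning: query the point nearest the boundary."""
        train_mask = np.ones(len(X), dtype=bool)       # points the classifier fits on
        queried = set()
        for _ in range(budget):                        # cost budget: number of queries
            clf = OneClassSVM(nu=0.1, gamma='scale').fit(X[train_mask])
            scores = np.abs(clf.decision_function(X))  # distance to the boundary
            scores[list(queried)] = np.inf             # never re-query a point
            idx = int(np.argmin(scores))               # most uncertain observation
            queried.add(idx)
            if oracle(idx):                            # expert confirms an outlier:
                train_mask[idx] = False                # exclude it from the fit
        return OneClassSVM(nu=0.1, gamma='scale').fit(X[train_mask])

    A batch variant would select several indices per iteration before refitting, which is exactly where the redundancy-versus-training-cost trade-off discussed above appears.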

    Improving the Tractography Pipeline: on Evaluation, Segmentation, and Visualization

    Get PDF
    Recent advances in tractography allow connectomes to be constructed in vivo. These have applications, for example, in brain tumor surgery and in understanding brain development and diseases. The large size of the data produced by these methods leads to a variety of problems, including how to evaluate tractography outputs, the development of faster processing and clustering algorithms, and the development of advanced visualization methods for verification and exploration. This thesis presents several advances in these fields. First, an evaluation of the robustness to noise of multiple commonly used tractography algorithms is presented. It employs a Monte Carlo simulation of measurement noise on a constructed ground-truth dataset. As a result of this evaluation, evidence for the robustness of global tractography is found, and algorithmic sources of uncertainty are identified. The second contribution is a fast clustering algorithm for tractography data based on k-means, with vector fields representing the flow of each cluster. It is demonstrated that this algorithm can handle large tractography datasets due to its linear time and memory complexity, and that it can effectively integrate interrupted fibers that would be rejected as outliers by other algorithms. Furthermore, a visualization for the exploration of structural connectomes is presented. It uses illustrative rendering techniques for efficient presentation of connecting fiber bundles in context in anatomical space. Visual hints are employed to improve the perception of spatial relations. Finally, a visualization method with application to the exploration and verification of probabilistic tractography is presented, which improves on the previously published Fiber Stippling technique. It is demonstrated that the method is able to show multiple overlapping tracts in context and correctly present crossing fiber configurations.
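
    The thesis's algorithm itself is not reproduced here, but the common baseline it builds on, representing each fiber as a fixed-length vector and running k-means, can be sketched as follows (the 20-point resampling and the cluster count are arbitrary choices, and the vector-field cluster representation described above is not part of this sketch):

    import numpy as np
    from sklearn.cluster import KMeans

    def resample(streamline, n_points=20):
        """Resample a (k, 3) polyline to n_points spaced evenly along its arc length."""
        seg = np.linalg.norm(np.diff(streamline, axis=0), axis=1)
        t = np.concatenate([[0.0], np.cumsum(seg)])
        t_new = np.linspace(0.0, t[-1], n_points)
        return np.column_stack([np.interp(t_new, t, streamline[:, d])
                                for d in range(3)])

    def cluster_streamlines(streamlines, n_clusters=50):
        """Flatten resampled fibers into fixed-length vectors and run k-means."""
        feats = np.stack([resample(s).ravel() for s in streamlines])
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)

    Resampling gives every fiber the same dimensionality regardless of its original point count, which is what makes the linear-time k-means step applicable to whole-brain datasets.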

    Robust Face Recognition under Uncontrolled Environments

    Get PDF

    Doctor of Philosophy

    Get PDF
    With the tremendous growth of data produced in recent years, it is impossible to identify patterns or test hypotheses without reducing data size. Data mining is an area of science that extracts useful information from data by discovering the patterns and structures present in it. In this dissertation, we largely focus on clustering, which is often the first step in any exploratory data mining task: items that are similar to each other are grouped together, making downstream data analysis robust. Different clustering techniques have different strengths, and the resulting groupings provide different perspectives on the data. Due to the unsupervised nature of the task, i.e., the lack of domain experts who can label the data, validation of results is very difficult. While there are measures that compute "goodness" scores for clustering solutions as a whole, there are few methods that validate the assignment of individual data items to their clusters. To address these challenges, we focus on developing a framework that can generate, compare, combine, and evaluate different solutions to make more robust and significant statements about the data.
    In the first part of this dissertation, we present fast and efficient techniques to generate and combine different clustering solutions. We build on recent ideas on efficient representations of clusters of partitions to develop a well-founded, spatially aware metric for comparing clusterings. With the ability to compare clusterings, we describe a heuristic to combine different solutions into a single high-quality clustering. We also introduce a Markov chain Monte Carlo approach to sample different clusterings from the entire landscape, providing users with a variety of choices.
    In the second part of this dissertation, we build certificates for individual data items and study their influence on effective data reduction. We present a geometric approach, defining regions of influence for data items and clusters, and use this to develop adaptive sampling techniques that speed up machine learning algorithms. This dissertation is therefore a systematic approach to studying the landscape of clusterings, in an attempt to provide a better understanding of the data.
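
    To make "comparing clusterings with a metric" concrete, the sketch below computes the variation of information (VI), a standard, well-founded metric on partitions. It is only an illustration of the idea: unlike the metric developed in the dissertation, VI is not spatially aware.

    import numpy as np

    def variation_of_information(labels_a, labels_b):
        """VI(A, B) = H(A) + H(B) - 2 I(A; B); a metric on partitions."""
        n = len(labels_a)
        _, a_inv = np.unique(labels_a, return_inverse=True)
        _, b_inv = np.unique(labels_b, return_inverse=True)
        joint = np.zeros((a_inv.max() + 1, b_inv.max() + 1))
        np.add.at(joint, (a_inv, b_inv), 1.0)      # contingency table of the two partitions
        joint /= n
        pa, pb = joint.sum(axis=1), joint.sum(axis=0)
        nz = joint > 0
        h_joint = -np.sum(joint[nz] * np.log(joint[nz]))
        h_a = -np.sum(pa * np.log(pa))
        h_b = -np.sum(pb * np.log(pb))
        return 2.0 * h_joint - h_a - h_b           # equals H(A) + H(B) - 2 I(A; B)

    Because VI is a true metric, it supports statements such as "solution A is closer to B than to C", which is the kind of navigation of the clustering landscape described above.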