2,159 research outputs found

    Scalable approximate FRNN-OWA classification

    Fuzzy Rough Nearest Neighbour classification with Ordered Weighted Averaging operators (FRNN-OWA) is an algorithm that classifies unseen instances according to their membership in the fuzzy upper and lower approximations of the decision classes. Previous research has shown that the use of OWA operators increases the robustness of this model. However, calculating membership in an approximation requires a nearest neighbour search. In practice, the query time complexity of exact nearest neighbour search algorithms in more than a handful of dimensions is near-linear, which limits the scalability of FRNN-OWA. Therefore, we propose approximate FRNN-OWA, a modified model that calculates upper and lower approximations of decision classes using the approximate nearest neighbours returned by Hierarchical Navigable Small Worlds (HNSW), a recent approximate nearest neighbour search algorithm with logarithmic query time complexity at constant near-100% accuracy. We demonstrate that approximate FRNN-OWA is sufficiently robust to match the classification accuracy of exact FRNN-OWA while scaling much more efficiently. We test four parameter configurations of HNSW, and evaluate their performance by measuring classification accuracy and construction and query times for samples of various sizes from three large datasets. We find that with two of the parameter configurations, approximate FRNN-OWA achieves near-identical accuracy to exact FRNN-OWA for most sample sizes, with query times that are up to several orders of magnitude shorter.
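
The classification rule described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linearly decreasing OWA weights, the similarity measure (1 minus the mean absolute difference of features scaled to [0,1]) and all function names are assumptions made for illustration. The neighbour search here is an exact brute-force scan; in the approximate variant, an HNSW index would stand in for that step.

```python
def linear_owa_weights(k):
    # Assumed weighting scheme: linearly decreasing weights that sum to 1.
    total = k * (k + 1) / 2
    return [(k - i) / total for i in range(k)]


def similarity(a, b):
    # Assumed similarity: 1 - mean absolute difference, features in [0, 1].
    return 1.0 - sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def top_k_similarities(query, points, k):
    # Exact brute-force neighbour search, sorted from most to least similar.
    # An approximate index (such as HNSW) would replace this function.
    return sorted((similarity(query, p) for p in points), reverse=True)[:k]


def owa(values_desc):
    # Weighted average that puts the largest weight on the largest value.
    weights = linear_owa_weights(len(values_desc))
    return sum(w * v for w, v in zip(weights, values_desc))


def frnn_owa_classify(query, X, y, k=2):
    scores = {}
    for c in sorted(set(y)):
        same = [x for x, label in zip(X, y) if label == c]
        other = [x for x, label in zip(X, y) if label != c]
        # Upper approximation: soft maximum of similarity to class members.
        upper = owa(top_k_similarities(query, same, k))
        # Lower approximation: soft minimum of dissimilarity to non-members,
        # which equals 1 minus the soft maximum of similarity to them.
        lower = 1.0 - owa(top_k_similarities(query, other, k))
        scores[c] = (upper + lower) / 2
    return max(scores, key=scores.get), scores


# Hypothetical toy data: two well-separated classes in [0, 1]^2.
X = [[0.0, 0.0], [0.1, 0.1], [0.9, 0.9], [1.0, 1.0]]
y = ["a", "a", "b", "b"]
label, scores = frnn_owa_classify([0.05, 0.05], X, y, k=2)
```

Only the k most similar instances inside and outside each class enter the score, which is why the cost of the method is dominated by the neighbour search.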

    Structure and evolution of protoplanetary disks

    We present here a few thoughts on how high-angular resolution observations can give clues to some properties of protoplanetary disks that are fundamental to theories of planet formation. High-angular resolution infrared spectroscopy, either with a large single mirror telescope, or by using infrared interferometry, allows us to probe the abundance of thermally processed dust in the disk as a function of distance to the star. We show that this radial abundance profile can give information about the early evolution of the protoplanetary disk as well as about the nature of the turbulence. Since turbulence is one of the main ingredients in theories of planet formation, this latter result is particularly important. We also show that Nature itself provides an interesting way to perform high-angular resolution observations with intermediate-angular resolution telescopes: if a disk has a (nearly) edge-on orientation and is located in a low-density ambient dusty medium, the disk casts a shadow into this medium, as it blocks the starlight in the equatorial direction. We discuss how these shadows can be used to characterize the dust in the disk.

    Fuzzy covering based rough sets revisited

    In this paper we review four fuzzy extensions of the so-called tight pair of covering based rough set approximation operators. Furthermore, we propose two new extensions of the tight pair: for the first model, we apply the technique of representation by levels to define the approximation operators, while the second model is an intuitive extension of the crisp operators. For the six models, we study which theoretical properties they satisfy. Moreover, we discuss interrelationships between the models.

    A scalable approach to fuzzy rough nearest neighbour classification with ordered weighted averaging operators

    Fuzzy rough sets have been successfully applied in classification tasks, in particular in combination with OWA operators. There has been a lot of research into adapting algorithms for use with Big Data through parallelisation, but no concrete strategy exists to design a Big Data classifier based on fuzzy rough sets. Existing Big Data approaches use fuzzy rough sets for feature and prototype selection, and have often not involved very large datasets. We fill this gap by presenting the first Big Data extension of an algorithm that uses fuzzy rough sets directly to classify test instances, a distributed implementation of FRNN-OWA in Apache Spark. Through a series of systematic tests involving generated datasets, we demonstrate that it can achieve a speedup effectively equal to the number of computing cores used, meaning that it can scale to arbitrarily large datasets.

    Polar Encoding: A Simple Baseline Approach for Classification with Missing Values

    We propose polar encoding, a representation of categorical and numerical [0,1]-valued attributes with missing values to be used in a classification context. We argue that this is a good baseline approach, because it can be used with any classification algorithm, preserves missingness information, is very simple to apply and offers good performance. In particular, unlike the existing missing-indicator approach, it does not require imputation, ensures that missing values are equidistant from non-missing values, and lets decision tree algorithms choose how to split missing values, thereby providing a practical realisation of the "missingness incorporated in attributes" (MIA) proposal. Furthermore, we show that categorical and [0,1]-valued attributes can be viewed as special cases of a single attribute type, corresponding to the classical concept of barycentric coordinates, and that this offers a natural interpretation of polar encoding as a fuzzified form of one-hot encoding. With an experiment based on twenty real-life datasets with missing values, we show that, in terms of the resulting classification performance, polar encoding performs better than the state-of-the-art strategies multiple imputation by chained equations (MICE) and multiple imputation with denoising autoencoders (MIDAS) and, depending on the classifier, about as well as or better than mean/mode imputation with missing-indicators.
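
One way to read the encoding described in this abstract is sketched below. The abstract does not spell out the construction, so the details are assumptions: a numerical value x in [0,1] is mapped to the pair (x, 1 - x), a categorical value to its one-hot vector, and a missing value (represented here as `None`) to the all-zero vector. Function names are hypothetical.

```python
def polar_encode_numeric(x):
    # Assumed encoding: x in [0, 1] becomes (x, 1 - x); missing becomes (0, 0).
    if x is None:
        return (0.0, 0.0)
    if not 0.0 <= x <= 1.0:
        raise ValueError("expected a value scaled to [0, 1]")
    return (x, 1.0 - x)


def polar_encode_categorical(value, categories):
    # One-hot encoding; a missing value (None) matches no category
    # and therefore becomes the all-zero vector.
    return tuple(1.0 if value == c else 0.0 for c in categories)


def manhattan(a, b):
    # Sum of absolute coordinate differences.
    return sum(abs(p - q) for p, q in zip(a, b))


missing = polar_encode_numeric(None)
# Under this encoding, every non-missing value lies at Manhattan
# distance 1 from the missing value.
distances = [manhattan(missing, polar_encode_numeric(x)) for x in (0.0, 0.25, 0.9, 1.0)]
colour = polar_encode_categorical("red", ["red", "green", "blue"])
```

Under these assumptions the coordinates of a non-missing value sum to 1, which matches the barycentric-coordinate view mentioned in the abstract, and no imputed value is needed for the missing case.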

    Mesmerize is a dynamically adaptable user-friendly analysis platform for 2D and 3D calcium imaging data

    Calcium imaging is an increasingly valuable technique for understanding neural circuits, neuroethology, and cellular mechanisms. The analysis of calcium imaging data presents challenges in image processing, data organization, analysis, and accessibility. Tools have been created to address these problems independently; however, a comprehensive user-friendly package does not exist. Here we present Mesmerize, an efficient, expandable and user-friendly analysis platform, which uses a Findable, Accessible, Interoperable and Reproducible (FAIR) system to encapsulate the entire analysis process, from raw data to interactive visualizations for publication. Mesmerize provides a user-friendly graphical interface to state-of-the-art analysis methods for signal extraction and downstream analysis. We demonstrate the broad scientific scope of Mesmerize’s applications by analyzing neuronal datasets from mouse and a volumetric zebrafish dataset. We also applied contemporary time-series analysis techniques to analyze a novel dataset comprising neuronal, epidermal, and migratory mesenchymal cells of the protochordate Ciona intestinalis.