354 research outputs found

    Renewing the respect for similarity

    Get PDF
    In psychology, the concept of similarity has traditionally evoked a mixture of respect, stemming from its ubiquity and intuitive appeal, and concern, due to its dependence on the framing of the problem at hand and on its context. We argue for a renewed focus on similarity as an explanatory concept, by surveying established results and new developments in the theory and methods of similarity-preserving associative lookup and dimensionality reduction—critical components of many cognitive functions, as well as of intelligent data management in computer vision. We focus in particular on the growing family of algorithms that support associative memory by performing hashing that respects local similarity, and on the uses of similarity in representing structured objects and scenes. Insofar as these similarity-based ideas and methods are useful in cognitive modeling and in AI applications, they should be included in the core conceptual toolkit of computational neuroscience. In support of this stance, the present paper (1) offers a discussion of conceptual, mathematical, computational, and empirical aspects of similarity, as applied to the problems of visual object and scene representation, recognition, and interpretation, (2) mentions some key computational problems arising in attempts to put similarity to use, along with their possible solutions, (3) briefly states a previously developed similarity-based framework for visual object representation, the Chorus of Prototypes, along with the empirical support it enjoys, (4) presents new mathematical insights into the effectiveness of this framework, derived from its relationship to locality-sensitive hashing (LSH) and to concomitant statistics, (5) introduces a new model, the Chorus of Relational Descriptors (ChoRD), that extends this framework to scene representation and interpretation, (6) describes its implementation and testing, and finally (7) suggests possible directions in which the present research program can be extended in the future

    Building Blocks for Mapping Services

    Get PDF
    Mapping services are ubiquitous on the Internet. These services enjoy a considerable user base. But it is often overlooked that providing a service on a global scale with virtually millions of users has been the playground of an oligopoly of a select few service providers are able to do so. Unfortunately, the literature on these solutions is more than scarce. This thesis adds a number of building blocks to the literature that explain how to design and implement a number of features

    Algorithms for Triangles, Cones & Peaks

    Get PDF
    Three different geometric objects are at the center of this dissertation: triangles, cones and peaks. In computational geometry, triangles are the most basic shape for planar subdivisions. Particularly, Delaunay triangulations are a widely used for manifold applications in engineering, geographic information systems, telecommunication networks, etc. We present two novel parallel algorithms to construct the Delaunay triangulation of a given point set. Yao graphs are geometric spanners that connect each point of a given set to its nearest neighbor in each of kk cones drawn around it. They are used to aid the construction of Euclidean minimum spanning trees or in wireless networks for topology control and routing. We present the first implementation of an optimal O(nlogn)\mathcal{O}(n \log n)-time sweepline algorithm to construct Yao graphs. One metric to quantify the importance of a mountain peak is its isolation. Isolation measures the distance between a peak and the closest point of higher elevation. Computing this metric from high-resolution digital elevation models (DEMs) requires efficient algorithms. We present a novel sweep-plane algorithm that can calculate the isolation of all peaks on Earth in mere minutes

    Big-Data Science in Porous Materials: Materials Genomics and Machine Learning

    Full text link
    By combining metal nodes with organic linkers we can potentially synthesize millions of possible metal organic frameworks (MOFs). At present, we have libraries of over ten thousand synthesized materials and millions of in-silico predicted materials. The fact that we have so many materials opens many exciting avenues to tailor make a material that is optimal for a given application. However, from an experimental and computational point of view we simply have too many materials to screen using brute-force techniques. In this review, we show that having so many materials allows us to use big-data methods as a powerful technique to study these materials and to discover complex correlations. The first part of the review gives an introduction to the principles of big-data science. We emphasize the importance of data collection, methods to augment small data sets, how to select appropriate training sets. An important part of this review are the different approaches that are used to represent these materials in feature space. The review also includes a general overview of the different ML techniques, but as most applications in porous materials use supervised ML our review is focused on the different approaches for supervised ML. In particular, we review the different method to optimize the ML process and how to quantify the performance of the different methods. In the second part, we review how the different approaches of ML have been applied to porous materials. In particular, we discuss applications in the field of gas storage and separation, the stability of these materials, their electronic properties, and their synthesis. The range of topics illustrates the large variety of topics that can be studied with big-data science. Given the increasing interest of the scientific community in ML, we expect this list to rapidly expand in the coming years.Comment: Editorial changes (typos fixed, minor adjustments to figures

    Data exploration process based on the self-organizing map

    Get PDF
    With the advances in computer technology, the amount of data that is obtained from various sources and stored in electronic media is growing at exponential rates. Data mining is a research area which answers to the challange of analysing this data in order to find useful information contained therein. The Self-Organizing Map (SOM) is one of the methods used in data mining. It quantizes the training data into a representative set of prototype vectors and maps them on a low-dimensional grid. The SOM is a prominent tool in the initial exploratory phase in data mining. The thesis consists of an introduction and ten publications. In the publications, the validity of SOM-based data exploration methods has been investigated and various enhancements to them have been proposed. In the introduction, these methods are presented as parts of the data mining process, and they are compared with other data exploration methods with similar aims. The work makes two primary contributions. Firstly, it has been shown that the SOM provides a versatile platform on top of which various data exploration methods can be efficiently constructed. New methods and measures for visualization of data, clustering, cluster characterization, and quantization have been proposed. The SOM algorithm and the proposed methods and measures have been implemented as a set of Matlab routines in the SOM Toolbox software library. Secondly, a framework for SOM-based data exploration of table-format data - both single tables and hierarchically organized tables - has been constructed. The framework divides exploratory data analysis into several sub-tasks, most notably the analysis of samples and the analysis of variables. The analysis methods are applied autonomously and their results are provided in a report describing the most important properties of the data manifold. In such a framework, the attention of the data miner can be directed more towards the actual data exploration task, rather than on the application of the analysis methods. Because of the highly iterative nature of the data exploration, the automation of routine analysis tasks can reduce the time needed by the data exploration process considerably.reviewe

    Population Structure, Connectivity, and Phylogeography of Two Balistidae with High Potential for Larval Dispersal: \u3ci\u3eBalistes capriscus\u3c/i\u3e and \u3ci\u3eBalistes vetula\u3c/i\u3e

    Get PDF
    The gray triggerfish (Balistes capriscus) and the queen triggerfish (Balistes vetula) are two exploited reef fish distributed in tropical and temperate shelf waters of the Atlantic Ocean and the Mediterranean Sea. Both species are highly sedentary as adults but disperse pelagic larvae for extended periods of time potentially allowing connectivity across long distances under the action of oceanic currents. In this work population structure, phylogeography, and migration patterns were examined in the two species and contrasted with predictions of larval transport based on surface circulation data. A total of 1,017 gray triggerfish from twelve sampling localities spanning the species distribution range were assayed at 17 homologous microsatellite markers and sequence variation at the ND4 mitochondrial gene. Four genetically distinct populations were detected including (i) a North Atlantic group that comprised the North American, European, and Northwest African populations, (ii) a Mediterranean group that was inferred to result from a recent colonization of the Mediterranean Sea by a small number of migrants of North Atlantic origin, (iii) a southeastern Atlantic group that included populations from the Gulf of Guinea and Southwest Africa, and (iv) a southwestern Atlantic group recently diverged from the southeastern group. Analysis of phylogeography supported long-term historical isolation of the South Atlantic and North Atlantic groups. Assignment tests and isolation-by-distance analysis supported the hypothesis of long-distance connectivity with evidence for transatlantic migrations and estimates of the mean dispersal distance of 740 km or greater. The high estimates of contemporaneous migration rates (up to 36.7%) may reflect increased larval transport in connection with the recent development of new Sargassum in the equatorial region. Analysis of high density genome scans revealed homogeneous distributions of genetic variants among queen triggerfish from the French Antilles, the U.S. Virgin-Islands, and South Florida, suggesting high connectivity is occurring across the region. These findings suggest that, in both species, local recruitment depends largely on the output of spawning populations located hundreds or thousands of kilometers away from a given stock, highlighting the need to conserve populations across each species’ range in particular in areas where circulation patterns predict a low likelihood of incoming migrants
    corecore