354 research outputs found

Renewing the respect for similarity

Author: Reza Shahbazi
Shimon Edelman
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2012
Field of study

In psychology, the concept of similarity has traditionally evoked a mixture of respect, stemming from its ubiquity and intuitive appeal, and concern, due to its dependence on the framing of the problem at hand and on its context. We argue for a renewed focus on similarity as an explanatory concept, by surveying established results and new developments in the theory and methods of similarity-preserving associative lookup and dimensionality reduction, which are critical components of many cognitive functions, as well as of intelligent data management in computer vision. We focus in particular on the growing family of algorithms that support associative memory by performing hashing that respects local similarity, and on the uses of similarity in representing structured objects and scenes. Insofar as these similarity-based ideas and methods are useful in cognitive modeling and in AI applications, they should be included in the core conceptual toolkit of computational neuroscience. In support of this stance, the present paper (1) offers a discussion of conceptual, mathematical, computational, and empirical aspects of similarity, as applied to the problems of visual object and scene representation, recognition, and interpretation, (2) mentions some key computational problems arising in attempts to put similarity to use, along with their possible solutions, (3) briefly states a previously developed similarity-based framework for visual object representation, the Chorus of Prototypes, along with the empirical support it enjoys, (4) presents new mathematical insights into the effectiveness of this framework, derived from its relationship to locality-sensitive hashing (LSH) and to concomitant statistics, (5) introduces a new model, the Chorus of Relational Descriptors (ChoRD), that extends this framework to scene representation and interpretation, (6) describes its implementation and testing, and finally (7) suggests possible directions in which the present research program can be extended in the future.
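The idea of hashing that respects local similarity can be illustrated with a minimal sketch. It assumes random-hyperplane hashing for cosine similarity, which is one common LSH scheme and not necessarily the one analyzed in the paper; the function names `make_hyperplanes`, `lsh_signature`, and `hamming` are hypothetical.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_signature(vec, hyperplanes):
    """One bit per hyperplane: the side of the hyperplane the vector falls on."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vec)) >= 0.0 else 0
        for plane in hyperplanes
    )

def hamming(a, b):
    """Number of positions at which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))

planes = make_hyperplanes(dim=3, n_bits=16)
a = [1.0, 0.0, 0.0]
b = [2.0, 0.0, 0.0]    # same direction as a
c = [-1.0, 0.0, 0.0]   # opposite direction to a
sig_a, sig_b, sig_c = (lsh_signature(v, planes) for v in (a, b, c))
```

Vectors pointing in the same direction receive identical signatures, while opposite vectors disagree on every bit, so Hamming distance between signatures tracks the angle between the inputs.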

Crossref

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

Building Blocks for Mapping Services

Author: Luxen Dennis
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2013
Field of study

Mapping services are ubiquitous on the Internet. These services enjoy a considerable user base. But it is often overlooked that providing a service on a global scale to millions of users has been the playground of an oligopoly: only a select few service providers are able to do so. Unfortunately, the literature on these solutions is scarce. This thesis adds a number of building blocks to the literature that explain how to design and implement a number of features

KITopen

Algorithms for Triangles, Cones & Peaks

Author: Funke Daniel
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 15/12/2023
Field of study

Three different geometric objects are at the center of this dissertation: triangles, cones and peaks. In computational geometry, triangles are the most basic shape for planar subdivisions. In particular, Delaunay triangulations are widely used for manifold applications in engineering, geographic information systems, telecommunication networks, etc. We present two novel parallel algorithms to construct the Delaunay triangulation of a given point set. Yao graphs are geometric spanners that connect each point of a given set to its nearest neighbor in each of $k$ cones drawn around it. They are used to aid the construction of Euclidean minimum spanning trees or in wireless networks for topology control and routing. We present the first implementation of an optimal $\mathcal{O}(n \log n)$-time sweepline algorithm to construct Yao graphs. One metric to quantify the importance of a mountain peak is its isolation. Isolation measures the distance between a peak and the closest point of higher elevation. Computing this metric from high-resolution digital elevation models (DEMs) requires efficient algorithms. We present a novel sweep-plane algorithm that can calculate the isolation of all peaks on Earth in mere minutes
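The Yao graph definition above can be made concrete with a brute-force sketch. This is a quadratic-time illustration of the definition only, not the sweepline algorithm of the dissertation; the function name `yao_graph` and the choice of cones starting at angle 0 are assumptions.

```python
import math

def yao_graph(points, k):
    """Return directed edges (i, j): j is the nearest neighbor of i in one of k cones."""
    cone_angle = 2.0 * math.pi / k
    edges = set()
    for i, (px, py) in enumerate(points):
        nearest = {}  # cone index -> (distance, neighbor index)
        for j, (qx, qy) in enumerate(points):
            if i == j:
                continue
            # Direction from point i to point j, normalized to [0, 2*pi).
            angle = math.atan2(qy - py, qx - px) % (2.0 * math.pi)
            cone = min(int(angle / cone_angle), k - 1)
            dist = math.hypot(qx - px, qy - py)
            if cone not in nearest or dist < nearest[cone][0]:
                nearest[cone] = (dist, j)
        edges.update((i, j) for _, j in nearest.values())
    return edges

points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
edges = yao_graph(points, k=4)
```

Point 0 connects to point 1 but not to point 2, because both lie in the same cone and point 1 is closer.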

KITopen

Big-Data Science in Porous Materials: Materials Genomics and Machine Learning

Author: Adams H.
Anderson R.
Berend Smit
Bergstra J.
Bishop C. M.
Caruana R.
Chen T.
Dacrema M. F.
Daniele Ongari
Forman G.
Gilmer J.
Goodfellow I.
Grünwald P. D.
Guyon I.
Géron A.
Hardt M.
Hastie T.
Hey A. J. G.
Hofer C. D.
Ioffe S.
James G.
Kevin Maik Jablonka
Maturana D.
Molnar C.
Montgomery D. C.
Noh H.
Pedregosa F.
Pettifor D. G.
Ramsundar B.
Saul N.
Seyed Mohamad Moosavi
Shafer G.
Shalev-Shwartz S.
Smit B.
Snoek J.
Srivastava N.
Sutton R. S.
Tibshirani T.
Tomek I.
Trickett C. A.
Tukey J. W.
Vishwakarma G.
Weinberger S.
Weisberg H. F.
Weyl H.
Publication venue: 'American Chemical Society (ACS)'
Publication date: 08/06/2020
Field of study

By combining metal nodes with organic linkers we can potentially synthesize millions of possible metal-organic frameworks (MOFs). At present, we have libraries of over ten thousand synthesized materials and millions of in-silico predicted materials. The fact that we have so many materials opens many exciting avenues to tailor-make a material that is optimal for a given application. However, from an experimental and computational point of view we simply have too many materials to screen using brute-force techniques. In this review, we show that having so many materials allows us to use big-data methods as a powerful technique to study these materials and to discover complex correlations. The first part of the review gives an introduction to the principles of big-data science. We emphasize the importance of data collection, methods to augment small data sets, and how to select appropriate training sets. An important part of this review is the different approaches that are used to represent these materials in feature space. The review also includes a general overview of the different ML techniques, but as most applications in porous materials use supervised ML our review is focused on the different approaches for supervised ML. In particular, we review the different methods to optimize the ML process and how to quantify the performance of the different methods. In the second part, we review how the different approaches of ML have been applied to porous materials. In particular, we discuss applications in the field of gas storage and separation, the stability of these materials, their electronic properties, and their synthesis. The range of topics illustrates the large variety of topics that can be studied with big-data science. Given the increasing interest of the scientific community in ML, we expect this list to rapidly expand in the coming years.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

eScholarship - University of California

Data exploration process based on the self-organizing map

Author: Vesanto Juha
Publication venue: Teknillinen korkeakoulu
Publication date: 16/05/2002
Field of study

With the advances in computer technology, the amount of data that is obtained from various sources and stored in electronic media is growing at exponential rates. Data mining is a research area which answers to the challenge of analysing this data in order to find useful information contained therein. The Self-Organizing Map (SOM) is one of the methods used in data mining. It quantizes the training data into a representative set of prototype vectors and maps them on a low-dimensional grid. The SOM is a prominent tool in the initial exploratory phase in data mining. The thesis consists of an introduction and ten publications. In the publications, the validity of SOM-based data exploration methods has been investigated and various enhancements to them have been proposed. In the introduction, these methods are presented as parts of the data mining process, and they are compared with other data exploration methods with similar aims. The work makes two primary contributions. Firstly, it has been shown that the SOM provides a versatile platform on top of which various data exploration methods can be efficiently constructed. New methods and measures for visualization of data, clustering, cluster characterization, and quantization have been proposed. The SOM algorithm and the proposed methods and measures have been implemented as a set of Matlab routines in the SOM Toolbox software library. Secondly, a framework for SOM-based data exploration of table-format data - both single tables and hierarchically organized tables - has been constructed. The framework divides exploratory data analysis into several sub-tasks, most notably the analysis of samples and the analysis of variables. The analysis methods are applied autonomously and their results are provided in a report describing the most important properties of the data manifold. In such a framework, the attention of the data miner can be directed more towards the actual data exploration task, rather than on the application of the analysis methods. Because of the highly iterative nature of the data exploration, the automation of routine analysis tasks can reduce the time needed by the data exploration process considerably.
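The quantization step described in the abstract (prototype vectors arranged on a low-dimensional grid) can be sketched as a single SOM training update. This is a minimal illustration assuming a 1-D grid and a Gaussian neighborhood kernel; it is not the SOM Toolbox implementation, and the names `best_matching_unit` and `som_update` are hypothetical.

```python
import math

def best_matching_unit(prototypes, sample):
    """Index of the prototype vector closest to the sample (Euclidean distance)."""
    return min(range(len(prototypes)), key=lambda i: math.dist(prototypes[i], sample))

def som_update(prototypes, sample, learning_rate=0.5, radius=1.0):
    """Move the best-matching unit and its grid neighbors toward the sample."""
    bmu = best_matching_unit(prototypes, sample)
    updated = []
    for i, proto in enumerate(prototypes):
        # Gaussian neighborhood on the 1-D grid: units near the BMU move more.
        h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
        updated.append([p + learning_rate * h * (s - p) for p, s in zip(proto, sample)])
    return bmu, updated

prototypes = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
sample = [2.0, 3.0]
bmu, new_prototypes = som_update(prototypes, sample)
```

Repeating this update over many samples, while shrinking `learning_rate` and `radius`, is the usual way such a map is trained.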

Aaltodoc Publication Archive

Population Structure, Connectivity, and Phylogeography of Two Balistidae with High Potential for Larval Dispersal: Balistes capriscus and Balistes vetula

Author: Antoni Luca
Publication venue: The Aquila Digital Community
Publication date: 01/05/2017
Field of study

The gray triggerfish (Balistes capriscus) and the queen triggerfish (Balistes vetula) are two exploited reef fish distributed in tropical and temperate shelf waters of the Atlantic Ocean and the Mediterranean Sea. Both species are highly sedentary as adults but disperse pelagic larvae for extended periods of time, potentially allowing connectivity across long distances under the action of oceanic currents. In this work, population structure, phylogeography, and migration patterns were examined in the two species and contrasted with predictions of larval transport based on surface circulation data. A total of 1,017 gray triggerfish from twelve sampling localities spanning the species distribution range were assayed at 17 homologous microsatellite markers and for sequence variation at the ND4 mitochondrial gene. Four genetically distinct populations were detected, including (i) a North Atlantic group that comprised the North American, European, and Northwest African populations, (ii) a Mediterranean group that was inferred to result from a recent colonization of the Mediterranean Sea by a small number of migrants of North Atlantic origin, (iii) a southeastern Atlantic group that included populations from the Gulf of Guinea and Southwest Africa, and (iv) a southwestern Atlantic group recently diverged from the southeastern group. Analysis of phylogeography supported long-term historical isolation of the South Atlantic and North Atlantic groups. Assignment tests and isolation-by-distance analysis supported the hypothesis of long-distance connectivity with evidence for transatlantic migrations and estimates of the mean dispersal distance of 740 km or greater. The high estimates of contemporaneous migration rates (up to 36.7%) may reflect increased larval transport in connection with the recent development of new Sargassum in the equatorial region. Analysis of high density genome scans revealed homogeneous distributions of genetic variants among queen triggerfish from the French Antilles, the U.S. Virgin Islands, and South Florida, suggesting high connectivity is occurring across the region. These findings suggest that, in both species, local recruitment depends largely on the output of spawning populations located hundreds or thousands of kilometers away from a given stock, highlighting the need to conserve populations across each species’ range, in particular in areas where circulation patterns predict a low likelihood of incoming migrants

Aquila Digital Community

Community detection method based on mixed-norm sparse subspace clustering

Author: Li Weizi
Tian Bo
Publication venue: 'Elsevier BV'
Publication date: 01/01/2018
Field of study

A community, or group, is an important structure in disciplines such as social networks, gene expression in biology, and physical systems. Community detection for different types of networks has attracted considerable interest. However, it is still challenging to find meaningful community structures in various networks. In particular, accurate community description and the implementation of effective detection algorithms for huge datasets remain unsolved. In this paper, we present a novel community detection algorithm based on the theory of sparse subspace clustering (SSC) with mixed-norm constraints. Inspired by the sparse representation of subspaces, each community in a given network can span a subspace in some similarity measure space. If the basis of the subspaces can be solved, all of the nodes can be represented as a linear combination of the nodes that span the same subspace. By introducing a novel mixed-norm constraint in SSC, the connections of nodes among different communities are modeled as noise to improve the clustering accuracy. The formulation of the basis of subspaces is derived from the self-representation property of data by using SSC. Then, the alternating direction method of multipliers (ADMM) framework is used to solve the formulation. Finally, communities are detected by a subspace clustering method. The proposed method is compared with state-of-the-art algorithms on synthetic networks and real-world networks. The experimental results show the effectiveness of the proposed algorithm in accurately describing the community. The results also show that the mixed-norm SSC is a practical approach for detecting communities in huge datasets

Central Archive at the University of Reading

Crossref