How Many Dissimilarity/Kernel Self Organizing Map Variants Do We Need?

Author: Rossi Fabrice
Publication venue: Springer Science and Business Media LLC
Publication date: 02/07/2014

In numerous application contexts, data are too rich and too complex to be represented by numerical vectors. A general approach to extending machine learning and data mining techniques to such data is to rely on a dissimilarity or a kernel that measures how different or similar two objects are. This approach has been used to define several variants of the Self-Organizing Map (SOM). This paper reviews those variants using a common notation in order to highlight the differences and similarities between them. It discusses the advantages and drawbacks of the variants, as well as the actual relevance of dissimilarity/kernel SOMs for practical applications.

arXiv.org e-Print Archive

CiteSeerX

HAL-Paris1
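To make the dissimilarity-only setting concrete, here is a minimal sketch of one well-known variant, the batch median SOM, in which every prototype is constrained to be one of the data objects, so the algorithm only ever reads the dissimilarity matrix D. This is an illustration, not the paper's notation or reference implementation: the function name median_som, the rectangular grid, the Gaussian neighbourhood and the geometric shrinking schedule for its width are all assumptions chosen for brevity.

```python
import numpy as np


def median_som(D, grid_shape=(2, 2), n_iter=20, sigma0=1.0, sigma_end=0.1, seed=0):
    """Batch median SOM driven only by an (n x n) dissimilarity matrix D.

    Each map unit's prototype is the index of a data object, so no vector
    coordinates are ever needed. Returns (prototype indices, best matching
    unit of every object).
    """
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    rows, cols = grid_shape
    # Coordinates of the map units on a rectangular grid (an assumption)
    units = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    grid_d2 = ((units[:, None, :] - units[None, :, :]) ** 2).sum(axis=-1)
    # Initialise prototypes with distinct random objects
    protos = rng.choice(n, size=len(units), replace=False)
    for t in range(n_iter):
        # Neighbourhood width shrinks geometrically from sigma0 to sigma_end
        sigma = sigma0 * (sigma_end / sigma0) ** (t / max(n_iter - 1, 1))
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # h[k, l]: unit-to-unit influence
        # Assignment step: each object goes to the unit with the closest prototype
        bmu = np.argmin(D[:, protos], axis=1)
        # Representation step: the prototype of unit k becomes the object i
        # minimising sum_j h[k, bmu[j]] * D[i, j]
        weights = h[:, bmu]            # shape (n_units, n)
        costs = weights @ D.T          # costs[k, i]
        protos = np.argmin(costs, axis=1)
    bmu = np.argmin(D[:, protos], axis=1)
    return protos, bmu
```

For example, with x = [0, 1, 2, 100, 101, 102], D[i, j] = |x_i - x_j| and a 1 x 2 map, the three small values and the three large values should end up on different units. Restricting prototypes to data objects is what makes the method work with any dissimilarity; the price is that the representation step scans all n candidates for every unit.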

FlashProfile: A Framework for Synthesizing Data Profiles

Authors: Gulwani Sumit, Jain Prateek, Millstein Todd, Padhi Saswat, Perelman Daniel, Polozov Oleksandr
Publication venue: Association for Computing Machinery (ACM)
Publication date: 24/10/2018

We address the problem of learning a syntactic profile for a collection of strings, i.e. a set of regex-like patterns that succinctly describe the syntactic variations in the strings. Real-world datasets, typically curated from multiple sources, often contain data in various syntactic formats. Thus, any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify the different formats is infeasible in standard big-data scenarios. Prior techniques are restricted to a small set of pre-defined patterns (e.g. digits, letters, words, etc.), and provide no control over the granularity of profiles. We define syntactic profiling as a problem of clustering strings based on syntactic similarity, followed by identifying patterns that succinctly describe each cluster. We present a technique for synthesizing such profiles over a given language of patterns that also allows for interactive refinement by requesting a desired number of clusters. Using a state-of-the-art inductive synthesis framework, PROSE, we have implemented our technique as FlashProfile. Across 153 tasks over 75 large real datasets, we observe a median profiling time of only ~0.7 s. Furthermore, we show that access to syntactic profiles may allow for more accurate synthesis of programs, i.e. using fewer examples, in programming-by-example (PBE) workflows such as FlashFill.
Comment: 28 pages, SPLASH (OOPSLA) 201

arXiv.org e-Print Archive

eScholarship - University of California
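The two-step idea in the abstract (cluster strings by syntactic similarity, then describe each cluster with a pattern) can be illustrated with a deliberately naive sketch. It is not FlashProfile and does not use PROSE: it relies on a fixed, hand-picked set of character classes, which is exactly the kind of pre-defined pattern set the abstract contrasts FlashProfile with. The names ATOM_REGEX, atoms, to_regex and profile are hypothetical, and ASCII input is assumed.

```python
import re
from collections import defaultdict

# Hypothetical atom classes: a fixed set, unlike FlashProfile's pattern
# language. ASCII text is assumed (non-ASCII letters would not match).
ATOM_REGEX = {"D": r"[0-9]+", "U": r"[A-Z]+", "L": r"[a-z]+"}


def atoms(s):
    """Map a string to a token sequence, collapsing runs of the same class."""
    out = []
    for ch in s:
        if ch.isdigit():
            c = "D"
        elif ch.isalpha():
            c = "U" if ch.isupper() else "L"
        else:
            c = ch  # punctuation and spaces are kept as literal tokens
        if c in ATOM_REGEX and out and out[-1] == c:
            continue  # extend the current run instead of adding a token
        out.append(c)
    return tuple(out)


def to_regex(tokens):
    """Render a token sequence as a regex (intended for re.fullmatch)."""
    return "".join(ATOM_REGEX.get(t, re.escape(t)) for t in tokens)


def profile(strings):
    """Group strings whose token sequences coincide; return {regex: [strings]}."""
    clusters = defaultdict(list)
    for s in strings:
        clusters[to_regex(atoms(s))].append(s)
    return dict(clusters)
```

For instance, profile(["2018-10-24", "2014-07-02", "24/10/2018", "Oct 24"]) yields three clusters: the two dash-separated dates share one pattern, while the slash-separated date and "Oct 24" each get their own. Because clusters here are defined by exact pattern equality, there is no way to request a coarser or finer profile, which is the control the abstract says FlashProfile adds.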

An Agent-Based Algorithm exploiting Multiple Local Dissimilarities for Clusters Mining and Knowledge Discovery

Authors: Bianchi Filippo Maria, Livi Lorenzo, Maiorino Enrico, Rizzi Antonello, Sadeghian Alireza
Publication venue: Springer Science and Business Media LLC
Publication date: 17/09/2014

We propose a multi-agent algorithm able to automatically discover relevant regularities in a given dataset, while at the same time determining the set of configurations of the adopted parametric dissimilarity measure that yield compact and well-separated clusters. Each agent operates independently by performing a Markovian random walk on a suitable weighted graph representation of the input dataset. This weighted graph representation is induced by the specific parameter configuration of the dissimilarity measure adopted by the agent, which searches and takes decisions autonomously for one cluster at a time. Results show that the algorithm is able to discover parameter configurations that yield a consistent and interpretable collection of clusters. Moreover, we demonstrate that our algorithm performs comparably to other similar state-of-the-art algorithms on specific clustering problems.

arXiv.org e-Print Archive

CiteSeerX
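The core mechanism described above, a parameter configuration of a dissimilarity inducing a weighted graph on which an agent performs a Markovian random walk, can be sketched generically. The abstract does not specify the dissimilarity, the graph construction or the agents' decision rules, so everything here is an assumption: a feature-weighted squared Euclidean dissimilarity with weight vector w standing in for the "parameter configuration", a Gaussian kernel with width tau for the edge weights, and the hypothetical helper names transition_matrix and random_walk.

```python
import numpy as np


def transition_matrix(X, w, tau=1.0):
    """Row-stochastic transition matrix of the graph induced by weights w.

    Assumed dissimilarity: d_w(x, y)^2 = sum_k w_k (x_k - y_k)^2.
    Assumed edge weight: exp(-d_w^2 / (2 tau^2)), with no self-loops.
    """
    diff = X[:, None, :] - X[None, :, :]
    d2 = (w * diff ** 2).sum(axis=-1)
    S = np.exp(-d2 / (2.0 * tau ** 2))
    np.fill_diagonal(S, 0.0)
    return S / S.sum(axis=1, keepdims=True)


def random_walk(P, start, n_steps, seed=0):
    """Simulate a Markov chain with transition matrix P from node `start`."""
    rng = np.random.default_rng(seed)
    path = [start]
    for _ in range(n_steps):
        path.append(rng.choice(len(P), p=P[path[-1]]))
    return np.array(path)
```

With X = [[0, 0], [0, 1], [0, 2], [10, 0], [10, 1], [10, 2]], the configuration w = [1, 0] makes the two groups of three points almost disconnected, so a walker started at node 0 stays among nodes 0 to 2; with w = [0, 1] the first coordinate is ignored and the walker mixes freely across both groups. Comparing how well a walk stays trapped under different w values is one simple way to see why the choice of dissimilarity parameters shapes which clusters can be found.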

Reduction of Second-Order Network Systems with Structure Preservation

Authors: Cheng Xiaodong, Kawano Yu, Scherpen Jacquelien M. A.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 02/06/2017

This paper proposes a general framework for structure-preserving model reduction of a second-order network system based on graph clustering. In this approach, vertex dynamics are captured by the transfer functions from inputs to individual states, and the dissimilarity between vertices is quantified by the H2-norms of the transfer function discrepancies. A greedy hierarchical clustering algorithm is proposed to place vertices with similar dynamics into the same clusters. Then, the reduced-order model is generated by the Petrov-Galerkin method, where the projection is formed by the characteristic matrix of the resulting network clustering. It is shown that the simplified system preserves an interconnection structure, i.e., it can again be interpreted as a second-order system evolving over a reduced graph. Furthermore, this paper generalizes the definition of the network controllability Gramian to second-order network systems. Based on it, we develop an efficient method to compute H2-norms and derive the approximation error between the full-order and reduced-order models. Finally, the approach is illustrated by the example of a small-world network.

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen
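The projection step can be illustrated with a small sketch showing why projecting with a clustering's characteristic matrix keeps a graph structure. Several choices are assumptions rather than the paper's exact construction: the system is taken to be M x'' + D x' + K x = F u with K a graph Laplacian, the projection is the simplest Galerkin-type choice using the unnormalized characteristic matrix P on both sides (the paper uses a Petrov-Galerkin projection formed from P, whose exact scaling is not given in the abstract), and the function names are hypothetical. The H2-based greedy clustering itself is not reproduced; cluster labels are taken as given.

```python
import numpy as np


def characteristic_matrix(labels, n_clusters):
    """P[i, k] = 1 if vertex i belongs to cluster k, else 0."""
    labels = np.asarray(labels)
    P = np.zeros((len(labels), n_clusters))
    P[np.arange(len(labels)), labels] = 1.0
    return P


def reduce_second_order(M, D, K, F, labels):
    """Project (M, D, K, F) onto the clusters defined by integer `labels`.

    Assumed projection: x ~ P z with P^T applied on the left. If K is a
    Laplacian (K 1 = 0), then P^T K P 1 = P^T K 1 = 0, because P 1 = 1,
    so the reduced stiffness matrix is again a Laplacian of a reduced graph.
    """
    P = characteristic_matrix(labels, int(np.max(labels)) + 1)
    return P.T @ M @ P, P.T @ D @ P, P.T @ K @ P, P.T @ F
```

For a four-vertex path graph with Laplacian K = [[1, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, -1], [0, 0, -1, 1]], M equal to the identity and clusters {0, 1} and {2, 3}, the reduced stiffness matrix is [[1, -1], [-1, 1]] (a single edge between the two clusters) and the reduced mass matrix is diag(2, 2), i.e. each cluster's mass is the sum of its members' masses.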