442,395 research outputs found
On Recursive Edit Distance Kernels with Application to Time Series Classification
This paper proposes some extensions to the work on kernels dedicated to
string or time series global alignment based on the aggregation of scores
obtained by local alignments. The extensions we propose allow to construct,
from classical recursive definition of elastic distances, recursive edit
distance (or time-warp) kernels that are positive definite if some sufficient
conditions are satisfied. The sufficient conditions we end-up with are original
and weaker than those proposed in earlier works, although a recursive
regularizing term is required to get the proof of the positive definiteness as
a direct consequence of the Haussler's convolution theorem. The classification
experiment we conducted on three classical time warp distances (two of which
being metrics), using Support Vector Machine classifier, leads to conclude
that, when the pairwise distance matrix obtained from the training data is
\textit{far} from definiteness, the positive definite recursive elastic kernels
outperform in general the distance substituting kernels for the classical
elastic distances we have tested.Comment: 14 page
A new fuzzy set merging technique using inclusion-based fuzzy clustering
This paper proposes a new method of merging parameterized fuzzy sets based on clustering in the parameters space, taking into account the degree of inclusion of each fuzzy set in the cluster prototypes. The merger method is applied to fuzzy rule base simplification by automatically replacing the fuzzy sets corresponding to a given cluster with that pertaining to cluster prototype. The feasibility and the performance of the proposed method are studied using an application in mobile robot navigation. The results indicate that the proposed merging and rule base simplification approach leads to good navigation performance in the application considered and to fuzzy models that are interpretable by experts. In this paper, we concentrate mainly on fuzzy systems with Gaussian membership functions, but the general approach can also be applied to other parameterized fuzzy sets
Scalable Techniques for Similarity Search
Document similarity is similar to the nearest neighbour problem and has applications in various domains. In order to determine the similarity / dissimilarity of the documents first they need to be converted into sets containing shingles. Each document is converted into k-shingles, k being the length of each shingle. The similarity is calculated using Jaccard distance between sets and output into a characteristic matrix, the complexity to parse this matrix is significantly high especially when the sets are large. In this project we explore various approaches such as Min hashing, LSH & Bloom Filter to decrease the matrix size and to improve the time complexity. Min hashing creates a signature matrix which significantly smaller compared to a characteristic matrix. In this project we will look into Min-Hashing implementation, pros and cons. Also we will explore Locality Sensitive Hashing, Bloom Filters and their advantages
- …