Search CORE

68,979 research outputs found

On linear balancing sets

Author: Mazumdar Arya
Roth Ron M.
Vontobel Pascal O.
Publication venue: 'American Institute of Mathematical Sciences (AIMS)'
Publication date: 01/01/2009
Field of study

Let $n$ be an even positive integer and $F$ be the field GF(2). A word in $F^n$ is called balanced if its Hamming weight is $n/2$. A subset $C \subseteq F^n$ is called a balancing set if for every word $y \in F^n$ there is a word $x \in C$ such that $y + x$ is balanced. It is shown that most linear subspaces of $F^n$ of dimension slightly larger than $\frac{3}{2}\log_2 n$ are balancing sets. A generalization of this result to linear subspaces that are "almost balancing" is also presented. On the other hand, it is shown that the problem of deciding whether a given set of vectors in $F^n$ spans a balancing set is NP-hard. An application of linear balancing sets is presented for designing efficient error-correcting coding schemes in which the codewords are balanced.
Comment: The abstract of this paper appeared in the Proceedings of the 2009 International Symposium on Information Theory.

arXiv.org e-Print Archive

CiteSeerX

Crossref
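To make the definitions of a balanced word and a balancing set concrete, here is a brute-force checker, a sketch for tiny n only. It assumes words of $F^n$ are encoded as n-bit integers, so addition over GF(2) is XOR; the helper names (`weight`, `span`, `is_balancing_set`) are hypothetical. Because it enumerates all $2^n$ words, it illustrates the definition only and reflects neither the paper's constructions nor its NP-hardness reduction. The last example uses Knuth's well-known prefix-flipping set, which is balancing but not linear.

```python
from typing import Iterable, Set

def weight(v: int) -> int:
    """Hamming weight of an n-bit word stored as an int."""
    return bin(v).count("1")

def span(generators: Iterable[int]) -> Set[int]:
    """All GF(2) linear combinations of the generators (XOR closure, includes 0)."""
    vectors = {0}
    for g in generators:
        vectors |= {v ^ g for v in vectors}
    return vectors

def is_balancing_set(C: Iterable[int], n: int) -> bool:
    """True if every y in F^n has some x in C with weight(y + x) == n/2.

    Brute force over all 2^n words y, so only usable for small n.
    """
    if n <= 0 or n % 2:
        raise ValueError("n must be an even positive integer")
    C = list(C)
    return all(any(weight(y ^ x) == n // 2 for x in C) for y in range(2 ** n))

if __name__ == "__main__":
    print(is_balancing_set(span([0b01]), 2))  # span {00, 01}: True
    print(is_balancing_set(span([0b11]), 2))  # span {00, 11}: False, y = 00 fails
    knuth = {(1 << i) - 1 for i in range(7)}  # flip the first i bits, i = 0..6
    print(is_balancing_set(knuth, 6))         # True
```

Knuth's set works because flipping one more bit changes the weight by exactly one, so the weights of $y + x_i$ walk from $w(y)$ to $n - w(y)$ and must pass through $n/2$.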

PSO-based method for svm classification on skewed data-sets

Author: Trueba Espinosa Adrián
Cervantes Jair
García Lamont Farid
Lopez Chau Asdrubal
Rodríguez Mazahua Lisbeth
Ruiz Castilla Jose Sergio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2016
Field of study

Support Vector Machines (SVM) have shown excellent generalization power in classification problems. However, on skewed data sets, SVM learns a biased model that degrades classifier performance, and the degradation is severe when the imbalance ratio is very large. In this paper, a new external balancing method for applying SVM to skewed data sets is developed. In the first phase of the method, the separating hyperplane is computed. The support vectors are then used to generate the initial population of a particle swarm optimization (PSO) algorithm, which is used to improve the population of artificial instances and to eliminate noisy instances. Experimental results demonstrate the ability of the proposed method to improve the performance of SVM on imbalanced data sets.
Project UAEM 3771/2014/CI

Crossref

Red Mexicana de Repositorios Institucionales

Repositorio Institucional de la Universidad Autónoma del Estado de México
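The abstract does not give the PSO fitness function or how artificial instances are encoded as particles, so the sketch below shows only the generic global-best PSO update (inertia weight `w`, cognitive coefficient `c1`, social coefficient `c2`) on a placeholder objective. The optional `init` argument mimics seeding the swarm from given points such as support vectors. The function name, parameter values, and objective are assumptions for illustration, not the authors' settings.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

def pso_minimize(
    f: Callable[[List[float]], float],
    dim: int,
    n_particles: int = 30,
    iters: int = 200,
    bounds: Tuple[float, float] = (-5.0, 5.0),
    w: float = 0.729,   # inertia weight
    c1: float = 1.494,  # pull toward each particle's own best position
    c2: float = 1.494,  # pull toward the swarm's best position
    init: Optional[Sequence[Sequence[float]]] = None,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best particle swarm optimization; returns (best position, best value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    if init is not None:
        # Hypothetical seeding, e.g. from the support vectors of a trained SVM.
        pos = [list(p) for p in init]
    else:
        pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in pos]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(len(pos)), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i, p in enumerate(pos):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - p[d])
                             + c2 * r2 * (gbest[d] - p[d]))
                p[d] = min(max(p[d] + vel[i][d], lo), hi)  # keep inside bounds
            val = f(p)
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = p[:], val
                if val < gbest_val:
                    gbest, gbest_val = p[:], val
    return gbest, gbest_val
```

For example, `pso_minimize(lambda x: sum(t * t for t in x), dim=2)` drives the swarm toward the origin of a simple quadratic bowl; in the paper's setting the objective would instead score candidate artificial instances, which this sketch does not model.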

Dynamic load balancing in parallel KD-tree k-means

Author: Di Fatta Giuseppe
Pettinger David
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/06/2010
Field of study

One of the most influential and popular data mining methods is the k-Means algorithm for cluster analysis. Techniques for improving the efficiency of k-Means have been largely explored in two main directions. First, the amount of computation can be significantly reduced by adopting geometrical constraints and an efficient data structure, notably a multidimensional binary search tree (KD-Tree); these techniques reduce the number of distance computations the algorithm performs at each iteration. A second direction is parallel processing, where data and computation loads are distributed over many processing nodes. However, little work has been done to provide a parallel formulation of the efficient sequential techniques based on KD-Trees. Such approaches are expected to have an irregular distribution of computation load and can suffer from load imbalance. This issue has so far limited the adoption of these efficient k-Means variants in parallel computing environments. In this work, we provide a parallel formulation of the KD-Tree based k-Means algorithm for distributed memory systems and address its load balancing issue. Three solutions have been developed and tested: two are based on a static partitioning of the data set, and the third incorporates a dynamic load balancing policy.

Central Archive at the University of Reading

Crossref
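The load-imbalance problem described above can be illustrated with a toy scheduling model. This is not one of the paper's three solutions: it assumes each task (for instance, a KD-Tree subtree) has a known, irregular cost. The static policy gives each worker a contiguous block with the same number of tasks; the dynamic policy lets whichever worker becomes idle first pull the next task from a shared queue. Function names and cost values are hypothetical.

```python
import heapq
from typing import List

def static_makespan(costs: List[float], workers: int) -> float:
    """Finish time when tasks are split into contiguous, equal-count blocks."""
    base, extra = divmod(len(costs), workers)
    loads, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        loads.append(sum(costs[start:start + size]))
        start += size
    return max(loads)

def dynamic_makespan(costs: List[float], workers: int) -> float:
    """Finish time when each idle worker pulls the next task from a shared queue."""
    finish = [0.0] * workers  # time at which each worker becomes idle
    heapq.heapify(finish)
    for c in costs:
        t = heapq.heappop(finish)      # the earliest-idle worker takes the task
        heapq.heappush(finish, t + c)
    return max(finish)

if __name__ == "__main__":
    costs = [9, 9, 9, 9, 1, 1, 1, 1]   # expensive tasks clustered at the front
    print(static_makespan(costs, 2))   # 36
    print(dynamic_makespan(costs, 2))  # 20.0
```

With these costs the static split leaves one worker with all the expensive tasks (finish time 36), while the dynamic policy reaches 20, the total work of 40 divided evenly over two workers. Real dynamic schemes pay communication overhead that this model ignores.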

Tradeoffs for nearest neighbors on the sphere

Author: Laarhoven Thijs
Publication venue
Publication date: 01/01/2015
Field of study

We consider tradeoffs between the query and update complexities for the (approximate) nearest neighbor problem on the sphere, extending the recent spherical filters to sparse regimes and generalizing the scheme and analysis to account for different tradeoffs. In a nutshell, for the sparse regime the tradeoff between the query complexity $n^{\rho_q}$ and update complexity $n^{\rho_u}$ for data sets of size $n$ is given by the following equation in terms of the approximation factor $c$ and the exponents $\rho_q$ and $\rho_u$:

$$c^2\sqrt{\rho_q}+(c^2-1)\sqrt{\rho_u}=\sqrt{2c^2-1}.$$
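As a numerical check of the tradeoff equation, the sketch below solves it for $\rho_u$ given $\rho_q$ and $c$. It only rearranges the stated equation and does not implement the spherical filters; the function names are hypothetical. Setting $\rho_q = \rho_u = \rho$ gives $(2c^2-1)\sqrt{\rho} = \sqrt{2c^2-1}$, i.e. $\rho = 1/(2c^2-1)$, the balanced exponent stated in the abstract.

```python
import math

def balanced_rho(c: float) -> float:
    """Exponent when rho_q == rho_u: (2c^2 - 1) sqrt(rho) = sqrt(2c^2 - 1)."""
    return 1.0 / (2 * c * c - 1)

def rho_u_from_rho_q(rho_q: float, c: float) -> float:
    """Solve c^2 sqrt(rho_q) + (c^2 - 1) sqrt(rho_u) = sqrt(2c^2 - 1) for rho_u."""
    if c <= 1:
        raise ValueError("approximation factor c must exceed 1")
    if rho_q < 0:
        raise ValueError("rho_q must be nonnegative")
    root = (math.sqrt(2 * c * c - 1) - c * c * math.sqrt(rho_q)) / (c * c - 1)
    if root < 0:
        raise ValueError("rho_q lies beyond the tradeoff curve (rho_u would be negative)")
    return root * root

if __name__ == "__main__":
    c = 2.0
    print(balanced_rho(c))           # approximately 1/7
    print(rho_u_from_rho_q(0.0, c))  # approximately 7/9: cheapest queries, costliest updates
```

For $c = 1 + \epsilon$ with small $\epsilon$, `rho_u_from_rho_q(0.0, c)` equals $(2c^2-1)/(c^2-1)^2 \approx 1/(4\epsilon^2)$, consistent with the space exponent quoted for subpolynomial query time.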

For small $c = 1+\epsilon$, minimizing the time for updates leads to a linear space complexity at the cost of a query time complexity $n^{1-4\epsilon^2}$. Balancing the query and update costs leads to optimal complexities $n^{1/(2c^2-1)}$, matching bounds from [Andoni-Razenshteyn, 2015] and [Dubiner, IEEE-TIT'10] and matching the asymptotic complexities of [Andoni-Razenshteyn, STOC'15] and [Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt, NIPS'15]. A subpolynomial query time complexity $n^{o(1)}$ can be achieved at the cost of a space complexity of the order $n^{1/(4\epsilon^2)}$, matching the bound $n^{\Omega(1/\epsilon^2)}$ of [Andoni-Indyk-Patrascu, FOCS'06] and [Panigrahy-Talwar-Wieder, FOCS'10] and improving upon results of [Indyk-Motwani, STOC'98] and [Kushilevitz-Ostrovsky-Rabani, STOC'98]. For large