ASKIT: Approximate Skeletonization Kernel-Independent Treecode in High Dimensions
We present a fast algorithm for kernel summation problems in high dimensions.
These problems appear in computational physics, numerical approximation,
non-parametric statistics, and machine learning. In our context, the sums
depend on a kernel function that is a pair potential defined on a dataset of
points in a high-dimensional Euclidean space. A direct evaluation of the sum
scales quadratically with the number of points. Fast kernel summation methods
can reduce this cost to linear complexity, but the constants involved do not
scale well with the dimensionality of the dataset.
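The quadratic cost of direct evaluation can be made concrete with a small sketch. This is our own toy illustration (the function name, sizes, and bandwidth are ours, not part of the paper): every target requires one kernel evaluation per source, giving O(NM) work for N targets and M sources.

```python
import numpy as np

def gaussian_kernel_sum(sources, targets, weights, h=1.0):
    """Direct evaluation of u(x_i) = sum_j K(x_i, y_j) w_j with the
    Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 h^2)).
    Cost is O(N * M): one kernel evaluation per (target, source) pair."""
    # Pairwise squared distances, shape (num_targets, num_sources)
    d2 = ((targets[:, None, :] - sources[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * h * h)) @ weights

rng = np.random.default_rng(0)
Y = rng.standard_normal((200, 8))   # 200 source points in 8 dimensions
w = rng.standard_normal(200)
u = gaussian_kernel_sum(Y, Y, w)    # self-sum: quadratic in the point count
```

Fast summation methods aim to reach the same sums to controllable accuracy without touching all N·M pairs.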
The main algorithmic components of fast kernel summation algorithms are the
separation of the kernel sum between near and far field (which is the basis for
pruning) and the efficient and accurate approximation of the far field.
We introduce novel methods for pruning and approximating the far field. Our
far field approximation requires only kernel evaluations and does not use
analytic expansions. Pruning is not done using bounding boxes but rather
combinatorially using a sparsified nearest-neighbor graph of the input. The
time complexity of our algorithm depends linearly on the ambient dimension. The
error in the algorithm depends on the low-rank approximability of the far
field, which in turn depends on the kernel function and on the intrinsic
dimensionality of the distribution of the points. The error of the far field
approximation does not depend on the ambient dimension.
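The claim that far-field approximability tracks intrinsic rather than ambient dimension can be checked numerically. The sketch below is our own illustration, not ASKIT's skeletonization: it embeds two well-separated clusters with intrinsic dimension 2 into a 100-dimensional ambient space, builds a far-field Gaussian kernel block from kernel evaluations only, and inspects its numerical rank.

```python
import numpy as np

# Points in a 100-dimensional ambient space whose distribution has
# intrinsic dimension 2: every point is z @ A for a 2-vector z.
rng = np.random.default_rng(1)
A = rng.standard_normal((2, 100))
targets = rng.standard_normal((200, 2)) @ A                   # cluster near the origin
sources = (rng.standard_normal((200, 2)) + [5.0, 0.0]) @ A    # well-separated cluster

# Far-field Gaussian kernel block, built from kernel evaluations only
h = 50.0
d2 = ((targets[:, None, :] - sources[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-d2 / (2.0 * h * h))

# Numerical rank at a 1e-6 relative tolerance stays well below 200,
# reflecting the low intrinsic dimension rather than the ambient one
s = np.linalg.svd(K, compute_uv=False)
rank = int((s > 1e-6 * s[0]).sum())
```

A low-rank factorization of such blocks is exactly what makes an accurate, expansion-free far-field approximation possible.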
We present the new algorithm along with experimental results that demonstrate
its performance. We report results for Gaussian kernel sums for 100 million
points in 64 dimensions, for one million points in 1000 dimensions, and for
problems in which the Gaussian kernel has a variable bandwidth. To the best of
our knowledge, all of these experiments are impossible or prohibitively
expensive with existing fast kernel summation methods.
Comment: 22 pages, 6 figures
KD-ART: Should we intensify or diversify tests to kill mutants?
CONTEXT:
Adaptive Random Testing (ART) spreads test cases evenly over the input domain. Yet once a fault is found, one must decide whether to diversify or intensify subsequent inputs. Diversification employs a wide range of tests to increase the chances of finding new faults. Intensification selects test inputs similar to those previously shown to be successful.
OBJECTIVE:
Explore the trade-off between diversification and intensification to kill mutants.
METHOD:
We augment Adaptive Random Testing (ART) to estimate the Kernel Density (KD–ART) of input values found to kill mutants. KD–ART was first proposed at the 10th International Workshop on Mutation Analysis. We now extend this work to handle real-world, non-numeric applications. Specifically, we incorporate a technique to support programs with input parameters that have composite data types (such as arrays and structs).
RESULTS:
Intensification is the most effective strategy for the numerical programs (it achieves 8.5% higher mutation score than ART). By contrast, diversification seems more effective for programs with composite inputs. KD–ART kills mutants 15.4 times faster than ART.
CONCLUSION:
Intensify tests for numerical types, but diversify them for composite types.
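The intensify/diversify trade-off can be sketched for a one-dimensional numeric domain. This is a simplified stand-in, not the paper's implementation: the class and parameter names are ours, and a single Gaussian perturbation of a stored killing input stands in for sampling the estimated kernel density.

```python
import random

class KDArtSketch:
    """Toy intensify/diversify input generator over the interval [lo, hi].
    Inputs observed to kill mutants seed a Gaussian kernel density;
    intensification samples near one of them, diversification samples
    uniformly over the whole domain."""

    def __init__(self, lo, hi, bandwidth=0.05, p_intensify=0.7, seed=0):
        self.lo, self.hi = lo, hi
        self.h = bandwidth * (hi - lo)   # kernel bandwidth as a domain fraction
        self.p = p_intensify             # probability of intensifying
        self.killers = []                # inputs observed to kill a mutant
        self.rng = random.Random(seed)

    def next_input(self):
        if self.killers and self.rng.random() < self.p:
            # Intensify: Gaussian perturbation of a known killing input
            center = self.rng.choice(self.killers)
            x = self.rng.gauss(center, self.h)
            return min(max(x, self.lo), self.hi)
        # Diversify: uniform over the whole input domain
        return self.rng.uniform(self.lo, self.hi)

    def record_kill(self, x):
        self.killers.append(x)

gen = KDArtSketch(0.0, 1.0)
gen.record_kill(0.5)                      # pretend input 0.5 killed a mutant
xs = [gen.next_input() for _ in range(1000)]
```

Raising `p_intensify` concentrates the generated inputs around known killing inputs (the numeric-type recommendation above); lowering it spreads them across the domain (the composite-type recommendation).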
Improved neural network generalization using channel-wise NNK graph constructions
State-of-the-art neural network architectures continue to scale in size and deliver impressive results on unseen data points at the expense of poor interpretability. In the deep layers of these models we often encounter very high-dimensional feature spaces, where constructing graphs from intermediate data representations can lead to the well-known curse of dimensionality. We propose a channel-wise graph construction method that works on lower-dimensional subspaces and provides a new channel-based perspective that leads to better interpretability of the data and of the relationships between channels. In addition, we introduce a novel generalization estimate based on the proposed graph construction method, with which we perform local polytope interpolation. We show its potential to replace the standard generalization estimate based on validation set performance, enabling progressive channel-wise early stopping without requiring a validation set.
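The channel-wise idea can be sketched as follows. This is our own simplified illustration: a plain k-nearest-neighbor rule stands in for the NNK (non-negative kernel regression) construction of the paper, and the function name and array layout are assumptions.

```python
import numpy as np

def channelwise_knn_graphs(features, k=3):
    """Build one k-NN graph per channel instead of a single graph in the
    full concatenated feature space. `features` has shape
    (num_points, num_channels, channel_dim); each channel's graph is built
    only on its own low-dimensional slice, sidestepping the curse of
    dimensionality of the full space. Returns one neighbor array per
    channel: graphs[c][i] holds the k nearest points to point i in
    channel c."""
    n, num_channels, _ = features.shape
    graphs = []
    for c in range(num_channels):
        X = features[:, c, :]                       # (n, channel_dim) subspace
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        np.fill_diagonal(d2, np.inf)                # exclude self-loops
        graphs.append(np.argsort(d2, axis=1)[:, :k])
    return graphs

rng = np.random.default_rng(0)
feats = rng.standard_normal((100, 4, 8))   # 100 points, 4 channels of dim 8
graphs = channelwise_knn_graphs(feats, k=3)
```

Each per-channel graph can then be inspected or scored independently, which is what makes channel-wise estimates such as progressive per-channel early stopping possible.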