
    ASKIT: Approximate Skeletonization Kernel-Independent Treecode in High Dimensions

    We present a fast algorithm for kernel summation problems in high dimensions. These problems appear in computational physics, numerical approximation, non-parametric statistics, and machine learning. In our context, the sums depend on a kernel function that is a pair potential defined on a dataset of points in a high-dimensional Euclidean space. A direct evaluation of the sum scales quadratically with the number of points. Fast kernel summation methods can reduce this cost to linear complexity, but the constants involved do not scale well with the dimensionality of the dataset. The main algorithmic components of fast kernel summation algorithms are the separation of the kernel sum between near and far field (which is the basis for pruning) and the efficient and accurate approximation of the far field. We introduce novel methods for pruning and approximating the far field. Our far-field approximation requires only kernel evaluations and does not use analytic expansions. Pruning is not done using bounding boxes but rather combinatorially, using a sparsified nearest-neighbor graph of the input. The time complexity of our algorithm depends linearly on the ambient dimension. The error in the algorithm depends on the low-rank approximability of the far field, which in turn depends on the kernel function and on the intrinsic dimensionality of the distribution of the points. The error of the far-field approximation does not depend on the ambient dimension. We present the new algorithm along with experimental results that demonstrate its performance. We report results for Gaussian kernel sums for 100 million points in 64 dimensions, for one million points in 1000 dimensions, and for problems in which the Gaussian kernel has a variable bandwidth. To the best of our knowledge, all of these experiments are impossible or prohibitively expensive with existing fast kernel summation methods. (Comment: 22 pages, 6 figures.)
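    To make the quadratic-cost baseline concrete, here is a minimal NumPy sketch of direct Gaussian kernel summation; the kernel choice, bandwidth h and array names are illustrative and not taken from the paper. ASKIT-style methods aim to approximate this sum without forming the full N x M kernel matrix.

        import numpy as np

        def direct_gaussian_kernel_sum(targets, sources, weights, h=1.0):
            """Evaluate u(x_i) = sum_j exp(-||x_i - y_j||^2 / (2 h^2)) w_j directly.

            Cost is O(N * M * d): quadratic in the number of points, linear in the
            ambient dimension d. Fast summation methods keep the linear dependence
            on d while breaking the quadratic dependence on the point count.
            """
            # Pairwise squared distances between targets (N x d) and sources (M x d).
            d2 = ((targets[:, None, :] - sources[None, :, :]) ** 2).sum(axis=-1)
            K = np.exp(-d2 / (2.0 * h ** 2))   # N x M Gaussian kernel matrix
            return K @ weights                 # N-vector of kernel sums

        # Tiny usage example with random data (illustrative sizes only).
        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 64))         # 500 points in 64 dimensions
        w = rng.normal(size=500)
        u = direct_gaussian_kernel_sum(X, X, w, h=2.0)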

    Clustering and Community Detection with Imbalanced Clusters

    Spectral clustering methods, which are frequently used in clustering and community detection applications, are sensitive to the specific graph construction, particularly when imbalanced clusters are present. We show that the ratio cut (RCut) and normalized cut (NCut) objectives are not tailored to imbalanced cluster sizes, since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum-cut partitions under minimum size constraints on partitions to deal with imbalanced cluster sizes. Our approach parameterizes a family of graphs by adaptively modulating node degrees on a fixed node set, yielding a set of parameter-dependent cuts reflecting varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach and demonstrate the superiority of our method through experiments on synthetic and real datasets for data clustering, semi-supervised learning and community detection. (Comment: Extended version of arXiv:1309.2303 with new applications. Accepted to IEEE TSIP.)
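    For context, the sketch below shows the standard normalized-Laplacian spectral clustering baseline on a k-nearest-neighbor graph, the kind of RCut/NCut pipeline the paper argues is not tailored to imbalanced clusters. The graph construction, parameter values and function names are illustrative; this is not the paper's own method, which instead sweeps a family of graphs with adaptively modulated node degrees.

        import numpy as np
        from scipy.sparse.csgraph import laplacian
        from sklearn.cluster import KMeans
        from sklearn.neighbors import kneighbors_graph

        def ncut_spectral_clustering(X, n_clusters=2, n_neighbors=10):
            """Baseline NCut-style spectral clustering on a symmetric k-NN graph."""
            # Symmetric k-nearest-neighbor affinity graph.
            A = kneighbors_graph(X, n_neighbors=n_neighbors, mode="connectivity")
            A = 0.5 * (A + A.T).toarray()
            # Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
            L = laplacian(A, normed=True)
            # Embed each node with the eigenvectors of the smallest eigenvalues.
            eigvals, eigvecs = np.linalg.eigh(L)
            embedding = eigvecs[:, :n_clusters]
            # Cluster the spectral embedding.
            return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embedding)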

    Man-made Surface Structures from Triangulated Point Clouds

    Photogrammetry aims at reconstructing the shape and dimensions of objects captured with cameras, 3D laser scanners or other spatial acquisition systems. While many acquisition techniques deliver triangulated point clouds with millions of vertices within seconds, the interpretation is usually left to the user. Especially when reconstructing man-made objects, one is interested in the underlying surface structure, which is not inherently present in the data. This includes the geometric shape of the object, e.g. cubical or cylindrical, as well as corresponding surface parameters, e.g. width, height and radius. Applications are manifold and range from industrial production control to architectural on-site measurements to large-scale city models. The goal of this thesis is to automatically derive such surface structures from triangulated 3D point clouds of man-made objects. They are defined as a compound of planar or curved geometric primitives. Model knowledge about typical primitives and relations between adjacent pairs of them should affect the reconstruction positively. After formulating a parametrized model for man-made surface structures, we develop a reconstruction framework with three processing steps: During a fast pre-segmentation exploiting local surface properties, we divide the given surface mesh into planar regions. Making use of a model selection scheme based on minimizing the description length, this surface segmentation is free of control parameters and automatically yields an optimal number of segments. A subsequent refinement introduces a set of planar or curved geometric primitives and hierarchically merges adjacent regions based on their joint description length. A global classification and constrained parameter estimation combines the data-driven segmentation with high-level model knowledge. To this end, we represent the surface structure with a graphical model and formulate factors based on the likelihood as well as prior knowledge about parameter distributions and class probabilities. We infer the most probable setting of surface and relation classes with belief propagation and estimate an optimal surface parametrization with constraints induced by inter-regional relations. The process is specifically designed to work on noisy data with outliers and a few exceptional freeform regions not describable with geometric primitives. It yields full 3D surface structures with watertightly connected surface primitives of different types. The performance of the proposed framework is experimentally evaluated on various data sets. On small synthetically generated meshes we analyze the accuracy of the estimated surface parameters, the sensitivity w.r.t. various properties of the input data and w.r.t. model assumptions, as well as the computational complexity. Additionally, we demonstrate the flexibility w.r.t. different acquisition techniques on real data sets. The proposed method turns out to be accurate, reasonably fast and only mildly sensitive to defects in the data or imprecise model assumptions.
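    As a hedged illustration of one ingredient of such a pipeline, the sketch below shows a description-length-based merge test for two adjacent planar regions: fit a plane by PCA and merge if a joint plane encodes both regions more cheaply than two separate planes. The noise level, parameter-coding cost and function names are invented for illustration; the thesis's actual model selection also handles curved primitives and freeform regions.

        import numpy as np

        def plane_fit_residuals(points):
            """Fit a plane to 3D points by PCA; return orthogonal residuals."""
            centroid = points.mean(axis=0)
            _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
            normal = vt[-1]                      # direction of least variance
            return (points - centroid) @ normal

        def description_length(points, sigma=0.005, n_params=3):
            """Two-part MDL score: parameter cost + residual coding cost (in nats)."""
            r = plane_fit_residuals(points)
            n = len(points)
            # Cost of encoding the free plane parameters at precision ~ 1/sqrt(n).
            param_cost = 0.5 * n_params * np.log(n)
            # Negative Gaussian log-likelihood of the residuals at noise level sigma.
            data_cost = 0.5 * np.sum(r ** 2) / sigma ** 2 \
                        + n * np.log(sigma * np.sqrt(2 * np.pi))
            return param_cost + data_cost

        def should_merge(region_a, region_b, sigma=0.005):
            """Merge two adjacent regions if a joint plane describes them more cheaply."""
            joint = np.vstack([region_a, region_b])
            return description_length(joint, sigma) < (
                description_length(region_a, sigma) + description_length(region_b, sigma))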

    KD-ART: Should we intensify or diversify tests to kill mutants?

    CONTEXT: Adaptive Random Testing (ART) spreads test cases evenly over the input domain. Yet once a fault is found, decisions must be made to diversify or intensify subsequent inputs. Diversification employs a wide range of tests to increase the chances of finding new faults. Intensification selects test inputs similar to those previously shown to be successful. OBJECTIVE: Explore the trade-off between diversification and intensification to kill mutants. METHOD: We augment Adaptive Random Testing (ART) to estimate the Kernel Density (KD-ART) of input values found to kill mutants. KD-ART was first proposed at the 10th International Workshop on Mutation Analysis. We now extend this work to handle real-world, non-numeric applications. Specifically, we incorporate a technique to support programs with input parameters that have composite data types (such as arrays and structs). RESULTS: Intensification is the most effective strategy for the numerical programs (it achieves an 8.5% higher mutation score than ART). By contrast, diversification seems more effective for programs with composite inputs. KD-ART kills mutants 15.4 times faster than ART. CONCLUSION: Intensify tests for numerical types, but diversify them for composite types.
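    A hedged Python sketch of the intensify/diversify decision for numeric inputs: intensification samples new inputs from a kernel density estimate fitted to previously mutant-killing inputs, while diversification samples uniformly over the input domain. Function and parameter names are illustrative assumptions, not the authors' implementation.

        import numpy as np
        from scipy.stats import gaussian_kde

        def next_test_inputs(killing_inputs, bounds, n=10, intensify=True, rng=None):
            """Pick the next test inputs given previously mutant-killing inputs.

            intensify=True  -> sample near past killing inputs via a kernel
                               density estimate fitted to them.
            intensify=False -> diversify by sampling uniformly over the domain.
            """
            rng = np.random.default_rng(rng)
            low, high = np.asarray(bounds[0]), np.asarray(bounds[1])
            if intensify and len(killing_inputs) > 1:
                kde = gaussian_kde(np.asarray(killing_inputs).T)  # one column per sample
                candidates = kde.resample(n).T
                return np.clip(candidates, low, high)             # keep inside the domain
            return rng.uniform(low, high, size=(n, len(low)))

        # Example: two numeric input parameters on [0, 100] x [0, 100].
        killers = [[12.0, 40.0], [13.5, 41.2], [11.8, 39.7]]
        batch = next_test_inputs(killers, bounds=([0, 0], [100, 100]), n=5, intensify=True)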

    Kernel Methods for Machine Learning with Life Science Applications


    3D mapping and path planning from range data

    This thesis reports research on mapping, terrain classification and path planning. These are classical problems in robotics, typically studied independently, and here we link them by framing them within a common sensing modality, that of three-dimensional laser range scanning. The ultimate goal is to deliver navigation paths for challenging mobile robotics scenarios. For this reason we also deliver safe traversable regions from a previously computed, globally consistent map. We first examine the problem of registering dense point clouds acquired at different instants in time. We contribute a novel range registration mechanism for pairs of 3D range scans using point-to-point and point-to-line correspondences in a hierarchical correspondence search strategy. For the minimization we adopt a metric that takes into account not only the distance between corresponding points, but also the orientation of their relative reference frames. We also propose FaMSA, a fast technique for multi-scan point cloud alignment that takes advantage of the asserted point correspondences during sequential scan matching, using the point match history to speed up the computation of new scan matches. To properly propagate the sensor noise and scan matching models, we employ first-order error propagation, and to correct the error accumulation from local data alignment, we consider the probabilistic alignment of 3D point clouds using a delayed-state Extended Information Filter (EIF). In this thesis we adapt the Pose SLAM algorithm to the case of 3D range mapping; Pose SLAM is the variant of SLAM in which only the robot trajectory is estimated and sensor data is used solely to produce relative constraints between robot poses. These dense mapping techniques are tested in several scenarios acquired with our 3D sensors, producing impressively rich 3D environment models. The computed maps are then processed to identify traversable regions and to plan navigation sequences. We present a pair of methods to attain high-level off-line classification of traversable areas, in which training data is acquired automatically from navigation sequences. Traversable features come from robot footprint samples collected during manual robot motion, allowing us to capture terrain constraints that are not easy to model. Using only some of the traversed areas as positive training samples, our algorithms are tested in real scenarios to find the rest of the traversable terrain, and are compared with a naive parametric approach and several variants of the Support Vector Machine. Finally, we contribute a path planner that guarantees reachability at a desired robot pose with significantly lower computation time than competing alternatives. To search for the best path, our planner incrementally builds a search tree using the A* algorithm. It includes a hybrid cost policy to expand the tree efficiently, combining random sampling from the continuous space of kinematically feasible motion commands with a cost-to-goal metric that also takes into account the vehicle's nonholonomic constraints. The planner also allows for node rewiring, and to speed up the node search, our method includes heuristics that penalize node expansion near obstacles and limit the number of explored nodes. The method keeps track of visited cells in the configuration space and disallows node expansion at those configurations in the first full iteration of the algorithm. We validate the proposed methods with extensive experiments in real scenarios from several very complex 3D outdoor environments, and compare them with other techniques such as the A*, RRT and RRT* algorithms.
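    For context, a minimal sketch of the classical point-to-point scan registration (ICP) baseline that such pipelines build on; the thesis itself adds point-to-line correspondences, a hierarchical correspondence search and first-order error propagation. Names and the fixed iteration count below are illustrative.

        import numpy as np
        from scipy.spatial import cKDTree

        def icp_point_to_point(source, target, iters=30):
            """Minimal point-to-point ICP: align `source` (N x 3) to `target` (M x 3)."""
            R, t = np.eye(3), np.zeros(3)
            tree = cKDTree(target)
            src = source.copy()
            for _ in range(iters):
                # 1. Nearest-neighbor correspondences.
                _, idx = tree.query(src)
                matched = target[idx]
                # 2. Closed-form rigid transform (Kabsch/SVD) between matched sets.
                mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
                H = (src - mu_s).T @ (matched - mu_t)
                U, _, Vt = np.linalg.svd(H)
                D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
                R_step = Vt.T @ D @ U.T
                t_step = mu_t - R_step @ mu_s
                # 3. Apply and accumulate the incremental transform.
                src = src @ R_step.T + t_step
                R, t = R_step @ R, R_step @ t + t_step
            return R, t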

    Improved neural network generalization using channel-wise NNK graph constructions

    State-of-the-art neural network architectures continue to scale in size and deliver impressive results on unseen data points at the expense of poor interpretability. In the deep layers of these models we often encounter very high dimensional feature spaces, where constructing graphs from intermediate data representations can lead to the well-known curse of dimensionality. We propose a channel-wise graph construction method that works on lower dimensional subspaces and provides a new channel-based perspective that leads to better interpretability of the data and the relationships between channels. In addition, we introduce a novel generalization estimate based on the proposed graph construction method with which we perform local polytope interpolation. We show its potential to replace the standard generalization estimate based on validation set performance to perform progressive channel-wise early stopping without requiring a validation set.
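    A hedged, simplified sketch of building one non-negative kernel-regression (NNK-style) neighborhood graph per channel: each node's neighbor weights are obtained by non-negative least squares on a local Gaussian kernel system. The kernel, hyper-parameters, data layout and function names are assumptions for illustration and may differ from the paper's construction.

        import numpy as np
        from scipy.optimize import nnls
        from sklearn.neighbors import NearestNeighbors

        def gaussian_kernel(X, Y, sigma=1.0):
            d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2 * sigma ** 2))

        def nnk_like_graph(features, k=10, sigma=1.0):
            """Sparse non-negative neighbor weights for one channel's features (N x d)."""
            n = len(features)
            nbrs = NearestNeighbors(n_neighbors=k + 1).fit(features)
            _, idx = nbrs.kneighbors(features)
            W = np.zeros((n, n))
            for i in range(n):
                S = idx[i, 1:]                       # k nearest neighbors, excluding i
                K_SS = gaussian_kernel(features[S], features[S], sigma)
                k_Si = gaussian_kernel(features[S], features[i:i + 1], sigma).ravel()
                # Solve min_{theta >= 0} theta^T K_SS theta - 2 k_Si^T theta
                # via a Cholesky factorization and non-negative least squares.
                L = np.linalg.cholesky(K_SS + 1e-8 * np.eye(len(S)))
                theta, _ = nnls(L.T, np.linalg.solve(L, k_Si))
                W[i, S] = theta
            return W

        def channel_wise_graphs(activations, k=10, sigma=1.0):
            """One graph per channel: activations has shape (N, channels, d_channel)."""
            return [nnk_like_graph(activations[:, c, :], k, sigma)
                    for c in range(activations.shape[1])]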

    Classification under input uncertainty with support vector machines

    Uncertainty can exist in any measurement of data describing the real world. Many machine learning approaches attempt to model any uncertainty in the form of additive noise on the target, which can be effective for simple models. However, for more complex models, and where a richer description of anisotropic uncertainty is available, these approaches can suffer. The principal focus of this thesis is the development of advanced classification approaches that can incorporate the known input uncertainties into support vector machines (SVMs), which can accommodate isotropic uncertain information in the classification. This new method is termed uncertainty support vector classification (USVC). Kernel functions can be used as well, through the derivation of a novel kernelisation formulation that generalises the proposed technique to non-linear models; the resulting optimisation problem is a second-order cone program (SOCP) with a unique solution. Based on statistical models of the input uncertainty, Bi and Zhang (2005) developed total support vector classification (TSVC), which has a similar geometric interpretation and optimisation formulation to USVC, but chooses much lower probabilities than USVC that the corresponding original inputs will be correctly classified by the optimal solution. Adaptive uncertainty support vector classification (AUSVC) is then developed based on the combination of TSVC and USVC, in which the probabilities of the original inputs being correctly classified are adaptively adjusted in accordance with the corresponding uncertain inputs. Inheriting the advantages of AUSVC and the minimax probability machine (MPM), minimax probability support vector classification (MPSVC) is developed to maximise the probabilities of the original inputs being correctly classified. Statistical tests are used to evaluate the experimental results of the different approaches. Experiments illustrate that AUSVC and MPSVC are suitable for classifying the observed uncertain inputs and recovering the true target function, respectively, since the contamination is normally unknown to the learner.
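    The thesis formulates USVC as a second-order cone program. As a hedged illustration only (not the thesis's exact formulation), the sketch below writes the standard chance-constrained linear SVM with per-input Gaussian uncertainty as an SOCP in cvxpy, where kappa controls how strongly each input's covariance inflates the required margin. All names and parameters are illustrative.

        import cvxpy as cp
        import numpy as np

        def uncertainty_svm(X, y, covs, C=1.0, kappa=1.0):
            """Linear SVM with per-input Gaussian uncertainty, solved as an SOCP.

            X: (n, d) nominal inputs, y: (n,) labels in {-1, +1},
            covs: list of (d, d) input covariance matrices,
            kappa: margin inflation controlling the required classification probability.
            """
            n, d = X.shape
            w, b, xi = cp.Variable(d), cp.Variable(), cp.Variable(n, nonneg=True)
            constraints = []
            for i in range(n):
                S = np.linalg.cholesky(covs[i] + 1e-9 * np.eye(d))  # Sigma_i^{1/2} factor
                # Robust margin: y_i (w.x_i + b) >= 1 - xi_i + kappa * ||S^T w||_2
                constraints.append(
                    kappa * cp.norm(S.T @ w, 2) <= y[i] * (X[i] @ w + b) - 1 + xi[i])
            objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
            cp.Problem(objective, constraints).solve()
            return w.value, b.value

        # Tiny usage example on synthetic 2D data with identical isotropic covariances.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(40, 2))
        y = np.sign(X[:, 0] + 0.3 * rng.normal(size=40))
        covs = [0.05 * np.eye(2) for _ in range(40)]
        w, b = uncertainty_svm(X, y, covs)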