
    Distributional Feature Mapping in Data Classification

    The performance of a machine learning algorithm depends on the representation of the input data. In computer vision, histogram-based feature representations have significantly improved classification tasks. L1-normalized histograms can be modelled by Dirichlet and related distributions to transform the input space into a feature space. We propose a mapping technique that incorporates prior knowledge about the distribution of the data and increases the discriminative power of supervised classifiers such as the Support Vector Machine (SVM). The mapping technique for proportional data, based on the Dirichlet, generalized Dirichlet, Beta-Liouville, scaled Dirichlet and shifted scaled Dirichlet distributions, can be combined with traditional kernels to improve the base kernels' accuracy. Experimental results show that the proposed technique for proportional data increases accuracy on machine vision tasks such as natural scene recognition, satellite image classification, gender classification, facial expression recognition and human action recognition in videos. In addition, in object tracking, learning parametric features of the target object using Dirichlet and related distributions may help to capture representations invariant to noise; this further motivated our study of such distributions in object tracking. We propose a framework for feature representation on the probability simplex for proportional data, utilizing the histogram representation of the target object at the initial frame. A set of parameter vectors then determines the appearance features of the target object in subsequent frames. Motivated by the success of distribution-based feature mapping for proportional data, we extend this technique to semi-bounded data using the inverted Dirichlet, generalized inverted Dirichlet and inverted Beta-Liouville distributions. A similar approach is taken for count data, where the Dirichlet-multinomial and generalized Dirichlet-multinomial distributions are used to map density features onto input features.
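    As a hedged illustration of the general idea only (not the authors' exact mapping): the sketch below assumes per-class Dirichlet parameters have already been fitted, clips histograms onto the interior of the simplex, and appends per-class log-densities as distributional features before a standard SVM kernel.

```python
import numpy as np
from scipy.stats import dirichlet

def dirichlet_feature_map(histograms, alphas, eps=1e-6):
    """Map L1-normalized histograms to per-class Dirichlet log-densities.

    histograms: (n, d) rows summing to 1; alphas: iterable of (d,)
    Dirichlet parameter vectors, one per class (assumed fitted
    beforehand, e.g. by maximum likelihood).
    """
    X = np.clip(histograms, eps, None)
    X = X / X.sum(axis=1, keepdims=True)   # keep rows strictly inside the simplex
    # One log-density feature per class-specific Dirichlet model.
    return np.column_stack([
        [dirichlet.logpdf(x, a) for x in X] for a in alphas
    ])

# Toy usage: augment raw histograms with distributional features,
# then feed the result to any standard kernel classifier.
rng = np.random.default_rng(0)
H = rng.dirichlet(np.ones(8), size=100)                # toy histograms
alphas = [np.full(8, 2.0), np.linspace(1.0, 4.0, 8)]   # assumed per-class fits
Phi = np.hstack([H, dirichlet_feature_map(H, alphas)])
```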

    Bin ratio-based histogram distances and their application to image classification

    Large variations in image background may cause partial matching and normalization problems for histogram-based representations, i.e., the histograms of the same category may have bins which are significantly different, and normalization may produce large changes in the differences between corresponding bins. In this paper, we deal with this problem by using the ratios between bin values of histograms, rather than the differences between bin values used in traditional histogram distances. We propose a bin ratio-based histogram distance (BRD), which is an intra-cross-bin distance, in contrast with previous bin-to-bin distances and cross-bin distances. The BRD is robust to partial matching and histogram normalization, and captures correlations between bins with only linear computational complexity. We combine the BRD with the ℓ1 histogram distance and the χ2 histogram distance to generate the ℓ1 BRD and the χ2 BRD, respectively. These combinations exploit and benefit from the robustness of the BRD under partial matching and the robustness of the ℓ1 and χ2 distances to small noise. We also propose a method for assessing the robustness of histogram distances to partial matching. The BRDs and logistic regression-based histogram fusion are applied to image classification. Experimental results on synthetic data sets show the robustness of the BRDs to partial matching, and experiments on seven benchmark data sets demonstrate the promise of the BRDs for image classification.
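    The paper defines the BRD precisely; as a loose illustration of the bin-ratio intuition only (an assumption, not the paper's formula), one can compare pairwise log-ratios of bins, which are invariant to globally rescaling a histogram, and blend the result with an ℓ1 term. Note the naive pairwise form below is O(d²), whereas the paper's BRD is reduced to linear complexity algebraically.

```python
import numpy as np

def log_ratio_distance(p, q, eps=1e-6):
    """Illustrative cross-ratio distance: compare log(p_i/p_j) with
    log(q_i/q_j) over all bin pairs.

    Ratios are unchanged by globally rescaling a histogram, which is
    the intuition behind robustness to normalization; this is a
    sketch, not the paper's exact BRD.
    """
    lp = np.log(np.asarray(p, float) + eps)
    lq = np.log(np.asarray(q, float) + eps)
    Rp = lp[:, None] - lp[None, :]   # log bin ratios of p
    Rq = lq[:, None] - lq[None, :]
    return np.abs(Rp - Rq).sum()

def l1_ratio_distance(p, q, lam=0.5):
    """Blend the ratio term with a plain l1 term (lam is an assumed weight)."""
    l1 = np.abs(np.asarray(p, float) - np.asarray(q, float)).sum()
    return lam * log_ratio_distance(p, q) + (1.0 - lam) * l1
```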

    Geometric deep learning: going beyond Euclidean data

    Many scientific fields study data with an underlying structure that is a non-Euclidean space. Some examples include social networks in computational social sciences, sensor networks in communications, functional networks in brain imaging, regulatory networks in genetics, and meshed surfaces in computer graphics. In many applications, such geometric data are large and complex (in the case of social networks, on the scale of billions), and are natural targets for machine learning techniques. In particular, we would like to use deep neural networks, which have recently proven to be powerful tools for a broad range of problems in computer vision, natural language processing, and audio analysis. However, these tools have been most successful on data with an underlying Euclidean or grid-like structure, and in cases where the invariances of these structures are built into the networks used to model them. Geometric deep learning is an umbrella term for emerging techniques attempting to generalize (structured) deep neural models to non-Euclidean domains such as graphs and manifolds. The purpose of this paper is to provide an overview of different examples of geometric deep learning problems and to present available solutions, key difficulties, applications, and future research directions in this nascent field.
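    One widely known concrete instance of such a generalization (a standard GCN-style layer, not any specific model from this survey) replaces grid convolution with normalized neighbor aggregation on a graph:

```python
import numpy as np

def gcn_layer(A, X, W, activation=np.tanh):
    """One graph-convolution layer: H = act(D^-1/2 (A+I) D^-1/2 X W).

    A: (n, n) adjacency matrix; X: (n, f_in) node features;
    W: (f_in, f_out) learnable weights. Symmetric normalization keeps
    the aggregation well-scaled regardless of node degree.
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return activation(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W)

# Toy usage on a 4-node path graph with 3-dim node features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], float)
rng = np.random.default_rng(1)
H = gcn_layer(A, rng.normal(size=(4, 3)), rng.normal(size=(3, 2)))
```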

    3D Robotic Sensing of People: Human Perception, Representation and Activity Recognition

    The robots are coming. Their presence will eventually bridge the digital-physical divide and dramatically impact human life by taking over tasks where our current society has shortcomings (e.g., search and rescue, elderly care, and child education). Human-centered robotics (HCR) is a vision of how robots can coexist with humans and help people live safer, simpler and more independent lives. As humans, we have a remarkable ability to perceive the world around us, perceive people, and interpret their behaviors. Endowing robots with these critical capabilities in highly dynamic human social environments is a significant yet very challenging problem for practical human-centered robotics applications. This research focuses on robotic sensing of people: how robots can perceive and represent humans and understand their behaviors, primarily through 3D robotic vision. In this dissertation, I begin with a broad perspective on human-centered robotics by discussing its real-world applications and significant challenges. I then introduce a real-time perception system, based on the concept of Depth of Interest, to detect and track multiple individuals using a color-depth camera installed on moving robotic platforms. In addition, I discuss human representation approaches based on local spatio-temporal features, including new “CoDe4D” features that incorporate both color and depth information, a new “SOD” descriptor to efficiently quantize 3D visual features, and the novel AdHuC features, which are capable of representing the activities of multiple individuals. Several new algorithms to recognize human activities are also discussed, including the RG-PLSA model, which allows us to discover activity patterns without supervision; the MC-HCRF model, which can explicitly investigate certainty in latent temporal patterns; and the FuzzySR model, which is used to segment continuous data into events and probabilistically recognize human activities. Cognition models based on the recognition results are also implemented for decision making, allowing robotic systems to react to human activities. Finally, I conclude with a discussion of future directions that will accelerate the upcoming technological revolution of human-centered robotics.
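    The abstract only names the Depth of Interest concept; as a rough, assumed illustration of depth-gated detection (not the dissertation's algorithm), one can mask a depth image to a band of interest and keep large connected blobs as person candidates:

```python
import numpy as np
from scipy import ndimage

def depth_band_candidates(depth, near, far, min_pixels=500):
    """Segment blobs whose depth falls inside a band of interest.

    depth: (h, w) array in meters; near/far bound the band; blobs
    smaller than min_pixels are discarded. A loose sketch of
    depth-gated detection, not the dissertation's Depth-of-Interest
    method.
    """
    mask = (depth > near) & (depth < far)
    labels, n = ndimage.label(mask)           # connected components
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    keep = [i + 1 for i, s in enumerate(sizes) if s >= min_pixels]
    return labels, keep
```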

    Learning in the Real World: Constraints on Cost, Space, and Privacy

    The sheer demand for machine learning in fields as varied as healthcare, web-search ranking, factory automation, collision prediction, spam filtering, and many others frequently outpaces the intended use cases of machine learning models. In fact, a growing number of companies hire machine learning researchers to rectify this very problem: to tailor and/or design new state-of-the-art models for the setting at hand. However, we can group a large set of the machine learning problems encountered in practical settings into three categories: cost, space, and privacy. The first category (cost) covers problems that need to balance the accuracy of a machine learning model against the cost required to evaluate it. These include problems in web search, where results need to be delivered to a user in under a second while being as accurate as possible. The second category (space) collects problems that require running machine learning algorithms on low-memory computing devices. For instance, in search-and-rescue operations we may opt to use many small unmanned aerial vehicles (UAVs) equipped with machine learning algorithms for object detection to find a desired search target. These algorithms should be small enough to fit within the physical memory limits of the UAV (and be energy efficient) while reliably detecting objects. The third category (privacy) considers problems where one wishes to run machine learning algorithms on sensitive data. It has been shown that seemingly innocuous analyses on such data can be exploited to reveal information individuals would prefer to keep private. Thus, nearly any algorithm that runs on patient or economic data falls under this set of problems. We devise solutions for each of these problem categories, including (i) a fast tree-based model for explicitly trading off accuracy and model evaluation time, (ii) a compression method for the k-nearest neighbor classifier, and (iii) a private causal inference algorithm that protects sensitive data.
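    For the space category, a classical stand-in for k-nearest-neighbor compression (Hart's condensed nearest neighbor, plainly not the thesis's own method) keeps only a subset of training points that still classifies the full set correctly:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def condensed_prototypes(X, y, seed=0):
    """Hart's condensed nearest neighbor: retain a subset of (X, y)
    that still 1-NN-classifies every training point correctly.

    X: (n, d) array, y: (n,) labels. A classical prototype-selection
    baseline, used here only to illustrate k-NN compression.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    keep = [int(order[0])]
    changed = True
    while changed:                      # sweep until no point is added
        changed = False
        for i in order:
            nn = KNeighborsClassifier(n_neighbors=1).fit(X[keep], y[keep])
            if nn.predict(X[i:i + 1])[0] != y[i]:
                keep.append(int(i))     # misclassified: promote to prototype
                changed = True
    return np.array(keep)
```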

    Structured Machine Learning for Robotics

    Machine learning has become the essential tool for automating tasks that consist of predicting the output associated with a given input. However, many modern algorithms are mainly developed for the simple cases of classification and regression. Structured prediction is the field concerned with predicting outputs consisting of complex objects such as graphs, orientations or sequences. While these objects are often of practical interest, they lack many of the mathematical properties that allow principled and computationally feasible algorithms to be designed with traditional techniques. In this thesis we investigate and develop algorithms for learning manifold-valued functions in the context of structured prediction. Differentiable manifolds are a mathematical abstraction used in many domains to describe sets with continuous constraints and non-Euclidean geometric properties. By taking a structured prediction approach, we show how to define statistically consistent estimators for predicting elements of a manifold, in contrast to traditional structured prediction algorithms, which are restricted to output sets with finite cardinality. We introduce a wide range of applications that leverage manifold structure. Above all, we study the case of the hyperbolic manifold, a space suited for representing hierarchical data. By representing supervised datasets within hyperbolic space, we show how it is possible to invent new concepts in a previously known hierarchy, and we show promising results in hierarchical classification. We also study how modern structured approaches can help with practical robotics tasks, either improving performance in behavioural pipelines or yielding more robust predictions for constrained tasks. Specifically, we show how structured prediction can be used to tackle the inverse kinematics problem of redundant robots, accounting for the constraints of the robotic joints. We also consider the task of biological motion detection and show that, by leveraging the sequence structure of video streams, we significantly reduce the latency of the application. Our studies are complemented by empirical evaluations on both synthetic and real data.
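    For intuition about the hyperbolic case, the Poincaré-ball geodesic distance is a standard formula; distances blow up toward the boundary, which is what makes the space suited to hierarchies. The toy embedding points below are assumptions for illustration only.

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance in the Poincare ball (points with ||x|| < 1):

    d(u, v) = arcosh(1 + 2||u - v||^2 / ((1 - ||u||^2)(1 - ||v||^2)))
    """
    u, v = np.asarray(u, float), np.asarray(v, float)
    num = 2.0 * np.sum((u - v) ** 2)
    den = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + num / max(den, eps))

# Points near the boundary are far apart even when Euclidean-close,
# mimicking how leaves of a tree are far from each other but all
# reachable through the root.
root, leaf_a, leaf_b = [0.0, 0.0], [0.90, 0.0], [0.90, 0.05]
print(poincare_distance(root, leaf_a), poincare_distance(leaf_a, leaf_b))
```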

    Local learning by partitioning

    In many machine learning applications, data is assumed to be locally simple: examples near each other have similar characteristics, such as class labels or regression responses. Our goal is to exploit this assumption to construct locally simple yet globally complex systems that improve performance or reduce the cost of common machine learning tasks. To this end, we address three main problems: discovering and separating local non-linear structure in high-dimensional data, learning low-complexity local systems to improve the performance of risk-based learning tasks, and exploiting local similarity to reduce the test-time cost of learning algorithms. First, we develop a structure-based similarity metric, where low-dimensional non-linear structure is captured by solving a non-linear, low-rank representation problem. We show that this problem can be kernelized, has a closed-form solution, naturally separates independent manifolds, and is robust to noise. Experimental results indicate that incorporating this structural similarity into well-studied problems such as clustering, anomaly detection, and classification improves performance. Next, we address the problem of local learning, where a partitioning function divides the feature space into regions in which independent functions are applied. We focus on local linear classification using linear partitioning and local decision functions. Under an alternating minimization scheme, learning the partitioning functions can be reduced to solving a weighted supervised learning problem. We then present a novel reformulation that yields a globally convex surrogate, allowing for efficient, joint training of the partitioning functions and local classifiers. We then examine the problem of learning under test-time budgets, where acquiring sensors (features) for each example at test time has a cost. Our goal is to partition the space into regions, with only a small subset of sensors needed in each region, reducing the average number of sensors required per example. Starting with a cascade structure and expanding to binary trees, we formulate this problem as an empirical risk minimization and construct an upper-bounding surrogate that allows sequential decision functions to be trained jointly by solving a linear program. Finally, we present preliminary work extending the notion of test-time budgets to the problem of adaptive privacy.
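    A bare-bones sketch of the local-learning loop described above, with a k-means initialization and a margin-based reassignment rule that are simplifying assumptions (the paper's contribution is a jointly convex surrogate, not this heuristic):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def local_linear_fit(X, y, k=3, iters=5, seed=0):
    """Alternating scheme for binary y in {0, 1}: assign points to
    regions, fit one linear model per region, reassign each point to
    the region whose model gives its true label the largest margin.

    A simplified sketch; assumes the initial clusters mix both classes.
    """
    region = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
    models = [None] * k
    sign = np.where(y == 1, 1.0, -1.0)
    for _ in range(iters):
        for r in range(k):
            idx = region == r
            if idx.sum() >= 2 and np.unique(y[idx]).size == 2:
                models[r] = LogisticRegression().fit(X[idx], y[idx])
        fitted = [r for r in range(k) if models[r] is not None]
        # Signed margin of each point's true label under each region model.
        margins = np.stack(
            [models[r].decision_function(X) for r in fitted], axis=1
        ) * sign[:, None]
        region = np.asarray(fitted)[margins.argmax(axis=1)]
    return models, region
```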

    Nonlinear hyperspectral unmixing: strategies for nonlinear mixture detection, endmember estimation and band-selection

    Tese (doutorado) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia Elétrica, Florianópolis, 2016.
    Abstract: Mixing phenomena in hyperspectral images depend on a variety of factors, such as the resolution of observation devices, the properties of materials, and how these materials interact with incident light in the scene. Different parametric and nonparametric models have been considered to address hyperspectral unmixing problems. The simplest one is the linear mixing model. Nevertheless, it has been recognized that mixing phenomena can also be nonlinear. Kernel-based nonlinear mixing models have been applied to unmix spectral information of hyperspectral images when the type of mixing occurring in the scene is too complex or unknown. However, the corresponding nonlinear analysis techniques are necessarily more challenging and complex than those employed for linear unmixing. Within this context, it makes sense to search for different strategies to produce simpler and/or more accurate results. In this thesis, we tackle three distinct parts of the complete spectral unmixing (SU) problem. First, we propose a technique for detecting nonlinearly mixed pixels. The detection approach is based on the comparison of the reconstruction errors of a Gaussian process regression model and a linear regression model. The two errors are combined into a detection test statistic for which a probability density function can be reasonably approximated. Second, we propose an iterative endmember extraction algorithm to be employed in combination with the detection algorithm. The proposed detect-then-unmix strategy, which consists of extracting endmembers, detecting nonlinearly mixed pixels and unmixing, is tested with synthetic and real images. Finally, we propose two methods for band selection (BS) in the reproducing kernel Hilbert space (RKHS), which lead to a significant reduction of the processing time required by nonlinear unmixing techniques. The first method employs the kernel k-means (KKM) algorithm to find clusters in the RKHS; each cluster centroid is then associated with the closest mapped spectral vector. The second method is centralized and based on the coherence criterion, which sets the largest value allowed for correlations between the basis kernel functions characterizing the unmixing model. We show that the proposed BS approach is equivalent to solving a maximum clique problem (MCP), that is, to searching for the largest complete subgraph in a graph. Furthermore, we devise a strategy for selecting the coherence threshold and the Gaussian kernel bandwidth using coherence bounds for linearly independent bases. Simulation results illustrate the efficiency of the proposed method.
    A hyperspectral image (HI) is an image in which each pixel contains hundreds (or even thousands) of narrow, contiguous bands sampled over a wide range of the electromagnetic spectrum. Hyperspectral sensors usually trade spatial resolution for spectral resolution, mainly owing to factors such as the distance between the instrument and the target scene, and to historical limits on processing, transmission and storage capacity, which are becoming less and less problematic. This type of image finds broad use in a range of applications in astronomy, agriculture, biomedical imaging, geosciences, physics, surveillance and remote sensing.
    The usual low spatial resolution of spectral sensors implies that what is observed at each pixel is typically a mixture of the spectral signatures of the materials present in the corresponding scene (usually called endmembers). A pixel in a hyperspectral image is therefore no longer characterized by a tone or color but by the spectral signature of the material, or materials, found in the analyzed region. The simplest and most widely used model in hyperspectral imaging applications is the linear model, in which the observed pixel is modeled as a linear combination of the endmembers. However, strong evidence of multiple reflections of solar radiation and/or intimately mixed materials, i.e., materials mixed at the microscopic level, has led to several nonlinear models, notably bilinear models, post-nonlinear models, intimate mixing models and nonparametric models. This defines the spectral unmixing (SU) problem, which consists of determining the spectral signatures of the pure endmembers present in a scene and their proportions (called abundances) at each pixel of the image. SU is an inverse problem, blind by nature, since reliable information about the number of endmembers, their spectral signatures and their distribution in a given scene is rarely available. The problem is strongly connected to blind source separation, but differs in that source independence cannot be assumed in SU, since the abundances are in fact proportions and therefore dependent (abundances are positive and must sum to 1). Determining the endmembers is known as endmember extraction, and the literature offers a range of algorithms for this purpose; these algorithms typically exploit the convex geometry induced by the linear model and the constraints on the abundances. When the endmembers are assumed known, or estimated in a previous step, SU becomes a supervised problem, with input (endmembers) and output (pixels) pairs, reducing to an inversion, or regression, step that determines the proportions of the endmembers at each pixel. When nonlinear models are considered, the literature offers several techniques that can be employed depending on the availability of information about the endmembers and about the models governing the interaction between light and materials in a given scene. However, information about the type of mixing present in real scenes is rarely available. In this context, kernelized methods, which assume nonparametric models, have been particularly successful when applied to SU. Notable among them is SK-Hype, which employs least-squares support vector machine (LS-SVM) theory in an approach that considers a linear model with a nonlinear fluctuation represented by a function belonging to a reproducing kernel Hilbert space (RKHS). This doctoral thesis addresses several problems within the overall nonlinear hyperspectral SU process, with contributions to the detection of nonlinear mixtures, to endmember estimation when a considerable part of the image contains nonlinear mixtures, and to band selection in the RKHS.
    All methods were tested through simulations with synthetic and real data, considering both supervised and unsupervised unmixing. In Chapter 4, a semi-parametric method for detecting nonlinear mixtures in hyperspectral images is presented. The detector compares the performance of two models: a parametric linear one, fitted by least squares (LS), and a nonparametric nonlinear one based on Gaussian processes. The idea of using nonparametric models stems from the fact that, in practice, little is known about the true nature of the nonlinearity present in the scene. The fitting errors of the two models are then compared in a test statistic whose distribution under the linear-mixing hypothesis can be approximated, allowing a detection threshold to be estimated for a given false-alarm probability. The performance of the proposed detector was studied for supervised and unsupervised problems, showing that the improvement in SU performance obtained with the detector is statistically consistent. In addition, a degree of nonlinearity based on the relative energies of the linear and nonlinear contributions to the mixing process was defined to quantify the importance of the linear and nonlinear parts of the models. This definition is important for correctly assessing the relative performance of different nonlinear-mixture detection strategies. In Chapter 5, an iterative algorithm is proposed for endmember estimation as a pre-processing step for unsupervised SU problems. The algorithm interleaves nonlinear-mixture detection and endmember estimation: each endmember-estimation step is followed by a detection step in which a fraction of the most nonlinear pixels is discarded. The process is repeated for a maximum number of iterations or until a stopping criterion is met. It is shown that combining the proposed detector with an endmember-estimation algorithm leads to better SU results than state-of-the-art solutions; simulations under different scenarios corroborate these conclusions. In Chapter 6, two methods for nonlinear SU of hyperspectral images that perform band selection (BS) directly in the RKHS are presented. The first uses the kernel k-means (KKM) algorithm to find clusters directly in the RKHS, where each centroid is then associated with the closest spectral vector. The second is centralized and based on the coherence criterion, which incorporates a measure of dictionary quality in the RKHS for nonlinear SU; this centralized approach is equivalent to solving a maximum clique problem (MCP). Unlike competing methods that do not include an efficient choice of model parameters, the proposed method requires only an initial estimate of the number of selected bands. Simulation results on both synthetic and real data illustrate the quality of the unmixing results obtained with the proposed BS methods. Using SK-Hype with a reduced number of bands yields abundance estimates as accurate as those obtained with SK-Hype on the full available spectrum, at a small fraction of the computational cost.
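    A rough sketch of the Chapter 4 detection idea under assumed model choices (sklearn regressors and an RBF-plus-noise kernel are illustrative assumptions): reconstruct each pixel spectrum from the endmember signatures with both a least-squares and a Gaussian-process fit, and use the error gap as the test statistic.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def nonlinearity_statistic(r, M):
    """Compare linear vs Gaussian-process reconstruction of a pixel.

    r: (L,) pixel reflectances; M: (L, R) endmember signatures, with
    bands as samples and endmembers as inputs. Returns the difference
    of squared reconstruction errors; a large value suggests the
    linear mixing model is inadequate for this pixel.
    """
    a, *_ = np.linalg.lstsq(M, r, rcond=None)       # linear mixing fit
    e_lin = np.sum((r - M @ a) ** 2)
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(M, r)
    e_gp = np.sum((r - gp.predict(M)) ** 2)
    return e_lin - e_gp

# Pixels whose statistic exceeds a threshold (chosen for a desired
# false-alarm rate under the linear hypothesis) are flagged nonlinear.
```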

    Apprentissage statistique pour la personnalisation de modèles cardiaques à partir de données d’imagerie (Statistical learning for the personalization of cardiac models from imaging data)

    This thesis focuses on the calibration of an electromechanical model of the heart from patient-specific, image-based data, and on the related task of extracting cardiac motion from 4D images. Long-term perspectives for personalized computer simulation of the cardiac function include aid to diagnosis, aid to therapy planning, and risk prevention. To this end, we explore the tools and possibilities offered by statistical learning. To personalize cardiac mechanics, we introduce an efficient framework coupling machine learning and an original statistical representation of shape and motion based on 3D+t currents. The method relies on a reduced mapping between the space of mechanical parameters and the space of cardiac motion. The second focus of the thesis is cardiac motion tracking, a key processing step in the calibration pipeline, with an emphasis on the quantification of uncertainty. We develop a generic sparse Bayesian model of image registration with three main contributions: an extended image similarity term, the automated tuning of registration parameters, and uncertainty quantification. We propose an approximate inference scheme that is tractable on 4D clinical data. Finally, we wish to evaluate the quality of the uncertainty estimates returned by the approximate inference scheme. We compare the predictions of the approximate scheme with those of an inference scheme developed on the grounds of reversible-jump MCMC. We provide more insight into the theoretical properties of the sparse structured Bayesian model and into the empirical behaviour of both inference schemes.
    This thesis addresses the calibration of an electromechanical heart model personalized from 3D+t medical imaging data, and, upstream of it, the tracking of cardiac motion. To this end, we adopt a methodology founded on statistical learning. For the calibration of the mechanical model, we introduce an efficient method combining machine learning with an original statistical description of cardiac motion using the 3D+t currents representation. Our approach relies on the construction of a reduced statistical model linking the space of mechanical parameters to that of cardiac motion. Extracting motion from medical images with uncertainty quantification proves essential for this calibration and is the subject of the second part of this thesis. More generally, we develop a sparse Bayesian model for the medical image registration problem. Our contribution is threefold, covering an extended model of inter-image similarity, the automatic adjustment of registration parameters, and uncertainty quantification. We propose a fast greedy inference technique applicable to 4D clinical data. Finally, we take a closer look at the quality of the uncertainty estimates provided by the model. We compare the predictions of the greedy inference scheme with those given by an inference procedure faithful to the model, which we develop on the basis of MCMC techniques. We provide deeper insight into the theoretical and empirical properties of the sparse Bayesian model and of the two inference schemes.
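    A minimal sketch of a reduced parameter-to-motion calibration map in the spirit described above; the stand-in simulator, the PCA dimension, and the kernel-ridge regressor are all assumptions for illustration, not the thesis's pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.kernel_ridge import KernelRidge

# Simulate motions for sampled mechanical parameters, compress the
# motion descriptors with PCA, and learn the inverse map from reduced
# motion coordinates back to the parameters.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 1, size=(200, 3))             # mechanical parameters
motion = np.tanh(theta @ rng.normal(size=(3, 50)))   # stand-in simulator output

pca = PCA(n_components=5).fit(motion)
Z = pca.transform(motion)                            # reduced motion space
inverse_map = KernelRidge(kernel="rbf", alpha=1e-3).fit(Z, theta)

# Personalization: project an observed (tracked) motion into the
# reduced space and read off the calibrated parameters.
theta_hat = inverse_map.predict(pca.transform(motion[:1]))
```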

    Shape Representations Using Nested Descriptors

    The problem of shape representation is a core problem in computer vision. It can be argued that shape representation is the most central representational problem for computer vision since, unlike texture or color, shape alone can be used for perceptual tasks such as image matching, object detection and object categorization. This dissertation introduces a new shape representation called the nested descriptor. A nested descriptor represents shape both globally and locally by pooling salient scaled and oriented complex gradients in a large nested support set. We show that this nesting property introduces a nested correlation structure that enables a new local distance function called the nesting distance, which provides a provably robust similarity function for image matching. Furthermore, the nesting property suggests an elegant, flower-like normalization strategy called a log-spiral difference. We show that this normalization enables a compact binary representation and is equivalent to a form of bottom-up saliency. This suggests that the nested descriptor's representational power comes from representing salient edges, which makes a fundamental connection between the saliency and local feature descriptor literatures. In this dissertation, we introduce three examples of shape representation using nested descriptors: nested shape descriptors for imagery, nested motion descriptors for video, and nested pooling for activities. We show evaluation results for these representations that demonstrate state-of-the-art performance on image matching, wide-baseline stereo and activity recognition tasks.
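    As a loose illustration of the nesting idea only (the support layout, binning and weighting are assumptions, not the paper's descriptor), one can pool oriented gradient energy over concentric supports so that each outer histogram nests the inner ones:

```python
import numpy as np

def nested_pooling(grad_mag, grad_ori, center, radii, n_bins=8):
    """Pool oriented gradient energy over nested circular supports.

    grad_mag, grad_ori: (h, w) gradient magnitude and orientation
    (orientation assumed in [0, 2*pi)); center: (row, col) keypoint;
    radii: increasing radii of the nested supports. Each radius
    contributes one orientation histogram over the disk it encloses.
    """
    h, w = grad_mag.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(yy - center[0], xx - center[1])
    bins = (grad_ori / (2 * np.pi) * n_bins).astype(int) % n_bins
    desc = []
    for r in radii:                       # outer disks contain inner ones
        inside = dist <= r
        desc.append(np.bincount(bins[inside], weights=grad_mag[inside],
                                minlength=n_bins))
    return np.concatenate(desc)
```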