72 research outputs found

    Feature transforms for image data augmentation

    A problem with convolutional neural networks (CNNs) is that they require large datasets to obtain adequate robustness; on small datasets, they are prone to overfitting. Many methods have been proposed to overcome this shortcoming. In cases where additional samples cannot easily be collected, a common approach is to generate more data points from existing data using an augmentation technique. In image classification, many augmentation approaches use simple image manipulation algorithms. In this work, we propose new methods for data augmentation based on several image transformations: the Fourier transform (FT), the Radon transform (RT), and the discrete cosine transform (DCT). These and other data augmentation methods are considered in order to quantify their effectiveness in creating ensembles of neural networks. The novelty of this research is to consider different data augmentation strategies for generating training sets, on which several classifiers are trained and then combined into an ensemble. Specifically, the idea is to create an ensemble based on a kind of bagging of the training set, where each model is trained on a different training set obtained by augmenting the original training set with a different approach. We build ensembles on the data level by adding images generated by combining fourteen augmentation approaches, three of which (based on FT, RT, and DCT) are proposed here for the first time. Pretrained ResNet50 networks are fine-tuned on training sets that include images derived from each augmentation method. These networks and several fusions are evaluated and compared across eleven benchmarks. Results show that building ensembles on the data level by combining different data augmentation methods produces classifiers that are not only competitive with the state of the art but often surpass the best approaches reported in the literature.
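
The abstract above does not say how the DCT-based augmentation perturbs an image, so the following is only a minimal sketch of one plausible scheme: transform a grayscale image with a 2-D DCT, rescale the non-DC coefficients by random factors close to 1, and invert the transform. The function names `dct_matrix` and `dct_augment` and the `noise_scale` parameter are hypothetical and do not come from the paper.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ x is the DCT of a length-n signal."""
    k = np.arange(n).reshape(-1, 1)  # frequency index (rows)
    i = np.arange(n).reshape(1, -1)  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the constant (DC) row has its own scaling
    return c


def dct_augment(image, noise_scale=0.05, rng=None):
    """Return a perturbed copy of a 2-D grayscale image.

    Assumed scheme (not taken from the paper): every DCT coefficient
    except the DC term is multiplied by a random factor near 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    rows, cols = image.shape
    c_rows, c_cols = dct_matrix(rows), dct_matrix(cols)
    coeffs = c_rows @ image @ c_cols.T  # forward 2-D DCT
    factors = 1.0 + noise_scale * rng.standard_normal(coeffs.shape)
    factors[0, 0] = 1.0  # keep the DC term so the mean intensity is preserved
    return c_rows.T @ (coeffs * factors) @ c_cols  # inverse 2-D DCT
```

Because the transform matrix is orthonormal, setting `noise_scale=0` returns the input image unchanged, which is a convenient sanity check before generating augmented training sets.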

    The larger the better: Analysis of a scalable spectral clustering algorithm with cosine similarity

    Chen (2018) proposed a scalable spectral clustering algorithm for cosine similarity to handle the task of clustering large data sets. It runs extremely fast, with complexity linear in the size of the data, and achieves state-of-the-art accuracy. This paper conducts a perturbation analysis of the algorithm to understand the effect of discarding a perturbation term in an eigendecomposition step. Our results show that the accuracy of the approximation by the scalable algorithm depends on the connectivity of the clusters, their separation, and their sizes, and that the approximation is especially accurate for large data sets.
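
To illustrate why cosine similarity permits a linear-time spectral embedding, here is a sketch of the general idea rather than the cited algorithm itself. With unit-length rows, the similarity matrix is W = X Xᵀ, so the degrees and the leading eigenvectors of the normalized matrix D^{-1/2} W D^{-1/2} can be obtained from a thin SVD without ever forming W. The function name `cosine_spectral_embedding` is hypothetical, features are assumed nonnegative so that all degrees are positive, and this sketch keeps self-similarities on the diagonal of W, which is an assumption about a detail the abstract does not state.

```python
import numpy as np


def cosine_spectral_embedding(data, k):
    """Embed the rows of `data` in k dimensions (nonnegative features assumed)."""
    # Normalize rows to unit length so that x @ x.T holds cosine similarities.
    x = data / np.linalg.norm(data, axis=1, keepdims=True)
    # Degrees of W = x @ x.T, computed as x @ (x.T @ 1) without forming W.
    degrees = x @ x.sum(axis=0)
    x_tilde = x / np.sqrt(degrees)[:, None]
    # Left singular vectors of x_tilde are eigenvectors of D^{-1/2} W D^{-1/2},
    # and the squared singular values are the corresponding eigenvalues.
    u, s, _ = np.linalg.svd(x_tilde, full_matrices=False)
    return u[:, :k], s[:k]
```

For n points with d features the thin SVD costs on the order of n·d², which is linear in n for fixed d. Clustering the rows of the returned embedding (for example with k-means) would complete a spectral clustering pipeline.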

    Error-tolerant Graph Matching on Huge Graphs and Learning Strategies on the Edit Costs

    Graphs are abstract data structures used to model real problems with two basic entities: nodes and edges. Each node or vertex represents a relevant point of interest of a problem, and each edge represents the relationship between these points. Nodes and edges can carry attributes to increase the accuracy of the modeled problem; these attributes can range from feature vectors to description labels. Due to this versatility, many applications have been found in fields such as computer vision, biomedicine, and network analysis. Graph Edit Distance (GED) has become an important tool in structural pattern recognition, since it makes it possible to measure the dissimilarity of attributed graphs. The first part of this thesis presents a method to generate a pair of graphs together with an upper and a lower bound on their distance and a correspondence between them, at a linear computational cost. Through this method, the behaviour of known or new sub-optimal error-tolerant graph matching algorithms can be tested against a lower and an upper bound on the GED of large graphs, even though the true distance is not available. Next, the thesis focuses on how to measure the dissimilarity between two huge graphs (more than 10,000 nodes), using a new error-tolerant graph matching algorithm called Belief Propagation. It has an O(d^3.5 n) computational cost. This thesis also presents a general framework to learn the edit costs involved in the GED calculations automatically. Then, we concretise this framework in two different models based on neural networks and probability density functions. An exhaustive practical validation on 14 public databases has been performed. This validation shows that the accuracy is higher with the learned edit costs than with some manually imposed costs or with costs learned automatically by previous methods. Finally, we propose an application of the Belief Propagation algorithm to the simulation of muscle mechanics.

    Bag of ARSRG Words (BoAW)

    In recent years, researchers in computer vision have worked to understand image contents. In particular, the bag of visual words (BoVW) model, which describes images in terms of a frequency histogram of visual words, is the most widely adopted paradigm. Its main drawback is the lack of information about the location of features and the relationships between them. To address this, we propose a new paradigm called bag of ARSRG (attributed relational SIFT (scale-invariant feature transform) regions graph) words (BoAW). A digital image is described as a vector in terms of a frequency histogram of graphs. Through a sequence of steps, the images are mapped into a vector space by way of a graph transformation. BoAW is evaluated in an image classification context on standard datasets, and its effectiveness is demonstrated through experimental results compared with well-known competitors.

    Deep Supervised Hashing using Symmetric Relative Entropy

    By virtue of their simplicity and efficiency, hashing algorithms have achieved significant success in large-scale approximate nearest neighbor search. Recently, many deep neural network based hashing methods have been proposed to improve search accuracy by simultaneously learning both the feature representation and the binary hash functions. Most deep hashing methods depend on supervised semantic label information to preserve the distance or similarity between local structures, which unfortunately ignores the global distribution of the learned hash codes. We propose a novel deep supervised hashing method that aims to minimize the information loss generated during the embedding process. Specifically, the information loss is measured by the Jensen-Shannon divergence to ensure that the compact hash codes have a distribution similar to that of the original images. Experimental results show that our method outperforms current state-of-the-art approaches on two benchmark datasets.
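
For reference, the Jensen-Shannon divergence named above is a symmetrized form of relative entropy: JS(p, q) = ½ KL(p ‖ m) + ½ KL(q ‖ m), with m = (p + q) / 2. The sketch below computes it for two discrete distributions given as lists of probabilities. It only illustrates the quantity; how the paper estimates the distributions of hash codes and images is not described in the abstract, and the function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats for discrete distributions."""
    # Terms with p_i == 0 contribute nothing, by the convention 0 * log 0 = 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions, in nats."""
    # The mixture m is positive wherever p or q is, so the KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Unlike plain relative entropy, this quantity is symmetric in its arguments and bounded above by ln 2, which it attains for distributions with disjoint support.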

    Quantum Cognitively Motivated Decision Fusion for Video Sentiment Analysis

    Video sentiment analysis as a decision-making process is inherently complex, involving the fusion of decisions from multiple modalities and the cognitive biases that result. Inspired by recent advances in quantum cognition, we show that the sentiment judgment from one modality can be incompatible with the judgment from another, i.e., the order matters and they cannot be jointly measured to produce a final decision. Thus the cognitive process exhibits "quantum-like" biases that cannot be captured by classical probability theories. Accordingly, we propose a fundamentally new, quantum cognitively motivated fusion strategy for predicting sentiment judgments. In particular, we formulate utterances as quantum superposition states of positive and negative sentiment judgments, and uni-modal classifiers as mutually incompatible observables, on a complex-valued Hilbert space with positive-operator valued measures. Experiments on two benchmark datasets show that our model significantly outperforms various existing decision-level fusion approaches and a range of state-of-the-art content-level fusion approaches. The results also show that the concept of incompatibility allows effective handling of all combination patterns, including the extreme cases that are wrongly predicted by all uni-modal classifiers. Comment: The uploaded version is a preprint of the accepted AAAI-21 paper.