140 research outputs found

    Siamese Networks for Visual Object Tracking

    Get PDF
    Visual object tracking has become one of the hottest topics in computer vision since its appearance in the 1990s. It has a wide range of important real-life applications, such as autonomous driving, robot navigation and video surveillance. Despite the efforts made by the research community over the last decades, arbitrary object tracking is still, in its generality, an unsolved problem. Recently, some tracking algorithms have used convolutional neural networks trained on large datasets, providing richer image features and achieving more accurate object tracking. Results show that deep learning techniques can enhance tracking capabilities by learning a better model of the object's appearance. The aim of this thesis is to study and evaluate the implementation of one such method, SiamFC, and to give a brief overview of current tracking challenges. The code developed in this study builds on an existing Python implementation of SiamFC and is publicly available at https://github.com/sergi2596/pytorch-siamfc
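    The core of SiamFC is a dense cross-correlation: a shared embedding network maps both the exemplar (target) crop and the larger search crop to feature maps, and sliding the exemplar features over the search features yields a response map whose peak locates the target. A minimal NumPy sketch of that correlation step, assuming precomputed feature maps (the shapes and toy data here are illustrative, not the thesis's actual configuration):

```python
import numpy as np

def cross_correlate(exemplar, search):
    """Slide the exemplar feature map over the search feature map
    and return a response map of similarity scores (valid mode)."""
    ch, eh, ew = exemplar.shape
    _, sh, sw = search.shape
    out_h, out_w = sh - eh + 1, sw - ew + 1
    response = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = search[:, i:i + eh, j:j + ew]
            response[i, j] = np.sum(window * exemplar)
    return response

# Toy example: plant the exemplar inside a noisy search region at offset (2, 3).
rng = np.random.default_rng(0)
exemplar = rng.standard_normal((2, 5, 5))
search = rng.standard_normal((2, 12, 12)) * 0.1
search[:, 2:7, 3:8] = exemplar

resp = cross_correlate(exemplar, search)
peak = np.unravel_index(np.argmax(resp), resp.shape)
print(peak)  # the response peak coincides with the target's offset
```

In SiamFC this same operation is implemented as a convolution with the exemplar features acting as the kernel, so the whole response map is computed in one pass on the GPU.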

    Feature Extraction Using Deep Learning for Kansei (Affective) Estimation

    Get PDF
    Hiroshima University, Doctor of Engineering

    Age and gender classification: A proposed system

    Get PDF
    With the new General Data Protection Regulation, there have been many concerns about storing personal and sensitive data. As a result, there is a need to gather information without storing any data that could be considered sensitive or that could identify the person to whom it belongs. Our motivation was to create a system that gathers information about the people who visit commercial areas, using their surveillance systems as input to the application. In the present work, we developed a system capable of gathering age and gender information from people based on images, using deep learning. The system was built using a face detection model based on the GoogLeNet deep neural network and a Wide Residual Network for age and gender classification, supported by a Siamese network for the latter. The outcome is, to the best of our knowledge, the first available implementation that uses Wide Residual Networks and Siamese networks together for gender classification.
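    The role of the Siamese network in the gender branch can be illustrated with a contrastive-loss sketch: two embeddings produced by the same weight-shared encoder are pulled together when the faces share a label and pushed at least a margin apart otherwise. A minimal NumPy illustration, assuming precomputed embeddings (the margin and embedding size are arbitrary choices, not taken from the thesis):

```python
import numpy as np

def contrastive_loss(a, b, same_label, margin=1.0):
    """Contrastive loss on a pair of embeddings from a weight-shared
    (Siamese) encoder: similar pairs are pulled together, dissimilar
    pairs are pushed at least `margin` apart."""
    d = np.linalg.norm(a - b)
    if same_label:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2

# Toy embeddings: two nearby vectors and one distant vector.
x1 = np.array([0.10, 0.20])
x2 = np.array([0.15, 0.25])
x3 = np.array([2.00, -1.00])

print(contrastive_loss(x1, x2, same_label=True))   # small: similar pair is already close
print(contrastive_loss(x1, x3, same_label=False))  # zero: dissimilar pair is beyond the margin
```

Training on pairs like this shapes an embedding space where a simple distance threshold (or a small classifier on top) separates the two classes.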

    Improving the Generalizability of Speech Emotion Recognition: Methods for Handling Data and Label Variability

    Full text link
    Emotion is an essential component in our interaction with others. It transmits information that helps us interpret the content of what others say. Therefore, detecting emotion from speech is an important step towards enabling machine understanding of human behaviors and intentions. Researchers have demonstrated the potential of emotion recognition in areas such as interactive systems in smart homes and mobile devices, computer games, and computational medical assistants. However, emotion communication is variable: individuals may express emotion in a manner that is uniquely their own; different speech content and environments may shape how emotion is expressed and recorded; individuals may perceive emotional messages differently. Practically, this variability is reflected in both the audio-visual data and the labels used to create speech emotion recognition (SER) systems. SER systems must be robust and generalizable to handle the variability effectively. The focus of this dissertation is on the development of speech emotion recognition systems that handle variability in emotion communications. We break the dissertation into three parts, according to the type of variability we address: (I) in the data, (II) in the labels, and (III) in both the data and the labels.

    Part I: The first part of this dissertation focuses on handling variability present in data. We approximate variations in environmental properties and expression styles by corpus and gender of the speakers. We find that training on multiple corpora and controlling for the variability in gender and corpus using multi-task learning result in more generalizable models, compared to the traditional single-task models that do not take corpus and gender variability into account. Another source of variability present in the recordings used in SER is the phonetic modulation of acoustics. On the other hand, phonemes also provide information about the emotion expressed in speech content. We discover that we can make more accurate predictions of emotion by explicitly considering both roles of phonemes.

    Part II: The second part of this dissertation addresses variability present in emotion labels, including the differences between emotion expression and perception, and the variations in emotion perception. We discover that it is beneficial to jointly model both the perception of others and how one perceives one's own expression, compared to focusing on either one. Further, we show that the variability in emotion perception is a modelable signal and can be captured using probability distributions that describe how groups of evaluators perceive emotional messages.

    Part III: The last part of this dissertation presents methods that handle variability in both data and labels. We reduce the data variability due to non-emotional factors using deep metric learning and model the variability in emotion perception using soft labels. We propose a family of loss functions and show that by pairing examples that potentially vary in expression styles and lexical content and preserving the real-valued emotional similarity between them, we develop systems that generalize better across datasets and are more robust to over-training. These works demonstrate the importance of considering data and label variability in the creation of robust and generalizable emotion recognition systems. We conclude this dissertation with the following future directions: (1) the development of real-time SER systems; (2) the personalization of general SER systems.

    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147639/1/didizbq_1.pd
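    The soft-label idea in Parts II and III can be made concrete: instead of collapsing evaluators' votes into a single hard class, the votes become a probability distribution that the model is trained to match. A minimal NumPy sketch (the four-class emotion set and the cross-entropy objective are illustrative assumptions, not the dissertation's exact formulation):

```python
import numpy as np

EMOTIONS = ["angry", "happy", "neutral", "sad"]

def soft_label(votes):
    """Turn evaluator votes into a probability distribution over classes,
    preserving disagreement instead of taking a majority vote."""
    counts = np.array([votes.count(e) for e in EMOTIONS], dtype=float)
    return counts / counts.sum()

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between the soft label and the model's distribution."""
    return -np.sum(target * np.log(predicted + eps))

# Five evaluators disagree: the ambiguity is kept in the label.
votes = ["happy", "happy", "neutral", "happy", "neutral"]
target = soft_label(votes)                    # [0.0, 0.6, 0.4, 0.0]
predicted = np.array([0.05, 0.55, 0.35, 0.05])
print(target, cross_entropy(target, predicted))
```

A model trained against such targets is rewarded for reproducing the evaluators' uncertainty on ambiguous utterances rather than forced to pick one winner.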

    DeepC2: AI-powered Covert Botnet Command and Control on OSNs

    Full text link
    Botnets are one of the major threats to computer security. In previous botnet command and control (C&C) scenarios using online social networks (OSNs), methods for addressing (e.g., IDs, links, or DGAs) are hardcoded into bots. Once a bot is reverse engineered, the botmaster and C&C infrastructure will be exposed. Additionally, abnormal content from explicit commands may expose botmasters and raise anomalies on OSNs. To overcome these deficiencies, we propose DeepC2, an AI-powered covert C&C method on OSNs. By leveraging neural networks, bots can find botmasters by avatars, which are converted into feature vectors and embedded into bots. Adversaries cannot infer botmasters' accounts from the vectors. Commands are embedded into normal content (e.g., tweets and comments) using text data augmentation and hash collision. Experiments on Twitter show that command-embedded content can be generated efficiently, and bots can find botmasters and obtain commands accurately. Security analyses of different scenarios show that DeepC2 is robust and hard to shut down. By demonstrating how AI may help promote covert communication on OSNs, this work provides a new perspective on botnet detection and confrontation.Comment: 13 pages, 15 figures, 7 tables. Discussion on possible countermeasures updated
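    The avatar-matching step the abstract describes reduces to nearest-neighbour matching in feature space: the bot carries only a reference feature vector and accepts any candidate avatar whose vector is close enough, without storing the account identity itself. A minimal NumPy sketch of that matching logic (the vector size, threshold, and random features are illustrative stand-ins; the paper uses a trained neural model to produce the vectors):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match(reference, candidates, threshold=0.95):
    """Return indices of candidate vectors close enough to the reference."""
    return [i for i, c in enumerate(candidates)
            if cosine_sim(reference, c) >= threshold]

rng = np.random.default_rng(1)
reference = rng.standard_normal(128)
candidates = [rng.standard_normal(128) for _ in range(4)]
candidates.append(reference + rng.standard_normal(128) * 0.01)  # near-duplicate

print(match(reference, candidates))  # only the near-duplicate matches
```

Because unrelated high-dimensional random vectors have near-zero cosine similarity, only a vector derived from the original avatar clears a high threshold, which is what makes the reference vector alone insufficient for an adversary to recover the account.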

    Emotion-aware cross-modal domain adaptation in video sequences

    Get PDF