8 research outputs found

    An Exploration of MPEG-7 Shape Descriptors

    The Multimedia Content Description Interface (ISO/IEC 15938), commonly known as MPEG-7, became a standard in September 2001. Unlike its predecessors, MPEG-7 standardizes multimedia metadata description. By providing robust descriptors and an effective system for storing them, MPEG-7 is designed to provide a means of navigating audio-visual content. In particular, MPEG-7 provides two two-dimensional shape descriptors, the Angular Radial Transform (ART) and Curvature Scale Space (CSS), for use in image and video annotation and retrieval. Field Programmable Gate Arrays (FPGAs) have a very general structure and are made up of programmable switches that allow the end user, rather than the manufacturer, to configure the device for whatever design their application requires. This flexibility has led to the use of FPGAs for prototyping and implementing circuit designs, and to their suggested use in reconfigurable computing. For this work, an FPGA-based ART extractor was designed and simulated for a Xilinx Virtex-E XCV300e in order to provide a speedup over software-based extraction. The resulting design is capable of processing over 69,4400 pixels a minute, utilizes 99% of the FPGA's logic resources, and operates at a clock rate of 25 MHz. Along with the proposed design, the MPEG-7 shape descriptors were evaluated on how well they retrieved similar objects and how closely those retrievals matched what a human would expect. Results showed that the majority of retrievals made using the MPEG-7 shape descriptors were visually acceptable; notably, even the human rankings showed a high degree of variance. Finally, this thesis briefly explored the potential of the ART descriptor for optical character recognition (OCR) in the context of image retrieval from databases. The ART was shown to have potential for use in OCR, although further research in this area is still required.
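    The ART underlying this extractor projects a shape onto a fixed set of complex basis functions defined on the unit disc; the MPEG-7 descriptor keeps the coefficient magnitudes for three radial and twelve angular orders, normalized by the DC coefficient. A minimal software sketch of that computation is shown below; it uses a plain pixel-sum approximation of the integral, and the function name and centroid-based radius normalization are illustrative choices, not the thesis's FPGA datapath.

```python
# Minimal sketch of MPEG-7 ART coefficient extraction (n < 3 radial, m < 12 angular
# orders). Illustrative only; not the FPGA implementation described in the thesis.
import numpy as np

def art_descriptor(shape_img, n_max=3, m_max=12):
    """shape_img: 2-D binary array (1 = object pixel). Returns 35 magnitudes."""
    ys, xs = np.nonzero(shape_img)
    cy, cx = ys.mean(), xs.mean()                  # centre on the shape centroid
    rho = np.hypot(ys - cy, xs - cx)
    rho = rho / (rho.max() + 1e-12)                # map object pixels onto the unit disc
    theta = np.arctan2(ys - cy, xs - cx)
    f = shape_img[ys, xs].astype(float)

    coeffs = np.empty((n_max, m_max), dtype=complex)
    for n in range(n_max):
        radial = np.ones_like(rho) if n == 0 else 2.0 * np.cos(np.pi * n * rho)
        for m in range(m_max):
            basis = radial * np.exp(1j * m * theta) / (2.0 * np.pi)
            # Discrete pixel-sum approximation of the integral over the unit disc
            coeffs[n, m] = np.sum(np.conj(basis) * f)

    mags = np.abs(coeffs)
    mags /= mags[0, 0]                             # normalise by |F_00|
    return np.delete(mags.ravel(), 0)              # drop F_00 itself: 35-element descriptor
```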

    Deep learning for animal recognition

    Deep learning has obtained many successes in different computer vision tasks such as classification, detection, and segmentation of objects or faces. Many of these successes can be ascribed to training deep convolutional neural network architectures on datasets containing many images. Limited research has explored deep learning methods for recognizing or detecting animals using only a small number of images. This thesis examines the use of different deep learning techniques and conventional computer vision methods for animal recognition and detection with relatively small training datasets, and has the following objectives: 1) analyse the performance of deep learning systems compared to classical approaches when only a limited number of animal images is available; 2) develop an algorithm for effectively dealing with the rotation variation naturally present in aerial images; 3) construct a computer vision system that is more robust to illumination variation; 4) analyse how important the use of different color spaces is in deep learning; 5) compare different deep convolutional neural network algorithms for detecting and recognizing individual instances (identities) in a group of animals, for example badgers. For most of the experiments, effectively reduced neural network recognition systems are used, derived from existing architectures; these reduced systems are compared to standard architectures and to classical computer vision methods. We also propose a color transformation algorithm, a novel rotation-matrix data-augmentation algorithm, and a hybrid variant of these methods that incorporates color constancy, with the aim of enhancing images and constructing a system that is more robust to different kinds of visual appearance. The results show that our proposed algorithms help deep learning systems classify animals more accurately across a large number of different animal datasets. Furthermore, the developed systems significantly surpass classical computer vision techniques, even with limited amounts of images available for training.
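    The thesis's own rotation-matrix augmentation algorithm is not reproduced here; as a generic illustration of rotation-based augmentation for aerial imagery, where objects have no canonical orientation, a simple sketch follows (the function name and parameters are illustrative assumptions).

```python
# Generic rotation-based data augmentation for aerial images; an illustration only,
# not the rotation-matrix algorithm proposed in the thesis.
import numpy as np
from scipy.ndimage import rotate

def augment_with_rotations(image, n_copies=8, rng=None):
    """Return n_copies randomly rotated variants of an HxWxC aerial image."""
    rng = np.random.default_rng() if rng is None else rng
    variants = []
    for _ in range(n_copies):
        angle = rng.uniform(0.0, 360.0)              # aerial views have no canonical "up"
        rotated = rotate(image, angle, reshape=False,  # keep the original frame size
                         mode='nearest', order=1)      # bilinear interpolation
        variants.append(rotated)
    return variants
```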

    An Image Processing Approach Toward a Visual Intra-Cortical Stimulator

    Abstract: Visual impairment may be caused by various factors ranging from trauma and birth defects to disease. To date there is no viable medical treatment for this condition; hence biomedical approaches are being employed to overcome it. The Cortivision team has been working on an intra-cortical implant that can bypass the retina and optic nerve and directly stimulate the visual cortex. In this work we aimed to implement a modular, reusable, and parameterizable object recognition system that "simplifies" video data prior to stimulation, thereby opening new horizons for partial vision restoration and for navigational and even recognition abilities. We identified the Scale Invariant Feature Transform (SIFT) algorithm as a robust candidate for our application's needs. A multithreaded software prototype of the SIFT and a Lucas-Kanade tracker was implemented to validate the overall operation. The feature extractor, the difference-of-Gaussians (DoG) part of the SIFT, being the most computationally expensive, was migrated to an FPGA implementation because the real-time constraints cannot be met on a host machine. The VHDL implementation is highly parameterizable for different application needs and trade-offs. We introduced a novel architecture employing the sub-kernel trick to reduce resource usage compared to preexisting architectures while remaining comparable in accuracy to a software floating-point implementation. In order to alleviate transmission bottlenecks, the system also includes a new parallel Huffman encoder design capable of lossless compression of both images and scale-space image pyramids, taking spatial and scale correlations into account during the predictor phase. The encoder achieved compression ratios of 27.3% on the Caltech-256 dataset. Furthermore, a new camera and fiducial-marker setup based on image processing was proposed to address the phosphene map estimation problem, which affects the quality of the final stimulation perceived by the patient.
    ----------RÉSUMÉ: Introduction and objectives. Visual impairment, defined as the total or partial loss of vision, is currently not medically treatable. Modern biomedical approaches are used to stimulate vision electrically; these approaches can be divided into three main groups: the first targeting retinal implants Humayun et al. (2003), Kim et al. (2004), Chow et al. (2004); Palanker et al. (2005), Toledo et al. (2005); Yanai et al. (2007), Winter et al. (2007); Zrenner et al. (2011); the second targeting optic nerve implants Veraart et al. (2003), Sakaguchi et al. (2009); and the third targeting intra-cortical implants Doljanu and Sawan (2007); Coulombe et al. (2007); Srivastava et al. (2007). The main drawback of the first two groups is that they are not generic enough to address the majority of visual impairment diseases, since they require the patient to have an intact optic nerve and/or a partially functional retina, which is not the case for the third group. The Polystim Neurotechnologies Laboratory team is currently working on an intra-cortical implant that directly stimulates the primary visual cortex (region V1); the overall project is named Cortivision. The system uses a camera, an image processing module, an RF (radio-frequency) transmitter and an implantable stimulator. This method is robust and generic because it bypasses the eye and the optic nerve. One of the major challenges is the image processing required to "simplify" the data prior to stimulation, extracting the useful information while discarding superfluous data. The pixels captured by the camera do not map one-to-one onto the visual cortex as in a rectangular image; instead they are mapped onto a complex map of "phosphenes" Coulombe et al. (2007); Srivastava et al. (2007). Phosphenes are points of light that appear in the patient's visual field when the brain is stimulated electrically. These points vary in size, brightness and location depending on how the electrical stimulation is performed (i.e., changes in frequency, voltage, duration, etc.) and even on the physical placement of the electrodes in the visual cortex. Current approaches aim to stimulate low-resolution monochrome phosphene images. Given this, we expect a rather low-quality vision, one that makes activities such as navigating, interpreting objects, or reading difficult for the patient. This is mainly due to the complexity of calibrating the phosphene map and its correspondence, and to the non-trivial question of how to simplify the data coming from the camera so that only the relevant information is kept. Figure 1.1 is an example that demonstrates the non-triviality of transforming a grayscale image into a phosphene stimulation.
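    For reference, the DoG stage that was moved to hardware can be sketched in a few lines of software. The sketch below only illustrates the computation; the number of scales and the base sigma are illustrative defaults, not the parameters of the VHDL design.

```python
# Software sketch of the difference-of-Gaussians (DoG) scale-space stage of SIFT,
# shown to illustrate what the FPGA block computes; parameters are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(gray, n_scales=5, sigma0=1.6, k=2 ** 0.5):
    """Blur the image at successively larger sigmas and subtract adjacent levels."""
    blurred = [gaussian_filter(gray.astype(float), sigma0 * k ** i)
               for i in range(n_scales)]
    # Each DoG level approximates a band-pass (scale-normalised Laplacian) response
    return [blurred[i + 1] - blurred[i] for i in range(n_scales - 1)]
```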

    A Dataflow Framework for Developing Flexible Embedded Accelerators: A Computer Vision Case Study

    The focus of this dissertation is the design and implementation of a computing platform that can accelerate data processing in the embedded computation domain. We focus on a heterogeneous computing platform whose hardware implementation can approach the power and area efficiency of specialized designs while remaining flexible across the application domain. Multi-core architectures require parallel programming, which is widely regarded as more challenging than sequential programming. Although shared-memory parallel programs may be fairly easy to write (using OpenMP, for example), they are quite hard to optimize; providing embedded application developers with optimizing tools and programming frameworks is a challenge, and the heterogeneous specialized elements make the problem even more difficult. Dataflow is a parallel computation model that relies exclusively on message passing and that has some advantages over the parallel programming tools in wide use today: simplicity, graphical representation, and determinism. The dataflow model is also a good match for streaming applications, such as audio, video and image processing, which operate on long sequences of data and are characterized by abundant parallelism and regular memory access patterns. The dataflow model of computation has gained acceptance in the simulation and signal-processing communities. This thesis evaluates the applicability of the dataflow model for implementing domain-specific embedded accelerators for streaming applications.
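    As a concrete illustration of the model's core idea, the toy sketch below wires two actors together through FIFO channels and fires an actor only when all of its input tokens are available. It is a conceptual example of dataflow execution with no shared state, not the dissertation's framework or its API.

```python
# Toy dataflow execution: actors communicate only through FIFO channels and fire
# when input tokens are available. Conceptual sketch, not the dissertation's framework.
from collections import deque

class Channel:
    def __init__(self):
        self._fifo = deque()
    def put(self, token):
        self._fifo.append(token)
    def get(self):
        return self._fifo.popleft()
    def __len__(self):
        return len(self._fifo)

def run(actors):
    """Fire any actor whose input channels all hold tokens, until the graph is quiescent."""
    progress = True
    while progress:
        progress = False
        for fire, inputs, outputs in actors:
            if all(len(ch) > 0 for ch in inputs):
                results = fire(*(ch.get() for ch in inputs))
                for ch, tok in zip(outputs, results):
                    ch.put(tok)
                progress = True

# Example: a two-stage pipeline (scale, then threshold) over a stream of tokens.
a, b, c = Channel(), Channel(), Channel()
for v in [1, 5, 9]:
    a.put(v)
actors = [
    (lambda x: (x * 2,), [a], [b]),      # stage 1: scale each token
    (lambda x: (x > 8,), [b], [c]),      # stage 2: threshold the scaled value
]
run(actors)
print([c.get() for _ in range(len(c))])  # [False, True, True]
```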

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves topics as diverse as sensors, feature extraction, pattern classification, decision fusion, and applications, among others. The signals processed are commonly one-, two- or three-dimensional; the processing is done in real time or may take hours and days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms, and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented work. The authors of these 25 contributions present and advocate recent achievements of their research in the field of pattern recognition.

    XATA 2006: XML: aplicações e tecnologias associadas

    This is the fourth conference on XML and Associated Technologies. The event has become a meeting point for those interested in the topic, and it has been gratifying to see that participants enjoy it and try to return in later years. The core working group, the scientific committee, has also been growing, and everyone has collaborated willingly and with increasing quality year after year. This is the fourth time I am writing this preface, and I cannot avoid describing the evolution of XATA over these four years:
    2003 - In this first "meeting" there were around twenty submitted papers, mostly authored or supervised by members of the organizing committee, which did not prevent strong participation and lively discussions.
    2004 - There was stronger participation from the Portuguese community, but still in modest numbers. At this point a strong industry participation was also sought, which translated into a considerable set of presentations of real-world cases. A formal review process for submitted papers was introduced.
    2005 - There was strong national and international participation (Spain and Brazil, which, for an event that aims to favour the Portuguese language, is all the more significant). The geographical distribution within Portugal also widened, with more participating institutions. Several tasks were automated, such as the paper submission and review process.
    2006 - In the current edition, and contrary to what is happening at the national level, there was significant growth. In every edition the organizing committee's goal has been to favour scientific production and to give voice to the largest possible number of participants. Accordingly, this year there will be no invited speakers, and the programme is filled entirely with presentations of the selected papers. Even so, there was a significant rejection rate, mainly due to the high number of submissions. Also introduced in this edition was a day of tutorials, intended to provide basic competencies to those who want to start working in the area and to allow a more informed attendance at the conference.
    Analysing the themes addressed across the four conferences, we can also see an evolution towards greater maturity. While in the first meeting the papers addressed emerging problems in the use of the technology, in the second the main focus was on Web Services, a new XML-based technology; in the third, the emphasis was on building repositories, search engines and query languages; and in this fourth edition there is an almost homogeneous distribution across all thematic areas, with papers even appearing that address scientific and technological aspects of the foundations of XML technology. We can therefore conclude that the technology, from the point of view of use and application, has been mastered, and that the Portuguese community is beginning to contribute to the underlying science.
    Microsoft

    Personality Identification from Social Media Using Deep Learning: A Review

    Social media helps people scattered around the world share ideas and information, and thus helps create communities, groups, and virtual networks. Identification of personality is significant in many types of applications, such as detecting a person's mental state or character, predicting job satisfaction and professional and personal relationship success, and building recommendation systems. Personality is also an important factor in determining individual variation in thoughts, feelings, and behaviour. According to the 2018 global social media research survey, there are approximately 3.196 billion social media users worldwide, and this number is expected to keep growing rapidly with the spread of mobile smart devices and advances in technology. Support vector machines (SVM), Naive Bayes (NB), multilayer perceptron neural networks, and convolutional neural networks (CNN) are some of the machine learning techniques used for personality identification in the literature. This paper surveys studies that identify the personality of social media users with the help of machine learning approaches, and reviews recent work aimed at predicting the personality of online social media (OSM) users.
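    As a minimal illustration of the kind of classical baseline covered by such surveys, the sketch below trains a TF-IDF plus linear SVM classifier to predict a single personality trait from short posts. The toy texts, labels, and trait are placeholders, not data or results from any reviewed study.

```python
# Toy baseline: TF-IDF features + linear SVM predicting one (placeholder) trait label.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

posts = ["love meeting new people at every event!",
         "stayed home again, reading quietly all weekend"]
extraversion = [1, 0]                       # toy labels: 1 = high, 0 = low

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(posts, extraversion)
print(model.predict(["had a great time at the party last night"]))
```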

    CACIC 2015 : XXI Congreso Argentino de Ciencias de la Computación. Libro de actas

    Proceedings of the XXI Argentine Congress of Computer Science (CACIC 2015), held at the UNNOBA Junín campus from 5 to 9 October 2015.
    Red de Universidades con Carreras en Informática (RedUNCI)