14 research outputs found

    Quality estimation and optimization of adaptive stereo matching algorithms for smart vehicles

    Get PDF
    Stereo matching is a promising approach for smart vehicles to find the depth of nearby objects. Transforming a traditional stereo matching algorithm into its adaptive version has the potential to achieve maximum quality (depth accuracy) in a best-effort manner. However, supporting this adaptive feature is very challenging, since (1) the internal mechanism of adaptive stereo matching (ASM) has to be accurately modeled, and (2) scheduling ASM tasks on multiprocessors to maximize quality is difficult under the strict real-time constraints of smart vehicles. In this article, we propose a framework for constructing an ASM application and optimizing its output quality on smart vehicles. First, we empirically convert stereo matching into ASM by exploiting its inherent disparity–cycle correspondence and introduce an exponential quality model that accurately captures the quality–cycle relationship. Second, with this explicit quality model, we propose an efficient quadratic-programming-based dynamic voltage/frequency scaling (DVFS) algorithm that decides the optimal operating strategy, maximizing output quality under timing, energy, and temperature constraints. Third, we propose two novel methods to efficiently estimate the parameters of the quality model: location-similarity-based feature point thresholding and street-scenario-confined CNN prediction. Results show that our DVFS algorithm achieves at least a 1.61× quality improvement over state-of-the-art techniques, and parameter estimation for the quality model achieves an average accuracy of 96.35% on straight roads.
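    The operating-point selection described in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's actual quadratic program: the exponential model parameters, the frequency/voltage table, and the simple dynamic-power proxy are all assumptions made for the example.

```python
import math

# Illustrative sketch only: an exponential quality-cycle model
# q(c) = q_max * (1 - exp(-k * c)) with assumed parameters; the real
# framework estimates q_max and k per scene and solves a quadratic program.
Q_MAX, K = 1.0, 0.35           # hypothetical model parameters (per megacycle)

def quality(megacycles):
    """Predicted depth-map quality after spending the given cycle budget."""
    return Q_MAX * (1.0 - math.exp(-K * megacycles))

def pick_frequency(freqs_mhz, volts, deadline_ms, energy_budget):
    """Brute-force stand-in for the QP-based DVFS step: among operating
    points that fit the energy budget, pick the one that maximizes
    quality within the frame deadline (dynamic power ~ f * V^2)."""
    best = None
    for f, v in zip(freqs_mhz, volts):
        cycles = f * deadline_ms / 1000.0          # megacycles per frame
        energy = f * v * v * deadline_ms / 1000.0  # arbitrary energy units
        if energy > energy_budget:
            continue
        q = quality(cycles)
        if best is None or q > best[1]:
            best = (f, q)
    return best                                    # (frequency, quality)
```

    Because the quality model is monotone in cycles, the selected point is simply the fastest frequency the energy budget allows, which is the intuition behind "maximum quality in a best-effort manner".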

    Real-Time Multi-Fisheye Camera Self-Localization and Egomotion Estimation in Complex Indoor Environments

    Get PDF
    In this work, a real-time-capable multi-fisheye camera self-localization and egomotion estimation framework is developed. The thesis covers all aspects, from omnidirectional camera calibration to the development of a complete multi-fisheye camera SLAM system based on a generic multi-camera bundle adjustment method.

    OpenCL acceleration on FPGA vs CUDA on GPU

    Get PDF

    Holistic Optimization of Embedded Computer Vision Systems

    Full text link
    Despite strong interest in embedded computer vision, the computational demands of Convolutional Neural Network (CNN) inference far exceed the resources available in embedded devices. Thankfully, the typical embedded device has a number of desirable properties that can be leveraged to significantly reduce the time and energy required for CNN inference. This thesis presents three independent and synergistic methods for optimizing embedded computer vision: 1) reducing the time and energy needed to capture and preprocess input images by optimizing the image capture pipeline for the needs of CNNs rather than humans; 2) exploiting temporal redundancy within incoming video streams to perform computationally cheap motion estimation and compensation in lieu of full CNN inference for the majority of frames; 3) leveraging the sparsity of CNN activations within the frequency domain to significantly reduce the number of operations needed for inference. Collectively, these techniques significantly reduce the time and energy needed for computer vision at the edge, enabling a wide variety of exciting new applications.
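    A minimal sketch of the temporal-redundancy idea (method 2), with a simple mean-absolute-difference test standing in for real motion estimation and compensation; the threshold value is an assumption for the example, not taken from the thesis.

```python
import numpy as np

# Hedged sketch: run the full CNN only when a frame differs enough from the
# last "key" frame; otherwise reuse (and, in the real system, motion-
# compensate) the key frame's result.
class KeyFrameGate:
    def __init__(self, threshold=8.0):
        self.threshold = threshold   # assumed per-pixel intensity units
        self.key_frame = None

    def needs_inference(self, frame):
        """Return True if this frame should go through full CNN inference."""
        if self.key_frame is None:
            self.key_frame = frame
            return True
        diff = np.mean(np.abs(frame.astype(np.float32) -
                              self.key_frame.astype(np.float32)))
        if diff > self.threshold:
            self.key_frame = frame   # refresh the key frame on large change
            return True
        return False                 # cheap path: reuse the key-frame result
```

    In a video with mostly static scenes, most frames take the cheap path, which is where the time and energy savings come from.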

    Novo método iterativo de localização da câmera baseado no conceito de resection-intersection

    Get PDF
    Visual Odometry is the process of estimating the motion of an entity from two or more images provided by one or more cameras. It is a technique of major importance in computer vision, with applications in areas such as driver assistance, autonomous vehicle navigation, augmented reality systems, unmanned aerial vehicles (UAVs), and even interplanetary exploration. The most common Visual Odometry methods use stereo cameras, which make it possible to directly compute depth information for scene details and thereby estimate the successive camera positions. Monocular Visual Odometry estimates the displacement of an object from the images provided by a single camera, which offers constructive and operational advantages although it requires more complex processing. Sparse Monocular Visual Odometry systems estimate the camera pose from feature points detected in the images, which significantly reduces the required processing power and makes them well suited to real-time applications. In this context, this work presents a new real-time sparse Monocular Visual Odometry system, validated on an instrumented vehicle. The new system is based on the Resection-Intersection concept, combined with a new convergence test and an iterative refinement method that minimizes reprojection errors. The system was designed to work with different non-linear optimization algorithms, such as Gauss-Newton, Levenberg-Marquardt, Davidon-Fletcher-Powell, and Broyden-Fletcher-Goldfarb-Shanno. On the KITTI benchmark, the proposed system obtained a translation error of 0.86% and an average rotation error of 0.0024°/m, both relative to the average distance traveled. The system was developed in Python on a single thread, was embedded on a Raspberry Pi 4B board, and achieved an average processing time of 775 ms per image on the first eleven scenarios of the benchmark. These results surpass those of other Monocular Visual Odometry systems based on the Resection-Intersection concept submitted so far to the KITTI benchmark ranking.
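    To make the reprojection-error refinement concrete, here is a minimal Gauss-Newton sketch for the resection step under strong simplifying assumptions: rotation held fixed, a known focal length, and a numeric Jacobian. It illustrates the optimization loop only and is not the thesis's implementation.

```python
import numpy as np

# Hedged sketch: refine a camera translation t by minimizing reprojection
# error over tracked points. The real system alternates resection and
# intersection and also supports LM, DFP, and BFGS optimizers.
F = 500.0  # assumed focal length in pixels

def project(points_3d, t):
    """Pinhole projection of Nx3 points after translating the camera by t."""
    p = points_3d - t
    return F * p[:, :2] / p[:, 2:3]

def residuals(t, points_3d, obs_2d):
    """Stacked 2D reprojection errors."""
    return (project(points_3d, t) - obs_2d).ravel()

def gauss_newton(points_3d, obs_2d, t0, iters=20, eps=1e-6):
    t = np.asarray(t0, dtype=float)
    for _ in range(iters):
        r = residuals(t, points_3d, obs_2d)
        # forward-difference Jacobian w.r.t. the 3 translation parameters
        J = np.empty((r.size, 3))
        for j in range(3):
            dt = np.zeros(3)
            dt[j] = eps
            J[:, j] = (residuals(t + dt, points_3d, obs_2d) - r) / eps
        step = np.linalg.lstsq(J, -r, rcond=None)[0]  # normal-equation solve
        t = t + step
        if np.linalg.norm(step) < 1e-9:               # convergence test
            break
    return t
```

    Swapping the solve for a damped step gives Levenberg-Marquardt, which is one of the interchangeable optimizers the abstract mentions.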

    Une approche computationnelle de la complexité linguistique par le traitement automatique du langage naturel et l'oculométrie

    Full text link
    The lack of integration between cognitive science and psychometrics is commonly deplored - and ignored. In the measurement and assessment of reading, one manifestation of this problem is a theoretical avoidance of the sources of linguistic difficulty and the cognitive processes underlying text comprehension. To facilitate the desired rapprochement between cognitive science and psychometrics, we adopt a computational approach. By treating computational procedures as simplified, partial representations of cognitivist theories, a computational approach facilitates both the integration of theoretical elements into psychometrics and the development of theories in cognitive psychology. This thesis studies the contribution of a computational approach to the measurement of two facets of linguistic complexity, addressed through complementary perspectives. The intrinsic complexity of a text is approached from the standpoint of natural language processing, with the goal of identifying and measuring the attributes (measurable features) that best model text difficulty. Paper 1 introduces ALSI (Analyseur Lexico-syntaxique intégré; in English, ISLA, Integrated Lexico-Syntactic Analyzer), a new natural language processing tool that extracts a variety of linguistic features from French text, drawn primarily from research in psycholinguistics and computational linguistics. We then evaluate the potential of these features to estimate text difficulty. Paper 2 uses ALSI and statistical learning methods to estimate the difficulty of texts used in primary and secondary education in Quebec. In the second part of the thesis, the complexity associated with reading processes is addressed through eye-tracking, which allows inferences about cognitive load and visual attention allocation strategies during reading. Paper 3 describes a methodology for analyzing mobile eye-tracking recordings using computer vision techniques (a branch of artificial intelligence); this methodology is then tested on simulated data. Paper 4 deploys the same methodology in a pilot eye-tracking experiment comparing the reading processes of novices and experts answering an argumentative text comprehension test. Overall, our work shows that convincing results can be obtained by combining theoretical contributions with a computational approach based on statistical learning techniques. The tools created or refined in this thesis constitute a significant advance in the development of digital technologies for the measurement and assessment of reading, with anticipated benefits in both school and research contexts.
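    As a toy illustration of the feature-extraction-plus-statistical-learning pipeline (not ALSI itself), the sketch below computes two classic surface features and fits them to difficulty labels by ordinary least squares. The features, texts, and labels are all assumptions for the example; ALSI extracts far richer psycholinguistic features.

```python
import numpy as np

# Hedged sketch: mean sentence length and mean word length as stand-ins for
# real psycholinguistic features, fit to difficulty labels with OLS.
def features(text):
    sentences = [s for s in text.split('.') if s.strip()]
    words = text.split()
    mean_sent_len = len(words) / max(len(sentences), 1)
    mean_word_len = sum(len(w.strip('.,')) for w in words) / max(len(words), 1)
    return [1.0, mean_sent_len, mean_word_len]   # bias term + 2 features

def fit_difficulty(texts, grades):
    """Ordinary least-squares fit of difficulty labels on surface features."""
    X = np.array([features(t) for t in texts])
    y = np.array(grades, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict(coef, text):
    return float(np.dot(coef, features(text)))
```

    A real difficulty model would use regularized or non-linear learners and cross-validation rather than a plain in-sample fit.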

    Navegación endoscópica con ORB-SLAM para cirugía uretral mínimamente invasiva

    Get PDF
    In minimally invasive surgery, an endoscope is moved through a body cavity or organ. The aim of SLAM (Simultaneous Localization And Mapping) is to estimate the 3D map of the cavity or organ being explored and to locate the endoscope with respect to that map, using as its only input the video captured by the endoscope. ORB-SLAM is a SLAM system developed at the University of Zaragoza that uses FAST keypoints and ORB descriptors to estimate both the 3D map and the camera pose. The main problem of ORB-SLAM2 is that it only works robustly (with high accuracy and without failures or drift) in scenes that can be assumed rigid and textured, where visual feature points are easy to extract and match. This is not the case for organs: they are deformable and wet, and reliable visual features (corners) are hard to find on them, which can lead to an incorrect map estimate. Abrupt illumination changes and sudden endoscope movements add further difficulty. In this work, several modifications were made to ORB-SLAM2 so that it works on sequences recorded by an endoscope, such as reducing the number of points required for tracking, creating a more robust initial map, and pre-processing the images to increase contrast, reduce noise, and avoid detecting points on specular reflections. In addition, to obtain more robust keypoints, different detector-descriptor combinations were tested, creating a new vocabulary for each of them. With the modifications made to ORB-SLAM2, the change from FAST-ORB to A-KAZE-ORB, and the new vocabulary, it was possible to estimate 80% of the trajectory, whereas the original system did not reach 40%. This improvement increased the computation time by only 40 ms. A SLAM system can be very useful for building a surgical interface, since it connects a point touched on the screen with the estimated 3D map, allowing the surgeon to insert augmented reality annotations wherever deemed appropriate.

    Endoscopic navigation using ORB-SLAM for urethral minimally-invasive surgery (ESP: Navegacion endoscopica con ORB-SLAM para cirugia uretral minimamente invasiva)

    Get PDF
    In minimally invasive surgery, an endoscope is moved through a body cavity or organ. The aim of SLAM (Simultaneous Localization And Mapping) is to estimate the 3D map of the cavity or organ being explored and to locate the endoscope with respect to the map. The only information used for this purpose is the video taken by the endoscope. ORB-SLAM is a SLAM system developed at the University of Zaragoza that uses FAST features and ORB descriptors to estimate both the 3D map and the camera pose. The main problem of ORB-SLAM2 is that the system is only robust (with high accuracy and without failures or drift) in scenes that can be assumed rigid and textured, in which it is easy to extract and match visual features. This is not the case with organs, as they are deformable and wet, and it is difficult to visually find reliable features (corners) within them, which can lead to an incorrect map estimation. Abrupt changes in illumination and sudden movements of the endoscope must also be taken into account. In this work, ORB-SLAM2 has been modified to work on sequences recorded by an endoscope. These modifications include reducing the number of points required for tracking, creating a more robust initial map, and pre-processing the images to increase contrast, reduce noise, and avoid detecting points on reflections. In addition, to obtain more robust keypoints, different detector-descriptor combinations have been tested, creating a new vocabulary for each of them. It has been demonstrated that with the modifications made to ORB-SLAM2, the change from FAST-ORB to A-KAZE-ORB, and the new vocabulary, it has been possible to estimate 80% of the trajectory, whereas the original system could not estimate even 40%. This improvement has increased the computation time by only 40 ms. A SLAM system can be very useful in surgical interfaces, since it makes it possible to connect a point on the screen with the estimated 3D map, allowing the surgeon to insert augmented reality annotations in the appropriate places.
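    The pre-processing stage described in this abstract (contrast enhancement plus ignoring specular reflections) can be approximated with a numpy-only sketch. The percentile stretch and saturation threshold below are assumptions for illustration, not the thesis's actual pipeline, which operates inside ORB-SLAM2.

```python
import numpy as np

# Hedged sketch: stretch contrast and mask near-saturated pixels so a
# feature detector does not lock onto specular highlights. The percentile
# range and reflection threshold are assumed values.
def preprocess(gray, low_pct=2, high_pct=98, reflect_thresh=250):
    """Return (contrast-stretched uint8 image, boolean mask of usable pixels)."""
    img = gray.astype(np.float32)
    lo, hi = np.percentile(img, [low_pct, high_pct])
    stretched = np.clip((img - lo) / max(hi - lo, 1e-6), 0.0, 1.0) * 255.0
    usable = gray < reflect_thresh          # exclude specular highlights
    return stretched.astype(np.uint8), usable
```

    In a full system, the `usable` mask would be passed to the keypoint detector so that no features are extracted on reflections.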