155 research outputs found

    Single-Image Depth Prediction Makes Feature Matching Easier

    Good local features improve the robustness of many 3D re-localization and multi-view reconstruction pipelines. The problem is that viewing angle and distance severely impact the recognizability of a local feature. Attempts to improve appearance invariance by choosing better local feature points or by leveraging outside information have come with prerequisites that made some of them impractical. In this paper, we propose a surprisingly effective enhancement to local feature extraction, which improves matching. We show that CNN-based depths inferred from single RGB images are quite helpful, despite their flaws. They allow us to pre-warp images and rectify perspective distortions, which significantly enhances SIFT and BRISK features and yields more good matches, even when cameras view the same scene from opposite directions. Comment: 14 pages, 7 figures, accepted for publication at the European Conference on Computer Vision (ECCV) 202
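    The pre-warping step can be pictured as follows, assuming a locally planar surface and known pinhole intrinsics (fx, fy, cx, cy): back-project pixels with their predicted depths, estimate the plane's normal, and warp with the rotation-only homography H = K R K^-1 that turns a virtual camera to face the plane head-on. This is a minimal sketch of that generic idea, not the authors' implementation; all function names and intrinsics are hypothetical.

```python
import math

# Minimal 3-vector / 3x3 helpers (pure Python, no dependencies).
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def matvec(A, x):
    return tuple(dot(row, x) for row in A)

def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (u, v) with z-depth into camera coordinates."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)

def plane_normal(p0, p1, p2):
    """Unit normal of the plane through three 3D points, oriented away from the camera."""
    n = normalize(cross(sub(p1, p0), sub(p2, p0)))
    return n if dot(n, p0) > 0 else tuple(-x for x in n)

def rotation_between(a, b):
    """Rodrigues rotation taking unit vector a onto unit vector b (undefined if a == -b)."""
    v = cross(a, b)
    c = dot(a, b)
    vx = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]
    vx2 = matmul(vx, vx)
    k = 1.0 / (1.0 + c)
    return [[(1.0 if i == j else 0.0) + vx[i][j] + k * vx2[i][j] for j in range(3)]
            for i in range(3)]

def rectifying_homography(normal, fx, fy, cx, cy):
    """H = K R K^-1, where R rotates the plane normal onto the optical axis (0, 0, 1)."""
    K = [[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]]
    K_inv = [[1.0 / fx, 0.0, -cx / fx], [0.0, 1.0 / fy, -cy / fy], [0.0, 0.0, 1.0]]
    R = rotation_between(normal, (0.0, 0.0, 1.0))
    return matmul(K, matmul(R, K_inv))

def warp_point(H, u, v):
    """Map pixel (u, v) through homography H (homogeneous divide)."""
    x, y, w = matvec(H, (u, v, 1.0))
    return x / w, y / w
```

After the rotation, every point on the plane has the same depth, so the plane appears fronto-parallel and local patches lose most of their perspective foreshortening. A practical version would fit the plane robustly to many back-projected pixels (for example with RANSAC) instead of three, and resample whole image regions through H.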

    Deep Learning-Based Human Pose Estimation: A Survey

    Human pose estimation aims to locate the human body parts and build a human body representation (e.g., a body skeleton) from input data such as images and videos. It has drawn increasing attention during the past decade and has been utilized in a wide range of applications including human-computer interaction, motion analysis, augmented reality, and virtual reality. Although recently developed deep learning-based solutions have achieved high performance in human pose estimation, challenges remain due to insufficient training data, depth ambiguities, and occlusion. The goal of this survey paper is to provide a comprehensive review of recent deep learning-based solutions for both 2D and 3D pose estimation via a systematic analysis and comparison of these solutions based on their input data and inference procedures. More than 240 research papers since 2014 are covered in this survey. Furthermore, 2D and 3D human pose estimation datasets and evaluation metrics are included. Quantitative performance comparisons of the reviewed methods on popular datasets are summarized and discussed. Finally, the open challenges, applications, and future research directions are discussed. We also provide a regularly updated project page: https://github.com/zczcwh/DL-HPE

    Automatic driving: 2D detection and tracking using artificial intelligence techniques

    Integrated master's dissertation in Informatics Engineering. Road accidents are estimated to cause millions of deaths and tens of millions of injuries every year. For this reason, any measure that reduces the probability or severity of accidents will save lives. Speeding, driving under the influence of psychotropic substances, and distraction are leading causes of road accidents. These causes can be classified as human, since they all stem from driver error. Autonomous driving is a potential solution to this problem, as it can reduce road accidents by removing human error from the task of driving. This dissertation aims to study Artificial Intelligence techniques and Edge Computing networks to explore solutions for autonomous driving. To this end, it explores Artificial Intelligence models for object detection and tracking based on Machine Learning and Computer Vision, as well as Edge Computing networks for vehicles. The YOLOv5 model was studied for object detection, applying different training parameters and data pre-processing techniques. For object tracking, the StrongSORT model was chosen, and its performance was evaluated for different combinations of its components. Finally, the Simu5G simulation tool was studied in order to simulate an edge computing network, and the viability of this type of network for supporting autonomous driving was analysed.
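    Tracking-by-detection methods such as StrongSORT link each frame's detections to existing tracks. The sketch below shows only the simplest form of that association step, greedy matching on bounding-box intersection-over-union (IoU). StrongSORT itself combines appearance features with Kalman-filter motion cues in a global assignment, so this is a simplified stand-in, not the model evaluated in the dissertation; the (x1, y1, x2, y2) box format and the 0.3 threshold are assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def associate(tracks: List[Box], detections: List[Box], min_iou: float = 0.3):
    """Greedy association: repeatedly accept the highest-IoU unused (track, detection) pair."""
    pairs = sorted(
        ((iou(t, d), ti, di) for ti, t in enumerate(tracks) for di, d in enumerate(detections)),
        reverse=True,
    )
    used_tracks, used_dets, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if ti in used_tracks or di in used_dets:
            continue
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)
    unmatched_tracks = [i for i in range(len(tracks)) if i not in used_tracks]
    unmatched_dets = [i for i in range(len(detections)) if i not in used_dets]
    return matches, unmatched_tracks, unmatched_dets
```

Unmatched detections would typically start new tracks and unmatched tracks would age out after a few frames; replacing the greedy loop with an optimal assignment solver and adding an appearance distance moves the sketch closer to SORT-family trackers.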

    A Cordial Sync: Going Beyond Marginal Policies for Multi-Agent Embodied Tasks

    Autonomous agents must learn to collaborate. It is not scalable to develop a new centralized agent every time a task's difficulty outpaces a single agent's abilities. While multi-agent collaboration research has flourished in gridworld-like environments, relatively little work has considered visually rich domains. Addressing this, we introduce the novel task FurnMove, in which agents work together to move a piece of furniture through a living room to a goal. Unlike existing tasks, FurnMove requires agents to coordinate at every timestep. We identify two challenges when training agents to complete FurnMove: existing decentralized action sampling procedures do not permit expressive joint action policies, and, in tasks requiring close coordination, failed actions vastly outnumber successful ones. To confront these challenges we introduce SYNC-policies (synchronize your actions coherently) and CORDIAL (coordination loss). Using SYNC-policies and CORDIAL, our agents achieve a 58% completion rate on FurnMove, an absolute gain of 25 percentage points over competitive decentralized baselines. Our dataset, code, and pretrained models are available at https://unnat.github.io/cordial-sync. Comment: Accepted to ECCV 2020 (spotlight).
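    The limitation of independent (marginal) action sampling can be made concrete with two agents that should either both move left or both move right. If each agent samples from its own marginal, the joint distribution is the product p1 ⊗ p2, which cannot place all its mass on the coordinated pairs. The sketch below follows one reading of the SYNC idea: a mixture of K marginal components, where each agent locally selects the same component index from shared randomness (here, a shared seed) and then samples its own action. It is an illustration under those assumptions, not the released code; every name is hypothetical.

```python
import random
from itertools import product

def marginal_joint(p1, p2):
    """Independent policies: the joint distribution is the outer product p1 x p2."""
    return {(a, b): p1[a] * p2[b] for a, b in product(p1, p2)}

def sync_joint(weights, comps1, comps2):
    """Mixture of K marginal components: sum_k w_k * (p1_k x p2_k)."""
    joint = {}
    for w, c1, c2 in zip(weights, comps1, comps2):
        for pair, p in marginal_joint(c1, c2).items():
            joint[pair] = joint.get(pair, 0.0) + w * p
    return joint

def pick_component(weights, shared_seed):
    """Run locally by each agent; the shared seed makes every agent pick the same k."""
    return random.Random(shared_seed).choices(range(len(weights)), weights=weights)[0]

def sample_action(component, rng):
    """Sample one action from a single agent's marginal over actions."""
    actions = list(component)
    return rng.choices(actions, weights=[component[a] for a in actions])[0]

def sync_sample(weights, comps1, comps2, shared_seed, rng):
    k1 = pick_component(weights, shared_seed)  # agent 1, no communication
    k2 = pick_component(weights, shared_seed)  # agent 2, same seed -> k2 == k1
    return sample_action(comps1[k1], rng), sample_action(comps2[k2], rng)
```

With components "always left" and "always right" at weight 0.5 each, the mixture assigns probability 0.5 to (L, L) and (R, R) and zero to mismatched pairs, whereas two independent 50/50 marginals give every pair 0.25, so the agents would collide half the time.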

    A Glance at an Early Stage of Visual Processing (Jeter un regard sur une phase précoce des traitements visuels)

    The aim of this thesis is to investigate the dynamics of the cognitive processing involved in rapid object recognition in natural scenes. To obtain the fastest behavioral responses, we used a saccadic choice task in which subjects had to direct their gaze as fast as possible toward the image containing the target, out of two images displayed simultaneously on the screen. This protocol first revealed differences in processing time between object categories, with a particular advantage for detecting human faces: when faces were the target, the first selective saccades appeared as early as 100 ms after image onset. We therefore examined the mechanisms that allow such fast detection and showed that a low-level attribute might be used to detect and locate faces in the visual field within a fraction of a second. To better understand the nature of the early representations involved, we designed two further studies, which showed that the fastest saccades are not influenced by contextual information and rely on relatively coarse information. Finally, I present a simple decision model, based on differences in neural processing latency between categories, which faithfully reproduces our experimental results. Taken together and set against current knowledge of the neural basis of object recognition, these results show that the saccadic choice task, by giving access to a time window previously inaccessible to behavioral studies, is a tool of choice for future research on rapid object recognition.
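    The decision model is described only as being based on latency differences between neuronal populations. A generic race model captures that idea: each image drives a population whose response latency is noisy, and the saccade goes toward whichever image's population responds first, after a fixed motor delay. The means, spread, and motor delay below are hypothetical placeholders, not values fitted in the thesis.

```python
import random
from dataclasses import dataclass

@dataclass
class RaceModel:
    # All parameters are hypothetical placeholders, in milliseconds.
    target_latency_ms: float = 80.0       # mean latency of the target-driven population
    distractor_latency_ms: float = 100.0  # mean latency of the distractor-driven population
    latency_sd_ms: float = 15.0           # trial-to-trial latency noise (Gaussian)
    motor_delay_ms: float = 20.0          # fixed delay from neural decision to saccade

    def trial(self, rng: random.Random):
        """One trial: the first population to respond wins and determines the saccade."""
        t = rng.gauss(self.target_latency_ms, self.latency_sd_ms)
        d = rng.gauss(self.distractor_latency_ms, self.latency_sd_ms)
        correct = t < d
        reaction_time = min(t, d) + self.motor_delay_ms
        return correct, reaction_time

def simulate(model: RaceModel, n_trials: int = 10000, seed: int = 0):
    """Return (accuracy, mean saccadic reaction time) over simulated trials."""
    rng = random.Random(seed)
    results = [model.trial(rng) for _ in range(n_trials)]
    accuracy = sum(c for c, _ in results) / n_trials
    mean_rt = sum(rt for _, rt in results) / n_trials
    return accuracy, mean_rt
```

In this model, giving a category a shorter mean latency (as a face advantage would) raises accuracy above chance, while equal latencies leave choices at roughly 50%, since neither population systematically wins the race.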

    Visual analytics and artificial intelligence for marketing

    In today's online environments, such as social media platforms and e-commerce websites, consumers are overloaded with information and firms are competing for their attention. Most of the data on these platforms comes in the form of text, images, or other unstructured data sources. It is important to understand which information on company websites and social media platforms is enticing and/or likeable to consumers. The impact of online visual content, in particular, remains largely unknown. Finding the drivers behind likes and clicks can help (1) understand how consumers interact with the information that is presented to them and (2) leverage this knowledge to improve marketing content. The main goal of this dissertation is to learn more about why consumers like and click on visual content online. To reach this goal, visual analytics are used to automatically extract relevant information from visual content. This information can then be related, at scale, to consumers and their decisions.