14 research outputs found

    Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

    This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography, to propose regions of interest in which to find objects, and recursive Bayesian filtering, to integrate observations over time. The proposal is evaluated on six virtual indoor environments, accounting for the detection of nine object classes over a total of ∼7k frames. Results show that our proposal improves the recall and the F1-score by factors of 1.41 and 1.27, respectively, and achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of a small time overhead (120 ms) and a precision loss (0.92).
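
    The recursive Bayesian filtering idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the three classes, the per-frame detector scores, and the function names `bayes_update` and `entropy` are all assumptions made for illustration. It only shows how fusing per-frame class scores for one tracked object can sharpen the class belief and lower its entropy.

```python
import math


def bayes_update(prior, likelihood):
    """One recursive Bayesian step: posterior is proportional to likelihood * prior."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("likelihood assigns zero probability to every class")
    return [u / total for u in unnormalized]


def entropy(belief):
    """Shannon entropy of a categorical belief, in bits."""
    return -sum(p * math.log2(p) for p in belief if p > 0)


# Hypothetical data: three object classes, three consecutive frames.
# Each inner list is an assumed detector score vector for the same object.
belief = [1 / 3, 1 / 3, 1 / 3]  # uniform prior before any observation
frame_scores = [
    [0.6, 0.3, 0.1],
    [0.7, 0.2, 0.1],
    [0.5, 0.4, 0.1],
]

history = [entropy(belief)]  # entropy before and after each frame
for scores in frame_scores:
    belief = bayes_update(belief, scores)
    history.append(entropy(belief))
```

    In this toy run the belief concentrates on the first class as frames accumulate, so the final entropy is lower than the initial one. In the paper's setting, associating detections across frames would additionally rely on the camera motion (planar homography), which this sketch leaves out.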

    Multimedia Forensics

    This book is open access. Media forensics has never been more relevant to societal life. Not only does media content represent an ever-increasing share of the data traveling on the net and the preferred means of communication for most users, it has also become an integral part of the most innovative applications in the digital information ecosystem that serves various sectors of society, from entertainment to journalism to politics. Undoubtedly, the advances in deep learning and computational imaging contributed significantly to this outcome. The underlying technologies that drive this trend, however, also pose a profound challenge in establishing trust in what we see, hear, and read, and make media content the preferred target of malicious attacks. In this new threat landscape, powered by innovative imaging technologies and sophisticated tools based on autoencoders and generative adversarial networks, this book fills an important gap. It presents a comprehensive review of state-of-the-art forensics capabilities that relate to media attribution, integrity and authenticity verification, and counter forensics. Its content is developed to provide practitioners, researchers, photo and video enthusiasts, and students with a holistic view of the field.

    AI-Generated Images as Data Source: The Dawn of Synthetic Era

    The advancement of visual intelligence is intrinsically tethered to the availability of large-scale data. In parallel, generative Artificial Intelligence (AI) has unlocked the potential to create synthetic images that closely resemble real-world photographs. This prompts a compelling inquiry: how much could visual intelligence benefit from the advance of generative AI? This paper explores the innovative concept of harnessing these AI-generated images as new data sources, reshaping traditional modeling paradigms in visual intelligence. In contrast to real data, AI-generated data exhibit remarkable advantages, including unmatched abundance and scalability, the rapid generation of vast datasets, and the effortless simulation of edge cases. Building on the success of generative AI models, we examine the potential of their generated data in a range of applications, from training machine learning models to simulating scenarios for computational modeling, testing, and validation. We probe the technological foundations that support this groundbreaking use of generative AI, engaging in an in-depth discussion of the ethical, legal, and practical considerations that accompany this transformative paradigm shift. Through an exhaustive survey of current technologies and applications, this paper presents a comprehensive view of the synthetic era in visual intelligence. A project associated with this paper can be found at https://github.com/mwxely/AIGS. Comment: 20 pages, 11 figures

    Estimating and understanding motion : from diagnostic to robotic surgery

    Estimating and understanding motion from an image sequence is a central topic in computer vision. Interest in this topic is high because we live in a world where many of the events that occur in the environment are dynamic. This makes motion estimation and understanding a natural component and a key factor in a wide range of applications, including object recognition, 3D shape reconstruction, autonomous navigation and medical diagnosis. In particular, we focus on the medical domain, in which understanding the human body for clinical purposes requires retrieving the organs' complex motion patterns, which is in general a hard problem when using only image data. In this thesis, we address this problem by posing the question: how can a realistic motion estimation be achieved to offer a better clinical understanding? We focus this thesis on answering this question by using a variational formulation as a basis to understand one of the most complex motions in the human body, the heart motion, through three different applications: (i) cardiac motion estimation for diagnosis, (ii) force estimation and (iii) motion prediction, the latter two for robotic surgery. Firstly, we focus on a central topic in cardiac imaging, the estimation of cardiac motion. The main aim is to offer physicians objective and understandable measures to help them in the diagnosis of cardiovascular diseases. We employ ultrafast ultrasound data and tools for imaging motion drawn from diverse areas, such as low-rank analysis and variational deformation, to perform a realistic cardiac motion estimation. The significance is that by taking low-rank data with carefully chosen penalization, synergies in this complex variational problem can be created. We demonstrate how our proposed solution deals with complex deformations through careful numerical experiments using realistic and simulated data. 
We then move from diagnosis to robotic surgery, where surgeons perform delicate procedures remotely through robotic manipulators without directly interacting with the patients. As a result, they lack force feedback, which is an important primary sense for increasing surgeon-patient transparency and for avoiding injuries and high mental workload. To solve this problem, we follow the conservation principles of continuum mechanics, in which it is clear that the change in shape of an elastic object is directly proportional to the force applied. Thus, we create a variational framework to acquire the deformation that the tissues undergo due to an applied force. This information is then used in a learning system to find the nonlinear relationship between the given data and the applied force. We carried out experiments with in-vivo and ex-vivo data and combined statistical, graphical and perceptual analyses to demonstrate the strength of our solution. Finally, we explore robotic cardiac surgery, which allows complex procedures to be carried out, including Off-Pump Coronary Artery Bypass Grafting (OPCABG). This procedure avoids the complications associated with using Cardiopulmonary Bypass (CPB), since the surgery is performed on a beating heart that is not arrested. Thus, surgeons have to deal with a dynamic target that compromises their dexterity and the surgery's precision. To compensate for the heart motion, we propose a solution composed of three elements: an energy function to estimate the 3D heart motion, a specular highlight detection strategy and a prediction approach for increasing the robustness of the solution. We evaluate our solution using phantom and realistic datasets. 
We conclude the thesis by reporting our findings on these three applications and highlighting the dependency between motion estimation and motion understanding in any dynamic event, particularly in clinical scenarios. Postprint (published version)

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model deals with face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of nonverbal social signals, in addition to the spoken messages. Social signals have to be interpreted through a proper recognition phase that considers visual and audio information. The Brunswick model allows the quality of the interaction to be evaluated quantitatively, using statistical tools that measure how effective the recognition phase is. In this paper we cast this theory to the case in which one of the interactants is a robot; in this case, the recognition phases performed by the robot and the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source low-cost robotic head platform, where gaze is the social signal considered.

    Image and Video Forensics

    Nowadays, images and videos have become the main modalities of information exchanged in everyday life, and their pervasiveness has led the image forensics community to question their reliability, integrity, confidentiality, and security. Multimedia content is generated in many different ways through the use of consumer electronics and high-quality digital imaging devices, such as smartphones, digital cameras, tablets, and wearable and IoT devices. The ever-increasing convenience of image acquisition has facilitated the instant distribution and sharing of digital images on digital social platforms, generating a great amount of exchanged data. Moreover, the pervasiveness of powerful image editing tools has allowed the manipulation of digital images for malicious or criminal ends, up to the creation of synthesized images and videos with the use of deep learning techniques. In response to these threats, the multimedia forensics community has produced major research efforts regarding the identification of the source and the detection of manipulation. In all cases where images and videos serve as critical evidence (e.g., forensic investigations, fake news debunking, information warfare, and cyberattacks), forensic technologies that help to determine the origin, authenticity, and integrity of multimedia content can become essential tools. This book aims to collect a diverse and complementary set of articles that demonstrate new developments and applications in image and video forensics to tackle new and serious challenges in ensuring media authenticity.

    Deep Learning in Medical Image Analysis

    The accelerating power of deep learning in diagnosing diseases will empower physicians and speed up decision making in clinical environments. Applications of modern medical instruments and the digitalization of medical care have generated enormous amounts of medical images in recent years. In this big data arena, new deep learning methods and computational models for efficient processing, analysis, and modeling of the generated data are crucially important for clinical applications and for understanding the underlying biological processes. This book presents and highlights novel algorithms, architectures, techniques, and applications of deep learning for medical image analysis.