21 research outputs found

    Robust density modelling using the student's t-distribution for human action recognition

    The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model, since it is highly sensitive to outliers. The Gaussian distribution is also often used as the base component of graphical models for recognising human actions in videos (hidden Markov models and others), and the presence of outliers can significantly affect recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM which uses mixtures of t-distributions as observation probabilities and show that experiments over two well-known datasets (Weizmann, MuHAVi) yield a remarkable improvement in classification accuracy. © 2011 IEEE
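    The robustness claim in this abstract can be illustrated with a minimal sketch that compares the log-density each distribution assigns to a point far from the mean. This is not the paper's HMM or mixture model; the function names, the univariate setting, and the choice of 3 degrees of freedom are assumptions made for illustration only.

```python
import math


def gaussian_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a univariate Gaussian with mean mu and std sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def student_t_logpdf(x, mu=0.0, sigma=1.0, nu=3.0):
    """Log-density of a location-scale Student's t with nu degrees of freedom.

    nu=3.0 is an illustrative choice, not a value taken from the paper.
    """
    z = (x - mu) / sigma
    log_norm = (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
    )
    return log_norm - 0.5 * (nu + 1.0) * math.log1p(z * z / nu)


# Compare how strongly each model penalises points increasingly far from the mean.
for x in (0.0, 2.0, 10.0):
    print(x, gaussian_logpdf(x), student_t_logpdf(x))
```

    The Gaussian log-density falls quadratically with distance from the mean, while the t log-density falls only logarithmically, so a single outlier pulls a maximum-likelihood fit of the t model far less.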

    Smart and Secure Augmented Reality for Assisted Living

    Augmented reality (AR) is one of the biggest technology trends; it lets people see their real surroundings with a layer of virtual information overlaid on them. Assistive devices use this combination of information to help people better understand the environment and consequently be more efficient. In particular, AR has been extremely useful in the area of Ambient Assisted Living (AAL). AR-based AAL solutions are designed to support people in maintaining their autonomy and to compensate for slight physical and mental restrictions by instructing them on everyday tasks. Discovering visual attention for assistive purposes is a major challenge, since in dynamic, cluttered environments objects constantly overlap and partial occlusion is frequent. Current solutions use egocentric object recognition techniques. However, their lack of accuracy affects the system's ability to predict users' needs and consequently to provide them with proper support. Another issue is the way sensitive data is handled. This highly private information is crucial for improving the quality of healthcare services. However, current blockchain approaches are used only as permission management systems, while the data is still stored locally. As a result, there is a potential risk of security breaches. Privacy risk in the blockchain domain is also a concern. Because most research tackles privacy issues with off-chain approaches, there is a lack of effective solutions for on-chain data privacy. Finally, blockchain size has been shown to be a limiting factor even for chains that store simple transactional data, let alone the massive blocks that would be required to store medical imaging studies. To tackle these issues, this research proposes a framework that provides a smarter and more secure AR-based solution for AAL.
Firstly, head-worn eye-tracker cameras are combined with egocentric video to improve the accuracy of visual-attention object recognition in free-living settings. A heuristic function is designed to generate a probability estimate of visual attention over the objects in an egocentric video. Secondly, a novel methodology is introduced for storing large volumes of sensitive AR-based AAL data in a decentralized fashion, leveraging the IPFS (InterPlanetary File System) protocol to address the blockchain's limited storage capacity. Meanwhile, a solution on the Secret Network blockchain is developed to address the existing lack of privacy in smart contracts; it provides data privacy at both the transactional and computational levels. In addition, a new off-chain solution that incorporates a governing body for permission management is included to address the loss or theft of private keys. The research findings show that the visual-attention object detection approach is applicable to cluttered environments and outperforms current methods. This study also produced an egocentric indoor dataset annotated with human fixations during natural exploration of a cluttered environment. Compared with previous work, this dataset is more realistic because it was recorded in real settings, with variation in object overlap and object size. With respect to the novel decentralized storage methodology, results indicate that sensitive data can be stored and queried efficiently using the Secret Network blockchain. The proposed approach achieves both computational and transactional privacy at significantly lower cost. Additionally, it mitigates the risk of permanently losing access to a patient's on-chain data records.
The proposed framework can be applied as an assistive technology in a wide range of sectors that require AR-based solutions with high-precision visual-attention object detection, efficient data access, high-integrity data storage, and full data privacy and security.

    Street Scenes : towards scene understanding in still images

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 171-182). This thesis describes an effort to construct a scene understanding system that is able to analyze the content of real images. While constructing the system we had to provide solutions to many of the fundamental questions that every student of object recognition deals with daily. These include the choice of data set, the choice of success measurement, the representation of the image content, the selection of inference engine, and the representation of the relations between objects. The main test-bed for our system is the CBCL StreetScenes database. It is a carefully labeled set of images, much larger than any similar data set available at the time it was collected. Each image in this data set was labeled for 9 common classes such as cars, pedestrians, roads and trees. Our system represents each image using a set of features that are based on a model of the human visual system constructed in our lab. We demonstrate that this biologically motivated image representation, along with its extensions, constitutes an effective representation for object detection, facilitating unprecedented levels of detection accuracy. Similarly to biological vision systems, our system uses hierarchical representations. We therefore explore the possible ways of combining information across the hierarchy into the final perception. Our system is trained using standard machine learning machinery, which was first applied to computer vision in earlier work of Prof. Poggio and others. We demonstrate how the same standard methods can be used to model relations between objects in images as well, capturing context information.
The resulting system detects and localizes, using a unified set of tools and image representations, compact objects such as cars, amorphous objects such as trees and roads, and the relations between objects within the scene. The same representation also excels in identifying objects in clutter without scanning the image. Much of the work presented in the thesis was devoted to a rigorous comparison of our system to alternative object recognition systems. The results of these experiments support the effectiveness of simple feed-forward systems for the basic tasks involved in scene understanding. We make our results fully available to the public by publishing our code and data sets in hope that others may improve and extend our results. By Stanley Michael Bileschi. Ph.D.

    Human face detection techniques: A comprehensive review and future research directions

    Face detection, an effortless task for humans, is complex to perform on machines. The recent proliferation of computational resources is paving the way for rapid advancement of face detection technology. Many well-designed algorithms have been proposed to detect faces. However, little attention has been paid to making a comprehensive survey of the available algorithms. This paper provides a fourfold discussion of face detection algorithms. First, we explore a wide variety of available face detection algorithms in five steps: history, working procedure, advantages, limitations, and use in fields other than face detection. Secondly, we include a comparative evaluation of the different algorithms within each method. Thirdly, we provide detailed comparisons among the algorithms to give an all-inclusive outlook. Lastly, we conclude with several promising research directions to pursue. Earlier survey papers on face detection algorithms are limited to technical details and popular algorithms. In our study, however, we cover detailed technical explanations of face detection algorithms and various recent sub-branches of neural networks. We present detailed comparisons among the algorithms, both overall and within sub-branches. We provide the strengths and limitations of these algorithms and a novel literature survey that includes their use beyond face detection.

    Computer vision-based structural assessment exploiting large volumes of images

    Visual assessment is a process to understand the state of a structure based on evaluations originating from visual information. Recent advances in computer vision that exploit new sensors, sensing platforms and high-performance computing have shed light on the potential for vision-based visual assessment of civil engineering structures. The use of low-cost, high-resolution visual sensors in conjunction with mobile and aerial platforms can overcome spatial and temporal limitations typically associated with other forms of sensing in civil structures. Also, GPU-accelerated and parallel computing offer unprecedented speed and performance, accelerating the processing of the collected visual data. However, despite the enormous endeavor in past research to implement such technologies, there are still many practical challenges to overcome to successfully apply these techniques in real-world situations. A major challenge lies in dealing with a large volume of unordered and complex visual data, collected under uncontrolled circumstances (e.g. lighting, cluttered regions, and variations in environmental conditions), of which only a tiny fraction is useful for conducting the actual assessment. Such difficulty induces an undesirably high rate of false-positive and false-negative errors, reducing the trustworthiness and efficiency of their implementation. To overcome the inherent challenges in using such images for visual assessment, high-level computer vision algorithms must be integrated with relevant prior knowledge and guidance, aiming for performance similar to that of humans conducting visual assessment. Moreover, the techniques must be developed and validated in the realistic context of a large volume of real-world images, which are likely to contain numerous practical challenges.
In this dissertation, the novel use of computer vision algorithms is explored to address two promising applications of vision-based visual assessment in civil engineering: visual inspection, and visual data analysis for post-disaster evaluation. For both applications, powerful techniques are developed to enable reliable and efficient visual assessment of civil structures, and they are demonstrated using a large volume of real-world images collected from actual structures. State-of-the-art computer vision techniques, such as structure-from-motion and convolutional neural networks, facilitate these tasks. The core techniques derived from this study are scalable and expandable to many other applications in vision-based visual assessment, and will serve to close the existing gaps between past research efforts and real-world implementations.

    Graph matching using position coordinates and local features for image analysis

    Finding the correspondences between two images is a crucial problem in computer vision and pattern recognition. It is relevant to a wide range of purposes, from object recognition applications in biometrics, document analysis, and shape analysis to multiple-view geometry applications such as pose recovery, structure from motion, and localization and mapping. Most existing techniques approach this problem using either local image features or point-set registration methods (or a mixture of both). In the former, a sparse set of features is first extracted from the images and then characterized as descriptor vectors using local image evidence. Features are associated according to the similarity between their descriptors. In the latter, the feature sets are treated as point sets, which are associated using nonlinear optimization techniques. These are iterative procedures that estimate the correspondence and alignment parameters in alternating steps. Graphs are representations that capture binary relations between features. Taking binary relations into account in the correspondence problem often leads to the so-called graph matching problem. A number of methods in the literature aim to find approximate solutions to different instances of the graph matching problem, which in most cases is NP-hard. Part of our work investigates the benefits of cross-bin measures for comparing local image features. The main body of this thesis is devoted to formulating both the image feature association and point-set registration problems as instances of the graph matching problem.
In all cases we propose approximate algorithms to solve these problems and compare them against a number of existing methods from different areas, such as outlier rejectors, point-set registration methods, and other graph matching methods. The experiments show that in most cases the proposed methods outperform the rest. Occasionally the proposed methods either share the best performance with a competing method or obtain slightly worse results. In these cases, the proposed methods usually have lower computation times.
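    The local-feature baseline described in this abstract, in which features are associated by descriptor similarity, can be sketched as follows. This is not the thesis's graph matching algorithm: the mutual nearest-neighbour rule, the Euclidean distance, the function names, and the toy two-dimensional descriptors are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, candidates):
    """Index of the candidate descriptor closest to the query."""
    return min(range(len(candidates)), key=lambda j: euclidean(query, candidates[j]))


def mutual_nearest_matches(desc_a, desc_b):
    """Keep pair (i, j) only if each descriptor is the other's nearest neighbour."""
    matches = []
    for i, descriptor in enumerate(desc_a):
        j = nearest(descriptor, desc_b)
        if nearest(desc_b[j], desc_a) == i:
            matches.append((i, j))
    return matches


# Toy descriptors: the last entry of desc_a has no true counterpart in desc_b.
desc_a = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (5.5, 5.5)]
desc_b = [(1.1, 0.9), (0.1, -0.1), (5.2, 4.8)]
print(mutual_nearest_matches(desc_a, desc_b))
```

    Matching of this kind uses only unary (per-feature) evidence. The binary relations between features mentioned in the abstract are what a graph matching formulation adds on top of it.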

    Image and Video Forensics

    Images and videos have become the main modalities of information exchanged in everyday life, and their pervasiveness has led the image forensics community to question their reliability, integrity, confidentiality, and security. Multimedia content is generated in many different ways through the use of consumer electronics and high-quality digital imaging devices, such as smartphones, digital cameras, tablets, and wearable and IoT devices. The ever-increasing convenience of image acquisition has facilitated instant distribution and sharing of digital images on social platforms, generating a large volume of exchanged data. Moreover, the pervasiveness of powerful image editing tools has allowed the manipulation of digital images for malicious or criminal ends, up to the creation of synthesized images and videos with the use of deep learning techniques. In response to these threats, the multimedia forensics community has produced major research efforts regarding the identification of the source and the detection of manipulation. In all cases where images and videos serve as critical evidence (e.g., forensic investigations, fake news debunking, information warfare, and cyberattacks), forensic technologies that help to determine the origin, authenticity, and integrity of multimedia content can become essential tools. This book aims to collect a diverse and complementary set of articles that demonstrate new developments and applications in image and video forensics to tackle new and serious challenges to ensure media authenticity.