
    Action Recognition in Videos: from Motion Capture Labs to the Web

    This paper presents a survey of human action recognition approaches based on visual data recorded from a single video camera. We propose an organizing framework that highlights the evolution of the area, with techniques moving from heavily constrained motion capture scenarios towards more challenging, realistic, "in the wild" videos. The proposed organization is based on the representation used as input for the recognition task, emphasizing the hypotheses assumed and thus the constraints imposed on the type of video that each technique is able to address. Making these hypotheses and constraints explicit makes the framework particularly useful for selecting a method for a given application. Another advantage of the proposed organization is that it allows the newest approaches to be categorized seamlessly alongside traditional ones, while providing an insightful perspective on the evolution of the action recognition task to date. That perspective is the basis for the discussion at the end of the paper, where we also present the main open issues in the area. Comment: preprint submitted to CVIU; survey paper, 46 pages, 2 figures, 4 tables.

    Identification, indexing, and retrieval of cardio-pulmonary resuscitation (CPR) video scenes of simulated medical crisis.

    Medical simulations, in which uncommon clinical situations can be replicated, have been shown to provide more comprehensive training. Simulations involve the use of patient simulators, which are lifelike mannequins. After each session, the physician must manually review and annotate the recordings and then debrief the trainees. This process can be tedious, and retrieval of specific video segments should be automated. In this dissertation, we propose a machine-learning-based approach to detect and classify scenes that involve rhythmic activities such as cardio-pulmonary resuscitation (CPR) in training video sessions simulating medical crises. This application requires different preprocessing techniques from other video applications. In particular, most processing steps require the integration of multiple features such as motion, color, and spatial and temporal constraints. The first step of our approach consists of segmenting the video into shots. This is achieved by extracting color and motion information from each frame and identifying locations where consecutive frames have different features. We propose two different methods to identify shot boundaries: the first is based on simple thresholding, while the second uses unsupervised learning techniques. The second step of our approach consists of selecting one key frame from each shot and segmenting it into homogeneous regions. A few regions of interest are then identified for further processing. These regions are selected based on the type of motion of their pixels and their likelihood of being skin-like regions. The regions of interest are tracked, and a sequence of observations that encodes their motion throughout the shot is extracted. The next step of our approach uses an HMM classifier to discriminate between regions that involve CPR actions and other regions. We experiment with both continuous and discrete HMMs. Finally, to improve the accuracy of our system, we also detect faces in each key frame, track them throughout the shot, and fuse their HMM confidence with the region's confidence. To allow the user to view and analyze the video training session much more efficiently, we have also developed a graphical user interface (GUI) for CPR video scene retrieval and analysis with several desirable features. To validate our proposed approach to detecting CPR scenes, we use one video simulation session recorded by the SPARC group to train the HMM classifiers and learn the system's parameters. We then analyze the proposed system on other video recordings and show that our approach can identify most CPR scenes with few false alarms.
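    The first shot-boundary method above thresholds frame-to-frame feature differences. As a rough sketch of that idea only, using color histograms and OpenCV (the actual features, histogram configuration, and threshold value are not given in the abstract and are assumptions here; the motion features and the unsupervised alternative are omitted):

```python
import cv2

def detect_shot_boundaries(video_path, threshold=0.4):
    """Flag frames whose HSV color histogram differs sharply from the
    previous frame's; each such jump is treated as a shot boundary."""
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Bhattacharyya distance: 0 = identical, 1 = maximally different.
            # The 0.4 cutoff is an arbitrary assumption, not the thesis value.
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries
```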

    Color and Texture Feature Extraction Using Gabor Filter - Local Binary Patterns for Image Segmentation with Fuzzy C-Means

    Image segmentation is fundamental to image analysis and recognition. Segmentation divides an image into several regions of homogeneous pixels, classifying pixels according to features such as color and texture. Color carries far more information than grayscale or binary (black-and-white) images: human vision can distinguish thousands of color combinations and intensities. A clustering method that is easy to apply to segmentation is the Fuzzy C-Means (FCM) algorithm. The features extracted from the image are color and texture: color is represented in the L*a*b* color space, and texture is extracted using Gabor filters. However, Gabor filters perform poorly when the segmented image contains many micro-textures, which reduces segmentation accuracy. To help improve the accuracy on micro-textures, the Local Binary Patterns (LBP) method is used. In experiments, using color features instead of grayscale increased the accuracy rate by 16.54% for Gabor filter textures and 14.57% for LBP. The LBP texture features also help improve segmentation accuracy, although by a small margin: 2% on grayscale images and 0.05% in the L*a*b* color space.
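    A minimal sketch of the feature pipeline described above, reduced to a single Gabor orientation and frequency, with a plain NumPy fuzzy c-means; the cluster count, fuzzifier m, and iteration budget are arbitrary choices, not the paper's settings:

```python
import numpy as np
from skimage.color import rgb2lab, rgb2gray
from skimage.filters import gabor
from skimage.feature import local_binary_pattern

def pixel_features(rgb):
    """Stack L*a*b* color with Gabor and LBP texture responses per pixel."""
    lab = rgb2lab(rgb)
    gray = rgb2gray(rgb)
    gab_real, _ = gabor(gray, frequency=0.3)   # one filter; a real bank uses many
    lbp = local_binary_pattern(gray, P=8, R=1.0, method="uniform")
    feats = np.dstack([lab, gab_real[..., None], lbp[..., None]])
    return feats.reshape(-1, feats.shape[-1])

def fuzzy_c_means(X, c=3, m=2.0, iters=50, seed=0):
    """Plain NumPy fuzzy c-means; returns soft memberships (n_samples, c)."""
    rng = np.random.default_rng(seed)
    u = rng.dirichlet(np.ones(c), size=len(X))          # random soft start
    for _ in range(iters):
        um = u ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]  # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-12
        u = 1.0 / d ** (2.0 / (m - 1.0))                # closer gives higher weight
        u /= u.sum(axis=1, keepdims=True)
    return u

# labels = fuzzy_c_means(pixel_features(image)).argmax(axis=1)
```

    In practice the feature columns should be standardized before clustering so that the color channels do not dominate the texture responses.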

    Measuring concept similarities in multimedia ontologies: analysis and evaluations

    The recent development of large-scale multimedia concept ontologies has provided new momentum for research in the semantic analysis of multimedia repositories. Different methods for generic concept detection have been studied extensively, but the question of how to exploit the structure of a multimedia ontology and existing inter-concept relations has not received similar attention. In this paper, we present a clustering-based method for modeling semantic concepts on low-level feature spaces and study the evaluation of the quality of such models with entropy-based methods. We cover a variety of methods for assessing the similarity of different concepts in a multimedia ontology. We study three ontologies and apply the proposed techniques in experiments involving visual and semantic similarities, manual annotation of video, and concept detection. The results show that modeling inter-concept relations can provide a promising resource for many application areas in semantic multimedia processing.
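    The abstract does not pin down the paper's specific similarity measures, so the following is only an illustrative sketch: two simple inter-concept similarities computed from a binary item-by-concept annotation matrix (the matrix A is a hypothetical input, not the paper's data):

```python
import numpy as np

def concept_similarities(A):
    """A: binary (n_items, n_concepts) matrix; A[i, j] = 1 means item i is
    annotated with concept j. Returns cosine and Jaccard similarity
    matrices between concept columns."""
    A = A.astype(float)
    co = A.T @ A                          # pairwise co-occurrence counts
    counts = np.diag(co)                  # per-concept annotation counts
    cosine = co / (np.sqrt(np.outer(counts, counts)) + 1e-12)
    union = counts[:, None] + counts[None, :] - co
    jaccard = co / (union + 1e-12)
    return cosine, jaccard
```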

    Goal Detection in Soccer Video: Role-Based Events Detection Approach

    Soccer video processing and analysis to find critical events, such as goals, has been an important topic of active research in recent years. In this paper, a new role-based framework is proposed for goal event detection that exploits the semantic structure of a soccer game. After a goal, the intensity of the crowd and commentator audio typically increases, the ball is returned to the center of the field, and the camera may zoom in on a player, show the celebrating audience, replay the goal scene, or display a combination of these. The occurrence of a goal event can therefore be detected by analyzing sequences of these roles. The proposed framework consists of four main procedures: (1) detection of the game's critical events using the audio channel; (2) shot-boundary detection and shot classification; (3) selection of candidate events according to the shot type and the presence of the goalmouth in the shot; and (4) detection of the game restarting from the center of the field. A new method for shot classification is also presented within this framework. Experiments show that the proposed method detects goal events with good accuracy and a very low failure rate. DOI: http://dx.doi.org/10.11591/ijece.v4i6.637
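    Procedure (1) relies on a rise in audio intensity after a goal. A minimal sketch of that step, assuming short-time RMS energy with a statistical threshold (the window length and the factor k below are arbitrary choices, not the paper's parameters):

```python
import numpy as np

def loud_segments(samples, rate, win_s=1.0, k=2.0):
    """Return (start, end) times, in seconds, of windows whose RMS energy
    exceeds the clip mean by k standard deviations."""
    win = int(win_s * rate)
    n = len(samples) // win
    frames = samples[: n * win].astype(float).reshape(n, win)
    rms = np.sqrt((frames ** 2).mean(axis=1))
    thresh = rms.mean() + k * rms.std()
    return [(i * win_s, (i + 1) * win_s) for i in np.where(rms > thresh)[0]]
```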

    Bridging semantic gap: learning and integrating semantics for content-based retrieval

    Digital cameras have entered ordinary homes and produced an incredibly large number of photos. As a typical example of a broad image domain, unconstrained consumer photos vary significantly. Unlike professional or domain-specific images, the objects in such photos are ill-posed, occluded, and cluttered, with poor lighting, focus, and exposure. Content-based image retrieval research has yet to bridge the semantic gap between computable low-level information and high-level user interpretation. In this thesis, we address the semantic gap with a structured learning framework that allows modular extraction of visual semantics. Semantic image regions (e.g., face, building, sky) are learned statistically, detected directly from the image without segmentation, reconciled across multiple scales, and aggregated spatially to form a compact semantic index. To circumvent ambiguity and subjectivity in a query, a new query method that allows spatial arrangement of visual semantics is proposed. A query is represented as a disjunctive normal form of visual query terms and processed using fuzzy set operators. A drawback of supervised learning is the manual labeling of regions as training samples. In this thesis, a new learning framework has been developed to discover local semantic patterns and to generate their training samples with minimal human intervention. The discovered patterns can be visualized and used in semantic indexing. In addition, three new class-based indexing schemes are explored. The winner-take-all scheme supports class-based image retrieval. The class-relative scheme and the local classification scheme compute inter-class memberships and local class patterns, respectively, as indexes for similarity matching. A Bayesian formulation is proposed to unify local and global indexes in image comparison and ranking, resulting in retrieval performance superior to that of the single indexes. Query-by-example experiments on 2400 consumer photos with 16 semantic queries show that the proposed approaches achieve significantly better (18% to 55%) average precision than a high-dimensional feature fusion approach. The thesis paves two promising research directions, namely the semantics design approach and the semantics discovery approach. They form elegant dual frameworks that exploit pattern classifiers in learning and integrating local and global image semantics.
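    The thesis does not state which fuzzy set operators are used, but with the standard Zadeh choices (AND = min, OR = max, NOT x = 1 - x), evaluating a disjunctive-normal-form query over per-image concept confidences might look like the sketch below; the concept names are hypothetical:

```python
def fuzzy_dnf(query, scores):
    """query: list of conjunctions, each a list of (concept, negated) pairs.
    scores: dict mapping concept name to a membership value in [0, 1].
    Zadeh operators: AND = min, OR = max, NOT x = 1 - x."""
    def term(concept, negated):
        s = scores.get(concept, 0.0)
        return 1.0 - s if negated else s
    return max(min(term(c, neg) for c, neg in conj) for conj in query)

# (face AND NOT building) OR sky, over hypothetical region confidences
query = [[("face", False), ("building", True)], [("sky", False)]]
print(fuzzy_dnf(query, {"face": 0.8, "building": 0.1, "sky": 0.4}))  # 0.8
```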

    Semantic vision agent for robotics

    Semantic vision is an important line of research in computer vision. The keyword "semantic" implies the extraction not only of visual features (color, shape, texture) but also of any kind of higher-level information. In particular, semantic vision seeks to understand or interpret images of scenes in terms of the objects present and the possible relations between them. One of the main current application areas is robotics. Since the world around us is extremely visual, interaction between a non-specialized human user and a robot requires the robot to be able to detect, recognize, and understand any kind of visual cue provided in the communication between user and robot. To make this possible, a learning phase is needed, in which various categories of objects are learned by the robot. After this process, the robot is able to recognize new instances of the previously learned categories. We developed a new semantic vision agent that uses image search web services to learn a set of general categories from their names alone. The work took as its starting point the UA@SRVC agent, previously developed at the University of Aveiro for participation in the Semantic Robot Vision Challenge. The work began with the development of a new technique for segmenting objects based on their edges and on color diversity. The UA@SRVC agent's semantic search and training-image selection technique was then revised and reimplemented using, among other components, the new segmentation module. Finally, new classifiers were developed for object recognition. We learned that, even with little prior information about an object, it is possible to segment it correctly using a simple heuristic that combines color diversity and the distance between segments. Drawing on a conceptual clustering technique, we can create a voting system that enables a good selection of instances for training the categories. We also conclude that different classifiers are most effective depending on whether the learning phase is supervised or automated.
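    The voting-based selection of training images can be approximated as follows. This sketch substitutes k-means for the agent's conceptual clustering and uses an invented score (cluster size minus distance to centroid), so it illustrates the idea of favoring densely clustered candidates rather than reproducing the agent's actual method:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_training_images(features, n_clusters=5, keep=10):
    """Cluster candidate images' feature vectors and keep those that sit
    near the centers of large clusters; isolated outliers score poorly."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(features)
    sizes = np.bincount(km.labels_, minlength=n_clusters)
    dists = np.linalg.norm(features - km.cluster_centers_[km.labels_], axis=1)
    score = sizes[km.labels_] - dists      # big cluster, close to its center
    return np.argsort(score)[::-1][:keep]  # indices of the best candidates
```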