
    An audio-based sports video segmentation and event detection algorithm

    In this paper, we present an audio-based event detection algorithm shown to be effective when applied to soccer video. The main benefit of this approach is the ability to recognise patterns that display high levels of crowd response correlated to key events. The soundtrack from a soccer sequence is first parameterised using Mel-frequency cepstral coefficients. It is then segmented into homogeneous components using a windowing algorithm with a decision process based on Bayesian model selection. This decision process eliminates the need for defining a heuristic set of rules for segmentation. Each audio segment is then labelled using a series of hidden Markov model (HMM) classifiers, each a representation of one of six predefined semantic content classes found in soccer video. Exciting events are identified as those segments belonging to a crowd-cheering class. Experimentation indicated that the algorithm was more effective at classifying crowd response than traditional model-based segmentation and classification techniques.
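    A minimal sketch of the segmentation step described above, assuming the librosa library for MFCC extraction; the window length, MFCC order, and Gaussian modelling details are illustrative choices, not the values used in the paper.

```python
import numpy as np
import librosa  # assumed available for MFCC extraction

def mfcc_features(path, n_mfcc=13):
    """Parameterise the soundtrack as a (frames x coefficients) matrix."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def delta_bic(window, split):
    """BIC difference between one- and two-Gaussian models of a window.

    A positive value favours placing a segment boundary at `split`.
    """
    n, d = window.shape
    def logdet(x):
        cov = np.cov(x, rowvar=False) + 1e-6 * np.eye(d)
        return np.linalg.slogdet(cov)[1]
    gain = 0.5 * (n * logdet(window)
                  - split * logdet(window[:split])
                  - (n - split) * logdet(window[split:]))
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return gain - penalty

def segment(features, win=300, step=50):
    """Slide a window over the feature matrix; report boundary frame indices."""
    return [start + win // 2
            for start in range(0, len(features) - win, step)
            if delta_bic(features[start:start + win], win // 2) > 0]
```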

    Edge cross-section profile for colonoscopic object detection

    Colorectal cancer is the second leading cause of cancer-related deaths, claiming close to 50,000 lives annually in the United States alone. Colonoscopy is an important screening tool that has contributed to a significant decline in colorectal cancer-related deaths. During colonoscopy, a tiny video camera at the tip of the endoscope generates a video signal of the internal mucosa of the human colon. The video data is displayed on a monitor for real-time diagnosis by the endoscopist. Despite the success of colonoscopy in lowering cancer-related deaths, the miss rate for the detection of both large polyps and cancers is estimated at around 4-12%. As a result, in recent years, many computer-aided object detection techniques have been developed with the ultimate goal of assisting the endoscopist in lowering the polyp miss rate. Automatic object detection in video data recorded during colonoscopy is challenging due to the noisy nature of endoscopic images, caused by camera motion, strong light reflections, the wide-angle lens that cannot be automatically focused, and the variation in location and appearance of objects within the colon. The unique characteristics of colonoscopy video require new image/video analysis techniques. This dissertation presents our investigation of the edge cross-section profile (ECSP), a local appearance model, for colonoscopic object detection. We propose several methods to derive new features on the ECSP from its surrounding region pixels, its first-order derivative profile, and its second-order derivative profile. These ECSP features describe discriminative patterns for different types of objects in colonoscopy. The new algorithms and software using the ECSP features can effectively detect three representative types of objects and extract the corresponding semantic units, in terms of both accuracy and analysis time. The main contributions of the dissertation are summarized as follows. The dissertation presents 1) a new ECSP calculation method and a feature-based ECSP method that extracts features on the ECSP for object detection, 2) an edgeless ECSP method that calculates the ECSP without using edges, 3) a part-based multi-derivative ECSP algorithm that segments the ECSP and its first-order and second-order derivative functions into parts and models each part using the method best suited to it, 4) ECSP-based algorithms for detecting three representative types of colonoscopic objects, including appendiceal orifices, endoscopes during retroflexion operations, and polyps, and for extracting videos or segmented shots containing these objects as semantic units, and 5) a software package that implements these techniques and provides meaningful visual feedback on the detected results to the endoscopist. Ideally, we would like the software to provide feedback to the endoscopist before the next video frame becomes available and to process video data at the rate at which the data are captured (typically about 30 frames per second (fps)). This real-time requirement is difficult to achieve using today's affordable off-the-shelf workstations. We aim to achieve near real-time performance, where analysis and feedback complete at a rate of at least 1 fps. The dissertation has the following broad impacts. Firstly, the performance study shows that our proposed ECSP-based techniques are promising, in terms of both detection rate and execution time, for detecting the appearance of the three aforementioned types of objects in colonoscopy video.
Our ECSP-based techniques can be extended both to detect other types of colonoscopic objects, such as diverticula, the lumen, and vessels, and to analyze other endoscopic procedures, such as laparoscopy, upper gastrointestinal endoscopy, wireless capsule endoscopy, and EGD. Secondly, to the best of our knowledge, our polyp detection system is the only computer-aided system that can warn the endoscopist of the appearance of polyps in near real time. Our retroflexion detection system is also the first computer-aided system that can detect retroflexion in near real time. Retroflexion is a maneuver used by the endoscopist to inspect areas of the colon that are hard to reach. The use of our system in future clinical trials may contribute to a decline in the polyp miss rate during live colonoscopy. Our system may also be used as a training platform for novice endoscopists. Lastly, the automatic documentation of detected semantic units of colonoscopic objects can help discover unknown patterns of colorectal cancers or new diseases and serve as an educational resource for endoscopic research.
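    As a rough illustration of the local appearance model described above, the sketch below samples image intensities along the edge normal at an edge pixel and takes first- and second-order derivative profiles from it. The profile length, the Sobel gradients, and the example features are illustrative assumptions, not the dissertation's exact formulation.

```python
import numpy as np
from scipy.ndimage import map_coordinates, sobel

def ecsp(gray, y, x, half_len=10):
    """Sample the edge cross-section profile at edge pixel (y, x)."""
    gy, gx = sobel(gray, axis=0), sobel(gray, axis=1)   # image gradients
    norm = np.hypot(gy[y, x], gx[y, x]) + 1e-9
    ny, nx = gy[y, x] / norm, gx[y, x] / norm           # unit edge normal
    t = np.arange(-half_len, half_len + 1, dtype=float)
    profile = map_coordinates(gray.astype(float),
                              [y + t * ny, x + t * nx], order=1)
    d1 = np.gradient(profile)   # first-order derivative profile
    d2 = np.gradient(d1)        # second-order derivative profile
    return profile, d1, d2

def example_features(profile, d1, d2):
    """A few per-profile features of the kind an ECSP classifier could use."""
    return {"contrast": profile.max() - profile.min(),
            "slope_peak": np.abs(d1).max(),
            "curvature_peak": np.abs(d2).max()}
```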

    Identification, indexing, and retrieval of cardio-pulmonary resuscitation (CPR) video scenes of simulated medical crisis.

    Medical simulations, in which uncommon clinical situations can be replicated, have proven to provide more comprehensive training. Simulations involve the use of patient simulators, which are lifelike mannequins. After each session, the physician must manually review and annotate the recordings and then debrief the trainees. This process can be tedious, and retrieval of specific video segments should be automated. In this dissertation, we propose a machine learning based approach to detect and classify scenes that involve rhythmic activities, such as cardio-pulmonary resuscitation (CPR), in training video sessions simulating medical crises. This application requires different preprocessing techniques from other video applications. In particular, most processing steps require the integration of multiple features, such as motion, color, and spatial and temporal constraints. The first step of our approach consists of segmenting the video into shots. This is achieved by extracting color and motion information from each frame and identifying locations where consecutive frames have different features. We propose two different methods to identify shot boundaries. The first is based on simple thresholding, while the second uses unsupervised learning techniques. The second step of our approach consists of selecting one key frame from each shot and segmenting it into homogeneous regions. Then, a few regions of interest are identified for further processing. These regions are selected based on the type of motion of their pixels and their likelihood of being skin-like regions. The regions of interest are tracked, and a sequence of observations that encodes their motion throughout the shot is extracted. The next step of our approach uses an HMM classifier to discriminate between regions that involve CPR actions and other regions. We experiment with both continuous and discrete HMMs. Finally, to improve the accuracy of our system, we also detect faces in each key frame, track them throughout the shot, and fuse their HMM confidence with the region's confidence. To allow the user to view and analyze the video training session much more efficiently, we have also developed a graphical user interface (GUI) for CPR video scene retrieval and analysis with several desirable features. To validate our proposed approach to detecting CPR scenes, we use one video simulation session recorded by the SPARC group to train the HMM classifiers and learn the system's parameters. We then evaluate the proposed system on other video recordings. We show that our approach can identify most CPR scenes with few false alarms.
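    A minimal sketch of the first, threshold-based shot boundary method described above, assuming OpenCV: consecutive frames are compared by colour histogram distance, and a boundary is declared where the distance exceeds a threshold. The histogram bin counts and the threshold value are illustrative assumptions.

```python
import cv2

def shot_boundaries(video_path, threshold=0.4):
    """Return frame indices where a colour-based shot boundary is detected."""
    cap = cv2.VideoCapture(video_path)
    boundaries, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Bhattacharyya distance: near 0 for similar frames, near 1 for cuts.
            if cv2.compareHist(prev_hist, hist,
                               cv2.HISTCMP_BHATTACHARYYA) > threshold:
                boundaries.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return boundaries
```

    The unsupervised variant would replace the fixed threshold with, for example, a clustering of the frame-to-frame distances into "cut" and "no cut" groups.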

    RGBD Datasets: Past, Present and Future

    Since the launch of the Microsoft Kinect, scores of RGBD datasets have been released. These have propelled advances in areas from reconstruction to gesture recognition. In this paper we explore the field, reviewing datasets across eight categories: semantics, object pose estimation, camera tracking, scene reconstruction, object tracking, human actions, faces, and identification. By extracting relevant information in each category we help researchers to find appropriate data for their needs, and we consider which datasets have succeeded in driving computer vision forward and why. Finally, we examine the future of RGBD datasets. We identify key areas which are currently underexplored, and suggest that future directions may include synthetic data and dense reconstructions of static and dynamic scenes.
    Comment: 8 pages excluding references (CVPR style)

    Tracking by Prediction: A Deep Generative Model for Multi-Person Localisation and Tracking

    Current multi-person localisation and tracking systems have an over-reliance on the use of appearance models for target re-identification, and almost no approaches employ a complete deep learning solution for both objectives. We present a novel, complete deep learning framework for multi-person localisation and tracking. In this context we first introduce a lightweight sequential Generative Adversarial Network architecture for person localisation, which overcomes issues related to occlusions and noisy detections typically found in a multi-person environment. In the proposed tracking framework we build upon recent advances in pedestrian trajectory prediction and propose a novel data association scheme based on predicted trajectories. This removes the need for computationally expensive person re-identification systems based on appearance features and generates human-like trajectories with minimal fragmentation. The proposed method is evaluated on multiple public benchmarks, including both static and dynamic cameras, and achieves outstanding performance, especially among other recently proposed deep neural network based approaches.
    Comment: To appear in IEEE Winter Conference on Applications of Computer Vision (WACV), 2018
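    A minimal sketch of association by prediction, as described above: each track's predicted next position is matched to incoming detections by solving an assignment problem over Euclidean distances. The constant-velocity `predict_next` is a stand-in for the paper's learned trajectory predictor, and the gating distance is an illustrative assumption.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def predict_next(track):
    """Constant-velocity placeholder for a learned trajectory predictor."""
    return track[-1] + (track[-1] - track[-2]) if len(track) > 1 else track[-1]

def associate(tracks, detections, gate=50.0):
    """Match tracks to detections; return (track_index, detection_index) pairs."""
    preds = np.array([predict_next(t) for t in tracks])
    dets = np.asarray(detections, dtype=float)
    cost = np.linalg.norm(preds[:, None, :] - dets[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # Hungarian assignment
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]

# Example: two tracks (arrays of past positions) and two new detections.
tracks = [np.array([[0.0, 0.0], [1.0, 0.0]]),
          np.array([[10.0, 10.0], [10.0, 11.0]])]
print(associate(tracks, [[2.1, 0.0], [10.0, 12.2]]))   # -> [(0, 0), (1, 1)]
```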

    Multimodal segmentation of lifelog data

    A personal lifelog of visual and audio information can be very helpful as a human memory augmentation tool. The SenseCam, a passive wearable camera, used in conjunction with an iRiver MP3 audio recorder, will capture over 20,000 images and 100 hours of audio per week. Used constantly, this quickly builds into a substantial collection of personal data. To gain real value from this collection, it is important to automatically segment the data into meaningful units or activities. This paper investigates the optimal combination of data sources for segmenting personal data into such activities. Five data sources were logged and processed to segment a collection of personal data, namely: image processing on captured SenseCam images; audio processing on captured iRiver audio data; and processing of the temperature, white light level, and accelerometer sensors onboard the SenseCam device. The results indicate that a combination of the image, light, and accelerometer sensor data segments our collection of personal data better than a combination of all five data sources. The accelerometer sensor is good for detecting when the user moves to a new location, while the image and light sensors are good for detecting changes in wearer activity within the same location, as well as detecting when the wearer socially interacts with others.
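    A minimal sketch of the fusion idea described above: per-sensor change signals (image, light, accelerometer) are normalised and summed, and activity boundaries are taken at peaks of the fused signal. The equal weighting, z-score normalisation, and peak threshold are illustrative assumptions rather than the paper's method.

```python
import numpy as np

def zscore(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-9)

def fused_boundaries(image_diff, light_diff, accel_diff, threshold=2.0):
    """Combine per-interval dissimilarity signals; return boundary indices."""
    fused = zscore(image_diff) + zscore(light_diff) + zscore(accel_diff)
    # A boundary is a local peak of the fused signal above the threshold.
    return [i for i in range(1, len(fused) - 1)
            if fused[i] > threshold
            and fused[i] >= fused[i - 1] and fused[i] >= fused[i + 1]]
```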