
    Identification, indexing, and retrieval of cardio-pulmonary resuscitation (CPR) video scenes of simulated medical crisis.

    Medical simulations, where uncommon clinical situations can be replicated, have proven to provide more comprehensive training. Simulations involve the use of patient simulators, which are lifelike mannequins. After each session, the physician must manually review and annotate the recordings and then debrief the trainees. This process can be tedious, and retrieval of specific video segments should be automated. In this dissertation, we propose a machine learning based approach to detect and classify scenes that involve rhythmic activities such as Cardio-Pulmonary Resuscitation (CPR) from training video sessions simulating medical crises. This application requires different preprocessing techniques from other video applications. In particular, most processing steps require the integration of multiple features such as motion, color, and spatial and temporal constraints. The first step of our approach consists of segmenting the video into shots. This is achieved by extracting color and motion information from each frame and identifying locations where consecutive frames have different features. We propose two different methods to identify shot boundaries: the first is based on simple thresholding, while the second uses unsupervised learning techniques. The second step of our approach consists of selecting one key frame from each shot and segmenting it into homogeneous regions. Then, a few regions of interest are identified for further processing. These regions are selected based on the type of motion of their pixels and their likelihood of being skin-like regions. The regions of interest are tracked, and a sequence of observations that encode their motion throughout the shot is extracted. The next step of our approach uses an HMM classifier to discriminate between regions that involve CPR actions and other regions. We experiment with both continuous and discrete HMMs. Finally, to improve the accuracy of our system, we also detect faces in each key frame, track them throughout the shot, and fuse their HMM confidence with the region's confidence. To allow the user to view and analyze the video training session much more efficiently, we have also developed a graphical user interface (GUI) for CPR video scene retrieval and analysis with several desirable features. To validate our proposed approach to detect CPR scenes, we use one video simulation session recorded by the SPARC group to train the HMM classifiers and learn the system's parameters. Then, we analyze the proposed system on other video recordings. We show that our approach can identify most CPR scenes with few false alarms.
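
    Below is a minimal sketch of the simple-thresholding variant of the shot boundary detection step, assuming frames arrive as RGB numpy arrays; the histogram feature, distance measure, and threshold value are illustrative choices, not the dissertation's exact parameters.

        import numpy as np

        def color_histogram(frame, bins=16):
            """Concatenated per-channel color histogram, normalized to sum to 1."""
            hists = [np.histogram(frame[..., c], bins=bins, range=(0, 255))[0]
                     for c in range(frame.shape[-1])]
            h = np.concatenate(hists).astype(float)
            return h / h.sum()

        def detect_shot_boundaries(frames, threshold=0.4):
            """Flag a boundary wherever consecutive frames have dissimilar color features."""
            boundaries = []
            prev = color_histogram(frames[0])
            for i in range(1, len(frames)):
                cur = color_histogram(frames[i])
                # A large L1 distance between consecutive histograms suggests a shot cut.
                if np.abs(cur - prev).sum() > threshold:
                    boundaries.append(i)
                prev = cur
            return boundaries

    In the dissertation, motion features are integrated alongside color, and the second boundary-detection method replaces the fixed threshold with unsupervised learning.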

    Audio-visual football video analysis, from structure detection to attention analysis

    Sport video is an important video genre. Content-based sports video analysis attracts great interest from both industry and academia. A sports video is characterised by repetitive temporal structures, relatively plain contents, and strong spatio-temporal variations, such as quick camera switches and swift local motions. It is necessary to develop specific techniques for content-based sports video analysis to utilise these characteristics. For an efficient and effective sports video analysis system, there are three fundamental questions: (1) what are the key stories of sports videos; (2) what arouses viewers' interest; and (3) how to identify game highlights. This thesis is developed around these questions. We approached these questions from two different perspectives, and in turn three research contributions are presented, namely, replay detection, attack temporal structure decomposition, and attention-based highlight identification. Replay segments convey the most important contents in sports videos, so detecting replay segments is an efficient approach to collecting game highlights. However, replay is an artefact of editing, which improves with advances in video editing tools. The composition of a replay is complex, including logo transitions, slow motions, viewpoint switches and normal-speed video clips. Since logo transition clips are pervasive in game collections of FIFA World Cup 2002, FIFA World Cup 2006 and UEFA Championship 2006, we take logo transition detection as an effective replacement for replay detection. A two-pass system was developed, consisting of a five-layer AdaBoost classifier and logo template matching throughout an entire video. The five-layer AdaBoost classifier uses shot duration, average game pitch ratio, average motion, sequential colour histogram and shot frequency between two neighbouring logo transitions to filter out logo transition candidates. Subsequently, a logo template is constructed and employed to find all logo transition sequences. The precision and recall of this system in replay detection are both 100% on a five-game evaluation collection. An attack structure is a team competition for a score. Hence, this structure is a conceptually fundamental unit of a football video as well as other sports videos. We review the literature on content-based temporal structures, such as the play-break structure, and develop a three-step system for automatic attack structure decomposition. Four content-based shot classes, namely, play, focus, replay and break, were identified from low-level visual features. A four-state hidden Markov model was trained to simulate the transition process among these shot classes. Since attack structures are the longest repetitive temporal unit in a sports video, a suffix tree is proposed to find the longest repetitive substring in the label sequence of shot class transitions. The occurrences of this substring are regarded as the kernel of an attack hidden Markov process. Therefore, the decomposition of attack structure becomes a boundary likelihood comparison between two Markov chains. Highlights are what attract notice. Attention is a psychological measurement of "notice". A brief survey of the psychological background of attention, attention estimation from visual and auditory signals, and multi-modality attention fusion is presented. We propose two attention models for sports video analysis, namely, the role-based attention model and the multiresolution autoregressive framework.
The role-based attention model is based on the perception structure during video watching. This model removes reflection bias among modality salient signals and combines these signals by reflectors. The multiresolution autoregressive (MAR) framework treats salient signals as a group of smooth random processes, which follow a similar trend but are corrupted by noise. This framework estimates a noise-free signal from these coarse noisy observations by multiple-resolution analysis. Related algorithms are developed, such as event segmentation on a MAR tree and real-time event detection. Experiments show that these attention-based approaches can find goal events with high precision. Moreover, the results of MAR-based highlight detection on the final games of FIFA World Cup 2002 and 2006 are highly similar to highlights professionally labelled by the BBC and FIFA.
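
    As an illustration of the attack-decomposition step, the sketch below finds the longest repeated substring in a shot-class label sequence (e.g. P = play, F = focus, R = replay, B = break). The thesis uses a suffix tree; this stand-in uses a simpler but slower sorted-suffix comparison, and the example sequence is invented.

        def longest_repeated_substring(labels: str) -> str:
            """Longest substring that occurs at least twice in the label sequence."""
            suffixes = sorted(labels[i:] for i in range(len(labels)))
            best = ""
            for a, b in zip(suffixes, suffixes[1:]):
                # Longest common prefix of adjacent sorted suffixes.
                k = 0
                while k < min(len(a), len(b)) and a[k] == b[k]:
                    k += 1
                if k > len(best):
                    best = a[:k]
            return best

        # Occurrences of this substring would then seed the attack hidden Markov process.
        print(longest_repeated_substring("PFRBPFRBPBFB"))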

    Ensemble learning method for hidden Markov models.

    For complex classification systems, data are gathered from various sources and potentially have different representations. Thus, data may have large intra-class variations. In fact, modeling each data class with a single model might lead to poor generalization. The classification error can be more severe for temporal data, where each sample is represented by a sequence of observations. Thus, there is a need for building a classification system that takes into account the variations within each class in the data. This dissertation introduces an ensemble learning method for temporal data that uses a mixture of Hidden Markov Model (HMM) classifiers. We hypothesize that the data are generated by K models, each of which reflects a particular trend in the data. Model identification could be achieved through clustering in the feature space or in the parameter space. However, this approach is inappropriate in the context of sequential data. The proposed approach is based on clustering in the log-likelihood space and has two main steps. First, one HMM is fit to each of the N individual sequences. For each fitted model, we evaluate the log-likelihood of each sequence. This results in an N-by-N log-likelihood distance matrix that is partitioned into K groups using a relational clustering algorithm. In the second step, we learn the parameters of one HMM per group. We propose using and optimizing various training approaches for the different K groups depending on their size and homogeneity. In particular, we investigate the maximum likelihood (ML), the minimum classification error (MCE) based discriminative, and the Variational Bayesian (VB) training approaches. Finally, to test a new sequence, its likelihood is computed under all the models, and a final confidence value is assigned by combining the outputs of the multiple models using a decision-level fusion method such as an artificial neural network or a hierarchical mixture of experts. Our approach was evaluated on two real-world applications: (1) identification of Cardio-Pulmonary Resuscitation (CPR) scenes in video simulating medical crises; and (2) landmine detection using Ground Penetrating Radar (GPR). Results on both applications show that the proposed method can identify meaningful and coherent HMM mixture components that describe different properties of the data. Each HMM mixture component models a group of data that share common attributes. The results indicate that the proposed method outperforms the baseline HMM that uses one model for each class in the data.
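
    A minimal sketch of the log-likelihood-space clustering step, assuming the hmmlearn and scikit-learn packages are available; the specific relational clustering algorithm, the similarity transform, and the HMM settings are illustrative stand-ins for the dissertation's choices.

        import numpy as np
        from hmmlearn.hmm import GaussianHMM
        from sklearn.cluster import SpectralClustering

        def cluster_sequences(sequences, n_groups=3, n_states=4):
            # Step 1: fit one HMM per sequence, then score every sequence under every model.
            models = []
            for seq in sequences:                  # each seq: (T_i, n_features) array
                m = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
                m.fit(seq)
                models.append(m)
            loglik = np.array([[m.score(seq) for seq in sequences] for m in models])

            # Step 2: symmetrize the N-by-N log-likelihood matrix, map it to a
            # similarity, and partition it into K groups by relational clustering.
            sym = 0.5 * (loglik + loglik.T)
            affinity = np.exp((sym - sym.max()) / np.abs(sym).std())
            labels = SpectralClustering(n_clusters=n_groups,
                                        affinity="precomputed").fit_predict(affinity)
            return labels    # one HMM per group is then trained (ML, MCE, or VB)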

    A new diagnostic for ASDEX Upgrade edge ion temperatures by lithium-beam charge exchange recombination spectroscopy

    This thesis work investigates the measurement of ion temperatures at the edge of a magnetically confined plasma used for fusion research at the ASDEX Upgrade tokamak operated by the Max-Planck-Institut für Plasmaphysik in Garching. The tokamak is the most advanced concept in toroidal magnetic confinement fusion. The H-mode plasma regime, the default scenario for the next-step experiment ITER, is characterized by an edge transport barrier which is not yet fully explained by theory. Experimentally measured edge ion temperature profiles will help to test and develop models for these barriers. Transport theory is introduced at a basic level as background and motivation for the new diagnostic. The standard model for an edge plasma instability observed in H-mode, the "edge localized mode" (ELM), is described. The implementation of a new diagnostic for ion temperature measurements with high spatial resolution in the plasma edge region, its commissioning, and the validation of the measurements comprise the main part of this work. The emission of line radiation induced by charge exchange processes between lithium atoms injected by a beam source and fully ionized impurities (C and He) is observed with a detection system consisting of spectrometers and fast cameras. Due to the narrow beam (1 cm) and closely staggered optical fibers (6 mm), unprecedented spatial resolution of edge ion temperatures in all major plasma regimes of the ASDEX Upgrade tokamak was achieved. The spectral width of the line radiation (He II at 468.5 nm and C VI at 529.0 nm) contains information about the local ion temperature through thermal Doppler broadening, which is the dominant broadening mechanism for these lines. The charge-exchange contribution to the total line radiation locally generated by the lithium is determined by gating the beam. Fitting a Gaussian model function to the local line radiation yields absolute line widths which can be directly converted into a temperature. The equilibration of impurities with the main plasma is fast enough that the assumption of a temperature nearly identical to that of the main plasma is justified. Corrections for systematic line broadening effects from collisional mixing and Zeeman broadening are incorporated by model calculations using existing routines for the atomic physics involved. The time resolution of the diagnostic is still not sufficient to resolve ELM events, but measuring between ELMs is possible if their frequency is low. L-mode plasmas with and without additional heating can be reliably diagnosed with a time resolution that depends on the lithium beam intensity and plasma density, in the best cases down to 100 ms. It was shown that diagnostic He puffing can be used to enhance the signal-to-noise ratio. Results from L-mode plasmas with electron heating show that ion temperatures can differ significantly from electron temperatures at the edge. To verify the new ion temperatures, comparisons with data from already established diagnostics were performed. In neutral-beam-heated L-mode and various H-mode plasmas, the ion temperatures agree with those from a similar diagnostic measuring in the core using the heating beams, where both diagnostics overlap. They can be combined to form a complete ion temperature profile over the whole plasma radius. In a first application, transport coefficients have been determined by interpretative modeling for an ohmic plasma.
In summary, a new method for measuring ion temperatures at the edge of a magnetically confined fusion plasma has been established. The results provide an important input for further understanding of transport in these plasmas.
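
    A minimal sketch of the core conversion from a fitted Gaussian line width to an ion temperature via thermal Doppler broadening; the example numbers are illustrative, and the thesis's corrections for Zeeman splitting and collisional mixing are omitted here.

        import numpy as np

        AMU = 1.66053906660e-27   # kg
        C   = 2.99792458e8        # m/s
        EV  = 1.602176634e-19     # J

        def doppler_temperature_eV(sigma_nm, lambda0_nm, ion_mass_amu):
            """Ion temperature (eV) from the Gaussian sigma of a Doppler-broadened line."""
            sigma_v = C * sigma_nm / lambda0_nm          # velocity-space width, m/s
            return ion_mass_amu * AMU * sigma_v**2 / EV  # k_B * T = m * sigma_v^2

        # Example: C VI line at 529.0 nm with a fitted sigma of 0.05 nm (carbon, 12 amu)
        # gives an ion temperature of roughly 100 eV.
        print(doppler_temperature_eV(0.05, 529.0, 12.0))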

    Representation and Matching of Articulated Shapes

    We consider the problem of localizing the articulated and deformable shape of a walking person in a single view. We represent the non-rigid 2D body contour by a Bayesian graphical model whose nodes correspond to point positions along the contour. The deformability of the model is constrained by learned priors corresponding to two basic mechanisms: local non-rigid deformation and rotational motion of the joints. Four types of image cues are combined to relate the model configuration to the observed image: an edge gradient map, a foreground/background mask, a skin color mask, and appearance consistency constraints. The constructed Bayes network is sparse and chain-like, enabling efficient spatial inference through Sequential Monte Carlo sampling methods. We evaluate the performance of the model on images taken in cluttered, outdoor scenes. The utility of each image cue is also empirically explored.
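
    A minimal sketch of sequential Monte Carlo inference along a chain-structured model, in the spirit of placing contour points one node after another; the proposal and weighting functions below are illustrative placeholders, not the paper's learned deformation priors or image cues.

        import numpy as np

        def smc_chain(n_nodes, n_particles, propose, weight, rng=None):
            """Sequential importance resampling over a chain of 2D point positions."""
            rng = rng or np.random.default_rng(0)
            states = propose(np.zeros((n_particles, 2)))        # candidates for node 0
            for _ in range(n_nodes - 1):
                w = weight(states)
                w = w / w.sum()
                idx = rng.choice(n_particles, size=n_particles, p=w)   # resample
                states = propose(states[idx])                   # propagate to next node
            return states                                       # particle set for last node

        # Placeholder prior/cue: random-walk proposal, likelihood peaked near the origin.
        propose = lambda s: s + np.random.normal(scale=2.0, size=s.shape)
        weight = lambda s: np.exp(-0.01 * (s ** 2).sum(axis=1))
        print(smc_chain(20, 200, propose, weight).mean(axis=0))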

    The Earliest Near-infrared Time-series Spectroscopy of a Type Ia Supernova

    We present ten medium-resolution, high signal-to-noise ratio near-infrared (NIR) spectra of SN 2011fe from SpeX on the NASA Infrared Telescope Facility (IRTF) and the Gemini Near-Infrared Spectrograph (GNIRS) on Gemini North, obtained as part of the Carnegie Supernova Project. This data set constitutes the earliest time-series NIR spectroscopy of a Type Ia supernova (SN Ia), with the first spectrum obtained at 2.58 days past the explosion and covering -14.6 to +17.3 days relative to B-band maximum. C I λ1.0693 μm is detected in SN 2011fe with increasing strength up to maximum light. The delay in the onset of the NIR C I line demonstrates its potential to be an effective tracer of unprocessed material. For the first time in an SN Ia, the early rapid decline of the Mg II λ1.0927 μm velocity was observed, and the subsequent velocity is remarkably constant. The Mg II velocity during this constant phase locates the inner edge of carbon burning and probes the conditions under which the transition from deflagration to detonation occurs. We show that the Mg II velocity does not correlate with the optical light-curve decline rate Δm15. The prominent break at ~1.5 μm is the main source of concern for NIR k-correction calculations. We demonstrate here that the feature has a uniform time evolution among SNe Ia, with the flux ratio across the break strongly correlated with Δm15. The predictability of the strength and the onset of this feature suggests that the associated k-correction uncertainties can be minimized with improved spectral templates. Comment: 14 pages, 13 figures, accepted for publication in Ap
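
    As a small worked example of the velocity measurement described above, the sketch below converts the observed wavelength of the blueshifted Mg II 1.0927 μm absorption minimum into an ejecta velocity; the rest wavelength is taken from the abstract, while the spectrum arrays, search window, and Doppler formula choice are illustrative.

        import numpy as np

        C_KMS = 299792.458
        MG_II_REST_UM = 1.0927    # Mg II rest wavelength, microns

        def mg_ii_velocity(wave_um, flux, window=(1.03, 1.09)):
            """Velocity (km/s, negative = blueshift) of the absorption minimum in a window."""
            sel = (wave_um > window[0]) & (wave_um < window[1])
            lam_min = wave_um[sel][np.argmin(flux[sel])]
            z = lam_min / MG_II_REST_UM - 1.0
            # Relativistic Doppler relation between redshift and line-of-sight velocity.
            return C_KMS * ((1.0 + z) ** 2 - 1.0) / ((1.0 + z) ** 2 + 1.0)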

    Segmentation and Classification of Multimodal Imagery

    Segmentation and classification are two important computer vision tasks that transform input data into a compact representation that allows fast and efficient analysis. Several challenges exist in generating accurate segmentation or classification results. In a video, for example, objects often change their appearance and are partially occluded, making it difficult to delineate an object from its surroundings. This thesis proposes video segmentation and aerial image classification algorithms to address some of these problems and provide accurate results. We developed a gradient-driven three-dimensional segmentation technique that partitions a video into spatiotemporal objects. The algorithm utilizes the local gradient computed at each pixel location together with a global boundary map acquired through deep learning methods to generate initial pixel groups by traversing from low- to high-gradient regions. A local clustering method is then employed to refine these initial pixel groups. The refined sub-volumes in the homogeneous regions of the video are selected as initial seeds and iteratively combined with adjacent groups based on intensity similarities. The volume growth is terminated at the color boundaries of the video. The over-segments obtained from the above steps are then merged hierarchically by a multivariate approach, yielding a final segmentation map for each frame. In addition, we also implemented a streaming version of the above algorithm that requires less computational memory. The results illustrate that our proposed methodology compares favorably, both qualitatively and quantitatively, with the latest state-of-the-art techniques in segmentation quality and computational efficiency. We also developed a convolutional neural network (CNN)-based method to efficiently combine information from multisensor remotely sensed images for pixel-wise semantic classification. The CNN features obtained from multiple spectral bands are fused at the initial layers of the deep neural network rather than at the final layers. The early fusion architecture has fewer parameters and thereby reduces computational time and GPU memory during training and inference. We also introduce a composite architecture that fuses features throughout the network. The methods were validated on four different datasets: ISPRS Potsdam, ISPRS Vaihingen, IEEE Zeebruges, and a Sentinel-1/Sentinel-2 dataset. For the Sentinel-1/-2 dataset, we obtain the ground truth labels for three classes from OpenStreetMap. Results on all the images show that early fusion, specifically fusion after layer three of the network, achieves results similar to or better than a decision-level fusion mechanism. The performance of the proposed architecture is also on par with state-of-the-art results.
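
    A minimal sketch of the early-fusion idea, assuming PyTorch: bands from two sensors are concatenated along the channel axis so that the first convolutional layers fuse them, rather than merging per-sensor predictions at the end. The layer sizes, band counts, and class count are illustrative, not the thesis's architecture.

        import torch
        import torch.nn as nn

        class EarlyFusionSegNet(nn.Module):
            def __init__(self, n_bands_a=4, n_bands_b=2, n_classes=3):
                super().__init__()
                in_ch = n_bands_a + n_bands_b        # e.g. optical + SAR bands stacked
                self.features = nn.Sequential(
                    nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                )
                self.classifier = nn.Conv2d(64, n_classes, 1)   # per-pixel class scores

            def forward(self, bands_a, bands_b):
                x = torch.cat([bands_a, bands_b], dim=1)        # fusion at the input
                return self.classifier(self.features(x))

        # Usage: logits = EarlyFusionSegNet()(torch.rand(1, 4, 64, 64), torch.rand(1, 2, 64, 64))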

    The Carbon Content of Intergalactic Gas at z=4.25 and its Evolution Toward z=2.4

    This paper presents ionization-corrected measurements of the carbon abundance in intergalactic gas at 4.0 < z < 4.5, using spectra of three bright quasars obtained with the MIKE spectrograph on Magellan. By measuring the CIV strength in a sample of 131 discrete HI-selected quasar absorbers with ρ/ρ̄ > 1.6, we derive a median carbon abundance of [C/H] = -3.55, with a lognormal scatter of approximately 0.8 dex. This median value is a factor of two to three lower than similar measurements made at z ~ 2.4 using CIV and OVI. The strength of the evolution is modestly dependent on the choice of UV background spectrum used to make the ionization corrections, although our detection of an abundance evolution is generally robust with respect to this model uncertainty. We present a framework for analyzing the effects of spatial fluctuations in the UV ionizing background at frequencies relevant for CIV production. We also explore the effects of reduced flux between 3 and 4 Rydbergs (as from HeII Lyman series absorption) on our abundance estimates. At HeII line absorption levels similar to published estimates, the effects are very small, although a larger optical depth could reduce the strength of the abundance evolution. Our results imply that ~50% of the heavy elements seen in the IGM at z ~ 2.4 were deposited in the 1.3 Gyr between z ~ 4.3 and z ~ 2.4. The total implied mass flux of carbon into the Lyman alpha forest would constitute ~30% of the IMF-weighted carbon yield from known star-forming populations over this period. Comment: Accepted for publication in the Astrophysical Journal. 23 pages, 24 figures, 2 tables
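
    A small worked example of an ionization-corrected abundance in the schematic form [C/H] = log(N_CIV/N_HI) + log(f_HI/f_CIV) - log(C/H)_solar, where the ionization fractions f come from a model of the assumed UV background; the input numbers, including the solar carbon abundance, are illustrative and are not the paper's values.

        import numpy as np

        LOG_CH_SOLAR = -3.57   # approximate solar log10(C/H), for illustration only

        def carbon_abundance(log_N_CIV, log_N_HI, f_HI, f_CIV):
            """[C/H] from measured CIV and HI columns plus model ionization fractions."""
            return (log_N_CIV - log_N_HI) + np.log10(f_HI / f_CIV) - LOG_CH_SOLAR

        # Example absorber: N_CIV = 10^12.5, N_HI = 10^15.0, f_HI = 10^-4.8, f_CIV = 0.3
        print(carbon_abundance(12.5, 15.0, 10 ** -4.8, 0.3))   # ~ -3.2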