49 research outputs found
Spatiotemporal oriented energies for spacetime stereo
This paper presents a novel approach to recovering temporally coherent estimates of the 3D structure of a dynamic scene from a sequence of binocular stereo images. The approach is based on matching spatiotemporal orientation distributions between left and right temporal image streams, which encapsulate both local spatial and temporal structure for disparity estimation. By capturing spatial and temporal structure in this unified fashion, both sources of information combine to yield disparity estimates that are naturally temporally coherent, while helping to resolve matches that might be ambiguous when either source is considered alone. Further, by allowing subsets of the orientation measurements to support different disparity estimates, an approach to recovering multilayer disparity from spacetime stereo is realized. The approach has been implemented with real-time performance on commodity GPUs. Empirical evaluation shows that the approach yields qualitatively and quantitatively superior disparity estimates in comparison to various alternative approaches, including the ability to provide accurate multilayer estimates in the presence of (semi)transparent and specular surfaces.
Understanding Video Transformers for Segmentation: A Survey of Application and Interpretability
Video segmentation encompasses a wide range of categories of problem
formulation, e.g., object, scene, actor-action and multimodal video
segmentation, for delineating task-specific scene components with pixel-level
masks. Recently, approaches in this research area shifted from concentrating on
ConvNet-based to transformer-based models. In addition, various
interpretability approaches have appeared for transformer models and video
temporal dynamics, motivated by the growing interest in basic scientific
understanding, model diagnostics and societal implications of real-world
deployment. Previous surveys mainly focused on ConvNet models on a subset of
video segmentation tasks or transformers for classification tasks. Moreover,
component-wise discussion of transformer-based video segmentation models has
not yet received due focus. In addition, previous reviews of interpretability
methods focused on transformers for classification, while analysis of video
temporal dynamics modelling capabilities of video models received less
attention. In this survey, we address the above with a thorough discussion of
various categories of video segmentation, a component-wise discussion of the
state-of-the-art transformer-based models, and a review of related
interpretability methods. We first present an introduction to the different
video segmentation task categories, their objectives, specific challenges and
benchmark datasets. Next, we provide a component-wise review of recent
transformer-based models and document the state of the art on different video
segmentation tasks. Subsequently, we discuss post-hoc and ante-hoc
interpretability methods for transformer models and interpretability methods
for understanding the role of the temporal dimension in video models. Finally,
we conclude our discussion with future research directions.
StepFormer: Self-supervised Step Discovery and Localization in Instructional Videos
Instructional videos are an important resource to learn procedural tasks from
human demonstrations. However, the instruction steps in such videos are
typically short and sparse, with most of the video being irrelevant to the
procedure. This motivates the need to temporally localize the instruction steps
in such videos, i.e. the task called key-step localization. Traditional methods
for key-step localization require video-level human annotations and thus do not
scale to large datasets. In this work, we tackle the problem with no human
supervision and introduce StepFormer, a self-supervised model that discovers
and localizes instruction steps in a video. StepFormer is a transformer decoder
that attends to the video with learnable queries, and produces a sequence of
slots capturing the key-steps in the video. We train our system on a large
dataset of instructional videos, using their automatically-generated subtitles
as the only source of supervision. In particular, we supervise our system with
a sequence of text narrations using an order-aware loss function that filters
out irrelevant phrases. We show that our model outperforms all previous
unsupervised and weakly-supervised approaches on step detection and
localization by a large margin on three challenging benchmarks. Moreover, our
model demonstrates an emergent ability to solve zero-shot multi-step
localization and outperforms all relevant baselines at this task. Comment: CVPR'2
Stacking Interactions in Denaturation of DNA Fragments
A mesoscopic model for heterogeneous DNA denaturation is developed in the
framework of the path integral formalism. The base pair stretchings are treated
as one-dimensional, time dependent paths contributing to the partition
function. The size of the paths ensemble, which measures the degree of
cooperativity of the system, is computed versus temperature consistently with
the model potential physical requirements. It is shown that the ensemble size
strongly varies with the molecule backbone stiffness providing a quantitative
relation between stacking and features of the melting transition. The latter is
an overall smooth crossover which begins from the adenine-thymine rich
portions of the fragment. The harmonic stacking coupling shifts, along the
temperature axis, the occurrence of the multistep denaturation but it does not change
the character of the crossover. The methods to compute the fractions of open
base pairs versus temperature are discussed: by averaging the base pair
displacements over the path ensemble we find that such fractions signal the
multisteps of the transition in good agreement with the indications provided by
the specific heat plots. Comment: European Physical Journal E (2011) in pres
Global patient outcomes after elective surgery: prospective cohort study in 27 low-, middle- and high-income countries.
BACKGROUND: As global initiatives increase patient access to surgical treatments, there remains a need to understand the adverse effects of surgery and define appropriate levels of perioperative care. METHODS: We designed a prospective international 7-day cohort study of outcomes following elective adult inpatient surgery in 27 countries. The primary outcome was in-hospital complications. Secondary outcomes were death following a complication (failure to rescue) and death in hospital. Process measures were admission to critical care immediately after surgery or to treat a complication and duration of hospital stay. A single definition of critical care was used for all countries. RESULTS: A total of 474 hospitals in 19 high-, 7 middle- and 1 low-income country were included in the primary analysis. Data included 44 814 patients with a median hospital stay of 4 (range 2-7) days. A total of 7508 patients (16.8%) developed one or more postoperative complications and 207 died (0.5%). The overall mortality among patients who developed complications was 2.8%. Mortality following complications ranged from 2.4% for pulmonary embolism to 43.9% for cardiac arrest. A total of 4360 (9.7%) patients were admitted to a critical care unit as routine immediately after surgery, of whom 2198 (50.4%) developed a complication, with 105 (2.4%) deaths. A total of 1233 patients (16.4%) were admitted to a critical care unit to treat complications, with 119 (9.7%) deaths. Despite lower baseline risk, outcomes were similar in low- and middle-income compared with high-income countries. CONCLUSIONS: Poor patient outcomes are common after inpatient surgery. Global initiatives to increase access to surgical treatments should also address the need for safe perioperative care. STUDY REGISTRATION: ISRCTN5181700
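The percentages in the RESULTS paragraph above can be re-derived from the reported counts. The sketch below is only an illustrative consistency check: the counts are copied from the abstract, but the choice of denominator for each rate is an assumption inferred from the wording, and the variable names and the `pct` helper are hypothetical.

```python
# Counts copied from the RESULTS paragraph of the abstract.
patients = 44814                  # patients in the primary analysis
complications = 7508              # patients with one or more complications
deaths = 207                      # in-hospital deaths
routine_cc = 4360                 # routine critical care admissions after surgery
routine_cc_complications = 2198   # of those, patients who developed a complication
routine_cc_deaths = 105           # deaths among routine admissions
rescue_cc = 1233                  # critical care admissions to treat a complication
rescue_cc_deaths = 119            # deaths among those admissions


def pct(numerator, denominator):
    """Return a percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


# Denominators are assumptions inferred from the wording of the abstract.
rates = {
    "complication rate": pct(complications, patients),
    "in-hospital mortality": pct(deaths, patients),
    "mortality after complication": pct(deaths, complications),
    "routine critical care admission": pct(routine_cc, patients),
    "complications after routine admission": pct(routine_cc_complications, routine_cc),
    "mortality after routine admission": pct(routine_cc_deaths, routine_cc),
    "admission to treat complication": pct(rescue_cc, complications),
    "mortality after such admission": pct(rescue_cc_deaths, rescue_cc),
}

for name, value in rates.items():
    print(f"{name}: {value}%")
```

Under these assumed denominators, each computed value matches the percentage quoted in the abstract.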
A Unifying Theoretical Framework for Region Tracking
Visual region-based tracking is a heavily researched general approach to following a target across a temporal image sequence. Little research, however, has addressed the interrelationships of the various proposed approaches at a theoretical level. In response to this situation, the present paper describes a unifying framework for a wide range of region trackers in terms of the amount of spatial layout that they maintain in their target representation. This framework yields a general notation from which any of these trackers can be instantiated. To illustrate the practical utility of the framework, a range of region trackers are instantiated within its formalism and used to document empirically the impact of maintaining variable amounts of spatial information during target tracking.
On Interpreting Stereo Disparity
The problems under consideration center around the interpretation of binocular stereo disparity. In particular, the goal is to establish a set of mappings from stereo disparity to corresponding three-dimensional scene geometry. An analysis has been developed that shows how disparity information can be interpreted in terms of three-dimensional scene properties, such as surface depth, discontinuities, and orientation. These theoretical developments have been embodied in a set of computer algorithms for the recovery of scene geometry from input stereo disparity. The results of applying these algorithms to several disparity maps are presented. Comparisons are made to the interpretation of stereo disparity by biological systems.
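As a concrete illustration of one mapping from disparity to scene geometry, the sketch below uses the textbook relation for a rectified pinhole stereo pair, depth = focal length × baseline / disparity. This is not the analysis developed in the work above, which the abstract does not spell out; the function name, parameter names and example values are hypothetical.

```python
def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres for a rectified pinhole stereo pair (textbook relation).

    Assumes parallel optical axes, disparity and focal length in pixels,
    and the camera baseline in metres.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_length_px * baseline_m / disparity_px


# Hypothetical values: 800 px focal length, 0.5 m baseline, 20 px disparity.
example_depth = depth_from_disparity(20.0, 800.0, 0.5)
print(example_depth)
```

Because depth is inversely proportional to disparity in this relation, a fixed disparity error produces a larger depth error for distant surfaces than for nearby ones.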