
    Reconstruction and Synthesis of Human-Scene Interaction

    In this thesis, we argue that the 3D scene is vital for understanding, reconstructing, and synthesizing human motion. We present several approaches that take the scene into account when reconstructing and synthesizing Human-Scene Interaction (HSI). We first observe that state-of-the-art pose estimation methods ignore the 3D scene and hence reconstruct poses that are inconsistent with it. We address this by proposing a pose estimation method that takes the 3D scene explicitly into account. We call our method PROX, for Proximal Relationships with Object eXclusion. We leverage the data generated using PROX to build a method that automatically places 3D scans of clothed people in scenes. The core novelty of our method is encoding the proximal relationships between the human and the scene in a novel HSI model, called POSA, for Pose with prOximitieS and contActs. POSA is limited to static HSI, however. We therefore propose a real-time method for synthesizing dynamic HSI, which we call SAMP, for Scene-Aware Motion Prediction. SAMP enables virtual humans to navigate cluttered indoor scenes and naturally interact with objects. Data-driven kinematic models like SAMP can produce high-quality motion when applied in environments similar to those seen in the training data. However, when applied to new scenarios, kinematic models can struggle to generate realistic behaviors that respect scene constraints. In contrast, we present InterPhys, which uses adversarial imitation learning and reinforcement learning to train physically simulated characters that perform scene-interaction tasks in a physically plausible and life-like manner.
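    A common thread in these scene-aware methods is penalizing body-scene interpenetration while encouraging plausible contact. The sketch below is only an illustration of that idea under assumed inputs (a precomputed signed distance field of the scene sampled at the body vertices, and a list of likely contact vertices); it is not the PROX or POSA implementation.

```python
# Illustrative sketch (not the PROX or POSA implementation): a scene-aware
# fitting objective that discourages body-scene interpenetration and rewards
# contact, assuming a precomputed signed distance field (SDF) of the scene
# sampled at the posed body vertices.
import numpy as np

def scene_terms(vertex_sdf, contact_vertex_ids, w_penetration=10.0, w_contact=1.0):
    """vertex_sdf: (N,) signed distance of each body vertex to the scene
    surface (negative = inside scene geometry); contact_vertex_ids: indices
    of vertices expected to touch the scene (e.g. buttocks when sitting)."""
    # Penetration term: penalize vertices that end up inside scene geometry.
    penetration = np.sum(np.minimum(vertex_sdf, 0.0) ** 2)
    # Contact term: pull designated contact vertices onto the scene surface.
    contact = np.sum(np.abs(vertex_sdf[contact_vertex_ids]))
    return w_penetration * penetration + w_contact * contact

# Example: 5 vertices, vertex 3 expected to be in contact with the scene.
print(scene_terms(np.array([0.2, -0.01, 0.5, 0.03, 0.1]), [3]))
```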

    Deep learning in crowd counting: A survey

    Counting high-density objects quickly and accurately is a popular area of research. Crowd counting has significant social and economic value and is a major focus in artificial intelligence. Despite many advancements in this field, many of them are not widely known, especially in terms of research data. The authors proposed a three-tier standardised dataset taxonomy (TSDT), which divides datasets into small-scale, large-scale and hyper-scale according to different application scenarios. This taxonomy can help researchers make more efficient use of datasets and improve the performance of AI algorithms in specific fields. Additionally, the authors proposed a new evaluation index for the clarity of a dataset: the average pixels occupied by each object (APO). This index is better suited to evaluating dataset clarity for the object counting task than image resolution. Moreover, the authors classified crowd counting methods from a data-driven perspective into multi-scale networks, single-column networks, multi-column networks, multi-task networks, attention networks and weakly-supervised networks, and introduced the classic crowd counting methods of each class. The authors classified the 36 existing datasets according to the three-tier standardised dataset taxonomy and discussed and evaluated these datasets. They evaluated the performance of more than 100 methods from the past five years on popular datasets at different scales. Recently, progress in research on small-scale datasets has slowed down, with few new datasets and algorithms targeting them, and studies focused on large- or hyper-scale datasets appear to be reaching a saturation point. The combined use of multiple approaches has become a major research direction. The authors discussed the theoretical and practical challenges of crowd counting from the perspectives of data, algorithms and computing resources. The field of crowd counting is moving towards combining multiple methods and requires fresh, targeted datasets. Despite advancements, the field still faces challenges such as handling real-world scenarios and processing large crowds in real time. Researchers are exploring transfer learning to overcome the limitations of small datasets. The development of effective algorithms for crowd counting remains a challenging and important task in computer vision and AI, with many opportunities for future research. Funding: BHF (AA/18/3/34220); Hope Foundation for Cancer Research (RM60G0680); GCRF (P202PF11); Sino-UK Industrial Fund (RP202G0289); LIAS (P202ED10, P202RE969); Data Science Enhancement Fund (P202RE237); Sino-UK Education Fund (OP202006); Fight for Sight (24NN201); Royal Society International Exchanges Cost Share Award (RP202G0230); MRC (MC_PC_17171); BBSRC (RM32G0178B).
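    A minimal sketch of how an APO-style clarity index might be computed is shown below; since the survey text here does not spell out the exact formula, the convention of averaging per-object pixel areas (mask or box areas) across a dataset is an assumption.

```python
# Illustrative sketch: average pixels occupied per object (APO) for a dataset,
# assuming each annotation provides a per-object pixel area (e.g. mask area or
# bounding-box area). The exact definition in the survey may differ.
def average_pixels_per_object(annotations):
    """annotations: list of images, each a list of per-object pixel areas."""
    total_area = sum(sum(areas) for areas in annotations)
    total_objects = sum(len(areas) for areas in annotations)
    return total_area / max(total_objects, 1)

# Example: two images with object areas given in pixels.
print(average_pixels_per_object([[120, 95, 80], [40, 60]]))  # -> 79.0
```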

    Novel 129Xe Magnetic Resonance Imaging and Spectroscopy Measurements of Pulmonary Gas-Exchange

    Gas-exchange is the primary function of the lungs and involves removing carbon dioxide from the body and exchanging it within the alveoli for inhaled oxygen. Several different pulmonary, cardiac and cardiovascular abnormalities have negative effects on pulmonary gas-exchange. Unfortunately, clinical tests do not always pinpoint the problem; sensitive and specific measurements are needed to probe the individual components participating in gas-exchange for a better understanding of pathophysiology, disease progression and response to therapy. In vivo xenon-129 gas-exchange magnetic resonance imaging (129Xe gas-exchange MRI) has the potential to overcome these challenges. When participants inhale hyperpolarized 129Xe gas, it exhibits different MR spectral properties as a gas, as it diffuses through the alveolar membrane and as it binds to red blood cells. 129Xe MR spectroscopy and imaging provide a way to tease out the different anatomic components of gas-exchange simultaneously and provide spatial information about where abnormalities may occur. In this thesis, I developed and applied 129Xe MR spectroscopy and imaging to measure gas-exchange in the lungs alongside other clinical and imaging measurements. I measured 129Xe gas-exchange in asymptomatic congenital heart disease and in prospective, controlled studies of long COVID. I also developed mathematical tools to model 129Xe MR signals during acquisition and reconstruction. The insights gained from my work underscore the potential of 129Xe gas-exchange MRI biomarkers for a better understanding of cardiopulmonary disease. My work also provides a way to generate a deeper imaging and physiologic understanding of gas-exchange in vivo in healthy participants and in patients with chronic lung and heart disease.
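    As one way to picture the spectroscopy side of this work, the sketch below models a 129Xe free induction decay as a sum of three decaying complex exponentials, one per compartment (gas, membrane/barrier, red blood cells). The model form, amplitudes and frequency offsets are illustrative assumptions, not the acquisition or reconstruction tools developed in the thesis.

```python
# Illustrative sketch (assumed model, not the thesis implementation): a 129Xe
# free-induction decay (FID) modeled as three decaying complex exponentials,
# one per spectral compartment (gas, membrane/barrier, red blood cells).
import numpy as np

def xe_fid(t, amps, freqs_hz, t2star_s, phases_rad):
    """Sum of decaying complex exponentials; each argument lists one value
    per compartment."""
    signal = np.zeros_like(t, dtype=complex)
    for a, f, t2, p in zip(amps, freqs_hz, t2star_s, phases_rad):
        signal += a * np.exp(1j * (2 * np.pi * f * t + p)) * np.exp(-t / t2)
    return signal

t = np.arange(0, 0.02, 1e-5)                       # 20 ms acquisition window
fid = xe_fid(t, amps=[1.0, 0.4, 0.3],
             freqs_hz=[0.0, 7300.0, 8300.0],       # offsets are illustrative
             t2star_s=[0.02, 0.002, 0.002],
             phases_rad=[0.0, 0.0, 0.0])
spectrum = np.fft.fftshift(np.fft.fft(fid))        # inspect the three peaks
```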

    Cryo-Electron Microscopy to Investigate Molecular Dynamics and Conformational Changes in Protein Complexes

    Stress can be considered one of the most fundamental aspects of life, and all living organisms are constantly exposed to a variety of different stress situations. Thus, efficient stress sensing and reaction mechanisms are crucial for their survival. Stress response mechanisms are as diverse as the causative stimuli and are oftentimes cross-linked, forming a versatile reaction network that ensures the cells’ survival in critical situations. Notably, stress response mechanisms play a major role in pathogenicity, virulence and disease. Pathogenic bacteria permanently face environmental pressure originating from the host’s defense systems or drug treatments, while mutations in eukaryotic stress response systems have been shown to cause a large number of severe human diseases such as diabetes, cancer, Parkinson’s and Alzheimer’s disease. Profound molecular knowledge of the respective mechanisms is thus an indispensable prerequisite for a global understanding of this fundamental aspect of life, paving the way for the development of new drugs or therapeutic approaches. Within this thesis, various aspects of stress response mechanisms in three different systems were investigated using state-of-the-art electron microscopy techniques. First, I set out to solve the structure of the Vibrio vulnificus stressosome complex, a key player in the bacterial environmental stress response. Currently, no structural data are available for any gram-negative stressosome. A medium-resolution cryo-electron microscopy (cryo-EM) structure of the minimal complex could be obtained, which features an exceptional symmetry break originating from its unique, regulatory stoichiometry. Based on the structural data, it was possible to propose an activation mechanism and to pinpoint a number of significant differences in comparison to gram-positive stressosome complexes. Undoubtedly, the structure contributes a major piece of the information necessary to understand stress sensing and signal transduction in this human pathogen. This study was complemented by a number of physiological and phylogenetic experiments contributed by our co-workers, and published recently (VIII. PUBLICATION 1). The second project focused on the gram-positive soil bacterium Corynebacterium glutamicum, a prime model organism for investigations of the bacterial osmostress response. Sensing of hyper-osmotic stress and regulation of the respective stress response in C. glutamicum are simultaneously performed by BetP, a conformationally asymmetric, trimeric secondary active transporter able to import the compatible solute betaine. Two stimuli have been identified that initiate the full osmostress response in BetP, namely an elevated cytoplasmic K+ concentration and a loosely defined ‘membrane stimulus’. Despite the availability of functional data on BetP regulation, structural information, especially on the down-regulated state and the subsequent transition events, has been absent. Using single-particle cryo-EM analysis, I was able to provide high-resolution structures of the down-regulated and a transition state, which elucidate a number of important structural features not described so far. It could be shown that down-regulated BetP adopts a symmetric arrangement stabilized by a tight cytoplasmic interaction network of the sensory domains, further strengthened by cardiolipin molecules located at regulatory lipid binding sites.
These constraints are released upon stress sensing, as demonstrated by Fourier transform infrared (FTIR) spectroscopy and molecular dynamics (MD) simulation data contributed by our co-workers, resulting in the well-established, asymmetric trimeric structures previously known. The wealth of new data on the down-regulated state allowed us to propose a detailed regulation mechanism and to further sharpen the previously vague picture of the membrane stimulus. The data are summarized and presented in IX. PREPRINT 1. A third topic of this thesis was the three-dimensional investigation, via dual-axis scanning transmission electron microscopy (STEM) tomography, of crystalloid-ER structures we had previously identified in human embryonic kidney (HEK) cells upon over-expression of polycystin-2 (PC-2). In this study, presented in X. MANUSCRIPT 1, I was further able to prove the presence of ER whorls and to obtain high-resolution three-dimensional (3D) reconstructions of the two different ER morphotypes. These data provided unmatched insights into the cellular ER interaction partners and clearly demonstrated the dynamic nature of the organelle even under stress. A detailed discussion of the identified morphological features in their respective cellular context finally allowed for a description of the organellar membrane architecture at a high level of detail. Lastly, the discussion addresses the electron microscopy techniques and instruments used and contains an outlook on further perspectives for the projects. Overall, this thesis yielded intriguing mechanistic insights into the versatile bacterial and eukaryotic stress response mechanisms, reflecting their manifold nature ultimately converging to a common outcome.

    Ball Trajectory Inference from Multi-Agent Sports Contexts Using Set Transformer and Hierarchical Bi-LSTM

    As artificial intelligence spreads to numerous fields, the application of AI to sports analytics is also in the spotlight. However, one of the major challenges is the difficulty of automatically acquiring continuous movement data during sports matches. In particular, it is a conundrum to reliably track a tiny ball on a wide soccer pitch with obstacles such as occlusion and imitations. To tackle this problem, this paper proposes a framework that infers the ball trajectory from player trajectories as a cost-efficient alternative to ball tracking. We combine Set Transformers, which yield permutation-invariant and equivariant representations of the multi-agent context, with a hierarchical architecture that intermediately predicts player ball possession to support the final trajectory inference. We also introduce a reality loss term and postprocessing to ensure that the estimated trajectories are physically realistic. The experimental results show that our model provides natural and accurate trajectories together with admissible player ball possession. Lastly, we suggest several practical applications of our framework, including missing trajectory imputation, semi-automated pass annotation, automated zoom-in for match broadcasting, and calculating possession-wise running performance metrics.
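    The overall shape of such a model can be pictured with a small sketch: a permutation-invariant encoder over the set of players at each frame (here a simple mean-pooling MLP stands in for the Set Transformer) feeding a bidirectional LSTM that regresses the ball position per frame. All layer sizes and input features below are assumptions, not the authors' architecture.

```python
# Illustrative sketch (not the authors' code): a permutation-invariant encoder
# over the set of players per frame, followed by a bidirectional LSTM that
# regresses the ball's 2D position for every frame.
import torch
import torch.nn as nn

class BallTrajectoryNet(nn.Module):
    def __init__(self, player_feat_dim=4, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.player_mlp = nn.Sequential(
            nn.Linear(player_feat_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        self.temporal = nn.LSTM(embed_dim, hidden_dim,
                                batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden_dim, 2)  # (x, y) per frame

    def forward(self, players):
        # players: (batch, time, num_players, player_feat_dim), e.g. positions
        # and velocities; mean pooling over players is permutation-invariant.
        frame_emb = self.player_mlp(players).mean(dim=2)   # (B, T, embed_dim)
        seq, _ = self.temporal(frame_emb)                  # (B, T, 2*hidden)
        return self.head(seq)                              # (B, T, 2)

model = BallTrajectoryNet()
dummy = torch.randn(1, 100, 22, 4)   # 100 frames, 22 players, 4 features each
print(model(dummy).shape)            # torch.Size([1, 100, 2])
```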

    Occlusion-Ordered Semantic Instance Segmentation

    Conventional semantic instance segmentation methods produce a segmentation mask for each object instance in an image along with its semantic class label. These methods excel at distinguishing instances, whether they belong to the same class or different classes, providing valuable information about the scene. However, they provide no depth-related information and are thus unable to capture the 3D geometry of the scene. One option for deriving 3D information about a scene is monocular depth estimation, which predicts the absolute distance from the camera to each pixel in an image. However, monocular depth estimation has limitations: it lacks semantic information about object classes, and it is not precise enough to reliably detect instances or establish a depth order for known instances. Even a coarse 3D geometry, such as the relative depth or occlusion order of objects, is useful for rich 3D-informed scene analysis. Based on this, we address occlusion-ordered semantic instance segmentation (OOSIS), which augments standard semantic instance segmentation with a coarse 3D geometry of the scene. By leveraging occlusion as a strong depth cue, OOSIS estimates a partial relative depth ordering of instances based on their occlusion relations. OOSIS produces two outputs: instance masks with their classes, and the occlusion ordering of those predicted instances. Existing works pre-date deep learning and rely on simple visual cues, such as the y-coordinate of objects, for occlusion ordering. This thesis introduces two deep learning-based approaches for OOSIS. The first approach, following a top-down strategy, determines the pairwise occlusion order between instances obtained by a standard instance segmentation method. However, this approach lacks global occlusion ordering consistency and can produce undesired cyclic orderings. Our second approach is bottom-up: it simultaneously derives instances and their occlusion order by grouping pixels into instances and assigning occlusion order labels, which ensures a globally consistent occlusion ordering. As part of this approach, we develop a novel deep model that predicts the boundaries where occlusion occurs as well as the orientation of occlusion at each boundary, indicating which side occludes the other. The output of this model is used to obtain instances and their ordering through our proposed discrete optimization formulation. To assess the performance of OOSIS methods, we introduce a novel evaluation metric capable of simultaneously evaluating instance segmentation and occlusion ordering. In addition, we use standard metrics for evaluating the quality of instance masks, and we also evaluate occlusion ordering consistency and oriented occlusion boundaries. We conduct evaluations on the KINS and COCOA datasets.
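    The consistency issue with pairwise predictions can be made concrete with a small sketch: given noisy pairwise "i occludes j" scores, keep the more confident direction for each pair, break any remaining cycles, and read off a global front-to-back order. The greedy cycle-breaking below is only an illustration; it is not the discrete optimization formulation developed in the thesis.

```python
# Illustrative sketch (not the thesis' discrete optimization): turning noisy
# pairwise "i occludes j" scores into a globally consistent, acyclic occlusion
# order by keeping the more confident direction per pair and greedily breaking
# any remaining cycles before a topological sort.
from itertools import combinations
import networkx as nx  # assumed dependency, used only for graph utilities

def consistent_occlusion_order(pair_scores, num_instances):
    """pair_scores: dict mapping (i, j) -> probability that instance i occludes j."""
    graph = nx.DiGraph()
    graph.add_nodes_from(range(num_instances))
    for i, j in combinations(range(num_instances), 2):
        p = pair_scores.get((i, j), 0.5)
        if p > 0.5:
            graph.add_edge(i, j)   # i is in front of j
        elif p < 0.5:
            graph.add_edge(j, i)   # j is in front of i
    # Greedily drop one edge per detected cycle until the graph is acyclic.
    while not nx.is_directed_acyclic_graph(graph):
        cycle = nx.find_cycle(graph)
        graph.remove_edge(*cycle[0][:2])
    return list(nx.topological_sort(graph))  # front-most instances first

# Example with an inconsistent (cyclic) set of pairwise predictions.
print(consistent_occlusion_order({(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.3}, 3))
```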

    Optimization for Deep Learning Systems Applied to Computer Vision

    Since the DL revolution and especially over the last years (2010-2022), DNNs have become an essential part of the CV field, and they are present in all its sub-fields (video-surveillance, industrial manufacturing, autonomous driving, ...) and in almost every new state-of-the-art application that is developed. However, DNNs are very complex, and the architecture needs to be carefully selected and adapted in order to maximize its efficiency. In many cases, networks are not specifically designed for the considered use case; they are simply recycled from other applications and slightly adapted, without taking into account the particularities of the use case or the interaction with the rest of the system components, which usually results in a performance drop. This research work aims at providing knowledge and tools for the optimization of systems based on Deep Learning applied to different real use cases within the field of Computer Vision, in order to maximize their effectiveness and efficiency.

    AI for time-resolved imaging: from fluorescence lifetime to single-pixel time of flight

    Time-resolved imaging is a field of optics which measures the arrival time of light on the camera. This thesis looks at two time-resolved imaging modalities: fluorescence lifetime imaging and time-of-flight measurement for depth imaging and ranging. Both of these applications require temporal accuracy on picosecond to nanosecond (10⁻¹² to 10⁻⁹ s) scales. This demands special camera technology and optics that can sample light intensity extremely quickly, much faster than an ordinary video camera. However, such detectors can be very expensive compared to regular cameras while offering lower image quality. Further, the information of interest is often hidden (encoded) in the raw temporal data. Therefore, computational imaging algorithms are used to enhance, analyse and extract information from time-resolved images. "A picture is worth a thousand words." This describes a fundamental blessing and curse of image analysis: images contain extreme amounts of data. Consequently, it is very difficult to design algorithms that encompass all the possible pixel permutations and combinations that can encode this information. Fortunately, the rise of AI and machine learning (ML) allows us to instead create algorithms in a data-driven way. This thesis demonstrates the application of ML to time-resolved imaging tasks, ranging from parameter estimation in noisy data and decoding of overlapping information, through super-resolution, to inferring 3D information from 1D (temporal) data.
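    For the fluorescence lifetime side, the kind of parameter estimation involved can be pictured with a small sketch: fitting a mono-exponential decay to a histogram of photon arrival times to recover the lifetime. The synthetic data, bin width and fitting routine below are assumptions chosen for illustration, not the ML methods developed in the thesis.

```python
# Illustrative sketch (assumed setup, not the thesis pipeline): estimating a
# fluorescence lifetime tau by fitting a mono-exponential decay to a histogram
# of photon arrival times with non-linear least squares.
import numpy as np
from scipy.optimize import curve_fit

def decay(t, amplitude, tau, background):
    return amplitude * np.exp(-t / tau) + background

# Synthetic histogram: 5 ns true lifetime, 10 ps time bins, Poisson noise.
t = np.arange(0, 25e-9, 10e-12)
true_counts = decay(t, amplitude=500.0, tau=5e-9, background=2.0)
counts = np.random.poisson(true_counts).astype(float)

popt, _ = curve_fit(decay, t, counts, p0=(counts.max(), 2e-9, 0.0))
print(f"estimated lifetime: {popt[1] * 1e9:.2f} ns")
```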

    Toward Efficient Rendering: A Neural Network Approach

    Physically-based image synthesis has attracted considerable attention due to its wide applications in visual effects, video games, design visualization, and simulation. However, obtaining visually satisfactory renderings with ray tracing algorithms often requires casting a large number of rays and thus takes a vast amount of computation. The extensive computational and memory requirements of ray tracing methods pose a challenge, especially when running these rendering algorithms on resource-constrained platforms, and impede applications that require high resolutions and refresh rates. This thesis presents three methods to address the challenge of efficient rendering. First, we present a hybrid rendering method to speed up Monte Carlo rendering algorithms. Our method first generates two versions of a rendering: one at a low resolution with a high sample rate (LRHS) and the other at a high resolution with a low sample rate (HRLS). We then develop a deep convolutional neural network to fuse these two renderings into a high-quality image as if it were rendered at a high resolution with a high sample rate. Specifically, we formulate this fusion task as a super-resolution problem that generates a high-resolution rendering from the low-resolution input (LRHS), assisted by the HRLS rendering. The HRLS rendering provides critical high-frequency details which are difficult for any super-resolution method to recover from the LRHS alone. Our experiments show that our hybrid rendering algorithm is significantly faster than state-of-the-art Monte Carlo denoising methods while rendering high-quality images, when tested on both our own BCR dataset and the Gharbi dataset. Second, we investigate super-resolution to reduce the number of pixels to render and thus speed up Monte Carlo rendering algorithms. While great progress has been made in super-resolution technologies, it is essentially an ill-posed problem and cannot recover high-frequency details in renderings. To address this problem, we exploit high-resolution auxiliary features to guide the super-resolution of low-resolution renderings. These high-resolution auxiliary features can be quickly rendered by a rendering engine and, at the same time, provide valuable high-frequency details to assist super-resolution. To this end, we develop a cross-modality Transformer network that consists of an auxiliary feature branch and a low-resolution rendering branch. These two branches are designed to fuse high-resolution auxiliary features with the corresponding low-resolution rendering. Furthermore, we design residual densely-connected Swin Transformer groups to learn to extract representative features for high-quality super-resolution. Our experiments show that our auxiliary features-guided super-resolution method outperforms both state-of-the-art super-resolution methods and Monte Carlo denoising methods in producing high-quality renderings. Third, we present a deep-learning-based Monte Carlo denoising method for stereoscopic images. Research on deep-learning-based Monte Carlo denoising has made significant progress in recent years. However, existing methods are mostly designed for single-image Monte Carlo denoising, and stereoscopic image Monte Carlo denoising is less explored. Traditional methods require first rendering a noiseless image for one view, which is time-consuming. Recent deep-learning-based methods achieve promising results on single-image Monte Carlo denoising, but their performance on stereoscopic images is compromised because they do not consider the spatial correspondence between the left and right images. In this thesis, we present a deep-learning-based Monte Carlo denoising method for stereoscopic images. It takes stereoscopic images rendered at low samples per pixel (spp) as input and estimates a high-quality result. Specifically, we extract features from the two stereoscopic images and warp the features from one image to the other using a disparity map fine-tuned from the disparity calculated from scene geometry. To train our network, we collected a large-scale Blender Cycles Stereo Ray-tracing dataset. Our experiments show that our method outperforms state-of-the-art methods when sampling rates are low.
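    The first of these ideas, fusing an LRHS rendering with an HRLS rendering, can be sketched as follows. The tiny CNN, channel counts and residual formulation below are assumptions chosen for illustration; they are not the network described in the thesis.

```python
# Illustrative sketch (not the thesis network): fusing a low-resolution,
# high-sample-rate rendering (LRHS) with a high-resolution, low-sample-rate
# rendering (HRLS) via a small CNN that predicts the high-quality image.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridFusion(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, lrhs, hrls):
        # Upsample the clean-but-small LRHS to the HRLS resolution, then let
        # the network combine it with the noisy-but-detailed HRLS.
        lrhs_up = F.interpolate(lrhs, size=hrls.shape[-2:],
                                mode="bilinear", align_corners=False)
        x = torch.cat([lrhs_up, hrls], dim=1)
        return lrhs_up + self.net(x)   # residual over the upsampled LRHS

model = HybridFusion()
lrhs = torch.randn(1, 3, 128, 128)   # low resolution, many samples per pixel
hrls = torch.randn(1, 3, 512, 512)   # high resolution, few samples per pixel
print(model(lrhs, hrls).shape)       # torch.Size([1, 3, 512, 512])
```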