
    A metadata-enhanced framework for high performance visual effects

    This thesis is devoted to reducing the interactive latency of image processing computations in visual effects. Film and television graphic artists depend upon low-latency feedback to receive a visual response to changes in effect parameters. We tackle latency with a domain-specific optimising compiler which leverages high-level program metadata to guide key computational and memory hierarchy optimisations. This metadata encodes static and dynamic information about data dependence and patterns of memory access in the algorithms constituting a visual effect – features that are typically difficult to extract through program analysis – and presents it to the compiler in an explicit form. By using domain-specific information as a substitute for program analysis, our compiler is able to target a set of complex source-level optimisations that a vendor compiler does not attempt, before passing the optimised source to the vendor compiler for lower-level optimisation. Three key metadata-supported optimisations are presented. The first is an adaptation of space and schedule optimisation – based upon well-known compositions of the loop fusion and array contraction transformations – to the dynamic working sets and schedules of a runtime-parameterised visual effect. This adaptation sidesteps the costly solution of runtime code generation by specialising static parameters in an offline process and exploiting dynamic metadata to adapt the schedule and contracted working sets at runtime to user-tunable parameters. The second optimisation comprises a set of transformations to generate SIMD ISA-augmented source code. Our approach differs from autovectorisation by using static metadata to identify parallelism, in place of data dependence analysis, and runtime metadata to tune the data layout to user-tunable parameters for optimal aligned memory access. The third optimisation comprises a related set of transformations to generate code for SIMT architectures, such as GPUs. Static dependence metadata is exploited to guide large-scale parallelisation for tens of thousands of in-flight threads. Optimal use of the alignment-sensitive, explicitly managed memory hierarchy is achieved by identifying inter-thread and intra-core data sharing opportunities in memory access metadata. A detailed performance analysis of these optimisations is presented for two industrially developed visual effects. In our evaluation we demonstrate up to 8.1x speed-ups on Intel and AMD multicore CPUs and up to 6.6x speed-ups on NVIDIA GPUs over our best hand-written implementations of these two effects. Programmability is enhanced by automating the generation of SIMD and SIMT implementations from a single programmer-managed scalar representation.
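    The space and schedule optimisation above rests on the classic loop fusion and array contraction transformations. A minimal Python sketch of the idea on a toy two-stage vertical-filter pipeline may help (an illustration only, not the thesis compiler; the function names and the 3-tap box filter are assumptions):

        import numpy as np

        def pipeline_unfused(src):
            """Two 3-tap vertical averages with a full-size intermediate array (unfused)."""
            src = np.asarray(src, dtype=float)
            tmp, out = np.zeros_like(src), np.zeros_like(src)
            for y in range(1, src.shape[0] - 1):
                tmp[y] = (src[y - 1] + src[y] + src[y + 1]) / 3.0      # stage 1
            for y in range(2, src.shape[0] - 2):
                out[y] = (tmp[y - 1] + tmp[y] + tmp[y + 1]) / 3.0      # stage 2
            return out

        def pipeline_fused(src):
            """Fused schedule: the intermediate is contracted to a 3-row rolling buffer."""
            src = np.asarray(src, dtype=float)
            out = np.zeros_like(src)
            rows = [None, None, None]                                  # contracted working set
            for y in range(1, src.shape[0] - 1):
                rows[y % 3] = (src[y - 1] + src[y] + src[y + 1]) / 3.0  # produce one stage-1 row
                if y >= 3:                                             # stage-1 rows y-2, y-1, y are live
                    out[y - 1] = (rows[(y - 2) % 3] + rows[(y - 1) % 3] + rows[y % 3]) / 3.0
            return out

    Fusing the two loops lets the full-image intermediate shrink to a three-row rolling buffer, which is the kind of working-set reduction the metadata-guided compiler automates for runtime-parameterised effects.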

    A Statistical Perspective of the Empirical Mode Decomposition

    This research focuses on non-stationary basis decomposition methods in time-frequency analysis. Classical methodologies in this field, such as Fourier analysis and wavelet transforms, rely on strong assumptions about the underlying moment generating process, which may not be valid in real data scenarios or modern applications of machine learning. The literature on non-stationary methods is still in its infancy, and the research contained in this thesis aims to address challenges arising in this area. Among several alternatives, this work is based on the method known as the Empirical Mode Decomposition (EMD). The EMD is a non-parametric time-series decomposition technique that produces a set of time-series functions denoted as Intrinsic Mode Functions (IMFs), which carry specific statistical properties. The main focus is on providing a general and flexible family of basis extraction methods with minimal requirements compared to those of the Fourier or wavelet techniques. This is important for two main reasons: first, more universal applications can be taken into account; secondly, the EMD requires very little a priori knowledge of the process in order to be applied, and as such it can have greater generalisation properties across a wide array of statistical applications and data types. The contributions of this work deal with several aspects of the decomposition. The first set regards the construction of an IMF from several perspectives: (1) achieving a semi-parametric representation of each basis; (2) extracting such semi-parametric functional forms in a computationally efficient and statistically robust framework. The EMD belongs to the class of path-based decompositions and is therefore often not treated as a stochastic representation. (3) A major contribution involves embedding the deterministic pathwise decomposition framework into a formal stochastic process setting. One of the assumptions inherent in the EMD construction is the requirement of a continuous function to which the decomposition is applied; in general, this may not be the case in many applications. (4) Various multi-kernel Gaussian process formulations of the EMD are proposed through the introduced stochastic embedding. In particular, two different models are proposed: one modelling the temporal mode of oscillations of the EMD, and the other capturing the location of instantaneous frequencies in specific frequency regions or bandwidths. (5) The construction of the second stochastic embedding is achieved with an optimisation method called the cross-entropy method, for which two formulations are provided and explored. Applications to speech time-series, which are non-stationary, are explored to study these methodological extensions.
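    For readers unfamiliar with the decomposition itself, the classical sifting procedure that extracts IMFs can be sketched as follows (a simplified, generic Python version using SciPy; the four-extrema guard and the energy-based stopping rule are simplifications rather than the formulations developed in the thesis):

        import numpy as np
        from scipy.interpolate import CubicSpline

        def _envelope(t, x, idx):
            """Cubic-spline envelope through the extrema indexed by idx."""
            return CubicSpline(t[idx], x[idx])(t)

        def sift_imf(t, x, max_iter=50, tol=1e-3):
            """One sifting pass: return a candidate Intrinsic Mode Function of x."""
            h = x.copy()
            for _ in range(max_iter):
                d = np.diff(h)
                maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
                minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
                if len(maxima) < 4 or len(minima) < 4:                 # too few extrema for envelopes
                    break
                mean_env = 0.5 * (_envelope(t, h, maxima) + _envelope(t, h, minima))
                h = h - mean_env
                if np.sum(mean_env ** 2) / np.sum(h ** 2) < tol:       # local mean ~ 0: accept as IMF
                    break
            return h

        def emd(t, x, n_imfs=4):
            """Decompose x into IMFs plus a residual trend."""
            imfs, residual = [], x.copy()
            for _ in range(n_imfs):
                imf = sift_imf(t, residual)
                imfs.append(imf)
                residual = residual - imf
            return imfs, residual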

    Visual Perception For Robotic Spatial Understanding

    Humans understand the world through vision without much effort. We perceive the structure, objects, and people in the environment and pay little direct attention to most of it, until it becomes useful. Intelligent systems, especially mobile robots, have no such biologically engineered vision mechanism to take for granted. In contrast, we must devise algorithmic methods of taking raw sensor data and converting it to something useful very quickly. Vision is such a necessary part of building a robot, or any intelligent system that is meant to interact with the world, that it is somewhat surprising we don't have off-the-shelf libraries for this capability. Why is this? The simple answer is that the problem is extremely difficult. There has been progress, but the current state of the art is impressive and depressing at the same time. We now have neural networks that can recognize many objects in 2D images, in some cases performing better than a human. Some algorithms can also provide bounding boxes or pixel-level masks to localize the object. We have visual odometry and mapping algorithms that can build reasonably detailed maps over long distances with the right hardware and conditions. On the other hand, we have robots with many sensors and no efficient way to compute their relative extrinsic poses for integrating the data in a single frame. The same networks that produce good object segmentations and labels in a controlled benchmark still miss obvious objects in the real world and have no mechanism for learning on the fly while the robot is exploring. Finally, while we can detect pose for very specific objects, we don't yet have a mechanism that detects pose that generalizes well over categories or that can describe new objects efficiently. We contribute algorithms in four of the areas mentioned above. First, we describe a practical and effective system for calibrating many sensors on a robot with up to 3 different modalities. Second, we present our approach to visual odometry and mapping that exploits the unique capabilities of RGB-D sensors to efficiently build detailed representations of an environment. Third, we describe a 3-D over-segmentation technique that utilizes the models and ego-motion output from the previous step to generate temporally consistent segmentations under camera motion. Finally, we develop a synthesized dataset of chair objects with part labels and investigate the influence of parts on RGB-D based object pose recognition using a novel network architecture we call PartNet.
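    As an aside on the calibration contribution, once each sensor's extrinsic pose is known, integrating its data into a single frame reduces to chaining rigid-body transforms. A minimal sketch of that bookkeeping (generic homogeneous-transform algebra, not the calibration algorithm itself; the frame names are assumptions):

        import numpy as np

        def extrinsic(R, t):
            """Build a 4x4 homogeneous transform from a 3x3 rotation R and translation t."""
            T = np.eye(4)
            T[:3, :3], T[:3, 3] = R, t
            return T

        def to_base_frame(points, T_base_from_sensor):
            """Map Nx3 sensor-frame points into the robot base frame."""
            homog = np.hstack([points, np.ones((len(points), 1))])
            return (T_base_from_sensor @ homog.T).T[:, :3]

        # Fusing two sensors only needs their calibrated extrinsics chained together:
        # T_base_from_lidar = T_base_from_camera @ T_camera_from_lidar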

    Bayesian Optimisation for Planning under Uncertainty

    Under an increasing demand for data to understand critical processes in our world, robots have become powerful tools to automatically gather data and interact with their environments. In this context, this thesis addresses planning problems where limited prior information leads to uncertainty about the outcomes of a robot's decisions. The methods are based on Bayesian optimisation (BO), which provides a framework to solve planning problems under uncertainty by means of probabilistic modelling. As a first contribution, the thesis provides a method to find energy-efficient paths over unknown terrains. The method applies a Gaussian process (GP) model to learn online how a robot's power consumption varies as a function of its configuration while moving over the terrain. BO is applied to optimise trajectories over the GP model being learnt so that they are both informative and energetically efficient. The method was tested in experiments in simulated and physical environments. A second contribution addresses the problem of policy search in high-dimensional parameter spaces. To deal with high dimensionality, the method combines BO with a coordinate-descent scheme that greatly improves BO's performance when compared to conventional approaches. The method was applied to optimise a control policy for a race car in a simulated environment and shown to outperform other optimisation approaches. Finally, the thesis provides two methods to address planning problems involving uncertainty in the input space. The first method is applied to actively learn terrain roughness models via proprioceptive sensing with a mobile robot under localisation uncertainty. Experiments demonstrate the method's performance in both simulations and a physical environment. The second method is derived for more general optimisation problems; in particular, it is provided with theoretical guarantees and empirical performance comparisons against other approaches in simulated environments.
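    The BO loop underlying these contributions can be sketched generically: fit a GP surrogate to the evaluations gathered so far, then choose the next query by maximising an acquisition function such as expected improvement. A minimal scikit-learn sketch (a generic formulation, not the trajectory or policy-search methods of the thesis; the random candidate set and all parameter values are assumptions):

        import numpy as np
        from scipy.stats import norm
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import Matern

        def expected_improvement(X, gp, y_best, xi=0.01):
            """Expected improvement (for minimisation) under the GP posterior."""
            mu, sigma = gp.predict(X, return_std=True)
            sigma = np.maximum(sigma, 1e-9)
            z = (y_best - mu - xi) / sigma
            return (y_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

        def bayes_opt(objective, bounds, n_init=5, n_iter=25, seed=0):
            """Minimise a black-box objective (e.g. the energy cost of a path) with BO."""
            rng = np.random.default_rng(seed)
            dim = bounds.shape[0]
            X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_init, dim))
            y = np.array([objective(x) for x in X])
            gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
            for _ in range(n_iter):
                gp.fit(X, y)
                cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(2048, dim))
                x_next = cand[np.argmax(expected_improvement(cand, gp, y.min()))]
                X = np.vstack([X, x_next])
                y = np.append(y, objective(x_next))
            return X[np.argmin(y)], y.min()

        best_x, best_cost = bayes_opt(lambda x: np.sum((x - 0.3) ** 2),
                                      bounds=np.array([[0.0, 1.0], [0.0, 1.0]]))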

    Deep Learning for Learning Representation and Its Application to Natural Language Processing

    As the web evolves even faster than expected, the exponential growth of data becomes overwhelming. Textual data is being generated at an ever-increasing pace via emails, documents on the web, tweets, online user reviews, blogs, and so on. As the amount of unstructured text data grows, so does the need for intelligently processing and understanding it. The focus of this dissertation is on developing learning models that automatically induce representations of human language to solve higher-level language tasks. In contrast to most conventional learning techniques, which employ shallow-structured learning architectures, deep learning is a more recently developed machine learning technique that uses supervised and/or unsupervised strategies to automatically learn hierarchical representations in deep architectures, and it has been employed in varied tasks such as classification and regression. Deep learning was inspired by biological observations of human brain mechanisms for processing natural signals and has attracted tremendous attention from both academia and industry in recent years due to its state-of-the-art performance in many research domains such as computer vision, speech recognition, and natural language processing. This dissertation focuses on how to represent unstructured text data and how to model it with deep learning models in different natural language processing applications such as sequence tagging, sentiment analysis, and semantic similarity. Specifically, the dissertation addresses the following research topics: in Chapter 3, we examine one of the fundamental problems in NLP, text classification, by leveraging contextual information [MLX18a]; in Chapter 4, we propose a unified framework for generating an informative map from a review corpus [MLX18b]; Chapter 5 discusses tagging address queries in map search [Mok18], research performed in collaboration with Microsoft; and in Chapter 6, we discuss ongoing research on the neural language sentence matching problem, which we are working to extend into a recommendation system.
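    As a concrete illustration of the representation-learning models applied to sequence tagging, a minimal BiLSTM tagger in PyTorch might look as follows (a generic sketch under assumed sizes, not one of the architectures proposed in the dissertation):

        import torch
        import torch.nn as nn

        class BiLSTMTagger(nn.Module):
            """Embed tokens, contextualise them with a BiLSTM, and predict one tag per token."""
            def __init__(self, vocab_size, n_tags, emb_dim=100, hidden=128):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, emb_dim)
                self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
                self.out = nn.Linear(2 * hidden, n_tags)

            def forward(self, token_ids):                       # token_ids: (batch, seq_len)
                states, _ = self.lstm(self.embed(token_ids))
                return self.out(states)                         # (batch, seq_len, n_tags) logits

        model = BiLSTMTagger(vocab_size=30000, n_tags=20)
        logits = model(torch.randint(0, 30000, (2, 12)))        # toy batch: 2 sentences, 12 tokens
        loss = nn.CrossEntropyLoss()(logits.reshape(-1, 20), torch.randint(0, 20, (24,)))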

    Behavioural modelling and digital predistortion of radio-frequency transmitters

    Doctoral thesis in Electrical Engineering. In current telecommunication systems, the main concern when developing the radio frequency transmitter is the efficiency with which power drawn from the supply is converted into radio frequency power. This type of design generally leads to a highly nonlinear transmission characteristic, mainly due to the radio frequency power amplifier. This nonlinear transmission severely distorts the information envelope, leading to spectral regrowth and out-of-band distortion. To correct this problem, a nonlinear compensation process is employed; for this purpose, digital predistortion is generally favored for its flexibility and accuracy. Digital predistortion is mostly applied in a blind manner, by brute force, until the desired compensation is achieved. Because of this, when the method fails, as it has in gallium nitride based power amplifiers, it is difficult to know what to modify in the system to achieve the desired results. To understand and design robust predistortion systems, it is necessary, on the one hand, to understand the behavior of radio frequency power amplifiers and, on the other, to understand the limitations of, and relations between, the digital models and that real behavior. To this end, this thesis explores and describes these relationships in order to support the choice of digital predistortion model, develops new predistortion models based on the physics of transistor behavior, and proposes methods for the characterization of radio frequency power amplifiers.
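    A widely used baseline against which such model choices are framed is the memory-polynomial predistorter fitted with the indirect learning architecture. A minimal Python sketch (a generic textbook formulation, not the transistor-physics-based models developed in the thesis; the nonlinearity order K, memory depth M, and gain normalisation are assumptions):

        import numpy as np

        def memory_polynomial_matrix(x, K=5, M=3):
            """Regression matrix of a memory polynomial: terms x[n-m] * |x[n-m]|**(k-1)."""
            x = np.asarray(x, dtype=complex)
            N, cols = len(x), []
            for m in range(M):
                xm = np.concatenate([np.zeros(m, dtype=complex), x[:N - m]])   # delayed copy
                for k in range(1, K + 1, 2):                                   # odd-order terms
                    cols.append(xm * np.abs(xm) ** (k - 1))
            return np.column_stack(cols)

        def estimate_predistorter(pa_in, pa_out, K=5, M=3):
            """Indirect learning: regress the PA input on its gain-normalised output."""
            g = np.linalg.norm(pa_in) / np.linalg.norm(pa_out)                 # remove linear gain
            Phi = memory_polynomial_matrix(np.asarray(pa_out) * g, K, M)
            coeffs, *_ = np.linalg.lstsq(Phi, pa_in, rcond=None)
            return coeffs

        def apply_predistorter(x, coeffs, K=5, M=3):
            """Pre-distort a new baseband signal before it drives the power amplifier."""
            return memory_polynomial_matrix(x, K, M) @ coeffs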

    Human Action Recognition from Various Data Modalities: A Review

    Human Action Recognition (HAR), which aims to understand human behaviors and then assign category labels, has a wide range of applications and has therefore been attracting increasing attention in the field of computer vision. Generally, human actions can be represented using various data modalities, such as RGB, skeleton, depth, infrared sequences, point clouds, event streams, audio, acceleration, radar, and WiFi, which encode different sources of useful yet distinct information and have different advantages and application scenarios. Consequently, many existing works have investigated different types of approaches to HAR using various modalities. In this paper, we give a comprehensive survey of HAR from the perspective of the input data modalities. Specifically, we review both the hand-crafted feature-based and deep learning-based methods for single data modalities, and also review the methods based on multiple modalities, including the fusion-based frameworks and the co-learning-based approaches. The current benchmark datasets for HAR are also introduced. Finally, we discuss some potentially important research directions in this area.
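    Of the multi-modality strategies surveyed, the simplest fusion-based framework is late (score-level) fusion, sketched below (an illustrative Python snippet rather than a method from the survey; the modality names and equal weights are assumptions):

        import numpy as np

        def late_fusion(scores_by_modality, weights=None):
            """Fuse per-modality class-score vectors (e.g. RGB and skeleton) by weighted averaging."""
            modalities = list(scores_by_modality)
            weights = weights or {m: 1.0 / len(modalities) for m in modalities}
            fused = sum(weights[m] * np.asarray(scores_by_modality[m]) for m in modalities)
            return int(np.argmax(fused))                        # predicted action class index

        pred = late_fusion({"rgb": [0.1, 0.7, 0.2], "skeleton": [0.2, 0.5, 0.3]})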