34 research outputs found

    A Variational Approach to Joint Denoising, Edge Detection and Motion Estimation

    IMPROVING EFFICIENCY AND SCALABILITY IN VISUAL SURVEILLANCE APPLICATIONS

    We present four contributions to visual surveillance: (a) an action recognition method based on the characteristics of human motion in image space; (b) a study of the strengths of five regression techniques for monocular pose estimation that highlights the advantages of kernel PLS; (c) a learning-based method for detecting objects carried by humans requiring minimal annotation; (d) an interactive video segmentation system that reduces supervision by using occlusion and long term spatio-temporal structure information. We propose a representation for human actions that is based solely on motion information and that leverages the characteristics of human movement in the image space. The representation is best suited to visual surveillance settings in which the actions of interest are highly constrained, but also works on more general problems if the actions are ballistic in nature. Our computationally efficient representation achieves good recognition performance on both a commonly used action recognition dataset and on a dataset we collected to simulate a checkout counter. We study discriminative methods for 3D human pose estimation from single images, which build a map from image features to pose. The main difficulty with these methods is the insufficiency of training data due to the high dimensionality of the pose space. However, real datasets can be augmented with data from character animation software, so the scalability of existing approaches becomes important. We argue that Kernel Partial Least Squares approximates Gaussian Process regression robustly, enabling the use of larger datasets, and we show in experiments that kPLS outperforms two state-of-the-art methods based on GP. The high variability in the appearance of carried objects suggests using their relation to the human silhouette to detect them. 
We adopt a generate-and-test approach that produces candidate regions from protrusion, color contrast and occlusion boundary cues and then filters them with a kernel SVM classifier on context features. Our method exceeds state-of-the-art accuracy and has good generalization capability. We also propose a Multiple Instance Learning framework for the classifier that reduces annotation effort by two orders of magnitude while maintaining comparable accuracy. Finally, we present an interactive video segmentation system that trades off a small amount of segmentation quality for significantly less supervision than is necessary in systems in the literature. While applications like video editing could not directly use the output of our system, reasoning about the trajectories of objects in a scene or learning coarse appearance models is still possible. The unsupervised segmentation component at the base of our system effectively employs occlusion boundary cues and achieves competitive results on an unsupervised segmentation dataset. On videos used to evaluate interactive methods, our system requires less interaction time than others, does not rely on appearance information and can extract multiple objects at the same time.

    Natural image processing and synthesis using deep learning

    In the present thesis, we study how deep neural networks can be applied to various tasks in computer vision. Computer vision is an interdisciplinary field that deals with the understanding of digital images and video. Traditionally, the problems arising in this domain were tackled using heavily hand-engineered ad hoc methods. Until recently, a typical computer vision system consisted of a sequence of independent modules which barely talked to each other. Such an approach is quite reasonable in the case of limited data, as it takes major advantage of the researcher's domain expertise. This strength turns into a weakness if some of the input scenarios are overlooked in the algorithm design process. With the rapidly increasing volumes and varieties of data and the advent of cheaper and faster computational resources, end-to-end deep neural networks have become an appealing alternative to the traditional computer vision pipelines. We demonstrate this in a series of research articles, each of which considers a particular task of either image analysis or synthesis and presents a solution based on a "deep" backbone. In the first article, we deal with the classic low-level vision problem of edge detection. Inspired by a top-performing non-neural approach, we take a step towards building an end-to-end system by combining feature extraction and description in a single convolutional network. The resulting fully data-driven method matches or surpasses the detection quality of the existing conventional approaches in the settings for which they were designed, while being significantly more usable in out-of-domain situations. In our second article, we introduce a custom architecture for image manipulation based on the idea that most of the pixels in the output image can be directly copied from the input. This technique has several significant advantages over the naive black-box neural approach. It retains the level of detail of the original images, does not introduce artifacts due to insufficient capacity of the underlying neural network and simplifies the training process, to name a few. We demonstrate the efficiency of the proposed architecture on the challenging gaze correction task, where our system achieves excellent results. In the third article, we slightly diverge from pure computer vision and study the more general problem of domain adaptation. There, we introduce a novel training-time algorithm (i.e., adaptation is attained by using an auxiliary objective in addition to the main one). We seek to extract features that maximally confuse a dedicated network called the domain classifier while being useful for the task at hand. The domain classifier is learned simultaneously with the features and attempts to tell whether those features are coming from the source or the target domain. The proposed technique is easy to implement, yet results in superior performance on all the standard benchmarks. Finally, the fourth article presents a new kind of generative model for image data. Unlike conventional neural-network-based approaches, our system, dubbed SPIRAL, describes images in terms of concise low-level programs executed by off-the-shelf rendering software used by humans to create visual content. Among other things, this allows SPIRAL not to waste its capacity on the minutiae of datasets and to focus more on the global structure. The latent space of our model is easily interpretable by design and provides means for predictable image manipulation. We test our approach on several popular datasets and demonstrate its power and flexibility.

    Unsteady flow around bluff bodies spanning thin rectangular ducts

    This project consists of a two-pronged computational and experimental approach to the study of flow in closed, thin rectangular ducts with a partial cubic blockage. Results are presented at three different bulk Reynolds numbers, ReD = 5600, 10400 and 15600, based on the channel height, which is also the blockage dimension. The new experimental data produced consists of fluctuating pressure measurements at the cube surface, with 2D-2C PIV snapshots captured simultaneously in the wake region. In addition to this, DNS data is produced at the lowest Reynolds number of ReD = 5600, allowing more detailed comparisons where PIV laser access was not possible. Comparisons are drawn between the data and URANS CFD simulations. A literature review and preliminary testing process narrowed down the considered URANS models to the two-layer k-ε model and the Elliptic Blending Reynolds Stress Model, or EBRSM. In the light of the new data, these two URANS models are compared in order to better understand their strengths and weaknesses. Particular regard is given to the prediction of large-scale unsteady behaviour, with a focus on vortex shedding. This unsteady phenomenon was found to be present and to have a significant effect on the flow in the near-cube and wake regions. Results show that certain aspects of this behaviour are captured with only limited accuracy by the URANS models tested. As a result, inaccuracies are also found in the mean simulated velocity fields. The shortcomings appear more pronounced at higher flow rates. At a given flow rate, they are more severe in regions of the flow where organised unsteadiness is large relative to the mean values. It is suggested that inaccuracies in mean URANS predictions are a result of limitations in model capability for unsteady flows, and that validation cases may be pertinent to address this.

    The Role of Vision Algorithms for Micro Aerial Vehicles

    This work investigates research topics related to visual aerial navigation in loosely structured and cluttered environments. During the inspection of the desired infrastructure, the robot is required to fly in an environment which is uncertain and only partially structured because, usually, no reliable layouts and drawings of the surroundings are available. To support these features, advanced cognitive capabilities are required, and in particular the role played by vision is of paramount importance. The use of vision and other onboard sensors such as IMU and GPS plays a fundamental role in providing a high degree of autonomy to flying vehicles. In detail, this thesis is organized as follows:
    • Chapter 1 is a general introduction to the aerial robotics field, the quadrotor platform, and the use of onboard sensors such as cameras and IMU for autonomous navigation. A discussion of camera modeling and the current state of the art in vision-based control, navigation, environment reconstruction and sensor fusion is presented.
    • Chapter 2 presents vision-based control algorithms useful for reactive control, such as collision avoidance, perching and grasping tasks. Two main contributions are presented, based on relative depth maps and image-based visual servoing respectively.
    • Chapter 3 discusses the use of vision algorithms for localization and mapping. Compared to the previous chapter, the vision algorithm is more complex, involving estimation of the vehicle's pose and environment reconstruction. An algorithm based on RGB-D sensors for localization, extendable to localization of multiple vehicles, is presented. Moreover, an environment representation for planning purposes, applied to industrial environments, is introduced.
    • Chapter 4 introduces the possibility of combining vision measurements and IMU data to estimate the motion of the vehicle. A new contribution based on Pareto optimization, which improves on classical Kalman filtering techniques, is presented.
    • Chapter 5 contains conclusions, remarks and proposals for possible developments.

    Optical-Flow Based Detection of Moving Objects in Traffic Scenes

    Traffic is increasing continuously. Nevertheless, the number of traffic fatalities has decreased. One reason for this is passive safety systems, such as side crash protection or airbags, which have been engineered over the last decades and which are standard in today's cars. Active safety systems are increasingly being developed. They are able to avoid or at least mitigate accidents. For example, adaptive cruise control (ACC), originally designed as a comfort system, is being developed towards an emergency brake system. Active safety requires sensors that perceive the vehicle environment. ACC uses radar or a laser scanner. However, cameras are also interesting sensors, as they can capture visual information such as traffic signs or lane markings. In traffic, moving objects (cars, bicyclists, pedestrians) play an important role. Perceiving them is essential for active safety systems. This thesis deals with the detection of moving objects using a monocular camera. The detection is based on the motions within the video stream (optical flow). If the ego-motion and the location of the camera with respect to the road plane are known, the viewed scene can be reconstructed in 3D by exploiting the measured optical flow. In this thesis, an overview of existing algorithms for estimating the ego-motion is given. Based on it, a suitable algorithm is selected and extended by a motion model. The latter considerably increases the accuracy as well as the robustness of the estimate. The location of the camera with respect to the road plane is estimated using the optical flow on the road. The road might be temporarily low-textured, making it hard to measure the optical flow. Consequently, the road homography estimate will be poor. A novel Kalman filtering approach combining the estimate of the ego-motion and the estimate of the road homography leads to far better results. The 3D reconstruction of the viewed scene is performed pointwise for each measured optical flow vector. A point is reconstructed through intersection of the viewing rays, which are determined by the optical flow vector. This only yields a correct result for static, i.e. non-moving, points. Further, static points fulfill four constraints: the epipolar constraint, the trifocal constraint, the positive depth constraint, and the positive height constraint. If at least one constraint is violated, the point is moving. For the first time, an error metric is developed that exploits all four constraints. It measures the deviation from the constraints quantitatively in a unified manner. Based on this error metric, the detection limits are investigated. It is shown that overtaking objects are detected very well, whereas objects being overtaken are hardly detected. Oncoming objects on a straight road are not detected by means of the available constraints. Only if one assumes that these objects are opaque and touch the ground does detection become feasible. An appropriate heuristic is introduced. In conclusion, the developed algorithms form a system to detect moving points robustly. The problem of clustering the detected moving points into objects is outlined. It serves as a starting point for further research activities.
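
    The epipolar constraint named in the abstract above can be illustrated with a small sketch. This is not the thesis's error metric: it is only the textbook residual x2^T [t]_x x1 for a camera that translates without rotating, and every name in it (`residual_for`, `epipolar_residual`, the sample coordinates) is a hypothetical choice made for illustration. It shows a static point satisfying the constraint, a laterally moving point violating it, and an oncoming point that moves along the driving direction and still satisfies it, which is consistent with the abstract's remark that such objects are hard to detect.

```python
def cross_matrix(t):
    """Skew-symmetric matrix [t]_x, so that [t]_x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(p):
    """Pinhole projection to normalized homogeneous coordinates (focal length 1)."""
    x, y, z = p
    return [x / z, y / z, 1.0]


def epipolar_residual(x1, x2, t):
    """Residual x2^T [t]_x x1 for a purely translating camera (rotation = identity)."""
    return dot(x2, mat_vec(cross_matrix(t), x1))


def residual_for(point, object_motion, camera_motion):
    """Epipolar residual of one point observed before and after the camera moves.

    point: 3D position in the first camera frame
    object_motion: displacement of the point itself between frames (zeros if static)
    camera_motion: displacement of the camera between frames (pure translation, assumed)
    """
    t = [-c for c in camera_motion]  # coordinates change as X2 = X1 + t
    moved = [p + m for p, m in zip(point, object_motion)]
    second = [q + ti for q, ti in zip(moved, t)]
    return epipolar_residual(project(point), project(second), t)


if __name__ == "__main__":
    forward = [0.0, 0.0, 1.0]  # camera drives 1 unit along its optical axis
    print("static:  ", residual_for([2.0, 1.0, 10.0], [0.0, 0.0, 0.0], forward))
    print("lateral: ", residual_for([2.0, 1.0, 10.0], [0.5, 0.0, 0.0], forward))
    print("oncoming:", residual_for([2.0, 1.0, 10.0], [0.0, 0.0, -1.0], forward))
```

    A point that moves parallel to the camera translation stays on its epipolar line, so this single constraint cannot flag it; that is one reason to combine several constraints, as the abstract describes.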

    Biologically Inspired Visual Control of Flying Robots

    Insects possess an incredible ability to navigate their environment at high speed, despite having small brains and limited visual acuity. Through selective pressure they have evolved computationally efficient means for simultaneously performing navigation tasks and instantaneous control responses. The insect's main source of information is visual, and through a hierarchy of processes this information is used for perception; at the lowest level are local neurons for detecting image motion and edges, at the higher level are interneurons to spatially integrate the output of previous stages. These higher level processes could be considered as models of the insect's environment, reducing the amount of information to only that which evolution has determined relevant. The scope of this thesis is experimenting with biologically inspired visual control of flying robots through information processing, models of the environment, and flight behaviour. In order to test these ideas I developed a custom quadrotor robot and experimental platform; the 'wasp' system. All algorithms ran on the robot, in real-time or better, and hypotheses were always verified with flight experiments. I developed a new optical flow algorithm that is computationally efficient, and able to be applied in a regular pattern to the image. This technique is used later in my work when considering patterns in the image motion field. Using optical flow in the log-polar coordinate system I developed attitude estimation and time-to-contact algorithms. I find that the log-polar domain is useful for analysing global image motion, and in many ways equivalent to the retinotopic arrangement of neurons in the optic lobe of insects, used for the same task. I investigated the role of depth in insect flight using two experiments. In the first experiment, to study how concurrent visual control processes might be combined, I developed a control system using the combined output of two algorithms. The first algorithm was a wide-field optical flow balance strategy and the second an obstacle avoidance strategy which used inertial information to estimate the depth to objects in the environment, specifically objects whose depth was significantly different to their surroundings. In the second experiment I created an altitude control system which used a model of the environment in the Hough space, and a biologically inspired sampling strategy, to efficiently detect the ground. Both control systems were used to control the flight of a quadrotor in an indoor environment. The methods that insects use to perceive edges and control their flight in response had not been applied to artificial systems before. I developed a quadrotor control system that used the distribution of edges in the environment to regulate the robot height and avoid obstacles. I also developed a model that predicted the distribution of edges in a static scene, and using this prediction was able to estimate the quadrotor altitude.
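
    The abstract above mentions time-to-contact estimation from optical flow in log-polar coordinates but does not give the algorithm. The sketch below is therefore only the standard geometric relation, under the assumption of a pinhole camera approaching a point head-on at constant speed: the image radius r measured from the focus of expansion satisfies d(log r)/dt = 1/τ, where τ is the time to contact. All function names and numbers are hypothetical.

```python
import math


def image_radius(focal_length, lateral_offset, depth):
    """Distance of a projected point from the focus of expansion (pinhole model)."""
    return focal_length * lateral_offset / depth


def time_to_contact(r_prev, r_curr, dt):
    """Estimate tau from the finite-difference rate of change of log-radius."""
    expansion_rate = (math.log(r_curr) - math.log(r_prev)) / dt
    return 1.0 / expansion_rate


def simulate(depth, speed, dt, lateral_offset, focal_length=1.0):
    """Approach a point head-on for one time step and estimate tau from two frames."""
    r_prev = image_radius(focal_length, lateral_offset, depth)
    r_curr = image_radius(focal_length, lateral_offset, depth - speed * dt)
    return time_to_contact(r_prev, r_curr, dt)


if __name__ == "__main__":
    # True time to contact is depth / speed = 5 s at the first frame.
    print(simulate(depth=10.0, speed=2.0, dt=0.01, lateral_offset=0.5))
```

    Because the estimate depends only on the change in log r, it is the same for every point on a fronto-parallel surface and needs neither the focal length nor the metric depth, which is part of what makes the log-polar representation convenient for this task.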

    Sensitivity and background estimates towards Phase-I of the COMET muon-to-electron conversion search

    COMET is a future high-precision experiment searching for charged lepton flavour violation through the muon-to-electron conversion process. It aims to push the intensity frontier of particle physics by coupling an intense muon beam with cutting-edge detector technology. The first stage of the experiment, COMET Phase-I, is currently being assembled and will soon enter its data acquisition period. It plans to achieve a single event sensitivity to μ-e conversion in aluminium of 3.1×10⁻¹⁵. This thesis presents a study of the sensitivity and backgrounds of COMET Phase-I using the latest Monte Carlo simulation data produced. The background contribution from cosmic ray-induced atmospheric muons is estimated using a backward Monte Carlo approach, which allows computational resources to be focused on the most critical signal-mimicking events. Analysis of a μ-e conversion simulation sample suggests that COMET Phase-I will reach a single event sensitivity of 3.6×10⁻¹⁵ within 146 days of data acquisition. Our results suggest that, in that period, on the order of 10³ atmospheric muons will enter the detector system and produce an event similar enough to the conversion signal to pass all the signal selection criteria. Most of these events will be rejected by the Cosmic Ray Veto system; however, we expect at least 2.2 background events to sneak in unnoticed. It is vital for the conversion search that these events be discriminated from conversion electrons, for instance by using Cherenkov threshold counters to distinguish between muons and electrons or, alternatively, by developing a direction identification algorithm to reject some fraction of the μ⁺-induced events.
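
    To see why an expected 2.2 surviving background events matters for a search of this kind, one can assume (this assumption is not stated in the abstract) that the background count is Poisson distributed with mean 2.2. The hypothetical helper below then gives the probability that at least one such event appears in the data.

```python
import math


def prob_at_least_one(expected_events):
    """P(N >= 1) for a Poisson-distributed count with the given mean."""
    return 1.0 - math.exp(-expected_events)


if __name__ == "__main__":
    # Mean of 2.2 taken from the abstract; the Poisson model is an assumption.
    print(round(prob_at_least_one(2.2), 3))
```

    Under that assumption the chance of seeing one or more background events is close to 89%, so a single signal-like event could not be read as evidence of conversion without the additional discrimination the abstract calls for.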

    On discretisation drift and smoothness regularisation in neural network training

    The deep learning recipe of casting real-world problems as mathematical optimisation and tackling the optimisation by training deep neural networks using gradient-based optimisation has undoubtedly proven to be a fruitful one. The understanding of why deep learning works, however, has lagged behind its practical significance. We aim to make steps towards an improved understanding of deep learning with a focus on optimisation and model regularisation. We start by investigating gradient descent (GD), a discrete-time algorithm at the basis of most popular deep learning optimisation algorithms. Understanding the dynamics of GD has been hindered by the presence of discretisation drift, the numerical integration error between GD and its often studied continuous-time counterpart, the negative gradient flow (NGF). To add to the toolkit available to study GD, we derive novel continuous-time flows that account for discretisation drift. Unlike the NGF, these new flows can be used to describe learning rate specific behaviours of GD, such as training instabilities observed in supervised learning and two-player games. We then translate insights from continuous time into mitigation strategies for unstable GD dynamics, by constructing novel learning rate schedules and regularisers that do not require additional hyperparameters. Like optimisation, smoothness regularisation is another pillar of deep learning's success with wide use in supervised learning and generative modelling. Despite their individual significance, the interactions between smoothness regularisation and optimisation have yet to be explored. We find that smoothness regularisation affects optimisation across multiple deep learning domains, and that incorporating smoothness regularisation in reinforcement learning leads to a performance boost that can be recovered using adaptations to optimisation methods.
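
    Discretisation drift, as defined in the abstract above, can be made concrete on a toy problem. The sketch below does not reproduce the thesis's modified flows; it only compares plain gradient descent with the exact negative gradient flow on the one-dimensional quadratic f(θ) = ½λθ², a setting chosen here for illustration. The function names and the values of λ and the learning rate h are assumptions.

```python
import math


def gradient_descent(theta0, lam, h, steps):
    """Iterate theta <- theta - h * f'(theta) for f(theta) = 0.5 * lam * theta**2."""
    theta = theta0
    for _ in range(steps):
        theta -= h * lam * theta
    return theta


def gradient_flow(theta0, lam, t):
    """Exact solution of d(theta)/dt = -lam * theta at time t."""
    return theta0 * math.exp(-lam * t)


def drift(theta0, lam, h, steps):
    """Gap between GD after `steps` iterations and the flow at time steps * h."""
    return gradient_descent(theta0, lam, h, steps) - gradient_flow(theta0, lam, steps * h)


if __name__ == "__main__":
    for h in (0.01, 0.5, 2.1):
        gd = gradient_descent(1.0, 1.0, h, 50)
        ngf = gradient_flow(1.0, 1.0, 50 * h)
        print(f"h={h}: GD={gd:.6g}  NGF={ngf:.6g}  drift={gd - ngf:.6g}")
```

    For a small learning rate the two trajectories nearly coincide. Once hλ exceeds 2, each GD step multiplies θ by a factor of magnitude greater than one, so GD oscillates and diverges while the flow still decays to zero. This is the kind of learning-rate-specific behaviour that the NGF alone cannot describe.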