
34 research outputs found

A Variational Approach to Joint Denoising, Edge Detection and Motion Estimation

Author: Droske Marc
Garbe Christoph
Preusser Tobias
Rumpf Martin
Telea Alexandru
Publication venue: University of Groningen, Johann Bernoulli Institute for Mathematics and Computer Science
Publication date: 01/01/2006
Field of study

ARTS repository - University of Groningen

A Variational Approach to Joint Denoising, Edge Detection and Motion Estimation

Author: Droske Marc
Garbe Christoph
Preusser Tobias
Rumpf Martin
Telea Alexandru
Publication venue: University of Groningen, Johann Bernoulli Institute for Mathematics and Computer Science
Publication date: 01/01/2006
Field of study

Proceedings - University of Groningen

IMPROVING EFFICIENCY AND SCALABILITY IN VISUAL SURVEILLANCE APPLICATIONS

Author: Dondera Radu
Publication venue
Publication date: 01/01/2013
Field of study

We present four contributions to visual surveillance: (a) an action recognition method based on the characteristics of human motion in image space; (b) a study of the strengths of five regression techniques for monocular pose estimation that highlights the advantages of kernel PLS; (c) a learning-based method for detecting objects carried by humans requiring minimal annotation; (d) an interactive video segmentation system that reduces supervision by using occlusion and long-term spatio-temporal structure information. We propose a representation for human actions that is based solely on motion information and that leverages the characteristics of human movement in the image space. The representation is best suited to visual surveillance settings in which the actions of interest are highly constrained, but also works on more general problems if the actions are ballistic in nature. Our computationally efficient representation achieves good recognition performance on both a commonly used action recognition dataset and on a dataset we collected to simulate a checkout counter. We study discriminative methods for 3D human pose estimation from single images, which build a map from image features to pose. The main difficulty with these methods is the insufficiency of training data due to the high dimensionality of the pose space. However, real datasets can be augmented with data from character animation software, so the scalability of existing approaches becomes important. We argue that Kernel Partial Least Squares approximates Gaussian Process regression robustly, enabling the use of larger datasets, and we show in experiments that kPLS outperforms two state-of-the-art methods based on GP. The high variability in the appearance of carried objects suggests using their relation to the human silhouette to detect them. We adopt a generate-and-test approach that produces candidate regions from protrusion, color contrast and occlusion boundary cues and then filters them with a kernel SVM classifier on context features. Our method exceeds state-of-the-art accuracy and has good generalization capability. We also propose a Multiple Instance Learning framework for the classifier that reduces annotation effort by two orders of magnitude while maintaining comparable accuracy. Finally, we present an interactive video segmentation system that trades off a small amount of segmentation quality for significantly less supervision than necessary in systems in the literature. While applications like video editing could not directly use the output of our system, reasoning about the trajectories of objects in a scene or learning coarse appearance models is still possible. The unsupervised segmentation component at the base of our system effectively employs occlusion boundary cues and achieves competitive results on an unsupervised segmentation dataset. On videos used to evaluate interactive methods, our system requires less interaction time than others, does not rely on appearance information and can extract multiple objects at the same time.

Digital Repository at the University of Maryland

Natural image processing and synthesis using deep learning

Author: Ganin Iaroslav
Publication venue
Publication date: 01/09/2019
Field of study

In the present thesis, we study how deep neural networks can be applied to various tasks in computer vision. Computer vision is an interdisciplinary field that deals with the understanding of digital images and video. Traditionally, the problems arising in this domain were tackled using heavily hand-engineered ad hoc methods. Until recently, a typical computer vision system consisted of a sequence of independent modules which barely talked to each other. Such an approach is quite reasonable in the case of limited data as it takes major advantage of the researcher's domain expertise. This strength turns into a weakness if some of the input scenarios are overlooked in the algorithm design process. With the rapidly increasing volumes and varieties of data and the advent of cheaper and faster computational resources, end-to-end deep neural networks have become an appealing alternative to the traditional computer vision pipelines. We demonstrate this in a series of research articles, each of which considers a particular task of either image analysis or synthesis and presents a solution based on a "deep" backbone. In the first article, we deal with a classic low-level vision problem of edge detection. Inspired by a top-performing non-neural approach, we take a step towards building an end-to-end system by combining feature extraction and description in a single convolutional network. The resulting fully data-driven method matches or surpasses the detection quality of the existing conventional approaches in the settings for which they were designed while being significantly more usable in out-of-domain situations. In our second article, we introduce a custom architecture for image manipulation based on the idea that most of the pixels in the output image can be directly copied from the input. This technique bears several significant advantages over the naive black-box neural approach. It retains the level of detail of the original images, does not introduce artifacts due to insufficient capacity of the underlying neural network and simplifies the training process, to name a few. We demonstrate the efficiency of the proposed architecture on the challenging gaze correction task where our system achieves excellent results. In the third article, we slightly diverge from pure computer vision and study a more general problem of domain adaptation. There, we introduce a novel training-time algorithm (i.e., adaptation is attained by using an auxiliary objective in addition to the main one). We seek to extract features that maximally confuse a dedicated network called the domain classifier while being useful for the task at hand. The domain classifier is learned simultaneously with the features and attempts to tell whether those features are coming from the source or the target domain. The proposed technique is easy to implement, yet results in superior performance on all the standard benchmarks. Finally, the fourth article presents a new kind of generative model for image data. Unlike conventional neural network based approaches, our system, dubbed SPIRAL, describes images in terms of concise low-level programs executed by off-the-shelf rendering software used by humans to create visual content. Among other things, this allows SPIRAL not to waste its capacity on the minutiae of datasets and focus more on the global structure. The latent space of our model is easily interpretable by design and provides means for predictable image manipulation. We test our approach on several popular datasets and demonstrate its power and flexibility.

Dépôt Institutionnel Numérique

Unsteady flow around bluff bodies spanning thin rectangular ducts

Author: Harland David George
Publication venue: Mechanical Engineering, Imperial College London
Publication date: 01/02/2018
Field of study

This project consists of a two-pronged computational and experimental approach to the study of flow in closed, thin rectangular ducts with a partial cubic blockage. Results are presented at three different bulk Reynolds numbers, ReD = 5600, 10400 and 15600, based on the channel height, which is also the blockage dimension. The new experimental data produced consists of fluctuating pressure measurements at the cube surface, with 2D-2C PIV snapshots captured simultaneously in the wake region. In addition to this, DNS data is produced at the lowest Reynolds number of ReD = 5600, allowing more detailed comparisons where PIV laser access was not possible. Comparisons are drawn between the data and URANS CFD simulations. A literature review and preliminary testing process narrowed down the considered URANS models to the two-layer k−ε model and the Elliptic Blending Reynolds Stress Model, or EBRSM. In the light of the new data, these two URANS models are compared in order to better understand their strengths and weaknesses. Particular regard is given to the prediction of large-scale unsteady behaviour, with a focus on vortex shedding. This unsteady phenomenon was found to be present and to have a significant effect on the flow in the near-cube and wake regions. Results show that certain aspects of this behaviour are captured with only limited accuracy by the URANS models tested. As a result, inaccuracies are also found in the mean simulated velocity fields. The shortcomings appear more pronounced at higher flow rates. At a given flow rate, they are more severe in regions of the flow where organised unsteadiness is large relative to the mean values. It is suggested that inaccuracies in mean URANS predictions are a result of limitations in model capability for unsteady flows, and that validation cases may be pertinent to address this. Open Access

Spiral - Imperial College Digital Repository

The Role of Vision Algorithms for Micro Aerial Vehicles

Author: Loianno Giuseppe
Publication venue
Publication date: 31/03/2014
Field of study

This work investigates the research topics related to visual aerial navigation in loosely structured and cluttered environments. During the inspection of the desired infrastructure, the robot is required to fly in an environment which is uncertain and only partially structured because, usually, no reliable layouts and drawings of the surroundings are available. To support these features, advanced cognitive capabilities are required, and in particular the role played by vision is of paramount importance. Vision and other onboard sensors such as IMU and GPS play a fundamental role in providing a high degree of autonomy to flying vehicles. In detail, the outline of this thesis is organized as follows:
• Chapter 1 is a general introduction to the aerial robotics field, the quadrotor platform, and the use of onboard sensors like cameras and IMU for autonomous navigation. A discussion of camera modeling and the current state of the art in vision-based control, navigation, environment reconstruction and sensor fusion is presented.
• Chapter 2 presents vision-based control algorithms useful for reactive control tasks such as collision avoidance, perching and grasping. Two main contributions are presented, based on relative depth maps and image-based visual servoing respectively.
• Chapter 3 discusses the use of vision algorithms for localization and mapping. Compared to the previous chapter, the vision algorithm is more complex, involving estimation of the vehicle's poses and environment reconstruction. An algorithm based on RGB-D sensors for localization, extendable to localization of multiple vehicles, is presented. Moreover, an environment representation for planning purposes, applied to industrial environments, is introduced.
• Chapter 4 introduces the possibility of combining vision measurements and IMU to estimate the motion of the vehicle. A new contribution based on Pareto Optimization, which overcomes classical Kalman filtering techniques, is presented.
• Chapter 5 contains conclusions, remarks and proposals for possible developments.

Università degli Studi di Napoli Federico II Open Archive

Optical-Flow Based Detection of Moving Objects in Traffic Scenes

Author: Klappstein Jens
Publication venue
Publication date: 01/01/2008
Field of study

Traffic is increasing continuously. Nevertheless, the number of traffic fatalities has decreased. One reason for this is the passive safety systems, such as side crash protection or airbags, which have been engineered over the last decades and which are standard in today's cars. Active safety systems are increasingly being developed. They are able to avoid or at least mitigate accidents. For example, adaptive cruise control (ACC), originally designed as a comfort system, is being developed towards an emergency brake system. Active safety requires sensors perceiving the vehicle environment. ACC uses radar or a laser scanner. However, cameras are also interesting sensors as they are capable of processing visual information such as traffic signs or lane markings. In traffic, moving objects (cars, bicyclists, pedestrians) play an important role. Perceiving them is essential for active safety systems. This thesis deals with the detection of moving objects utilizing a monocular camera. The detection is based on the motions within the video stream (optical flow). If the ego-motion and the location of the camera with respect to the road plane are known, the viewed scene can be reconstructed in 3D by exploiting the measured optical flow. In this thesis an overview of existing algorithms estimating the ego-motion is given. Based on it, a suitable algorithm is selected and extended by a motion model. The latter considerably increases the accuracy as well as the robustness of the estimate. The location of the camera with respect to the road plane is estimated using the optical flow on the road. The road might be temporarily low-textured, making it hard to measure the optical flow. Consequently, the road homography estimate will be poor. A novel Kalman filtering approach combining the estimate of the ego-motion and the estimate of the road homography leads to far better results. The 3D reconstruction of the viewed scene is performed pointwise for each measured optical flow vector. A point is reconstructed through intersection of the viewing rays which are determined by the optical flow vector. This only yields a correct result for static, i.e. non-moving, points. Further, static points fulfill four constraints: epipolar constraint, trifocal constraint, positive depth constraint, and positive height constraint. If at least one constraint is violated, the point is moving. For the first time an error metric is developed exploiting all four constraints. It measures the deviation from the constraints quantitatively in a unified manner. Based on this error metric the detection limits are investigated. It is shown that overtaking objects are detected very well whereas objects being overtaken are hardly detected. Oncoming objects on a straight road are not detected by means of the available constraints. Only if one assumes that these objects are opaque and touch the ground does the detection become feasible. An appropriate heuristic is introduced. In conclusion, the developed algorithms form a system that detects moving points robustly. The problem of clustering the detected moving points into objects is outlined. It serves as a starting point for further research activities.

Heidelberger Dokumentenserver
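
The epipolar constraint named in the abstract above can be illustrated with a small sketch. This is not the thesis's error metric, only the textbook constraint under simplifying assumptions: a camera that translates without rotating, normalized image coordinates, and made-up point positions. All function and variable names are hypothetical. With pure translation t (so that a static point X seen in the first frame appears at X + t in the second), the residual x2 · (t × x1) is zero for static points. A point that moves sideways violates it, while a point that moves along the driving direction stays in the epipolar plane and passes unnoticed, which is one reason this constraint alone cannot flag oncoming objects on a straight road.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def normalize(point):
    """Project a 3D point onto the normalized image plane (z = 1)."""
    return (point[0] / point[2], point[1] / point[2], 1.0)

def epipolar_residual(point_frame1, point_motion, t):
    """Residual x2 . (t x x1) for a camera that only translates.

    point_frame1: 3D point in the first camera frame (assumed values).
    point_motion: the point's own displacement between frames.
    t: translation mapping frame-1 coordinates to frame-2 coordinates.
    """
    point_frame2 = tuple(p + d + s for p, d, s in zip(point_frame1, point_motion, t))
    x1 = normalize(point_frame1)
    x2 = normalize(point_frame2)
    return dot(x2, cross(t, x1))

# Camera drives 1 unit forward along z, so points shift by -1 in z.
t = (0.0, 0.0, -1.0)
point = (2.0, 1.0, 10.0)

static_residual = epipolar_residual(point, (0.0, 0.0, 0.0), t)
lateral_residual = epipolar_residual(point, (0.5, 0.0, 0.0), t)    # crossing motion
oncoming_residual = epipolar_residual(point, (0.0, 0.0, -2.0), t)  # motion along the driving direction
```

In this sketch only the laterally moving point produces a clearly nonzero residual; the oncoming point is indistinguishable from a static one.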

Biologically Inspired Visual Control of Flying Robots

Author: Stowers John Ross
Publication venue: University of Canterbury. Electrical and Computer Engineering
Publication date: 01/01/2013
Field of study

Insects possess an incredible ability to navigate their environment at high speed, despite having small brains and limited visual acuity. Through selective pressure they have evolved computationally efficient means for simultaneously performing navigation tasks and instantaneous control responses. The insect's main source of information is visual, and through a hierarchy of processes this information is used for perception; at the lowest level are local neurons for detecting image motion and edges, at the higher level are interneurons to spatially integrate the output of previous stages. These higher level processes could be considered as models of the insect's environment, reducing the amount of information to only that which evolution has determined relevant. The scope of this thesis is experimenting with biologically inspired visual control of flying robots through information processing, models of the environment, and flight behaviour. In order to test these ideas I developed a custom quadrotor robot and experimental platform; the 'wasp' system. All algorithms ran on the robot, in real-time or better, and hypotheses were always verified with flight experiments. I developed a new optical flow algorithm that is computationally efficient, and able to be applied in a regular pattern to the image. This technique is used later in my work when considering patterns in the image motion field. Using optical flow in the log-polar coordinate system I developed attitude estimation and time-to-contact algorithms. I find that the log-polar domain is useful for analysing global image motion, and in many ways equivalent to the retinotopic arrangement of neurons in the optic lobe of insects, used for the same task. I investigated the role of depth in insect flight using two experiments. In the first experiment, to study how concurrent visual control processes might be combined, I developed a control system using the combined output of two algorithms. The first algorithm was a wide-field optical flow balance strategy and the second an obstacle avoidance strategy which used inertial information to estimate the depth to objects in the environment - objects whose depth was significantly different to their surroundings. In the second experiment I created an altitude control system which used a model of the environment in the Hough space, and a biologically inspired sampling strategy, to efficiently detect the ground. Both control systems were used to control the flight of a quadrotor in an indoor environment. The methods that insects use to perceive edges and control their flight in response had not been applied to artificial systems before. I developed a quadrotor control system that used the distribution of edges in the environment to regulate the robot height and avoid obstacles. I also developed a model that predicted the distribution of edges in a static scene, and using this prediction was able to estimate the quadrotor altitude.

UC Research Repository
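
A short sketch may help show why log-polar coordinates suit time-to-contact estimation, as the abstract above mentions. It is not the thesis's algorithm; it assumes an idealized pinhole camera approaching a fronto-parallel surface head-on at constant speed, and all names and numbers are hypothetical. A point at lateral offset X and depth Z projects to image radius r = f·X/Z, so d(log r)/dt = V/Z = 1/τ: every point shifts along the log-radius axis at the same rate, whatever its position in the image.

```python
import math

def image_radius(focal, lateral_offset, depth):
    """Pinhole projection: radial image distance of a point from the optical axis."""
    return focal * lateral_offset / depth

def time_to_contact(r_prev, r_next, dt):
    """Estimate time to contact from the shift along the log-radius axis."""
    log_radial_rate = (math.log(r_next) - math.log(r_prev)) / dt
    return 1.0 / log_radial_rate

# Assumed scene: surface 10 m ahead, approach speed 2 m/s, 10 ms between frames.
focal, depth0, speed, dt = 1.0, 10.0, 2.0, 0.01
true_ttc = depth0 / speed

estimates = []
for lateral_offset in (0.5, 1.0, 3.0):
    r_prev = image_radius(focal, lateral_offset, depth0)
    r_next = image_radius(focal, lateral_offset, depth0 - speed * dt)
    estimates.append(time_to_contact(r_prev, r_next, dt))
```

All three estimates agree with one another and lie close to `true_ttc`, which is the property that makes a single global shift measurement in the log-polar image sufficient in this idealized setting.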

Sensitivity and background estimates towards Phase-I of the COMET muon-to-electron conversion search

Author: Dubouchet Matthias
Publication venue: Physics, Imperial College London
Publication date: 01/06/2023
Field of study

COMET is a future high-precision experiment searching for charged lepton flavour violation through the muon-to-electron conversion process. It aims to push the intensity frontier of particle physics by coupling an intense muon beam with cutting-edge detector technology. The first stage of the experiment, COMET Phase-I, is currently being assembled and will soon enter its data acquisition period. It plans to achieve a single event sensitivity to μ-e conversion in aluminium of 3.1×10⁻¹⁵. This thesis presents a study of the sensitivity and backgrounds of COMET Phase-I using the latest Monte Carlo simulation data produced. The background contribution from cosmic ray-induced atmospheric muons is estimated using a backward Monte Carlo approach, which allows computational resources to be focused on the most critical signal-mimicking events. Analysis of a μ-e conversion simulation sample suggests that COMET Phase-I will reach a single event sensitivity of 3.6×10⁻¹⁵ within 146 days of data acquisition. Our results suggest that, in that period, on the order of 10³ atmospheric muons will enter the detector system and produce an event similar enough to the conversion signal to pass all the signal selection criteria. Most of these events will be rejected by the Cosmic Ray Veto system; however, we expect at least 2.2 background events to sneak in unnoticed. It is vital for the conversion search that these events be discriminated from conversion electrons, for instance by using Cherenkov threshold counters to distinguish between muons and electrons or, alternatively, by developing a direction identification algorithm to reject some fraction of the μ⁺-induced events. Open Access

Spiral - Imperial College Digital Repository

On discretisation drift and smoothness regularisation in neural network training

Author: Rosca Mihaela Claudia
Publication venue: UCL (University College London)
Publication date: 28/06/2023
Field of study

The deep learning recipe of casting real-world problems as mathematical optimisation and tackling the optimisation by training deep neural networks using gradient-based optimisation has undoubtedly proven to be a fruitful one. The understanding behind why deep learning works, however, has lagged behind its practical significance. We aim to make steps towards an improved understanding of deep learning with a focus on optimisation and model regularisation. We start by investigating gradient descent (GD), a discrete-time algorithm at the basis of most popular deep learning optimisation algorithms. Understanding the dynamics of GD has been hindered by the presence of discretisation drift, the numerical integration error between GD and its often studied continuous-time counterpart, the negative gradient flow (NGF). To add to the toolkit available to study GD, we derive novel continuous-time flows that account for discretisation drift. Unlike the NGF, these new flows can be used to describe learning rate specific behaviours of GD, such as training instabilities observed in supervised learning and two-player games. We then translate insights from continuous time into mitigation strategies for unstable GD dynamics, by constructing novel learning rate schedules and regularisers that do not require additional hyperparameters. Like optimisation, smoothness regularisation is another pillar of deep learning's success with wide use in supervised learning and generative modelling. Despite their individual significance, the interactions between smoothness regularisation and optimisation have yet to be explored. We find that smoothness regularisation affects optimisation across multiple deep learning domains, and that incorporating smoothness regularisation in reinforcement learning leads to a performance boost that can be recovered using adaptations to optimisation methods.

UCL Discovery
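
Discretisation drift, as defined in the abstract above, can be made concrete on a toy problem. The sketch below does not reproduce the thesis's modified flows; it only compares plain gradient descent with the exact negative gradient flow on the one-dimensional quadratic f(θ) = ½λθ², an assumed example with hypothetical names and values. For this loss GD gives θ_k = (1 − hλ)^k θ₀, while the flow gives θ(t) = θ₀·exp(−λt), so the gap between the two at t = kh is the drift.

```python
import math

def gradient_descent(theta0, lam, learning_rate, steps):
    """Run GD on f(theta) = 0.5 * lam * theta**2, whose gradient is lam * theta."""
    theta = theta0
    for _ in range(steps):
        theta = theta - learning_rate * lam * theta
    return theta

def negative_gradient_flow(theta0, lam, time):
    """Exact solution of d(theta)/dt = -lam * theta."""
    return theta0 * math.exp(-lam * time)

lam, theta0, steps = 1.0, 1.0, 20
small_h, large_h = 0.1, 2.5

# Small learning rate: GD tracks the flow, with a small but nonzero drift.
gd_small = gradient_descent(theta0, lam, small_h, steps)
flow_small = negative_gradient_flow(theta0, lam, small_h * steps)
drift_small = abs(gd_small - flow_small)

# Large learning rate (h * lam > 2): GD oscillates and diverges,
# while the flow evaluated at the same time still decays to zero.
gd_large = gradient_descent(theta0, lam, large_h, steps)
flow_large = negative_gradient_flow(theta0, lam, large_h * steps)
```

The second case shows a learning-rate-specific instability that the negative gradient flow cannot describe, since the flow has no learning rate at all.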