Explainable and Advisable Learning for Self-driving Vehicles
Deep neural perception and control networks are likely to be a key component of self-driving vehicles. These models need to be explainable - they should provide easy-to-interpret rationales for their behavior - so that passengers, insurance companies, law enforcement, developers, etc., can understand what triggered a particular behavior. Explanations may be generated from the controller's internal state (introspective explanations) or produced post hoc from the controller's output (rationalizations). Our work has focused on the challenge of generating introspective explanations of deep models for self-driving vehicles. In Chapter 3, we begin by exploring the use of visual explanations. These explanations take the form of real-time highlighted regions of an image that causally influence the network's output (steering control). In the first stage, we use a visual attention model to train a convolutional network end-to-end from images to steering angle. The attention model highlights image regions that potentially influence the network's output. Some of these are true influences, but some are spurious. We then apply a causal filtering step to determine which input regions actually influence the output. This produces more succinct visual explanations and more accurately exposes the network's behavior. In Chapter 4, we add an attention-based video-to-text model to produce textual explanations of model actions, e.g. "the car slows down because the road is wet". The attention maps of the controller and the explanation model are aligned so that explanations are grounded in the parts of the scene that mattered to the controller. We explore two approaches to attention alignment: strong and weak alignment. These explainable systems represent an externalization of tacit knowledge. The network's opaque reasoning is simplified to a situation-specific dependence on a visible object in the image. This makes them brittle and potentially unsafe in situations that do not match the training data.
In Chapter 5, we propose to address this issue by augmenting training data with natural language advice from a human. Advice includes guidance about what to do and where to attend. We present the first step toward advice-giving, where we train an end-to-end vehicle controller that accepts advice. The controller adapts both its visual attention over the scene and its control outputs (steering and speed). Further, in Chapter 6, we propose a new approach that learns vehicle control with the help of long-term (global) human advice. Specifically, our system learns to summarize its visual observations in natural language, predict an appropriate action response (e.g. "I see a pedestrian crossing, so I stop"), and predict the controls accordingly.
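The causal filtering step described above can be sketched as follows. The controller here is a stand-in function, and the occlusion-based test is a simplified illustration of the idea (mask a candidate attention region, re-run the controller, and keep only regions whose removal changes the output), not the dissertation's actual implementation:

```python
import numpy as np

def controller(image):
    # Stand-in for the trained steering network: here, "steering" is a
    # hypothetical sum of pixel intensities (illustration only).
    return float(image.sum())

def causal_filter(image, regions, threshold=0.1):
    """Keep only the attended regions whose occlusion actually changes
    the controller output (a toy version of the causal filtering step).

    regions: list of (row0, row1, col0, col1) candidate attention boxes.
    """
    baseline = controller(image)
    causal = []
    for (r0, r1, c0, c1) in regions:
        masked = image.copy()
        masked[r0:r1, c0:c1] = 0.0           # occlude the candidate region
        effect = abs(controller(masked) - baseline)
        if effect / (abs(baseline) + 1e-8) > threshold:
            causal.append((r0, r1, c0, c1))  # region causally influences output
    return causal

img = np.zeros((8, 8))
img[2:4, 2:4] = 1.0                           # the only informative patch
regions = [(2, 4, 2, 4), (5, 7, 5, 7)]        # one true, one spurious region
print(causal_filter(img, regions))            # -> [(2, 4, 2, 4)]
```

Occluding the spurious region leaves the output unchanged, so only the truly influential region survives the filter.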
Extracting 3D parametric curves from 2D images of helical objects
Helical objects occur in medicine, biology, cosmetics, nanotechnology, and engineering. Extracting a 3D parametric curve from a 2D image of a helical object has many practical applications, in particular being able to extract metrics such as tortuosity, frequency, and pitch. We present a method that straightens the image object and derives a robust 3D helical curve from peaks in the object boundary. The algorithm has a small number of stable parameters that require little tuning, and the curve is validated against both synthetic and real-world data. The results show that the extracted 3D curve lies within a small Hausdorff distance of the ground truth, and has near-identical tortuosity for helical objects with a circular profile. Parameter insensitivity and robustness against high levels of image noise are demonstrated thoroughly and quantitatively.
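One common arc-chord definition of tortuosity (total curve length divided by the straight-line distance between endpoints) can be computed directly from an extracted 3D polyline. This is a generic sketch of that metric and may not match the paper's exact formulation:

```python
import numpy as np

def tortuosity(points):
    """Arc-chord tortuosity of a 3D polyline: total path length divided
    by the straight-line distance between its endpoints (one common
    definition; the paper's exact metric may differ)."""
    points = np.asarray(points, dtype=float)
    segments = np.diff(points, axis=0)              # consecutive displacement vectors
    arc = np.linalg.norm(segments, axis=1).sum()    # total polyline length
    chord = np.linalg.norm(points[-1] - points[0])  # endpoint-to-endpoint distance
    return arc / chord

# A unit-radius helix x = cos t, y = sin t, z = t sampled over two turns;
# its analytic tortuosity is sqrt(2) ~ 1.414.
t = np.linspace(0, 4 * np.pi, 400)
helix = np.column_stack([np.cos(t), np.sin(t), t])
print(round(tortuosity(helix), 3))  # -> 1.414
```

A straight line has tortuosity 1, and the value grows as the curve winds more, which is what makes the metric useful for characterizing helical structures.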
Combining Multiple Algorithms for Road Network Tracking from Multiple Source Remotely Sensed Imagery: a Practical System and Performance Evaluation
In light of the increasing availability of commercial high-resolution imaging sensors, automatic interpretation tools are needed to extract road features. Many approaches to road extraction are currently available, but it is acknowledged that no single method can successfully extract all types of road from all remotely sensed imagery. In this paper, a novel classification of roads is proposed, based on both the roads' geometric and radiometric properties and the characteristics of the sensors. Subsequently, a general road-tracking framework is proposed, and one or more suitable road trackers are designed or combined for each road type. Extensive experiments are performed to extract roads from aerial and satellite imagery, and the results show that a combination strategy can automatically extract more than 60% of the total road network from very-high-resolution imagery such as QuickBird and DMC images, with a time saving of approximately 20% and acceptable spatial accuracy. The results indicate that a combination of multiple algorithms is more reliable, more efficient, and more robust for extracting road networks from multi-source remotely sensed imagery than any individual algorithm.
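The classify-then-dispatch idea behind the combination strategy can be sketched as below. The road classes, threshold values, and tracker names are illustrative assumptions for the sketch, not the paper's actual taxonomy:

```python
# Minimal sketch: classify each road by simple properties, then hand it
# to a tracker suited to that class. Classes, thresholds, and tracker
# names are hypothetical placeholders.

def classify_road(width_m, contrast):
    """Assign a road class from its approximate width (meters) and the
    radiometric contrast against its surroundings (0..1)."""
    if width_m > 20:
        return "highway"
    return "main" if contrast > 0.5 else "minor"

# One (or more) trackers per road class; here just descriptive labels.
TRACKERS = {
    "highway": "parallel-edge tracker",
    "main": "profile-matching tracker",
    "minor": "least-squares template tracker",
}

def select_tracker(width_m, contrast):
    return TRACKERS[classify_road(width_m, contrast)]

print(select_tracker(25.0, 0.8))  # -> parallel-edge tracker
print(select_tracker(8.0, 0.3))   # -> least-squares template tracker
```

The point of the design is that each tracker only has to handle the road type it was built for, which is what lets the combined system cover more of the network than any single tracker.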
Local object gist: meaningful shapes and spatial layout at a very early stage of visual processing
In his introduction, Pinna (2010) quoted one of Wertheimer’s observations: “I stand at the window and see a house, trees, sky. Theoretically I might say there were 327 brightnesses and nuances of color. Do I have ‘327’? No. I have sky, house, and trees.” This seems quite remarkable, for Max Wertheimer, together with Kurt Koffka and Wolfgang Koehler, was a pioneer of Gestalt Theory: perceptual organisation was tackled by considering grouping rules of line and edge elements in relation to figure-ground segregation, i.e., a meaningful object (the figure) as perceived against a complex background (the ground).

At the lowest level – line and edge elements – Wertheimer (1923) himself formulated grouping principles on the basis of proximity, good continuation, convexity, symmetry and, often forgotten, past experience of the observer. Rubin (1921) formulated rules for figure-ground segregation using surroundedness, size and orientation, but also convexity and symmetry. Almost a century of research into Gestalt later, Pinna and Reeves (2006) introduced the notion of figurality, meant to represent the integrated set of properties of visual objects, from the principles of grouping and figure-ground to the colour and volume of objects with shading. Pinna, in 2010, went one important step further and studied perceptual meaning, i.e., the interpretation of complex figures on the basis of past experience of the observer. Re-establishing a link to Wertheimer’s rule about past experience, he formulated five propositions, three definitions and seven properties on the basis of observations made on graphically manipulated patterns. For example, he introduced the illusion of meaning by comics-like elements suggesting wind, thereby inducing a learned interpretation. His last figure shows a regular array of squares but with irregular positions on the right side. This pile of (ir)regular squares can be interpreted as the result of an earthquake which destroyed part of an apartment block. This is much more intuitive, direct and economical than describing the complexity of the array of squares.