7,767 research outputs found
Recommended from our members
Explainable and Advisable Learning for Self-driving Vehicles
Deep neural perception and control networks are likely to be a key component of self-driving vehicles. These models need to be explainable: they should provide easy-to-interpret rationales for their behavior so that passengers, insurance companies, law enforcement, developers, and others can understand what triggered a particular behavior. Explanations may be triggered by the neural controller (introspective explanations) or informed by the neural controller's output (rationalizations). Our work has focused on the challenge of generating introspective explanations of deep models for self-driving vehicles. In Chapter 3, we begin by exploring the use of visual explanations. These explanations take the form of real-time highlighted regions of an image that causally influence the network's output (steering control). In the first stage, we use a visual attention model to train a convolutional network end-to-end from images to steering angle. The attention model highlights image regions that potentially influence the network's output. Some of these are true influences, but some are spurious. We then apply a causal filtering step to determine which input regions actually influence the output. This produces more succinct visual explanations and more accurately exposes the network's behavior. In Chapter 4, we add an attention-based video-to-text model to produce textual explanations of model actions, e.g., "the car slows down because the road is wet". The attention maps of the controller and the explanation model are aligned so that explanations are grounded in the parts of the scene that mattered to the controller. We explore two approaches to attention alignment: strong and weak alignment. These explainable systems represent an externalization of tacit knowledge. The network's opaque reasoning is simplified to a situation-specific dependence on a visible object in the image. This makes them brittle and potentially unsafe in situations that do not match training data.
In Chapter 5, we propose to address this issue by augmenting training data with natural language advice from a human. Advice includes guidance about what to do and where to attend. We present the first step toward advice-giving, where we train an end-to-end vehicle controller that accepts advice. The controller adapts the way it attends to the scene (visual attention) and the control (steering and speed). Further, in Chapter 6, we propose a new approach that learns vehicle control with the help of long-term (global) human advice. Specifically, our system learns to summarize its visual observations in natural language, predict an appropriate action response (e.g., "I see a pedestrian crossing, so I stop"), and predict the controls accordingly.
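The causal filtering step mentioned in the abstract above can be illustrated with a small occlusion test. This is only a sketch under stated assumptions, not the thesis's method: the image is a flat list of pixel intensities, each attended region is a list of pixel indices, and `controller`, `min_effect`, and `fill` are hypothetical names introduced here. A region is kept only if masking it changes the controller's output by at least `min_effect`.

```python
def causal_filter(controller, image, regions, min_effect=0.05, fill=0.0):
    """Keep the attended regions whose removal changes the controller output.

    controller: callable mapping a flat list of pixel values to one number
                (a stand-in for a steering prediction; hypothetical).
    image:      flat list of pixel intensities.
    regions:    list of regions, each a list of pixel indices.
    """
    baseline = controller(image)
    kept = []
    for region in regions:
        masked = list(image)          # copy so the original image is untouched
        for idx in region:
            masked[idx] = fill        # occlude the region
        effect = abs(controller(masked) - baseline)
        if effect >= min_effect:      # region has a measurable influence
            kept.append((region, effect))
    return kept


def toy_controller(pixels):
    # Assumed toy model: output depends only on the first two pixels.
    return 0.5 * pixels[0] + 0.5 * pixels[1]


if __name__ == "__main__":
    image = [1.0, 1.0, 1.0, 1.0]
    regions = [[0, 1], [2, 3]]
    print(causal_filter(toy_controller, image, regions))
```

With the toy controller, the second region is attended but has no effect on the output, so it is filtered out as spurious.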
Analysis of the visual spatiotemporal properties of American Sign Language.
Careful measurements of the temporal dynamics of speech have provided important insights into phonetic properties of spoken languages, which are important for understanding auditory perception. By contrast, analytic quantification of the visual properties of signed languages is still largely uncharted. Exposure to sign language is a unique experience that could shape and modify low-level visual processing for those who use it regularly (i.e., what we refer to as the Enhanced Exposure Hypothesis). The purpose of the current study was to characterize the visual spatiotemporal properties of American Sign Language (ASL) so that future studies can test the Enhanced Exposure Hypothesis in signers, with the prediction that altered vision should be observed within, more so than outside, the range of properties found in ASL. Using an ultrasonic motion tracking system, we recorded the hand position in 3-dimensional space over time during the production of signs, sentences, and narratives. From these data, we calculated several metrics: hand position and eccentricity in space and hand motion speed. For individual signs, we also measured total distance travelled by the dominant hand and total duration of each sign. These metrics were found to fall within a selective range, suggesting that exposure to signs is a specific and unique visual experience, which might alter visual perceptual abilities in signers for visual information within the experienced range, even for non-language stimuli.
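As a rough illustration of how distance, duration, and speed might be derived from tracked hand positions, the sketch below works on timestamped 3D samples. The sample format `(t, x, y, z)`, the units, and the function name `path_metrics` are assumptions made for this example; the study's actual processing pipeline is not described in the abstract.

```python
import math


def path_metrics(samples):
    """Summarize one hand trajectory.

    samples: list of (t, x, y, z) tuples in time order, with t in seconds
             and positions in any consistent length unit (assumed format).
    """
    total_distance = 0.0
    speeds = []
    for (t0, *p0), (t1, *p1) in zip(samples, samples[1:]):
        step = math.dist(p0, p1)           # Euclidean distance between samples
        total_distance += step
        speeds.append(step / (t1 - t0))    # speed over this sampling interval
    duration = samples[-1][0] - samples[0][0]
    return {
        "total_distance": total_distance,
        "duration": duration,
        "mean_speed": total_distance / duration,
        "peak_speed": max(speeds),
    }


if __name__ == "__main__":
    track = [(0.0, 0, 0, 0), (0.5, 3, 4, 0), (1.0, 3, 4, 12)]
    print(path_metrics(track))
```

Per-interval speeds are sensitive to tracker noise, so a real analysis would likely smooth the positions first; that step is omitted here.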
Band-collision gel electrophoresis.
Electrophoretic mobility shift assays are widely used in gel electrophoresis to study binding interactions between different molecular species loaded into the same well. However, shift assays can access only a subset of the reactions that could be observed if separate bands of reagent species were instead made to collide and react. Here, we adapt gel electrophoresis by fabricating two or more wells in the same lane, loading these wells with different reagent species, and applying an electric field, thereby producing collisional reactions between propagating pulse-like bands of these species, which we image optically. For certain pairs of anionic and cationic dyes, propagating bands pass through each other unperturbed; yet, for other pairs, we observe complexing and precipitation reactions, indicating strong attractive interactions. We generalize this band-collision gel electrophoresis (BCGE) approach to other reaction types, including acid-base, ligand exchange, and redox, as well as to colloidal species in passivated large-pore gels.
Spatial-Temporal Deep Embedding for Vehicle Trajectory Reconstruction from High-Angle Video
Spatial-temporal Map (STMap)-based methods have shown great potential to
process high-angle videos for vehicle trajectory reconstruction, which can meet
the needs of various data-driven modeling and imitation learning applications.
In this paper, we developed a Spatial-Temporal Deep Embedding (STDE) model that
imposes parity constraints at both pixel and instance levels to generate
instance-aware embeddings for vehicle stripe segmentation on STMap. At the pixel
level, each pixel is encoded with its 8-neighbor pixels at different ranges,
and this encoding is subsequently used to guide a neural network to learn the
embedding mechanism. At the instance level, a discriminative loss function is
designed to pull pixels belonging to the same instance closer and separate the
mean value of different instances far apart in the embedding space. The output
of the spatial-temporal affinity is then optimized by the mutex-watershed
algorithm to obtain final clustering results. Based on segmentation metrics,
our model outperformed five other baselines that have been used for STMap
processing and showed robustness under the influence of shadows, static noises,
and overlapping. The model was applied to all public NGSIM US-101 videos to
generate complete vehicle trajectories, indicating good scalability and
adaptability. Finally, the strengths of the scanline method with STDE and
future directions are discussed. The code, STMap dataset, and video trajectory
are publicly available in the online repository. GitHub
Link: shorturl.at/jklT0
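The instance-level objective described above (pull pixels of one instance together, push instance means apart) can be sketched as a hinged pull/push loss. This is a generic formulation written for illustration, not the paper's exact loss: the margins `delta_pull` and `delta_push`, the equal weighting of the two terms, and the function name are all assumptions.

```python
import math
from itertools import combinations


def discriminative_loss(embeddings, labels, delta_pull=0.5, delta_push=1.5):
    """Pull/push loss over pixel embeddings grouped by instance label.

    embeddings: list of equal-length numeric tuples, one per pixel.
    labels:     list of instance ids, one per pixel.
    """
    groups = {}
    for vec, lab in zip(embeddings, labels):
        groups.setdefault(lab, []).append(vec)

    # Mean embedding of each instance.
    means = {
        lab: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lab, vecs in groups.items()
    }

    # Pull term: penalize pixels farther than delta_pull from their mean.
    pull = 0.0
    for lab, vecs in groups.items():
        pull += sum(
            max(0.0, math.dist(v, means[lab]) - delta_pull) ** 2 for v in vecs
        ) / len(vecs)
    pull /= len(groups)

    # Push term: penalize pairs of means closer than 2 * delta_push.
    pairs = list(combinations(means, 2))
    push = 0.0
    if pairs:
        push = sum(
            max(0.0, 2 * delta_push - math.dist(means[a], means[b])) ** 2
            for a, b in pairs
        ) / len(pairs)

    return pull + push


if __name__ == "__main__":
    well_separated = [(0, 0), (0, 0), (10, 0), (10, 0)]
    print(discriminative_loss(well_separated, [1, 1, 2, 2]))
```

The loss is zero once every pixel lies within `delta_pull` of its instance mean and all means are at least `2 * delta_push` apart, which is what makes the embeddings easy to cluster afterwards.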
DxNAT - Deep Neural Networks for Explaining Non-Recurring Traffic Congestion
Non-recurring traffic congestion is caused by temporary disruptions, such as
accidents, sports games, adverse weather, etc. We use data related to real-time
traffic speed, jam factors (a traffic congestion indicator), and events
collected over a year from Nashville, TN to train a multi-layered deep neural
network. The traffic dataset contains over 900 million data records. The
network is thereafter used to classify the real-time data and identify
anomalous operations. Compared with traditional approaches of using statistical
or machine learning techniques, our model reaches an accuracy of 98.73 percent
when identifying traffic congestion caused by football games. Our approach
first encodes the traffic across a region as a scaled image. After that, the
image data from different timestamps is fused with event- and time-related
data. Then a crossover operator is used as a data augmentation method to
generate training datasets with more balanced classes. Finally, we use the
receiver operating characteristic (ROC) analysis to tune the sensitivity of the
classifier. We present the analysis of the training time and the inference time
separately.
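One common way to use ROC analysis to tune a classifier's sensitivity is to sweep the decision threshold and pick the point with the best trade-off between true-positive and false-positive rates. The sketch below does this with Youden's J statistic (TPR minus FPR); that selection rule, the function names, and the toy scores are assumptions for illustration, since the abstract does not say which criterion the authors used.

```python
def roc_points(scores, labels):
    """Return (threshold, tpr, fpr) for each distinct score used as a cutoff.

    scores: classifier scores, higher meaning more likely positive.
    labels: 0/1 ground-truth labels; both classes must be present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((thr, tp / positives, fp / negatives))
    return points


def pick_threshold(scores, labels):
    """Choose the threshold maximizing Youden's J = TPR - FPR (assumed rule)."""
    best = max(roc_points(scores, labels), key=lambda p: p[1] - p[2])
    return best[0]


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 0]
    print(pick_threshold(scores, labels))
```

A different operating point (for example, a fixed maximum false-positive rate) may be more appropriate when missed congestion events and false alarms have unequal costs.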