Gesture-aware Interactive Machine Teaching with In-situ Object Annotations
Interactive Machine Teaching (IMT) systems allow non-experts to easily create
Machine Learning (ML) models. However, existing vision-based IMT systems either
ignore annotations on the objects of interest or require users to annotate in a
post-hoc manner. Without the annotations on objects, the model may misinterpret
the objects using unrelated features. Post-hoc annotations cause additional
workload, which diminishes the usability of the overall model building process.
In this paper, we develop LookHere, which integrates in-situ object annotations
into vision-based IMT. LookHere exploits users' deictic gestures to segment the
objects of interest in real time. This segmentation information can be
additionally used for training. To achieve reliable performance for this
object segmentation, we use our custom dataset, HuTics, which includes 2,040
front-facing images of deictic gestures toward various objects from 170
people. The quantitative results of our user study showed that participants
were 16.3 times faster in creating a model with our system compared to a
standard IMT system with a post-hoc annotation process while demonstrating
comparable accuracies. Additionally, models created by our system showed a
significant improvement in accuracy in segmenting the
objects of interest compared to those without annotations.
Comment: UIST 202
Weed Identification by Single-Stage and Two-Stage Neural Networks: A Study on the Impact of Image Resizers and Weights Optimization Algorithms.
The accurate identification of weeds is an essential step for a site-specific weed management system. In recent years, deep learning (DL) has advanced rapidly in performing complex agricultural tasks. Previous studies emphasized evaluating advanced training techniques or modifying well-known DL models to improve overall accuracy. In contrast, this research attempted to improve the mean average precision (mAP) for the detection and classification of eight classes of weeds by proposing a novel DL-based methodology. First, a comprehensive analysis of single-stage and two-stage neural networks, including the Single-Shot MultiBox Detector (SSD), You Only Look Once (YOLO-v4), EfficientDet, CenterNet, RetinaNet, the Faster Region-based Convolutional Neural Network (Faster RCNN), and the Region-based Fully Convolutional Network (RFCN), was performed. Next, the effects of image resizing techniques along with four image interpolation methods were studied. This led to the final stage of the research: optimizing the weights of the best-acquired model through initialization techniques, batch normalization, and DL optimization algorithms. The effectiveness of the proposed work is demonstrated by a high mAP of 93.44%, validated by the stratified k-fold cross-validation technique, a 5.8% improvement over the results obtained with the default settings of the best-suited DL architecture (Faster RCNN ResNet-101). The presented pipeline can serve as a baseline for the research community to explore tasks such as real-time detection and reducing computation/training time. All relevant data, including the annotated dataset, configuration files, and inference graph of the final model, are provided with this article. Furthermore, the selection of the DeepWeeds dataset shows the robustness/practicality of the study because it contains images collected in a real, complex agricultural environment.
Therefore, this research would be a considerable step toward an efficient and automatic weed control system.
Published onlin
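The stratified k-fold validation step mentioned above can be sketched as follows. This is a minimal illustration, not the paper's released configuration; the class names are placeholders in the style of DeepWeeds labels.

```python
# Minimal sketch of stratified k-fold splitting: each fold preserves
# the per-class proportions of the full dataset.
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label, indices in by_class.items():
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[i % k].append(idx)  # round-robin keeps classes balanced
    return folds

# Illustrative labels only (placeholder weed-class names).
labels = ["chinee_apple"] * 10 + ["lantana"] * 5 + ["parkinsonia"] * 5
folds = stratified_kfold(labels, k=5)
for fold in folds:
    counts = defaultdict(int)
    for idx in fold:
        counts[labels[idx]] += 1
    print(dict(counts))  # each fold: 2 chinee_apple, 1 lantana, 1 parkinsonia
```

In a detection setting the same split would be applied to image identifiers, with each image's dominant weed class as its stratification label.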
Workload-Aware Scheduling using Markov Decision Process for Infrastructure-Assisted Learning-Based Multi-UAV Surveillance Networks
In modern networking research, infrastructure-assisted unmanned aerial
vehicles (UAVs) are actively considered for real-time learning-based
surveillance and aerial data-delivery under unexpected 3D free mobility and
coordination. In this system model, it is essential to consider both the power
limitation of UAVs and the deep learning performance of autonomous object
recognition (for abnormal behavior detection) in the infrastructure/towers. To overcome the
power limitation of UAVs, this paper proposes a novel aerial scheduling
algorithm between multi-UAVs and multi-towers where the towers conduct wireless
power transfer toward UAVs. In addition, to support high-performance training
of the learning models in the towers, we also propose a data delivery scheme in
which UAVs deliver training data to the towers fairly, preventing problems
due to data imbalance (e.g., heavy computation overhead caused by excessive
data delivery, or overfitting from too little). Therefore, this paper
proposes a novel workload-aware scheduling algorithm between multi-towers and
multi-UAVs for joint power-charging from towers to their associated UAVs and
training data delivery from UAVs to their associated towers. To compute the
workload-aware optimal scheduling decisions in each unit time, our solution
approach for the given scheduling problem is designed based on a Markov
decision process (MDP) to deal with (i) time-varying low-complexity computation and (ii)
pseudo-polynomial optimality. As shown in performance evaluation results, our
proposed algorithm ensures (i) sufficient times for resource exchanges between
towers and UAVs, (ii) the most even and uniform data collection during the
processes compared to the other algorithms, and (iii) the convergence of all
towers' performance to optimal levels.
Comment: 15 pages, 10 figure
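The MDP-based scheduling decision can be illustrated with a toy value-iteration sketch. The states, actions, transition probabilities, and rewards below are illustrative stand-ins, not the paper's actual formulation.

```python
# Value iteration on a toy MDP, sketching how per-unit-time scheduling
# decisions could be computed for a single UAV-tower pair.
def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    """P[s][a] -> list of (prob, next_state); R[s][a] -> reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                       for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            return V

# Toy example: a UAV battery is "low" or "high"; it can "charge" at a
# tower (wireless power transfer) or "deliver" training data to it.
states = ["low", "high"]
actions = ["charge", "deliver"]
P = {"low":  {"charge": [(1.0, "high")], "deliver": [(1.0, "low")]},
     "high": {"charge": [(1.0, "high")], "deliver": [(0.5, "low"), (0.5, "high")]}}
R = {"low":  {"charge": 0.0, "deliver": -1.0},
     "high": {"charge": 0.0, "deliver": 2.0}}
V = value_iteration(states, actions, P, R)
print(V)  # delivering is only worthwhile when the battery is high
```

The resulting optimal policy charges when the battery is low and delivers data when it is high, mirroring the joint power-charging/data-delivery trade-off the abstract describes at a much smaller scale.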
Explainable and Advisable Learning for Self-driving Vehicles
Deep neural perception and control networks are likely to be a key component of self-driving vehicles. These models need to be explainable - they should provide easy-to-interpret rationales for their behavior - so that passengers, insurance companies, law enforcement, developers, etc., can understand what triggered a particular behavior. Explanations may be triggered by the neural controller, namely introspective explanations, or informed by the neural controller's output, namely rationalizations. Our work has focused on the challenge of generating introspective explanations of deep models for self-driving vehicles. In Chapter 3, we begin by exploring the use of visual explanations. These explanations take the form of real-time highlighted regions of an image that causally influence the network's output (steering control). In the first stage, we use a visual attention model to train a convolution network end-to-end from images to steering angle. The attention model highlights image regions that potentially influence the network's output. Some of these are true influences, but some are spurious. We then apply a causal filtering step to determine which input regions actually influence the output. This produces more succinct visual explanations and more accurately exposes the network's behavior. In Chapter 4, we add an attention-based video-to-text model to produce textual explanations of model actions, e.g. "the car slows down because the road is wet". The attention maps of controller and explanation model are aligned so that explanations are grounded in the parts of the scene that mattered to the controller. We explore two approaches to attention alignment, strong- and weak-alignment. These explainable systems represent an externalization of tacit knowledge. The network's opaque reasoning is simplified to a situation-specific dependence on a visible object in the image. This makes them brittle and potentially unsafe in situations that do not match training data. 
In Chapter 5, we propose to address this issue by augmenting training data with natural language advice from a human. Advice includes guidance about what to do and where to attend. We present the first step toward advice-giving, where we train an end-to-end vehicle controller that accepts advice. The controller adapts the way it attends to the scene (visual attention) and the control (steering and speed). Further, in Chapter 6, we propose a new approach that learns vehicle control with the help of long-term (global) human advice. Specifically, our system learns to summarize its visual observations in natural language, predict an appropriate action response (e.g. "I see a pedestrian crossing, so I stop"), and predict the controls, accordingly
Exploring Real-Time Bio-Behaviorally-Aware Feedback Interventions for Mitigating Public Speaking Anxiety
Effective public speaking skills are crucial to one’s academic and professional success. Individuals who are good at public speaking are more likely to graduate from college and obtain leadership positions than their peers. Yet public speaking anxiety (PSA) is one of the most common social fears, directly affecting one’s academic and professional success.
This Master’s thesis investigates the effectiveness of in-the-moment, bio-behaviorally aware feedback in mitigating public speaking anxiety in a virtual training environment. The training environment exposes participants to various virtual stimuli while capturing their audio and physiological signals. These signals are used to extract bio-behavioral measures (e.g., speech intonation, mean electrodermal activity) that serve as input to a machine learning model providing real-time estimates of state anxiety. Based on these estimates, the system provides real-time positive-reinforcement and cognitive-restructuring feedback, grounded in theoretical rationale from the behavioral sciences, when an increase in state anxiety is detected. The system is evaluated through a small-scale study of participants using a pre/post evaluation design.
Results indicate that in-the-moment feedback prompts affect participants’ in-the-moment state-based anxiety. Statistical analysis indicates significant differences in bio-behavioral measures before and after the feedback prompts. Self-reported post-study results also indicate that 5 out of 7 participants found the study beneficial for their public speaking skills. The results further highlight the effect of audience type on the positive reinforcement feedback: when the audience was negative, the real-time model issued more positive-reinforcement prompts than when the audience was positive. Findings from this work provide a foundation for designing artificial intelligence systems that deliver personalized in-the-moment interventions to mitigate adverse behavioral outcomes
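The real-time feedback loop described above can be sketched minimally: bio-behavioral features feed a state-anxiety estimator, and a feedback prompt fires when a sharp rise is detected. The feature names, estimator weights, and threshold are illustrative assumptions, not the thesis's actual model.

```python
# Sketch of the in-the-moment feedback loop: estimate state anxiety per
# time step and emit a prompt when it rises past a threshold.
def feedback_loop(feature_stream, estimate_anxiety, rise_threshold=0.15):
    """Yield a feedback prompt whenever estimated anxiety rises sharply."""
    prev = None
    for features in feature_stream:
        score = estimate_anxiety(features)
        if prev is not None and score - prev > rise_threshold:
            yield "positive_reinforcement"  # or a cognitive-restructuring prompt
        prev = score

# Toy estimator: weighted sum of EDA mean and pitch variability.
def toy_estimator(f):
    return 0.6 * f["eda_mean"] + 0.4 * f["pitch_var"]

stream = [{"eda_mean": 0.2, "pitch_var": 0.2},
          {"eda_mean": 0.7, "pitch_var": 0.6},   # sharp rise -> prompt fires
          {"eda_mean": 0.7, "pitch_var": 0.6}]   # plateau -> no prompt
print(list(feedback_loop(stream, toy_estimator)))  # → ['positive_reinforcement']
```

In the actual system the estimator would be the trained anxiety model and the prompt choice would follow the behavioral-science rationale described above; the loop structure is the point of this sketch.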
Cortical Learning of Recognition Categories: A Resolution of the Exemplar Vs. Prototype Debate
Do humans and animals learn exemplars or prototypes when they categorize objects and events in the world? How are different degrees of abstraction realized through learning by neurons in inferotemporal and prefrontal cortex? How do top-down expectations influence the course of learning? Thirty related human cognitive experiments (the 5-4 category structure) have been used to test competing views in the prototype-exemplar debate. In these experiments, during the test phase, subjects unlearn in a characteristic way items that they had learned to categorize perfectly in the training phase. Many cognitive models do not describe how an individual learns or forgets such categories through time. Adaptive Resonance Theory (ART) neural models provide such a description, and also clarify both psychological and neurobiological data. Matching of bottom-up signals with learned top-down expectations plays a key role in ART model learning. Here, an ART model is used to learn incrementally in response to 5-4 category structure stimuli. Simulation results agree with experimental data, achieving perfect categorization in training and a good match to the pattern of errors exhibited by human subjects in the testing phase. These results show how the model learns both prototypes and certain exemplars in the training phase. ART prototypes are, however, unlike the ones posited in the traditional prototype-exemplar debate. Rather, they are critical patterns of features to which a subject learns to pay attention based on past predictive success and the order in which exemplars are experienced. Perturbations of old memories by newly arriving test items generate a performance curve that closely matches the performance pattern of human subjects. 
The model also clarifies exemplar-based accounts of data concerning amnesia.
Defense Advanced Research Projects Agency SyNaPSE program (Hewlett-Packard Company, DARPA HR0011-09-3-0001; HRL Laboratories LLC #801881-BS under HR0011-09-C-0011); Science of Learning Centers program of the National Science Foundation (NSF SBE-0354378
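The ART matching cycle described above can be sketched in a minimal fuzzy-ART style: a bottom-up input resonates with a learned top-down prototype only if their match exceeds a vigilance threshold; otherwise the search continues and a new category is created on mismatch. The vigilance value and the omission of complement coding are simplifications, not the paper's exact model.

```python
# Minimal fuzzy-ART-style sketch: matching bottom-up input against
# learned top-down prototypes, with vigilance-gated resonance.
def art_learn(inputs, vigilance=0.75):
    """Incrementally learn category prototypes from a stream of inputs."""
    prototypes = []
    for x in inputs:
        for w in prototypes:
            overlap = [min(a, b) for a, b in zip(x, w)]
            if sum(overlap) / sum(x) >= vigilance:   # resonance: match passes
                w[:] = overlap                        # learn critical features
                break
        else:
            prototypes.append(list(x))                # mismatch: new category
    return prototypes

protos = art_learn([[1, 1, 0, 0], [1, 0.9, 0, 0], [0, 0, 1, 1]])
print(len(protos))  # → 2: the first two inputs share a category
```

The prototype shrinks toward the features shared by its members, echoing the abstract's point that ART prototypes are critical feature patterns shaped by the order in which exemplars are experienced.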
Mean-Field Theory of Meta-Learning
We discuss here the mean-field theory for a cellular automata model of
meta-learning. The meta-learning is the process of combining outcomes of
individual learning procedures in order to determine the final decision with
higher accuracy than any single learning method. Our method is constructed from
an ensemble of interacting learning agents that acquire and process incoming
information using various types, or different versions, of machine learning
algorithms. The abstract learning space, where all agents are located, is
constructed here using a fully connected model that couples all agents with
random strength values. The cellular automata network simulates the
higher-level integration of information acquired from the independent learning
trials. The final classification of incoming input data is therefore defined as
the stationary state of the meta-learning system under a simple majority rule,
yet minority clusters that share the opposite classification outcome can be
observed in the system. Therefore, the probability of selecting the proper
class for a given input can be estimated even without prior knowledge of its
true affiliation. Fuzzy logic can easily be introduced into the system, even
if the learning agents are built from simple binary classification machine
learning algorithms, by calculating the percentage of agreeing agents.
Comment: 23 page
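The majority-rule readout described above can be sketched directly: the final class is the majority vote, and the agreement fraction serves as a confidence estimate available without knowing the true label. The binary votes below stand in for the stationary states of the interacting agents.

```python
# Sketch of the majority-rule readout: majority class plus the fraction
# of agreeing agents as a label-free confidence estimate.
from collections import Counter

def meta_classify(agent_outputs):
    """Return (majority class, agreement fraction) for agent votes."""
    counts = Counter(agent_outputs)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(agent_outputs)

votes = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]   # 7 of 10 agents vote class 1
label, confidence = meta_classify(votes)
print(label, confidence)  # → 1 0.7
```

The 0.3 minority fraction corresponds to the minority clusters the abstract mentions; interpreting the agreement fraction as a fuzzy membership degree is exactly the easy extension to fuzzy logic described above.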
Speech Development by Imitation
The Double Cone Model (DCM) is a model of how the brain transforms sensory input to motor commands through successive stages of data compression and expansion. We have tested a subset of the DCM on speech recognition, production, and imitation. The experiments show that the DCM is a good candidate for an artificial speech processing system that can develop autonomously. We show that the DCM can learn a repertoire of speech sounds by listening to speech input. It is also able to link the individual elements of speech into sequences that can be recognized or reproduced, thus allowing the system to imitate spoken language.