Control of an AUV from thruster actuated hover to control surface actuated flight
An autonomous underwater vehicle (AUV) capable of both low-speed hovering and high-speed flight-style operation is introduced. To have this capability the AUV is over-actuated with a rear propeller, four control surfaces and four through-body tunnel thrusters. In this work the actuators are modelled, and the non-linearities and uncertainties are identified and discussed with specific regard to operation at different speeds. A thruster-actuated depth control algorithm and a flight-style, control-surface-actuated depth controller are presented. These controllers are then coupled using model-reference feedback so that the vehicle can transition between them while remaining stable throughout the speed range. Results from 3-degrees-of-freedom simulations of the AUV using the new controller are presented, showing that it transitions smoothly between the two control modes. The performance of the depth controller appears asymmetric, with better performance whilst diving than ascending.
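The coupling idea can be illustrated as a speed-scheduled cross-fade between a thruster depth command and a control-surface depth command. The gains, the proportional control laws, and the blending thresholds u_lo/u_hi below are hypothetical placeholders, not the paper's model-reference design; this is only a minimal sketch of the transition mechanism.

```python
import numpy as np

def blended_depth_command(depth_error: float, speed: float,
                          u_lo: float = 0.5, u_hi: float = 1.5) -> float:
    """Cross-fade a thruster command into a control-surface command.

    All gains and thresholds are illustrative assumptions; the paper's
    model-reference feedback scheme is more involved than this linear blend.
    """
    # Hypothetical proportional controllers for each actuation mode.
    thruster_cmd = 2.0 * depth_error           # effective when hovering
    surface_cmd = 0.8 * depth_error * speed    # effective in flight
    # Speed-dependent weight: 0 at/below u_lo, 1 at/above u_hi.
    w = np.clip((speed - u_lo) / (u_hi - u_lo), 0.0, 1.0)
    return (1.0 - w) * thruster_cmd + w * surface_cmd
```

At low speed the thruster term dominates; above u_hi the control surfaces take over entirely, mirroring the hover-to-flight handover described above.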
Localizing the Common Action Among a Few Videos
This paper strives to localize the temporal extent of an action in a long
untrimmed video. Where existing work leverages many examples with their start,
their ending, and/or the class of the action during training time, we propose
few-shot common action localization. The start and end of an action in a long
untrimmed video are determined based on just a handful of trimmed video
examples containing the same action, without knowing their common class label.
To address this task, we introduce a new 3D convolutional network architecture
able to align representations from the support videos with the relevant query
video segments. The network contains: (\textit{i}) a mutual enhancement module
to simultaneously complement the representation of the few trimmed support
videos and the untrimmed query video; (\textit{ii}) a progressive alignment
module that iteratively fuses the support videos into the query branch; and
(\textit{iii}) a pairwise matching module to weigh the importance of different
support videos. Evaluation of few-shot common action localization in untrimmed
videos containing a single or multiple action instances demonstrates the
effectiveness and general applicability of our proposal. Comment: ECCV 2020.
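The idea behind the pairwise matching module, weighing support videos by their relevance to the query, can be sketched as a cosine-similarity softmax. The flat feature vectors and the softmax weighting here are illustrative assumptions, not the paper's exact module:

```python
import numpy as np

def support_weights(query_feat: np.ndarray, support_feats: np.ndarray) -> np.ndarray:
    """Weigh few-shot support videos by cosine similarity to the query.

    query_feat: (d,) query video feature; support_feats: (k, d) features
    of the k trimmed support videos. Returns k weights summing to 1.
    """
    q = query_feat / np.linalg.norm(query_feat)
    S = support_feats / np.linalg.norm(support_feats, axis=1, keepdims=True)
    sims = S @ q                        # cosine similarity per support video
    e = np.exp(sims - sims.max())       # numerically stable softmax
    return e / e.sum()                  # importance weights
```

Support videos that resemble the query segment receive larger weights, so a mismatched or noisy example contributes less to localization.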
Dynamic Key-Value Memory Networks for Knowledge Tracing
Knowledge Tracing (KT) is a task of tracing evolving knowledge state of
students with respect to one or more concepts as they engage in a sequence of
learning activities. One important purpose of KT is to personalize the practice
sequence to help students learn knowledge concepts efficiently. However,
existing methods such as Bayesian Knowledge Tracing and Deep Knowledge Tracing
either model knowledge state for each predefined concept separately or fail to
pinpoint exactly which concepts a student is good at or unfamiliar with. To
solve these problems, this work introduces a new model called Dynamic Key-Value
Memory Networks (DKVMN) that can exploit the relationships between underlying
concepts and directly output a student's mastery level of each concept. Unlike
standard memory-augmented neural networks that facilitate a single memory
matrix or two static memory matrices, our model has one static matrix called
key, which stores the knowledge concepts and the other dynamic matrix called
value, which stores and updates the mastery levels of corresponding concepts.
Experiments show that our model consistently outperforms the state-of-the-art
model in a range of KT datasets. Moreover, the DKVMN model can automatically
discover underlying concepts of exercises, a task typically performed by human
annotators, and depict the changing knowledge state of a student. Comment: To appear in the 26th International Conference on World Wide Web (WWW), 2017.
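The key-value read operation described above can be sketched as follows. The softmax addressing over the static key matrix and the weighted read from the dynamic value matrix follow the abstract; the learned embedding/projection layers and the write (erase/add) step of the full model are omitted, and the shapes are illustrative.

```python
import numpy as np

def dkvmn_read(key_matrix: np.ndarray, value_matrix: np.ndarray,
               exercise_emb: np.ndarray) -> np.ndarray:
    """Simplified DKVMN read step.

    key_matrix:   (N, d_k) static concept keys.
    value_matrix: (N, d_v) dynamic per-concept mastery values.
    exercise_emb: (d_k,) embedding of the current exercise.
    """
    scores = key_matrix @ exercise_emb   # similarity to each concept key
    w = np.exp(scores - scores.max())
    w /= w.sum()                         # correlation weights over concepts
    return w @ value_matrix              # read vector: weighted mastery summary
```

Because the weights are computed against the keys but the content is read from the values, the model can report mastery per concept while the key matrix pins down which concept each slot represents.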
Beyond web-scraping: Crowd-sourcing a geographically diverse image dataset
Current dataset collection methods typically scrape large amounts of data
from the web. While this technique is extremely scalable, data collected in
this way tends to reinforce stereotypical biases, can contain personally
identifiable information, and typically originates from Europe and North
America. In this work, we rethink the dataset collection paradigm and introduce
GeoDE, a geographically diverse dataset with 61,940 images from 40 classes and
6 world regions, and no personally identifiable information, collected through
crowd-sourcing. We analyse GeoDE to understand differences in images collected
in this manner compared to web-scraping. Despite the smaller size of this
dataset, we demonstrate its use as both an evaluation and training dataset,
highlight shortcomings in current models, as well as show improved performances
when even small amounts of GeoDE (1000 - 2000 images per region) are added to a
training dataset. We release the full dataset and code at
https://geodiverse-data-collection.cs.princeton.edu
Photometric stereo for 3D face reconstruction using non-linear illumination models
Face recognition in the presence of illumination changes, varying pose and different facial expressions is a challenging problem. In this paper, a method for 3D face reconstruction using photometric stereo, without knowing the illumination directions or facial expression, is proposed in order to improve face recognition. A dimensionality reduction method was introduced to represent the face deformations due to illumination variations and self-shadows in a lower-dimensional space. The obtained mapping function was used to determine the illumination direction of each input image, and that direction was then used to apply photometric stereo. Experiments with faces were performed in order to evaluate the performance of the proposed scheme. The experiments showed that the proposed approach produces very accurate 3D surfaces without knowing the light directions, with very small differences compared to the case of known directions. As a result the proposed approach is more general and imposes fewer restrictions, enabling 3D face recognition methods to operate with less data.
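Once a light direction has been estimated for each image, the reconstruction reduces to classical Lambertian photometric stereo. A minimal sketch of that final step, assuming the light directions are already known (the paper's contribution is precisely estimating them first, which is not shown here):

```python
import numpy as np

def photometric_stereo_normals(intensities: np.ndarray, light_dirs: np.ndarray):
    """Classical Lambertian photometric stereo with known light directions.

    intensities: (m, p) pixel intensities stacked from m images of p pixels.
    light_dirs:  (m, 3) unit light directions, one per image.
    Solves I = L @ g per pixel by least squares, where g = albedo * normal.
    """
    g, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)  # (3, p)
    albedo = np.linalg.norm(g, axis=0)                            # (p,)
    normals = g / np.maximum(albedo, 1e-12)                       # unit normals
    return normals.T, albedo                                      # (p, 3), (p,)
```

With at least three non-coplanar lights the system is well posed, and the surface can then be integrated from the recovered normal field.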
Graph Layouts by t-SNE
We propose a new graph layout method based on a modification of the t-distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction technique. Although t-SNE is one of the best techniques for visualizing high-dimensional data as 2D scatterplots, it has not been used in the context of classical graph layout. Our method, tsNET, represents a graph with a distance matrix, which together with a modified t-SNE cost function results in desirable layouts. We evaluate our method by a formal comparison with state-of-the-art methods, both visually and via established quality metrics, on a comprehensive benchmark containing real-world and synthetic graphs. As evidenced by the quality metrics and visual inspection, tsNET produces excellent layouts.
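The distance-matrix representation that feeds the modified t-SNE cost can be built with breadth-first search over an unweighted graph. A sketch of that preprocessing step (the subsequent t-SNE optimization itself is not shown; the adjacency-dict input format is an assumption):

```python
from collections import deque

def graph_distance_matrix(adj: dict) -> list:
    """All-pairs shortest-path distances by BFS.

    adj: {node: [neighbours]} for an unweighted, connected graph.
    Returns an n x n matrix of graph-theoretic distances, the kind of
    input a t-SNE-style layout method can embed into 2D.
    """
    nodes = sorted(adj)
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    D = [[0] * n for _ in range(n)]
    for src in nodes:
        dist = {src: 0}
        q = deque([src])
        while q:                          # standard BFS from src
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for v, d in dist.items():
            D[index[src]][index[v]] = d
    return D
```

In practice one would pass such a precomputed matrix to a t-SNE implementation that accepts precomputed distances (e.g. scikit-learn's `TSNE` with `metric="precomputed"`).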
Jointly Learning Word Embeddings and Latent Topics
Word embedding models such as Skip-gram learn a vector-space representation for each word, based on the local word collocation patterns that are observed in a text corpus. Latent topic models, on the other hand, take a more global view, looking at the word distributions across the corpus to assign a topic to each word occurrence. These two paradigms are complementary in how they represent the meaning of word occurrences. While some previous works have already looked at using word embeddings for improving the quality of latent topics, and conversely, at using latent topics for improving word embeddings, such "two-step" methods cannot capture the mutual interaction between the two paradigms. In this paper, we propose STE, a framework which can learn word embeddings and latent topics in a unified manner. STE naturally obtains topic-specific word embeddings, and thus addresses the issue of polysemy. At the same time, it also learns the term distributions of the topics, and the topic distributions of the documents. Our experimental results demonstrate that the STE model can indeed generate useful topic-specific word embeddings and coherent latent topics in an effective and efficient way.
Non-equilibrium relaxation of hot states in organic semiconductors: Impact of mode-selective excitation on charge transfer.
The theoretical study of open quantum systems strongly coupled to a vibrational environment remains computationally challenging due to the strongly non-Markovian characteristics of the dynamics. We study this problem in the case of a molecular dimer of the organic semiconductor tetracene, the exciton states of which are strongly coupled to a few hundred molecular vibrations. To do so, we employ a previously developed tensor network approach, based on the formalism of matrix product states. By analyzing the entanglement structure of the system wavefunction, we can expand it in a tree tensor network state, which allows us to perform a fully quantum mechanical time evolution of the exciton-vibrational system, including the effect of 156 molecular vibrations. We simulate the dynamics of hot states, i.e., states resulting from excess-energy photoexcitation, by constructing various initial bath states, and show that the exciton system indeed has a memory of those initial configurations. In particular, the specific pathway of vibrational relaxation is shown to strongly affect the quantum coherence between exciton states on time scales relevant for the ultrafast dynamics of application-relevant processes such as charge transfer. The preferential excitation of low-frequency modes leads to a limited number of relaxation pathways, thus "protecting" quantum coherence and leading to a significant increase in the charge transfer yield in the dimer structure. A.M.A. acknowledges the support of the Engineering and Physical Sciences Research Council (EPSRC) for funding under Grant No. EP/L015552/1.
Unified Image and Video Saliency Modeling
Visual saliency modeling for images and videos is treated as two independent
tasks in recent computer vision literature. While image saliency modeling is a
well-studied problem and progress on benchmarks like SALICON and MIT300 is
slowing, video saliency models have shown rapid gains on the recent DHF1K
benchmark. Here, we take a step back and ask: Can image and video saliency
modeling be approached via a unified model, with mutual benefit? We identify
different sources of domain shift between image and video saliency data and
between different video saliency datasets as a key challenge for effective
joint modelling. To address this we propose four novel domain adaptation
techniques - Domain-Adaptive Priors, Domain-Adaptive Fusion, Domain-Adaptive
Smoothing and Bypass-RNN - in addition to an improved formulation of learned
Gaussian priors. We integrate these techniques into a simple and lightweight
encoder-RNN-decoder-style network, UNISAL, and train it jointly with image and
video saliency data. We evaluate our method on the video saliency datasets
DHF1K, Hollywood-2 and UCF-Sports, and the image saliency datasets SALICON and
MIT300. With one set of parameters, UNISAL achieves state-of-the-art
performance on all video saliency datasets and is on par with the
state-of-the-art for image saliency datasets, despite faster runtime and a 5 to
20-fold smaller model size compared to all competing deep methods. We provide
retrospective analyses and ablation studies which confirm the importance of the
domain shift modeling. The code is available at
https://github.com/rdroste/unisal Comment: Presented at the European Conference on Computer Vision (ECCV) 2020.
R. Droste and J. Jiao contributed equally to this work. v3: Updated Fig. 5a)
and added new MIT300 benchmark results to supp. material.
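A centre-bias Gaussian prior of the kind UNISAL learns per domain can be sketched as a separable 2D Gaussian map. Here the mean and standard deviation are fixed by hand, whereas in the model they are trainable parameters and the map is combined with the decoder output; the parameter defaults are assumptions for illustration only.

```python
import numpy as np

def gaussian_prior_map(h: int, w: int,
                       mu=(0.5, 0.5), sigma=(0.25, 0.25)) -> np.ndarray:
    """Build an (h, w) centre-bias prior from a separable 2D Gaussian.

    mu and sigma are in normalized [0, 1] image coordinates (x, y);
    a learned version would treat them as trainable parameters.
    """
    ys = (np.arange(h) + 0.5) / h                    # pixel-centre y coords
    xs = (np.arange(w) + 0.5) / w                    # pixel-centre x coords
    gy = np.exp(-0.5 * ((ys - mu[1]) / sigma[1]) ** 2)
    gx = np.exp(-0.5 * ((xs - mu[0]) / sigma[0]) ** 2)
    return np.outer(gy, gx)                          # peak near the centre
```

Because saliency datasets differ in how strongly fixations cluster at the image centre, letting each domain learn its own prior parameters is one simple way to absorb domain shift.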
IdeaHound: Improving Large-scale Collaborative Ideation with Crowd-powered Real-time Semantic Modeling
Prior work on creativity support tools demonstrates how a computational semantic model of a solution space can enable interventions that substantially improve the number, quality and diversity of ideas. However, automated semantic modeling often falls short when people contribute short text snippets or sketches. Innovation platforms can employ humans to provide semantic judgments to construct a semantic model, but this relies on external workers completing a large number of tedious micro-tasks. This requirement threatens both accuracy (external workers may lack the expertise and context to make accurate semantic judgments) and scalability (external workers are costly). In this paper, we introduce IDEAHOUND, an ideation system that seamlessly integrates the task of defining semantic relationships among ideas into the primary task of idea generation. The system combines implicit human actions with machine learning to create a computational semantic model of the emerging solution space. The integrated nature of these judgments allows IDEAHOUND to leverage the expertise and efforts of participants who are already motivated to contribute to idea generation, overcoming the issues of scalability inherent to existing approaches. Our results show that participants were equally willing to use (and just as productive using) IDEAHOUND compared to a conventional platform that did not require organizing ideas. Our integrated crowdsourcing approach also creates a more accurate semantic model than an existing crowdsourced approach (performed by external crowds). We demonstrate how this model enables helpful creative interventions: providing diverse inspirational examples, providing similar ideas for a given idea, and providing a visual overview of the solution space.