179 research outputs found
A new type of eye movement model based on recurrent neural networks for simulating the gaze behavior of human reading.
Traditional eye movement models are based on psychological assumptions and empirical data and cannot simulate eye movements on previously unseen text. To address this problem, a new type of eye movement model is presented and tested in this paper. In contrast to conventional psychology-based eye movement models, ours uses a recurrent neural network (RNN) to generate a sequence of gaze-point predictions, combining convolutional neural networks (CNNs), bidirectional long short-term memory networks (BiLSTMs), and conditional random fields (CRFs). The model is trained on a reader's eye movement data over some texts and predicts the same reader's eye movements on a previously unseen text. A theoretical analysis of the model is presented to show its excellent convergence behaviour. Experimental results then demonstrate that the proposed model achieves similar prediction accuracy while requiring fewer features than current machine learning models.
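As a rough illustration of the architecture this abstract outlines, the sketch below wires a 1-D CNN, a bidirectional LSTM, and a CRF into a sequence tagger. It assumes PyTorch plus the third-party pytorch-crf package; all class names, dimensions, and the binary fixate/skip tag set are illustrative, not taken from the paper.

    # Minimal sketch of a CNN + BiLSTM + CRF tagger of the kind the
    # abstract describes (names and dimensions are illustrative).
    import torch
    import torch.nn as nn
    from torchcrf import CRF  # third-party pytorch-crf package

    class GazeTagger(nn.Module):
        def __init__(self, feat_dim=32, hidden=64, num_tags=2):
            super().__init__()
            # 1-D convolution extracts local context around each word.
            self.conv = nn.Conv1d(feat_dim, hidden, kernel_size=3, padding=1)
            # Bidirectional LSTM models long-range reading order.
            self.lstm = nn.LSTM(hidden, hidden, batch_first=True,
                                bidirectional=True)
            self.emit = nn.Linear(2 * hidden, num_tags)
            # CRF enforces consistency across the predicted tag sequence.
            self.crf = CRF(num_tags, batch_first=True)

        def forward(self, x, tags=None):
            # x: (batch, seq_len, feat_dim) word-level feature vectors
            h = self.conv(x.transpose(1, 2)).transpose(1, 2).relu()
            h, _ = self.lstm(h)
            emissions = self.emit(h)
            if tags is not None:                  # training: negative log-likelihood
                return -self.crf(emissions, tags)
            return self.crf.decode(emissions)     # inference: best tag sequence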
Usage of Blogging Software for Laboratory Management to Support Weekly Seminars Using Research Activity Reports
This paper reports the design and use of blogging software for laboratory management to support weekly seminars, in which activity reports are an important resource for checking participants' research activity. The software has three basic functions to support seminars: report editing, commenting, and chat. To support knowledge management, we added an evaluation function for each seminar report and a To-Do-List function for managing sub-goals. The blogging system was deployed in a laboratory seminar in which a teacher, a doctoral student, and seven master's students participated over the course of five months. Results show that seminars conducted with the blogging software were rated more highly than paper-based seminars; however, only a few participants used the comment function, and the chat function was used minimally.
Hierarchical Point-based Active Learning for Semi-supervised Point Cloud Semantic Segmentation
Impressive performance on point cloud semantic segmentation has been achieved by fully-supervised methods with large amounts of labelled data. Because it is labour-intensive to acquire large-scale point cloud data with point-wise labels, many attempts have been made to learn 3D point cloud segmentation from limited annotations. Active learning is one effective strategy for this purpose but remains under-explored. The most recent methods of this kind measure the uncertainty of each pre-divided region for manual labelling, but they suffer from redundant information and require additional effort for region division. This paper addresses this issue by developing a hierarchical point-based active learning strategy. Specifically, we measure the uncertainty of each point with a hierarchical minimum-margin uncertainty module that considers contextual information at multiple levels. A feature-distance suppression strategy is then designed to select important and representative points for manual labelling. In addition, to better exploit the unlabelled data, we build a semi-supervised segmentation framework on top of our active strategy. Extensive experiments on the S3DIS and ScanNetV2 datasets demonstrate that the proposed framework achieves 96.5% and 100% of the performance of the fully-supervised baseline with only 0.07% and 0.1% of the training data, respectively, outperforming state-of-the-art weakly-supervised and active learning methods. The code will be available at https://github.com/SmiletoE/HPAL.
Comment: International Conference on Computer Vision (ICCV) 2023
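The abstract's two key ingredients, minimum-margin uncertainty aggregated over levels and feature-distance suppression, can be sketched as follows. This is a hedged approximation in NumPy; the function names, the aggregation by averaging, and the greedy distance threshold are assumptions, not the paper's exact formulation.

    # Illustrative sketch of minimum-margin uncertainty scoring and
    # feature-distance suppression for active point selection.
    import numpy as np

    def margin_uncertainty(probs):
        """probs: (num_points, num_classes) softmax scores for one level."""
        top2 = np.sort(probs, axis=1)[:, -2:]
        # Small margin between the two most likely classes => high uncertainty.
        return 1.0 - (top2[:, 1] - top2[:, 0])

    def hierarchical_uncertainty(prob_levels):
        """prob_levels: list of (num_points, num_classes) arrays, one per level."""
        return np.mean([margin_uncertainty(p) for p in prob_levels], axis=0)

    def select_points(features, scores, budget, min_dist=0.5):
        # Greedily keep high-uncertainty points that are not too close
        # (in feature space) to points already selected.
        order = np.argsort(-scores)
        chosen = []
        for i in order:
            if len(chosen) == budget:
                break
            if all(np.linalg.norm(features[i] - features[j]) >= min_dist
                   for j in chosen):
                chosen.append(i)
        return chosen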
CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization
We propose CatVersion, an inversion-based method that learns a personalized concept from a handful of examples. Users can then generate images that embody the personalized concept from text prompts, achieving text-to-image personalization. In contrast to existing approaches that emphasize word-embedding learning or parameter fine-tuning of the diffusion model, which can cause concept dilution or overfitting, our method concatenates embeddings in the feature-dense space of the text encoder in the diffusion model to learn the gap between the personalized concept and its base class, aiming to maximize the preservation of prior knowledge in diffusion models while restoring the personalized concept. To this end, we first dissect the text encoder's integration in the image generation process to identify the feature-dense space of the encoder. We then concatenate embeddings onto the Keys and Values in this space to learn the gap between the personalized concept and its base class. In this way, the concatenated embeddings ultimately manifest as a residual on the original attention output. To quantify the results of personalized image generation more accurately and without bias, we improve the CLIP image alignment score using masks. Qualitatively and quantitatively, CatVersion restores personalized concepts more faithfully and enables more robust editing.
Comment: For the project page, please visit https://royzhao926.github.io/CatVersion-page
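A minimal sketch of the core idea, concatenating learnable embeddings onto the Keys and Values of a cross-attention layer so they act as a residual on its output, might look like the following. The module name, shapes, and the single extra token are illustrative assumptions; the paper's actual layer placement and training details may differ.

    # Hedged sketch: learnable embeddings appended to K and V in
    # cross-attention (shapes and names are illustrative).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ConcatKVAttention(nn.Module):
        def __init__(self, dim=64, num_extra=1):
            super().__init__()
            self.to_q = nn.Linear(dim, dim)
            self.to_k = nn.Linear(dim, dim)
            self.to_v = nn.Linear(dim, dim)
            # Learnable embeddings concatenated onto K and V; during
            # personalization only these would be optimized.
            self.extra_k = nn.Parameter(torch.randn(num_extra, dim))
            self.extra_v = nn.Parameter(torch.randn(num_extra, dim))

        def forward(self, image_feats, text_feats):
            q = self.to_q(image_feats)   # (B, N, D)
            k = self.to_k(text_feats)    # (B, T, D)
            v = self.to_v(text_feats)
            B = k.shape[0]
            k = torch.cat([k, self.extra_k.unsqueeze(0).expand(B, -1, -1)], dim=1)
            v = torch.cat([v, self.extra_v.unsqueeze(0).expand(B, -1, -1)], dim=1)
            attn = F.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
            # The extra tokens contribute a residual on the attention output.
            return attn @ v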
What were we all looking at? Identifying objects of collective visual attention
We aim to identify the salient objects in an image by applying a model of visual attention. We automate the process by predicting the objects in an image that are most likely to be the focus of someone's visual attention. Concretely, we first generate fixation maps from eye tracking data, which express the ground truth of people's visual attention for each training image. We then extract high-level features based on a bag-of-visual-words image representation as input attributes and, together with the fixation maps, train a support vector regression model. With this model, we can predict a new query image's saliency. Our experiments show that the model provides a good estimate of human visual attention in test image sets with one salient object and with multiple salient objects. In this way, we seek to reduce the redundant information within the scene and thus provide a more accurate depiction of the scene.
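A minimal sketch of the regression setup this abstract outlines, using scikit-learn's SVR on bag-of-visual-words histograms with saliency targets derived from fixation maps, is shown below. The data here is synthetic and the feature dimensions are assumptions.

    # Sketch: support vector regression from BoVW histograms to saliency.
    import numpy as np
    from sklearn.svm import SVR

    rng = np.random.default_rng(0)
    X_train = rng.random((200, 500))   # BoVW histograms, 500 visual words
    y_train = rng.random(200)          # saliency targets from fixation maps

    model = SVR(kernel="rbf", C=1.0, epsilon=0.1)
    model.fit(X_train, y_train)

    X_query = rng.random((1, 500))     # features of a new query image
    print(model.predict(X_query))      # predicted saliency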