ATRank: An Attention-Based User Behavior Modeling Framework for Recommendation
A user can be represented by what he or she does over time. A common way to approach the user modeling problem is to manually extract aggregated features over the heterogeneous behaviors, which may fail to fully represent the data due to the limits of human intuition. Recent works typically use RNN-based methods to produce an overall embedding of a behavior sequence, which downstream applications can then exploit. However, such an embedding preserves only limited information, an aggregated memory of a person. When a downstream application needs to use the modeled user features, it may lose the specific, highly correlated behaviors of the user and introduce noise from unrelated behaviors. This paper proposes an attention-based user behavior modeling framework called ATRank, which we mainly use for recommendation tasks. Our model handles heterogeneous user behaviors by projecting all types of behaviors into multiple latent semantic spaces, where the behaviors influence one another via self-attention. Downstream applications can then use the user behavior vectors via vanilla attention. Experiments show that ATRank achieves better performance and a faster training process. We further extend ATRank to use one unified model to predict different types of user behaviors at the same time,
showing a comparable performance with the highly optimized individual models.Comment: AAAI 201
TouchStone: Evaluating Vision-Language Models by Language Models
Large vision-language models (LVLMs) have recently witnessed rapid
advancements, exhibiting a remarkable capacity for perceiving, understanding,
and processing visual information by connecting visual receptors with large
language models (LLMs). However, current assessments mainly focus on
recognition and reasoning abilities, lacking direct evaluation of
conversational skills and neglecting visual storytelling abilities. In this
paper, we propose an evaluation method that uses strong LLMs as judges to
comprehensively evaluate the various abilities of LVLMs. Firstly, we construct
a comprehensive visual dialogue dataset TouchStone, consisting of open-world
images and questions, covering five major categories of abilities and 27
subtasks. This dataset not only covers fundamental recognition and
comprehension but also extends to literary creation. Secondly, by integrating
detailed image annotations, we effectively transform the multimodal input
content into a form understandable by LLMs. This enables us to employ advanced
LLMs for directly evaluating the quality of the multimodal dialogue without
requiring human intervention. Through validation, we demonstrate that powerful
LVLMs, such as GPT-4, can effectively score dialogue quality by leveraging
their textual capabilities alone, aligning with human preferences. We hope our
work can serve as a touchstone for LVLMs' evaluation and pave the way for
building stronger LVLMs. The evaluation code is available at
https://github.com/OFA-Sys/TouchStone.
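The core idea of the evaluation, substituting a detailed text annotation for the image so a text-only LLM can judge the dialogue, can be sketched as a simple prompt builder. The prompt wording, field labels, and scoring scale below are illustrative assumptions, not TouchStone's actual templates.

```python
def build_judge_prompt(image_annotation, question, answer_a, answer_b):
    """Compose a text-only judging prompt from an image's detailed annotation.

    The annotation stands in for the image itself, so a text-only LLM judge
    can compare two model answers without seeing any pixels.
    """
    return (
        "You are a judge evaluating two assistants' answers about an image.\n"
        f"[Image description]: {image_annotation}\n"
        f"[Question]: {question}\n"
        f"[Answer A]: {answer_a}\n"
        f"[Answer B]: {answer_b}\n"
        "Score each answer from 1 to 10 for accuracy and helpfulness, "
        "then state which answer is better."
    )

prompt = build_judge_prompt(
    "A red kite flying over a sandy beach at sunset.",
    "What is in the sky?",
    "A red kite.",
    "A flock of birds.",
)
```

The returned string would be sent to a strong text-only LLM (e.g. GPT-4, as the abstract reports), whose verdicts were found to align with human preferences.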
Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities
We introduce the Qwen-VL series, a set of large-scale vision-language models
designed to perceive and understand both text and images. Comprising Qwen-VL
and Qwen-VL-Chat, these models exhibit remarkable performance in tasks like
image captioning, question answering, visual localization, and flexible
interaction. The evaluation covers a wide range of tasks including zero-shot
captioning, general and document-oriented visual question answering, and
grounding. We demonstrate that Qwen-VL outperforms existing Large Vision-Language Models
(LVLMs). We present their architecture, training, capabilities, and
performance, highlighting their contributions to advancing multimodal
artificial intelligence. Code, demo and models are available at
https://github.com/QwenLM/Qwen-VL.
Qwen Technical Report
Large language models (LLMs) have revolutionized the field of artificial
intelligence, enabling natural language processing tasks that were previously
thought to be exclusive to humans. In this work, we introduce Qwen, the first
installment of our large language model series. Qwen is a comprehensive
language model series that encompasses distinct models with varying parameter
counts. It includes Qwen, the base pretrained language models, and Qwen-Chat,
the chat models finetuned with human alignment techniques. The base language
models consistently demonstrate superior performance across a multitude of
downstream tasks, and the chat models, particularly those trained using
Reinforcement Learning from Human Feedback (RLHF), are highly competitive. The
chat models possess advanced tool-use and planning capabilities for creating
agent applications, showcasing impressive performance even when compared to
bigger models on complex tasks like utilizing a code interpreter. Furthermore,
we have developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as
well as mathematics-focused models, Math-Qwen-Chat, which are built upon base
language models. These models demonstrate significantly improved performance
compared with open-source models, and fall only slightly behind proprietary
models. Comment: 59 pages, 5 figures.
Fractal complex transform technology for the fractal Korteweg-de Vries equation within a local fractional derivative
In this paper, we present the fractal complex transform via a local
fractional derivative. Traveling wave solutions for the fractal
Korteweg-de Vries equation within a local fractional derivative are obtained
based on special functions defined on Cantor sets. The technique is a
powerful tool for solving local fractional non-linear partial
differential equations.
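For reference, the fractal Korteweg-de Vries equation in this line of work is typically written with local fractional derivatives of order $\alpha$ acting on a function defined on a Cantor set; the sketch below uses the common normalization of the nonlinear and dispersive terms, though coefficient conventions vary between papers:

\[
\frac{\partial^{\alpha} u}{\partial t^{\alpha}}
+ u\,\frac{\partial^{\alpha} u}{\partial x^{\alpha}}
+ \frac{\partial^{3\alpha} u}{\partial x^{3\alpha}} = 0,
\qquad 0 < \alpha \le 1,
\]

which recovers the classical KdV equation when $\alpha = 1$. The fractal complex transform reduces such equations to forms solvable in terms of the special functions (e.g. Mittag-Leffler-type functions) defined on Cantor sets.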
Fabrication of Ordered SnO2 Nanostructures with Enhanced Humidity Sensing Performance
Ordered SnO2 nanostructures were prepared as humidity sensors by nanosphere lithography combined with the magnetron sputtering technique. X-ray diffraction patterns of the SnO2 nanostructures show that all intense diffraction peaks correspond to crystallographic planes of SnO2. Atomic Force Microscope (AFM) images show that these SnO2 nanostructures exhibit a classic honeycomb structure. Resistance measurements show that the sensor's resistance decreases as relative humidity (RH) increases. Additionally, the longest response/recovery time was 32 s/42 s over the 11–96% RH range. The hysteresis of the SnO2 nanostructure sensor was below 5%.
Enhancement of Detecting Permanent Water and Temporary Water in Flood Disasters by Fusing Sentinel-1 and Sentinel-2 Imagery Using Deep Learning Algorithms: Demonstration of Sen1Floods11 Benchmark Datasets
Identifying permanent and temporary water in flood disasters has mainly relied on change detection from multi-temporal remote sensing imagery, but estimating the water type in flood events from only post-flood imagery remains challenging. Recent research has demonstrated the excellent potential of multi-source data fusion and deep learning algorithms for improving flood detection, yet this direction has only begun to be explored due to the lack of large-scale labelled remote sensing images of flood events. Here, we present a flood inundation mapping approach driven by deep learning and multi-source data fusion, leveraging the large-scale, publicly available Sen1Floods11 dataset, which consists of roughly 4831 labelled Sentinel-1 SAR and Sentinel-2 optical images gathered from flood events worldwide in recent years. Specifically, we propose an automatic segmentation method for surface water, permanent water, and temporary water identification, with all tasks sharing the same convolutional neural network architecture. We use focal loss to address the class (water/non-water) imbalance problem. Thorough ablation experiments and analysis confirm the effectiveness of the proposed designs. In comparison experiments, the proposed method is superior to other classical models. Our model achieves a mean Intersection over Union (mIoU) of 52.99%, Intersection over Union (IoU) of 52.30%, and Overall Accuracy (OA) of 92.81% on the Sen1Floods11 test set. On the Sen1Floods11 Bolivia test set, our model also achieves high mIoU (47.88%), IoU (76.74%), and OA (95.59%), showing good generalization ability.
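The focal loss mentioned for the water/non-water imbalance can be sketched in NumPy as below. This is the standard binary focal loss of Lin et al. (2017), not the authors' exact implementation; the alpha and gamma values are common defaults, assumed here for illustration.

```python
import numpy as np

def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for water/non-water pixel classification.

    probs   : predicted probability of the positive (water) class, shape (N,)
    targets : ground-truth labels in {0, 1}, shape (N,)
    """
    probs = np.clip(probs, eps, 1.0 - eps)
    # p_t is the model's probability for the true class of each pixel
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    # alpha_t balances the two classes; (1 - p_t)**gamma down-weights
    # easy pixels so training focuses on hard, misclassified ones
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

easy = focal_loss(np.array([0.9]), np.array([1]))  # confident, correct pixel
hard = focal_loss(np.array([0.1]), np.array([1]))  # confident, wrong pixel
```

Because the modulating factor (1 - p_t)^gamma is near zero for well-classified pixels, the abundant non-water background contributes little gradient, which is why the loss suits sparse flood-water segmentation.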