Articulated Pose Estimation Using Hierarchical Exemplar-Based Models
Exemplar-based models have achieved great success on localizing the parts of
semi-rigid objects. However, their efficacy on highly articulated objects such
as humans is yet to be explored. Inspired by hierarchical object representation
and recent application of Deep Convolutional Neural Networks (DCNNs) on human
pose estimation, we propose a novel formulation that incorporates both
hierarchical exemplar-based models and DCNNs in the spatial terms.
Specifically, we obtain more expressive spatial models by assuming independence
between exemplars at different levels in the hierarchy; we also obtain stronger
spatial constraints by inferring the spatial relations between parts at the
same level. As our method strikes a good balance between expressiveness and
strength of spatial models, it is both effective and generalizable, achieving
state-of-the-art results on different benchmarks: Leeds Sports Dataset and
CUB-200-2011. (Comment: 8 pages, 6 figures)
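The scoring idea in the abstract above, DCNN-style appearance terms combined with exemplar-based spatial terms, independence across hierarchy levels, and pairwise constraints within a level, can be sketched roughly as follows. This is a minimal illustration, not the authors' formulation: the part names, offsets, and the Gaussian-style spatial penalty are all assumptions.

```python
# Sketch: score a pose hypothesis by summing per-part appearance scores
# (stand-ins for DCNN unary terms) with spatial terms between parts at
# the same hierarchy level. Levels are assumed independent, so their
# spatial scores simply add. All names and values are illustrative.

def spatial_score(loc_a, loc_b, expected_offset, sigma=1.0):
    """Gaussian-like penalty on deviation from an exemplar's offset."""
    dx = loc_b[0] - loc_a[0] - expected_offset[0]
    dy = loc_b[1] - loc_a[1] - expected_offset[1]
    return -(dx * dx + dy * dy) / (2 * sigma * sigma)

def pose_score(appearance, locations, levels):
    """appearance: {part: score}; locations: {part: (x, y)};
    levels: one {(part_a, part_b): expected_offset} dict per level."""
    total = sum(appearance.values())      # unary (DCNN-like) terms
    for pairs in levels:                  # levels treated independently
        for (a, b), offset in pairs.items():
            total += spatial_score(locations[a], locations[b], offset)
    return total

appearance = {"head": 0.9, "neck": 0.8, "torso": 0.7}
locations = {"head": (0, 0), "neck": (0, 2), "torso": (0, 5)}
levels = [{("head", "neck"): (0, 2)}, {("neck", "torso"): (0, 3)}]
score = pose_score(appearance, locations, levels)
```

Here both spatial offsets match their exemplars exactly, so the spatial terms contribute nothing and the score is just the sum of appearance terms.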
Looking Fast and Slow: Memory-Guided Mobile Video Object Detection
With a single eye fixation lasting a fraction of a second, the human visual
system is capable of forming a rich representation of a complex environment,
reaching a holistic understanding which facilitates object recognition and
detection. This phenomenon is known as recognizing the "gist" of the scene and
is accomplished by relying on relevant prior knowledge. This paper addresses
the analogous question of whether using memory in computer vision systems can
not only improve the accuracy of object detection in video streams, but also
reduce the computation time. By interleaving conventional feature extractors
with extremely lightweight ones which only need to recognize the gist of the
scene, we show that minimal computation is required to produce accurate
detections when temporal memory is present. In addition, we show that the
memory contains enough information for deploying reinforcement learning
algorithms to learn an adaptive inference policy. Our model achieves
state-of-the-art performance among mobile methods on the ImageNet VID 2015
dataset while running at up to 70 FPS on a Pixel 3 phone.
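The interleaving scheme described above can be sketched as follows. This is an illustration of the idea only, not the paper's implementation: the extractors, the fixed interleave period (in place of the learned adaptive policy), and the memory update rule are all stand-ins.

```python
# Sketch: alternate an expensive feature extractor with a cheap "gist"
# extractor that merely refreshes a temporal memory, so most frames
# need only minimal computation. All functions are toy stand-ins.

def heavy_extractor(frame):
    return [x * 2.0 for x in frame]          # stand-in for a large CNN

def light_extractor(frame, memory):
    # cheap update: blend a crude gist of the new frame into memory
    return [0.9 * m + 0.1 * x for m, x in zip(memory, frame)]

def detect_stream(frames, period=3):
    memory, outputs = None, []
    for i, frame in enumerate(frames):
        if memory is None or i % period == 0:
            memory = heavy_extractor(frame)              # refresh memory
        else:
            memory = light_extractor(frame, memory)      # cheap step
        outputs.append(sum(memory))   # stand-in for a detection head
    return outputs

outs = detect_stream([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
```

The paper instead learns when to invoke the heavy extractor via reinforcement learning; a fixed period is used here only to keep the sketch self-contained.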
Fusion-Eval: Integrating Evaluators with LLMs
Evaluating Large Language Models (LLMs) is a complex task, especially
considering the intricacies of natural language understanding and the
expectations for high-level reasoning. Traditional evaluations typically lean
on human-based, model-based, or automatic-metrics-based paradigms, each with
its own advantages and shortcomings. We introduce "Fusion-Eval", a system that
employs LLMs not solely for direct evaluations, but to skillfully integrate
insights from diverse evaluators. This gives Fusion-Eval flexibility, enabling
it to work effectively across diverse tasks and make optimal use of multiple
references. In testing on the SummEval dataset, Fusion-Eval achieved a Spearman
correlation of 0.96, outperforming other evaluators. The success of Fusion-Eval
underscores the potential of LLMs to produce evaluations that align closely
with human perspectives, setting a new standard in the field of LLM evaluation.
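The fusion idea can be illustrated with a toy sketch. Note the hedge: Fusion-Eval uses an LLM to integrate the evaluators, whereas this sketch substitutes a simple weighted average, and agreement with human judgments is checked with Spearman's rank correlation (no tie handling). All scores and weights are invented.

```python
# Sketch: fuse scores from several automatic evaluators, then measure
# rank agreement with human judgments via Spearman correlation.
# The weighted average stands in for the LLM-based integration.

def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(a, b):
    # classic formula; assumes no ties among the scores
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

def fuse(scores_by_evaluator, weights):
    n = len(next(iter(scores_by_evaluator.values())))
    fused = [0.0] * n
    for name, scores in scores_by_evaluator.items():
        for i, s in enumerate(scores):
            fused[i] += weights[name] * s
    return fused

evals = {"eval_a": [0.2, 0.5, 0.4], "eval_b": [0.3, 0.6, 0.5]}
weights = {"eval_a": 0.5, "eval_b": 0.5}
human = [1.0, 3.0, 2.0]
fused = fuse(evals, weights)
```

In this toy case the fused ranking matches the human ranking exactly, giving a correlation of 1.0; the 0.96 reported above is on the real SummEval data.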
DeepStore: an interaction-aware Wide&Deep model for store site recommendation with attentional spatial embeddings
Store site recommendation is one of the essential business services in smart cities for brick-and-mortar enterprises. In recent years, the proliferation of multisource data in cities has fostered unprecedented opportunities for data-driven store site recommendation, which aims at leveraging large-scale user-generated data to analyze and mine users' preferences and identify the optimal location for a new store. However, most works on store site recommendation focus on a single data source, which lacks some significant data (e.g., consumption data and user profile data). In this paper, we study store site recommendation in a fine-grained manner. Specifically, we predict the consumption level of different users at a store based on multisource data, which can not only help with store placement but also benefit the analysis of customer behavior in the store at different time periods. To solve this problem, we design a novel model based on deep neural networks, named DeepStore, which learns low- and high-order feature interactions explicitly and implicitly from dense and sparse features simultaneously. In particular, DeepStore incorporates three modules: 1) the cross network; 2) the deep network; and 3) the linear component. In addition, to learn latent feature representations from multisource data, we propose two embedding methods for different types of data: 1) field embedding and 2) attention-based spatial embedding. Extensive experiments are conducted on a real-world dataset including store data, user data, and point-of-interest data; the results demonstrate that DeepStore outperforms the state-of-the-art models.
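Of the three modules listed above, the cross network is the one with a compact closed form, so it is the easiest to sketch. In Deep & Cross-style networks, each cross layer computes x_{l+1} = x_0 * (w . x_l) + b + x_l, capturing explicit low-order feature interactions; the inputs and weights below are illustrative, not from the paper.

```python
# Sketch of one cross-network layer: x_{l+1} = x0 * (w . x_l) + b + x_l.
# The residual "+ x_l" preserves the input; the outer product with x0
# adds one degree of explicit feature interaction per layer.

def cross_layer(x0, xl, w, b):
    dot = sum(wi * xi for wi, xi in zip(w, xl))   # scalar w . x_l
    return [x0i * dot + bi + xli for x0i, bi, xli in zip(x0, b, xl)]

x0 = [1.0, 2.0]        # illustrative input features
w = [0.5, 0.5]         # illustrative layer weights
b = [0.0, 0.0]
x1 = cross_layer(x0, x0, w, b)   # first layer: x_l = x_0
x2 = cross_layer(x0, x1, w, b)   # stacking layers raises the order
```

Stacking L such layers yields feature interactions up to order L+1 with only O(d) parameters per layer, which is why the cross network complements the implicit interactions learned by the deep network.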
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting
Large Language Models (LLMs) have demonstrated impressive capabilities in
creative tasks such as storytelling and E-mail generation. However, as LLMs are
primarily trained on final text results rather than intermediate revisions, it
might be challenging for them to perform text rewriting tasks. Most studies in
the rewriting tasks focus on a particular transformation type within the
boundaries of single sentences. In this work, we develop new strategies for
instruction tuning and reinforcement learning to better align LLMs for
cross-sentence rewriting tasks using diverse wording and structures expressed
through natural language, including: 1) generating rewriting instruction data
from Wiki edits and public corpora through instruction generation and
chain-of-thought prompting; 2) collecting comparison data for reward model
training through a new ranking function. To facilitate this research, we
introduce OpenRewriteEval, a novel benchmark that covers a wide variety of rewriting
types expressed through natural language instructions. Our results show
significant improvements over a variety of baselines. The public repository is
available on GitHub under Google Research
(https://github.com/google-research/google-research/tree/master/rewritelm)
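The second strategy above turns ranked rewrite candidates into comparison data for reward-model training. The abstract does not specify the ranking function, so the sketch below uses a purely hypothetical stand-in (a crude character-difference heuristic); only the pairing scheme, preferred-versus-rejected pairs from a ranking, reflects the described pipeline.

```python
# Sketch: rank candidate rewrites with a HYPOTHETICAL scoring heuristic,
# then emit (preferred, rejected) pairs for reward-model training.
# The real ranking function in the paper is not specified here.

def hypothetical_rank_score(source, rewrite):
    changed = sum(1 for a, b in zip(source, rewrite) if a != b)
    changed += abs(len(source) - len(rewrite))
    return changed - 0.1 * len(rewrite)   # reward change, penalize length

def comparison_pairs(source, candidates):
    ranked = sorted(candidates,
                    key=lambda c: hypothetical_rank_score(source, c),
                    reverse=True)
    # every higher-ranked candidate is preferred over every lower one
    return [(ranked[i], ranked[j])
            for i in range(len(ranked))
            for j in range(i + 1, len(ranked))]

pairs = comparison_pairs("the cat sat",
                         ["a cat sat", "the cat sat", "a dog ran"])
```

A candidate identical to the source scores lowest under this heuristic, so the unchanged rewrite is always the rejected member of its pairs, which is the behavior a rewriting reward model should discourage.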
Dynamic Knowledge Distillation with Noise Elimination for RGB-D Salient Object Detection
RGB-D salient object detection (SOD) demonstrates its superiority in detection in complex environments due to the additional depth information in the data. Inevitably, an independent stream is introduced to extract features from depth images, leading to extra computation and parameters. This methodology sacrifices model size to improve detection accuracy, which may impede the practical application of SOD. To tackle this dilemma, we propose a dynamic knowledge distillation (DKD) method, along with a lightweight structure, which significantly reduces the computational burden while maintaining validity. This method considers the performance of both teacher and student during the training stage and dynamically assigns the distillation weight instead of applying a fixed weight to the student model. We also investigate the issue of the RGB-D early-fusion strategy in distillation and propose a simple noise elimination method to mitigate the impact of distorted training data caused by low-quality depth maps. Extensive experiments conducted on five public datasets demonstrate that our method achieves competitive performance with a fast inference speed (136 FPS) compared to 12 prior methods.
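The dynamic weighting idea, letting the distillation weight depend on how far the student lags behind the teacher rather than fixing it, can be sketched as below. The abstract does not give the authors' exact formula, so the gap-based rule here is an assumption chosen only to show the qualitative behavior.

```python
# Sketch of a dynamic distillation weight (NOT the paper's formula):
# the weight grows when the student lags far behind the teacher and
# shrinks toward zero as the gap closes, instead of staying fixed.

def dynamic_distill_weight(teacher_loss, student_loss, base=1.0, eps=1e-8):
    gap = max(student_loss - teacher_loss, 0.0)   # how far student lags
    return base * gap / (student_loss + eps)      # normalized to [0, base)

def total_loss(task_loss, distill_loss, teacher_loss, student_loss):
    w = dynamic_distill_weight(teacher_loss, student_loss)
    return task_loss + w * distill_loss

# early in training the student lags badly; late, the gap is small
early = dynamic_distill_weight(teacher_loss=0.2, student_loss=1.0)
late = dynamic_distill_weight(teacher_loss=0.2, student_loss=0.25)
```

The intended effect is that distillation dominates while the student is weak and fades out as the student catches up, avoiding the over-regularization a fixed weight can cause near convergence.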