1,098 research outputs found
A mosaic of eyes
Autonomous navigation is a traditional research topic in intelligent robotics and vehicles: a robot must perceive its environment through onboard sensors such as cameras or laser scanners in order to drive to its goal. Most research to date has focused on building a large, smart centralized "brain" to give robots autonomous capability. An autonomous mobile robot must answer three fundamental questions: 1) Where am I going? 2) Where am I? and 3) How do I get there? To answer them, a robot requires a large spatial memory and considerable computational resources to accomplish perception, localization, path planning, and control. It is not yet possible to deliver the centralized intelligence required for real-life applications such as autonomous ground vehicles and wheelchairs in care centers. In fact, most autonomous robots try to mimic how humans navigate, interpreting images taken by cameras and then making decisions accordingly. They may encounter the following difficulties.
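Of the three questions, the third (path planning) is the most self-contained, and a toy example makes it concrete. Below is a minimal sketch, assuming an occupancy-grid map and breadth-first search; the grid, start, and goal are illustrative values for this sketch, not data from the paper.

# Minimal, self-contained illustration of the "How do I get there?" step:
# breadth-first path planning on a small occupancy grid (0 = free, 1 = obstacle).
from collections import deque

def plan_path(grid, start, goal):
    """Return a list of cells from start to goal, avoiding obstacles."""
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    came_from = {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            path = []
            while cell is not None:      # walk parent pointers back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in came_from:
                came_from[(nr, nc)] = (r, c)
                frontier.append((nr, nc))
    return None  # goal unreachable

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
print(plan_path(grid, (0, 0), (2, 0)))
# [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0)]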
Supervised coordinate descent method with a 3D bilinear model for face alignment and tracking
Face alignment and tracking play important roles in facial performance capture. Existing data-driven methods for monocular videos suffer from large variations in pose and expression. In this paper, we propose an efficient and robust method for this task by introducing a novel supervised coordinate descent method with a 3D bilinear representation. Instead of learning the mapping between the whole parameter set and image features directly with a cascaded regression framework, as current methods do, we learn separate mappings for individual sets of parameters, step by step, in a coordinate descent manner. Because different parameters make different contributions to the displacement of facial landmarks, our method is more discriminative than current whole-parameter cascaded regression methods. Benefiting from a 3D bilinear model learned from public databases, the proposed method handles out-of-plane head pose changes and extreme expressions better than other 2D-based methods. We present reliable face tracking results under various head poses and facial expressions on challenging video sequences collected online. The experimental results show that our method outperforms state-of-the-art data-driven methods.
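To illustrate the core idea of updating parameter blocks separately rather than regressing all parameters at once, here is a schematic sketch of block-wise cascaded regression in the coordinate-descent style the abstract describes. The feature extractor, the block names ('pose', 'expression'), and the learned regression matrices are stand-ins, not the paper's actual model or training procedure.

# Schematic sketch of coordinate-descent cascaded regression over parameter
# blocks. regressors[stage][block] is a learned linear map from shape-indexed
# features to an update for that block alone.
import numpy as np

def fit_parameters(image, params, regressors, extract_features, n_stages=4):
    """params: dict of parameter blocks, e.g. {'pose': ..., 'expression': ...}."""
    for stage in range(n_stages):
        # Coordinate descent: update one block at a time, so each regressor
        # only explains the landmark displacement its own block causes.
        for block in ('pose', 'expression'):
            phi = extract_features(image, params)   # shape-indexed features
            params[block] = params[block] + regressors[stage][block] @ phi
    return params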
The sky brightness and transparency in i-band at Dome A, Antarctica
The i-band observing conditions at Dome A on the Antarctic plateau have been
investigated using data acquired during 2008 with the Chinese Small Telescope
ARray. The sky brightness, variations in atmospheric transparency, cloud cover,
and the presence of aurorae are obtained from these images. The median sky
brightness of moonless clear nights is 20.5 mag arcsec^{-2} in the SDSS i
band at the South Celestial Pole (which includes a contribution of about 0.06
mag from diffuse Galactic light). The median over all Moon phases in the
Antarctic winter is about 19.8 mag arcsec^{-2}. There were no thick clouds in
2008. We model contributions of the Sun and the Moon to the sky background to
obtain the relationship between the sky brightness and transparency. Aurorae
are identified by comparing the observed sky brightness to the sky brightness
expected from this model. About 2% of the images are affected by relatively
strong aurorae.
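Because surface brightness is logarithmic, the quoted contributions combine in flux rather than in magnitudes, and aurorae can be flagged wherever the observed sky is notably brighter than the Sun-plus-Moon model. The short sketch below works through the 0.06 mag diffuse-Galactic-light example; the 0.2 mag aurora threshold is an illustrative assumption, not the paper's criterion.

# Surface-brightness components add in flux, not in magnitudes.
import math

def mag_to_flux(m):
    return 10 ** (-0.4 * m)

def flux_to_mag(f):
    return -2.5 * math.log10(f)

def combine_mags(mags):
    """Total surface brightness of several components, in mag arcsec^-2."""
    return flux_to_mag(sum(mag_to_flux(m) for m in mags))

def aurora_affected(observed_mag, model_mag, threshold=0.2):
    """Flag a frame whose sky is notably brighter than the Sun+Moon model.
    (0.2 mag is an illustrative threshold, not the paper's value.)"""
    return observed_mag < model_mag - threshold

# A faint component near 23.7 mag arcsec^-2 brightens a 20.56 mag arcsec^-2
# sky by only ~0.06 mag, reproducing the quoted diffuse-Galactic-light effect:
print(round(combine_mags([20.56, 23.67]), 2))  # ~20.5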
Unfalsified visual servoing for simultaneous object recognition and pose tracking
In a complex environment, simultaneous object recognition and tracking has been one of the challenging topics in computer vision and robotics. Current approaches are usually fragile due to spurious feature matching and local convergence in pose determination, and once a failure happens, they lack a mechanism to recover automatically. In this paper, data-driven unfalsified control is proposed for solving this problem in visual servoing. It recognizes a target through matching image features with a 3-D model and then tracks the target through dynamic visual servoing. The features can be falsified or unfalsified by a supervisory mechanism according to their tracking performance. Supervisory visual servoing is repeated until a consensus between the model and the selected features is reached, so that model recognition and object tracking are accomplished. Experiments show the effectiveness and robustness of the proposed algorithm in dealing with matching and tracking failures caused by various disturbances, such as fast motion, occlusions, and illumination variation.
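The supervisory loop the abstract describes can be sketched as follows: features that track poorly are falsified and dropped, and servoing repeats until the surviving features agree with the model. All helper functions, the feature objects, and the error tolerance below are placeholders standing in for the paper's components, not a real API.

# Schematic sketch of the supervisory falsification loop.
def supervised_servoing(model, camera, error_tol=1.0, max_rounds=10):
    # Initial 2D-3D matches between image features and the 3-D model
    candidates = match_features(camera.capture(), model)
    for _ in range(max_rounds):
        active = [f for f in candidates if not f.falsified]
        if not active:
            break                          # every feature has been falsified
        pose = servo_step(camera, active)  # dynamic visual servoing update
        bad = [f for f in active if tracking_error(f, pose) > error_tol]
        for f in bad:
            f.falsified = True             # supervisor falsifies this feature
        if not bad:
            return pose                    # consensus: recognized and tracked
    return None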
Enhancing Subtask Performance of Multi-modal Large Language Model
A Multi-modal Large Language Model (MLLM) is a model extended from a Large Language Model (LLM) with the capability to handle and reason over multi-modal data. Current MLLMs typically begin by using LLMs to decompose tasks into multiple subtasks, then employ individual pre-trained models to complete specific subtasks, and finally use LLMs to integrate the results of each subtask to obtain the result of the overall task. In real-world scenarios, when dealing with large projects, it is common practice to break the project down into smaller sub-projects, with different teams providing corresponding solutions or results. The project owner then decides which solution or result to use, ensuring the best possible outcome for each subtask and, consequently, for the entire project. Inspired by this, this study considers selecting multiple pre-trained models to complete the same subtask. By combining the results from multiple pre-trained models, the optimal subtask result is obtained, enhancing the performance of the MLLM. Specifically, this study first selects multiple pre-trained models focused on the same subtask based on distinct evaluation approaches, and then invokes these models in parallel to process input data and generate the corresponding subtask results. Finally, the results from the multiple pre-trained models for the same subtask are compared using the LLM, and the best result is chosen as the outcome for that subtask. Extensive experiments are conducted in this study using GPT-4 annotated datasets and human-annotated datasets. The results across various evaluation metrics demonstrate the effectiveness of the proposed approach.
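A minimal sketch of the selection scheme follows, assuming a list of candidate models callable on the same input and an LLM judge reachable through a hypothetical call_llm function; none of these names come from the paper.

# Run several pre-trained models on the same subtask in parallel, then let
# the LLM act as the "project owner" and pick the best candidate output.
from concurrent.futures import ThreadPoolExecutor

def best_subtask_result(subtask, inputs, models, call_llm):
    # Invoke every candidate model on the same subtask input, in parallel
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda m: m(inputs), models))

    # Ask the LLM to compare the candidates and return the winning index
    numbered = "\n".join(f"[{i}] {r}" for i, r in enumerate(results))
    prompt = (f"Subtask: {subtask}\nCandidate results:\n{numbered}\n"
              f"Reply with the index of the best result only.")
    choice = int(call_llm(prompt).strip())
    return results[choice]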
