BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Vision Language Models (VLMs), which extend Large Language Models (LLM) by
incorporating visual understanding capability, have demonstrated significant
advancements in addressing open-ended visual question-answering (VQA) tasks.
However, these models cannot accurately interpret images infused with text, a
common occurrence in real-world scenarios. Standard procedures for extracting
information from images often involve learning a fixed set of query embeddings.
These embeddings are designed to encapsulate image contexts and are later used
as soft prompt inputs in LLMs. Yet this process is limited by the token count,
potentially curtailing the recognition of scenes with text-rich context. To
address this limitation, the present study introduces BLIVA: an augmented version of
InstructBLIP with Visual Assistant. BLIVA incorporates the query embeddings
from InstructBLIP and also directly projects encoded patch embeddings into the
LLM, a technique inspired by LLaVA. This approach helps the model capture
intricate details potentially missed during the query decoding process.
Empirical evidence demonstrates that our model, BLIVA, significantly enhances
performance on text-rich VQA benchmarks (up to 17.76% on the OCR-VQA
benchmark) and on general (not particularly text-rich) VQA
benchmarks (up to 7.9% on the Visual Spatial Reasoning benchmark), and achieves
a 17.72% overall improvement on a comprehensive multimodal LLM benchmark (MME)
compared to our baseline, InstructBLIP. BLIVA demonstrates significant
capability in decoding real-world images, irrespective of text presence. To
demonstrate the broad industry applications enabled by BLIVA, we evaluate the
model using a new dataset comprising YouTube thumbnails paired with
question-answer sets across 11 diverse categories. Our code and models are
freely accessible at https://github.com/mlpc-ucsd/BLIVA.
Comment: Accepted at AAAI Conference on Artificial Intelligence (AAAI-24)
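The two visual pathways described in this abstract (learned query embeddings plus directly projected patch embeddings, both fed to the LLM as soft prompts) can be illustrated with a minimal sketch. This is not the authors' implementation: the token counts, dimensions, random weights, the ordering of the concatenation, and the helper names `linear_project`, `random_matrix`, and `build_visual_prompt` are all hypothetical, chosen only to show the shape bookkeeping.

```python
import random


def linear_project(tokens, weight, bias):
    """Apply y = x @ weight + bias to every token vector in `tokens`."""
    d_out = len(bias)
    return [
        [sum(x[i] * weight[i][j] for i in range(len(x))) + bias[j]
         for j in range(d_out)]
        for x in tokens
    ]


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights or encoder outputs."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def build_visual_prompt(query_embeds, patch_embeds, w_q, b_q, w_p, b_p):
    """Project both token sets to the LLM width and concatenate along the token axis.

    Assumption: query tokens come first; the abstract does not specify the order.
    """
    query_tokens = linear_project(query_embeds, w_q, b_q)
    patch_tokens = linear_project(patch_embeds, w_p, b_p)
    return query_tokens + patch_tokens


# Hypothetical toy sizes (real models use far larger values).
rng = random.Random(0)
N_QUERY, N_PATCH = 4, 9               # number of query tokens / image patches
D_QUERY, D_PATCH, D_LLM = 6, 8, 5     # embedding widths

query_embeds = random_matrix(N_QUERY, D_QUERY, rng)   # stand-in for query outputs
patch_embeds = random_matrix(N_PATCH, D_PATCH, rng)   # stand-in for encoder patches
w_q, b_q = random_matrix(D_QUERY, D_LLM, rng), [0.0] * D_LLM
w_p, b_p = random_matrix(D_PATCH, D_LLM, rng), [0.0] * D_LLM

visual_prompt = build_visual_prompt(query_embeds, patch_embeds, w_q, b_q, w_p, b_p)
print(len(visual_prompt), len(visual_prompt[0]))  # 13 tokens, each of width 5
```

The point of the sketch is that the patch pathway adds one soft-prompt token per patch, so the visual prompt is no longer capped at the fixed number of query tokens.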
Analysis of changes in large-scale circulation patterns driving extreme precipitation events over the central-eastern China
To an extent, large-scale circulation situations and moisture transport are responsible for the occurrence of extreme precipitation. The aim of our study is to investigate possible changes in the circulation patterns (CPs) that drive extreme precipitation over central-eastern China (CEC). The self-organizing map (SOM) and event synchronization methods are used to link the extreme precipitation events with CPs. Results show that 23% of rain gauges have a significant change point (at the 90% confidence level) in annual extreme precipitation from 1960 to 2015. Based on the identified change points, we classified the data into two periods, that is, 1960–1989 and 1990–2015. Overall, CPs characterized by obvious positive anomalies of 500 hPa geopotential height over the eastern Eurasian continent and negative values over the surrounding oceans are highly synchronized with extreme precipitation events. During 1990–2015, the predominant CPs are more closely related to extreme precipitation, with enhanced event synchronization. We found that the CP changes produce an increase in extreme precipitation frequency from 1960–1989 to 1990–2015.
Learning Probabilistic Coordinate Fields for Robust Correspondences
We introduce Probabilistic Coordinate Fields (PCFs), a novel
geometric-invariant coordinate representation for image correspondence
problems. In contrast to standard Cartesian coordinates, PCFs encode
coordinates in correspondence-specific barycentric coordinate systems (BCS)
with affine invariance. To know when and where to trust the encoded
coordinates, we implement PCFs in a probabilistic network termed PCF-Net, which
parameterizes the distribution of coordinate fields as Gaussian mixture models.
By jointly optimizing coordinate fields and their confidence conditioned on
dense flows, PCF-Net can work with various feature descriptors when quantifying
the reliability of PCFs by confidence maps. An interesting observation of this
work is that the learned confidence map converges to geometrically coherent and
semantically consistent regions, which facilitates robust coordinate
representation. By delivering the confident coordinates to keypoint/feature
descriptors, we show that PCF-Net can be used as a plug-in to existing
correspondence-dependent approaches. Extensive experiments on both indoor and
outdoor datasets suggest that accurate geometric invariant coordinates help to
achieve the state of the art in several correspondence problems, such as sparse
feature matching, dense image registration, camera pose estimation, and
consistency filtering. Further, the interpretable confidence map predicted by
PCF-Net can also be leveraged for other novel applications, from texture transfer
to multi-homography classification.
Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine
Intelligence
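The affine invariance that this abstract attributes to barycentric coordinate systems can be checked numerically with a small sketch. It only illustrates the general property for a single triangle; it does not reproduce PCF-Net or the paper's correspondence-specific construction. The triangle, the query point, the affine map, and the helper names `barycentric` and `affine` are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of 2D point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("degenerate triangle")
    w_a = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    w_b = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return (w_a, w_b, 1.0 - w_a - w_b)


def affine(p, m, t):
    """Apply the affine map x -> m @ x + t to a 2D point."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + t[0],
            m[1][0] * x + m[1][1] * y + t[1])


# Hypothetical reference triangle and query point.
a, b, c = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
p = (0.25, 0.25)

# Hypothetical invertible affine transform (shear, scale, rotation, translation).
m = [[2.0, 1.0], [-1.0, 3.0]]
t = (5.0, -2.0)

before = barycentric(p, a, b, c)
after = barycentric(affine(p, m, t), affine(a, m, t), affine(b, m, t), affine(c, m, t))

print(before)  # (0.5, 0.25, 0.25)
print(all(abs(u - v) < 1e-9 for u, v in zip(before, after)))  # True
```

Cartesian coordinates of `p` change under the transform, while its coordinates relative to the transformed triangle do not, which is the property that makes such a representation attractive for matching across views.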
Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth Approach with Saddle-shaped Depth Cells
Learning-based multi-view stereo (MVS) methods deal with predicting accurate
depth maps to achieve an accurate and complete 3D representation. Despite the
excellent performance, existing methods ignore the fact that a suitable depth
geometry is also critical in MVS. In this paper, we demonstrate that different
depth geometries lead to significant performance gaps, even with the same depth
prediction error. Therefore, we introduce an ideal depth geometry composed of
Saddle-Shaped Cells, whose predicted depth map oscillates upward and downward
around the ground-truth surface, rather than maintaining a continuous and
smooth depth plane. To achieve it, we develop a coarse-to-fine framework called
Dual-MVSNet (DMVSNet), which can produce an oscillating depth plane.
Technically, we predict two depth values for each pixel (Dual-Depth), and
propose a novel loss function and a checkerboard-shaped selecting strategy to
constrain the predicted depth geometry. Compared to existing methods, DMVSNet
achieves a high rank on the DTU benchmark and obtains the top performance on
challenging scenes of Tanks and Temples, demonstrating its strong performance
and generalization ability. Our method also points to a new research direction
for considering depth geometry in MVS.
Comment: Accepted by ICCV 202
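The idea of choosing between two per-pixel depth predictions in a checkerboard pattern, so that the resulting depth map oscillates around the true surface, can be sketched as follows. This is a toy illustration under stated assumptions, not the DMVSNet procedure: the selection rule based on the parity of `row + col`, the constant offsets of ±0.1, and the function name `checkerboard_select` are all hypothetical, and the paper's loss function is not modeled.

```python
def checkerboard_select(depth_a, depth_b):
    """Pick depth_a on 'even' cells and depth_b on 'odd' cells of a checkerboard.

    Assumption: a cell is 'even' when (row + col) is even.
    """
    rows, cols = len(depth_a), len(depth_a[0])
    return [
        [depth_a[r][c] if (r + c) % 2 == 0 else depth_b[r][c] for c in range(cols)]
        for r in range(rows)
    ]


# Hypothetical 4x4 example: a flat ground-truth surface at depth 5.0,
# with one prediction slightly behind it and one slightly in front.
SIZE = 4
ground_truth = [[5.0] * SIZE for _ in range(SIZE)]
depth_far = [[d + 0.1 for d in row] for row in ground_truth]
depth_near = [[d - 0.1 for d in row] for row in ground_truth]

selected = checkerboard_select(depth_far, depth_near)

# Sign of the error at each pixel: '+' behind the surface, '-' in front of it.
for r in range(SIZE):
    print("".join("+" if selected[r][c] > ground_truth[r][c] else "-"
                  for c in range(SIZE)))
```

Neighboring pixels end up on opposite sides of the surface, so the selected map oscillates around the ground truth even though every pixel has the same absolute error as either input map.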
Detecting GPC3-Expressing Hepatocellular Carcinoma with L5 Peptide-Guided Pretargeting Approach: An In Vitro MRI Experiment
Background and Aim: Glypican-3 (GPC3) is a novel molecular target for hepatocellular carcinoma (HCC). This study investigated the potential of an L5 peptide-guided pretargeting approach to identify GPC3-expressing HCC cells using ultrasmall superparamagnetic iron oxide (USPIO) as the MRI probe. Methods: Immunofluorescence with carboxyfluorescein (FAM)-labeled L5 peptide was performed in HepG2 and HL-7702 cells. Polyethylene glycol-modified ultrasmall superparamagnetic iron oxide (PEG-USPIO) and its conjugates with streptavidin (SA-PEG-USPIO) were synthesized, and hydrodynamic diameters, zeta potential, T2 relaxivity, and cytotoxicity were measured. MR T2-weighted imaging of HepG2 was performed to observe signal changes in the pretargeting group, which was first incubated with biotinylated L5 peptide and then with SA-PEG-USPIO. Prussian blue staining of cells was used to assess iron deposition. Results: Immunofluorescence assays showed high specificity of L5 peptide for GPC3. SA-PEG-USPIO nanoparticles had a hydrodynamic diameter of ≈36 nm, low toxicity, negative charge, and high T2 relaxivity. MR imaging revealed that a significant negative enhancement was observed only in HepG2 cells from the pretargeting group, which also showed significant iron deposition with Prussian blue staining. Conclusion: MR imaging with USPIO as the probe has the potential to identify GPC3-expressing HCC through an L5 peptide-guided pretargeting approach.
Fast Full-frame Video Stabilization with Iterative Optimization
Video stabilization refers to the problem of transforming a shaky video into
a visually pleasing one. The question of how to strike a good trade-off between
visual quality and computational speed has remained one of the open challenges
in video stabilization. Inspired by the analogy between wobbly frames and
jigsaw puzzles, we propose an iterative optimization-based learning approach
using synthetic datasets for video stabilization, which consists of two
interacting submodules: motion trajectory smoothing and full-frame outpainting.
First, we develop a two-level (coarse-to-fine) stabilizing algorithm based on
the probabilistic flow field. The confidence map associated with the estimated
optical flow is exploited to guide the search for shared regions through
backpropagation. Second, we take a divide-and-conquer approach and propose a
novel multiframe fusion strategy to render full-frame stabilized views. An
important new insight brought about by our iterative optimization approach is
that the target video can be interpreted as the fixed point of a nonlinear
mapping for video stabilization. We formulate video stabilization as a problem
of minimizing the amount of jerkiness in motion trajectories, which guarantees
convergence with the help of fixed-point theory. Extensive experimental results
are reported to demonstrate the superiority of the proposed approach in terms
of computational speed and visual quality. The code will be available on
GitHub.
Comment: Accepted by ICCV202
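The fixed-point view of stabilization mentioned in this abstract can be illustrated on a one-dimensional motion trajectory. The sketch below is a toy, hand-written affine contraction, not the paper's learned nonlinear mapping: the map `smoothing_map`, the blend weight `lam`, the tolerance, and the zigzag input are all assumptions made for illustration. Because `lam < 1` and neighbor averaging never increases the largest difference between two trajectories, the map is a contraction, so the iteration converges to a unique fixed point.

```python
def neighbor_average(x):
    """Average of each sample's left and right neighbors (indices clamped at the ends)."""
    n = len(x)
    return [(x[max(i - 1, 0)] + x[min(i + 1, n - 1)]) / 2.0 for i in range(n)]


def smoothing_map(x, observed, lam):
    """One application of T(x) = (1 - lam) * observed + lam * neighbor_average(x)."""
    avg = neighbor_average(x)
    return [(1.0 - lam) * o + lam * a for o, a in zip(observed, avg)]


def fixed_point(observed, lam=0.8, tol=1e-9, max_iter=500):
    """Iterate x <- T(x) until successive iterates differ by less than tol."""
    x = list(observed)
    for k in range(1, max_iter + 1):
        x_new = smoothing_map(x, observed, lam)
        residual = max(abs(a - b) for a, b in zip(x_new, x))
        x = x_new
        if residual < tol:
            return x, k
    return x, max_iter


def jerkiness(x):
    """Sum of squared second differences, a simple proxy for trajectory jerkiness."""
    return sum((x[i - 1] - 2 * x[i] + x[i + 1]) ** 2 for i in range(1, len(x) - 1))


# Hypothetical shaky camera trajectory (for example, per-frame vertical offset).
shaky = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
smooth, iterations = fixed_point(shaky)

print(iterations < 500)                       # True: converged before the cap
print(jerkiness(smooth) < jerkiness(shaky))   # True: the fixed point is less jerky
```

The weight `lam` trades fidelity to the observed trajectory against smoothness: values closer to 1 smooth more but make the contraction weaker, so more iterations are needed.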
Annual precipitation and daily extreme precipitation distribution: possible trends from 1960 to 2010 in urban areas of China
With global warming, precipitation events are prone to intensify in some regions. Understanding the changing characteristics of annual and daily extreme precipitation, as well as the underlying mechanisms, plays an important role in early warning of precipitation-induced disasters (e.g. floods, landslides) and in water resources management, especially in densely populated urban areas. In this study, we investigate the long-term trend of annual and daily extreme precipitation in China during 1960–2010 based on daily observations from 539 meteorological stations and a land cover map with impervious information. We find an overall increasing trend in annual and daily extreme precipitation, particularly in the south-east and north-west of China. Moreover, 157 stations located in metropolitan regions experience higher increasing trends of daily extreme precipitation, particularly in the Shanghai and Guangzhou metropolitan areas. It is noted that the central urban area of a metropolitan region may have significantly higher increasing trends of daily extreme precipitation than the corresponding surrounding areas.
Thermal-Enhanced bri1-301 Instability Reveals a Plasma Membrane Protein Quality Control System in Plants
Brassinosteroids (BRs) are essential phytohormones mainly perceived by a single-pass transmembrane receptor-like protein kinase (RLK), BRASSINOSTEROID INSENSITIVE 1 (BRI1). bri1-5 and bri1-9, two distinct mutants with point mutations in the extracellular domain of BRI1, show weak defective phenotypes. Previous studies indicated that the bri1-5 and bri1-9 mutated proteins can be recognized and eliminated via an endoplasmic reticulum quality control (ERQC) mechanism. Most of these two proteins, therefore, cannot reach their destination, the plasma membrane. Here, we report our functional characterization of bri1-301, another BRI1 mutant protein with an amino acid substitution in the cytoplasmic kinase domain. bri1-301 is a partially functional BR receptor with significantly decreased protein abundance. Interestingly, protein stability and subcellular localization of bri1-301 are temperature-sensitive. At 22°C, an optimal temperature for indoor Arabidopsis growth, bri1-301 shows a weak defective phenotype. At a lower temperature such as 18°C, bri1-301 exhibits subtle morphological defects. At a higher temperature such as 28°C, on the other hand, bri1-301 displays an extremely severe phenotype reminiscent of that of a null bri1 mutant, due to greatly increased bri1-301 internalization and degradation. Our detailed analyses suggest that bri1-301 stability is controlled by ERQC and plasma membrane quality control (PMQC) systems. Since PMQC has not been well studied in plants, bri1-301 can be used as a model mutant for future genetic dissection of this critical process.