160 research outputs found

    BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

    Vision Language Models (VLMs), which extend Large Language Models (LLMs) by incorporating visual understanding capability, have demonstrated significant advancements in addressing open-ended visual question-answering (VQA) tasks. However, these models cannot accurately interpret images infused with text, a common occurrence in real-world scenarios. Standard procedures for extracting information from images often involve learning a fixed set of query embeddings. These embeddings are designed to encapsulate image contexts and are later used as soft prompt inputs in LLMs. Yet, this process is limited by the token count, potentially curtailing the recognition of scenes with text-rich context. To improve upon them, the present study introduces BLIVA: an augmented version of InstructBLIP with Visual Assistant. BLIVA incorporates the query embeddings from InstructBLIP and also directly projects encoded patch embeddings into the LLM, a technique inspired by LLaVA. This approach helps the model capture intricate details potentially missed during the query decoding process. Empirical evidence demonstrates that our model, BLIVA, significantly enhances performance in processing text-rich VQA benchmarks (up to 17.76% in OCR-VQA benchmark) and in undertaking general (not particularly text-rich) VQA benchmarks (up to 7.9% in Visual Spatial Reasoning benchmark), and achieved 17.72% overall improvement in a comprehensive multimodal LLM benchmark (MME), compared to our baseline InstructBLIP. BLIVA demonstrates significant capability in decoding real-world images, irrespective of text presence. To demonstrate the broad industry applications enabled by BLIVA, we evaluate the model using a new dataset comprising YouTube thumbnails paired with question-answer sets across 11 diverse categories. Our code and models are freely accessible at https://github.com/mlpc-ucsd/BLIVA. Comment: Accepted at AAAI Conference on Artificial Intelligence (AAAI-24)
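
    The combination described in the abstract above (learned query embeddings plus directly projected patch embeddings, both fed to the LLM as soft prompts) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the BLIVA implementation: the function names, the single linear projection, the concatenation order, and the assumption that the query embeddings already live in the LLM embedding space are all hypothetical.

```python
def linear(x, weight, bias):
    """Apply y = W x + b to a single vector (plain lists, no external libraries)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def build_visual_prompt(query_embeds, patch_embeds, weight, bias):
    """Concatenate query embeddings with linearly projected patch embeddings.

    Assumptions (not taken from the abstract):
      - query_embeds are already in the LLM embedding dimension;
      - patch_embeds are projected by one linear layer (weight, bias);
      - queries come first in the resulting soft-prompt sequence.
    """
    projected = [linear(p, weight, bias) for p in patch_embeds]
    return query_embeds + projected


# Toy example: one query embedding (dim 2) and one patch embedding (dim 3).
queries = [[1.0, 0.0]]
patches = [[1.0, 2.0, 3.0]]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # hypothetical 3 -> 2 projection
b = [0.5, -1.0]
prompt = build_visual_prompt(queries, patches, W, b)
print(len(prompt), "visual tokens")
```

    In this sketch the soft prompt grows by one token per image patch, which is how the patch path can carry detail beyond the fixed number of query tokens.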

    Analysis of changes in large-scale circulation patterns driving extreme precipitation events over the central-eastern China

    To an extent, large-scale circulation situations and moisture transport are responsible for the occurrence of extreme precipitation. The aim of our study is to investigate possible modifications of the circulation patterns (CPs) driving extreme precipitation over central-eastern China (CEC). The self-organizing map (SOM) and event synchronization methods are used to link extreme precipitation events with CPs. Results show that 23% of rain gauges have a significant change point (at the 90% confidence level) in annual extreme precipitation from 1960 to 2015. Based on the identified change points, we classified the data into two periods, that is, 1960–1989 and 1990–2015. Overall, CPs characterized by obvious positive anomalies of 500 hPa geopotential height over the Eastern Eurasia continent and negative values over the surrounding oceans are highly synchronized with extreme precipitation events. During 1990–2015, the predominant CPs are more closely related to extreme precipitation, with enhanced event synchronization. We found that the CP changes produce an increase in extreme precipitation frequency from 1960–1989 to 1990–2015.

    Learning Probabilistic Coordinate Fields for Robust Correspondences

    We introduce Probabilistic Coordinate Fields (PCFs), a novel geometric-invariant coordinate representation for image correspondence problems. In contrast to standard Cartesian coordinates, PCFs encode coordinates in correspondence-specific barycentric coordinate systems (BCS) with affine invariance. To know when and where to trust the encoded coordinates, we implement PCFs in a probabilistic network termed PCF-Net, which parameterizes the distribution of coordinate fields as Gaussian mixture models. By jointly optimizing coordinate fields and their confidence conditioned on dense flows, PCF-Net can work with various feature descriptors when quantifying the reliability of PCFs by confidence maps. An interesting observation of this work is that the learned confidence map converges to geometrically coherent and semantically consistent regions, which facilitates robust coordinate representation. By delivering the confident coordinates to keypoint/feature descriptors, we show that PCF-Net can be used as a plug-in to existing correspondence-dependent approaches. Extensive experiments on both indoor and outdoor datasets suggest that accurate geometric invariant coordinates help to achieve the state of the art in several correspondence problems, such as sparse feature matching, dense image registration, camera pose estimation, and consistency filtering. Further, the interpretable confidence map predicted by PCF-Net can also be leveraged for other novel applications, from texture transfer to multi-homography classification. Comment: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence
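
    The affine invariance that the abstract above attributes to barycentric coordinates can be checked with a small sketch. It only illustrates the general geometric property for a single 2D triangle; it is not PCF-Net, and the function names and the example triangle and transform are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates (w_a, w_b, w_c) of 2D point p w.r.t. triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    # Solve p - a = w_b * (b - a) + w_c * (c - a) with Cramer's rule.
    det = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)
    if det == 0:
        raise ValueError("degenerate triangle")
    w_b = ((px - ax) * (cy - ay) - (cx - ax) * (py - ay)) / det
    w_c = ((bx - ax) * (py - ay) - (px - ax) * (by - ay)) / det
    return (1.0 - w_b - w_c, w_b, w_c)


def affine(pt, m, t):
    """Apply the affine map x -> m @ x + t to a 2D point."""
    x, y = pt
    return (m[0][0] * x + m[0][1] * y + t[0],
            m[1][0] * x + m[1][1] * y + t[1])


# Hypothetical triangle, query point, and invertible affine transform.
a, b, c, p = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.25, 0.25)
m, t = [[2.0, 1.0], [0.5, 3.0]], (5.0, -2.0)

before = barycentric(p, a, b, c)
after = barycentric(affine(p, m, t), affine(a, m, t), affine(b, m, t), affine(c, m, t))
print(before, after)
```

    The Cartesian coordinates of the point change under the transform, while its barycentric coordinates relative to the transformed triangle stay the same up to floating-point error.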

    Constraining Depth Map Geometry for Multi-View Stereo: A Dual-Depth Approach with Saddle-shaped Depth Cells

    Learning-based multi-view stereo (MVS) methods deal with predicting accurate depth maps to achieve an accurate and complete 3D representation. Despite the excellent performance, existing methods ignore the fact that a suitable depth geometry is also critical in MVS. In this paper, we demonstrate that different depth geometries have significant performance gaps, even with the same depth prediction error. Therefore, we introduce an ideal depth geometry composed of Saddle-Shaped Cells, whose predicted depth map oscillates upward and downward around the ground-truth surface, rather than maintaining a continuous and smooth depth plane. To achieve it, we develop a coarse-to-fine framework called Dual-MVSNet (DMVSNet), which can produce an oscillating depth plane. Technically, we predict two depth values for each pixel (Dual-Depth), and propose a novel loss function and a checkerboard-shaped selecting strategy to constrain the predicted depth geometry. Compared to existing methods, DMVSNet achieves a high rank on the DTU benchmark and obtains the top performance on challenging scenes of Tanks and Temples, demonstrating its strong performance and generalization ability. Our method also points to a new research direction for considering depth geometry in MVS. Comment: Accepted by ICCV 202
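
    One way to picture a checkerboard-shaped selection between two per-pixel depth predictions is the sketch below. It is a guess at the general idea under stated assumptions, not the DMVSNet procedure: the function name, the parity rule `(row + col) % 2`, and the choice of which map is used on even cells are all hypothetical.

```python
def checkerboard_select(depth_a, depth_b):
    """Pick depth_a on even-parity pixels and depth_b on odd-parity pixels.

    depth_a and depth_b are equally sized 2D lists holding the two depth
    values predicted for each pixel (hypothetical inputs). Alternating
    between them makes the selected map oscillate from pixel to pixel.
    """
    rows, cols = len(depth_a), len(depth_a[0])
    return [
        [depth_a[i][j] if (i + j) % 2 == 0 else depth_b[i][j] for j in range(cols)]
        for i in range(rows)
    ]


# Toy example: one prediction below and one above a surface at depth 1.5.
lower = [[1.0, 1.0], [1.0, 1.0]]
upper = [[2.0, 2.0], [2.0, 2.0]]
selected = checkerboard_select(lower, upper)
print(selected)
```

    If the two predictions bracket the true surface, the selected map alternates above and below it between neighbouring pixels, which is the oscillating behaviour the abstract describes.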

    Detecting GPC3-Expressing Hepatocellular Carcinoma with L5 Peptide-Guided Pretargeting Approach: An In Vitro MRI Experiment

    Background and Aim: Glypican-3 (GPC3) is a novel molecular target for hepatocellular carcinoma (HCC). This study investigated the potential of an L5 peptide-guided pretargeting approach to identify GPC3-expressing HCC cells using ultrasmall superparamagnetic iron oxide (USPIO) as the MRI probe. Methods: Immunofluorescence with carboxyfluorescein (FAM)-labeled L5 peptide was performed in HepG2 and HL-7702 cells. Polyethylene glycol-modified ultrasmall superparamagnetic iron oxide (PEG-USPIO) and its conjugates with streptavidin (SA-PEG-USPIO) were synthesized, and hydrodynamic diameters, zeta potential, T2 relaxivity, and cytotoxicity were measured. MR T2-weighted imaging of HepG2 was performed to observe signal changes in the pretargeting group, which was first incubated with biotinylated L5 peptide and then with SA-PEG-USPIO. Prussian blue staining of cells was used to assess iron deposition. Results: Immunofluorescence assays showed high specificity of L5 peptide for GPC3. SA-PEG-USPIO nanoparticles had ≈36 nm hydrodynamic diameter, low toxicity, negative charge and high T2 relaxivity. MR imaging revealed that a significant negative enhancement was only observed in HepG2 cells from the pretargeting group, which also showed significant iron deposition with Prussian blue staining. Conclusion: MR imaging with USPIO as the probe has potential to identify GPC3-expressing HCC through an L5 peptide-guided pretargeting approach.

    Fast Full-frame Video Stabilization with Iterative Optimization

    Video stabilization refers to the problem of transforming a shaky video into a visually pleasing one. The question of how to strike a good trade-off between visual quality and computational speed has remained one of the open challenges in video stabilization. Inspired by the analogy between wobbly frames and jigsaw puzzles, we propose an iterative optimization-based learning approach using synthetic datasets for video stabilization, which consists of two interacting submodules: motion trajectory smoothing and full-frame outpainting. First, we develop a two-level (coarse-to-fine) stabilizing algorithm based on the probabilistic flow field. The confidence map associated with the estimated optical flow is exploited to guide the search for shared regions through backpropagation. Second, we take a divide-and-conquer approach and propose a novel multiframe fusion strategy to render full-frame stabilized views. An important new insight brought about by our iterative optimization approach is that the target video can be interpreted as the fixed point of nonlinear mapping for video stabilization. We formulate video stabilization as a problem of minimizing the amount of jerkiness in motion trajectories, which guarantees convergence with the help of fixed-point theory. Extensive experimental results are reported to demonstrate the superiority of the proposed approach in terms of computational speed and visual quality. The code will be available on GitHub. Comment: Accepted by ICCV202

    Annual precipitation and daily extreme precipitation distribution: possible trends from 1960 to 2010 in urban areas of China

    With global warming, precipitation events are often prone to intensify in some regions. Understanding the changing characteristics of annual and daily extreme precipitation, as well as the underlying mechanisms, plays an important role in early warning of precipitation-induced disasters (e.g. floods, landslides) and in water resources management, especially in densely populated urban areas. In this study, we investigate the long-term trend of annual and daily extreme precipitation in China during 1960–2010 based on daily observations from 539 meteorological stations and a land cover map with impervious information. We find an overall increasing trend in annual and daily extreme precipitation, particularly in the South-East and North-West of China. Moreover, 157 stations located in metropolitan regions experience higher increasing trends of daily extreme precipitation, particularly in the Shanghai and Guangzhou metropolitan areas. It is noted that the central urban area of one metropolitan region may have significantly higher increasing trends of daily extreme precipitation than the corresponding surrounding areas.

    Thermal-Enhanced bri1-301 Instability Reveals a Plasma Membrane Protein Quality Control System in Plants

    Brassinosteroids (BRs) are essential phytohormones mainly perceived by a single-pass transmembrane receptor-like protein kinase (RLK), BRASSINOSTEROID INSENSITIVE 1 (BRI1). bri1-5 and bri1-9, two distinct mutants with point mutations in the extracellular domain of BRI1, show weak defective phenotypes. Previous studies indicated that bri1-5 and bri1-9 mutated proteins can be recognized and eliminated via an endoplasmic reticulum quality control (ERQC) mechanism. Most of these two proteins, therefore, cannot reach their destination, the plasma membrane. Here, we report our functional characterization of bri1-301, another BRI1 mutant protein with an amino acid substitution in the cytoplasmic kinase domain. bri1-301 is a partially functional BR receptor with significantly decreased protein abundance. Interestingly, protein stability and subcellular localization of bri1-301 are temperature-sensitive. At 22°C, an optimal temperature for indoor Arabidopsis growth, bri1-301 shows a weak defective phenotype. At a lower temperature such as 18°C, bri1-301 exhibits subtle morphological defects. At a higher temperature such as 28°C, on the other hand, bri1-301 displays an extremely severe phenotype reminiscent of that of a null bri1 mutant, due to greatly increased bri1-301 internalization and degradation. Our detailed analyses suggest that bri1-301 stability is controlled by ERQC and plasma membrane quality control (PMQC) systems. Since PMQC has not been well studied in plants, bri1-301 can be used as a model mutant for future genetic dissection of this critical process.