91 research outputs found

    VisualGPTScore: Visio-Linguistic Reasoning with Multimodal Generative Pre-Training Scores

    Vision-language models (VLMs) discriminatively pre-trained with contrastive image-text matching losses such as $P(\text{match}|\text{text}, \text{image})$ have been criticized for lacking compositional understanding. This means they might output similar scores even if the original caption is rearranged into a different semantic statement. To address this, we propose to use the Visual Generative Pre-Training Score (VisualGPTScore) of $P(\text{text}|\text{image})$, a multimodal generative score that captures the likelihood of a text caption conditioned on an image using an image-conditioned language model. Contrary to the belief that VLMs are mere bag-of-words models, our off-the-shelf VisualGPTScore demonstrates top-tier performance on recently proposed image-text retrieval benchmarks like ARO and Crepe that assess compositional reasoning. Furthermore, we factorize VisualGPTScore into a product of the marginal $P(\text{text})$ and the Pointwise Mutual Information (PMI). This helps to (a) diagnose datasets with strong language bias, and (b) debias results on other benchmarks like Winoground using an information-theoretic framework. VisualGPTScore provides valuable insights and serves as a strong baseline for future evaluation of visio-linguistic compositionality.
    Comment: Website: https://linzhiqiu.github.io/papers/visual_gpt_score/ Code: https://github.com/linzhiqiu/visual_gpt_score
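The factorization mentioned in this abstract can be written out as a short identity. This is a sketch rather than the authors' own notation: the shorthand $t$ for the text and $i$ for the image is an assumption, as is reading the abstract's "PMI" factor as the exponentiated pointwise mutual information.

```latex
\begin{align*}
P(t \mid i) &= \frac{P(t, i)}{P(i)}
             = P(t) \cdot \frac{P(t, i)}{P(t)\,P(i)}, \\
\log P(t \mid i) &= \log P(t) + \operatorname{pmi}(t; i),
\qquad \operatorname{pmi}(t; i) = \log \frac{P(t, i)}{P(t)\,P(i)}.
\end{align*}
```

Read this way, a caption can score highly either because the text is likely on its own (large $P(t)$) or because it is specifically informative about the image (large PMI). This is the distinction the abstract uses to diagnose language bias.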

    How to get better embeddings with code pre-trained models? An empirical study

    Pre-trained language models have demonstrated powerful capabilities in the field of natural language processing (NLP). Recently, code pre-trained models (PTMs), which draw on the experience of the NLP field, have also achieved state-of-the-art results in many software engineering (SE) downstream tasks. These code PTMs take into account the differences between programming languages and natural languages during pre-training and make adjustments to pre-training tasks and input data. However, researchers in the SE community still inherit habits from the NLP field when using these code PTMs to generate embeddings for SE downstream classification tasks, such as generating semantic embeddings for code snippets through special tokens and inputting code and text information in the same way as when pre-training the PTMs. In this paper, we empirically study five different PTMs (i.e., CodeBERT, CodeT5, PLBART, CodeGPT and CodeGen) with three different architectures (i.e., encoder-only, decoder-only and encoder-decoder) on four SE downstream classification tasks (i.e., code vulnerability detection, code clone detection, just-in-time defect prediction and function docstring mismatch detection) with respect to the two aforementioned aspects. Our experimental results indicate that (1) regardless of the architecture of the code PTMs used, embeddings obtained through special tokens do not sufficiently aggregate the semantic information of the entire code snippet; (2) the quality of code embeddings obtained by combining code data and text data in the same way as when pre-training the PTMs is poor and cannot guarantee richer semantic information; (3) using the method that aggregates the vector representations of all code tokens, the decoder-only PTMs can obtain code embeddings whose semantics are as rich as, or even richer than, those obtained from the encoder-only and encoder-decoder PTMs.
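The two embedding strategies contrasted in this abstract can be illustrated with a minimal sketch. It assumes the model has already produced one vector per token. The function names, the position of the special token (first), and the use of a plain mean as the aggregation are illustrative assumptions, not details taken from the study.

```python
from typing import List, Sequence


def special_token_embedding(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Use the vector of a single special token (assumed here to be the first)."""
    return list(token_vectors[0])


def mean_pool_embedding(
    token_vectors: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
) -> List[float]:
    """Average the vectors of all non-padding tokens (mask value 1)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Toy per-token vectors for a 3-token snippet followed by one padding position.
vectors = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

cls_style = special_token_embedding(vectors)
pooled = mean_pool_embedding(vectors, mask)
```

The pooled vector depends on every real token, while the special-token vector relies on the model having already aggregated the snippet into one position.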

    Towards an Accurate and Secure Detector against Adversarial Perturbations

    The vulnerability of deep neural networks to adversarial perturbations has been widely recognized in the computer vision community. From a security perspective, it poses a critical risk for modern vision systems, e.g., the popular Deep Learning as a Service (DLaaS) frameworks. To protect off-the-shelf deep models without modifying them, current algorithms typically detect adversarial patterns through discriminative decomposition of natural-artificial data. However, these decompositions are biased towards frequency or spatial discriminability, thus failing to capture adversarial patterns comprehensively. More seriously, a successful defense-aware (secondary) adversarial attack (i.e., evading the detector as well as fooling the model) is practical under the assumption that the adversary is fully aware of the detector (i.e., Kerckhoffs's principle). Motivated by such facts, we propose an accurate and secure adversarial example detector, relying on a spatial-frequency discriminative decomposition with secret keys. It extends the above works in two aspects: 1) the introduced Krawtchouk basis provides better spatial-frequency discriminability and thereby is more suitable for capturing adversarial patterns than the common trigonometric or wavelet basis; 2) the extensive parameters for decomposition are generated by a pseudo-random function with secret keys, hence blocking the defense-aware adversarial attack. Theoretical and numerical analysis demonstrates the increased accuracy and security of our detector with respect to a number of state-of-the-art algorithms.
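The idea of deriving decomposition parameters from a secret key can be sketched with a standard keyed pseudo-random function. The abstract does not specify the construction. HMAC-SHA256, the function name `keyed_parameters`, the domain-separation label, and the mapping to values in [0, 1) are all assumptions made for illustration.

```python
import hashlib
import hmac
import struct
from typing import List


def keyed_parameters(
    secret_key: bytes,
    count: int,
    label: bytes = b"decomposition-params",  # hypothetical domain-separation label
) -> List[float]:
    """Derive `count` reproducible values in [0, 1) from a secret key."""
    params = []
    for index in range(count):
        # Each parameter is the PRF output on a distinct, fixed message.
        message = label + struct.pack(">I", index)
        digest = hmac.new(secret_key, message, hashlib.sha256).digest()
        # Keep the top 53 bits so the quotient is an exact float below 1.0.
        value = (int.from_bytes(digest[:8], "big") >> 11) / float(1 << 53)
        params.append(value)
    return params


defender = keyed_parameters(b"defender-secret", 4)
same_key = keyed_parameters(b"defender-secret", 4)
other_key = keyed_parameters(b"attacker-guess", 4)
```

The defender can regenerate the same parameters at any time from the key. An adversary who knows the algorithm but not the key would, under the usual PRF assumption, be unable to predict them.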

    Correction to a Simple Biosphere Model 2 (SiB2) Simulation of Energy and Carbon Dioxide Fluxes over a Wheat Cropland in East China Using the Random Forest Model

    Modeling the heat and carbon dioxide (CO2) exchanges in agroecosystems is critical for better understanding water and carbon cycling, improving crop production, and even mitigating climate change in agricultural regions. While previous studies mainly focused on simulations of the energy and CO2 fluxes in agroecosystems on the North China Plain, their corrections, simulations and driving forces in East China are less understood. In this study, the dynamic variations of heat and CO2 fluxes were simulated by a standalone version of the Simple Biosphere 2 (SiB2) model and subsequently corrected using a Random Forest (RF) machine learning model, based on measurements from 1 January to 31 May 2015–2017 in eastern China. Through validation with direct measurements, it was found that the SiB2 model overestimated the sensible heat flux (H) and latent heat flux (LE), but underestimated soil heat flux (G0) and CO2 flux (Fc). Thus, the RF model was used to correct the results modeled by SiB2. The RF model showed that disturbances in temperature, net radiation, the G0 output of SiB2, and the Fc output of SiB2 were the key driving factors modulating the H, LE, G0, and Fc. The RF model performed well and significantly reduced the biases for H, LE, G0, and Fc simulated by SiB2, with higher R² values of 0.99, 0.87, 0.75, and 0.71, respectively. The SiB2 and RF models combine physical mechanisms and mathematical correction to enable simulations with both physical meaning and accuracy.

    Current and future precipitation extremes over Mississippi and Yangtze River basins as simulated in CMIP5 models

    Both the central-eastern U.S. and China are prone to increasing flooding from the Mississippi River and Yangtze River basins, respectively. This paper contrasts the historical and projected spatiotemporal distribution of extreme precipitation in these two large river basins using the historical and RCP8.5 (representative concentration pathway) experiments of 31 CMIP5 (coupled model intercomparison project phase 5) models. Results show that (1) over both river basins, the heaviest rainfall events have increased in recent decades while the lightest precipitation has decreased in frequency. Over the Mississippi River Basin, both the lightest (< 2.5 mm/day) and heaviest (> 50 mm/day) precipitation would decrease notably in frequency after the mid-2020s, while intermediate events would occur more frequently in the future; whereas over the Yangtze River Basin, all categories of precipitation are projected to increase in frequency over the coming decades. (2) Although the consensus of CMIP5 models was able to reproduce well the domain-time mean and even the time-averaged spatial distribution of precipitation, the models failed to simulate precipitation trends both in spatial distribution and in time means. In a similar fashion, models captured the statistics of precipitation well but had difficulty representing temporal variations of different precipitation intensity categories. (3) The well-documented surface summer cooling over the two river basins in the second half of the 20th century showed different associations with precipitation trends, with higher anti-correlation between them over the U.S. region, implying that different processes contribute to the cooling mechanisms of the two river basins.
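The intensity categories used in this abstract can be made concrete with a small sketch that bins daily totals and reports how often each category occurs. Only the 2.5 mm/day and 50 mm/day thresholds come from the abstract. The function name, the category labels, and the choice to count every listed day (including dry days) are assumptions, and the sample values are invented for illustration.

```python
from typing import Dict, Sequence


def intensity_frequencies(
    daily_mm: Sequence[float],
    light_max: float = 2.5,   # lightest category: below 2.5 mm/day
    heavy_min: float = 50.0,  # heaviest category: above 50 mm/day
) -> Dict[str, float]:
    """Return the fraction of days falling in each precipitation category."""
    if not daily_mm:
        raise ValueError("daily_mm must contain at least one day")
    counts = {"light": 0, "intermediate": 0, "heavy": 0}
    for amount in daily_mm:
        if amount < light_max:
            counts["light"] += 1
        elif amount > heavy_min:
            counts["heavy"] += 1
        else:
            counts["intermediate"] += 1
    total = len(daily_mm)
    return {name: count / total for name, count in counts.items()}


# Invented daily totals in mm/day, purely for illustration.
sample_days = [0.5, 1.0, 10.0, 60.0]
frequencies = intensity_frequencies(sample_days)
```

Comparing such frequency tables between a historical period and a projected period is one simple way to express the kind of category shifts the abstract describes.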

    Long-Term Observation of the Quasi-3-Hour Large-Scale Traveling Ionospheric Disturbances by the Oblique-Incidence Ionosonde Network in North China

    The oblique-incidence ionosonde network in North China is a unique system for regional ionospheric observation. It contains 5 transmitters and 20 receivers, and it has 99 ionospheric observation points between 22.40° N and 33.19° N geomagnetic latitudes. The data of the ionosonde network were used to investigate the statistical characteristics of the quasi-3-h large-scale traveling ionospheric disturbances (LSTIDs). From September 2009 to August 2011, 157 cases of quiet-time LSTIDs were recorded; 110 cases traveled southward, 46 cases traveled southwestward and only 1 case traveled southeastward. The LSTIDs mainly appeared between 10:00 and 19:00 LT in the months from September to the following May. We compared the data of the Beijing, Mohe and Yakutsk digisondes and found that the LSTIDs are most likely to come from the northern auroral region. These LSTIDs may be induced by atmospheric gravity waves (AGWs) and presented obvious seasonal and diurnal variations, indicating that the thermospheric wind field plays an important role.

    Soil Compressibility and Resilience Based on Uniaxial Compression Loading Test in Response to Soil Water Suction and Soil Organic Matter Content in Northeast China

    Due to the widespread use of heavy machinery, improper soil tillage practices, and insufficient input of soil organic materials, soil compaction has become a major issue affecting soil function in modern agriculture and the sustainability of the environment. The aim of the present study was to evaluate the responses of soil mechanical parameters to soil water content and soil organic matter content (SOM), and to investigate the physical properties of nine disturbed soils in a black soil region in Northeast China. The soil samples were capillary saturated and subjected to 6, 10, 100, 600, and 800 kPa soil water suction (SWS), and pre-compression stress (σp), compression index (Cc), and decompression index (Dc) were measured. SWS and SOM, and their interaction, significantly influenced the mechanical parameters. σp increased with an increase in SWS until 600 kPa, while Dc exhibited an opposite trend with an increase in SWS. Cc had a peak value at SWS of 100 kPa. All mechanical parameter values were higher under high SOM than under low SOM. σp, Cc, and Dc were influenced variably by different soil physicochemical factors. Structural equation modeling results revealed that soil mechanical parameters were directly and indirectly influenced by soil texture and mean weight diameter of aggregates, in addition to SOM and SWS. According to the results of the present study, based on soil mechanical and physical properties, increasing SOM and ensuring suitable soil water content during tillage could be applied as management strategies to minimize further soil compaction and improve soil resilience, and thus promote the sustainable development of agriculture in Northeast China.