279 research outputs found

    Asymptotic-Preserving Neural Networks for Multiscale Kinetic Equations

    Full text link
    In this paper, we present two novel Asymptotic-Preserving Neural Networks (APNNs) for tackling multiscale time-dependent kinetic problems, encompassing the linear transport equation and Bhatnagar-Gross-Krook (BGK) equation with diffusive scaling. Our primary objective is to devise efficient and accurate APNN approaches for resolving multiscale kinetic equations. We have established a neural network based on even-odd decomposition and concluded that enforcing the initial condition for the linear transport equation with inflow boundary conditions is crucial. This APNN method based on even-odd parity relaxes the stringent conservation prerequisites while concurrently introducing an auxiliary deep neural network. Additionally, we have incorporated the conservation laws of mass, momentum, and energy for the Boltzmann-BGK equation into the APNN framework by enforcing exact boundary conditions. This is our second contribution. The most notable finding of this study is that approximating the zeroth, first and second moments of the particle density distribution is simpler than the distribution itself. Furthermore, a compelling phenomenon in the training process is that the convergence of density is swifter than that of momentum and energy. Finally, we investigate several benchmark problems to demonstrate the efficacy of our proposed APNN methods

    Understanding Loss Landscapes of Neural Network Models in Solving Partial Differential Equations

    Full text link
    Solving partial differential equations (PDEs) by parametrizing its solution by neural networks (NNs) has been popular in the past a few years. However, different types of loss functions can be proposed for the same PDE. For the Poisson equation, the loss function can be based on the weak formulation of energy variation or the least squares method, which leads to the deep Ritz model and deep Galerkin model, respectively. But loss landscapes from these different models give arise to different practical performance of training the NN parameters. To investigate and understand such practical differences, we propose to compare the loss landscapes of these models, which are both high dimensional and highly non-convex. In such settings, the roughness is more important than the traditional eigenvalue analysis to describe the non-convexity. We contribute to the landscape comparisons by proposing a roughness index to scientifically and quantitatively describe the heuristic concept of "roughness" of landscape around minimizers. This index is based on random projections and the variance of (normalized) total variation for one dimensional projected functions, and it is efficient to compute. A large roughness index hints an oscillatory landscape profile as a severe challenge for the first order optimization method. We apply this index to the two models for the Poisson equation and our empirical results reveal a consistent general observation that the landscapes from the deep Galerkin method around its local minimizers are less rough than the deep Ritz method, which supports the observed gain in accuracy of the deep Galerkin method

    Asymptotic-Preserving Convolutional DeepONets Capture the Diffusive Behavior of the Multiscale Linear Transport Equations

    Full text link
    In this paper, we introduce two types of novel Asymptotic-Preserving Convolutional Deep Operator Networks (APCONs) designed to address the multiscale time-dependent linear transport problem. We observe that the vanilla physics-informed DeepONets with modified MLP may exhibit instability in maintaining the desired limiting macroscopic behavior. Therefore, this necessitates the utilization of an asymptotic-preserving loss function. Drawing inspiration from the heat kernel in the diffusion equation, we propose a new architecture called Convolutional Deep Operator Networks, which employ multiple local convolution operations instead of a global heat kernel, along with pooling and activation operations in each filter layer. Our APCON methods possess a parameter count that is independent of the grid size and are capable of capturing the diffusive behavior of the linear transport problem. Finally, we validate the effectiveness of our methods through several numerical examples

    Data, Data, Everywhere: Uncovering Everyday Data Experiences for People with Intellectual and Developmental Disabilities

    Full text link
    Data is everywhere but may not be accessible to everyone. Conventional data visualization tools and guidelines often do not actively consider the specific needs and abilities of people with Intellectual and Developmental Disabilities (IDD), leaving them excluded from data-driven activities and vulnerable to ethical issues. To understand the needs and challenges people with IDD have with data, we conducted 15 semi-structured interviews with individuals with IDD and their caregivers. Our algorithmic interview approach situated data in the lived experiences of people with IDD to uncover otherwise hidden data encounters in their everyday life. Drawing on findings and observations, we characterize how they conceptualize data, when and where they use data, and what barriers exist when they interact with data. We use our results as a lens to reimagine the role of visualization in data accessibility and establish a critical near-term research agenda for cognitively accessible visualization

    Align before Search: Aligning Ads Image to Text for Accurate Cross-Modal Sponsored Search

    Full text link
    Cross-Modal sponsored search displays multi-modal advertisements (ads) when consumers look for desired products by natural language queries in search engines. Since multi-modal ads bring complementary details for query-ads matching, the ability to align ads-specific information in both images and texts is crucial for accurate and flexible sponsored search. Conventional research mainly studies from the view of modeling the implicit correlations between images and texts for query-ads matching, ignoring the alignment of detailed product information and resulting in suboptimal search performance.In this work, we propose a simple alignment network for explicitly mapping fine-grained visual parts in ads images to the corresponding text, which leverages the co-occurrence structure consistency between vision and language spaces without requiring expensive labeled training data. Moreover, we propose a novel model for cross-modal sponsored search that effectively conducts the cross-modal alignment and query-ads matching in two separate processes. In this way, the model matches the multi-modal input in the same language space, resulting in a superior performance with merely half of the training data. Our model outperforms the state-of-the-art models by 2.57% on a large commercial dataset. Besides sponsored search, our alignment method is applicable for general cross-modal search. We study a typical cross-modal retrieval task on the MSCOCO dataset, which achieves consistent performance improvement and proves the generalization ability of our method. Our code is available at https://github.com/Pter61/AlignCMSS

    Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval

    Full text link
    Different from Composed Image Retrieval task that requires expensive labels for training task-specific models, Zero-Shot Composed Image Retrieval (ZS-CIR) involves diverse tasks with a broad range of visual content manipulation intent that could be related to domain, scene, object, and attribute. The key challenge for ZS-CIR tasks is to learn a more accurate image representation that has adaptive attention to the reference image for various manipulation descriptions. In this paper, we propose a novel context-dependent mapping network, named Context-I2W, for adaptively converting description-relevant Image information into a pseudo-word token composed of the description for accurate ZS-CIR. Specifically, an Intent View Selector first dynamically learns a rotation rule to map the identical image to a task-specific manipulation view. Then a Visual Target Extractor further captures local information covering the main targets in ZS-CIR tasks under the guidance of multiple learnable queries. The two complementary modules work together to map an image to a context-dependent pseudo-word token without extra supervision. Our model shows strong generalization ability on four ZS-CIR tasks, including domain conversion, object composition, object manipulation, and attribute manipulation. It obtains consistent and significant performance boosts ranging from 1.88% to 3.60% over the best methods and achieves new state-of-the-art results on ZS-CIR. Our code is available at https://github.com/Pter61/context_i2w

    Watermarking Vision-Language Pre-trained Models for Multi-modal Embedding as a Service

    Full text link
    Recent advances in vision-language pre-trained models (VLPs) have significantly increased visual understanding and cross-modal analysis capabilities. Companies have emerged to provide multi-modal Embedding as a Service (EaaS) based on VLPs (e.g., CLIP-based VLPs), which cost a large amount of training data and resources for high-performance service. However, existing studies indicate that EaaS is vulnerable to model extraction attacks that induce great loss for the owners of VLPs. Protecting the intellectual property and commercial ownership of VLPs is increasingly crucial yet challenging. A major solution of watermarking model for EaaS implants a backdoor in the model by inserting verifiable trigger embeddings into texts, but it is only applicable for large language models and is unrealistic due to data and model privacy. In this paper, we propose a safe and robust backdoor-based embedding watermarking method for VLPs called VLPMarker. VLPMarker utilizes embedding orthogonal transformation to effectively inject triggers into the VLPs without interfering with the model parameters, which achieves high-quality copyright verification and minimal impact on model performance. To enhance the watermark robustness, we further propose a collaborative copyright verification strategy based on both backdoor trigger and embedding distribution, enhancing resilience against various attacks. We increase the watermark practicality via an out-of-distribution trigger selection approach, removing access to the model training data and thus making it possible for many real-world scenarios. Our extensive experiments on various datasets indicate that the proposed watermarking approach is effective and safe for verifying the copyright of VLPs for multi-modal EaaS and robust against model extraction attacks. Our code is available at https://github.com/Pter61/vlpmarker
    • …
    corecore