
    PuMer: Pruning and Merging Tokens for Efficient Vision Language Models

    Large-scale vision language (VL) models use Transformers to perform cross-modal interactions between the input text and image. These cross-modal interactions are computationally expensive and memory-intensive due to the quadratic complexity of processing the input image and text. We present PuMer: a token reduction framework that uses text-informed Pruning and modality-aware Merging strategies to progressively reduce the tokens of the input image and text, improving model inference speed and reducing memory footprint. PuMer learns to keep salient image tokens related to the input text and merges similar textual and visual tokens by adding lightweight token reducer modules at several cross-modal layers in the VL model. Training PuMer is mostly the same as finetuning the original VL model but faster. Our evaluation for two vision language models on four downstream VL tasks shows PuMer increases inference throughput by up to 2x and reduces memory footprint by over 50% while incurring less than a 1% accuracy drop. Comment: Accepted to ACL 2023 Main Conference.
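    The two operations the abstract describes can be sketched in a few lines of PyTorch. The snippet below is a simplified illustration, not the authors' released implementation: the function names, the dot-product relevance scoring against a text [CLS] embedding, and the greedy pairwise merging are assumptions made for clarity, whereas PuMer learns lightweight reducer modules end to end inside several cross-modal layers.

```python
import torch
import torch.nn.functional as F

def prune_image_tokens(image_tokens, text_cls, keep_ratio=0.7):
    """Text-informed pruning (simplified): keep the image tokens whose
    dot-product relevance to the text [CLS] embedding is highest.

    image_tokens: (B, N, D), text_cls: (B, D)
    """
    scores = torch.einsum("bnd,bd->bn", image_tokens, text_cls)   # (B, N)
    k = max(1, int(image_tokens.size(1) * keep_ratio))
    idx = scores.topk(k, dim=1).indices                           # (B, k)
    idx = idx.unsqueeze(-1).expand(-1, -1, image_tokens.size(-1))
    return torch.gather(image_tokens, 1, idx)                     # (B, k, D)

def merge_similar_tokens(tokens, num_merges=8):
    """Token merging (simplified): repeatedly average the adjacent pair of
    tokens with the highest cosine similarity, shrinking the sequence."""
    for _ in range(num_merges):
        if tokens.size(1) < 2:
            break
        sim = F.cosine_similarity(tokens[:, :-1], tokens[:, 1:], dim=-1)  # (B, N-1)
        j = int(sim.mean(dim=0).argmax())   # one merge position shared across the batch
        merged = 0.5 * (tokens[:, j] + tokens[:, j + 1])
        tokens = torch.cat([tokens[:, :j], merged.unsqueeze(1), tokens[:, j + 2:]], dim=1)
    return tokens
```

    Applying such reducers progressively at deeper cross-modal layers is what shortens the sequences the Transformer must attend over, which is where the reported throughput and memory gains come from.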

    Large Model Visualization : Techniques and Applications

    The size of datasets in scientific computing is rapidly increasing. This increase is caused by a boost of processing power in the past years, which in turn was invested in an increase of the accuracy and the size of the models. A similar trend enabled a significant improvement of medical scanners; more than 1000 slices at a resolution of 512x512 can be generated by modern scanners in daily practice. Even in computer-aided engineering, typical models easily contain several million polygons. Unfortunately, the data complexity is growing faster than the rendering performance of modern computer systems. This is not only due to the slower-growing graphics performance of the graphics subsystems, but in particular because of the significantly slower-growing memory bandwidth for the transfer of geometry and image data from main memory to the graphics accelerator. Large model visualization addresses this growing divide between data complexity and rendering performance. Most methods focus on reducing the geometric or pixel complexity, which in turn reduces the memory bandwidth requirements. In this dissertation, we discuss new approaches from three different research areas. All approaches target the reduction of processing complexity to achieve an interactive visualization of large datasets. In the second part, we introduce applications of the presented approaches. Specifically, we introduce the new VIVENDI system for interactive virtual endoscopy and other applications from mechanical engineering, scientific computing, and architecture.
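    The abstract's central idea, reducing geometric complexity before the data reaches the graphics accelerator, can be illustrated with a generic screen-space-error level-of-detail selection. The sketch below is a textbook-style illustration rather than the dissertation's own method; the LODMesh structure, select_lod, and the assumption that each finer level halves the geometric error are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LODMesh:
    """One object stored at several levels of detail (index 0 = coarsest)."""
    triangle_counts: list[int]   # triangles per level, coarse to fine
    bounding_radius: float       # object-space bounding-sphere radius

def select_lod(mesh: LODMesh, distance_to_camera: float,
               pixels_per_unit: float, error_budget_px: float = 2.0) -> int:
    """Pick the coarsest level whose projected geometric error stays within
    the screen-space budget, so fewer triangles cross the memory bus."""
    projected_size = mesh.bounding_radius * pixels_per_unit / max(distance_to_camera, 1e-6)
    for level in range(len(mesh.triangle_counts)):
        error_px = projected_size / (2 ** level)   # assumed: error halves per finer level
        if error_px <= error_budget_px:
            return level
    return len(mesh.triangle_counts) - 1           # fall back to the finest level
```

    Distant or small objects resolve to coarse levels, so the triangle count sent over the memory bus scales with what is visible on screen rather than with the full model size.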

    Low complexity object detection with background subtraction for intelligent remote monitoring


    Memory vectors for similarity search in high-dimensional spaces

    We study an indexing architecture to store and search in a database of high-dimensional vectors from the perspective of statistical signal processing and decision theory. This architecture is composed of several memory units, each of which summarizes a fraction of the database by a single representative vector. The potential similarity of the query to one of the vectors stored in the memory unit is gauged by a simple correlation with the memory unit's representative vector. This representative optimizes the test of the following hypothesis: the query is independent from any vector in the memory unit vs. the query is a simple perturbation of one of the stored vectors. Compared to exhaustive search, our approach finds the most similar database vectors significantly faster without a noticeable reduction in search quality. Interestingly, the reduction of complexity is provably better in high-dimensional spaces. We empirically demonstrate its practical interest in a large-scale image search scenario with off-the-shelf state-of-the-art descriptors. Comment: Accepted to IEEE Transactions on Big Data.
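    A minimal NumPy sketch of this architecture is given below, assuming the simple "sum" construction of a memory vector (the representative is the sum of the unit's L2-normalized members) and a fixed number of probed units. The function names, unit size, and probe count are illustrative choices, not the authors' implementation, which also analyzes a pseudo-inverse representative and the underlying hypothesis test.

```python
import numpy as np

def build_memory_units(database, unit_size=64, seed=0):
    """Partition the database into units of 'unit_size' vectors and summarize
    each unit by the sum of its L2-normalized members (a 'sum' memory vector)."""
    X = database / np.linalg.norm(database, axis=1, keepdims=True)
    order = np.random.default_rng(seed).permutation(len(X))
    units = [order[i:i + unit_size] for i in range(0, len(X), unit_size)]
    reps = np.stack([X[idx].sum(axis=0) for idx in units])   # one representative per unit
    return X, units, reps

def search(query, X, units, reps, probe=4):
    """Rank units by the correlation of the query with their representatives,
    then exhaustively scan only the top 'probe' units."""
    q = query / np.linalg.norm(query)
    best_units = np.argsort(reps @ q)[::-1][:probe]
    candidates = np.concatenate([units[u] for u in best_units])
    scores = X[candidates] @ q
    return candidates[np.argsort(scores)[::-1]]               # candidate ids, most similar first
```

    Because only the probed units are scanned exhaustively, the per-query cost drops roughly in proportion to the fraction of units probed, which is the trade-off the abstract summarizes as faster search with little loss in quality.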

    The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch

    Get PDF
    Recent and forthcoming advances in instrumentation, and giant new surveys, are creating astronomical data sets that are not amenable to the methods of analysis familiar to astronomers. Traditional methods are often inadequate not merely because of the size in bytes of the data sets, but also because of the complexity of modern data sets. Mathematical limitations of familiar algorithms and techniques in dealing with such data sets create a critical need for new paradigms for the representation, analysis and scientific visualization (as opposed to illustrative visualization) of heterogeneous, multiresolution data across application domains. Some of the problems presented by the new data sets have been addressed by other disciplines such as applied mathematics, statistics and machine learning and have been utilized by other sciences such as space-based geosciences. Unfortunately, valuable results pertaining to these problems are mostly to be found only in publications outside of astronomy. Here we offer brief overviews of a number of concepts, techniques and developments, some "old" and some new. These are generally unknown to most of the astronomical community, but are vital to the analysis and visualization of complex datasets and images. In order for astronomers to take advantage of the richness and complexity of the new era of data, and to be able to identify, adopt, and apply new solutions, the astronomical community needs a certain degree of awareness and understanding of the new concepts. One of the goals of this paper is to help bridge the gap between applied mathematics, artificial intelligence and computer science on the one side and astronomy on the other. Comment: 24 pages, 8 figures, 1 table. Accepted for publication in "Advances in Astronomy", special issue "Robotic Astronomy".