
    From Pixels to Prose: A Large Dataset of Dense Image Captions

    Training large vision-language models requires extensive, high-quality image-text pairs. Existing web-scraped datasets, however, are noisy and lack detailed image descriptions. To bridge this gap, we introduce PixelProse, a comprehensive dataset of over 16 million synthetically generated captions, leveraging cutting-edge vision-language models for detailed and accurate descriptions. To ensure data integrity, we rigorously analyze our dataset for problematic content, including child sexual abuse material (CSAM), personally identifiable information (PII), and toxicity. We also provide valuable metadata such as watermark presence and aesthetic scores, aiding in further dataset filtering. We hope PixelProse will be a valuable resource for future vision-language research. PixelProse is available at https://huggingface.co/datasets/tomg-group-umd/pixelprose. Comment: PixelProse 16M dataset.
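    For readers who want to inspect the data, below is a minimal sketch of streaming a few PixelProse records with the Hugging Face datasets library at the URL above; the split name and the exact record fields are assumptions not stated in the abstract.

    # Minimal sketch: peek at a few PixelProse records via the Hugging Face
    # datasets library. The split name ("train") and the record layout are
    # assumptions for illustration; consult the dataset card for the real schema.
    from datasets import load_dataset

    ds = load_dataset("tomg-group-umd/pixelprose", split="train", streaming=True)

    for i, example in enumerate(ds):
        print(example)   # each record pairs an image reference with a dense synthetic caption and metadata
        if i == 2:       # stop after a few records
            break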

    Solubility trapping as a potential secondary mechanism for CO2 sequestration during enhanced gas recovery by CO2 injection in conventional natural gas reservoirs: an experimental approach

    This study experimentally investigates the potential of the solubility trapping mechanism to increase CO2 storage during enhanced gas recovery (EGR) by CO2 injection and sequestration in conventional natural gas reservoirs. A laboratory core flooding process was carried out to simulate EGR on a sandstone core at 0, 5, and 10 wt% NaCl formation water salinities, at 1300 psig, 50 °C, and a 0.3 ml/min injection rate. The results show that CO2 storage capacity was improved significantly when solubility trapping was considered. The lower connate water salinities (0 and 5 wt%) showed higher CO2 solubility according to interfacial tension (IFT) measurements. At 10 wt% connate water salinity, the highest accumulation of CO2 in the reservoir was realised, with about 63% of the total injected CO2 stored, indicating improved storage capacity. Therefore, solubility trapping can potentially increase the CO2 storage capacity of a gas reservoir by serving as a secondary trapping mechanism in addition to primary structural and stratigraphic trapping, while also improving CH4 recovery.
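    To make the headline figure concrete, here is a hypothetical back-of-the-envelope sketch (in Python) of the mass balance behind a stored fraction of roughly 63%; the injected and produced quantities are made-up illustrative values, not measurements from the study.

    # Hypothetical mass-balance sketch of the CO2 storage fraction quoted above.
    # The injected/produced amounts are illustrative placeholders; only the
    # arithmetic (stored fraction = retained / injected) mirrors the abstract.
    co2_injected = 10.0        # total CO2 injected during the core flood (arbitrary units)
    co2_produced = 3.7         # CO2 recovered at the core outlet (arbitrary units)

    co2_stored = co2_injected - co2_produced
    stored_fraction = co2_stored / co2_injected

    print(f"Stored fraction: {stored_fraction:.0%}")   # ~63%, as reported for the 10 wt% salinity case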

    Pruning for Efficient Deep Learning: From CNNs to Generative Models

    Deep learning models have shown remarkable success in visual recognition and generative modeling tasks in computer vision in the last decade. A general trend is that their performance improves with an increase in the size of their training data, model capacity, and training iterations on modern hardware. However, the increase in model size naturally leads to higher computational complexity and memory footprint, thereby necessitating high-end hardware for deployment. This trade-off prevents the deployment of deep learning models in resource-constrained environments such as robotic applications, mobile phones, and edge devices employed in the Artificial Intelligence of Things (AIoT). In addition, private companies and organizations have to spend significant resources on cloud services to serve deep models for their customers. In this dissertation, we develop model pruning and Neural Architecture Search (NAS) methods to improve the inference efficiency of deep learning models for visual recognition and generative modeling applications. We design our methods to be tailored to the unique characteristics of each model and its task. In the first part, we present model pruning and efficient NAS methods for Convolutional Neural Network (CNN) classifiers. We start by proposing a pruning method that leverages interpretations of a pretrained model's decisions to prune its redundant structures. We then provide an efficient NAS method that learns the kernel sizes of a CNN model from its training dataset under a given parameter budget, enabling the design of efficient CNNs customized for their target application. Finally, we develop a framework for simultaneous pretraining and pruning of CNNs, which combines the first two stages of the pretrain-prune-finetune pipeline commonly used in model pruning and reduces its complexity. In the second part, we propose model pruning methods for visual generative models. First, we present a pruning method for conditional Generative Adversarial Networks (GANs) in which we prune the generator and discriminator models in a collaborative manner. We then address the inference efficiency of diffusion models by proposing a method that prunes a pretrained diffusion model into a mixture of efficient experts, each handling a separate part of the denoising process. Finally, we develop an adaptive, prompt-tailored pruning method for modern text-to-image diffusion models that prunes a pretrained model such as Stable Diffusion into a mixture of efficient experts in which each expert specializes in a certain type of input prompt.
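    As background for the pruning terminology used throughout this abstract, the following is a generic, self-contained sketch of structured (channel) pruning in PyTorch; it is not any of the dissertation's methods, and the keep ratio and layer sizes are arbitrary illustrative choices.

    # Generic structured (channel) pruning illustration in PyTorch; NOT the
    # dissertation's method. Output channels of a conv layer are ranked by the
    # L1 norm of their filters, and only the strongest ones are kept.
    import torch
    import torch.nn as nn

    def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
        # Score each output channel by the L1 norm of its filter weights.
        scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
        n_keep = max(1, int(conv.out_channels * keep_ratio))
        keep_idx = torch.argsort(scores, descending=True)[:n_keep]

        # Build a slimmer layer and copy over the surviving filters.
        pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                           stride=conv.stride, padding=conv.padding,
                           bias=conv.bias is not None)
        pruned.weight.data = conv.weight.data[keep_idx].clone()
        if conv.bias is not None:
            pruned.bias.data = conv.bias.data[keep_idx].clone()
        return pruned

    layer = nn.Conv2d(64, 128, kernel_size=3, padding=1)
    slim = prune_conv_channels(layer, keep_ratio=0.25)   # 128 -> 32 output channels
    print(slim)

    In a real network, the input channels of the following layer would also have to be pruned to match, which is part of what makes structured pruning of full models non-trivial and motivates the learned approaches described above.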

    Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment

    Structural model pruning is a prominent approach for reducing the computational cost of Convolutional Neural Networks (CNNs) before their deployment on resource-constrained devices. Yet, the majority of proposed ideas require a pretrained model before pruning, which is costly to obtain. In this paper, we propose a novel structural pruning approach to jointly learn the weights and structurally prune the architectures of CNN models. The core element of our method is a Reinforcement Learning (RL) agent whose actions determine the pruning ratios of the CNN model's layers, while the resulting model's accuracy serves as its reward. We conduct the joint training and pruning by iteratively training the model's weights and the agent's policy, and we regularize the model's weights to align with the structure selected by the agent. The evolving model's weights result in a dynamic reward function for the agent, which prevents the use of prominent episodic RL methods that assume a stationary environment. We address this challenge by designing a mechanism to model the complex changing dynamics of the reward function and provide a representation of it to the RL agent. To do so, we take a learnable embedding for each training epoch and employ a recurrent model to calculate a representation of the changing environment. We train the recurrent model and embeddings using a decoder model that reconstructs observed rewards. Such a design empowers our agent to effectively leverage episodic observations along with the environment representations to learn a proper policy for identifying performant sub-networks of the CNN model. Our extensive experiments on CIFAR-10 and ImageNet using ResNets and MobileNets demonstrate the effectiveness of our method. Comment: IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024.
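    To make the environment-representation idea more concrete, here is a structural sketch in PyTorch (not the authors' released code) of a policy that conditions per-layer pruning ratios on a recurrent summary of learnable per-epoch embeddings, with a decoder head that reconstructs observed rewards; all dimensions and module names are assumptions.

    # Structural sketch (not the authors' code) of the idea described above:
    # a learnable embedding per training epoch is summarized by a recurrent
    # model, and the policy conditions on that environment representation when
    # choosing per-layer pruning ratios. All sizes and names are assumptions.
    import torch
    import torch.nn as nn

    class PruningAgent(nn.Module):
        def __init__(self, n_epochs: int, n_layers: int, emb_dim: int = 32, hid_dim: int = 64):
            super().__init__()
            self.epoch_emb = nn.Embedding(n_epochs, emb_dim)            # one learnable embedding per epoch
            self.env_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)   # summarizes the changing environment
            self.policy_head = nn.Linear(hid_dim, n_layers)             # one pruning ratio per CNN layer
            self.reward_decoder = nn.Linear(hid_dim, 1)                 # reconstructs observed rewards

        def forward(self, epoch_idx: torch.Tensor):
            # epoch_idx: (1, t) indices of the epochs observed so far.
            emb = self.epoch_emb(epoch_idx)                             # (1, t, emb_dim)
            hidden, _ = self.env_rnn(emb)                               # (1, t, hid_dim)
            env_repr = hidden[:, -1]                                    # representation of the current environment
            ratios = torch.sigmoid(self.policy_head(env_repr))          # pruning ratios in (0, 1) per layer
            reward_pred = self.reward_decoder(hidden).squeeze(-1)       # predicted reward at each observed epoch
            return ratios, reward_pred

    agent = PruningAgent(n_epochs=100, n_layers=20)
    ratios, reward_pred = agent(torch.arange(5).unsqueeze(0))           # after 5 training epochs
    print(ratios.shape, reward_pred.shape)                              # torch.Size([1, 20]) torch.Size([1, 5])

    In this sketch the reward-reconstruction head supplies the training signal for the embeddings and the recurrent model, while the policy head uses the resulting environment representation to pick pruning ratios; how the actual reward, regularization, and alignment with the CNN's weights are implemented is described only at the level of the abstract above.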