967 research outputs found

    Improved SVD++ Recommendation Algorithm Based on Fusion Time Factor

    Collaborative filtering algorithms are widely used in recommendation systems. To address the data sparsity and low recommendation accuracy of traditional collaborative filtering, an improved recommendation algorithm, PT_SVD++, is proposed. First, the attribute information of users and the implicit feedback information of items are introduced to improve the SVD++ algorithm, which makes fuller use of the available information and alleviates the data sparsity problem. Second, a time effect model is established to further improve the accuracy of the prediction results. Experimental results on the MovieLens dataset show that, compared with other algorithms, this algorithm achieves lower mean absolute error and root mean square error, and thus higher recommendation accuracy.
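    The abstract does not give the exact formulation of PT_SVD++. As a rough illustration only, a standard SVD++-style prediction combined with a simple exponential time-decay weight (the function name, the decay form, and all parameters are assumptions, not taken from the paper) might look like:

```python
import numpy as np

def predict(mu, b_u, b_i, p_u, q_i, y_sum, t, t_u, beta=0.1):
    """SVD++-style rating prediction with an illustrative time-decay factor.

    mu: global mean rating; b_u, b_i: user/item biases;
    p_u: user latent factors; q_i: item latent factors;
    y_sum: normalized sum of implicit-feedback item factors;
    t, t_u: rating time and the user's mean rating time (days);
    beta: decay rate (hypothetical).
    """
    time_weight = np.exp(-beta * abs(t - t_u))  # recent ratings count more
    return mu + time_weight * (b_u + b_i) + q_i @ (p_u + y_sum)
```

    In this sketch, only the bias terms are decayed; the actual PT_SVD++ time model may weight other components differently.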

    Concatenated Frame Image Based CNN for Visual Speech Recognition

    This paper proposes a novel sequence image representation method called concatenated frame image (CFI), two types of data augmentation methods for CFI, and a framework of CFI-based convolutional neural networks (CNNs) for the visual speech recognition (VSR) task. A CFI is simple, yet it contains the spatial-temporal information of a whole image sequence. The proposed method was evaluated on the public OuluVS2 database, a multi-view audio-visual dataset recorded from 52 subjects. Speaker-independent recognition tasks were carried out under various experimental conditions, and the proposed method obtained high recognition accuracy.
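    The paper's exact CFI layout is not reproduced in this summary. As a hedged sketch of the general idea, tiling a fixed-length frame sequence into a single grid image (the grid arrangement and function name are assumptions) could look like:

```python
import numpy as np

def concatenated_frame_image(frames, rows, cols):
    """Tile a sequence of equally sized grayscale frames into one 2-D image.

    frames: array of shape (T, H, W) with T == rows * cols.
    Returns an image of shape (rows * H, cols * W) that packs the
    sequence's spatial-temporal content into a single frame.
    """
    t, h, w = frames.shape
    assert t == rows * cols, "sequence length must fill the grid"
    grid = frames.reshape(rows, cols, h, w)       # assign frames to grid cells
    return grid.transpose(0, 2, 1, 3).reshape(rows * h, cols * w)
```

    The resulting single image can then be fed to an ordinary 2-D CNN, which is what makes the representation attractive for VSR.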

    A Penalty Function-based Modelica Library for Multi-body Contact Collision

    Contact collisions are prevalent in mechanical multi-body systems and have long been a significant limiting factor in engineering technology development. This paper examines the fundamental types of contact in multi-body dynamics systems and explores their inherent topological relationships. Based on multi-body dynamics theory and the penalty function contact algorithm, this paper constructs a multi-body dynamics contact model using Modelica, a multi-domain unified modeling language. To enhance the applicability of the contact model library in modeling multi-body systems, the contact model provides a connection interface compatible with the multi-body library in the Modelica Standard Library.
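    The library's actual contact equations are not given in the abstract. A common penalty-function contact law, a spring-damper acting on penetration depth, can be sketched as follows (the linear force law, names, and coefficient values are illustrative assumptions, not the library's implementation):

```python
def penalty_contact_force(penetration, penetration_rate, k=1e5, c=1e2):
    """Linear penalty contact: a normal force acts only while bodies overlap.

    penetration: overlap depth delta (m), positive when in contact;
    penetration_rate: d(delta)/dt (m/s);
    k: contact stiffness (N/m); c: contact damping (N*s/m).
    """
    if penetration <= 0.0:
        return 0.0                      # separated: no contact force
    force = k * penetration + c * penetration_rate
    return max(force, 0.0)              # damping must not make the force adhesive
```

    Penalty methods trade exact non-penetration for smooth, easily integrated force laws, which suits an equation-based language like Modelica.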

    Characterizing subtle facial movements via Riemannian manifold

    Characterizing subtle facial movements from videos is one of the most intensively studied topics in computer vision research. It is, however, challenging, since (1) the intensity of subtle facial muscle movement is usually low, (2) the duration may be transient, and (3) datasets containing spontaneous subtle movements with reliable annotations are painful to obtain and often of small size. This article addresses these problems from both the motion elucidation and the motion description aspects. First, we propose an efficient method for elucidating hidden and repressed movements to make them easier to notice. We explore the feasibility of linearizing motion magnification and temporal interpolation, which is obscured by the architecture of existing methods. On this basis, we propose a consolidated framework, termed MOTEL, to expand temporal duration and amplify subtle facial movements simultaneously. Second, we contribute to dynamic description. One major challenge is to capture the intrinsic temporal variations caused by movements while omitting extrinsic ones caused by different individuals and varying environments. To diminish the influence of such extrinsic diversity, we propose the tangent delta descriptor, which characterizes the dynamics of short-term movements using the differences between points on the tangent spaces of the manifolds, rather than the points themselves. We then relax the trajectory-smoothness assumption of conventional manifold-based trajectory modeling methods and combine the tangent delta descriptor with sequential inference approaches to cover the full period of facial movements. The proposed motion modeling approach is validated by a series of experiments on publicly available datasets in the tasks of micro-expression recognition and visual speech recognition.
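    The specific manifold used by the descriptor is not stated in this summary. As a purely illustrative sketch on the unit sphere (the manifold choice, the log map, and all names are assumptions), the "tangent delta" idea of differencing log-mapped trajectory points rather than the points themselves can be written as:

```python
import numpy as np

def log_map_sphere(base, point):
    """Riemannian log map on the unit sphere: lift `point` into the
    tangent space at `base` (both unit vectors)."""
    cos_theta = np.clip(base @ point, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return np.zeros_like(base)          # same point: zero tangent vector
    direction = point - cos_theta * base    # component orthogonal to base
    return theta * direction / np.linalg.norm(direction)

def tangent_delta(base, p_t, p_next):
    """Difference of two consecutive trajectory points, taken in the
    tangent space at `base` instead of on the manifold itself."""
    return log_map_sphere(base, p_next) - log_map_sphere(base, p_t)
```

    Working with tangent-space differences removes the common offset contributed by the base point, which is one way to suppress identity- and environment-specific variation.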

    Experimental study of carbonated water imbibition in deep coal rocks using nuclear magnetic resonance spectroscopy

    The deep eastern margin of the Ordos Basin is rich in coalbed methane, presenting great development potential, and CO₂ imbibition is an important method for increasing production. To study the CO₂-water-rock interactions and microstructural damage characteristics of deep coal rocks before and after supercritical carbon dioxide immersion, CO₂ imbibition experiments were conducted on these rocks using nuclear magnetic resonance and scanning electron microscopy imaging techniques. The results show that CO₂ imbibition leads to pore dilatation and reveal the key role of coal rock anisotropy in imbibition efficiency under different physicochemical conditions. Specifically, CO₂ immersion produces cracks owing to the brittleness of the coal rock, while calcite dissolution exacerbates crack production and expansion. Due to CO₂ adsorption, part of the coal rock swells, which leads to detachment and changes the physical properties and surface characteristics of the coal rock.
    Document Type: Original article. Cited as: Yang, L., Liu, Z., Zhao, Z., Li, W., Ding, J., Sun, L. Experimental study of carbonated water imbibition in deep coal rocks using nuclear magnetic resonance spectroscopy. Capillarity, 2025, 16(2): 27-38. https://doi.org/10.46690/capi.2025.08.0

    RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis

    Developing generalist foundation models has recently attracted tremendous attention among researchers in the field of AI for Medicine (AI4Medicine). A pivotal insight in developing these models is their reliance on dataset scaling, which highlights the need for open-source medical image datasets incorporating diverse supervision signals across various imaging modalities. In this paper, we introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE. Specifically, we leverage the latest powerful universal segmentation and large language models to extend the original dataset (over 25,692 non-contrast 3D chest CT volumes and reports from 20,000 patients) in the following aspects: (i) organ-level segmentation masks covering 197 categories, which provide intermediate visual clues for reasoning and interpretation; (ii) 665K multi-granularity grounded reports, where each sentence of the report is linked to the corresponding anatomical region of the CT volume in the form of a segmentation mask; (iii) 1.3M grounded VQA pairs, where questions and answers are all linked with reference segmentation masks, enabling models to associate visual evidence with textual explanations. All grounded reports and VQA pairs in the validation set have gone through manual verification to ensure dataset quality. We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models by training them to generate texts based on given segmentation regions, which is unattainable with previous relevant datasets. We will release all segmentation masks, grounded reports, and VQA pairs to facilitate further research and development in this field.

    PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

    In this paper, we focus on the problem of Medical Visual Question Answering (MedVQA), which is crucial for efficiently interpreting medical images containing vital clinically relevant information. First, we reframe MedVQA as a generation task that naturally follows human-machine interaction, and propose a generative model for medical visual understanding that aligns visual information from a pre-trained vision encoder with a large language model. Second, we establish a scalable pipeline to construct a large-scale medical visual question-answering dataset, named PMC-VQA, which contains 227k VQA pairs over 149k images covering various modalities and diseases. Third, we pre-train our proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD and SLAKE, outperforming existing work by a large margin. Additionally, we propose a manually verified test set that is significantly more challenging, which even the best models struggle to solve.

    One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts

    In this study, we aim to build a model that can Segment Anything in radiology scans, driven by Text prompts, termed SAT. Our main contributions are threefold: (i) for dataset construction, we build the first multi-modal knowledge tree on human anatomy, including 6502 anatomical terminologies; we then construct the largest and most comprehensive segmentation dataset for training by collecting over 22K 3D medical image scans from 72 segmentation datasets, across 497 classes, with careful standardization of both image scans and label space; (ii) for architecture design, we propose injecting medical knowledge into a text encoder via contrastive learning, and then formulate a universal segmentation model that can be prompted by feeding in medical terminologies in text form; (iii) as a result, we have trained SAT-Nano (110M parameters) and SAT-Pro (447M parameters), demonstrating performance comparable to 72 specialist nnU-Nets trained on each dataset/subset. We validate SAT as a foundational segmentation model with better generalization ability on external (unseen) datasets, which can be further improved on specific tasks after fine-tuning adaptation. Compared with interactive segmentation models, for example MedSAM, a segmentation model prompted by text offers superior performance, scalability, and robustness. As a use case, we demonstrate that SAT can act as a powerful out-of-the-box agent for large language models, enabling visual grounding in clinical procedures such as report generation. All the data, code, and models in this work have been released.

    Background subtraction using spatio-temporal group sparsity recovery

    Background subtraction is a key step in a wide spectrum of video applications, such as object tracking and human behavior analysis. Compressive sensing-based methods, which make few specific assumptions about the background, have recently attracted wide attention in background subtraction. Within the compressive sensing framework, background subtraction is solved as a decomposition and optimization problem in which the foreground is typically modeled as pixel-wise sparse outliers. However, in real videos, foreground pixels are often not randomly distributed but group-clustered. Moreover, due to their computational expense, most compressive sensing-based methods are unable to process frames online. In this paper, we take into account the group properties of foreground signals in both the spatial and temporal domains, and propose a greedy pursuit-based method called spatio-temporal group sparsity recovery, which prunes data residues in an iterative process according to both sparsity and group-clustering priors, rather than sparsity alone. Furthermore, a random strategy for background dictionary learning is used to handle complex background variations without requiring foreground-free training. Finally, we propose a two-pass framework to achieve online processing. The proposed method is validated on multiple challenging video sequences. Experiments demonstrate that our approach works effectively on a wide range of complex scenarios and achieves state-of-the-art performance with far fewer computations.
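    The paper's full greedy pursuit algorithm is not reproduced here. As a hedged sketch of the group-clustering prior alone, one pruning step that keeps residual energy by spatial block rather than by individual pixel magnitude (the block shape, keep ratio, and names are assumptions) might look like:

```python
import numpy as np

def prune_by_group_energy(residual, block=4, keep_ratio=0.25):
    """Keep only the spatial blocks with the highest energy in a frame
    residual, zeroing the rest. This models foreground as group-clustered
    outliers instead of independent sparse pixels.

    residual: (H, W) array with H and W divisible by `block`.
    """
    h, w = residual.shape
    blocks = residual.reshape(h // block, block, w // block, block)
    energy = (blocks ** 2).sum(axis=(1, 3))           # per-block energy
    k = max(1, int(round(keep_ratio * energy.size)))  # number of blocks kept
    thresh = np.sort(energy, axis=None)[-k]
    mask = (energy >= thresh)[:, None, :, None]       # broadcast block mask to pixels
    return (blocks * mask).reshape(h, w)
```

    Compared with a per-pixel threshold, block-level pruning suppresses isolated noisy pixels while preserving contiguous foreground regions, which is the intuition behind the group sparsity prior.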