88 research outputs found

    Task Difficulty Aware Parameter Allocation & Regularization for Lifelong Learning

    Full text link
    Parameter regularization and parameter allocation methods are effective in overcoming catastrophic forgetting in lifelong learning. However, they treat all tasks in a sequence uniformly and ignore differences in the learning difficulty of different tasks. Consequently, parameter regularization methods suffer significant forgetting when learning a new task that is very different from previously learned tasks, and parameter allocation methods incur unnecessary parameter overhead when learning simple tasks. In this paper, we propose Parameter Allocation & Regularization (PAR), which adaptively selects an appropriate strategy for each task, parameter allocation or regularization, based on its learning difficulty. A task is easy for a model that has learned tasks related to it, and vice versa. We propose a divergence estimation method based on the Nearest-Prototype distance to measure task relatedness using only features of the new task. Moreover, we propose a time-efficient, relatedness-aware, sampling-based architecture search strategy to reduce the parameter overhead of allocation. Experimental results on multiple benchmarks demonstrate that, compared with state-of-the-art methods, our method is scalable and significantly reduces the model's redundancy while improving performance. Further qualitative analysis indicates that PAR obtains reasonable task relatedness.
    Comment: Accepted by CVPR 2023. Code is available at https://github.com/WenjinW/PA
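    The Nearest-Prototype distance idea above can be sketched roughly as follows. This is an illustrative reading, not the paper's exact formulation: prototypes here are per-class feature means, and relatedness is the average distance from new-task features to the closest old-task prototype.

```python
import numpy as np

def class_prototypes(features, labels):
    # Prototype = mean feature vector of each class.
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def nearest_prototype_divergence(new_feats, old_prototypes):
    """Average distance from each new-task feature to its nearest old-task
    prototype; a small value suggests the new task is related to old ones.
    Uses only features of the new task, as described in the abstract."""
    protos = np.stack(list(old_prototypes.values()))          # (k, d)
    dists = np.linalg.norm(new_feats[:, None, :] - protos[None, :, :], axis=-1)
    return dists.min(axis=1).mean()
```

    A low divergence would then steer PAR toward regularization (reuse), a high one toward allocating new parameters.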

    Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering

    Full text link
    The pre-training-fine-tuning paradigm based on layout-aware multimodal pre-trained models has achieved significant progress on document image question answering. However, domain pre-training and task fine-tuning for additional visual, layout, and task modules prevent these models from directly utilizing off-the-shelf instruction-tuned language foundation models, which have recently shown promising potential in zero-shot learning. Instead of aligning language models to the domain of document image question answering, we align document image question answering to off-the-shelf instruction-tuned language foundation models to utilize their zero-shot capability. Specifically, we propose a layout and task aware instruction prompt called LATIN-Prompt, which consists of layout-aware document content and task-aware descriptions. The former recovers the layout information among text segments obtained from OCR tools by inserting appropriate spaces and line breaks. The latter ensures that the model generates answers that meet the requirements, especially format requirements, through a detailed description of the task. Experimental results on three benchmarks show that LATIN-Prompt improves the zero-shot performance of instruction-tuned language foundation models on document image question answering and helps them reach levels comparable to SOTAs based on the pre-training-fine-tuning paradigm. Quantitative and qualitative analyses demonstrate the effectiveness of LATIN-Prompt. We provide the code in the supplementary material and will release it to facilitate future research.
    Comment: Code is available at https://github.com/WenjinW/LATIN-Promp
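    The layout-recovery step, turning OCR boxes into text whose spaces and line breaks mimic the page layout, can be sketched as below. The glyph metrics (`char_width`, `line_height`) and the bucketing rule are assumptions for illustration, not LATIN-Prompt's exact procedure.

```python
def layout_aware_text(ocr_segments, char_width=8.0, line_height=20.0):
    """Rebuild rough document layout as plain text for a prompt.

    ocr_segments: list of (text, x, y) tuples in pixel coordinates.
    char_width / line_height are assumed average glyph metrics.
    """
    lines = {}
    for text, x, y in ocr_segments:
        row = round(y / line_height)            # bucket segments into lines
        lines.setdefault(row, []).append((x, text))
    out = []
    for row in sorted(lines):
        cursor, parts = 0, []
        for x, text in sorted(lines[row]):
            pad = max(1, int(x / char_width) - cursor)  # pad up to column x
            parts.append(" " * pad + text)
            cursor = cursor + pad + len(text)
        out.append("".join(parts))
    return "\n".join(out)
```

    The resulting string would be prepended to a task-aware description before being sent to the language model.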

    Worker Activity Recognition in Smart Manufacturing Using IMU and sEMG Signals with Convolutional Neural Networks

    Get PDF
    In a smart manufacturing system involving workers, recognition of the worker's activity can be used to quantify and evaluate the worker's performance, as well as to provide onsite instructions with augmented reality. In this paper, we propose a method for activity recognition using Inertial Measurement Unit (IMU) and surface electromyography (sEMG) signals obtained from a Myo armband. The raw 10-channel IMU signals are stacked to form a signal image. This image is transformed into an activity image by applying the Discrete Fourier Transform (DFT) and then fed into a Convolutional Neural Network (CNN) for feature extraction, resulting in a high-level feature vector. Another feature vector, representing the level of muscle activation, is computed from the raw 8-channel sEMG signals. These two vectors are then concatenated and used for worker activity classification. A worker activity dataset is established, which at present contains 6 common activities in assembly tasks, i.e., grab tool/part, hammer nail, use power screwdriver, rest arm, turn screwdriver, and use wrench. The developed CNN model is evaluated on this dataset and achieves 98% and 87% recognition accuracy in the half-half and leave-one-out experiments, respectively.
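    The signal-image-to-activity-image step can be sketched as follows. The row-stacking scheme and the use of a log-magnitude spectrum here are illustrative assumptions; the paper's exact stacking order is not specified in the abstract.

```python
import numpy as np

def activity_image(imu_window):
    """Turn a (10, T) IMU window into a DFT-based 'activity image'.

    Rows are duplicated to enrich channel pairings (assumed scheme),
    then a 2-D DFT converts the signal image into an activity image.
    """
    signal_image = np.vstack([imu_window, imu_window[:6]])   # (16, T)
    spectrum = np.fft.fft2(signal_image)
    return np.log1p(np.abs(spectrum))    # non-negative magnitude image
```

    The resulting image would be the CNN's input, analogous to a grayscale picture.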

    Real-Time Assembly Operation Recognition with Fog Computing and Transfer Learning for Human-Centered Intelligent Manufacturing

    Get PDF
    In a human-centered intelligent manufacturing system, every element is designed to assist the operator in achieving optimal operational performance. The primary task in developing such a human-centered system is to accurately understand human behavior. In this paper, we propose a fog computing framework for assembly operation recognition, which brings computing power close to the data source in order to achieve real-time recognition. For data collection, the operator's activity is captured by visual cameras from different perspectives. For operation recognition, instead of building and training a deep learning model from scratch, which requires a huge amount of data, transfer learning is applied to transfer learned representations to our application. A worker assembly operation dataset is established, which at present contains 10 sequential operations in an assembly task of installing a desktop CNC machine. The developed transfer learning model is evaluated on this dataset and achieves a recognition accuracy of 95% in the testing experiments.
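    The transfer-learning setup, reusing a frozen pretrained feature extractor and training only a small classification head on the new dataset, can be sketched minimally as below. The pretrained CNN is stubbed here by a fixed random projection; this is an assumed stand-in, not the paper's actual backbone.

```python
import numpy as np

rng = np.random.default_rng(0)

class FrozenBackbone:
    """Stand-in for a pretrained CNN: fixed weights, never updated."""
    def __init__(self, in_dim, feat_dim):
        self.W = rng.standard_normal((in_dim, feat_dim)) / np.sqrt(in_dim)
    def __call__(self, x):
        return np.maximum(x @ self.W, 0.0)   # ReLU features

def train_head(backbone, x, y, n_classes, lr=0.1, epochs=200):
    """Train only a softmax head on frozen features (transfer learning)."""
    feats = backbone(x)
    W = np.zeros((feats.shape[1], n_classes))
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = feats @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * feats.T @ (p - onehot) / len(x)   # cross-entropy gradient
    return W
```

    Because only the head is trained, far fewer labeled examples are needed than when training the whole network from scratch, which is the motivation stated above.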

    ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding

    Full text link
    Recent multimodal Transformers have improved Visually Rich Document Understanding (VrDU) tasks by incorporating visual and textual information. However, existing approaches mainly focus on fine-grained elements such as words and document image patches, making it hard for them to learn from coarse-grained elements, including natural lexical units like phrases and salient visual regions like prominent image regions. In this paper, we attach more importance to coarse-grained elements, which contain high-density information and consistent semantics and are valuable for document understanding. First, a document graph is proposed to model the complex relationships among multi-grained multimodal elements, in which salient visual regions are detected by a cluster-based method. Then, a multi-grained multimodal Transformer called mmLayout is proposed to incorporate coarse-grained information into existing pre-trained fine-grained multimodal Transformers based on this graph. In mmLayout, coarse-grained information is aggregated from fine-grained elements and, after further processing, fused back into the fine-grained elements for the final prediction. Furthermore, common sense enhancement is introduced to exploit the semantic information of natural lexical units. Experimental results on four tasks, including information extraction and document question answering, show that our method improves the performance of multimodal Transformers based on fine-grained elements and achieves better performance with fewer parameters. Qualitative analyses show that our method can capture consistent semantics in coarse-grained elements.
    Comment: Accepted by ACM Multimedia 202
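    The aggregate-then-fuse-back flow between granularities can be sketched as below. Mean pooling for aggregation and a convex blend for fusion are illustrative simplifications of the attention-based mechanisms a Transformer would actually use.

```python
import numpy as np

def aggregate_coarse(fine_feats, membership):
    """Pool fine-grained features into coarse-grained node features.

    fine_feats: (n_fine, d); membership[i] = index of the coarse element
    (phrase / salient region) containing fine element i.
    """
    n_coarse = membership.max() + 1
    coarse = np.zeros((n_coarse, fine_feats.shape[1]))
    np.add.at(coarse, membership, fine_feats)          # scatter-sum
    counts = np.bincount(membership, minlength=n_coarse)
    return coarse / counts[:, None]                    # mean pooling

def fuse_back(fine_feats, coarse_feats, membership, alpha=0.5):
    # Blend each fine element with its coarse parent for final prediction.
    return (1 - alpha) * fine_feats + alpha * coarse_feats[membership]
```

    In mmLayout the coarse features would additionally pass through Transformer layers before being fused back; the blend above only shows the direction of information flow.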

    Action Recognition in Manufacturing Assembly using Multimodal Sensor Fusion

    Get PDF
    Production innovations are occurring faster than ever. Manufacturing workers thus need to frequently learn new methods and skills. In fast-changing, largely uncertain production systems, manufacturers able to comprehend workers' behavior and assess their operational performance in near real time will outperform their peers. Action recognition can serve this purpose. Although human action recognition has been an active field of study in machine learning, limited work has been done on recognizing worker actions in manufacturing tasks that involve complex, intricate operations. Using data captured by one sensor, or a single type of sensor, to recognize those actions lacks reliability. This limitation can be overcome by sensor fusion at the data, feature, and decision levels. This paper presents a study that developed a multimodal sensor system and used sensor fusion methods to enhance the reliability of action recognition. One step in assembling a Bukito 3D printer, which is composed of a sequence of 7 actions, was used to illustrate and assess the proposed method. Two wearable sensors, namely Myo armbands, captured both Inertial Measurement Unit (IMU) and electromyography (EMG) signals of assembly workers. Microsoft Kinect, a vision-based sensor, simultaneously tracked their predefined skeleton joints. The collected IMU, EMG, and skeleton data were used to train five individual Convolutional Neural Network (CNN) models. Various fusion methods were then implemented to integrate the prediction results of the independent models into the final prediction. Reasons for the better performance achieved with sensor fusion were identified in this study.
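    Decision-level fusion, one of the fusion levels mentioned above, can be sketched as a weighted average of the per-model class probabilities. Weighted averaging is only one of several possible fusion rules; the weights here are illustrative, not the study's tuned values.

```python
import numpy as np

def decision_fusion(prob_list, weights=None):
    """Combine class-probability outputs of independent models.

    prob_list: list of (n_samples, n_classes) softmax outputs, one per
    model (e.g. IMU, EMG, and skeleton CNNs). Returns fused class labels.
    """
    probs = np.stack(prob_list)                       # (n_models, n, c)
    if weights is None:
        weights = np.ones(len(prob_list)) / len(prob_list)
    fused = np.tensordot(weights, probs, axes=1)      # weighted average
    return fused.argmax(axis=1)
```

    Weighting lets a more reliable modality (say, the skeleton model) dominate when modalities disagree.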

    Site-specific relapse pattern of the triple negative tumors in Chinese breast cancer patients

    Get PDF
    BACKGROUND: It has been reported that the triple negative phenotype is characterized by an aggressive clinical history in Western breast cancer patients; however, its pattern of metastatic spread has never been reported in the Chinese population. Considering racial disparities, we sought to analyze the spread pattern for different sites of first recurrence in Chinese triple negative breast cancers. METHODS: A retrospective study of 1662 patients was carried out using a large database of breast cancer patients who underwent surgery between January 1, 2000 and March 31, 2004 at the Cancer Hospital, Fudan University, Shanghai, China. Survival curves were generated using the Kaplan-Meier method, and annual relapse hazards were estimated by the hazard function. RESULTS: We found a statistically significant difference in relapse-free survival (RFS) for locoregional and visceral recurrence (P = 0.007 and P = 0.025, respectively) among the triple negative, ERBB2+ and HR+/ERBB2- subgroups in univariate analysis. In the multivariate Cox proportional hazards regression analysis, RFS for either locoregional or visceral relapse in the triple negative category was inferior to that in HR+/ERBB2- patients (P = 0.027 and P = 0.005, respectively), but comparable to that in ERBB2+ women (both P > 0.05). Furthermore, the early relapse peak appeared later in the triple negative group than in the ERBB2+ counterpart for both locoregional and visceral relapse. On the other hand, compared with triple negative breast cancers, a significantly lower risk of developing bone relapse was discerned for ERBB2+ women (P = 0.048; HR = 0.384, 95% CI 0.148-0.991), with borderline significance for HR+/ERBB2- breast cancers (P = 0.058; HR = 0.479, 95% CI 0.224-1.025). In terms of bone metastasis, the hazard rate remained higher for the triple negative category than for the ERBB2+ subtype.
    CONCLUSION: Based on the site-specific spread pattern in the different subgroups, the triple negative category of breast cancers in the Chinese population exhibits a distinct pattern of relapse, which indicates that the different organotropism may be due to the different intrinsic subtypes. A better knowledge of the triple negative category is warranted for devising efficacious systemic regimens to decrease and/or delay the relapse hazard.
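    The Kaplan-Meier method used above for the survival curves can be sketched in a few lines. This is the textbook product-limit estimator, not the study's own code; `events` marks relapse (1) versus censoring (0).

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit survival estimate.

    times: follow-up time per patient; events: 1 if relapse observed,
    0 if censored. Returns (event_times, survival_probabilities).
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk, surv = len(times), 1.0
    out_t, out_s = [], []
    i = 0
    while i < len(order):
        t = times[order[i]]
        d = n = 0
        while i < len(order) and times[order[i]] == t:
            d += events[order[i]]   # relapses at time t
            n += 1                  # patients leaving the risk set at t
            i += 1
        if d:
            surv *= 1 - d / at_risk   # product-limit update
            out_t.append(t)
            out_s.append(surv)
        at_risk -= n
    return out_t, out_s
```

    Comparing such curves between the triple negative, ERBB2+, and HR+/ERBB2- subgroups (e.g. with a log-rank test) is what the univariate analysis above reports.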