10 research outputs found

    Analyzing the Efficacy of an LLM-Only Approach for Image-based Document Question Answering

    Recent document question answering models consist of two key components: the vision encoder, which captures layout and visual elements in images, and a Large Language Model (LLM) that helps contextualize questions to the image and supplements them with external world knowledge to generate accurate answers. However, the relative contributions of the vision encoder and the language model in these tasks remain unclear. This is especially interesting given the effectiveness of instruction-tuned LLMs, which exhibit remarkable adaptability to new tasks. To this end, we explore the following aspects in this work: (1) the efficacy of an LLM-only approach on document question answering tasks; (2) strategies for serializing textual information within document images and feeding it directly to an instruction-tuned LLM, thus bypassing the need for an explicit vision encoder; and (3) a thorough quantitative analysis of the feasibility of such an approach. Our comprehensive analysis encompasses six diverse benchmark datasets, utilizing LLMs of varying scales. Our findings reveal that a strategy exclusively reliant on the LLM yields results that are on par with or closely approach state-of-the-art performance across a range of datasets. We posit that this evaluation framework will serve as a guiding resource for selecting appropriate datasets for future research endeavors that emphasize the fundamental importance of layout and image content information.
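
To make the serialization idea in point (2) concrete, here is a minimal sketch of one possible strategy: grouping OCR words into lines by vertical position and reading each line left to right. The `(text, x_left, y_top)` word format, the `serialize_words` name, and the `line_tol` parameter are assumptions made for illustration; the abstract does not say which serialization strategies the authors actually evaluate.

```python
from typing import List, Tuple

# Hypothetical OCR output format: (text, x_left, y_top) in pixel coordinates.
Word = Tuple[str, float, float]


def serialize_words(words: List[Word], line_tol: float = 5.0) -> str:
    """Join OCR words into plain text, top to bottom and left to right."""
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w[2]):
        # A word joins the current line when its top edge is within
        # line_tol pixels of the first word on that line.
        if lines and abs(word[2] - lines[-1][0][2]) <= line_tol:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Within each line, order words by their left edge.
    return "\n".join(
        " ".join(w[0] for w in sorted(line, key=lambda w: w[1]))
        for line in lines
    )


words = [("Total:", 10, 50), ("Invoice", 10, 10), ("42", 80, 52), ("#7", 90, 11)]
text = serialize_words(words)
```

The resulting string could then be placed in an LLM prompt alongside the question. A simple reading-order heuristic like this ignores multi-column layouts and tables, which is where layout-aware models would be expected to matter more.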

    Is it an i or an l: Test-time Adaptation of Text Line Recognition Models

    Recognizing text lines from images is a challenging problem, especially for handwritten documents due to large variations in writing styles. While text line recognition models are generally trained on large corpora of real and synthetic data, such models can still make frequent mistakes if the handwriting is inscrutable or the image acquisition process adds corruptions such as noise, blur, and compression. Writing style is generally quite consistent for an individual, which can be leveraged to correct mistakes made by such models. Motivated by this, we introduce the problem of adapting text line recognition models during test time. We focus on a challenging and realistic setting where, given only a single test image consisting of multiple text lines, the task is to adapt the model such that it performs better on the image, without any labels. We propose an iterative self-training approach that uses feedback from the language model to update the optical model, with confident self-labels in each iteration. The confidence measure is based on an augmentation mechanism that evaluates the divergence of the prediction of the model in a local region. We perform a rigorous evaluation of our method on several benchmark datasets as well as their corrupted versions. Experimental results on multiple datasets spanning multiple scripts show that the proposed adaptation method offers an absolute improvement of up to 8% in character error rate with just a few iterations of self-training at test time.
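
The reported metric, character error rate (CER), is commonly defined as the Levenshtein edit distance between the reference and the predicted text, divided by the reference length. The sketch below follows that common definition; the abstract does not spell out the exact evaluation protocol, so treat the function names and any normalization choices here as assumptions.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate of hypothesis `hyp` against reference `ref`."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Confusing a single "i" with an "l" in a seven-character word.
rate = cer("illicit", "lllicit")
```

Under this definition, an absolute improvement of 8% would mean, for example, a CER dropping from 0.20 to 0.12.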

    Private and Efficient Meta-Learning with Low Rank and Sparse Decomposition

    Meta-learning is critical for a variety of practical ML systems -- like personalized recommendation systems -- that are required to generalize to new tasks despite a small number of task-specific training points. Existing meta-learning techniques use two complementary approaches of either learning a low-dimensional representation of points for all tasks, or task-specific fine-tuning of a global model trained using all the tasks. In this work, we propose a novel meta-learning framework that combines both techniques to enable handling of a large number of data-starved tasks. Our framework models network weights as a sum of low-rank and sparse matrices. This allows us to capture information from multiple domains together in the low-rank part while still allowing task-specific personalization using the sparse part. We instantiate and study the framework in the linear setting, where the problem reduces to that of estimating the sum of a rank-r and a k-column sparse matrix using a small number of linear measurements. We propose an alternating minimization method with hard thresholding -- AMHT-LRS -- to learn the low-rank and sparse parts effectively and efficiently. For the realizable, Gaussian data setting, we show that AMHT-LRS indeed solves the problem efficiently with nearly optimal samples. We extend AMHT-LRS to ensure that it preserves the privacy of each individual user in the dataset, while still ensuring strong generalization with a nearly optimal number of samples. Finally, on multiple datasets, we demonstrate that the framework allows personalized models to obtain superior performance in the data-scarce regime. Comment: 97 pages, 3 figures
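
Hard thresholding, the operation named in AMHT-LRS, is in its standard form a projection that keeps the k largest-magnitude entries of a vector and zeroes the rest. The sketch below shows only that generic operator; how AMHT-LRS interleaves it with the low-rank update is not described in the abstract, and the `hard_threshold` name is chosen here for illustration.

```python
from typing import List


def hard_threshold(v: List[float], k: int) -> List[float]:
    """Keep the k largest-magnitude entries of v and set the rest to zero."""
    if k <= 0:
        return [0.0] * len(v)
    # Indices of the k entries with the largest absolute value.
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]


sparse = hard_threshold([0.1, -3.0, 2.0, 0.5], k=2)
```

An alternating scheme would typically apply a step like this to the sparse part after each update, then re-fit the low-rank part with the sparse part held fixed.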

    Study on Machine Learning and Deep Learning Methods for Cancer Detection

    Cancer causes death of about million people every year. Cancer is frequently diagnosed and is a major cause of death in men and women. Cancer is a group of diseases involving abnormal cell growth that can spread to other parts of the body. CT colonography uses low-dose computed tomography (CT) scanning, performed with a special X-ray machine, to obtain an internal view of tumors. Radiologists examine these images with computer tools to find tumor-like structures. CT colonography images contain noise such as the lungs, the small intestine, and instruments present during image capture. Cancer occurrence can be detected mainly using shape features; eliminating shapes similar to tumors is challenging. To tackle these issues, image processing techniques are applied using a deep learning algorithm, the Convolutional Neural Network (CNN), and the results are compared with classical machine learning algorithms. For comparison, the classical algorithms, Random Forest (RF) and k-nearest neighbour (KNN), are applied to an extracted texture feature, the Local Binary Pattern (LBP), and a shape feature, the Histogram of Oriented Gradients (HOG).
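
For readers unfamiliar with the texture feature mentioned above, the basic 3x3 Local Binary Pattern compares each of a pixel's eight neighbours with the centre pixel and packs the results into an 8-bit code. The sketch below is a minimal version of that basic operator; the neighbour ordering and the `lbp_code` name are conventions chosen here, and the abstract does not state which LBP variant the study uses.

```python
from typing import List

# Neighbour offsets (row, col), clockwise from the top-left pixel.
# The starting point and direction are a convention chosen for this sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: List[List[int]], row: int, col: int) -> int:
    """Basic 8-neighbour LBP code for the interior pixel at (row, col)."""
    center = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # Set the bit when the neighbour is at least as bright as the centre.
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


patch = [
    [6, 2, 7],
    [1, 5, 9],
    [3, 4, 8],
]
code = lbp_code(patch, 1, 1)
```

A histogram of these codes over an image region is what would typically be fed to a classifier such as Random Forest or KNN.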

    Involvement of mitochondrial intrinsic pathway in rhSP-D (recombinant human Surfactant Protein D) induced apoptosis of prostate cancer cells

    Surfactant protein D (SP-D), an innate immune molecule, has an indispensable role in host defense and regulation of inflammation. We reported a novel anti-cancer role of a recombinant fragment of human SP-D (rhSP-D) in leukemic and breast tumor cell lines. A recent study revealed a correlation of SP-D expression in prostate cancer tissues with increased Gleason score and tumor volume. In the present study, we elucidated the role of rhSP-D in prostate cancer using LNCaP (androgen-dependent) and PC3 (androgen-independent) cell lines and primary prostate cancer cells. In accordance with our previous finding, rhSP-D induced apoptosis in LNCaP and PC3 cell lines in a time- and dose-dependent manner. Isolated primary prostate cancer epithelial cells from explant cultures of tissue biopsies of prostate cancer patients were characterised for the presence of Cytokeratin (epithelial cell), CD10 (negative) and CD164 (positive) markers at the protein and transcript level. The anti-prostate tumor effect of rhSP-D was established in the isolated primary prostate cancer epithelial cells. Importantly, primary normal prostate epithelial cells treated with similar concentrations of rhSP-D showed no adverse effect on viability. rhSP-D upregulated phospho p53 and transcripts of Bax and reduced Bcl2 transcripts, suggesting p53-mediated apoptosis in LNCaP cells. rhSP-D induced apoptosis in PC3 cells by lowering phospho ERK1/2 levels and increasing BAD transcripts, a distinct mechanism of programmed cell death. Increased release of cytochrome c upon rhSP-D treatment confirmed the activation of the mitochondrial intrinsic apoptotic pathway in both cell types. rhSP-D treatment downregulated transcripts of Bcl2 while upregulating PUMA transcripts, suggesting p53-mediated apoptosis in primary prostate cancer cells. Also, a positive TUNEL assay confirmed induction of apoptosis by rhSP-D in cancer tissue biopsies. Collectively, our findings reveal an integral role of SP-D in immune surveillance against prostate cancer mediated by two distinct mitochondrial apoptotic mechanisms.

    Epidemiology of Plasmodium vivax
