    SubTuning: Efficient Finetuning for Multi-Task Learning

    Finetuning a pretrained model has become a standard approach for training neural networks on novel tasks, resulting in fast convergence and improved performance. In this work, we study an alternative finetuning method: instead of finetuning all the weights of the network, we train only a carefully chosen subset of layers, keeping the rest of the weights frozen at their initial (pretrained) values. We demonstrate that subset finetuning (or SubTuning) often achieves accuracy comparable to full finetuning of the model, and even surpasses full finetuning when training data is scarce. SubTuning therefore allows deploying new tasks at minimal computational cost while enjoying the benefits of finetuning the entire model. This yields a simple and effective method for multi-task learning, in which different tasks do not interfere with one another yet share most of their resources at inference time. We demonstrate the efficiency of SubTuning across multiple tasks, using different network architectures and pretraining methods.
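
To make the mechanics concrete, here is a minimal Python sketch, not the authors' implementation, in which a toy network is a dictionary of named layers holding plain lists of weights. One SGD step (`subtune_step`) updates only the layers named in `trainable`, and `task_view` builds a per-task model whose finetuned layers override the shared pretrained ones. All names, the toy weights and the gradients are hypothetical, and the rule for choosing which layers to train is not modelled: the `trainable` set is simply assumed to be given.

```python
from typing import Dict, List, Set

# Hypothetical toy "network": each named layer holds a flat list of weights.
Params = Dict[str, List[float]]


def subtune_step(params: Params, grads: Params, trainable: Set[str], lr: float) -> Params:
    """One SGD step that updates only the layers named in `trainable`.

    Every other layer is copied unchanged, so it stays at its pretrained value.
    """
    updated: Params = {}
    for name, weights in params.items():
        if name in trainable:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
        else:
            updated[name] = list(weights)  # frozen: keep the pretrained weights
    return updated


def task_view(shared: Params, task_layers: Params) -> Params:
    """Build one task's model: its own finetuned layers override the shared frozen ones."""
    model = dict(shared)  # shallow copy: frozen layers are shared by reference, not duplicated
    model.update(task_layers)
    return model


pretrained: Params = {"block1": [1.0, 2.0], "block2": [3.0], "head": [0.5]}
grads: Params = {name: [1.0] * len(w) for name, w in pretrained.items()}

# Hypothetical choice of layers to finetune for a new task "A".
tuned = subtune_step(pretrained, grads, trainable={"block2", "head"}, lr=0.5)
task_a_layers = {name: tuned[name] for name in ("block2", "head")}
model_a = task_view(pretrained, task_a_layers)
print(model_a)  # {'block1': [1.0, 2.0], 'block2': [2.5], 'head': [0.0]}
```

Because `task_view` copies references rather than weights, every task reuses the same frozen layers and stores only its own finetuned ones. This mirrors the resource-sharing argument at the level of stored weights; it does not model shared computation at inference time.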

    Sequence-to-Sequence Contrastive Learning for Text Recognition

    We propose a framework for sequence-to-sequence contrastive learning (SeqCLR) of visual representations, which we apply to text recognition. To account for the sequence-to-sequence structure, each feature map is divided into different instances over which the contrastive loss is computed. This operation enables us to contrast at a sub-word level, where from each image we extract several positive pairs and multiple negative examples. To yield effective visual representations for text recognition, we further suggest novel augmentation heuristics, different encoder architectures and custom projection heads. Experiments on handwritten text and on scene text show that when a text decoder is trained on the learned representations, our method outperforms non-sequential contrastive methods. In addition, when the amount of supervision is reduced, SeqCLR significantly improves performance compared with supervised training, and when fine-tuned with 100% of the labels, our method achieves state-of-the-art results on standard handwritten text recognition benchmarks.
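
A minimal sketch of the instance idea, assuming (our choice, not necessarily the paper's) that instances are formed by average-pooling fixed, non-overlapping windows of frame features and that the loss is a standard InfoNCE with cosine similarity and a temperature of 0.1. Instance i of one augmented view is treated as the positive for instance i of the other view, and the remaining instances of that view act as negatives; a real implementation would also draw negatives from other images in the batch and operate on learned feature maps rather than hand-written vectors. The function names and toy frames are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def window_instances(frames: List[Vector], window: int) -> List[Vector]:
    """Average-pool non-overlapping windows of `window` frames into instances.

    Trailing frames that do not fill a whole window are dropped.
    """
    instances = []
    for start in range(0, len(frames) - window + 1, window):
        chunk = frames[start:start + window]
        instances.append([sum(column) / window for column in zip(*chunk)])
    return instances


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(view_a: List[Vector], view_b: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: instance i of view_a is pulled toward instance i of view_b,
    and every other instance of view_b serves as a negative."""
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, candidate) / temperature for candidate in view_b]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(logit - peak) for logit in logits))
        total += log_sum_exp - logits[i]
    return total / len(view_a)


# Two hypothetical "augmented views" of the same four-frame feature sequence.
frames_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
frames_b = [[0.9, 0.1], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0]]
inst_a = window_instances(frames_a, window=2)
inst_b = window_instances(frames_b, window=2)
print(info_nce(inst_a, inst_b) < info_nce(inst_a, inst_b[::-1]))  # True
```

Pairing each instance with the matching window of the other view gives a much lower loss than pairing it with the wrong window, which is the signal that pushes sub-word features apart.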

    CLIPTER: Looking at the Bigger Picture in Scene Text Recognition

    Reading text in real-world scenarios often requires understanding the context surrounding it, especially when dealing with poor-quality text. However, current scene text recognizers are unaware of the bigger picture, as they operate on cropped text images. In this study, we harness the representational capabilities of modern vision-language models, such as CLIP, to provide scene-level information to the crop-based recognizer. We achieve this by fusing a rich representation of the entire image, obtained from the vision-language model, with the recognizer's word-level features via a gated cross-attention mechanism. This component gradually shifts to the context-enhanced representation, allowing for stable fine-tuning of a pretrained recognizer. We demonstrate the effectiveness of our model-agnostic framework, CLIPTER (CLIP TExt Recognition), on leading text recognition architectures and achieve state-of-the-art results across multiple benchmarks. Furthermore, our analysis highlights improved robustness to out-of-vocabulary words and enhanced generalization in low-data regimes. Comment: Accepted for publication by ICCV 202
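
The gating idea can be sketched in plain Python as residual cross-attention scaled by `tanh(alpha)`: word-level features act as queries over image-level context tokens, and with `alpha = 0` the block reduces to the identity. The tanh gate, the identity query/key/value projections and the single attention head are our simplifying assumptions, not details taken from the CLIPTER paper; the function names and toy features are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(scores: List[float]) -> List[float]:
    peak = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, context: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with identity projections:
    each word-level query attends over the image-level context tokens."""
    dim = len(queries[0])
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context])
        out.append([sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)])
    return out


def gated_fusion(word_feats: Matrix, scene_feats: Matrix, alpha: float) -> Matrix:
    """Residual fusion scaled by tanh(alpha). With alpha = 0 the recognizer's
    features pass through unchanged; a learned alpha can open the gate gradually."""
    gate = math.tanh(alpha)
    attended = cross_attention(word_feats, scene_feats)
    return [[x + gate * a for x, a in zip(row, att)] for row, att in zip(word_feats, attended)]


word_feats = [[1.0, 0.0], [0.0, 1.0]]   # toy word-level recognizer features
scene_feats = [[0.5, 0.5], [0.2, 0.8]]  # toy image-level tokens (e.g. from a CLIP image encoder)
print(gated_fusion(word_feats, scene_feats, alpha=0.0) == word_feats)  # True
print(gated_fusion(word_feats, scene_feats, alpha=1.0))
```

Starting the gate at zero means the pretrained recognizer initially behaves exactly as it did before fine-tuning, which is one way to obtain a gradual, stable shift toward the context-enhanced representation.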

    Processing DNA molecules as text

    Polymerase Chain Reaction (PCR) is the DNA equivalent of Gutenberg’s movable-type printing: both allow large-scale replication of a piece of text. De novo DNA synthesis is the DNA equivalent of mechanical typesetting: both ease the setting of text for replication. What is the DNA equivalent of the word processor? Biology labs engage daily in DNA processing (the creation of variations and combinations of existing DNA) using a plethora of labor-intensive manual methods such as site-directed mutagenesis, error-prone PCR, assembly PCR, overlap extension PCR, cleavage and ligation, homologous recombination, and others. So far, no universal method for DNA processing has been proposed and, consequently, no engineering discipline that could eliminate this manual labor has emerged. Here we present a novel operation on DNA molecules, called Y, which joins two DNA fragments into one, and show that it provides a foundation for DNA processing, as it can implement all basic text-processing operations on DNA molecules, including insert, delete, replace, cut-and-paste, and copy-and-paste. In addition, complicated DNA processing tasks, such as the creation of libraries of DNA variants, chimeras and extensions, can be accomplished with DNA processing plans consisting of multiple Y operations, which can be executed automatically under computer control. The resulting DNA processing system, which incorporates our earlier work on recursive DNA composition and error correction, is the first demonstration of a unified approach to DNA synthesis, editing, and library construction.
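
At the level of text, the claim that a single join suffices can be illustrated by a short Python sketch in which `y_join` simply concatenates two fragments and every editing operation is composed from joins of sub-fragments. This is a deliberately abstract model: it assumes any contiguous sub-fragment can be obtained (in the lab this would require amplification with suitable primers), it ignores the biochemistry of the actual Y protocol, and all function names are hypothetical.

```python
VALID_BASES = set("ACGT")


def y_join(left: str, right: str) -> str:
    """Hypothetical text-level stand-in for the Y operation: join two fragments into one."""
    if not set(left + right) <= VALID_BASES:
        raise ValueError("fragments must contain only A, C, G, T")
    return left + right


def insert(seq: str, pos: int, fragment: str) -> str:
    return y_join(seq[:pos], y_join(fragment, seq[pos:]))


def delete(seq: str, start: int, end: int) -> str:
    return y_join(seq[:start], seq[end:])


def replace(seq: str, start: int, end: int, fragment: str) -> str:
    return y_join(y_join(seq[:start], fragment), seq[end:])


def cut_and_paste(seq: str, start: int, end: int, dest: int) -> str:
    """Remove seq[start:end], then insert it at index `dest` of the shortened sequence."""
    return insert(delete(seq, start, end), dest, seq[start:end])


def copy_and_paste(seq: str, start: int, end: int, dest: int) -> str:
    """Insert a copy of seq[start:end] at index `dest` of the original sequence."""
    return insert(seq, dest, seq[start:end])


print(insert("ACGT", 2, "TT"))               # ACTTGT
print(cut_and_paste("AACCGG", 0, 2, 2))      # CCAAGG
```

Cut-and-paste is expressed as a delete followed by an insert, and copy-and-paste as an insert of an existing sub-fragment, so every operation above reduces to a sequence of joins.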

    Virtual reality utilization for left atrial appendage occluder device size prediction

    Aim: To explore the feasibility and accuracy of virtual reality (VR) derived from cardiac computed tomography angiography (CCTA) data in predicting left atrial appendage occlusion (LAAO) device size. Method: Retrospective data of patients who underwent LAAO according to clinical indication were reviewed; all patients underwent pre-procedural CCTA. Measurements of the left atrial appendage (LAA) orifice diameters by CCTA, VR, and transesophageal echocardiography (TEE, acquired during the procedure) were compared with the implanted device size. The LAA perimeter was calculated using the Ramanujan approximation. Statistical analyses included Lin's concordance correlation coefficient (ρc), the mean difference, and the mean square error (MSE). Results: The sample comprised 20 patients (mean age 75.7 ± 7.5 years, 60% male) who underwent successful LAAO insertion (ACP™ N = 8, Watchman™ N = 12). For the maximal diameter, ρc was 0.52, 0.78, and 0.60 for CCTA, VR, and TEE, respectively, with mean differences of +0.92 ± 4.0 mm, −1.12 ± 2.3 mm, and −3.45 ± 2.69 mm. For the calculated perimeter, ρc was 0.49, 0.54, and 0.39, respectively, with mean differences of +4.69 ± 11.5 mm, −9.88 ± 8.0 mm, and −16.79 ± 7.8 mm. Discussion: VR visualization of the LAA ostium from different perspectives allows a better understanding of its funnel-shaped structure. VR measurement of the maximal ostium diameter had the strongest correlation with the diameter of the inserted device. VR may thus provide new imaging possibilities for evaluating complex pre-procedural structures such as the LAA.
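
The two calculations named in the Method can be sketched in Python. The abstract does not say which of Ramanujan's ellipse formulas was used, so the sketch assumes his first approximation, P ≈ π[3(a + b) − √((3a + b)(a + 3b))], with the semi-axes a and b taken as half the maximal and minimal orifice diameters. Lin's coefficient is computed with population (1/n) moments, ρc = 2s_xy / (s_x² + s_y² + (x̄ − ȳ)²). Function names and example values are hypothetical and are not study data.

```python
import math
from typing import Sequence


def ramanujan_perimeter(max_diameter: float, min_diameter: float) -> float:
    """Ellipse perimeter via Ramanujan's first approximation (assumed variant):
    P ~ pi * (3(a + b) - sqrt((3a + b)(a + 3b))), with semi-axes a, b = diameters / 2."""
    a, b = max_diameter / 2, min_diameter / 2
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))


def lins_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient with population (1/n) moments:
    rho_c = 2 * s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x) / n
    syy = sum((v - my) ** 2 for v in y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


print(round(ramanujan_perimeter(20.0, 20.0), 2))               # 62.83 (a 20 mm circle: pi * 20)
print(round(lins_ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]), 3))    # 0.571
```

For the toy pair [1, 2, 3] and [2, 3, 4], Pearson's r is 1, but ρc is 4/7 ≈ 0.571 because Lin's coefficient also penalizes the constant 1 mm offset, which is why concordance rather than plain correlation is a natural choice for judging agreement with device size.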