SubTuning: Efficient Finetuning for Multi-Task Learning
Finetuning a pretrained model has become a standard approach for training
neural networks on novel tasks, resulting in fast convergence and improved
performance. In this work, we study an alternative finetuning method, where
instead of finetuning all the weights of the network, we only train a carefully
chosen subset of layers, keeping the rest of the weights frozen at their
initial (pretrained) values. We demonstrate that \emph{subset finetuning} (or
SubTuning) often achieves accuracy comparable to full finetuning of the model,
and even surpasses the performance of full finetuning when training data is
scarce. Therefore, SubTuning allows deploying new tasks at minimal
computational cost, while enjoying the benefits of finetuning the entire model.
This yields a simple and effective method for multi-task learning, where
different tasks do not interfere with one another, and yet share most of the
resources at inference time. We demonstrate the efficiency of SubTuning across
multiple tasks, using different network architectures and pretraining methods.
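As a rough illustration (not the paper's implementation), training only a chosen subset of layers can be expressed as a trainability mask over named parameters. The function `subtune` and the layer names below are hypothetical:

```python
def subtune(param_names, trainable_layers):
    """Return a trainability mask: True only for parameters whose name
    falls under one of the chosen layers; everything else stays frozen
    at its pretrained value."""
    return {
        name: any(name == layer or name.startswith(layer + ".")
                  for layer in trainable_layers)
        for name in param_names
    }
```

In a PyTorch training loop, one would then set `p.requires_grad = mask[name]` for each named parameter before building the optimizer, so gradients flow only through the selected subset.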
Sequence-to-Sequence Contrastive Learning for Text Recognition
We propose a framework for sequence-to-sequence contrastive learning (SeqCLR) of visual representations, which we apply to text recognition. To account for the sequence-to-sequence structure, each feature map is divided into different instances over which the contrastive loss is computed. This operation enables us to contrast at the sub-word level, where from each image we extract several positive pairs and multiple negative examples. To yield effective visual representations for text recognition, we further suggest novel augmentation heuristics, different encoder architectures, and custom projection heads. Experiments on handwritten text and on scene text show that when a text decoder is trained on the learned representations, our method outperforms non-sequential contrastive methods. In addition, when the amount of supervision is reduced, SeqCLR significantly improves performance compared with supervised training, and when fine-tuned with 100% of the labels, our method achieves state-of-the-art results on standard handwritten text recognition benchmarks.
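A minimal sketch of the instance-splitting step, assuming a feature map already flattened to a sequence of frame vectors. The even-chunking rule used here is an assumption for illustration; the paper explores several instance-mapping schemes:

```python
def split_instances(frames, num_instances):
    """Split a sequence of T frame features into num_instances contiguous
    chunks, so the contrastive loss can compare sub-word instances rather
    than whole-image representations."""
    total = len(frames)
    base, remainder = divmod(total, num_instances)
    chunks, start = [], 0
    for i in range(num_instances):
        size = base + (1 if i < remainder else 0)  # spread leftover frames
        chunks.append(frames[start:start + size])
        start += size
    return chunks
```

Each chunk then serves as one instance: chunks from the same position in two augmented views form a positive pair, and all other chunks act as negatives.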
CLIPTER: Looking at the Bigger Picture in Scene Text Recognition
Reading text in real-world scenarios often requires understanding the context
surrounding it, especially when dealing with poor-quality text. However,
current scene text recognizers are unaware of the bigger picture as they
operate on cropped text images. In this study, we harness the representative
capabilities of modern vision-language models, such as CLIP, to provide
scene-level information to the crop-based recognizer. We achieve this by fusing
a rich representation of the entire image, obtained from the vision-language
model, with the recognizer word-level features via a gated cross-attention
mechanism. This component gradually shifts to the context-enhanced
representation, allowing for stable fine-tuning of a pretrained recognizer. We
demonstrate the effectiveness of our model-agnostic framework, CLIPTER (CLIP
TExt Recognition), on leading text recognition architectures and achieve
state-of-the-art results across multiple benchmarks. Furthermore, our analysis
highlights improved robustness to out-of-vocabulary words and enhanced
generalization in low-data regimes.
Comment: Accepted for publication by ICCV 202
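The stability claim rests on the gating: at initialization the gate contributes nothing, so the pretrained recognizer starts unchanged and shifts gradually toward the context-enhanced representation. A simplified sketch with a scalar tanh gate (the full model uses gated cross-attention over the vision-language representation; the scalar gate here is our simplification):

```python
import math

def gated_fuse(word_feat, ctx_feat, gate):
    """Gated residual fusion: output = word + tanh(gate) * context.
    With the gate parameter initialized to zero, the output equals the
    original word-level features, so fine-tuning starts from the
    pretrained recognizer's behavior."""
    g = math.tanh(gate)
    return [w + g * c for w, c in zip(word_feat, ctx_feat)]
```

As training increases the gate parameter, the fused features interpolate toward the scene-level context.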
Processing DNA molecules as text
Polymerase Chain Reaction (PCR) is the DNA-equivalent of Gutenberg’s movable type printing, both allowing large-scale replication of a piece of text. De novo DNA synthesis is the DNA-equivalent of mechanical typesetting, both easing the setting of text for replication. What is the DNA-equivalent of the word processor? Biology labs engage daily in DNA processing—the creation of variations and combinations of existing DNA—using a plethora of manual labor-intensive methods such as site-directed mutagenesis, error-prone PCR, assembly PCR, overlap extension PCR, cleavage and ligation, homologous recombination, and others. So far no universal method for DNA processing has been proposed and, consequently, no engineering discipline that could eliminate this manual labor has emerged. Here we present a novel operation on DNA molecules, called Y, which joins two DNA fragments into one, and show that it provides a foundation for DNA processing as it can implement all basic text processing operations on DNA molecules, including insert, delete, replace, cut-and-paste, and copy-and-paste. In addition, complicated DNA processing tasks such as the creation of libraries of DNA variants, chimeras and extensions can be accomplished with DNA processing plans consisting of multiple Y operations, which can be executed automatically under computer control. The resulting DNA processing system, which incorporates our earlier work on recursive DNA composition and error correction, is the first demonstration of a unified approach to DNA synthesis, editing, and library construction.
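The reduction of editing operations to the join primitive can be seen in a plain-text analogue (the function names are hypothetical; the molecular Y operation itself is, of course, not string concatenation):

```python
def y_join(left, right):
    """Text-level analogue of the Y operation: join two fragments into one."""
    return left + right

def replace(seq, start, end, new):
    """Replace seq[start:end] with new, built from two joins:
    first prefix + new, then + suffix."""
    return y_join(y_join(seq[:start], new), seq[end:])

def insert(seq, pos, new):
    """Insert is a replace over an empty region."""
    return replace(seq, pos, pos, new)

def delete(seq, start, end):
    """Delete is a replace with an empty fragment."""
    return replace(seq, start, end, "")
```

Insert, delete, and replace all bottom out in joins of fragments, mirroring how the abstract positions Y as a foundation for the other operations.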
Virtual reality utilization for left atrial appendage occluder device size prediction
Aim: To explore the feasibility and accuracy of virtual reality (VR) derived from cardiac computed tomography angiography (CCTA) data to predict left atrial appendage occlusion (LAAO) device size. Method: Retrospective data of patients who underwent LAAO according to clinical indication were reviewed; all patients underwent a pre-procedural CCTA. Measurements of the left atrial appendage (LAA) orifice diameters by CCTA, VR, and transesophageal echocardiography (TEE) (acquired during the procedure) were compared to the implanted device size. The LAA perimeter was calculated using the Ramanujan approximation. Statistical analyses included Lin's Concordance Correlation Coefficient (ρc), the mean difference, and the mean square error (MSE). Results: The sample was composed of 20 patients (mean age 75.7 ± 7.5 years, 60% males) who underwent successful LAAO insertion (ACP™ N = 8, Watchman™ N = 12). The CCTA, VR, and TEE maximal diameter ρc values were 0.52, 0.78, and 0.60, respectively, with mean differences of +0.92 ± 4.0 mm, −1.12 ± 2.3 mm, and −3.45 ± 2.69 mm, respectively. The CCTA, VR, and TEE perimeter calculation ρc values were 0.49, 0.54, and 0.39, respectively, with mean differences of +4.69 ± 11.5 mm, −9.88 ± 8.0 mm, and −16.79 ± 7.8 mm, respectively. Discussion: A VR visualization of the LAA ostium in different perspectives allows for a better understanding of its funnel-shaped structure. VR measurement of the maximal ostium diameter had the strongest correlation with the diameter of the inserted device. VR may thus provide new imaging possibilities for the evaluation of complex pre-procedural structures such as the LAA.
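The Ramanujan approximation referenced for the perimeter calculation can be sketched as below, treating the orifice as an ellipse with semi-axes taken as half the measured maximal and minimal diameters (that diameter-to-semi-axis mapping is our assumption; the abstract does not spell out the inputs):

```python
import math

def ramanujan_perimeter(d_max, d_min):
    """Ramanujan's first approximation to the perimeter of an ellipse
    with semi-axes a and b, here half the measured orifice diameters:
    P ~ pi * (3(a + b) - sqrt((3a + b)(a + 3b)))."""
    a, b = d_max / 2.0, d_min / 2.0
    return math.pi * (3.0 * (a + b) - math.sqrt((3.0 * a + b) * (a + 3.0 * b)))
```

For a circular orifice (equal diameters) the formula reduces exactly to the circle circumference, which is a quick sanity check on any implementation.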