
    Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

    Scene text recognition (STR) in the wild frequently encounters challenges when coping with domain variations, font diversity, shape deformations, etc. A straightforward solution is performing model fine-tuning tailored to a specific scenario, but it is computationally intensive and requires multiple model copies for various scenarios. Recent studies indicate that large language models (LLMs) can learn from a few demonstration examples in a training-free manner, termed "In-Context Learning" (ICL). Nevertheless, applying LLMs as a text recognizer is unacceptably resource-consuming. Moreover, our pilot experiments on LLMs show that ICL fails in STR, mainly attributed to the insufficient incorporation of contextual information from diverse samples in the training stage. To this end, we introduce E²STR, an STR model trained with context-rich scene text sequences, where the sequences are generated via our proposed in-context training strategy. E²STR demonstrates that a regular-sized model is sufficient to achieve effective ICL capabilities in STR. Extensive experiments show that E²STR exhibits remarkable training-free adaptation in various scenarios and outperforms even the fine-tuned state-of-the-art approaches on public benchmarks. The code is released at https://github.com/bytedance/E2STR. Comment: Accepted to CVPR202
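The training-free adaptation described above relies on retrieving demonstrations similar to the query. As a minimal sketch of nearest-neighbor demonstration selection (not the released E2STR code; the `demo_pool` structure, field names, and cosine retrieval are all assumptions for illustration):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def select_demos(query_emb, demo_pool, k=2):
    """Rank the pool by similarity to the query embedding and keep the
    top-k (image, text) pairs as in-context demonstrations."""
    ranked = sorted(demo_pool, key=lambda d: cosine(query_emb, d["emb"]),
                    reverse=True)
    return [(d["image"], d["text"]) for d in ranked[:k]]
```

The selected pairs would be concatenated before the query image at inference time, which is what makes the adaptation training-free.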

    Non-Parametric Change-Point Method for Differential Gene Expression Detection

    We propose a non-parametric method, named Non-Parametric Change Point Statistic (NPCPS), that uses a single equation for detecting differential gene expression (DGE) in microarray data. NPCPS is based on change-point theory and provides effective DGE detection. NPCPS takes the data distribution of the normal samples as input and detects DGE in the cancer samples by locating the change point of the gene expression profile. An estimate of the change-point position generated by NPCPS enables identification of the samples containing DGE. Monte Carlo simulation and an ROC study were applied to examine the detection accuracy of NPCPS, and an experiment on real microarray data of breast cancer was carried out to compare NPCPS with other methods. The simulation study indicated that NPCPS was more effective at detecting DGE in the cancer subset than five parametric methods and one non-parametric method. When there were more than 8 cancer samples containing DGE, the type I error of NPCPS was below 0.01. Experimental results showed both good accuracy and reliability for NPCPS. Of the 30 top genes ranked by NPCPS, 16 were reported as relevant to cancer. Correlations between the detection results of NPCPS and the compared methods were less than 0.05, while between the other methods the values ranged from 0.20 to 0.84. This indicates that NPCPS works on different features and thus identifies DGE from a distinct perspective compared with the other mean- or median-based methods.
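The change-point idea behind such a detector can be illustrated with a generic statistic: deviations from the reference (normal-sample) median are accumulated, and the position maximizing the centered cumulative sum estimates where the expression profile shifts. This is a textbook CUSUM sketch, not the actual NPCPS equation:

```python
import statistics

def change_point(expr, reference):
    """Estimate how many leading samples match the reference distribution,
    via a centered CUSUM of deviations from the reference median
    (a generic change-point statistic, not the NPCPS equation)."""
    ref_med = statistics.median(reference)
    d = [x - ref_med for x in expr]
    n, total = len(d), sum(d)
    s, best_k, best_stat = 0.0, 0, -1.0
    for k in range(1, n + 1):
        s += d[k - 1]
        stat = abs(s - k * total / n)  # centered CUSUM at position k
        if stat > best_stat:
            best_stat, best_k = stat, k
    return best_k
```

On a profile whose last samples jump away from the reference median, the statistic peaks at the boundary between the two regimes.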

    Numerical Simulation of Contaminated Sediment Transport in the Upper Columbia Slough System

    An integrated modeling system has been developed to simulate sediment and contaminated sediment transport in the Upper Columbia Slough System during the Flood of 1996. The modeling system consists of a hydrodynamic model, a sediment transport model, and a contaminant transport model. The hydrodynamic model predicts the Slough flow dynamics and sediment transporting power; the sediment transport model predicts flow-induced sediment transport; and the contaminant transport model predicts the migration of contaminated sediments in the Slough system. The Upper Columbia Slough modeling system is characterized by its recognition of the complex interplay of Slough hydrology, hydrodynamics, sediment transport, and contaminant transport. The modeling system is also characterized by its ability to simulate both cohesive and noncohesive sediments and the associated contaminant transport. The hydrodynamic model was calibrated using water levels obtained over a two-month dry period in 1993. The sediment transport model was not calibrated; instead, it used typical values from an extensive literature review and from the calibration results of the Lower Columbia Slough sediment transport model to specify the parameters governing deposition and resuspension processes. The model results indicate that the hydraulic power in the Upper Slough is weak under normal conditions, and existing sediment in the Slough moves very little unless there is a major storm. Large winter storm events dominate Upper Slough dynamics and dictate sediment transport. The model predicted significant sediment resuspension but minimal bedload transport downstream of the Mid-Dike and upstream of MCDD#4 on the Slough main channel during the 1996 flood. The model predicted little or no sediment transport throughout the southern arm system. The study also concluded that none of the contaminated sediment priority sites would have significant sediment transport during the major storm events. The study points out that future model improvement should be focused on calibrating the key model parameters that determine sedimentation rates, such as the critical shear stresses for erosion and deposition.
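The deposition and resuspension processes governed by those critical shear stresses are commonly parameterized with threshold formulas. A minimal sketch, assuming standard Krone-type deposition and Partheniades-type erosion with illustrative (not calibrated) parameter values:

```python
def sediment_flux(tau_b, conc, w_s, tau_ce=0.2, tau_cd=0.06, M=1e-4):
    """Net bed exchange flux (kg/m^2/s): Partheniades-type erosion when
    bed shear tau_b (Pa) exceeds the critical erosion stress tau_ce,
    Krone-type deposition when tau_b falls below the critical deposition
    stress tau_cd. conc is suspended concentration (kg/m^3), w_s the
    settling velocity (m/s), M the erosion rate constant.
    All default values are illustrative, not the Slough calibration."""
    erosion = M * (tau_b / tau_ce - 1.0) if tau_b > tau_ce else 0.0
    deposition = w_s * conc * (1.0 - tau_b / tau_cd) if tau_b < tau_cd else 0.0
    return erosion - deposition  # positive: net resuspension
```

This structure explains the study's conclusion: under weak normal-condition flows the bed shear sits between the two thresholds and little moves, while storm flows push it past the erosion threshold.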

    Calculation Model for the Exit Decision Sight Distance of Right-Turn Ramps on the Left at Interchange

    Given the rapid construction of freeways in developing countries such as China, land use is constantly under strict constraints, leading to challenges in adopting conventional layouts for interchanges. Implementing right-turn ramps on the left (RTRL) at interchanges can minimize land occupancy; however, the traffic safety level of this type of diversion area design requires extra attention. This study examines the decision sight distance for right-turn exit ramps on the left side. Utilizing unmanned aerial vehicle (UAV) video and the YOLOv3 target detection algorithm, the original trajectory data of vehicles in the diversion area are extracted. Employing Kalman filtering and Frenet coordinate system conversion reveals microscopic vehicle lane-change patterns, velocities, and time headways. Furthermore, a driving simulation experiment assesses driver behaviors in RTRL, with subjective, task performance, and physiological measure indicators. Ultimately, the range of the decision sight distance is defined, and a calculation model is established by determining the relevant parameters from measured data and simulation outcomes. The results indicate potential insufficiencies in the decision sight distance when standardized values are applied to RTRL.
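A generic calculation model of this kind splits decision sight distance into a pre-maneuver leg (travel during perception-reaction time) and a maneuver leg (braking distance). The sketch below assumes this textbook decomposition with illustrative default parameters, not the paper's fitted values:

```python
def decision_sight_distance(speed_kmh, t_pr=6.0, decel=3.4):
    """Decision sight distance (m) = distance traveled during the
    perception-reaction time t_pr (s) + braking distance at a
    comfortable deceleration decel (m/s^2). Default t_pr and decel
    are illustrative textbook-style values, not the paper's results."""
    v = speed_kmh / 3.6  # design speed: km/h -> m/s
    return v * t_pr + v * v / (2.0 * decel)
```

For an 80 km/h design speed this yields roughly 206 m; the study's point is that standardized parameter values like these may understate what RTRL diversion areas actually require.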

    A Benchmark for Morphological Segmentation in Uyghur and Kazakh

    Morphological segmentation and stemming are foundational tasks in natural language processing. They have become effective ways to alleviate data sparsity in agglutinative languages because of the nature of agglutinative word formation. Uyghur and Kazakh, as typical agglutinative languages, have seen significant progress in morphological segmentation and stemming in recent years. However, the evaluation metrics used in previous work are character-level based, which may not comprehensively reflect the performance of models in morphological segmentation or stemming. Moreover, existing methods avoid manual feature extraction, but the models' ability to learn features is inadequate in complex scenarios, and the correlation between different features has not been considered. Consequently, these models lack representation in complex contexts, affecting their effective generalization in practical scenarios. To address these issues, this paper redefines morphological-level evaluation metrics: F1-score and accuracy (ACC) for the morphological segmentation and stemming tasks. In addition, two models are proposed for the morpheme segmentation and stem extraction tasks: a supervised model and an unsupervised model. The supervised model learns character and contextual features simultaneously; the feature embeddings are then input into a Transformer encoder to model the correlation between character and context embeddings. The last layer of the model uses a CRF or softmax layer to determine morphological boundaries. In the unsupervised model, an encoder–decoder structure introduces n-gram correlation assumptions and masked attention mechanisms, enhancing the correlation between characters within n-grams and reducing the impact of characters outside n-grams on boundaries. Finally, comprehensive comparative analyses of the performance of different models are conducted from various points of view. Experimental results demonstrate that: (1) the proposed evaluation method effectively reflects the differences in morphological segmentation and stemming for Uyghur and Kazakh; (2) learning different features and their correlation can enhance the model's generalization ability in complex contexts. The proposed models achieve state-of-the-art performance on Uyghur and Kazakh datasets.
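The difference between character-level and morphological-level scoring can be illustrated by matching whole morpheme spans rather than individual characters. A minimal sketch, assuming span-exact matching (the paper's exact metric definition may differ, and the example word is hypothetical):

```python
def spans(morphs):
    """Convert a list of morphemes into (start, end) character spans."""
    out, i = [], 0
    for m in morphs:
        out.append((i, i + len(m)))
        i += len(m)
    return out

def morpheme_f1(gold, pred):
    """Morpheme-level F1: a predicted morpheme counts as correct only if
    its full span matches a gold morpheme span, so one wrong boundary
    penalizes both morphemes it touches (unlike character-level scores)."""
    g, p = set(spans(gold)), set(spans(pred))
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

For example, segmenting a three-morpheme word into two morphemes gets only the shared first span right, giving F1 = 0.4 even though most characters fall in "correct" regions.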

    Contrastive Centroid Supervision Alleviates Domain Shift in Medical Image Classification

    Deep learning based medical imaging classification models usually suffer from the domain shift problem, where classification performance drops when training data and real-world data differ in imaging equipment manufacturer, image acquisition protocol, patient populations, etc. We propose Feature Centroid Contrast Learning (FCCL), which can improve target-domain classification performance through extra supervision during training with a contrastive loss between each instance and its class centroid. Compared with current unsupervised domain adaptation and domain generalization methods, FCCL performs better while requiring only labeled image data from a single source domain and none from the target domain. We verify through extensive experiments that FCCL achieves superior performance on at least three imaging modalities, i.e., fundus photographs, dermatoscopic images, and H&E tissue images.
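Instance-to-centroid contrastive supervision can be sketched as a softmax over similarities between one feature and all class centroids: the loss is small when the feature aligns with its own class centroid and large otherwise. This is a generic InfoNCE-style formulation assumed for illustration, not the FCCL implementation:

```python
import math

def centroid_contrastive_loss(feat, centroids, label, temp=0.1):
    """InfoNCE-style loss contrasting one instance feature against all
    class centroids: pulls feat toward centroids[label], pushes it away
    from the rest. temp is the softmax temperature."""
    def cos(u, v):
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return sum(a * b for a, b in zip(u, v)) / (nu * nv)

    logits = [cos(feat, c) / temp for c in centroids]
    m = max(logits)  # shift for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[label]  # -log softmax at the true class
```

Minimizing this alongside the ordinary classification loss tightens each class cluster around its centroid, which is the mechanism credited here for robustness to domain shift.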