Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
Scene text recognition (STR) in the wild frequently encounters challenges
when coping with domain variations, font diversity, shape deformations, etc. A
straightforward solution is performing model fine-tuning tailored to a specific
scenario, but it is computationally intensive and requires multiple model
copies for various scenarios. Recent studies indicate that large language
models (LLMs) can learn from a few demonstration examples in a training-free
manner, termed "In-Context Learning" (ICL). Nevertheless, applying LLMs as a
text recognizer is unacceptably resource-consuming. Moreover, our pilot
experiments on LLMs show that ICL fails in STR, mainly attributed to the
insufficient incorporation of contextual information from diverse samples in
the training stage. To this end, we introduce ESTR, a STR model trained
with context-rich scene text sequences, where the sequences are generated via
our proposed in-context training strategy. ESTR demonstrates that a
regular-sized model is sufficient to achieve effective ICL capabilities in STR.
Extensive experiments show that ESTR exhibits remarkable training-free
adaptation in various scenarios and outperforms even the fine-tuned
state-of-the-art approaches on public benchmarks. The code is released at
https://github.com/bytedance/E2STR .Comment: Accepted to CVPR202
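The abstract describes in-context learning only at a high level; one common ingredient of ICL pipelines is retrieving the demonstration examples most similar to the query. The sketch below is a generic cosine-similarity retrieval over precomputed feature vectors, not E2STR's actual selection strategy; the function name and feature representation are illustrative assumptions.

```python
import numpy as np

def select_in_context_examples(query_feat, pool_feats, k=2):
    """Pick the k pool samples whose features are most similar (cosine
    similarity) to the query; these would serve as in-context
    demonstrations prepended to the recognizer's input."""
    q = query_feat / np.linalg.norm(query_feat)
    p = pool_feats / np.linalg.norm(pool_feats, axis=1, keepdims=True)
    sims = p @ q                      # cosine similarity to each pool sample
    return np.argsort(-sims)[:k]      # indices of the k nearest samples
```

In a training-free adaptation setting, the selected samples (image plus label) would be concatenated with the query as context, so the model can adapt to a new domain without fine-tuning.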
Non-Parametric Change-Point Method for Differential Gene Expression Detection
We propose a non-parametric method, the Non-Parametric Change Point
Statistic (NPCPS for short), which uses a single equation to detect
differential gene expression (DGE) in microarray data. NPCPS is based on
change point theory, which provides effective DGE detection ability.
NPCPS takes the data distribution of the normal samples as input and detects
DGE in the cancer samples by locating the change point of the gene expression
profile. The estimate of the change point position generated by NPCPS enables
the identification of the samples containing DGE. Monte Carlo simulation and
an ROC study were applied to examine the detection accuracy of NPCPS, and an
experiment on real microarray data of breast cancer was carried out to
compare NPCPS with other methods.
The simulation study indicated that NPCPS was more effective at detecting DGE in
the cancer subset than five parametric methods and one non-parametric
method. When more than 8 cancer samples contained DGE, the type
I error of NPCPS was below 0.01. The experimental results showed both good
accuracy and reliability of NPCPS. Of the 30 top genes ranked by
NPCPS, 16 were reported as relevant to cancer. Correlations between
the detection results of NPCPS and the compared methods were less than 0.05,
while correlations among the other methods ranged from 0.20 to 0.84. This
indicates that NPCPS works on different features and thus identifies DGE
from a distinct perspective compared with the other mean- or median-based
methods
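The abstract does not give NPCPS's single equation, but the core idea of locating a change point in an ordered expression profile can be illustrated with the classic CUSUM statistic: the change point is taken where the cumulative deviation from the global mean is largest. This is a generic sketch of that idea, not the paper's specific statistic.

```python
import numpy as np

def change_point_estimate(x):
    """Estimate where the mean of a series shifts: compute the cumulative
    sum of deviations from the global mean and return the number of
    samples before the shift (the index of maximum |CUSUM|, plus one)."""
    x = np.asarray(x, dtype=float)
    s = np.cumsum(x - x.mean())       # CUSUM of deviations
    return int(np.argmax(np.abs(s))) + 1
```

On a gene expression profile with normal samples first, the returned position separates the normal-like samples from those carrying the expression change, which is how a change-point estimate can flag the samples containing DGE.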
Numerical Simulation of Contaminated Sediment Transport in the Upper Columbia Slough System
An integrated modeling system has been developed to simulate sediment and contaminated sediment transport in the Upper Columbia Slough System during the Flood of 1996. The modeling system consists of a hydrodynamic model, a sediment transport model, and a contaminant transport model. The hydrodynamic model predicts the Slough flow dynamics and sediment transporting power; the sediment transport model predicts flow-induced sediment transport; and the contaminant transport model predicts the migration of contaminated sediments in the Slough system.
The Upper Columbia Slough modeling system is characterized by its recognition of the complex interplay of Slough hydrology, hydrodynamics, sediment transport, and contaminant transport. The modeling system is also characterized by its ability to simulate both cohesive and noncohesive sediments and the associated contaminant transport.
The hydrodynamic model was calibrated using water level data obtained over a two-month dry period in 1993. The sediment transport model was not calibrated; instead, it used typical values from an extensive literature review and from the calibration results of the Lower Columbia Slough sediment transport model to specify the parameters governing deposition and resuspension processes.
The model results indicate that the hydraulic power in the Upper Slough is weak under normal conditions, and existing sediment in the Slough moves very little unless there is a major storm. Large winter storm events dominate Upper Slough dynamics and dictate sediment transport. The model predicted significant sediment resuspension but minimal bedload transport downstream of the Mid-Dike and upstream of MCDD#4 on the Slough main channel during the 1996 flood. The model predicted little or no sediment transport throughout the southern arm system. The study also concluded that none of the contaminated sediment priority sites would experience significant sediment transport during the major storm events.
The study points out that future model improvement should focus on calibrating the key model parameters that determine sedimentation rates, such as the critical shear stresses for erosion and deposition.
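The deposition and resuspension processes governed by critical shear stress can be sketched with the standard cohesive-sediment flux formulations (Partheniades-type erosion, Krone-type deposition). This is a textbook illustration under assumed parameter names, not the model's actual implementation.

```python
def erosion_flux(tau, tau_ce, M=1e-4):
    """Partheniades-type erosion: bed sediment resuspends only when the
    bed shear stress tau exceeds the critical erosion stress tau_ce.
    M is an empirical erosion-rate constant (kg/m^2/s)."""
    return M * (tau / tau_ce - 1.0) if tau > tau_ce else 0.0

def deposition_flux(tau, tau_cd, ws, conc):
    """Krone-type deposition: suspended sediment settles (at fall velocity
    ws, concentration conc) only when the shear stress is below the
    critical deposition stress tau_cd."""
    return ws * conc * (1.0 - tau / tau_cd) if tau < tau_cd else 0.0
```

Calibrating tau_ce and tau_cd directly controls when each flux switches on, which is why the study flags these parameters as the priority for future calibration.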
Calculation Model for the Exit Decision Sight Distance of Right-Turn Ramps on the Left at Interchange
Given the rapid construction of freeways in developing countries such as China, land use is constantly under strict constraints, leading to challenges in adopting conventional layouts for interchanges. Implementing right-turn ramps on the left (RTRL) at interchanges can minimize land occupancy; however, the traffic safety level in this type of diversion area design requires extra attention. This study examines the decision sight distance for right-turn exit ramps on the left side. Utilizing unmanned aerial vehicle (UAV) video and the YOLOv3 target detection algorithm, the original trajectory data of vehicles in the diversion area were extracted. Employing Kalman filtering and Frenet coordinate system conversion reveals microscopic vehicle lane-change patterns, velocities, and time headways. Furthermore, a driving simulation experiment assesses driver behavior in RTRL using subjective, task performance, and physiological measure indicators. Ultimately, the range of the decision sight distance is defined, and a calculation model is established by determining the relevant parameters from measured data and simulation outcomes. The results indicate that the decision sight distance may be insufficient when standardized values are applied to RTRL.
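The trajectory-processing step (Kalman filtering of noisy detections) can be illustrated with a minimal constant-velocity Kalman filter over 1-D positions. This is a generic sketch with assumed noise parameters, not the study's actual filter configuration.

```python
import numpy as np

def kalman_smooth_1d(zs, dt=1.0, q=1e-3, r=0.25):
    """Constant-velocity Kalman filter over noisy 1-D position samples zs;
    returns the filtered positions. q: process noise, r: measurement
    noise variance (both assumed values for illustration)."""
    F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition (pos, vel)
    H = np.array([[1.0, 0.0]])              # observe position only
    Q = q * np.eye(2)
    x = np.array([[zs[0]], [0.0]])          # initial state estimate
    P = np.eye(2)                           # initial covariance
    out = []
    for z in zs:
        x = F @ x                           # predict
        P = F @ P @ F.T + Q
        S = (H @ P @ H.T)[0, 0] + r         # innovation variance
        K = P @ H.T / S                     # Kalman gain
        x = x + K * (z - (H @ x)[0, 0])     # update with measurement
        P = (np.eye(2) - K @ H) @ P
        out.append(x[0, 0])
    return out
```

Applied independently to the longitudinal and lateral coordinates after Frenet conversion, such a filter suppresses detection jitter before lane-change patterns and velocities are derived.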
A Benchmark for Morphological Segmentation in Uyghur and Kazakh
Morphological segmentation and stemming are foundational tasks in natural language processing. Because of the nature of agglutinative word formation, they have become effective ways to alleviate data sparsity in agglutinative languages. Uyghur and Kazakh, as typical agglutinative languages, have seen significant progress in morphological segmentation and stemming in recent years. However, the evaluation metrics used in previous work are character-level based, which may not comprehensively reflect the performance of models in morphological segmentation or stemming. Moreover, existing methods avoid manual feature extraction, but their ability to learn features is inadequate in complex scenarios, and the correlation between different features has not been considered. Consequently, these models lack representation in complex contexts, affecting their generalization in practical scenarios. To address these issues, this paper redefines morphological-level evaluation metrics, F1-score and accuracy (ACC), for the morphological segmentation and stemming tasks. In addition, two models are proposed for the morpheme segmentation and stem extraction tasks: a supervised model and an unsupervised model. The supervised model learns character and contextual features simultaneously, and the feature embeddings are then input into a Transformer encoder to study the correlation between character and context embeddings. The last layer of the model uses a CRF or softmax layer to determine morphological boundaries. In unsupervised learning, an encoder-decoder structure introduces n-gram correlation assumptions and masked attention mechanisms, enhancing the correlation between characters within n-grams and reducing the impact of characters outside n-grams on boundaries. Finally, comprehensive comparative analyses of the performance of the different models are conducted from various points of view.
Experimental results demonstrate that: (1) the proposed evaluation method effectively reflects the differences in morphological segmentation and stemming for Uyghur and Kazakh; and (2) learning different features and their correlation can enhance the model's generalization ability in complex contexts. The proposed models achieve state-of-the-art performance on Uyghur and Kazakh datasets.
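A morpheme-level F1, as opposed to a character-level one, scores whole morphemes: a predicted morpheme counts only if it matches a gold morpheme exactly. The sketch below is one common multiset-matching formulation; the paper's precise definition may differ, and the Uyghur-like example morphemes in the test are illustrative assumptions.

```python
from collections import Counter

def morpheme_f1(pred, gold):
    """Morpheme-level F1: a predicted morpheme is a true positive only if
    the identical morpheme (counted with multiplicity) appears in the
    gold segmentation, so partial character overlaps score nothing."""
    tp = sum((Counter(pred) & Counter(gold)).values())  # multiset overlap
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Under a character-level metric, a prediction that splits one character off a suffix still earns partial credit; under this morpheme-level metric it earns none, which is why the two can rank models differently.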
Contrastive Centroid Supervision Alleviates Domain Shift in Medical Image Classification
Deep-learning-based medical imaging classification models usually suffer from
the domain shift problem, where classification performance drops when
training data and real-world data differ in imaging equipment manufacturer,
image acquisition protocol, patient population, etc. We propose Feature
Centroid Contrast Learning (FCCL), which can improve target-domain
classification performance through extra supervision during training with a
contrastive loss between instances and class centroids. Compared with current
unsupervised domain adaptation and domain generalization methods, FCCL
performs better while requiring only labeled image data from a single source
domain and no target domain data. We verify through extensive experiments
that FCCL achieves superior performance on at least three imaging modalities,
i.e., fundus photographs, dermatoscopic images, and H&E tissue images
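A contrastive loss between an instance and class centroids can be written as a cross-entropy over similarity logits: the instance is pulled toward its own class centroid and pushed away from the others. The sketch below illustrates that general form in NumPy; the temperature value and cosine-similarity choice are assumptions, not FCCL's published formulation.

```python
import numpy as np

def centroid_contrast_loss(feat, label, centroids, temp=0.1):
    """Cross-entropy over cosine similarities between one instance feature
    and all class centroids: minimizing it pulls the feature toward its
    own class centroid and away from the other centroids."""
    f = feat / np.linalg.norm(feat)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    logits = (c @ f) / temp           # similarity to each class centroid
    logits -= logits.max()            # numerical stability before exp
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[label])
```

In training, such a term is added to the ordinary classification loss; because it only needs labels and running class centroids from the single source domain, no target-domain data is required.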
Actual and average estimate of change point using NPCPS in Monte Carlo simulation (the legend markers distinguish the actual change point from the average estimate).