Traditional ear disease diagnosis heavily depends on experienced specialists
and specialized equipment, frequently resulting in misdiagnoses, treatment
delays, and financial burdens for some patients. Deep learning models have
proven effective and affordable for ear disease diagnosis. However, existing
research has overlooked the inference speed and parameter size that practical
deployment requires. To tackle these challenges, we constructed a
large-scale dataset comprising eight ear disease categories and normal ear
canal samples from two hospitals. Inspired by ShuffleNetV2, we developed
Best-EarNet, an ultrafast and ultralight network enabling real-time ear disease
diagnosis. Best-EarNet incorporates the novel Local-Global Spatial Feature
Fusion Module, which captures global and local spatial information
simultaneously and guides the network to focus on crucial regions within feature
maps at various levels, mitigating low accuracy issues. Moreover, our network
uses multiple auxiliary classification heads for efficient parameter
optimization. With 0.77M parameters, Best-EarNet averages 80 frames per second
(FPS) on CPU. Employing transfer learning and five-fold cross-validation
with 22,581 images from Hospital-1, the model achieves 95.23%
accuracy. External testing on 1,652 images from Hospital-2 validates its
performance, yielding 92.14% accuracy. Compared with existing state-of-the-art
(SOTA) networks, Best-EarNet sets a new SOTA for practical
applications. Most importantly, we developed an intelligent diagnosis system
called Ear Keeper, which can be deployed on common electronic devices. By
manipulating a compact electronic otoscope, users can perform comprehensive
scanning and diagnosis of the ear canal using real-time video. This study
provides a novel paradigm for ear endoscopy and other medical endoscopic image
recognition applications.

Comment: This manuscript has been submitted to Neural Network
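
As an illustration of the five-fold cross-validation protocol mentioned in the abstract, the following minimal sketch shows one way the 22,581 Hospital-1 images could be partitioned into training and validation folds. The abstract does not state whether the split was shuffled or stratified by class, so the shuffled, unstratified split, the fixed seed, and the function name `five_fold_indices` are assumptions made for illustration only, not the authors' procedure.

```python
import random


def five_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation.

    Hypothetical helper: a shuffled, unstratified split is assumed here;
    the abstract does not specify the exact protocol.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility

    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


# 22,581 is the Hospital-1 image count reported in the abstract.
folds = list(five_fold_indices(22581))
fold_sizes = [len(val) for _, val in folds]
print(fold_sizes)  # [4517, 4516, 4516, 4516, 4516]
```

Each image appears in exactly one validation fold, so averaging the five validation accuracies uses every Hospital-1 image once for evaluation. The 1,652 Hospital-2 images remain outside this loop as an external test set.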