Unconstrained Scene Text and Video Text Recognition for Arabic Script
Building robust recognizers for Arabic has always been challenging. We
demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid
architecture in recognizing Arabic text in videos and natural scenes. We
outperform previous state-of-the-art on two publicly available video text
datasets - ALIF and ACTIV. For the scene text recognition task, we introduce a
new Arabic scene text dataset and establish baseline results. For scripts like
Arabic, a major challenge in developing robust recognizers is the lack of large
quantities of annotated data. We overcome this by synthesising millions of Arabic
text images from a large vocabulary of Arabic words and phrases. Our
implementation builds on the model introduced in [37], which has proven quite
effective for English scene text recognition. The model follows a
segmentation-free, sequence-to-sequence transcription approach. The network
transcribes a sequence of convolutional features from the input image to a
sequence of target labels. This does away with the need for segmenting input
image into constituent characters/glyphs, which is often difficult for Arabic
script. Further, the ability of RNNs to model contextual dependencies yields
superior recognition results.
Machine Learning Techniques, Detection and Prediction of Glaucoma – A Systematic Review
Globally, glaucoma is the most common cause of permanent blindness and visual impairment. However, the majority of patients are unaware they have the condition, and clinical practice still faces difficulties in detecting glaucoma progression with current technology. An expert ophthalmologist examines the retinal portion of the eye to assess how the glaucoma is progressing, a manual process that is quite time-consuming. Deep learning and machine learning techniques can therefore resolve this problem by diagnosing glaucoma automatically. This systematic review involved a comprehensive analysis of various automated glaucoma prediction and detection techniques. More than 100 articles on machine learning (ML) techniques are reviewed, with accompanying graphs and tables, considering summary, method, objective, performance, advantages and disadvantages. ML techniques such as support vector machines (SVM), K-means and the fuzzy c-means clustering algorithm are widely used in glaucoma detection and prediction. Through the systematic review, the most accurate technique to detect and predict glaucoma can be determined, which can be utilized for future improvements.
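As an illustration of the clustering side of the surveyed pipelines, a minimal K-means sketch on scalar image features is shown below. It is not taken from any reviewed article; the feature values, initial centers and number of clusters are all hypothetical.

```python
# Plain Lloyd's-algorithm K-means on 1-D features, the kind of
# unsupervised grouping applied to retinal-image features
# (e.g. cup-to-disc ratios) in automated glaucoma pipelines.

def kmeans_1d(points, centers, iters=20):
    """Cluster scalar features around len(centers) centers."""
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            clusters[i].append(p)
        # Move each center to the mean of its assigned points;
        # keep an empty cluster's center where it was.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Two well-separated hypothetical feature groups converge to
# centers near 1.0 and 8.0.
centers = kmeans_1d([1.0, 1.2, 0.8, 8.0, 8.2, 7.8], [0.0, 10.0])
```

Fuzzy c-means, also mentioned in the review, differs only in replacing the hard nearest-center assignment with graded membership weights.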
Oxygen consumption of Mugil macrolepis in relation to different salinities
Laboratory experiments conducted on the oxygen consumption of Mugil
macrolepis in relation to three different salinities show that a straight-line
relationship can be derived using the formula Q = aW^b, with fitted
coefficients a of 0.756, 0.538 and 0.486 respectively at salinities of
7‰, 20‰ and 30‰.
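The straight-line relationship referred to above comes from linearizing the allometric formula Q = aW^b as log Q = log a + b log W and fitting by least squares. A minimal sketch of that fit, on made-up weight/consumption data rather than the paper's measurements, looks like:

```python
import math

def fit_allometric(weights, consumptions):
    """Least-squares fit of Q = a * W**b on log-transformed data."""
    xs = [math.log(w) for w in weights]
    ys = [math.log(q) for q in consumptions]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope of the log-log line is the exponent b ...
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    # ... and the intercept gives the coefficient a.
    a = math.exp(my - b * mx)
    return a, b

# Hypothetical fish weights (g) lying exactly on Q = 0.5 * W**0.8;
# the fit recovers a = 0.5 and b = 0.8.
weights = [1.0, 2.0, 4.0, 8.0]
qs = [0.5 * w ** 0.8 for w in weights]
a, b = fit_allometric(weights, qs)
```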
Understanding Video Scenes through Text: Insights from Text-based Video Question Answering
Researchers have extensively studied the field of vision and language,
discovering that both visual and textual content is crucial for understanding
scenes effectively. Particularly, comprehending text in videos holds great
significance, requiring both scene text understanding and temporal reasoning.
This paper focuses on exploring two recently introduced datasets, NewsVideoQA
and M4-ViteVQA, which aim to address video question answering based on textual
content. The NewsVideoQA dataset contains question-answer pairs related to the
text in news videos, while M4-ViteVQA comprises question-answer pairs from
diverse categories like vlogging, traveling, and shopping. We provide an
analysis of the formulation of these datasets on various levels, exploring the
degree of visual understanding and multi-frame comprehension required for
answering the questions. Additionally, the study includes experimentation with
BERT-QA, a text-only model, which demonstrates comparable performance to the
original methods on both datasets, indicating shortcomings in the
formulation of these datasets. Furthermore, we also look into the domain
adaptation aspect by examining the effectiveness of training on M4-ViteVQA and
evaluating on NewsVideoQA, and vice versa, thereby shedding light on the
challenges and potential benefits of out-of-domain training.
Evaluation of a Cold Staining Method for Acid-Fast Bacilli in Sputum
Comparison between the Ziehl-Neelsen staining method for acid-fast bacilli, applied with and
without heating, was carried out in a controlled investigation using smears prepared from 306
sputum samples collected prior to treatment from suspected cases of pulmonary tuberculosis.
Smear and culture positivity were graded and the colour intensity of bacilli recorded. Results
showed that the chance-corrected agreement (kappa) between the Z-N and cold methods was only
78%. The sensitivities of the Z-N and cold methods were 84% and 77% respectively when
compared with culture results. Assuming 10% smear positivity among symptomatics reporting to
Peripheral Health Institutions (PHIs), the positive predictive value of the cold method was very
low (53%). When compared to culture, the positive predictive value was 71% for the Z-N method
and 57% for the cold method for a symptomatic population with 15% culture positivity.
In the absence of heating, penetration of the stain was significantly reduced and consequently
the number of bacilli detected was lower. The failure to take up the stain without heating was seen in
smears from all grades of culture-positive samples; thus even heavy positives were missed by the
cold method. The evaluation of the cold method against the standard Z-N method highlights its
limitations and demonstrates that it is not as reliable as the standard Z-N method.
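The chance-corrected agreement and sensitivity figures quoted above can each be computed from a 2x2 table. The sketch below uses made-up counts, not the study's raw data; it only illustrates how kappa and sensitivity are derived.

```python
# Cohen's kappa from a 2x2 agreement table of two methods
# (e.g. Z-N vs cold staining read on the same smears), and
# sensitivity against culture as the reference standard.

def cohens_kappa(both_pos, a_only, b_only, both_neg):
    """Chance-corrected agreement between two raters/methods."""
    n = both_pos + a_only + b_only + both_neg
    observed = (both_pos + both_neg) / n
    # Expected agreement under independence of the two methods.
    a_pos = (both_pos + a_only) / n
    b_pos = (both_pos + b_only) / n
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)

def sensitivity(true_pos, false_neg):
    """Fraction of reference-positive samples the method detects."""
    return true_pos / (true_pos + false_neg)

# Hypothetical counts: 20 smears positive by both methods, 5 by each
# alone, 70 negative by both; and 84 of 100 culture positives detected.
k = cohens_kappa(20, 5, 5, 70)
s = sensitivity(84, 16)
```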
Reading Between the Lanes: Text VideoQA on the Road
Text and signs around roads provide crucial information for drivers, vital
for safe navigation and situational awareness. Scene text recognition in motion
is a challenging problem, as textual cues typically appear only for a short time
span and early detection at a distance is necessary. Systems that exploit such
information to assist the driver should not only extract and incorporate visual
and textual cues from the video stream but also reason over time. To address
this issue, we introduce RoadTextVQA, a new dataset for the task of video
question answering (VideoQA) in the context of driver assistance. RoadTextVQA
consists of driving videos collected from multiple countries, annotated
with questions, all based on text or road signs present in the driving
videos. We assess the performance of state-of-the-art video question answering
models on our RoadTextVQA dataset, highlighting the significant potential for
improvement in this domain and the usefulness of the dataset in advancing
research on in-vehicle support systems and text-aware multimodal question
answering. The dataset is available at
http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtextvq