
    Unconstrained Scene Text and Video Text Recognition for Arabic Script

    Building robust recognizers for Arabic has always been challenging. We demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid architecture in recognizing Arabic text in videos and natural scenes. We outperform the previous state of the art on two publicly available video text datasets, ALIF and ACTIV. For the scene text recognition task, we introduce a new Arabic scene text dataset and establish baseline results. For scripts like Arabic, a major challenge in developing robust recognizers is the lack of large quantities of annotated data. We overcome this by synthesising millions of Arabic text images from a large vocabulary of Arabic words and phrases. Our implementation builds on the model introduced in [37], which has proven quite effective for English scene text recognition. The model follows a segmentation-free, sequence-to-sequence transcription approach: the network transcribes a sequence of convolutional features from the input image into a sequence of target labels. This does away with the need to segment the input image into constituent characters/glyphs, which is often difficult for Arabic script. Further, the ability of RNNs to model contextual dependencies yields superior recognition results.
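    To make the transcription approach concrete, below is a minimal PyTorch sketch of a segmentation-free CNN-RNN recognizer trained with CTC (an illustrative simplification, not the exact architecture of [37]): convolutional features are collapsed into a left-to-right sequence, a bidirectional LSTM models contextual dependencies, and the CTC loss aligns the output with the target label sequence without any character segmentation.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Illustrative CNN-RNN recognizer: image -> conv feature sequence -> BiLSTM -> per-step class scores."""
    def __init__(self, num_classes):
        super().__init__()
        self.cnn = nn.Sequential(                        # collapses height, keeps width as the sequence axis
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),             # height -> 1, width preserved
        )
        self.rnn = nn.LSTM(256, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, num_classes)            # num_classes includes the CTC blank

    def forward(self, x):                                # x: (B, 1, H, W) grayscale word image
        f = self.cnn(x)                                  # (B, 256, 1, W')
        f = f.squeeze(2).permute(0, 2, 1)                # (B, W', 256): one feature per horizontal step
        h, _ = self.rnn(f)                               # (B, W', 256)
        return self.fc(h)                                # (B, W', num_classes)

# CTC training step on dummy data; no per-character segmentation of the input image is needed.
model = CRNN(num_classes=80)
images = torch.randn(4, 1, 32, 128)
logits = model(images).log_softmax(-1).permute(1, 0, 2)  # CTC expects (T, B, C)
targets = torch.randint(1, 80, (4, 10))                  # label indices; 0 is reserved for blank
loss = nn.CTCLoss(blank=0)(
    logits, targets,
    input_lengths=torch.full((4,), logits.size(0), dtype=torch.long),
    target_lengths=torch.full((4,), 10, dtype=torch.long),
)
loss.backward()
```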

    Machine Learning Techniques, Detection and Prediction of Glaucoma – A Systematic Review

    Globally, glaucoma is the most common cause of permanent blindness and visual impairment. However, the majority of patients are unaware they have the condition, and clinical practice continues to face difficulties in detecting glaucoma progression with current technology. An expert ophthalmologist examines the retinal portion of the eye to assess how the glaucoma is progressing, a manual process that is quite time-consuming. This problem can therefore be addressed by diagnosing glaucoma automatically with deep learning and machine learning techniques. This systematic review involved a comprehensive analysis of various automated glaucoma prediction and detection techniques. More than 100 articles on machine learning (ML) techniques are reviewed, with accompanying graphs and tables summarising each study's method, objective, performance, advantages, and disadvantages. Among ML techniques, support vector machines (SVM), K-means, and the fuzzy c-means clustering algorithm are widely used in glaucoma detection and prediction. Through the systematic review, the most accurate techniques for detecting and predicting glaucoma can be identified and applied in future work.
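    As a rough illustration of the kind of classical ML pipeline such reviews cover, the sketch below trains an RBF-kernel SVM on pre-extracted fundus features; the feature set and data are hypothetical and not drawn from any reviewed study.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Hypothetical pre-extracted fundus features (e.g. cup-to-disc ratio, rim area,
# RNFL thickness); values are synthetic and for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)  # 1 = glaucoma, 0 = healthy

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# RBF-kernel SVM, one of the classifiers most frequently cited in such reviews
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```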

    Oxygen consumption of Mugil macrolepis in relation to different salinities

    Laboratory experiments conducted on the oxygen consumption of Mugil macrolepis in relation to three different salinities show that a straight-line relationship can be derived using the formula Q = aW^b, the coefficient a being 0.756, 0.538 and 0.486 at salinities of 7‰, 20‰ and 30‰ respectively.
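    On log-log axes the allometric relation Q = aW^b becomes the straight line log Q = log a + b log W, which is presumably the linear relationship referred to above. A minimal sketch of fitting a and b to synthetic weight/consumption data (all values invented, for illustration only):

```python
import numpy as np

# Synthetic body weights (g) and oxygen-consumption values (ml/h), illustration only
W = np.array([2.0, 4.0, 8.0, 16.0, 32.0])
Q = 0.5 * W ** 0.8 * np.exp(np.random.default_rng(1).normal(0, 0.05, W.size))

# Q = a * W**b  =>  log Q = log a + b * log W  (a straight line in log-log space)
b, log_a = np.polyfit(np.log(W), np.log(Q), 1)
print(f"a = {np.exp(log_a):.3f}, b = {b:.3f}")
```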

    Understanding Video Scenes through Text: Insights from Text-based Video Question Answering

    Researchers have extensively studied the field of vision and language, discovering that both visual and textual content are crucial for understanding scenes effectively. In particular, comprehending text in videos holds great significance, requiring both scene text understanding and temporal reasoning. This paper focuses on two recently introduced datasets, NewsVideoQA and M4-ViteVQA, which aim to address video question answering based on textual content. The NewsVideoQA dataset contains question-answer pairs related to the text in news videos, while M4-ViteVQA comprises question-answer pairs from diverse categories like vlogging, traveling, and shopping. We provide an analysis of the formulation of these datasets on various levels, exploring the degree of visual understanding and multi-frame comprehension required for answering the questions. Additionally, the study includes experimentation with BERT-QA, a text-only model, which demonstrates performance comparable to the original methods on both datasets, indicating shortcomings in the formulation of these datasets. Furthermore, we also examine the domain adaptation aspect by training on M4-ViteVQA and evaluating on NewsVideoQA, and vice versa, thereby shedding light on the challenges and potential benefits of out-of-domain training.
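    A text-only baseline of the kind described, i.e. an extractive QA model reading text recognized from the frames, can be approximated with an off-the-shelf question-answering pipeline; the sketch below uses the library's default SQuAD-tuned model and a made-up OCR context, and is not the exact BERT-QA setup evaluated in the paper.

```python
from transformers import pipeline

# Extractive QA over text recognized from video frames. The context string here is
# invented; in practice it would come from an OCR pass over sampled frames.
qa = pipeline("question-answering")  # loads the library's default SQuAD-tuned model

ocr_context = "Breaking news: heavy rain expected in Chennai on Friday, says IMD."
answer = qa(question="Which city is expecting heavy rain?", context=ocr_context)
print(answer["answer"], answer["score"])
```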

    Evaluation of a Cold Staining Method for Acid-Fast Bacilli in Sputum

    A comparison between the Ziehl-Neelsen (Z-N) staining method for acid-fast bacilli, applied with and without heating, was carried out in a controlled investigation using smears prepared from 306 sputum samples collected prior to treatment from suspected cases of pulmonary tuberculosis. Smear and culture positivity were graded and the colour intensity of bacilli recorded. Results showed that the chance-corrected agreement (kappa) between the Z-N and cold methods was only 78%. The sensitivities of the Z-N and cold methods were 84% and 77% respectively when compared with culture results. Assuming 10% smear positivity among symptomatics reporting to Peripheral Health Institutions (PHIs), the positive predictive value of the cold method was very low (53%). When compared to culture, the positive predictive value is 71% for the Z-N method and 57% for the cold method for a symptomatic population with 15% culture positivity. In the absence of heating, penetration of the stain was significantly reduced and consequently fewer bacilli were detected. Failure to take up the stain without heating was seen in smears from all grades of culture-positive samples; thus even heavy positives were missed by the cold method. The evaluation of the cold method against the standard Z-N method highlights its limitations and demonstrates that it is not as reliable as the standard Z-N method.
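    The strong dependence of positive predictive value on prevalence behind these figures follows directly from Bayes' rule, PPV = sens·p / (sens·p + (1 − spec)·(1 − p)). The sketch below plugs in the reported sensitivities together with an assumed, purely illustrative specificity to show how PPV falls at low prevalence; it does not reproduce the study's exact calculation.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# Reported sensitivities: Z-N 0.84, cold method 0.77.
# A specificity of 0.93 is an assumed, illustrative value (not given in the abstract).
for prevalence in (0.10, 0.15):
    for name, sens in (("Z-N", 0.84), ("cold", 0.77)):
        print(f"prevalence {prevalence:.0%}, {name}: PPV = {ppv(sens, 0.93, prevalence):.0%}")
```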

    Reading Between the Lanes: Text VideoQA on the Road

    Text and signs around roads provide crucial information for drivers, vital for safe navigation and situational awareness. Scene text recognition in motion is a challenging problem, as textual cues typically appear for only a short time span and early detection at a distance is necessary. Systems that exploit such information to assist the driver should not only extract and incorporate visual and textual cues from the video stream but also reason over time. To address this issue, we introduce RoadTextVQA, a new dataset for the task of video question answering (VideoQA) in the context of driver assistance. RoadTextVQA consists of 3,222 driving videos collected from multiple countries, annotated with 10,500 questions, all based on text or road signs present in the driving videos. We assess the performance of state-of-the-art video question answering models on our RoadTextVQA dataset, highlighting the significant potential for improvement in this domain and the usefulness of the dataset in advancing research on in-vehicle support systems and text-aware multimodal question answering. The dataset is available at http://cvit.iiit.ac.in/research/projects/cvit-projects/roadtextvq