27 research outputs found

    Optic Disc and Optic Cup Segmentation for Glaucoma Detection from Blur Retinal Images Using Improved Mask-RCNN

    Get PDF
    Glaucoma is a severe eye disease that damages the optic disc (OD) and optic cup (OC) and leads to blindness in its advanced stages. Because the disease progresses slowly and exhibits few symptoms in the initial stages, its identification is a complicated task. A fully automatic framework is therefore needed to support the screening process and increase the chances of detecting the disease early. In this paper, we address the localization and segmentation of the OD and OC for glaucoma detection from blurred retinal images. We present a novel DenseNet-77-based Mask-RCNN to overcome the challenges of glaucoma detection. Initially, we perform data augmentation, adding blurriness to samples to increase the diversity of the data. Then, we generate annotations from the ground-truth (GT) images. After that, the DenseNet-77 framework is employed at the feature extraction level of Mask-RCNN to compute deep key points. Finally, the computed features are used to localize and segment the OD and OC with the custom Mask-RCNN model. For performance evaluation, we use the publicly available ORIGA dataset. Furthermore, we perform cross-dataset validation on the HRF database to show the robustness of the presented framework. The presented framework achieves an average precision, recall, F-measure, and IOU of 0.965, 0.963, 0.97, and 0.972, respectively. The proposed method achieves remarkable performance in terms of both efficiency and effectiveness compared to the latest techniques in the presence of blurring, noise, and light variations.
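    As an illustration of the segmentation stage described above, the sketch below wires a DenseNet feature extractor into torchvision's Mask-RCNN with three classes (background, OD, OC). It is not the authors' implementation: DenseNet-77 is not a stock torchvision model, so densenet121 stands in, and the anchor sizes and input resolution are assumptions.

```python
import torch
from torch import nn
from torchvision.models import densenet121
from torchvision.models.detection import MaskRCNN
from torchvision.models.detection.anchor_utils import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

class DenseNetBackbone(nn.Module):
    """Convolutional trunk of a DenseNet, exposed the way MaskRCNN expects."""
    def __init__(self):
        super().__init__()
        self.features = densenet121(weights=None).features
        self.out_channels = 1024  # channel count of the last dense block

    def forward(self, x):
        return self.features(x)

# A single feature map, so one tuple of anchor sizes / aspect ratios
anchors = AnchorGenerator(sizes=((32, 64, 128, 256),),
                          aspect_ratios=((0.5, 1.0, 2.0),))
roi_box = MultiScaleRoIAlign(featmap_names=["0"], output_size=7, sampling_ratio=2)
roi_mask = MultiScaleRoIAlign(featmap_names=["0"], output_size=14, sampling_ratio=2)

# Classes: background, optic disc (OD), optic cup (OC)
model = MaskRCNN(DenseNetBackbone(), num_classes=3,
                 rpn_anchor_generator=anchors,
                 box_roi_pool=roi_box, mask_roi_pool=roi_mask)

model.eval()
with torch.no_grad():
    pred = model([torch.rand(3, 512, 512)])[0]  # dict with boxes, labels, scores, masks
print(pred["masks"].shape)
```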

    Instructor activity recognition through deep spatiotemporal features and feedforward Extreme Learning Machines

    Get PDF
    Human action recognition has the potential to predict the activities of an instructor within the lecture room. Evaluation of lecture delivery can help teachers analyze shortcomings and plan lectures more effectively. However, manual or peer evaluation is time-consuming, tedious, and it is sometimes difficult to remember all the details of a lecture. Therefore, automating the evaluation of lecture delivery can significantly improve teaching style. In this paper, we propose a feedforward learning model for instructor activity recognition in the lecture room. The proposed scheme represents a video sequence as a single frame that captures the motion profile of the instructor by observing the spatiotemporal relations within the video frames. First, we segment the instructor silhouettes from the input videos using graph-cut segmentation and generate a motion profile. These motion profiles are centered by obtaining the largest connected components and are then normalized. Next, the motion profiles are represented as feature maps by a deep convolutional neural network. An extreme learning machine (ELM) classifier is then trained on the obtained feature representations to recognize eight different activities of the instructor within the classroom. For the evaluation of the proposed method, we created an instructor activity video (IAVID-1) dataset and compared our method against different state-of-the-art activity recognition methods. Furthermore, two standard datasets, MuHAVI and IXMAS, were also considered for the evaluation of the proposed scheme. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for the research carried out in the Centre for Computer Vision Research (C2VR) at the University of Engineering and Technology Taxila, Pakistan. Sergio A. Velastin acknowledges funding by the Universidad Carlos III de Madrid, the European Union's Seventh Framework Programme for Research, Technological Development and Demonstration under grant agreement no. 600371, el Ministerio de Economía y Competitividad (COFUND2013-51509), and Banco Santander. We are also very thankful to the participants, faculty, and postgraduate students of the Computer Engineering Department who took part in the data acquisition phase. Without their consent, this work would not have been possible.
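    A minimal sketch of the classification stage only, assuming the deep spatiotemporal features have already been extracted into a feature matrix: an extreme learning machine with random hidden weights and a closed-form least-squares output layer. The feature dimensionality, hidden size, and the eight-class label set are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

class ELM:
    def __init__(self, n_hidden=512, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y, n_classes):
        # Random, untrained input weights and biases (the defining ELM idea)
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)      # hidden-layer activations
        T = np.eye(n_classes)[y]              # one-hot targets
        self.beta = np.linalg.pinv(H) @ T     # closed-form output weights
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return (H @ self.beta).argmax(axis=1)

# Hypothetical usage: 8 instructor activities, 4096-D deep feature vectors
X_train = np.random.rand(200, 4096)
y_train = np.random.randint(0, 8, 200)
clf = ELM().fit(X_train, y_train, n_classes=8)
print(clf.predict(X_train[:5]))
```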

    Multimodal framework based on audio‐visual features for summarisation of cricket videos

    Full text link
    Peer Reviewed: http://deepblue.lib.umich.edu/bitstream/2027.42/166171/1/ipr2bf02094.pd

    Deep temporal motion descriptor (DTMD) for human action recognition

    Get PDF
    Spatiotemporal features have significant importance in human action recognition, as they provide the actor's shape and motion characteristics specific to each action class. This paper presents a new deep spatiotemporal human action representation, the "Deep Temporal Motion Descriptor (DTMD)", which shares the attributes of holistic and deep learned features. To generate the DTMD descriptor, the actor's silhouettes are gathered into single motion templates by applying motion history images. These motion templates capture the spatiotemporal movements of the actor and compactly represent the human actions using a single 2D template. Then, deep convolutional neural networks are used to compute discriminative deep features from the motion history templates to produce DTMD. Later, DTMD is used to learn a model that recognises human actions using a softmax classifier. The advantages of DTMD are: (i) DTMD is automatically learned from videos and contains a higher-dimensional discriminative spatiotemporal representation compared to handcrafted features; (ii) DTMD reduces the computational complexity of human activity recognition, as all the video frames are compactly represented as a single motion template; (iii) DTMD works effectively for single and multi-view action recognition. We conducted experiments on three challenging datasets: MuHAVI-Uncut, IXMAS, and IAVID-1. The experimental findings reveal that DTMD outperforms previous methods and achieves the highest action prediction rate on the MuHAVI-Uncut dataset.
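    A sketch of the motion-template step only, under the assumption that binary actor silhouettes are already available per frame; the classic motion history image update is used here as a stand-in for the paper's exact template construction, and the frame size and clip length are arbitrary.

```python
import numpy as np

def motion_history_image(silhouettes, duration=None):
    """silhouettes: sequence of binary (H, W) masks ordered in time."""
    n = len(silhouettes)
    duration = duration or n                       # remember the whole clip by default
    mhi = np.zeros(silhouettes[0].shape, dtype=np.float32)
    for t, sil in enumerate(silhouettes, start=1):
        mhi[sil > 0] = t                           # stamp moving pixels with the current time
        mhi[(sil == 0) & (mhi < t - duration)] = 0  # forget motion older than `duration` frames
    return mhi / n                                 # single 2D template scaled to [0, 1] for the CNN

# Hypothetical usage with random masks standing in for segmented silhouettes
frames = [(np.random.rand(64, 64) > 0.8).astype(np.uint8) for _ in range(30)]
template = motion_history_image(frames)
print(template.shape, template.max())
```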

    A hybrid egocentric video summarization method to improve the healthcare for Alzheimer patients

    Get PDF
    Alzheimer patients face difficulty remembering the identity of persons and performing daily life activities. This paper presents a hybrid method to generate an egocentric video summary of important people, objects, and medicines to help Alzheimer patients recall their faded memories. Lifelogging video data analysis is used to aid the recall of human memory; however, the massive amount of lifelogging data makes it a challenging task to select the most relevant content to educate the Alzheimer patient. To address the challenges associated with massive lifelogging content, a static video summarization approach is applied to select the key-frames that are most relevant in the context of recalling the faded memories of Alzheimer patients. The method consists of three main modules: face, object, and medicine recognition. Histogram of oriented gradients (HOG) features are used to train a multi-class SVM for face recognition. SURF descriptors are employed to extract features from the input video frames, which are then used to find corresponding points between the objects in the input video and the reference objects stored in the database. Morphological operators are applied, followed by optical character recognition, to recognize and tag the medicines for Alzheimer patients. The performance of the proposed system is evaluated on 18 real-world homemade videos. Experimental results signify the effectiveness of the proposed system in terms of providing the most relevant content to enhance the memory of Alzheimer patients.
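    A minimal sketch of the face-recognition module only, with illustrative assumptions: HOG descriptors feed a multi-class SVM, while image loading, face detection, and the SURF/OCR modules are out of scope. The crop size, HOG parameters, and identity labels below are placeholders, not the paper's settings.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

def hog_features(gray_faces):
    # gray_faces: iterable of same-sized grayscale face crops (H, W) in [0, 1]
    return np.array([hog(f, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2)) for f in gray_faces])

# Hypothetical data: 40 face crops of 64x64 pixels covering 4 known identities
faces = np.random.rand(40, 64, 64)
labels = np.repeat(np.arange(4), 10)

clf = SVC(kernel="linear")          # multi-class SVM over the HOG descriptors
clf.fit(hog_features(faces), labels)
print(clf.predict(hog_features(faces[:3])))
```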

    Copy-Move Forgery Detection Technique for Forensic Analysis in Digital Images

    Get PDF
    Due to powerful image editing tools, images are open to several manipulations; therefore, their authenticity is becoming questionable, especially when images have influential power, for example, in a court of law, news reports, and insurance claims. Image forensic techniques determine the integrity of images by applying various high-tech mechanisms developed in the literature. In this paper, images are analyzed for a particular type of forgery in which a region of an image is copied and pasted onto the same image to create a duplication or to conceal some existing objects. To detect the copy-move forgery attack, images are first divided into overlapping square blocks, and DCT components are adopted as the block representations. Due to the high-dimensional nature of the feature space, Gaussian RBF kernel PCA is applied to obtain a reduced-dimensional feature vector representation, which also improves efficiency during feature matching. Extensive experiments are performed to evaluate the proposed method in comparison to the state of the art. The experimental results reveal that the proposed technique precisely detects copy-move forgery even when the images are contaminated with blurring, noise, and compression, and it can effectively detect multiple copy-move forgeries. Hence, the proposed technique provides a computationally efficient and reliable way of detecting copy-move forgery that increases the credibility of images in evidence-centered applications.
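    A rough sketch of the detection idea described above, with assumed block size, step, coefficient count, and matching threshold rather than the paper's parameters: overlapping blocks are described by DCT coefficients, reduced with Gaussian RBF kernel PCA, sorted lexicographically, and adjacent near-identical descriptors are flagged as candidate copy-move pairs.

```python
import numpy as np
from scipy.fft import dctn
from sklearn.decomposition import KernelPCA

def block_dct_features(gray, block=8, step=4, keep=16):
    h, w = gray.shape
    feats, coords = [], []
    for y in range(0, h - block + 1, step):
        for x in range(0, w - block + 1, step):
            d = dctn(gray[y:y + block, x:x + block], norm="ortho")
            feats.append(d.flatten()[:keep])   # leading DCT coefficients as a compact block signature
            coords.append((y, x))
    return np.array(feats), coords

gray = np.random.rand(128, 128)                # stand-in grayscale image
feats, coords = block_dct_features(gray)

# Gaussian RBF kernel PCA for a low-dimensional feature vector per block
reduced = KernelPCA(n_components=8, kernel="rbf", gamma=1e-2).fit_transform(feats)

# Lexicographic sort places duplicated regions next to each other; compare neighbours
order = np.lexsort(reduced.T[::-1])
for a, b in zip(order[:-1], order[1:]):
    if np.linalg.norm(reduced[a] - reduced[b]) < 1e-3:
        print("possible copy-move pair:", coords[a], coords[b])
```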

    EDL-Det: A Robust TTS Synthesis Detector Using VGG19-Based YAMNet and Ensemble Learning Block

    No full text
    Various audio deepfake synthesis algorithms exist, such as Deep Voice, Tacotron, FastSpeech, and imitation techniques. Despite the existence of various spoofed-speech detectors, they are not ready to distinguish unseen audio samples with high precision. In this study, we propose a robust model, namely an Ensemble Deep Learning Detector (EDL-Det), to detect text-to-speech (TTS) synthesis and categorize audio into spoofed and bonafide classes. Our proposed model is an improved method based on Yet Another Multi-scale Convolutional Neural Network (YAMNet) employing VGG19 as a base network, combined with two other deep learning (DL) techniques. The proposed system effectively analyzes the audio to extract better artifacts. We add an ensemble learning block that consists of ResNet50 and InceptionNetv2. First, we convert speech into mel-spectrograms, which are time-frequency representations. Second, we train our model using the ASVspoof 2019 dataset. Finally, we classify audio samples by transforming them into mel-spectrograms and applying our trained binary classifier with a majority voting scheme across the three networks. Due to its ensemble architecture, our proposed model effectively extracts the most representative features from the mel-spectrograms. Furthermore, we have performed extensive experiments to assess the performance of the suggested model using the ASVspoof 2019 corpus. Additionally, our proposed model is robust enough to identify unseen spoofed audio and accurately classify attacks based on cloning algorithms.
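    A small illustrative sketch of the front end and the voting rule, assuming librosa for the mel-spectrogram; the sample rate and mel-band count are assumptions, and the three branch predictors below are dummy stand-ins for the trained YAMNet/VGG19, ResNet50, and InceptionNet branches.

```python
import numpy as np
import librosa

def mel_spectrogram(path, sr=16000, n_mels=128):
    """Load an audio file and return its log-mel time-frequency representation."""
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

def majority_vote(spec, predictors):
    # predictors: callables returning 0 (bonafide) or 1 (spoofed) for one spectrogram
    votes = np.array([p(spec) for p in predictors])
    return int(votes.sum() * 2 > len(votes))

# Dummy spectrogram and branch predictors standing in for the three trained CNNs
spec = np.zeros((128, 300), dtype=np.float32)
branches = [lambda s: 1, lambda s: 0, lambda s: 1]
print(majority_vote(spec, branches))   # -> 1 (spoofed by majority)
```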