16 research outputs found

    Bullying10K: A Large-Scale Neuromorphic Dataset towards Privacy-Preserving Bullying Recognition

    The prevalence of violence in daily life poses significant threats to individuals' physical and mental well-being. Using surveillance cameras in public spaces has proven effective in proactively deterring and preventing such incidents. However, their widespread deployment has raised concerns about privacy invasion. To address the problem, we leverage Dynamic Vision Sensor (DVS) cameras to detect violent incidents while preserving privacy, since they capture pixel brightness variations instead of static imagery. We introduce the Bullying10K dataset, encompassing various actions, complex movements, and occlusions from real-life scenarios. It provides three benchmarks for evaluating different tasks: action recognition, temporal action localization, and pose estimation. With 10,000 event segments, totaling 12 billion events and 255 GB of data, Bullying10K contributes significantly by balancing violence detection and the preservation of personal privacy. It also poses a challenge for neuromorphic datasets. It will serve as a valuable resource for training and developing privacy-protecting video systems. Bullying10K opens new possibilities for innovative approaches in these domains.
    Comment: Accepted at the 37th Conference on Neural Information Processing Systems (NeurIPS 2023) Track on Datasets and Benchmarks
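To illustrate why an event stream carries brightness changes and not static imagery, here is a minimal sketch that derives events from two intensity frames. It is a frame-based approximation only: real DVS pixels fire asynchronously, and the function name `frames_to_events`, the `threshold` value, and the `(x, y, t, polarity)` tuple layout are assumptions made for illustration, not the Bullying10K recording pipeline.

```python
import math


def frames_to_events(prev_frame, next_frame, t, threshold=0.2):
    """Return (x, y, t, polarity) tuples for pixels whose log-brightness
    change between two frames is at least `threshold` (hypothetical value)."""
    events = []
    for y, (row_prev, row_next) in enumerate(zip(prev_frame, next_frame)):
        for x, (p, n) in enumerate(zip(row_prev, row_next)):
            # +1.0 avoids log(0) for fully dark pixels.
            delta = math.log(n + 1.0) - math.log(p + 1.0)
            if abs(delta) >= threshold:
                events.append((x, y, t, 1 if delta > 0 else -1))
    return events


prev_frame = [[10, 10], [10, 10]]
next_frame = [[10, 40], [2, 10]]
events = frames_to_events(prev_frame, next_frame, t=0.001)
print(events)
```

Pixels whose brightness does not change produce no event, so a static scene (including a motionless face) yields no output in this sketch.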

    Tissue-like P system for Segmentation of 2D Hexagonal Images

    Membrane computing (MC) is a new computational model inspired by the structure and functioning of biological cells and by the way cells are organized in tissues. MC has been adopted in many real-world applications, including image segmentation. In contrast to the traditional square grid for representing and sampling digital images, the hexagonal grid is an efficient alternative that can better represent and visualize curved objects. In this paper, a tissue-like P system with region-based and edge-based segmentation is used to segment two-dimensional hexagonal images, and the P-Lingua programming language is used to implement and validate the proposed system. The experimental results clearly demonstrate the effectiveness of using hexagonal connectivity to segment two-dimensional images with fewer rules and computational steps. Moreover, the results reveal that this approach has the potential to segment large images in a small number of steps.
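The idea of hexagonal connectivity can be sketched in ordinary code. The example below is not a P system and not P-Lingua; it is a plain breadth-first region labelling on a hexagonal grid, assuming axial coordinates `(q, r)`, in which every pixel has exactly six neighbours. The names `hex_neighbors` and `label_regions` and the dictionary-based image are hypothetical.

```python
from collections import deque

# The six neighbour offsets of a hexagonal pixel in axial (q, r) coordinates.
HEX_DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def hex_neighbors(q, r):
    """Return the six cells adjacent to (q, r)."""
    return [(q + dq, r + dr) for dq, dr in HEX_DIRECTIONS]


def label_regions(pixels):
    """Group same-coloured, 6-connected pixels into regions.

    pixels: dict mapping (q, r) -> colour. Returns dict (q, r) -> region id.
    """
    labels = {}
    next_id = 0
    for start in sorted(pixels):
        if start in labels:
            continue
        labels[start] = next_id
        queue = deque([start])
        while queue:
            q, r = queue.popleft()
            for cell in hex_neighbors(q, r):
                if (cell in pixels and cell not in labels
                        and pixels[cell] == pixels[start]):
                    labels[cell] = next_id
                    queue.append(cell)
        next_id += 1
    return labels


image = {(0, 0): "a", (1, 0): "a", (1, -1): "a", (0, 1): "b", (2, 0): "b"}
regions = label_regions(image)
print(regions)
```

Because all six neighbours are equidistant, there is no need to choose between 4- and 8-connectivity as on a square grid.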

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines typically depend on images captured in real-world environments. This means that images might be affected by different sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks. This challenge has, hence, attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. Such an operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
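The evaluation protocol of perturbing test images with Gaussian and uniform noise can be sketched as follows. This does not implement the CORF operator or AlexNet; it only shows one plausible way to generate noisy test inputs. The function names, the assumption that pixel values lie in [0, 1], the clipping step, and the noise levels are all hypothetical.

```python
import random


def _clip(value):
    """Keep a pixel value inside the assumed [0, 1] range."""
    return min(1.0, max(0.0, value))


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    return [[_clip(px + rng.gauss(0.0, sigma)) for px in row] for row in image]


def add_uniform_noise(image, width, rng):
    """Add noise drawn uniformly from [-width, width]."""
    return [[_clip(px + rng.uniform(-width, width)) for px in row] for row in image]


rng = random.Random(0)  # fixed seed so the perturbed test set is reproducible
clean = [[0.0, 0.5, 1.0], [0.25, 0.75, 0.5]]
noisy_sets = {
    ("gaussian", level): add_gaussian_noise(clean, level, rng)
    for level in (0.05, 0.1, 0.2)  # illustrative noise levels
}
noisy_sets.update({
    ("uniform", level): add_uniform_noise(clean, level, rng)
    for level in (0.05, 0.1, 0.2)
})
print(sorted(noisy_sets))
```

A classifier would then be trained on clean images only and evaluated on each perturbed copy, so that the noise is unseen at training time.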

    A Survey of Deep Learning in Sports Applications: Perception, Comprehension, and Decision

    Deep learning has the potential to revolutionize sports performance, with applications ranging from perception and comprehension to decision. This paper presents a comprehensive survey of deep learning in sports performance, focusing on three main aspects: algorithms, datasets and virtual environments, and challenges. Firstly, we discuss the hierarchical structure of deep learning algorithms in sports performance which includes perception, comprehension and decision while comparing their strengths and weaknesses. Secondly, we list widely used existing datasets in sports and highlight their characteristics and limitations. Finally, we summarize current challenges and point out future trends of deep learning in sports. Our survey provides valuable reference material for researchers interested in deep learning in sports applications

    Eye detection using discriminatory features and an efficient support vector machine

    Accurate and efficient eye detection has broad applications in computer vision, machine learning, and pattern recognition. This dissertation presents a number of accurate and efficient eye detection methods using various discriminatory features and a new efficient Support Vector Machine (eSVM). This dissertation first introduces five popular image representation methods - the gray-scale image representation, the color image representation, the 2D Haar wavelet image representation, the Histograms of Oriented Gradients (HOG) image representation, and the Local Binary Patterns (LBP) image representation - and then applies these methods to derive five types of discriminatory features. Comparative assessments are then presented to evaluate the performance of these discriminatory features on the problem of eye detection. This dissertation further proposes two discriminatory feature extraction (DFE) methods for eye detection. The first DFE method, discriminant component analysis (DCA), improves upon the popular principal component analysis (PCA) method. The PCA method can derive the optimal features for data representation but not for classification. In contrast, the DCA method, which applies a new criterion vector that is defined on two novel measure vectors, derives the optimal discriminatory features in the whitened PCA space for two-class classification problems. The second DFE method, clustering-based discriminant analysis (CDA), improves upon the popular Fisher linear discriminant (FLD) method. A major disadvantage of the FLD is that it may not be able to extract adequate features in order to achieve satisfactory performance, especially for two-class problems. To address this problem, three CDA models (CDA-1, -2, and -3) are proposed by taking advantage of the clustering technique. For every CDA model a new between-cluster scatter matrix is defined. The CDA method thus can derive adequate features to achieve satisfactory performance for eye detection. 
Furthermore, the clustering nature of the three CDA models and the nonparametric nature of the CDA-2 and -3 models can further improve the detection performance upon the conventional FLD method. This dissertation finally presents a new efficient Support Vector Machine (eSVM) for eye detection that improves the computational efficiency of the conventional Support Vector Machine (SVM). The eSVM first defines a Θ set that consists of the training samples on the wrong side of their margin derived from the conventional soft-margin SVM. The Θ set plays an important role in controlling the generalization performance of the eSVM. The eSVM then introduces only a single slack variable for all the training samples in the Θ set, and as a result, only a very small number of those samples in the Θ set become support vectors. The eSVM hence significantly reduces the number of support vectors and improves the computational efficiency without sacrificing the generalization performance. A modified Sequential Minimal Optimization (SMO) algorithm is then presented to solve the large Quadratic Programming (QP) problem defined in the optimization of the eSVM. Three large-scale face databases, the Face Recognition Grand Challenge (FRGC) version 2 database, the BioID database, and the FERET database, are applied to evaluate the proposed eye detection methods. Experimental results show the effectiveness of the proposed methods, which improve upon some state-of-the-art eye detection methods.

    Automatic analysis of retinal images to aid in the diagnosis and grading of diabetic retinopathy

    Diabetic retinopathy (DR) is the most common complication of diabetes mellitus and one of the leading causes of preventable blindness in the adult working population. Visual loss can be prevented if DR is treated in its early stages, when treatments are effective. Therefore, early diagnosis is paramount. However, DR may be clinically asymptomatic until the advanced stage, when vision is already affected and treatment may become difficult. For this reason, diabetic patients should undergo regular eye examinations through screening programs. Traditionally, DR screening programs are run by trained specialists through visual inspection of the retinal images. However, this manual analysis is time consuming and expensive. With the increasing incidence of diabetes and the limited number of clinicians and healthcare resources, early detection of DR becomes unfeasible. For this reason, computer-aided diagnosis (CAD) systems are required to assist specialists in reaching a fast, reliable diagnosis, reducing the workload and the associated costs. We hypothesize that the application of novel, automatic algorithms for fundus image analysis could contribute to the early diagnosis of DR. Consequently, the main objective of the present Doctoral Thesis is to study, design and develop novel methods based on the automatic analysis of fundus images to aid in the screening, diagnosis, and treatment of DR. In order to achieve the main goal, we built a private database and used five public retinal databases: DRIMDB, DIARETDB1, DRIVE, Messidor and Kaggle. The stages of fundus image processing covered in this Thesis are: retinal image quality assessment (RIQA), the location of the optic disc (OD) and the fovea, the segmentation of red lesions (RLs) and exudates (EXs), and the DR severity grading. RIQA was studied with two different approaches. The first approach was based on the combination of novel, global features. 
It achieved 91.46% accuracy, 92.04% sensitivity, and 87.92% specificity using the private database. We developed a second approach aimed at RIQA based on deep learning. We achieved 95.29% accuracy with the private database and 99.48% accuracy with the DRIMDB database. The location of the OD and the fovea was performed using a combination of saliency maps. The proposed methods were evaluated over the private database and the public databases DRIVE, DIARETDB1 and Messidor. For the OD, we achieved 100% accuracy for all databases except Messidor (99.50%). As for the fovea location, we also reached 100% accuracy for all databases except Messidor (99.67%). The joint segmentation of RLs and EXs was accomplished by decomposing the fundus image into layers. Results were computed per pixel and per image. Using the private database, 88.34% per-image accuracy (ACCi) was reached for the RL detection and 95.41% ACCi for EX detection. An additional method was proposed for the segmentation of RLs based on superpixels. Evaluating this method with the private database, we obtained 84.45% ACCi. Results were validated using the DIARETDB1 database. Finally, we proposed a deep learning framework for the automatic DR severity grading. The method was based on a novel attention mechanism that attends separately to the dark and the bright structures of the retina. The Kaggle DR detection dataset was used for development and validation. The International Clinical DR Scale was considered, which is made up of 5 DR severity levels. Classification results for all classes achieved 83.70% accuracy and a Quadratic Weighted Kappa of 0.78. The methods proposed in this Doctoral Thesis form a complete, automatic DR screening system that aids the early detection of DR. In this way, diabetic patients could receive better care for their ocular health and avoid vision loss. 
In addition, the workload of specialists could be relieved while healthcare costs are reduced.
Escuela de Doctorado. Doctorado en Tecnologías de la Información y las Telecomunicaciones
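The metrics quoted in this abstract (accuracy, sensitivity, specificity, and Quadratic Weighted Kappa) follow standard definitions, which can be sketched as below. This is a generic illustration under the usual textbook formulas, not the thesis code; the function names and the toy labels are hypothetical, and the kappa function assumes at least two distinct grades occur in the data.

```python
def binary_rates(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return accuracy, sensitivity, specificity


def quadratic_weighted_kappa(y_true, y_pred, num_classes):
    """Agreement between two ordinal gradings, penalising large disagreements."""
    n = len(y_true)
    observed = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(num_classes))
                 for j in range(num_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            weight = (i - j) ** 2 / (num_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count if the two gradings were independent.
            weighted_expected += weight * true_hist[i] * pred_hist[j] / n
    return 1.0 - weighted_observed / weighted_expected


acc, sens, spec = binary_rates([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
kappa_perfect = quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 5)
kappa_chance = quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], 2)
print(acc, sens, spec, kappa_perfect, kappa_chance)
```

With five severity levels, confusing grade 0 with grade 4 is weighted sixteen times more heavily than confusing adjacent grades, which is why the kappa is reported alongside plain accuracy for ordinal grading.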