46 research outputs found

    BidNet: Binocular Image Dehazing Without Explicit Disparity Estimation

    Get PDF
    Heavy haze results in severe image degradation and thus hampers the performance of visual perception, object detection, etc. Based on the assumption that dehazed binocular images are superior to hazy ones for stereo vision tasks such as 3D object detection, and on the fact that image haze is a function of depth, this paper proposes a Binocular image dehazing Network (BidNet) that dehazes both the left and right images of a binocular pair within a deep learning framework. Existing binocular dehazing methods rely on simultaneously dehazing and estimating disparity, whereas BidNet does not need to explicitly perform disparity estimation, which is time-consuming and notoriously challenging; note that a small error in disparity gives rise to a large variation in depth and, in turn, in the estimated haze-free image. The relationship and correlation between binocular images are explored and encoded by the proposed Stereo Transformation Module (STM). Jointly dehazing binocular image pairs is mutually beneficial and outperforms dehazing the left images alone. We extend the Foggy Cityscapes dataset to a Stereo Foggy Cityscapes dataset with binocular foggy image pairs. Experimental results demonstrate that BidNet significantly outperforms state-of-the-art dehazing methods in both subjective and objective assessments.
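
    As background for the depth dependence and disparity sensitivity noted above, here is a minimal sketch (not the paper's code) of the standard atmospheric scattering model and the stereo disparity-to-depth relation; the parameter values and function names are illustrative assumptions.

```python
import numpy as np

def hazy_image(J, depth, A=1.0, beta=0.8):
    """Standard scattering model: I = J * t + A * (1 - t), with t = exp(-beta * depth).

    J is the haze-free image (H, W, 3), depth is a per-pixel depth map (H, W);
    A (airlight) and beta (scattering coefficient) are illustrative values.
    """
    t = np.exp(-beta * depth)                      # transmission map
    return J * t[..., None] + A * (1.0 - t[..., None])

def depth_from_disparity(disparity, focal_px, baseline_m):
    """Stereo relation Z = f * B / d: when disparity d is small (distant, heavily
    hazed regions), a small disparity error produces a large depth error."""
    return focal_px * baseline_m / np.maximum(disparity, 1e-6)
```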

    Effective image enhancement and fast object detection for improved UAV applications

    Get PDF
    As an emerging field, unmanned aerial vehicles (UAVs) draw on interdisciplinary techniques across science, engineering and industrial sectors. Their applications span remote sensing, precision agriculture, marine inspection, coast guarding, environmental monitoring, natural resource monitoring (e.g. forest, land and river), disaster assessment, smart cities, intelligent transportation, and logistics and delivery. With fast-growing demands from such a wide range of application sectors, a persistent bottleneck is how to improve the efficiency and efficacy of UAVs in operation. Smart decisions often need to be made from the captured footage in real time, yet this is severely hampered by poor image quality, ineffective object detection and recognition models, and the lack of robust, lightweight models suitable for edge computing and real deployment. In this thesis, several innovative pieces of work are developed to tackle some of these issues. First, considering the quality requirements of UAV images, various approaches and models have been proposed, yet they focus on different aspects and produce inconsistent results. The work in this thesis is therefore categorised into denoising-focused and dehazing-focused methods, followed by a comprehensive evaluation in terms of both qualitative and quantitative assessment, providing valuable insights and useful guidance for end users and the research community. For fast and effective object detection and recognition, deep learning based models, especially the YOLO series, are widely used. However, taking YOLOv7 as the baseline, performance is strongly affected by factors such as the low quality of UAV images and high resource demands, leading to unsatisfactory accuracy and processing speed. Three major improvements, namely a transformer module, the CIoU loss and the GhostBottleneck module, are therefore introduced to improve feature extraction, detection and recognition decision making, and running efficiency. Comprehensive experiments on both publicly available and self-collected datasets validate the efficiency and efficacy of the proposed algorithm. In addition, to facilitate real deployment such as edge computing scenarios, an embedded implementation of the key algorithm modules is introduced on the Xavier NX platform and compared with standard workstation settings using NVIDIA GPUs. This demonstrates promising results, with reduced CPU/GPU resource consumption and an enhanced real-time frame rate, benefiting real-time deployment with uncompromised edge computing. Through these investigations and developments, a better understanding is established of the key challenges associated with UAV and Simultaneous Localisation and Mapping (SLAM) based applications, and possible solutions are presented. Keywords: Unmanned aerial vehicles (UAV); Simultaneous Localisation and Mapping (SLAM); denoising; dehazing; object detection; object recognition; deep learning; YOLOv7; transformer; GhostBottleneck; scene matching; embedded implementation; Xavier NX; edge computing.
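
    Of the three improvements listed above, the CIoU loss has a compact closed form; here is a minimal NumPy sketch of it under the usual (x1, y1, x2, y2) box convention, as an illustration of the loss itself rather than the thesis implementation.

```python
import numpy as np

def ciou_loss(pred, target, eps=1e-7):
    """Complete IoU loss: 1 - IoU + centre-distance penalty + aspect-ratio penalty."""
    # Intersection over union
    xi1, yi1 = max(pred[0], target[0]), max(pred[1], target[1])
    xi2, yi2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(0.0, xi2 - xi1) * max(0.0, yi2 - yi1)
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter + eps)
    # Squared distance between box centres
    cxp, cyp = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cxt, cyt = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    rho2 = (cxp - cxt) ** 2 + (cyp - cyt) ** 2
    # Squared diagonal of the smallest enclosing box
    cw = max(pred[2], target[2]) - min(pred[0], target[0])
    ch = max(pred[3], target[3]) - min(pred[1], target[1])
    diag2 = cw ** 2 + ch ** 2 + eps
    # Aspect-ratio consistency term
    v = (4 / np.pi ** 2) * (
        np.arctan((target[2] - target[0]) / (target[3] - target[1] + eps))
        - np.arctan((pred[2] - pred[0]) / (pred[3] - pred[1] + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / diag2 + alpha * v
```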

    Scene understanding for interactive applications

    Get PDF
    In order to interact with the environment, it is necessary to understand what is happening in it, in the scene where the action is occurring.
Decades of research in the computer vision field have contributed towards automatically achieving this scene understanding from visual information. Scene understanding is a very broad area of research within computer vision. We could say that it tries to replicate the human capability of extracting a wealth of information from visual data alone. For example, we would like to understand how people perceive the world in three dimensions, or how they can quickly recognize places and objects despite substantial appearance variation. One of the basic tasks in scene understanding from visual data is to assign a semantic meaning to every element of the image, i.e., to assign a concept or object label to every pixel. This can be formulated as a dense image labeling problem which assigns specific values (labels) to each pixel or region of the image. Depending on the application, the labels can represent very different concepts, from a physical magnitude, such as depth, to high-level semantic information, such as an object category. The general goal of this thesis is to investigate and develop new ways to automatically incorporate human feedback or prior knowledge into intelligent systems that require scene understanding capabilities. In particular, this thesis explores two common sources of prior information from users: human interactions and human labeling of sample data. The first part of this thesis is focused on learning complex scene information from interactive human knowledge. Interactive solutions impose performance constraints, since feedback to the user must be provided at interactive rates. This thesis presents an efficient interaction paradigm that approximates any per-pixel magnitude from a few user strokes, propagating the sparse user input to every pixel of the image. We demonstrate the suitability of the proposed paradigm through three interactive image editing applications which require per-pixel knowledge of a certain magnitude: simulating depth of field, dehazing, and HDR tone mapping. Another common strategy to learn from user prior knowledge is to design supervised machine learning approaches. In recent years, Convolutional Neural Networks (CNNs) have pushed the state of the art on a broad variety of visual recognition problems. However, for new tasks, enough training data is not always available and, therefore, training from scratch is not always feasible. The second part of this thesis investigates how to improve systems that learn dense semantic labeling of images from user-labeled examples. In particular, we present and validate strategies, based on common transfer learning approaches, for semantic segmentation. The goal of these strategies is to learn new specific classes when there is not enough labeled data to train from scratch. We evaluate these strategies across very different realistic environments, such as autonomous driving scenes, aerial images and underwater images.
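
    To make the stroke-propagation idea above concrete, here is a minimal sketch that assigns each pixel the value of the nearest user-stroke sample in a joint position-colour feature space; the feature weighting and nearest-neighbour scheme are illustrative assumptions, not the propagation algorithm used in the thesis.

```python
import numpy as np
from scipy.spatial import cKDTree

def propagate_strokes(image, stroke_yx, stroke_vals, sigma_xy=20.0, sigma_rgb=0.1):
    """Propagate sparse per-pixel values (stroke_vals) given at pixel coordinates
    stroke_yx (list of (row, col) pairs) to every pixel of a float RGB image."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Joint feature: scaled pixel position + scaled colour, for all pixels...
    feats = np.concatenate(
        [np.stack([ys, xs], axis=-1).reshape(-1, 2) / sigma_xy,
         image.reshape(-1, 3) / sigma_rgb], axis=1)
    # ...and for the stroke samples only.
    yx = np.asarray(stroke_yx)
    stroke_feats = np.concatenate(
        [yx.astype(float) / sigma_xy,
         image[yx[:, 0], yx[:, 1]] / sigma_rgb], axis=1)
    _, idx = cKDTree(stroke_feats).query(feats)    # nearest stroke sample per pixel
    return np.asarray(stroke_vals, dtype=float)[idx].reshape(h, w)
```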

    Neural Spectro-polarimetric Fields

    Full text link
    Modeling the spatial radiance distribution of light rays in a scene has been extensively explored for applications including view synthesis. Spectrum and polarization, the wave properties of light, are often neglected due to their integration into three RGB spectral bands and their non-perceptibility to human vision. Despite this, these properties encompass substantial material and geometric information about a scene. In this work, we propose to model spectro-polarimetric fields: the spatial Stokes-vector distribution of any light ray at an arbitrary wavelength. We present Neural Spectro-polarimetric Fields (NeSpoF), a neural representation that models the physically valid Stokes vector at given continuous variables of position, direction, and wavelength. NeSpoF manages inherently noisy raw measurements, showcases memory efficiency, and preserves physically vital signals, factors that are crucial for representing the high-dimensional signal of a spectro-polarimetric field. To validate NeSpoF, we introduce the first multi-view hyperspectral-polarimetric image dataset, comprising both synthetic and real-world scenes. These were captured using our compact hyperspectral-polarimetric imaging system, which has been calibrated for robustness against system imperfections. We demonstrate the capabilities of NeSpoF on diverse scenes.
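
    A minimal PyTorch-style sketch of the kind of neural field described above, mapping continuous (position, direction, wavelength) inputs to a 4-element Stokes vector; the layer sizes, the absence of positional encoding, and the softplus constraint keeping s0 non-negative are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectroPolarimetricField(nn.Module):
    """MLP f(x, d, lambda) -> (s0, s1, s2, s3)."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + 3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),
        )

    def forward(self, position, direction, wavelength):
        x = torch.cat([position, direction, wavelength], dim=-1)
        s = self.net(x)
        s0 = F.softplus(s[..., :1])        # total intensity must be non-negative
        return torch.cat([s0, s[..., 1:]], dim=-1)
```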

    Spectral and Polarization Vision: Spectro-polarimetric Real-world Dataset

    Full text link
    Image datasets are essential not only in validating existing methods in computer vision but also in developing new methods. Most existing image datasets focus on trichromatic intensity images to mimic human vision. However, polarization and spectrum, the wave properties of light that animals in harsh environments and with limited brain capacity often rely on, remain underrepresented in existing datasets. Although spectro-polarimetric datasets exist, they have insufficient object diversity, limited illumination conditions, linear-only polarization data, and inadequate image counts. Here, we introduce two spectro-polarimetric datasets: trichromatic Stokes images and hyperspectral Stokes images. These novel datasets encompass both linear and circular polarization; they introduce multiple spectral channels; and they feature a broad selection of real-world scenes. With our datasets in hand, we analyze the spectro-polarimetric image statistics, develop efficient representations of such high-dimensional data, and evaluate the spectral dependency of shape-from-polarization methods. As such, the proposed datasets promise a foundation for data-driven spectro-polarimetric imaging and vision research. Dataset and code will be publicly available.
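
    For readers unfamiliar with Stokes images, here is a minimal sketch of how a full Stokes vector can be assembled per pixel from polarizer intensity measurements (0°, 45°, 90°, 135° linear, plus optional right/left circular channels); this assumed measurement scheme is for illustration and is not necessarily the dataset's acquisition pipeline.

```python
import numpy as np

def stokes_from_polarizer_stack(i0, i45, i90, i135, i_rcp=None, i_lcp=None):
    """Per-pixel Stokes vector (s0, s1, s2, s3) from intensity images.

    s3 requires circular-polarization measurements; without them it is set to zero,
    which corresponds to a linear-only dataset.
    """
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                        # horizontal vs. vertical linear
    s2 = i45 - i135                      # diagonal linear
    s3 = (i_rcp - i_lcp) if (i_rcp is not None and i_lcp is not None) else np.zeros_like(s0)
    return np.stack([s0, s1, s2, s3], axis=-1)
```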

    REFUGE Challenge: A unified framework for evaluating automated methods for glaucoma assessment from fundus photographs

    Full text link
    Glaucoma is one of the leading causes of irreversible but preventable blindness in working-age populations. Color fundus photography (CFP) is the most cost-effective imaging modality for screening retinal disorders. However, its application to glaucoma has been limited to the computation of a few related biomarkers, such as the vertical cup-to-disc ratio. Deep learning approaches, although widely applied in medical image analysis, have not been extensively used for glaucoma assessment due to the limited size of the available datasets. Furthermore, the lack of a standardized benchmark strategy makes it difficult to compare existing methods in a uniform way. To overcome these issues we set up the Retinal Fundus Glaucoma Challenge, REFUGE (https://refuge.grand-challenge.org), held in conjunction with MICCAI 2018. The challenge consisted of two primary tasks, namely optic disc/cup segmentation and glaucoma classification. As part of REFUGE, we have publicly released a dataset of 1200 fundus images with ground-truth segmentations and clinical glaucoma labels, currently the largest of its kind. We have also built an evaluation framework to ease and ensure fairness in the comparison of different models, encouraging the development of novel techniques in the field. Twelve teams qualified and participated in the online challenge. This paper summarizes their methods and analyzes their corresponding results. In particular, we observed that two of the top-ranked teams outperformed two human experts in the glaucoma classification task. Furthermore, the segmentation results were in general consistent with the ground-truth annotations, with complementary outcomes that can be further exploited by ensembling the results.
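
    As a concrete illustration of the vertical cup-to-disc ratio biomarker mentioned above, here is a minimal sketch computing it from binary optic-cup and optic-disc segmentation masks; this is an illustration only, not the challenge's evaluation code.

```python
import numpy as np

def vertical_cup_to_disc_ratio(cup_mask, disc_mask):
    """Ratio of the vertical extents (in pixel rows) of cup and disc masks."""
    def vertical_extent(mask):
        rows = np.where(mask.any(axis=1))[0]
        return 0 if rows.size == 0 else int(rows.max() - rows.min() + 1)
    disc_height = vertical_extent(disc_mask)
    return vertical_extent(cup_mask) / disc_height if disc_height else float("nan")
```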

    A deep learning approach towards railway safety risk assessment

    Get PDF
    Railway stations are essential components of railway systems and play a vital role in public daily life. Various types of AI technology have been utilised in many fields to ensure the safety of people and their assets. However, risk management in railway stations is challenging because stations feature dynamic and complex conditions, and despite extensive efforts by industry associations and researchers to reduce the number of accidents and injuries in this field, such incidents still occur. In this paper, we propose a novel framework that uses computer vision and pattern recognition to perform risk management in railway systems, in which a convolutional neural network (CNN) is applied as a supervised machine learning model to identify risks. The proposed model offers a beneficial method for obtaining more accurate motion data, and it detects adverse conditions as soon as possible by capturing fall, slip and trip (FST) events in stations, which represent high-risk outcomes. The framework of the presented method is generalisable to a wide range of locations and to additional types of risk.
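
    To illustrate the kind of supervised CNN classifier described above, here is a minimal sketch that labels video frames as normal or as a fall/slip/trip event; the architecture and class set are illustrative assumptions, not the network proposed in the paper.

```python
import torch
import torch.nn as nn

class FSTClassifier(nn.Module):
    """Tiny frame-level classifier: normal vs. fall / slip / trip."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, frames):                 # frames: (batch, 3, H, W)
        return self.head(self.features(frames).flatten(1))
```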