
    Detection of glaucoma using three-stage training with EfficientNet

    This paper sets forth a methodology based on three-stage training of a state-of-the-art network architecture pre-trained on ImageNet and iteratively fine-tuned in three steps: first freezing all layers, then re-training a specific number of them, and finally training the entire architecture, to achieve a system with high accuracy and reliability. To determine the performance of the technique, a dataset of 17,070 cropped color fundus image samples with two classes, normal and abnormal, is used. Extensive evaluations against baseline models (VGG16, InceptionV3 and ResNet50) are carried out, together with thorough experimentation with the proposed pipeline using variants of EfficientNet and EfficientNetV2. The training procedure is described in detail, with emphasis on the number of parameters trained, the confusion matrices (with an analysis of false positives and false negatives), and the accuracy and F1-score obtained at each stage of the proposed methodology. The results show that the proposed system is reliable, has high precision and makes consistent predictions, and that the number of parameters that must be trained is low compared to other alternatives.

    This work is supported by the HK Innovation and Technology Commission (InnoHK Project CIMDA), the HK Research Grants Council (Project CityU 11204821) and City University of Hong Kong (Project 9610034). We acknowledge the support of Universitat Politècnica de València; R&D project PID2021-122580NB-I00, funded by MCIN/AEI/10.13039/501100011033 and ERDF.

    De Zarzà, I.; De Curtò, J.; Tavares De Araujo Cesariny Calafate, CM. (2022). Detection of glaucoma using three-stage training with EfficientNet. Intelligent Systems with Applications. 16:1-10. https://doi.org/10.1016/j.iswa.2022.200140
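    The three-stage schedule can be pictured as toggling which layers are trainable. The sketch below is framework-agnostic and assumes a pre-trained backbone plus a newly added classification head; the layer names, parameter counts and the stage-2 count `n_unfrozen` are hypothetical, since the abstract does not state how many layers are re-trained. In Keras the same idea corresponds to setting `layer.trainable` before recompiling the model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str
    n_params: int
    trainable: bool = True


def set_stage(backbone: List[Layer], head: List[Layer], stage: int, n_unfrozen: int) -> int:
    """Apply one stage of a three-stage fine-tuning schedule; return the trainable-parameter count.

    Stage 1: backbone fully frozen, only the new head is trained.
    Stage 2: the last `n_unfrozen` backbone layers (hypothetical count) are unfrozen too.
    Stage 3: every layer is trainable (end-to-end fine-tuning).
    """
    if stage not in (1, 2, 3):
        raise ValueError("stage must be 1, 2 or 3")
    first_trainable = {
        1: len(backbone),
        2: max(0, len(backbone) - n_unfrozen),
        3: 0,
    }[stage]
    for i, layer in enumerate(backbone):
        layer.trainable = i >= first_trainable
    for layer in head:
        layer.trainable = True  # the new head is trained in every stage
    return sum(layer.n_params for layer in backbone + head if layer.trainable)


# Hypothetical model: 10 backbone blocks of 1,000 parameters and a 50-parameter head.
backbone = [Layer(f"block{i}", 1000) for i in range(10)]
head = [Layer("dense_head", 50)]
for stage in (1, 2, 3):
    print(stage, set_stage(backbone, head, stage, n_unfrozen=3))
# prints "1 50", "2 3050", "3 10050"
```

    Reporting the trainable-parameter count per stage, as above, mirrors the abstract's emphasis on how many parameters are trained at each step.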

    Construction and Performance of Network Codes

    The main goal of this work is to implement and provide a theoretical description of different schemes of Physical-layer Network Coding. Using a basic scheme as a starting point, the project presents the construction and performance of communication systems of increasing complexity. The project is structured in several parts. First, an introduction to Physical-layer Network Coding and Lattice Network Codes is given. Next, the mathematical tools needed to understand the Compute and Forward (C&F) system are presented. Then, the first basic scheme is analysed and implemented. The next step consists of implementing a vector C&F system and a q-ary Hamming-coded version. Finally, different approaches to improving the coefficient matrix A are studied and implemented.
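    One common way to pick a row of the coefficient matrix A in Compute and Forward is to maximize the computation rate of Nazer and Gastpar for a real-valued channel, R(h, a) = 1/2 log2+( 1 / (||a||^2 - P (h·a)^2 / (1 + P ||h||^2)) ), where h is the channel vector seen by a relay, a an integer coefficient vector and P the transmit power (SNR). The sketch below does a brute-force search over small integer vectors; it is only an illustration of the objective, not the specific approaches studied in this work, and the search bound `amax` is a hypothetical parameter.

```python
import itertools
import math
from typing import Sequence, Tuple


def computation_rate(h: Sequence[float], a: Sequence[int], snr: float) -> float:
    """Computation rate (bits per real channel use) for decoding the combination sum(a_l * x_l)."""
    norm_a2 = sum(x * x for x in a)
    norm_h2 = sum(x * x for x in h)
    ha = sum(x * y for x, y in zip(h, a))
    denom = norm_a2 - snr * ha * ha / (1.0 + snr * norm_h2)
    # log+ : negative rates are clipped to zero
    return max(0.0, 0.5 * math.log2(1.0 / denom))


def best_coefficients(h: Sequence[float], snr: float, amax: int = 2) -> Tuple[int, ...]:
    """Exhaustively search non-zero integer vectors with entries in [-amax, amax]."""
    candidates = (
        a for a in itertools.product(range(-amax, amax + 1), repeat=len(h)) if any(a)
    )
    return max(candidates, key=lambda a: computation_rate(h, a, snr))


a = best_coefficients((1.0, 1.0), snr=10.0)
print(a, computation_rate((1.0, 1.0), a, 10.0))
```

    For a symmetric channel h = (1, 1) at P = 10 the search returns a vector proportional to (1, 1), whose rate is 1/2 log2(10.5); a and -a always give the same rate, so either sign may be returned. Exhaustive search grows exponentially with the number of users, which is why better ways of choosing A are worth studying.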

    Vision and Learning in the Context of Planetary Rovers

    Generative Adversarial Networks (GANs) have found tremendous applications in Computer Vision. Yet, in the context of space science and planetary exploration, the door is open for major advances. We introduce tools to handle planetary data from the Chang'E-4 mission and present a framework for Neural Style Transfer from rendered images using cycle consistency. We also introduce a new real-time pipeline for Simultaneous Localization and Mapping (SLAM) and Visual Inertial Odometry (VIO) in the context of planetary rovers. We leverage prior information about the location of the lander to propose an object-level SLAM approach that optimizes the pose and shape of the lander together with the camera trajectories of the rover. As a further refinement step, we propose using interpolation techniques between adjacent temporal samples, namely synthesizing non-existent images to improve the overall accuracy of the system. The experiments are conducted in the context of the Iris Lunar Rover, a nano-rover that will be deployed on lunar terrain in 2021 as the flagship of Carnegie Mellon, becoming the first American unmanned rover on the Moon.
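    To make "interpolation between adjacent temporal samples" concrete, the sketch below inserts synthetic frames between each pair of captured frames with a plain linear cross-fade. This is deliberately the simplest possible baseline and an assumption on our part: the abstract does not say which interpolation technique is used, and practical frame synthesis usually relies on optical flow or learned models rather than pixel blending. Frames are assumed to be equally sized grayscale images stored as lists of rows.

```python
from typing import List

Frame = List[List[float]]  # grayscale image as rows of pixel intensities


def blend(a: Frame, b: Frame, t: float) -> Frame:
    """Linear cross-fade (1 - t) * a + t * b between two equally sized frames."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [[(1.0 - t) * pa + t * pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def densify(frames: List[Frame], n_between: int) -> List[Frame]:
    """Insert n_between synthetic frames between every pair of adjacent captured frames."""
    if not frames:
        return []
    out: List[Frame] = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        for k in range(1, n_between + 1):
            out.append(blend(a, b, k / (n_between + 1)))
    out.append(frames[-1])
    return out


dense = densify([[[0.0, 0.0]], [[4.0, 8.0]]], n_between=3)
print(dense)  # [[[0.0, 0.0]], [[1.0, 2.0]], [[2.0, 4.0]], [[3.0, 6.0]], [[4.0, 8.0]]]
```

    The densified sequence would then be fed to the SLAM/VIO front end in place of the sparser original; a learned interpolator can replace `blend` without changing `densify`.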

    Signature and Log-Signature for the Study of Empirical Distributions Generated with GANs

    In this paper, we address the research gap in efficiently assessing Generative Adversarial Network (GAN) convergence and goodness of fit by introducing the application of the Signature Transform to measure similarity between image distributions. Specifically, we propose the novel use of the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) Signature, along with the Log-Signature, as alternatives to existing methods such as the Fréchet Inception Distance (FID) and the Multi-Scale Structural Similarity Index Measure (MS-SSIM). Our approach offers advantages in efficiency and effectiveness and enables extensive evaluations of GAN convergence and goodness of fit. Furthermore, we present new analytical measures based on the Kruskal–Wallis test to evaluate the goodness of fit of GAN sample distributions. Unlike existing GAN measures, which are based on deep neural networks and require extensive GPU computation, our approach significantly reduces computation time and runs on the CPU while maintaining the same level of accuracy. Our results demonstrate the effectiveness of the proposed method in capturing the intrinsic structure of the generated samples, providing meaningful insights into GAN performance. Lastly, we evaluate our approach qualitatively using Principal Component Analysis (PCA) and adaptive t-Distributed Stochastic Neighbor Embedding (t-SNE) for data visualization, illustrating the plausibility of our method.

    This work was supported by the HK Innovation and Technology Commission (InnoHK Project CIMDA). We acknowledge the support of R&D project PID2021-122580NB-I00, funded by MCIN/AEI/10.13039/501100011033 and ERDF. We thank the following funding sources from GOETHE-University Frankfurt am Main: DePP (Dezentrale Planung von Platoons im Straßengüterverkehr mit Hilfe einer KI auf Basis einzelner LKW) and Center for Data Science & AI.

    De Curtò, J.; De Zarzà, I.; Roig, G.; Tavares De Araujo Cesariny Calafate, CM. (2023). Signature and Log-Signature for the Study of Empirical Distributions Generated with GANs. Electronics. 12(10). https://doi.org/10.3390/electronics12102192
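    The sketch below shows the ingredients of an RMSE/MAE Signature comparison under stated assumptions: each sample is already represented as a piecewise-linear path (for instance an image read row by row, which is our assumption), its signature is truncated at depth 2 and computed exactly with Chen's identity, and the two sample sets are compared through the RMSE and MAE between their mean signatures. This is one plausible reading for illustration, not necessarily the paper's exact pipeline; dedicated libraries compute higher-depth signatures far more efficiently.

```python
import math
from typing import List, Sequence, Tuple

Path = Sequence[Sequence[float]]


def signature_depth2(path: Path) -> List[float]:
    """Depth-2 signature of a piecewise-linear path: level 1 then level 2, flattened."""
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        inc = [c - p for c, p in zip(cur, prev)]
        # Chen's identity for appending a linear segment with increment `inc`
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            level1[i] += inc[i]
    return level1 + [x for row in level2 for x in row]


def mean_signature(paths: Sequence[Path]) -> List[float]:
    sigs = [signature_depth2(p) for p in paths]
    return [sum(col) / len(sigs) for col in zip(*sigs)]


def rmse_mae_signature(real: Sequence[Path], fake: Sequence[Path]) -> Tuple[float, float]:
    """RMSE and MAE between the mean signatures of two sample sets (lower means more similar)."""
    diffs = [x - y for x, y in zip(mean_signature(real), mean_signature(fake))]
    rmse = math.sqrt(sum(e * e for e in diffs) / len(diffs))
    mae = sum(abs(e) for e in diffs) / len(diffs)
    return rmse, mae


print(signature_depth2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]))
# [1.0, 1.0, 0.5, 1.0, 0.0, 0.5]
```

    Everything here is plain arithmetic on the CPU, which is the property the abstract contrasts with GPU-bound measures such as FID.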

    Optimizing Neural Networks for Imbalanced Data

    Imbalanced datasets pose pervasive challenges in numerous machine learning (ML) applications, notably fraud detection, where fraudulent cases are vastly outnumbered by legitimate transactions. Conventional ML methods often struggle with such imbalances, producing models that perform poorly on the minority class. This study examines strategies for optimizing supervised learning algorithms on imbalanced datasets, with an emphasis on resampling techniques. First, we explore several methods, including Gaussian Naive Bayes, linear and quadratic discriminant analysis, K-nearest neighbors (K-NN), support vector machines (SVMs), decision trees and the multi-layer perceptron (MLP), and apply them to a four-class spiral dataset, a notoriously demanding non-linear classification problem, to gauge their effectiveness. We then apply the insights gained to a real-world credit card fraud detection task on a public dataset, where we achieve an accuracy of 99.937%. In this context, we compare undersampling, oversampling and the synthetic minority oversampling technique (SMOTE). Our findings highlight the value of resampling strategies for improving model performance on the minority class; in particular, oversampling techniques perform best on the minority class, with an accuracy of 99.928% and a very low number of false negatives (21/227,451).
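    SMOTE differs from plain oversampling in that it creates new minority points instead of duplicating existing ones: a minority sample is paired with one of its k nearest minority neighbours and a synthetic point is placed at a random position on the segment between them. The stdlib-only sketch below illustrates that idea; it is not the implementation used in the study (libraries such as imbalanced-learn provide a production version), and drawing the base sample at random is a simplification of the original per-sample scheme.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_synthetic: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate synthetic minority samples by interpolating towards random nearest neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x itself), by Euclidean distance
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: math.dist(x, minority[j]))[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # random position on the segment from x to nn
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
n_majority = 10  # hypothetical size of the majority class
extra = smote(minority, n_synthetic=n_majority - len(minority), k=2)
print(len(minority) + len(extra))  # 10: classes are now balanced
```

    Because every synthetic point lies between two real minority samples, SMOTE fills in the minority region rather than reweighting identical copies, which is the intuition behind comparing it with random over- and undersampling.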

    Cascading and Ensemble Techniques in Deep Learning

    In this study, we explore the integration of cascading and ensemble techniques in Deep Learning (DL) to improve prediction accuracy on diabetes data. The primary approach involves creating multiple Neural Networks (NNs), each predicting the outcome independently, and then feeding these initial predictions into another set of NNs. Our exploration starts from a preliminary study and extends to various ensemble techniques, including bagging, stacking and, finally, cascading. The cascading ensemble involves training a second layer of models on the predictions of the first. This cascading structure, combined with ensemble voting for the final prediction, aims to exploit the strengths of multiple models while mitigating their individual weaknesses. Our results demonstrate a significant improvement in prediction accuracy, making a compelling case for the utility of these techniques in healthcare applications, specifically diabetes prediction, where we achieve a model accuracy of 91.5% on the test set of a particularly challenging dataset and compare thoroughly against many other methodologies.

    We thank the following funding sources from GOETHE-University Frankfurt am Main: DePP (Dezentrale Planung von Platoons im Straßengüterverkehr mit Hilfe einer KI auf Basis einzelner LKW), Center for Data Science & AI and xAIBiology. We acknowledge the support of R&D project PID2021-122580NB-I00, funded by MCIN/AEI/10.13039/501100011033 and ERDF.

    De Zarzà, I.; De Curtò, J.; Hernández-Orallo, E.; Tavares De Araujo Cesariny Calafate, CM. (2023). Cascading and Ensemble Techniques in Deep Learning. Electronics. 12(15). https://doi.org/10.3390/electronics12153354
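    The cascade described above can be sketched with the smallest possible "network", a single sigmoid neuron, so that the structure stays visible: first-layer models are trained independently on bootstrap samples (bagging), second-layer models are trained only on the first layer's predicted probabilities, and the final label is a majority vote of the second layer. The model sizes, learning rate and epoch count are hypothetical choices for illustration; the study's actual networks are larger and are not specified in the abstract.

```python
import math
import random
from typing import List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights, bias) of a single sigmoid neuron


def sigmoid(z: float) -> float:
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_neuron(X, y, epochs: int = 200, lr: float = 0.1, seed: int = 0) -> Model:
    """Train one sigmoid neuron (logistic regression) with SGD on the log-loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            g = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b) - y[i]
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


def prob(m: Model, x: Sequence[float]) -> float:
    w, b = m
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def bootstrap(X, y, rng: random.Random):
    idx = [rng.randrange(len(X)) for _ in X]
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_cascade(X, y, n_first: int = 3, n_second: int = 3, seed: int = 0):
    rng = random.Random(seed)
    # Layer 1: independent models on bootstrap samples (bagging)
    first = [train_neuron(*bootstrap(X, y, rng), seed=seed + k) for k in range(n_first)]
    # Layer 2: models that see only the first layer's predicted probabilities
    Z = [[prob(m, x) for m in first] for x in X]
    second = [train_neuron(*bootstrap(Z, y, rng), seed=seed + 100 + k) for k in range(n_second)]
    return first, second


def predict_cascade(cascade, x: Sequence[float]) -> int:
    first, second = cascade
    z = [prob(m, x) for m in first]
    votes = sum(prob(m, z) >= 0.5 for m in second)
    return int(votes * 2 > len(second))  # majority vote of the second layer


X = [[v] for v in (-3.0, -2.5, -2.0, -1.5, 1.5, 2.0, 2.5, 3.0)] * 3
y = [0, 0, 0, 0, 1, 1, 1, 1] * 3
cascade = fit_cascade(X, y)
print(sum(predict_cascade(cascade, x) == t for x, t in zip(X, y)) / len(y))
```

    For brevity the second layer is trained on in-sample first-layer predictions; a careful stacking or cascading setup would use held-out or out-of-fold predictions so that the second layer does not simply learn the first layer's overfitting.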

    LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments

    In this paper, we introduce an approach to the multi-armed bandit (MAB) problem in non-stationary environments that harnesses the predictive power of large language models (LLMs). Recognizing that traditional bandit strategies, including epsilon-greedy and upper confidence bound (UCB), may struggle in the face of dynamic changes, we propose a strategy informed by LLMs that offers dynamic guidance on exploration versus exploitation, contingent on the current state of the bandits. We put forward a new non-stationary bandit model with fluctuating reward distributions and illustrate how LLMs can be employed to guide the choice of bandit amid this variability. Experimental results illustrate the potential of our LLM-informed strategy, demonstrating its adaptability to the fluctuating nature of the bandit problem while maintaining competitive performance against conventional strategies. This study provides insights into the capabilities of LLMs in enhancing decision-making in dynamic and uncertain scenarios.

    We acknowledge the support of Universitat Politècnica de València: R&D project PID2021-122580NB-I00, funded by MCIN/AEI/10.13039/501100011033 and ERDF. We thank the following funding sources from GOETHE-University Frankfurt am Main: DePP (Dezentrale Planung von Platoons im Straßengüterverkehr mit Hilfe einer KI auf Basis einzelner LKW), Center for Data Science & AI and xAIBiology.

    De Curtò, J.; De Zarzà, I.; Roig, G.; Cano, J.; Manzoni, P.; Tavares De Araujo Cesariny Calafate, CM. (2023). LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments. Electronics. 12(13). https://doi.org/10.3390/electronics12132814
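    A minimal testbed for this setting is sketched below, assuming a Gaussian random-walk drift on each arm's mean as the non-stationary model (the paper's own bandit model may differ). It includes the two conventional baselines named above, epsilon-greedy with a constant step size (which tracks drifting means) and a UCB1-style rule, plus a hypothetical `advisor` hook that returns the exploration rate at each step. In an LLM-informed strategy that hook would wrap a query to a language model; no real LLM API is used or implied here.

```python
import math
import random
from typing import Callable, List, Optional


class DriftingBandit:
    """Non-stationary bandit: every arm's mean reward follows a Gaussian random walk."""

    def __init__(self, n_arms: int, drift: float = 0.05, noise: float = 1.0, seed: int = 0):
        self.rng = random.Random(seed)
        self.means = [self.rng.gauss(0.0, 1.0) for _ in range(n_arms)]
        self.drift, self.noise = drift, noise

    def pull(self, arm: int) -> float:
        reward = self.rng.gauss(self.means[arm], self.noise)
        # the reward distributions shift after every step
        self.means = [m + self.rng.gauss(0.0, self.drift) for m in self.means]
        return reward


# Hypothetical hook: (step, current value estimates) -> exploration rate in [0, 1].
Advisor = Callable[[int, List[float]], float]


def run_epsilon_greedy(bandit: DriftingBandit, steps: int, epsilon: float = 0.1,
                       alpha: float = 0.1, advisor: Optional[Advisor] = None, seed: int = 1):
    rng = random.Random(seed)
    n = len(bandit.means)
    q = [0.0] * n
    total, choices = 0.0, []
    for t in range(steps):
        eps = advisor(t, q) if advisor is not None else epsilon
        if rng.random() < eps:
            arm = rng.randrange(n)  # explore
        else:
            arm = max(range(n), key=lambda a: q[a])  # exploit
        r = bandit.pull(arm)
        q[arm] += alpha * (r - q[arm])  # constant step size weights recent rewards more
        total += r
        choices.append(arm)
    return total, choices


def run_ucb(bandit: DriftingBandit, steps: int, c: float = 2.0):
    """UCB1-style baseline with sample-average estimates."""
    n = len(bandit.means)
    counts, sums = [0] * n, [0.0] * n
    total, choices = 0.0, []
    for t in range(1, steps + 1):
        if t <= n:
            arm = t - 1  # play each arm once first
        else:
            arm = max(range(n), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(c * math.log(t) / counts[a]))
        r = bandit.pull(arm)
        counts[arm] += 1
        sums[arm] += r
        total += r
        choices.append(arm)
    return total, choices


def bursty_advisor(t: int, q: List[float]) -> float:
    """Hypothetical stand-in for an LLM: explore heavily in short periodic bursts."""
    return 0.5 if t % 100 < 10 else 0.02


print(run_epsilon_greedy(DriftingBandit(5), 1000, advisor=bursty_advisor)[0])
print(run_ucb(DriftingBandit(5), 1000)[0])
```

    Using the same seed for both bandit instances gives each strategy the same initial arm means, so total rewards can be compared; sample-average UCB adapts slowly once counts grow, which is the weakness under drift that the abstract alludes to.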

    LLM Multimodal Traffic Accident Forecasting

    With the rise in traffic congestion in urban centers, predicting accidents has become paramount for city planning and public safety. This work comprehensively studies the efficacy of modern deep learning (DL) methods in forecasting traffic accidents and in enhancing Level-4 and Level-5 (L-4 and L-5) driving assistants with actionable visual and language cues. Using a rich dataset detailing accident occurrences, we compare the Transformer model against traditional time series models such as ARIMA and the more recent Prophet model. Additionally, we analyze feature importance in detail using principal component analysis (PCA) loadings, uncovering key factors contributing to accidents. We introduce the idea of real-time interventions with large language models (LLMs) in autonomous driving, using lightweight compact LLMs such as LLaMA-2 and Zephyr-7b-α. Our exploration extends to multimodality through the Large Language-and-Vision Assistant (LLaVA), a Visual Language Model (VLM) that bridges visual and linguistic cues, used in conjunction with deep probabilistic reasoning to enhance the real-time responsiveness of autonomous driving systems. In this study, we elucidate the advantages of employing large multimodal models within DL and deep probabilistic programming for improving the performance and usability of time series forecasting and feature importance analysis, particularly in a self-driving scenario. This work paves the way for safer, smarter cities, underpinned by data-driven decision making.
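    Feature importance from PCA loadings can be illustrated with a short stdlib-only sketch: standardize the features, form their correlation matrix, find the leading eigenvector by power iteration, and scale it by the square root of its eigenvalue; the magnitude of each loading then indicates how strongly that feature drives the first component. Only the first component is computed here, which is a simplification; in practice a library such as scikit-learn would be used (loadings are `components_.T * sqrt(explained_variance_)`), and the toy data below is hypothetical.

```python
import math
from typing import List, Sequence


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def correlation(z: Sequence[Sequence[float]]) -> List[List[float]]:
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]


def first_pc_loadings(rows: Sequence[Sequence[float]], iters: int = 500) -> List[float]:
    """Loadings of the first principal component (eigenvector * sqrt(eigenvalue))."""
    c = correlation(standardize(rows))
    d = len(c)
    v = [1.0] * d  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(c[i][j] * v[j] for j in range(d)) for i in range(d))  # Rayleigh quotient
    return [x * math.sqrt(eigval) for x in v]


# Hypothetical data: feature 1 is a rescaled copy of feature 0, feature 2 is uncorrelated.
rows = [(1, 2, 1), (2, 4, -1), (3, 6, -1), (4, 8, 1)]
loadings = first_pc_loadings(rows)
ranking = sorted(range(len(loadings)), key=lambda j: -abs(loadings[j]))
print([round(x, 3) for x in loadings], ranking)
```

    On this toy data the two perfectly correlated features receive loadings close to 1 and the uncorrelated one close to 0, so ranking by absolute loading puts features 0 and 1 first.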

    Summarization of Videos with the Signature Transform

    This manuscript presents a new benchmark for assessing the quality of visual summaries without the need for human annotators. It is based on the Signature Transform, specifically the RMSE and MAE Signature and Log-Signature metrics, and builds on the assumption that uniform random sampling can offer accurate summarization. We provide a new dataset comprising videos from YouTube and their corresponding automatic audio transcriptions. First, we introduce a preliminary baseline for automatic video summarization, built around a Vision Transformer, an image–text model pre-trained with Contrastive Language–Image Pre-training (CLIP), together with an object detection module. We then propose an accurate technique grounded in the harmonic components captured by the Signature Transform. The analytical measures are extensively evaluated, and we conclude that they correlate strongly with the notion of a good summary.
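    The sketch below illustrates the two ingredients named above under explicit assumptions: each frame is represented by a feature vector (for example an image embedding, which is our assumption), a summary is produced by uniform random sampling of frames in temporal order, and the summary is scored against the full video by the RMSE between depth-2 log-signatures (total increment plus Lévy areas). This is an illustrative reading of the benchmark, not the manuscript's exact procedure.

```python
import math
import random
from typing import List, Sequence

Path = Sequence[Sequence[float]]


def log_signature_depth2(path: Path) -> List[float]:
    """Depth-2 log-signature: total increment, then Levy areas A_ij for i < j."""
    d = len(path[0])
    total = [0.0] * d
    area = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        inc = [c - p for c, p in zip(cur, prev)]
        for i in range(d):
            for j in range(d):
                # signed area swept by the new segment relative to the path's start
                area[i][j] += 0.5 * (total[i] * inc[j] - total[j] * inc[i])
        for i in range(d):
            total[i] += inc[i]
    return total + [area[i][j] for i in range(d) for j in range(i + 1, d)]


def uniform_summary(frames: Sequence[Sequence[float]], k: int, seed: int = 0):
    """Keep k frames chosen uniformly at random, preserving temporal order."""
    idx = sorted(random.Random(seed).sample(range(len(frames)), k))
    return [frames[i] for i in idx]


def rmse_log_signature(video: Path, summary: Path) -> float:
    a, b = log_signature_depth2(video), log_signature_depth2(summary)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical 2-D frame features tracing a circular arc.
frames = [[math.cos(t / 5), math.sin(t / 5)] for t in range(30)]
for k in (5, 15, 30):
    print(k, rmse_log_signature(frames, uniform_summary(frames, k)))
```

    A summary that keeps every frame reproduces the video's log-signature exactly, so its score is 0; smaller summaries drift away from it, and the score quantifies how much of the video's geometric structure a given selection preserves.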