108 research outputs found

    FruitVegCNN: Power- and Memory-Efficient Classification of Fruits & Vegetables Using CNN in Mobile MPSoC

    Fruit and vegetable classification using Convolutional Neural Networks (CNNs) has become a popular application in the agricultural industry; however, to the best of our knowledge, no previous study has designed and evaluated such an application on a mobile platform. In this paper, we propose a power-efficient CNN model, FruitVegCNN, to classify fruits and vegetables on a mobile multi-processor system-on-a-chip (MPSoC). We also evaluated the efficacy of FruitVegCNN against popular state-of-the-art CNN models on real mobile platforms (Huawei P20 Lite and Samsung Galaxy Note 9), and the experimental results demonstrate the efficacy and power efficiency of our proposed CNN architecture.
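Power-efficient mobile CNNs typically replace standard convolutions with cheaper factorized ones. The abstract does not specify FruitVegCNN's internals, so the following is a hypothetical sketch of the kind of parameter saving such designs rely on; the layer sizes are illustrative, not taken from the paper.

```python
# Hypothetical comparison of weight counts: a standard 3x3 convolution vs a
# depthwise-separable one (depthwise 3x3 + pointwise 1x1), the kind of
# factorization power-efficient mobile CNNs use. Layer sizes are illustrative.

def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k filter per input channel, then 1x1 pointwise mixing."""
    return c_in * k * k + c_in * c_out

std = standard_conv_params(64, 128, 3)        # 73728 weights
sep = depthwise_separable_params(64, 128, 3)  # 8768 weights
print(f"standard={std}, separable={sep}, saving={std / sep:.1f}x")
```

Fewer weights means fewer multiply-accumulate operations per inference, which is what drives the power and memory savings on a mobile MPSoC.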

    Regularization Through Simultaneous Learning: A Case Study on Plant Classification

    In response to the prevalent challenge of overfitting in deep neural networks, this paper introduces Simultaneous Learning, a regularization approach drawing on principles of Transfer Learning and Multi-task Learning. We leverage auxiliary datasets alongside the target dataset, UFOP-HVD, to facilitate simultaneous classification guided by a customized loss function featuring an inter-group penalty. This experimental configuration allows for a detailed examination of model performance across similar (PlantNet) and dissimilar (ImageNet) domains, thereby enriching the generalizability of Convolutional Neural Network models. Remarkably, our approach demonstrates superior performance over models without regularization and over those applying dropout regularization exclusively, enhancing accuracy by 5 to 22 percentage points. Moreover, when combined with dropout, the proposed approach improves generalization, securing state-of-the-art results for the UFOP-HVD challenge. The method also showcases efficiency with significantly smaller sample sizes, suggesting its broad applicability across a spectrum of related tasks. In addition, an interpretability approach is deployed to evaluate feature quality by analyzing class-feature correlations within the network's convolutional layers. The findings of this study provide deeper insights into the efficacy of Simultaneous Learning, particularly concerning its interaction with the auxiliary and target datasets.
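The customized loss described above can be pictured as standard cross-entropy plus a penalty on probability mass assigned to the other dataset's class group. This is a minimal, hypothetical sketch; the paper's exact formulation and the weight `lam` are assumptions.

```python
import math

# Hypothetical sketch of a "simultaneous learning" loss: one softmax head over
# the union of target and auxiliary classes, with an inter-group penalty on
# probability mass leaking into the other dataset's group. The exact loss and
# the weight lam are assumptions, not the paper's formulation.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def simultaneous_loss(logits, label, target_idx, aux_idx, lam=0.1):
    probs = softmax(logits)
    ce = -math.log(probs[label])                  # standard cross-entropy
    own = target_idx if label in target_idx else aux_idx
    other = aux_idx if own is target_idx else target_idx
    penalty = sum(probs[i] for i in other)        # inter-group leakage
    return ce + lam * penalty
```

Setting `lam=0` recovers plain cross-entropy over the joint head; a positive `lam` additionally discourages confusion between the target and auxiliary groups.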

    Harnessing Big Data Analytics for Healthcare: A Comprehensive Review of Frameworks, Implications, Applications, and Impacts

    Big Data Analytics (BDA) has garnered significant attention in both academia and industry, particularly in sectors such as healthcare, owing to the exponential growth of data and advancements in technology. The integration of data from diverse sources and the utilization of advanced analytical techniques have the potential to revolutionize healthcare by improving diagnostic accuracy, enabling personalized medicine, and enhancing patient outcomes. In this paper, we aim to provide a comprehensive literature review of the application of big data analytics in healthcare, focusing on its ecosystem, applications, and data sources. To achieve this, an extensive analysis of scientific studies published between 2013 and 2023 was conducted, in which a total of 180 scientific studies were thoroughly evaluated, establishing a strong foundation for future research and identifying collaboration opportunities in the healthcare domain. The study delves into various application areas of BDA in healthcare, highlights successful implementations, and explores their potential to enhance healthcare outcomes while reducing costs. Additionally, it outlines the challenges and limitations associated with BDA in healthcare, discusses modelling tools and techniques, showcases deployed solutions, and presents the advantages of BDA through various real-world use cases. Furthermore, this study identifies and discusses key open research challenges in the field of big data analytics in healthcare, aiming to push the boundaries and contribute to enhanced healthcare outcomes and decision-making processes.

    Hierarchical Decomposition of Large Deep Networks

    Teaching computers to recognize people and objects from visual cues in images and videos is an interesting challenge. The computer vision and pattern recognition communities have already demonstrated the ability of intelligent algorithms to detect and classify objects under difficult conditions such as pose variation, occlusion and limited image fidelity. Recent deep learning approaches in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) are built using very large and deep convolutional neural network architectures. In 2015, such architectures surpassed human performance (94.9% human vs 95.06% machine) on top-5 validation accuracy on the ImageNet dataset, and earlier this year deep learning approaches demonstrated a remarkable 96.43% accuracy. These successes have been made possible by deep architectures such as VGG, GoogLeNet, and most recently by deep residual models with as many as 152 weight layers. Training these deep models is difficult due to the compute-intensive learning of millions of parameters; to keep the parameter count manageable, very small 3x3 filters are used in the convolutional layers of very deep networks. At the same time, deep networks generalize well to other datasets, including complex datasets with fewer features or images. This thesis proposes a robust approach to large-scale visual recognition, introducing a framework that automatically analyses the similarity between classes in a dataset and configures a family of smaller networks to replace a single larger network. Similar classes are grouped together and learnt by a smaller network. This divides and conquers the large classification problem by identifying a class first by its coarse label and then by its fine label, deploying two or more stages of networks. In this way the proposed framework learns the natural class hierarchy and uses it effectively for classification. 
    A comprehensive analysis of the proposed methods shows that hierarchical models outperform traditional models in accuracy, reduce computation, and expand the ability to learn large-scale visual information effectively.
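The coarse-to-fine routing described above can be sketched as two stages of classifiers: a coarse classifier selects a class group, then a smaller expert network for that group predicts the fine label. The lookup-table "classifiers" and class names below are placeholders for trained CNNs.

```python
# Two-stage coarse-to-fine classification: a coarse network picks a class
# group, then a smaller expert network for that group predicts the fine
# label. The dict-based "classifiers" here stand in for trained CNNs.

coarse = {"husky.jpg": "animals", "sedan.jpg": "vehicles"}.get
experts = {
    "animals": {"husky.jpg": "dog"}.get,
    "vehicles": {"sedan.jpg": "car"}.get,
}

def classify(image):
    group = coarse(image)         # stage 1: coarse label selects an expert
    fine = experts[group](image)  # stage 2: fine label from that expert
    return group, fine

print(classify("husky.jpg"))  # ('animals', 'dog')
```

Each expert only has to discriminate within its own group, which is what lets a family of small networks replace one large one.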

    Deep learning of representations and its application to computer vision

    The goal of this thesis is to present a few small steps along the road to solving general artificial intelligence. This is a thesis by articles, containing four articles. Each article presents a new method for performing perceptual inference using machine learning and deep architectures, and each demonstrates the utility of the proposed method in the context of a computer vision task. The methods are more generally applicable, and in some cases have been applied to other kinds of tasks, but this thesis does not explore such applications. In the first article, we present two fast new variational inference algorithms for a generative model of images known as spike-and-slab sparse coding (S3C). These faster inference algorithms allow us to scale spike-and-slab sparse coding to unprecedented problem sizes and show that it is a superior feature extractor for object recognition tasks when very few labeled examples are available. We then build a new deep architecture, the partially-directed deep Boltzmann machine (PD-DBM), on top of the S3C model. This model was designed to simplify the training procedure for deep Boltzmann machines, which previously required a greedy layer-wise pretraining procedure. It partially succeeds at solving this problem, but the cost of inference in the new model is high enough to make scaling it to serious applications difficult. In the second article, we revisit the problem of jointly training deep Boltzmann machines. 
    This time, rather than changing the model family, we present a new training criterion, resulting in multi-prediction deep Boltzmann machines (MP-DBMs). MP-DBMs may be trained in a single stage and obtain better classification accuracy than traditional DBMs. They are also able to classify well using standard variational inference techniques, rather than requiring a separate, specialized, discriminatively trained classifier to obtain good classification performance. However, this comes at the cost of the model not being able to generate good samples. The classification performance of deep Boltzmann machines is no longer especially interesting following recent advances in supervised learning, but the MP-DBM remains interesting because it can perform tasks that purely supervised models cannot, such as classification in the presence of missing inputs and imputation of missing inputs. The general zeitgeist of deep learning research changed dramatically in the midst of the work on this thesis with the introduction of Geoffrey Hinton's dropout algorithm. Dropout permits purely supervised training of feedforward architectures with little overfitting. The third paper in this thesis presents a new activation function for feedforward neural networks which was explicitly designed to work well with dropout. This activation function, called maxout, makes it possible to learn architectures that leverage the benefits of cross-channel pooling in a purely supervised manner. We demonstrate improvements on several object recognition tasks using this activation function. Finally, we solve a real-world task: transcription of photos of multi-digit house numbers for geo-coding. Using maxout units and a new kind of output layer for convolutional neural networks, we demonstrate human-level accuracy (with limited coverage) on a challenging real-world dataset. This system has been deployed at Google and successfully used to transcribe nearly 100 million house numbers.
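Maxout, the activation introduced in the third article, partitions a layer's linear units into groups of k and outputs the maximum of each group. A minimal sketch of the activation itself:

```python
# Minimal sketch of the maxout activation: the layer's linear outputs are
# split into consecutive groups of k units and each group emits its maximum,
# giving a learned piecewise-linear, cross-channel pooling.

def maxout(z, k):
    """Apply maxout with group size k to a flat list of linear unit outputs."""
    if len(z) % k != 0:
        raise ValueError("number of units must be divisible by k")
    return [max(z[i:i + k]) for i in range(0, len(z), k)]

print(maxout([1.0, -2.0, 0.5, 3.0], 2))  # [1.0, 3.0]
```

Because the maximum is taken over learned linear functions rather than a fixed nonlinearity, a maxout unit can approximate arbitrary convex activations, which pairs well with dropout's model-averaging behaviour.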

    GuavaNet: A deep neural network architecture for automatic sensory evaluation to predict degree of acceptability for Guava by a consumer

    This thesis is divided into two parts. Part I: Analysis of Fruits, Vegetables, Cheese and Fish based on Image Processing using Computer Vision and Deep Learning: A Review. It consists of a comprehensive review of image processing, computer vision and deep learning techniques applied to the analysis of fruits, vegetables, cheese and fish; this part also serves as the literature review for Part II. Part II: GuavaNet: A deep neural network architecture for automatic sensory evaluation to predict degree of acceptability for Guava by a consumer. This part introduces an end-to-end deep neural network architecture that can predict the degree of acceptability of a guava by a consumer, based on sensory evaluation.

    Fine Art Pattern Extraction and Recognition

    This is a reprint of articles from the Special Issue published online in the open access journal Journal of Imaging (ISSN 2313-433X), available at: https://www.mdpi.com/journal/jimaging/special_issues/faper2020.

    Spatio-temporal traffic anomaly detection for urban networks

    Urban road networks are often affected by disruptions such as accidents and roadworks, giving rise to congestion and delays, which can, in turn, create a wide range of negative impacts on the economy, environment, safety and security. Accurate detection of the onset of traffic anomalies, specifically Recurrent Congestion (RC) and Non-recurrent Congestion (NRC), in traffic networks is an important Intelligent Transport Systems (ITS) function that facilitates proactive intervention to reduce the severity of congestion. A substantial body of literature is dedicated to models of varying complexity that attempt to identify such anomalies. Given the complexity of the problem, however, comparatively little effort has been dedicated to developing methods that detect traffic anomalies using spatio-temporal features. Driven both by recent advances in deep learning techniques and by the development of Traffic Incident Management Systems (TIMS), the aim of this research is to develop novel traffic anomaly detection models that can incorporate both spatial and temporal traffic information to detect traffic anomalies at the network level. This thesis first reviews the state of the art in traffic anomaly detection, including existing methods and emerging machine learning and deep learning methods, before identifying gaps in the current understanding of traffic anomalies and their detection. One of the problems in adapting deep learning models to traffic anomaly detection is translating time series traffic data from multiple locations into a format from which the deep learning model can learn spatial and temporal features effectively. 
    To address this challenging problem and build a systematic traffic anomaly detection method at the network level, this thesis proposes a methodological framework consisting of (a) a translation layer (designed to translate the time series traffic data from multiple locations over the road network into a desired format with spatial and temporal features), (b) detection methods and (c) localisation. This methodological framework is subsequently tested for early RC detection and for NRC detection. Three translation layers, namely the connectivity matrix, geographical grid translation and spatio-temporal translation, are presented and evaluated for both RC and NRC detection. The early RC detection approach is a deep learning based method that combines Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks, while NRC detection involves only the application of the CNN. The performance of the proposed approach is compared against other conventional congestion detection methods, using a comprehensive evaluation framework that includes metrics such as detection rates and false positive rates, and sensitivity analyses of time windows as well as prediction horizons. The conventional congestion detection methods used for comparison, all commonly used in the literature, include the Multilayer Perceptron, Random Forest and Gradient Boosting classifiers. Real-world traffic data from the City of Bath are used for the comparative analysis of RC, while traffic data in conjunction with incident data extracted from Central London are used for NRC detection. The results show that while the connectivity matrix may be capable of extracting features of a small network, the increased sparsity of the matrix in a large network reduces its effectiveness in feature learning compared to geographical grid translation. 
    The results also indicate that the proposed deep learning method demonstrates superior detection accuracy compared to alternative methods and that it can detect recurrent congestion as early as one hour ahead with acceptable accuracy. The proposed method can be implemented within a real-world ITS system making use of traffic sensor data, thereby providing a practically useful tool for road network managers to manage traffic proactively. In addition, the results demonstrate that a deep learning-based approach may improve the accuracy of incident detection and locate traffic anomalies precisely, especially in a large urban network. Finally, the framework is further tested for robustness with respect to network topology, sensor faults and missing data. The robustness analysis demonstrates that the proposed traffic anomaly detection approaches are transferable to different sizes of road networks, and that they are robust in the presence of sensor faults and missing data.
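The geographical grid translation described above maps per-sensor time series onto a coarse spatial grid, one frame per time step, yielding a (time, rows, cols) tensor that a CNN or CNN-LSTM can consume. A toy sketch, with the grid size and the (row, col, speed) reading format assumed for illustration rather than taken from the thesis:

```python
# Toy sketch of a geographical grid translation layer: per-time-step sensor
# readings, each tagged with a grid cell, are rasterised into a sequence of
# (rows x cols) frames suitable for a CNN or CNN-LSTM. The grid size and the
# (row, col, value) reading format are illustrative assumptions.

def to_grid(readings, rows, cols):
    """readings: one list per time step of (row, col, value) tuples."""
    frames = []
    for step in readings:
        frame = [[0.0] * cols for _ in range(rows)]
        for r, c, value in step:
            frame[r][c] = value  # last reading per cell wins
        frames.append(frame)
    return frames

frames = to_grid([[(0, 1, 30.0)], [(1, 0, 12.5)]], rows=2, cols=2)
print(frames[0])  # [[0.0, 30.0], [0.0, 0.0]]
```

Unlike a connectivity matrix, the grid stays dense as the network grows, which matches the thesis's finding that grid translation scales better for feature learning on large networks.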

    Stochastic Methods for Fine-Grained Image Segmentation and Uncertainty Estimation in Computer Vision

    In this dissertation, we exploit concepts of probability theory, stochastic methods and machine learning to address three existing limitations of deep learning-based models for image understanding. First, although convolutional neural networks (CNNs) have substantially improved the state of the art in image understanding, conventional CNNs provide segmentation masks that poorly adhere to object boundaries, a critical limitation for many potential applications. Second, training deep learning models requires large amounts of carefully selected and annotated data, but large-scale annotation of image segmentation datasets is often prohibitively expensive. And third, conventional deep learning models also lack the capability of uncertainty estimation, which compromises both decision making and model interpretability. To address these limitations, we introduce the Region Growing Refinement (RGR) algorithm, an unsupervised post-processing algorithm that exploits Monte Carlo sampling and pixel similarities to propagate high-confidence labels into regions of low-confidence classification. The probabilistic Region Growing Refinement (pRGR) provides RGR with a rigorous mathematical foundation that exploits concepts of Bayesian estimation and variance reduction techniques. Experiments demonstrate both the effectiveness of (p)RGR for the refinement of segmentation predictions and its suitability for uncertainty estimation, since the variance estimates obtained in its Monte Carlo iterations are highly correlated with segmentation accuracy. We also introduce FreeLabel, an intuitive open-source web interface that exploits RGR to allow users to obtain high-quality segmentation masks with just a few freehand scribbles, in a matter of seconds. Designed to benefit the computer vision community, FreeLabel can be used for both crowdsourced and private annotation and has a modular structure that can be easily adapted to any image dataset. 
    The practical relevance of the methods developed in this dissertation is illustrated through applications in agricultural and healthcare-related domains. We have combined RGR and modern CNNs for fine segmentation of fruit flowers, motivated by the importance of automated bloom intensity estimation for optimizing fruit orchard management and, possibly, automating procedures such as flower thinning and pollination. We also used an early version of FreeLabel to annotate novel datasets for segmentation of fruit flowers, which are now publicly available. Finally, this dissertation also describes work on fine segmentation and gaze estimation for images collected in assisted living environments, with the ultimate goal of assisting geriatricians in evaluating the health status of patients in such facilities.
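RGR as described propagates high-confidence labels into low-confidence regions using pixel similarity and Monte Carlo sampling. The following deterministic 1-D toy keeps only the propagation idea; the threshold and similarity tolerance are assumptions, and the real algorithm operates on 2-D segmentation masks with repeated stochastic sampling.

```python
# Deterministic 1-D toy of the label-propagation idea behind RGR: copy labels
# from high-confidence neighbours to low-confidence pixels whose values are
# similar. The real algorithm works on 2-D masks with Monte Carlo sampling;
# thresh and tol here are illustrative, assumed parameters.

def refine(labels, conf, values, thresh=0.5, tol=0.1):
    out = list(labels)
    for i, c in enumerate(conf):
        if c >= thresh:
            continue                  # already confident, keep label
        for j in (i - 1, i + 1):      # check immediate neighbours
            in_range = 0 <= j < len(labels)
            if in_range and conf[j] >= thresh and abs(values[i] - values[j]) <= tol:
                out[i] = labels[j]    # propagate confident, similar label
    return out

print(refine([1, 0, 1], [0.9, 0.2, 0.9], [0.5, 0.52, 0.9]))  # [1, 1, 1]
```

In the toy, the middle pixel's low-confidence label flips to match its left neighbour, which is both confident and similar in value; the dissimilar right neighbour has no effect.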

    Image recognition, semantic segmentation and photo adjustment using deep neural networks

    Deep Neural Networks (DNNs) have proven to be effective models for solving various problems in computer vision. Multi-Layer Perceptron networks, Convolutional Neural Networks and Recurrent Neural Networks are representative examples of DNNs in the setting of supervised learning. The key ingredients in the successful development of DNN-based models include, but are not limited to, task-specific designs of network architecture, discriminative feature representation learning and scalable training algorithms. In this thesis, we describe a collection of DNN-based models that address three challenging computer vision tasks, namely large-scale visual recognition, image semantic segmentation and automatic photo adjustment. For each task, the network architecture is carefully designed on the basis of the nature of the task. For large-scale visual recognition, we design a hierarchical Convolutional Neural Network to fully exploit the semantic hierarchy among visual categories. The resulting model can be viewed as an ensemble of specialized classifiers. We improve on state-of-the-art results at an affordable increase in computational cost. For image semantic segmentation, we integrate convolutional layers with novel spatially recurrent layers to incorporate global context into the prediction process. The resulting hybrid network learns improved feature representations, which lead to more accurate region recognition and boundary localization. Combined with a post-processing step involving a fully-connected conditional random field, our hybrid network achieves new state-of-the-art results on a large benchmark dataset. For automatic photo adjustment, we take a data-driven approach to learning the underlying color transforms from manually enhanced examples. We formulate the learning problem as a regression task, which can be approached with a Multi-Layer Perceptron network. 
    We concatenate global contextual features, local contextual features and pixel-wise features and feed them into the deep network. State-of-the-art results are achieved on datasets with both global and local stylized adjustments.