3,686 research outputs found

    A Heuristic Neural Network Structure Relying on Fuzzy Logic for Images Scoring

    Traditional deep learning methods are sub-optimal at classifying ambiguous features, which often arise in noisy, hard-to-predict categories, particularly in semantic scoring. Semantic scoring, which relies on semantic logic to implement evaluation, inevitably contains fuzzy descriptions and misses some concepts; for example, the ambiguous relationship between normal and probably normal presents unclear boundaries (normal - more likely normal - probably normal), so human error is common when annotating images. Differing from existing methods that focus on modifying the kernel structure of neural networks, this study proposes a dominant fuzzy fully connected layer (FFCL) for Breast Imaging Reporting and Data System (BI-RADS) scoring and validates the universality of the proposed structure. The model aims to develop complementary scoring properties for semantic paradigms while constructing fuzzy rules based on an analysis of human thought patterns, and in particular to reduce the influence of semantic conglutination. Specifically, a semantic-sensitive defuzzifier layer projects features occupied by relative categories into a semantic space, and a fuzzy decoder modifies the probabilities of the last output layer according to the global trend. Moreover, the ambiguous semantic space between two relative categories shrinks during learning, as the positive and negative growth trends of one category among its relatives are taken into account. We first used the Euclidean distance (ED) to measure the distance between the real and predicted scores, and then employed a two-sample t-test to demonstrate the advantage of the FFCL architecture. Extensive experiments on the CBIS-DDSM dataset show that our FFCL structure achieves superior performance for both triple and multiclass classification in BI-RADS scoring, outperforming state-of-the-art methods.
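
    As an illustration of the kind of neighbour-aware output layer the abstract describes, the following Python (PyTorch) sketch adds a fixed fuzzy mixing step on top of a standard fully connected classifier for ordinal BI-RADS-style categories. The class name FuzzyFCHead, the triangular membership matrix, and the neighbor_weight value are assumptions made for exposition, not the authors' FFCL implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FuzzyFCHead(nn.Module):
    """Fully connected classifier followed by a fixed fuzzy mixing step that
    redistributes part of each class's probability to its ordinal neighbours
    (normal <-> probably normal <-> ...). Illustrative only."""

    def __init__(self, in_features: int, num_classes: int, neighbor_weight: float = 0.2):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)
        m = torch.eye(num_classes)
        for i in range(num_classes - 1):
            m[i, i + 1] = neighbor_weight
            m[i + 1, i] = neighbor_weight
        # Row-normalise so the mixed output still sums to 1 per sample.
        self.register_buffer("membership", m / m.sum(dim=1, keepdim=True))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        crisp = F.softmax(self.fc(features), dim=-1)   # crisp class probabilities
        return crisp @ self.membership                 # neighbour-aware ("defuzzified") scores

# Example: 4 ordinal BI-RADS-style categories from a 512-d feature vector.
head = FuzzyFCHead(512, 4)
scores = head(torch.randn(8, 512))   # shape (8, 4); each row sums to 1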

    A Deep Spatio-Temporal Fuzzy Neural Network for Passenger Demand Prediction

    In spite of its importance, passenger demand prediction is a highly challenging problem, because the demand is simultaneously influenced by the complex interactions among many spatial and temporal factors and other external factors such as weather. To address this problem, we propose a Spatio-TEmporal Fuzzy neural Network (STEF-Net) to accurately predict passenger demands incorporating the complex interactions of all known important factors. We design an end-to-end learning framework with different neural networks modeling different factors. Specifically, we propose to capture spatio-temporal feature interactions via a convolutional long short-term memory network and model external factors via a fuzzy neural network that handles data uncertainty significantly better than deterministic methods. To keep the temporal relations when fusing two networks and emphasize discriminative spatio-temporal feature interactions, we employ a novel feature fusion method with a convolution operation and an attention layer. As far as we know, our work is the first to fuse a deep recurrent neural network and a fuzzy neural network to model complex spatio-temporal feature interactions with additional uncertain input features for predictive learning. Experiments on a large-scale real-world dataset show that our model achieves more than 10% improvement over the state-of-the-art approaches.
    Comment: https://epubs.siam.org/doi/abs/10.1137/1.9781611975673.1
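
    To make the fusion step concrete, here is a hedged PyTorch sketch of fusing a spatio-temporal feature map with an external-factor feature map via a convolution and an attention weighting, as the abstract outlines. Layer names, channel counts, and grid size are illustrative assumptions; the published STEF-Net feeds these maps from a ConvLSTM and a fuzzy neural network and may combine them differently.

import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Convolution + attention fusion of two feature streams (assumed design)."""

    def __init__(self, channels: int = 32):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)
        self.attn = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())
        self.out = nn.Conv2d(channels, 1, kernel_size=1)   # demand per grid cell

    def forward(self, spatio_temporal: torch.Tensor, external: torch.Tensor) -> torch.Tensor:
        fused = torch.relu(self.fuse(torch.cat([spatio_temporal, external], dim=1)))
        weights = self.attn(fused)              # attention map over the spatial grid
        return self.out(fused * weights)        # predicted demand map

# Example: batch of 4, 32-channel feature maps over a 16x16 city grid.
head = FusionHead()
demand = head(torch.randn(4, 32, 16, 16), torch.randn(4, 32, 16, 16))  # (4, 1, 16, 16)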

    Hierarchical Attention Network for Action Segmentation

    Temporal segmentation of events is an essential task and a precursor to the automatic recognition of human actions in video. Several attempts have been made to capture frame-level salient aspects through attention, but they lack the capacity to effectively map the temporal relationships between frames, as they only capture a limited span of temporal dependencies. To this end, we propose a complete end-to-end supervised learning approach that can better learn relationships between actions over time, thus improving the overall segmentation performance. The proposed hierarchical recurrent attention framework analyses the input video at multiple temporal scales to form embeddings at frame level and segment level and perform fine-grained action segmentation. This yields a simple, lightweight, yet extremely effective architecture for segmenting continuous video streams with multiple application domains. We evaluate our system on multiple challenging public benchmark datasets, including the MERL Shopping, 50 Salads, and Georgia Tech Egocentric datasets, and achieve state-of-the-art performance. The evaluated datasets encompass numerous video capture settings, including static overhead camera views and dynamic, egocentric head-mounted camera views, demonstrating the direct applicability of the proposed framework in a variety of settings.
    Comment: Published in Pattern Recognition Letters
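
    The two-level design (frame-level and segment-level embeddings with attention) can be sketched roughly as below in PyTorch. All module names, dimensions, and the GRU/attention-pooling choices are assumptions made for illustration, not the architecture published in the paper.

import torch
import torch.nn as nn

class AttnPool(nn.Module):
    """Soft attention pooling over a sequence of hidden states."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:        # h: (B, T, D)
        w = torch.softmax(self.score(h), dim=1)                # attention weights over time
        return (w * h).sum(dim=1)                              # (B, D) pooled embedding

class HierarchicalEncoder(nn.Module):
    def __init__(self, feat_dim: int = 256, hidden: int = 128, num_actions: int = 10):
        super().__init__()
        self.frame_rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.frame_pool = AttnPool(hidden)
        self.segment_rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.classifier = nn.Linear(hidden, num_actions)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, segments, frames_per_segment, feat_dim)
        B, S, T, D = frames.shape
        frame_h, _ = self.frame_rnn(frames.reshape(B * S, T, D))
        seg_emb = self.frame_pool(frame_h).reshape(B, S, -1)    # one embedding per segment
        seg_h, _ = self.segment_rnn(seg_emb)
        return self.classifier(seg_h)                           # per-segment action logits

logits = HierarchicalEncoder()(torch.randn(2, 6, 32, 256))      # (2, 6, 10)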

    A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community

    In recent years, deep learning (DL), a re-branding of neural networks (NNs), has risen to the top in numerous areas, namely computer vision (CV), speech recognition, natural language processing, etc. Whereas remote sensing (RS) possesses a number of unique challenges, primarily related to sensors and applications, RS inevitably draws from many of the same theories as CV, e.g., statistics, fusion, and machine learning, to name a few. This means that the RS community should be aware of, if not at the leading edge of, advancements like DL. Herein, we provide the most comprehensive survey of state-of-the-art RS DL research. We also review recent new developments in the DL field that can be used in DL for RS. Namely, we focus on theories, tools and challenges for the RS community. Specifically, we focus on unsolved challenges and opportunities as they relate to (i) inadequate data sets, (ii) human-understandable solutions for modelling physical phenomena, (iii) Big Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and learning algorithms for spectral, spatial and temporal data, (vi) transfer learning, (vii) an improved theoretical understanding of DL systems, (viii) high barriers to entry, and (ix) training and optimizing the DL.
    Comment: 64 pages, 411 references. To appear in the Journal of Applied Remote Sensing

    Applied deep learning in intelligent transportation systems and embedding exploration

    Deep learning techniques have achieved tremendous success in many real applications in recent years and show great potential in many areas, including transportation. Even though transportation has become increasingly indispensable in people's daily life, its related problems, such as traffic congestion and energy waste, have not been completely solved, and some have become even more critical. This dissertation focuses on solving the following fundamental problems in the transportation field using deep learning: (1) passenger demand prediction, (2) transportation mode detection, and (3) traffic light control. The dissertation also extends the application of deep learning to an embedding system for visualization and data retrieval.

    The first part of this dissertation presents a Spatio-TEmporal Fuzzy neural Network (STEF-Net), which accurately predicts passenger demand by incorporating the complex interaction of all known important factors, such as temporal, spatial and external information. Specifically, a convolutional long short-term memory network is employed to capture spatio-temporal feature interactions, and a fuzzy neural network is used to model external factors. A novel feature fusion method with convolution and an attention layer is proposed to keep the temporal relation and emphasize discriminative spatio-temporal feature interactions. Experiments on a large-scale real-world dataset show that the proposed model outperforms state-of-the-art approaches.

    The second part is a lightweight and energy-efficient system that detects transportation modes using only the accelerometer sensors in smartphones. Understanding people's transportation modes is beneficial to many civilian applications, such as urban transportation planning. The system collects accelerometer data efficiently and leverages a convolutional neural network to determine transportation modes. Different architectures and classification methods are tested with the proposed convolutional neural network to optimize the system design. Performance evaluation shows that the proposed approach achieves better accuracy than existing work in detecting people's transportation modes.

    The third component of this dissertation is a deep reinforcement learning model, based on Q-learning, for traffic light control. Existing inefficient traffic light control causes numerous problems, such as long delays and wasted energy. In the proposed model, the complex traffic scenario is quantified as states by collecting data and dividing the whole intersection into grids; the timing changes of a traffic light are the actions, modeled as a high-dimensional Markov decision process; and the reward is the cumulative waiting time difference between two cycles. To solve the model, a convolutional neural network is employed to map states to rewards, further optimized by several components, such as a dueling network, a target network, double Q-learning, and prioritized experience replay. Simulation results in Simulation of Urban MObility (SUMO) show the efficiency of the proposed model in controlling traffic lights.

    The last part of this dissertation studies the hierarchical structure in an embedding system. Traditional embedding approaches associate a real-valued embedding vector with each symbol or data point, which generates storage-inefficient representations and fails to effectively encode the internal semantic structure of the data. A regularized autoencoder framework is proposed to learn compact Hierarchical K-way D-dimensional (HKD) discrete embeddings of data points, aiming to capture the semantic structure of the data. Experimental results on synthetic and real-world datasets show that the proposed HKD embedding can effectively reveal the semantic structure of data via visualization and greatly reduce the search space of nearest neighbor retrieval while preserving high accuracy.
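
    The traffic-light component lends itself to a short sketch: the grid-shaped state is mapped by a small convolutional dueling Q-network to Q-values over timing actions, and the reward is the drop in cumulative waiting time between cycles. The shapes, layer sizes, and helper names below are assumptions for illustration; the dissertation's full agent also uses a target network, double Q-learning, and prioritized experience replay, which are omitted here.

import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling Q-network over a grid-shaped intersection state (assumed sizes)."""

    def __init__(self, grid: int = 8, channels: int = 2, num_actions: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.Flatten())
        flat = 32 * grid * grid
        self.value = nn.Linear(flat, 1)                 # state value V(s)
        self.advantage = nn.Linear(flat, num_actions)   # advantages A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        f = self.features(state)
        adv = self.advantage(f)
        # Dueling combination: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return self.value(f) + adv - adv.mean(dim=1, keepdim=True)

def reward(prev_cycle_wait: float, curr_cycle_wait: float) -> float:
    """Reward = reduction in cumulative waiting time between two cycles."""
    return prev_cycle_wait - curr_cycle_wait

q = DuelingQNet()(torch.randn(1, 2, 8, 8))   # Q-values for 4 timing actions
action = int(q.argmax(dim=1))                # greedy timing change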

    A multilevel paradigm for deep convolutional neural network features selection with an application to human gait recognition

    Human gait recognition (HGR) is of high importance in the area of video surveillance due to remote access and security threats. HGR is a technique commonly used for identifying a person's walking style in daily life. However, many typical situations, such as changes of clothes and variations in view angle, degrade system performance. Lately, different machine learning (ML) techniques have been introduced for video surveillance with promising results, among which deep learning (DL) shows the best performance in complex scenarios. In this article, an integrated framework is proposed for HGR using a deep neural network and a fuzzy entropy controlled skewness (FEcS) approach. The proposed technique works in two phases: in the first phase, deep convolutional neural network (DCNN) features are extracted by pre-trained CNN models (VGG19 and AlexNet) and their information is combined by a parallel fusion approach. In the second phase, entropy and skewness vectors are calculated from the fused feature vector (FV) to select the best subsets of features with the suggested FEcS approach. The best subsets of selected features are finally fed to multiple classifiers, and the best one is chosen on the basis of accuracy. The experiments were carried out on four well-known datasets, namely AVAMVG gait, CASIA A, B and C. The achieved accuracies were 99.8, 99.7, 93.3 and 92.2%, respectively. The obtained recognition results therefore lead to the conclusion that the proposed system is very promising.
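
    As a rough illustration of entropy- and skewness-guided feature selection in the spirit of the FEcS step, the NumPy sketch below scores each fused deep feature by its histogram entropy and absolute skewness and keeps the top-ranked subset. The scoring rule, the equal weighting, and the bin count are assumptions, not the paper's fuzzy-entropy-controlled formulation.

import numpy as np

def entropy_skewness_scores(features: np.ndarray, bins: int = 32) -> np.ndarray:
    """features: (num_samples, num_features) fused CNN feature matrix."""
    scores = np.zeros(features.shape[1])
    for j in range(features.shape[1]):
        col = features[:, j]
        hist, _ = np.histogram(col, bins=bins)
        p = hist / max(hist.sum(), 1)
        ent = -np.sum(p[p > 0] * np.log2(p[p > 0]))          # histogram entropy
        std = col.std() + 1e-8
        skew = np.mean(((col - col.mean()) / std) ** 3)      # sample skewness
        scores[j] = 0.5 * ent + 0.5 * abs(skew)              # assumed combination rule
    return scores

def select_top_k(features: np.ndarray, k: int) -> np.ndarray:
    """Keep the k highest-scoring feature columns."""
    idx = np.argsort(entropy_skewness_scores(features))[::-1][:k]
    return features[:, idx]

fused = np.random.randn(200, 4096)           # e.g. VGG19 + AlexNet fused features
selected = select_top_k(fused, 512)          # reduced (200, 512) feature matrix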

    Fuzzy Layered Convolution Neutral Network for Feature Level Fusion Based On Multimodal Sentiment Classification

    Multimodal sentiment analysis (MSA) is one of the core research topics of natural language processing (NLP). MSA has become a challenge for scholars and is equally complicated for a machine to comprehend. MSA involves learning opinions, emotions, and attitudes from audio-visual content; in other words, it is necessary to draw on such diverse modalities to obtain opinions and identify emotions. This can be achieved via modality data fusion, such as feature fusion. For handling the fusion of such diverse modalities while maintaining high performance, a typical machine learning approach is deep learning (DL), particularly the convolutional neural network (CNN), which has the capacity to handle tasks of great intricacy and difficulty. In this paper, we present a CNN architecture with an integrated fuzzy layer for MSA, a direction yet to be explored for improving the accuracy of CNNs on diverse inputs. Experiments conducted on a benchmark multimodal dataset, MOSI, achieve 37.5% and 81% accuracy on seven-class and binary classification, respectively, an improvement over the typical CNN, which achieved 28.9% and 78%, respectively.
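
    One common way to integrate a fuzzy layer into a CNN pipeline, in the general spirit of the abstract, is a layer of learned Gaussian membership functions applied to the fused multimodal features before classification. The sketch below is a minimal, assumed construction (the name GaussianFuzzyLayer, the membership count, and the layer placement are not from the paper).

import torch
import torch.nn as nn

class GaussianFuzzyLayer(nn.Module):
    """Maps each input feature to learned fuzzy membership degrees."""

    def __init__(self, in_features: int, num_memberships: int = 3):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(in_features, num_memberships))
        self.log_sigma = nn.Parameter(torch.zeros(in_features, num_memberships))

    def forward(self, x: torch.Tensor) -> torch.Tensor:       # x: (B, in_features)
        diff = x.unsqueeze(-1) - self.centers                  # (B, in_features, M)
        mu = torch.exp(-0.5 * (diff / self.log_sigma.exp()) ** 2)
        return mu.flatten(start_dim=1)                         # fuzzified feature vector

# Example: fuzzify 128-d fused text/audio/video features, then classify 7 sentiment classes.
model = nn.Sequential(GaussianFuzzyLayer(128, 3), nn.Linear(128 * 3, 7))
logits = model(torch.randn(4, 128))                            # (4, 7) sentiment logits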