1,351 research outputs found

    A Survey of Deep Learning-Based Object Detection

    Get PDF
    Object detection is one of the most important and challenging branches of computer vision, which has been widely applied in peoples life, such as monitoring security, autonomous driving and so on, with the purpose of locating instances of semantic objects of a certain class. With the rapid development of deep learning networks for detection tasks, the performance of object detectors has been greatly improved. In order to understand the main development status of object detection pipeline, thoroughly and deeply, in this survey, we first analyze the methods of existing typical detection models and describe the benchmark datasets. Afterwards and primarily, we provide a comprehensive overview of a variety of object detection methods in a systematic manner, covering the one-stage and two-stage detectors. Moreover, we list the traditional and new applications. Some representative branches of object detection are analyzed as well. Finally, we discuss the architecture of exploiting these object detection methods to build an effective and efficient system and point out a set of development trends to better follow the state-of-the-art algorithms and further research.Comment: 30 pages,12 figure

    Unveiling the frontiers of deep learning: innovations shaping diverse domains

    Full text link
    Deep learning (DL) enables the development of computer models that are capable of learning, visualizing, optimizing, refining, and predicting data. In recent years, DL has been applied in a range of fields, including audio-visual data processing, agriculture, transportation prediction, natural language, biomedicine, disaster management, bioinformatics, drug design, genomics, face recognition, and ecology. To explore the current state of deep learning, it is necessary to investigate the latest developments and applications of deep learning in these disciplines. However, the literature is lacking in exploring the applications of deep learning in all potential sectors. This paper thus extensively investigates the potential applications of deep learning across all major fields of study as well as the associated benefits and challenges. As evidenced in the literature, DL exhibits accuracy in prediction and analysis, makes it a powerful computational tool, and has the ability to articulate itself and optimize, making it effective in processing data with no prior training. Given its independence from training data, deep learning necessitates massive amounts of data for effective analysis and processing, much like data volume. To handle the challenge of compiling huge amounts of medical, scientific, healthcare, and environmental data for use in deep learning, gated architectures like LSTMs and GRUs can be utilized. For multimodal learning, shared neurons in the neural network for all activities and specialized neurons for particular tasks are necessary.Comment: 64 pages, 3 figures, 3 table

    Affective image content analysis: two decades review and new perspectives

    Get PDF

    Affective Image Content Analysis: Two Decades Review and New Perspectives

    Get PDF
    Images can convey rich semantics and induce various emotions in viewers. Recently, with the rapid advancement of emotional intelligence and the explosive growth of visual data, extensive research efforts have been dedicated to affective image content analysis (AICA). In this survey, we will comprehensively review the development of AICA in the recent two decades, especially focusing on the state-of-the-art methods with respect to three main challenges -- the affective gap, perception subjectivity, and label noise and absence. We begin with an introduction to the key emotion representation models that have been widely employed in AICA and description of available datasets for performing evaluation with quantitative comparison of label noise and dataset bias. We then summarize and compare the representative approaches on (1) emotion feature extraction, including both handcrafted and deep features, (2) learning methods on dominant emotion recognition, personalized emotion prediction, emotion distribution learning, and learning from noisy data or few labels, and (3) AICA based applications. Finally, we discuss some challenges and promising research directions in the future, such as image content and context understanding, group emotion clustering, and viewer-image interaction.Comment: Accepted by IEEE TPAM

    Fuzzy Logic in Surveillance Big Video Data Analysis: Comprehensive Review, Challenges, and Research Directions

    Get PDF
    CCTV cameras installed for continuous surveillance generate enormous amounts of data daily, forging the term “Big Video Data” (BVD). The active practice of BVD includes intelligent surveillance and activity recognition, among other challenging tasks. To efficiently address these tasks, the computer vision research community has provided monitoring systems, activity recognition methods, and many other computationally complex solutions for the purposeful usage of BVD. Unfortunately, the limited capabilities of these methods, higher computational complexity, and stringent installation requirements hinder their practical implementation in real-world scenarios, which still demand human operators sitting in front of cameras to monitor activities or make actionable decisions based on BVD. The usage of human-like logic, known as fuzzy logic, has been employed emerging for various data science applications such as control systems, image processing, decision making, routing, and advanced safety-critical systems. This is due to its ability to handle various sources of real world domain and data uncertainties, generating easily adaptable and explainable data-based models. Fuzzy logic can be effectively used for surveillance as a complementary for huge-sized artificial intelligence models and tiresome training procedures. In this paper, we draw researchers’ attention towards the usage of fuzzy logic for surveillance in the context of BVD. We carry out a comprehensive literature survey of methods for vision sensory data analytics that resort to fuzzy logic concepts. Our overview highlights the advantages, downsides, and challenges in existing video analysis methods based on fuzzy logic for surveillance applications. We enumerate and discuss the datasets used by these methods, and finally provide an outlook towards future research directions derived from our critical assessment of the efforts invested so far in this exciting field

    Bootstrap–CURE: A novel clustering approach for sensor data: an application to 3D printing industry

    Get PDF
    The agenda of Industry 4.0 highlights smart manufacturing by making machines smart enough to make data-driven decisions. Large-scale 3D printers, being one of the important pillars in Industry 4.0, are equipped with smart sensors to continuously monitor print processes and make automated decisions. One of the biggest challenges in decision autonomy is to consume data quickly along the process and extract knowledge from the printer, suitable for improving the printing process. This paper presents the innovative unsupervised learning approach, bootstrap–CURE, to decode the sensor patterns and operation modes of 3D printers by analyzing multivariate sensor data. An automatic technique to detect the suitable number of clusters using the dendrogram is developed. The proposed methodology is scalable and significantly reduces computational cost as compared to classical CURE. A distinct combination of the 3D printer’s sensors is found, and its impact on the printing process is also discussed. A real application is presented to illustrate the performance and usefulness of the proposal. In addition, a new state of the art for sensor data analysis is presented.This work was supported in part by KEMLG-at-IDEAI (UPC) under Grant SGR-2017-574 from the Catalan government.Peer ReviewedPostprint (published version
    • …
    corecore