
    Comparison of Visual Datasets for Machine Learning

    One of the most significant technological advances in recent years is the rapid progress in using machine learning to process visual data. Among the factors contributing to this development, labeled datasets play a crucial role. Several datasets are widely reused for investigating and analyzing different machine-learning solutions. Many systems, such as autonomous vehicles, rely on machine-learning components for recognizing objects. This paper compares different visual datasets and frameworks for machine learning. The comparison is both qualitative and quantitative and investigates object-detection labels with respect to size, location, and contextual information. The paper also presents a new approach to creating datasets from real-time, geo-tagged visual data, greatly improving the contextual information of the data. The data can be labeled automatically by cross-referencing information from other sources, such as weather.
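
    The cross-referencing idea above can be sketched in a few lines. Everything here is invented for illustration (the `Frame` record, the `weather_lookup` callable, and the toy weather source); the paper does not specify an API.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Frame:
        """A geo-tagged, timestamped visual sample (hypothetical schema)."""
        lat: float
        lon: float
        timestamp: int  # unix seconds

    def auto_label(frame, weather_lookup):
        # Attach a weather label by cross-referencing an external source,
        # as the paper proposes; weather_lookup stands in for a real service.
        return {"frame": frame, "weather": weather_lookup(frame.lat, frame.lon, frame.timestamp)}

    # Deterministic toy weather source used only for this sketch.
    def toy_weather(lat, lon, ts):
        return "rain" if ts % 2 else "clear"

    labeled = auto_label(Frame(54.97, -1.61, 1514764800), toy_weather)
    ```

    In practice the lookup would query a historical weather archive keyed by location and time, so each frame gains contextual labels without manual annotation.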

    Image complexity based fMRI-BOLD visual network categorization across visual datasets using topological descriptors and deep-hybrid learning

    This study proposes a new approach that investigates differences in the topological characteristics of visual networks constructed from fMRI BOLD time-series corresponding to the COCO, ImageNet, and SUN visual datasets. The publicly available BOLD5000 dataset is utilized, which contains fMRI scans recorded while subjects viewed 5254 images of diverse complexities. The objective of this study is to examine how network topology differs in response to distinct visual stimuli from these visual datasets. To achieve this, 0- and 1-dimensional persistence diagrams are computed for each visual network representing COCO, ImageNet, and SUN. To extract suitable features from the topological persistence diagrams, K-means clustering is applied. The extracted K-means cluster features are fed to a novel deep-hybrid model that yields accuracy in the range of 90%-95% in classifying these visual networks. For understanding vision, this type of visual network categorization across datasets is important, as it captures differences in BOLD signals while perceiving images with different contexts and complexities. Furthermore, the distinctive topological patterns of the visual networks associated with each dataset, as revealed by this study, could potentially lead to future neuroimaging biomarkers for diagnosing visual processing disorders such as visual agnosia or prosopagnosia, and for tracking changes in visual cognition over time.
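
    The feature-extraction step (K-means over persistence-diagram points) can be sketched as follows. The (birth, death) pairs below are synthetic stand-ins; the real diagrams are computed from fMRI BOLD visual networks, and the full pipeline feeds the resulting features to the deep-hybrid classifier.

    ```python
    import numpy as np

    def kmeans(points, centres, iters=50):
        """Minimal Lloyd's algorithm, standing in for a full K-means
        implementation; `centres` is the initial guess."""
        for _ in range(iters):
            labels = np.argmin(((points[:, None] - centres) ** 2).sum(-1), axis=1)
            centres = np.array([points[labels == j].mean(0) for j in range(len(centres))])
        return labels, centres

    # Toy (birth, death) pairs standing in for the 0-/1-dimensional
    # persistence diagram of one visual network.
    rng = np.random.default_rng(1)
    diagram = np.vstack([
        rng.normal([0.1, 0.3], 0.02, (20, 2)),  # short-lived features
        rng.normal([0.2, 0.9], 0.02, (20, 2)),  # long-lived features
    ])
    labels, centres = kmeans(diagram, centres=diagram[[0, 20]])
    # Cluster occupancies form a fixed-length feature vector that a
    # downstream classifier can consume.
    features = np.bincount(labels, minlength=2)
    ```

    The appeal of this design is that persistence diagrams have variable size, while clustering them yields a fixed-length representation suitable for a neural network.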

    Referential communication in heterogeneous communities of pre-trained visual deep networks

    As large pre-trained image-processing neural networks are being embedded in autonomous agents such as self-driving cars or robots, the question arises of how such systems can communicate with each other about the surrounding world, despite their different architectures and training regimes. As a first step in this direction, we systematically explore the task of referential communication in a community of heterogeneous state-of-the-art pre-trained visual networks, showing that they can develop, in a self-supervised way, a shared protocol to refer to a target object among a set of candidates. This shared protocol can also be used, to some extent, to communicate about previously unseen object categories of different granularity. Moreover, a visual network that was not initially part of an existing community can learn the community's protocol with remarkable ease. Finally, we study, both qualitatively and quantitatively, the properties of the emergent protocol, providing some evidence that it captures high-level semantic features of objects.
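
    The referential game itself can be illustrated with a deliberately tiny sketch: a sender describes the target with a discrete symbol from a shared codebook, and a receiver picks the matching candidate. The feature vectors and two-symbol codebook are invented here; in the paper both agents are large pre-trained visual networks and the protocol is learned, not hand-set.

    ```python
    import numpy as np

    # Shared protocol: each symbol has a prototype in feature space.
    codebook = np.array([[1.0, 0.0], [0.0, 1.0]])

    def send(target_features):
        # Sender emits the symbol whose prototype is closest to the target.
        return int(np.argmin(((codebook - target_features) ** 2).sum(1)))

    def receive(symbol, candidates):
        # Receiver picks the candidate closest to the symbol's prototype.
        return int(np.argmin(((candidates - codebook[symbol]) ** 2).sum(1)))

    candidates = np.array([[0.9, 0.1], [0.1, 0.8]])
    target_idx = 1
    symbol = send(candidates[target_idx])
    guess = receive(symbol, candidates)  # communication succeeds if guess == target_idx
    ```

    The interesting part of the paper is that heterogeneous networks converge on such a codebook in a self-supervised way, without sharing architectures or weights.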

    URBAN TRAFFIC FLOW ANALYSIS BASED ON DEEP LEARNING CAR DETECTION FROM CCTV IMAGE SERIES

    Traffic flow analysis is fundamental for urban planning and the management of road traffic infrastructure. Automatic number plate recognition (ANPR) systems are conventional methods for vehicle detection and travel-time estimation. However, such systems focus specifically on number plates and therefore cover only a limited subset of road users. The advance of open-source deep-learning convolutional neural networks (CNNs), in combination with freely available closed-circuit television (CCTV) datasets, has opened opportunities for detecting and classifying various road users. The research presented here aims to analyse traffic flow patterns by fine-tuning pre-trained CNN models on domain-specific, low-quality imagery captured in various weather conditions and seasons of the year 2018. The imagery is collected from the North East Combined Authority (NECA) Travel and Transport Data, Newcastle upon Tyne, UK. Results show that the fine-tuned MobileNet model, with 98.2% precision, 58.5% recall and a 73.4% harmonic mean, could potentially be used for real-time traffic monitoring with big data, owing to its fast performance. Compared to MobileNet, the fine-tuned Faster R-CNN model, with a better harmonic mean (80.4%), recall (68.8%) and more accurate estimates of car units, could be used for traffic-analysis applications that demand higher accuracy than speed. This research ultimately exploits machine-learning algorithms for a wider understanding of traffic congestion and disruption during social events and extreme weather conditions.
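
    The "harmonic mean" reported here is the standard F1 score of precision and recall, which can be checked directly from the quoted figures (small rounding differences are expected, since the reported precision and recall are themselves rounded):

    ```python
    def harmonic_mean(precision, recall):
        """F1 score: harmonic mean of precision and recall."""
        return 2 * precision * recall / (precision + recall)

    # Reported MobileNet figures: 98.2% precision, 58.5% recall.
    f1_mobilenet = harmonic_mean(0.982, 0.585)  # ~0.733, consistent with the reported 73.4%
    ```

    The large gap between MobileNet's precision and recall explains the trade-off the authors describe: it rarely mislabels a detection but misses many vehicles, which suits fast monitoring more than accurate counting.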

    Implementation of Deep Learning Algorithm on Embedded Device

    This thesis deals with the implementation of an inference model, based on deep-learning methods, on an embedded device. First, machine-learning and deep-learning methods are surveyed, with emphasis on state-of-the-art techniques. Next, the most suitable hardware is selected; based on the results, two devices are chosen for implementation: the Jetson Nano and the Raspberry Pi. A custom dataset with three classes of Maoam candies is then created and used to train a custom inference model via transfer learning. The model is then used in a custom object-detection application, which is implemented on the Jetson Nano and Raspberry Pi. The results are evaluated and possible future improvements are outlined.

    Semantic Video Quality Assessment

    The increasing availability of high-speed internet connections, the growing use of smartphones and the ubiquity of social networking have combined to create a great diversity of User-Generated Content (UGC). Alongside this expansion, Ultra High Definition (UHD) broadcast technology has developed rapidly since its introduction. This has created the need to distinguish between good- and bad-quality videos. The best way to assess the quality of a video is through the human eye; however, given the amount of content, this is quite impractical, so computational methods are used instead. These methods try to assess quality as closely as possible to how human vision would. The semantics of a video is the meaning of the video itself; using this information, an idea of what the video is about can be obtained, which in turn helps in assessing the video. With that in mind, this thesis uses a video collection and a news-article collection to extract information about the objects in the scene and the terms in the news. The similarity between these two sources of information is used to assess the quality of the videos, so the assessment is based on semantic information. The main contributions of this work are video quality assessment based on semantic information and an evaluation of a set of object-detection algorithms used for semantic extraction in videos.
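
    One simple way to score the overlap between detected objects and news terms is set (Jaccard) similarity. This is only an illustrative stand-in: the thesis does not specify its similarity measure here, and the object and term lists below are invented.

    ```python
    def jaccard(detected_objects, article_terms):
        # Overlap between objects detected in a video and terms extracted
        # from a matched news article: |intersection| / |union|.
        a, b = set(detected_objects), set(article_terms)
        return len(a & b) / len(a | b)

    video_objects = ["car", "person", "traffic light"]
    news_terms = ["car", "road", "person", "accident"]
    score = jaccard(video_objects, news_terms)  # 2 shared / 5 total = 0.4
    ```

    A higher score suggests the video's visual content matches its associated article well, which the thesis uses as a semantic signal in the quality assessment.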