424 research outputs found

    ImageNet Large Scale Visual Recognition Challenge

    The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to the present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.
    Comment: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL VOC (per-category comparisons in Table 3, distribution of localization difficulty in Fig 16), a list of queries used for obtaining object detection images (Appendix C), and some additional references.

    An Overview of the Networking Issues of Cloud Gaming: A Literature Review

    The increasing prevalence of video games has brought innovations that aim to evolve them. Cloud gaming is poised to be the next phase of gaming. It enables users to play video games on any internet-enabled device. Such an improvement could, therefore, enhance the processing power of existing devices and remove the need to spend large amounts of money on the latest gaming equipment. However, others argue that it may be far from practically functional. Since cloud gaming depends on networks, new issues emerge. Accordingly, this paper reviews cloud gaming from a networking perspective. Specifically, the paper analyzes its issues and challenges along with possible solutions. To accomplish the study, a literature review was performed. Results show that there are numerous issues and challenges regarding cloud gaming networks. Generally, cloud gaming has problems with its network quality of service (QoS) and quality of experience (QoE). The poor QoS and QoE of cloud gaming can be linked to unsatisfactory latency, bandwidth, delay, packet loss, and graphics quality. Moreover, the cost of providing the service and the complexity of implementing cloud gaming were considered challenges. For these issues and challenges, solutions were found. The solutions include lag or latency compensation, compression with encoding techniques, client computing power, edge computing, machine learning, frame adaption, and GPU-based server selection. However, these have limitations and may not always be applicable. Thus, even if solutions exist, it would be beneficial to analyze the networking side of cloud gaming further.

    Diversity in Fashion Recommendation using Semantic Parsing

    Developing a recommendation system for fashion images is challenging due to the inherent ambiguity about which criterion a user is looking at. Suggesting multiple images, where each output image is similar to the query image on the basis of a different feature or part, is one way to mitigate the problem. Existing works on fashion recommendation have used Siamese or Triplet networks to learn features between a similar pair and a similar-dissimilar triplet, respectively. However, these methods do not provide basic information such as how two clothing images are similar, or which parts present in the two images make them similar. In this paper, we propose to recommend images by explicitly learning and exploiting part-based similarity. We propose a novel approach of learning discriminative features from weakly-supervised data by using visual attention over the parts and a texture encoding network. We show that the learned features surpass the state-of-the-art in the retrieval task on the DeepFashion dataset. We then use the proposed model to recommend fashion images having an explicit variation with respect to similarity of any of the parts.
    Comment: 5 pages, ICIP2018, code: https://github.com/sagarverma/fashion_recommendation_stlst
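
The triplet objective that the abstract above attributes to existing work can be sketched as follows. This is only a minimal illustration of a standard margin-based triplet loss on plain Python lists, not the authors' model; the function names, the margin value, and the toy embeddings are assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss (hypothetical helper for illustration).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings (made up): the first negative is far away,
# the second is too close to the anchor and is penalised.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
hard = triplet_loss([0.0, 0.0], [0.0, 1.0], [0.0, 1.5])
```

Such a loss only says that two images are closer than a third; it does not say which parts make them similar, which is the limitation the abstract points to.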

    Edge Video Analytics: A Survey on Applications, Systems and Enabling Techniques

    Video, as a key driver in the global explosion of digital information, can create tremendous benefits for human society. Governments and enterprises are deploying innumerable cameras for a variety of applications, e.g., law enforcement, emergency management, traffic control, and security surveillance, all facilitated by video analytics (VA). This trend is spurred by the rapid advancement of deep learning (DL), which enables more precise models for object classification, detection, and tracking. Meanwhile, with the proliferation of Internet-connected devices, massive amounts of data are generated daily, overwhelming the cloud. Edge computing, an emerging paradigm that moves workloads and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting new intersection, edge video analytics (EVA), has begun to attract widespread attention. Nevertheless, only a few loosely-related surveys exist on this topic. The basic concepts of EVA (e.g., definition, architectures) have not been fully elucidated due to the rapid development of this domain. To fill these gaps, we provide a comprehensive survey of the recent efforts on EVA. In this paper, we first review the fundamentals of edge computing, followed by an overview of VA. The EVA system and its enabling techniques are discussed next. In addition, we introduce prevalent frameworks and datasets to aid future researchers in the development of EVA systems. Finally, we discuss existing challenges and foresee future research directions. We believe this survey will help readers comprehend the relationship between VA and edge computing, and spark new ideas on EVA.
    Comment: 31 pages, 13 figures

    Large Language Models for Networking: Applications, Enabling Techniques, and Challenges

    The rapid evolution of network technologies and the growing complexity of network tasks necessitate a paradigm shift in how networks are designed, configured, and managed. With a wealth of knowledge and expertise, large language models (LLMs) are one of the most promising candidates. This paper aims to pave the way for constructing domain-adapted LLMs for networking. Firstly, we present potential LLM applications for vertical network fields and showcase the mapping from natural language to network language. Then, several enabling technologies are investigated, including parameter-efficient finetuning and prompt engineering. The insight is that language understanding and tool usage are both required for network LLMs. Driven by the idea of embodied intelligence, we propose ChatNet, a domain-adapted network LLM framework with access to various external network tools. ChatNet can significantly reduce the time required for burdensome network planning tasks, leading to a substantial improvement in efficiency. Finally, key challenges and future research directions are highlighted.
    Comment: 7 pages, 3 figures, 2 tables

    Fast human activity recognition in lifelogging

    This paper addresses the problem of fast Human Activity Recognition (HAR) in visual lifelogging. We identify the importance of visual features related to HAR and we specifically evaluate the HAR discrimination potential of Colour Histograms and Histogram of Oriented Gradients. In our evaluation we show that colour can be a low-cost and effective means of HAR when performing single-user classification. It is also noted that, while much more efficient, global image descriptors perform as well as or better than local descriptors in our HAR experiments. We believe that both of these findings are due to the fact that a user’s lifelog is rich in reoccurring scenes and environments.
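
A global colour histogram of the kind evaluated in the abstract above can be sketched in a few lines. This is a hedged illustration only: the bin count, the histogram-intersection similarity, and the function names are assumptions for the example and are not taken from the paper.

```python
def colour_histogram(pixels, bins_per_channel=4):
    """Normalised joint RGB histogram of an image.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    The result has bins_per_channel ** 3 entries that sum to 1
    (or all zeros for an empty image).
    """
    hist = [0.0] * (bins_per_channel ** 3)
    count = 0
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index 0..bins_per_channel-1.
        ri = r * bins_per_channel // 256
        gi = g * bins_per_channel // 256
        bi = b * bins_per_channel // 256
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
        count += 1
    if count:
        hist = [h / count for h in hist]
    return hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy "images" (made up): one all red, one all blue.
red = colour_histogram([(255, 0, 0)] * 4)
blue = colour_histogram([(0, 0, 255)] * 4)
same = histogram_intersection(red, red)
different = histogram_intersection(red, blue)
```

Because the descriptor ignores spatial layout and needs a single pass over the pixels, it is cheap to compute, which fits the low-cost setting the abstract describes.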

    On the Audio-Visual Emotion Recognition using Convolutional Neural Networks and Extreme Learning Machine

    Advances in artificial intelligence and machine learning for emotion recognition have been enormous, in ways that were previously inconceivable. Inspired by the promising evolution in human-computer interaction, this paper develops a multimodal emotion recognition system. This research encompasses two modalities as input, namely speech and video. In the proposed model, the input video samples are subjected to image pre-processing and image frames are obtained. For the audio input, the signal is pre-processed and transformed into the frequency domain. The aim is to obtain a Mel-spectrogram, which is processed further as an image. Convolutional neural networks are used for training and feature extraction for both audio and video with different configurations. The outputs of the two CNNs are fused using two extreme learning machines. For classification, the proposed system incorporates a support vector machine. The model is evaluated using three databases, namely eNTERFACE, RML, and SAVEE. For the eNTERFACE dataset, the accuracy obtained without and with augmentation was 87.2% and 94.91%, respectively. The RML dataset yielded an accuracy of 98.5%, and for the SAVEE dataset, the accuracy reached 97.77%. These results illustrate the effectiveness of the proposed system.