424 research outputs found

ImageNet Large Scale Visual Recognition Challenge

Author: A Geiger
A Torralba
Aditya Khosla
Alexander C. Berg
Andrej Karpathy
B Alexe
B Yao
C Liu
C Vondrick
DG Lowe
GA Miller
Hao Su
J Uijlings
Jia Deng
Jonathan Krause
K Crammer
KEA Sande van de
Li Fei-Fei
M Everingham
Michael Bernstein
Olga Russakovsky
P Arbelaez
P Felzenszwalb
S Thorpe
Sanjeev Satheesh
Sean Ma
T Ahonen
Zhiheng Huang
Publication date: 01/01/2015

The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.

Comment: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL VOC (per-category comparisons in Table 3, distribution of localization difficulty in Fig 16), a list of queries used for obtaining object detection images (Appendix C), and some additional reference

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Carolina Digital Repository

An Overview of the Networking Issues of Cloud Gaming: A Literature Review

Author: Baldovino Britanny
Publication venue: Politeknik Negeri Cilacap
Publication date: 01/12/2022

With the increasing prevalence of video games come innovations that aim to evolve them. Cloud gaming is poised to be the next phase of gaming. It enables users to play video games on any internet-enabled device. Such an improvement could, therefore, enhance the processing power of existing devices and remove the need to spend large amounts of money on the latest gaming equipment. However, others argue that it may be far from being practically functional. Since cloud gaming depends on networks, new issues emerge. Accordingly, this paper reviews cloud gaming from a networking perspective. Specifically, the paper analyzes its issues and challenges along with possible solutions. To accomplish the study, a literature review was performed. Results show that there are numerous issues and challenges regarding cloud gaming networks. Generally, cloud gaming has problems with its network quality of service (QoS) and quality of experience (QoE). The poor QoS and QoE of cloud gaming can be linked to unsatisfactory latency, bandwidth, delay, packet loss, and graphics quality. Moreover, the cost of providing the service and the complexity of implementing cloud gaming were considered challenges. Solutions were found for these issues and challenges. They include lag or latency compensation, compression with encoding techniques, client computing power, edge computing, machine learning, frame adaption, and GPU-based server selection. However, these have limitations and may not always be applicable. Thus, even if solutions exist, it would be beneficial to analyze the networking side of cloud gaming further.

E-Journal Politeknik Negeri Cilacap

Directory of Open Access Journals

Diversity in Fashion Recommendation using Semantic Parsing

Author: Anand Sukhad
Arora Chetan
Rai Atul
Verma Sagar
Publication date: 18/10/2019

Developing a recommendation system for fashion images is challenging due to the inherent ambiguity associated with what criterion a user is looking at. Suggesting multiple images where each output image is similar to the query image on the basis of a different feature or part is one way to mitigate the problem. Existing works for fashion recommendation have used Siamese or Triplet networks to learn features between a similar pair and a similar-dissimilar triplet respectively. However, these methods do not provide basic information such as how two clothing images are similar, or which parts present in the two images make them similar. In this paper, we propose to recommend images by explicitly learning and exploiting part-based similarity. We propose a novel approach of learning discriminative features from weakly-supervised data by using visual attention over the parts and a texture encoding network. We show that the learned features surpass the state-of-the-art in the retrieval task on the DeepFashion dataset. We then use the proposed model to recommend fashion images having an explicit variation with respect to similarity of any of the parts.

Comment: 5 pages, ICIP2018, code: https://github.com/sagarverma/fashion_recommendation_stlst

arXiv.org e-Print Archive

Crossref

Edge Video Analytics: A Survey on Applications, Systems and Enabling Techniques

Author: Razavi Saiedeh
Xu Renjie
Zheng Rong
Publication date: 19/09/2023

Video, as a key driver in the global explosion of digital information, can create tremendous benefits for human society. Governments and enterprises are deploying innumerable cameras for a variety of applications, e.g., law enforcement, emergency management, traffic control, and security surveillance, all facilitated by video analytics (VA). This trend is spurred by the rapid advancement of deep learning (DL), which enables more precise models for object classification, detection, and tracking. Meanwhile, with the proliferation of Internet-connected devices, massive amounts of data are generated daily, overwhelming the cloud. Edge computing, an emerging paradigm that moves workloads and services from the network core to the network edge, has been widely recognized as a promising solution. The resulting new intersection, edge video analytics (EVA), begins to attract widespread attention. Nevertheless, only a few loosely-related surveys exist on this topic. The basic concepts of EVA (e.g., definition, architectures) were not fully elucidated due to the rapid development of this domain. To fill these gaps, we provide a comprehensive survey of the recent efforts on EVA. In this paper, we first review the fundamentals of edge computing, followed by an overview of VA. The EVA system and its enabling techniques are discussed next. In addition, we introduce prevalent frameworks and datasets to aid future researchers in the development of EVA systems. Finally, we discuss existing challenges and foresee future research directions. We believe this survey will help readers comprehend the relationship between VA and edge computing, and spark new ideas on EVA.

Comment: 31 pages, 13 figure

arXiv.org e-Print Archive

Large Language Models for Networking: Applications, Enabling Techniques, and Challenges

Author: Du Hongyang
Huang Tao
Huang Yudong
Kang Jiawen
Niyato Dusit
Wang Shuo
Xiong Zehui
Zhang Xinyuan
Publication date: 29/11/2023

The rapid evolution of network technologies and the growing complexity of network tasks necessitate a paradigm shift in how networks are designed, configured, and managed. With a wealth of knowledge and expertise, large language models (LLMs) are one of the most promising candidates. This paper aims to pave the way for constructing domain-adapted LLMs for networking. First, we present potential LLM applications for vertical network fields and showcase the mapping from natural language to network language. Then, several enabling technologies are investigated, including parameter-efficient finetuning and prompt engineering. The insight is that language understanding and tool usage are both required for network LLMs. Driven by the idea of embodied intelligence, we propose ChatNet, a domain-adapted network LLM framework with access to various external network tools. ChatNet can significantly reduce the time required for burdensome network planning tasks, leading to a substantial improvement in efficiency. Finally, key challenges and future research directions are highlighted.

Comment: 7 pages, 3 figures, 2 table

arXiv.org e-Print Archive

Fast human activity recognition in lifelogging

Author: A.B. Krueger
C. Gurrin
G. Varoquaux
J. Hamm
K.E.A. Sande van de
N. Caprani
P. Wang
S. Mann
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

This paper addresses the problem of fast Human Activity Recognition (HAR) in visual lifelogging. We identify the importance of visual features related to HAR and we specifically evaluate the HAR discrimination potential of Colour Histograms and Histogram of Oriented Gradients. In our evaluation we show that colour can be a low-cost and effective means of HAR when performing single-user classification. It is also noted that, while much more efficient, global image descriptors perform as well as or better than local descriptors in our HAR experiments. We believe that both of these findings are due to the fact that a user’s lifelog is rich in recurring scenes and environments.

Crossref

Irish Universities

DCU Online Research Access Service

On the Audio-Visual Emotion Recognition using Convolutional Neural Networks and Extreme Learning Machine

Author: Arifin Fatchul
Ashraf Arselan
Gunawan Teddy Surya
Habaebi Mohamed Hadi
Kartiwi Mira
Sophian Ali
Publication venue: IAES Indonesia Section
Publication date: 06/09/2022

Advances in artificial intelligence and machine learning for emotion recognition have been enormous, progressing in previously inconceivable ways. Inspired by the promising evolution in human-computer interaction, this paper develops a multimodal emotion recognition system. This research takes two modalities as input, namely speech and video. In the proposed model, the input video samples are subjected to image pre-processing and image frames are obtained. For the audio input, the signal is pre-processed and transformed into the frequency domain. The aim is to obtain a Mel-spectrogram, which is processed further as an image. Convolutional neural networks (CNNs) are used for training and feature extraction for both audio and video with different configurations. The fusion of outputs from the two CNNs is done using two extreme learning machines. For classification, the proposed system incorporates a support vector machine. The model is evaluated using three databases, namely eNTERFACE, RML, and SAVEE. For the eNTERFACE dataset, the accuracy obtained without and with augmentation was 87.2% and 94.91%, respectively. The RML dataset yielded an accuracy of 98.5%, and for the SAVEE dataset, the accuracy reached 97.77%. The results illustrate the effectiveness of the proposed system.

Indonesian Journal of Electrical Engineering and Informatics (IJEEI)