
    Semantic Based Sport Video Browsing


    Classification of Animal Sound Using Convolutional Neural Network

    Recently, the labeling of acoustic events has emerged as an active research topic covering a wide range of applications. High-level semantic inference can be conducted from the main audio effects to facilitate various content-based applications for analysis, efficient retrieval and content management. This paper proposes a flexible convolutional neural network (CNN)-based framework for animal audio classification. The work takes inspiration from various deep neural networks recently developed for multimedia classification. The model is driven by the idea of identifying the animal sound in an audio file by forcing the network to pay attention to the core audio effects present in the audio, represented as a Mel-spectrogram. The designed framework achieves an accuracy of 98% when classifying animal audio on weakly labelled datasets. A further aim of this research is to build a framework that can run even on a basic machine and does not require high-end devices for classification.
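    The Mel-spectrogram input mentioned above is built from a Mel filterbank. As a minimal sketch (not code from the paper), the mapping between Hz and the Mel scale, and the placement of triangular filter band edges, can be computed as follows; the HTK formula, the 40-band count, and the 8 kHz upper limit are common defaults assumed here, not details taken from this abstract:

    ```python
    import math

    def hz_to_mel(f):
        # HTK-style Hz -> Mel mapping.
        return 2595.0 * math.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        # Inverse mapping, Mel -> Hz.
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    def mel_band_edges(n_bands, f_min, f_max):
        # n_bands triangular filters need n_bands + 2 edge frequencies,
        # evenly spaced on the Mel scale between f_min and f_max.
        lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
        step = (hi - lo) / (n_bands + 1)
        return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

    edges = mel_band_edges(n_bands=40, f_min=0.0, f_max=8000.0)
    ```

    Spacing the filters evenly in Mel rather than Hz concentrates resolution at low frequencies, which is why Mel-spectrograms are a standard front end for audio CNNs.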

    Visual-aural attention modeling for talk show video highlight detection

    In this paper, we propose a visual-aural attention modeling based video content analysis approach, which can be used to automatically detect highlights in a popular TV program format, the talk show. First, visual and aural affective features are extracted to represent and model human attention to highlights. For efficiency, the adopted affective features are kept as few as possible. Then, a specific fusion strategy called ordinal decision is used to combine the visual and aural attention models and form an attention curve for the video. This curve reflects the change in human attention while watching TV. Finally, highlight segments are located at the peaks of the attention curve. Moreover, sentence boundary detection is used to refine the highlight boundaries in order to preserve each segment's integrity and fluency. This framework is extensible and flexible in integrating more affective features with a variety of fusion schemes. Experimental results demonstrate that our proposed visual-aural attention analysis approach is effective for talk show video highlight detection. © 2008 IEEE.
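    The fuse-then-peak-pick pipeline can be illustrated with a minimal sketch. This is not the paper's ordinal-decision fusion (whose details are not given in the abstract); it substitutes a simple average fusion and a local-maximum detector, and all values and the threshold are illustrative:

    ```python
    def fuse(visual, aural):
        # Average fusion as a simple stand-in for the paper's
        # ordinal-decision scheme (not reproduced here).
        return [(v + a) / 2.0 for v, a in zip(visual, aural)]

    def find_peaks(curve, threshold):
        # Indices of strict local maxima whose value meets the threshold;
        # these correspond to candidate highlight positions.
        return [i for i in range(1, len(curve) - 1)
                if curve[i] > curve[i - 1]
                and curve[i] > curve[i + 1]
                and curve[i] >= threshold]

    # Toy per-shot attention scores for the visual and aural channels.
    attention = fuse([0.1, 0.5, 0.2, 0.8, 0.3], [0.3, 0.7, 0.2, 0.6, 0.1])
    highlights = find_peaks(attention, threshold=0.5)
    ```

    In the full approach, each peak index would then be expanded to a segment whose boundaries are refined by sentence boundary detection.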

    Deep-Learning-Based Computer Vision Approach For The Segmentation Of Ball Deliveries And Tracking In Cricket

    There has been a significant increase in the adoption of technology in cricket recently. This trend has created the problem of duplicate work being done across similar computer vision-based research efforts. Our research tries to solve one of these problems by segmenting ball deliveries in a cricket broadcast using deep learning models, MobileNet and YOLO, thus enabling researchers to use our work as a dataset for their research. The output of our research can be used by cricket coaches and players to analyze the ball deliveries bowled during a match. This paper presents an approach to segment and extract video shots in which only the ball is being delivered. The video shots are series of continuous frames that make up a whole scene of the video. Object detection models are applied to reach a high level of accuracy in correctly extracting video shots. A proof of concept for building large datasets of video shots of ball deliveries is proposed, which paves the way for further processing of those shots to extract semantics. Ball tracking in these video shots is also performed using a separate RetinaNet model, as a sample of the usefulness of the proposed dataset. The position on the cricket pitch where the ball lands is extracted by tracking the ball along the y-axis. The video shot is then classified as a full-pitched, good-length or short-pitched delivery.
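    The last two steps, finding the pitch point from the y-axis track and mapping it to a delivery length, can be sketched as follows. This is an illustration under assumed conventions, not the paper's implementation: image y is taken to grow downward, and the metre thresholds are hypothetical:

    ```python
    def bounce_index(y_positions):
        # In image coordinates y grows downward, so the tracked ball's y
        # increases until it pitches, then decreases: the bounce frame is
        # the one with the maximum y value.
        return max(range(len(y_positions)), key=lambda i: y_positions[i])

    def classify_length(pitch_dist_m):
        # pitch_dist_m: distance of the bounce point from the batting
        # crease. Thresholds are illustrative, not taken from the paper.
        if pitch_dist_m < 2.0:
            return "full-pitched"
        if pitch_dist_m <= 7.0:
            return "good-length"
        return "short-pitched"
    ```

    In practice the per-frame y values would come from the RetinaNet detections, and the bounce pixel would be mapped to a pitch distance via a calibrated homography of the pitch.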

    Learning efficient temporal information in deep networks: From the viewpoints of applications and modeling

    With the introduction of deep learning, machine learning has come to dominate several technology areas, giving birth to high-performance applications that can even challenge human-level accuracy. However, the complexity of deep models is also exploding as a by-product of this revolution. Such enormous model complexity has raised the new challenge of improving the efficiency of deep models to reduce deployment cost, especially for systems with high throughput demands or devices with limited power. This dissertation aims to improve the efficiency of temporal-sensitive deep models in four directions. First, we develop a bandwidth extension mapping to avoid deploying multiple speech recognition systems for wideband and narrowband signals. Second, we apply a multi-modality approach to compensate for the performance of an excitement scoring system in which the input video sequences are aggressively down-sampled to reduce throughput. Third, we formulate the motion feature in the feature space by inducing temporal information directly from intermediate layers of deep networks instead of relying on an additional optical flow stream. Finally, we model a spatiotemporal sampling network, inspired by the human visual perception mechanism, to adaptively reduce input frames and regions.
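    The frame down-sampling in the second direction can be illustrated with a minimal uniform sub-sampler; this is a generic sketch of fixed-rate down-sampling, not the dissertation's adaptive spatiotemporal sampling network, and the function name is ours:

    ```python
    def subsample_frames(frames, target_count):
        # Uniformly keep target_count frames out of the sequence; a simple
        # fixed-rate stand-in for the aggressive down-sampling described.
        n = len(frames)
        if target_count >= n:
            return list(frames)
        step = n / target_count
        return [frames[int(i * step)] for i in range(target_count)]
    ```

    Reducing the frame count this way cuts throughput proportionally, at the cost of temporal detail that the multi-modality approach then compensates for.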

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.