Cognition-Based Networks: A New Perspective on Network Optimization Using Learning and Distributed Intelligence
IEEE Access
Volume 3, 2015, Article number 7217798, Pages 1512-1530
Open Access
Zorzi, M.a , Zanella, A.a, Testolin, A.b, De Filippo De Grazia, M.b, Zorzi, M.bc
a Department of Information Engineering, University of Padua, Padua, Italy
b Department of General Psychology, University of Padua, Padua, Italy
c IRCCS San Camillo Foundation, Venice-Lido, Italy
Abstract
In response to the new challenges in the design and operation of communication networks, and taking inspiration from how living beings deal with complexity and scalability, in this paper we introduce an innovative system concept called COgnition-BAsed NETworkS (COBANETS). The proposed approach revolves around the systematic application of advanced machine learning techniques and, in particular, unsupervised deep learning and probabilistic generative models for system-wide learning, modeling, optimization, and data representation. Moreover, in COBANETS, we propose to combine this learning architecture with the emerging network virtualization paradigms, which make it possible to actuate automatic optimization and reconfiguration strategies at the system level, thus fully unleashing the potential of the learning approach. Compared with the past and current research efforts in this area, the technical approach outlined in this paper is deeply interdisciplinary and more comprehensive, calling for the synergistic combination of expertise of computer scientists, communications and networking engineers, and cognitive scientists, with the ultimate aim of breaking new ground through a profound rethinking of how the modern understanding of cognition can be used in the management and optimization of telecommunication networks.
Rule-Based Combination of Video Quality Metrics
Lately, several algorithms have been proposed to automatically estimate the quality of video sequences, and some have even been included in international standards. However, most provide high performance only under particular conditions and with certain types of degradation. Therefore, some proposals have set out to combine various quality metrics to improve performance and broaden the range of application. In this paper, a rule-based combination of standardized metrics is presented, in contrast to most approaches of this type, which are based on combinational models. The proposed system consists of a first stage in which the degradation affecting the video quality is identified as being caused either by coding impairments or by transmission errors. Then, the most appropriate metric for that distortion is applied: specifically, VQM and VQuad have been considered for coding and transmission distortions, respectively. The results show that the overall performance is better than using either quality metric individually.
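The two-stage rule described above can be sketched as follows. The degradation classifier and the metric back-ends are hypothetical stand-ins (the paper relies on the standardized VQM and VQuad models, which are not reproduced here); only the dispatch logic mirrors the described system.

```python
def classify_degradation(features: dict) -> str:
    """Stage 1: label the dominant impairment type.

    Heuristic stand-in: transmission errors typically manifest as frame
    freezes/losses, while coding impairments show up as blockiness.
    """
    return "transmission" if features["freeze_ratio"] > 0.05 else "coding"


def vqm_stub(features: dict) -> float:
    # Placeholder for the standardized VQM metric (coding distortions).
    return 1.0 - features["blockiness"]


def vquad_stub(features: dict) -> float:
    # Placeholder for the VQuad metric (transmission distortions).
    return 1.0 - features["freeze_ratio"]


METRIC_FOR = {"coding": vqm_stub, "transmission": vquad_stub}


def predict_quality(features: dict) -> tuple[str, float]:
    """Stage 2: apply the metric best suited to the detected distortion."""
    kind = classify_degradation(features)
    return kind, METRIC_FOR[kind](features)
```

For example, a clip with a high freeze ratio is routed to the transmission-oriented metric, while a cleanly delivered but heavily compressed clip is scored by the coding-oriented one.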
A Survey of Machine Learning Techniques for Video Quality Prediction from Quality of Delivery Metrics
A growing number of video streaming networks are incorporating machine learning (ML) applications. The growth of video streaming services places enormous pressure on network and video content providers who need to proactively maintain high levels of video quality. ML has been applied to predict the quality of video streams. Quality of delivery (QoD) measurements, which capture the end-to-end performances of network services, have been leveraged in video quality prediction. The drive for end-to-end encryption, for privacy and digital rights management, has brought about a lack of visibility for operators who desire insights from video quality metrics. In response, numerous solutions have been proposed to tackle the challenge of video quality prediction from QoD-derived metrics. This survey provides a review of studies that focus on ML techniques for predicting the QoD metrics in video streaming services. In the context of video quality measurements, we focus on QoD metrics, which are not tied to a particular type of video streaming service. Unlike previous reviews in the area, this contribution considers papers published between 2016 and 2021. Approaches for predicting QoD for video are grouped under the following headings: (1) video quality prediction under QoD impairments, (2) prediction of video quality from encrypted video streaming traffic, (3) predicting the video quality in HAS applications, (4) predicting the video quality in SDN applications, (5) predicting the video quality in wireless settings, and (6) predicting the video quality in WebRTC applications. Throughout the survey, some research challenges and directions in this area are discussed, including (1) machine learning over deep learning; (2) adaptive deep learning for improved video delivery; (3) computational cost and interpretability; (4) self-healing networks and failure recovery. 
The survey findings reveal that traditional ML algorithms are the most widely adopted models for solving video quality prediction problems. These algorithms remain attractive because they are well understood, easy to deploy, and have lower computational requirements than deep learning techniques.
AudioGen: Textually Guided Audio Generation
We tackle the problem of generating audio samples conditioned on descriptive text captions. In this work, we propose AudioGen, an auto-regressive generative model that generates audio samples conditioned on text inputs. AudioGen operates on a learnt discrete audio representation. The task of text-to-audio generation poses multiple challenges. Due to the way audio travels through a medium, differentiating "objects" can be a difficult task (e.g., separating multiple people speaking simultaneously). This is further complicated by real-world recording conditions (e.g., background noise, reverberation, etc.). Scarce text annotations impose another constraint, limiting the ability to scale models. Finally, modeling high-fidelity audio requires encoding audio at a high sampling rate, leading to extremely long sequences. To alleviate these challenges, we propose an augmentation technique that mixes different audio samples, driving the model to learn internally to separate multiple sources. We curated 10 datasets containing different types of audio and text annotations to handle the scarcity of text-audio data points. For faster inference, we explore the use of multi-stream modeling, allowing the use of shorter sequences while maintaining a similar bitrate and perceptual quality. We apply classifier-free guidance to improve adherence to text. Compared with the evaluated baselines, AudioGen outperforms them on both objective and subjective metrics. Finally, we explore the ability of the proposed method to generate audio continuations both conditionally and unconditionally. Samples: https://tinyurl.com/audiogen-text2audi
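The classifier-free guidance step mentioned in the abstract can be illustrated generically. The sketch below is not AudioGen's actual implementation; it only shows the standard guidance combination, in which the model's text-conditioned and unconditional next-token logits are extrapolated by a guidance scale.

```python
import numpy as np


def cfg_logits(cond_logits: np.ndarray,
               uncond_logits: np.ndarray,
               guidance_scale: float = 3.0) -> np.ndarray:
    """Classifier-free guidance over next-token logits.

    Moves the prediction away from the unconditional distribution and
    toward the text-conditioned one; a scale of 1.0 recovers plain
    conditional sampling, and larger scales increase text adherence.
    """
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)
```

In practice the model is run twice per decoding step, once with the text prompt and once without, and the next token is sampled from the combined logits.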
An Effective Ultrasound Video Communication System Using Despeckle Filtering and HEVC
The recent emergence of the high-efficiency video coding (HEVC) standard promises to deliver significant bitrate savings over current and prior video compression standards, while also supporting higher resolutions that can meet the clinical acquisition spatiotemporal settings. The effective application of HEVC to medical ultrasound necessitates a careful evaluation of strict clinical criteria that guarantee that clinical quality will not be sacrificed in the compression process. Furthermore, the potential use of despeckle filtering prior to compression provides for the possibility of significant additional bitrate savings that have not been previously considered. This paper provides a thorough comparison of the use of MPEG-2, H.263, MPEG-4, H.264/AVC, and HEVC for compressing atherosclerotic plaque ultrasound videos. For the comparisons, we use both subjective and objective criteria based on plaque structure and motion. For comparable clinical video quality, experimental evaluation on ten videos demonstrates that HEVC reduces bitrate requirements by as much as 33.2% compared to H.264/AVC and up to 71% compared to MPEG-2. The use of despeckle filtering prior to compression is also investigated as a method that can reduce bitrate requirements through the removal of higher-frequency components without sacrificing clinical quality. Based on the use of three despeckle filtering methods with both H.264/AVC and HEVC, we find that prior filtering can yield significant additional bitrate savings. The best-performing despeckle filter (DsFlsmv) achieves bitrate savings of 43.6% and 39.2% compared to standard non-filtered HEVC and H.264/AVC encoding, respectively.
Video Quality Prediction for Video over Wireless Access Networks (UMTS and WLAN)
Transmission of video content over wireless access networks (in particular, Wireless Local Area Networks (WLAN) and Third Generation Universal Mobile Telecommunication System (3G UMTS)) is growing exponentially and gaining popularity, and is predicted to open new revenue streams for mobile network operators. However, the success of these video applications over wireless access networks depends very much on meeting the user's Quality of Service (QoS) requirements. Thus, it is highly desirable to be able to predict and, if appropriate, to control video quality to meet the user's QoS requirements. Video quality is affected by distortions caused by the encoder and the wireless access network. The impact of these distortions is content dependent, but this feature has not been widely used in existing video quality prediction models.

The main aim of the project is the development of novel and efficient models for non-intrusive video quality prediction for low-bitrate, low-resolution videos, and the demonstration of their application in QoS-driven adaptation schemes for mobile video streaming applications. This led to five main contributions of the thesis, as follows:

(1) A thorough understanding of the relationships between video quality, wireless access network (UMTS and WLAN) parameters (e.g. packet/block loss, mean burst length and link bandwidth), encoder parameters (e.g. sender bitrate, frame rate) and content type is provided. An understanding of these relationships and interactions, and of their impact on video quality, is important as it provides a basis for the development of non-intrusive video quality prediction models.

(2) A new content classification method based on statistical tools was proposed, as content type was found to be the most important parameter.

(3) Efficient regression-based and artificial neural network-based learning models were developed for video quality prediction over WLAN and UMTS access networks. The models are lightweight (suitable for real-time monitoring) and provide a measure of user-perceived quality without time-consuming subjective tests. The models have potential applications in several other areas, including QoS control and optimization in network planning and content provisioning for network/service providers.

(4) The applications of the proposed regression-based models were investigated in (i) the optimization of content provisioning and network resource utilization and (ii) a new fuzzy sender-bitrate adaptation scheme at the sender side over WLAN and UMTS access networks.

(5) Finally, Internet-based subjective tests that captured distortions caused by the encoder and the wireless access network for different types of content were designed. The database of subjective results has been made available to the research community, as there is a lack of subjective video quality assessment databases.

Partially sponsored by the EU FP7 ADAMANTIUM Project (EU Contract 214751)
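A regression-based non-intrusive model of the kind described in contribution (3) can be sketched as fitting perceived quality (e.g. MOS) against network and encoder parameters. The data and linear form below are synthetic and purely illustrative; the thesis's actual models, coefficients and content-type features are not reproduced.

```python
import numpy as np

# Synthetic training data: [sender bitrate (kbps), frame rate (fps),
# packet loss (%)] -> MOS. Values are illustrative only.
X = np.array([
    [128, 10, 0.0],
    [256, 15, 0.0],
    [384, 25, 0.5],
    [512, 30, 1.0],
    [256, 15, 5.0],
    [384, 25, 10.0],
], dtype=float)
mos = np.array([2.8, 3.4, 3.9, 4.2, 2.5, 2.2])

# Least-squares fit of MOS = b0 + b1*bitrate + b2*framerate + b3*loss.
A = np.hstack([np.ones((X.shape[0], 1)), X])
coef, *_ = np.linalg.lstsq(A, mos, rcond=None)


def predict_mos(bitrate: float, framerate: float, loss: float) -> float:
    """Non-intrusive prediction: no reference video is needed, only
    encoder settings and a measured network impairment level."""
    return float(coef @ np.array([1.0, bitrate, framerate, loss]))
```

As expected from the relationships studied in contribution (1), the fitted model predicts lower quality as packet loss rises at fixed encoder settings.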
End-to-end 3D video communication over heterogeneous networks
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.

Three-dimensional technology, more commonly referred to as 3D technology, has revolutionised many fields, including entertainment, medicine, and communications, to name a few. In addition to 3D films, games, and sports channels, 3D perception has made tele-medicine a reality. Consumer electronics manufacturers predicted that, by the year 2015, 30% of all HD panels at home would be 3D enabled. Stereoscopic cameras, a comparatively mature technology compared to other 3D systems, are now being used by ordinary citizens to produce 3D content and share it at the click of a button, just as they do with its 2D counterpart via sites like YouTube. But technical challenges still exist, including with autostereoscopic multiview displays. Because of its increased amount of data, 3D content requires many complex considerations when transmission or storage is considered, including how to represent it and which compression format is best. Any decision must be taken in the light of the available bandwidth or storage capacity, quality, and user expectations. Free-viewpoint navigation also remains partly unsolved. The most pressing issue standing in the way of widespread uptake of consumer 3D systems is the ability to deliver 3D content to heterogeneous consumer displays over heterogeneous networks. Optimising 3D video communication solutions must consider the entire pipeline, from optimisation at the video source through transmission to the end display. Multi-view offers the most compelling solution for 3D video, providing motion parallax and freedom from wearing headgear for 3D perception. Optimising multi-view video for delivery and display could increase the demand for true 3D in the consumer market.
This thesis focuses on end-to-end quality optimisation in 3D video communication/transmission, offering solutions for optimisation at the compression, transmission, and decoder levels.

Brunel University - Isambard Research Scholarship
Beyond the pixels: learning and utilising video compression features for localisation of digital tampering.
Video compression is pervasive in digital society. With rising usage of deep convolutional neural networks (CNNs) in the fields of computer vision, video analysis and video tampering detection, it is important to investigate how patterns invisible to human eyes may be influencing modern computer vision techniques and how they can be used advantageously. This work thoroughly explores how video compression influences accuracy of CNNs and shows how optimal performance is achieved when compression levels in the training set closely match those of the test set. A novel method is then developed, using CNNs, to derive compression features directly from the pixels of video frames. It is then shown that these features can be readily used to detect inauthentic video content with good accuracy across multiple different video tampering techniques. Moreover, the ability to explain these features allows predictions to be made about their effectiveness against future tampering methods. The problem is motivated with a novel investigation into recent video manipulation methods, which shows that there is a consistent drive to produce convincing, photorealistic, manipulated or synthetic video. Humans, blind to the presence of video tampering, are also blind to the type of tampering. New detection techniques are required and, in order to compensate for human limitations, they should be broadly applicable to multiple tampering types. This thesis details the steps necessary to develop and evaluate such techniques
Structure-Constrained Basis Pursuit for Compressively Sensing Speech
Compressed Sensing (CS) exploits the sparsity of many signals to enable sampling below the Nyquist rate. If the original signal is sufficiently sparse, the Basis Pursuit (BP) algorithm will perfectly reconstruct the original signal. Unfortunately, many signals that intuitively appear sparse do not meet the threshold for sufficient sparsity. These signals require so many CS samples for accurate reconstruction that the advantages of CS disappear. This is because Basis Pursuit/Basis Pursuit Denoising only models sparsity. We developed Structure-Constrained Basis Pursuit (SCBP), which models the structure of somewhat-sparse signals as upper- and lower-bound constraints on the Basis Pursuit Denoising solution. We applied it to speech, which seems sparse but does not compress well with CS, and gained improved quality over Basis Pursuit Denoising. When a single parameter (i.e. the phone) is encoded, Normalized Mean Squared Error (NMSE) decreases by between 16.2% and 1.00% when sampling with CS at between 1/10 and 1/2 the Nyquist rate, respectively. When bounds are coded as a sum of Gaussians, NMSE decreases by between 28.5% and 21.6% over the same range. SCBP can be applied to any somewhat-sparse signal with a predictable structure to enable improved reconstruction quality with the same number of samples.
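The idea of constraining the Basis Pursuit Denoising solution with upper and lower bounds can be illustrated with a simple projected ISTA iteration. This is an assumed solver, not the paper's: it minimizes the usual BPDN objective and, after each soft-thresholding step, clips the iterate into the box that encodes the expected signal structure.

```python
import numpy as np


def scbp_ista(A, y, lam, lower, upper, iters=500):
    """Projected ISTA sketch for structure-constrained BPDN:
    min 0.5*||Ax - y||^2 + lam*||x||_1  s.t.  lower <= x <= upper.
    """
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = x - step * (A.T @ (A @ x - y))                        # gradient step
        x = np.sign(x) * np.maximum(np.abs(x) - step * lam, 0.0)  # soft-threshold
        x = np.clip(x, lower, upper)                              # structure bounds
    return x
```

With bounds of (-inf, inf) this reduces to plain ISTA for BPDN; tighter, signal-dependent bounds inject the structural prior that the abstract describes.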
Investigating the Effects of Network Dynamics on Quality of Delivery Prediction and Monitoring for Video Delivery Networks
Video streaming over the Internet requires an optimized delivery system given advances in network architecture, for example, Software Defined Networks. Machine Learning (ML) models have been deployed in an attempt to predict the quality of video streams. Some of these efforts have considered the prediction of Quality of Delivery (QoD) metrics of the video stream, in an effort to measure the quality of the video stream from the network perspective. In most cases, these models have either treated the ML algorithms as black boxes or failed to capture the network dynamics of the associated video streams.

This PhD investigates the effects of network dynamics on QoD prediction using ML techniques. The hypothesis that this thesis investigates is that ML techniques that model the underlying network dynamics achieve accurate QoD and video quality predictions and measurements. The thesis results demonstrate that the proposed techniques offer performance gains over approaches that fail to consider network dynamics, and highlight that adopting the correct model, by modelling the dynamics of the network infrastructure, is crucial to the accuracy of the ML predictions. These results are significant as they demonstrate that improved performance is achieved at no additional computational or storage cost. These techniques can help network managers, data center operators and video service providers take proactive and corrective actions for improved network efficiency and effectiveness.