65 research outputs found
MAC-REALM: A video content feature extraction and modelling framework
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. A consequence of the ‘data deluge’ is the exponential increase in digital video footage, while the ability to find relevant video clips diminishes. Traditional text-based search engines are no longer optimal for searching, as they cannot provide a granular search of the content inside video footage. To search video in a content-based manner, the content features of the video need to be extracted and modelled into a content model, which can then act as a searchable proxy for the video content. This thesis focuses on the extraction of syntactic and semantic content features and on content modelling, using machine-driven processes with little or no user interaction. Our abstract framework design extracts syntactic and semantic content features and compiles them into an integrated content model. The framework follows a four-plane strategy consisting of: a pre-processing plane, which removes redundant data and filters the media to improve its feature-extraction properties; a syntactic feature extraction plane, which extracts low-level syntactic features and mid-level syntactic features that have semantic attributes; a semantic relationship analysis and linkage plane, where the spatial and temporal relationships of all the content features are defined; and a content modelling plane, where the syntactic and semantic content features are integrated into a content model. Each of the four planes can be split into three layers: the content layer, where the content to be processed is stored; the application layer, where the content is converted into content descriptions; and the MPEG-7 layer, where the content descriptions are serialised. Using MPEG-7 standards to produce the content model provides wide-ranging interoperability while facilitating granular multi-content-type searches.
The framework aims to ‘bridge’ the semantic gap by integrating the syntactic and semantic content features from extraction through to modelling. The framework design has been implemented in a prototype called MAC-REALM, which has been tested and evaluated for its effectiveness in extracting and modelling content features. Conclusions are drawn about the research output as a whole and whether it has met the objectives. Finally, future work is presented on how concept detection and crowdsourcing can be used with MAC-REALM.
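The MPEG-7 layer serialises content descriptions into XML. As a rough illustration of what such a serialisation step might look like, the sketch below builds a simplified, MPEG-7-style description of labelled video segments; the tag names (`Mpeg7`, `VideoSegment`, `MediaTime`, `Label`) are simplified stand-ins for illustration only, not the actual MPEG-7 schema used by MAC-REALM.

```python
# Illustrative sketch: serialising a content description into an
# MPEG-7-style XML form, as a content-model layer might.
# Tag names are simplified stand-ins, NOT the real MPEG-7 schema.
import xml.etree.ElementTree as ET

def build_description(video_id, segments):
    """segments: list of (start_frame, end_frame, label) tuples."""
    root = ET.Element("Mpeg7")
    video = ET.SubElement(root, "Video", id=video_id)
    for start, end, label in segments:
        seg = ET.SubElement(video, "VideoSegment")
        ET.SubElement(seg, "MediaTime", start=str(start), end=str(end))
        ET.SubElement(seg, "Label").text = label
    return ET.tostring(root, encoding="unicode")

xml_doc = build_description("clip-01", [(0, 120, "goal"), (121, 300, "crowd")])
```

A description like this can then act as the searchable proxy for the footage, since text queries run against the serialised labels rather than the pixels.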
Entrega de conteúdos multimédia em over-the-top: caso de estudo das gravações automáticas
Doctorate in Electrical Engineering. Over-The-Top (OTT) multimedia delivery is a very appealing approach for providing ubiquitous, flexible, and globally accessible services capable of low-cost and unrestrained device targeting. In spite of its appeal, the underlying delivery architecture must be carefully planned and optimized to maintain a high Quality-of-Experience (QoE) and rational resource usage, especially when migrating from services running on managed networks with established quality guarantees. To address the lack of holistic research work on OTT multimedia delivery systems, this Thesis focuses on an end-to-end optimization challenge, considering a migration use case of a popular Catch-up TV service from managed IP Television (IPTV) networks to OTT. A global study is conducted on the importance of Catch-up TV and its impact on today's society, demonstrating the growing popularity of this time-shift service, its relevance in the multimedia landscape, and its fitness as an OTT migration use case. Catch-up TV consumption logs are obtained from a Pay-TV operator's live production IPTV service containing over 1 million subscribers to characterize demand and extract insights from service utilization at a scale and scope not yet addressed in the literature. This characterization is used to build demand forecasting models relying on machine learning techniques to enable static and dynamic optimization of OTT multimedia delivery solutions; these models produce accurate forecasts of bandwidth and storage requirements and may be used to achieve considerable power and cost savings whilst maintaining a high QoE. A novel caching algorithm, Most Popularly Used (MPU), is proposed, implemented, and shown to outperform established caching algorithms in both simulation and experimental scenarios. The need for accurate QoE measurements in OTT scenarios supporting HTTP Adaptive Streaming (HAS) motivates the creation of a new QoE model capable of taking into account the impact of key HAS aspects. By addressing the complete content delivery pipeline in the envisioned content-aware OTT Content Delivery Network (CDN), this Thesis demonstrates that significant improvements are possible in next-generation multimedia delivery solutions.

Over-The-Top (OTT) multimedia content delivery is an attractive proposal for providing a flexible and globally accessible service, capable of reaching any device, with a promise of low costs. Despite its advantages, detailed and optimized architectural planning is needed to maintain high levels of Quality of Experience (QoE), in particular when migrating services supported on managed networks with pre-established quality guarantees. To address the lack of research work in the area of OTT multimedia content delivery systems, this Thesis focuses on the optimization of these solutions as a whole, starting from the use case of migrating a popular Catch-up TV service supported on managed IP Television (IPTV) networks to an OTT delivery scenario. A global study assessing the importance of Catch-up TV reveals its relevance in the multimedia service landscape and its suitability as a migration use case for OTT scenarios. Consumption logs are obtained from a production Catch-up TV service representing more than 1 million subscribers, in order to characterize consumption and extract insights from it at a scale and scope not previously covered in the literature. This characterization is used to build load forecasting models, taking advantage of machine learning techniques, which enable static and dynamic optimization of OTT content delivery systems through forecasts of bandwidth and storage requirements, enabling significant gains in energy consumption and costs. A new caching mechanism, Most Popularly Used (MPU), demonstrates performance superior to the reference solutions, both in simulation and in experimental scenarios. The need for accurate QoE measurement in HTTP adaptive streaming motivates the creation of a model capable of addressing aspects specific to these adaptive technologies. By addressing the complete delivery chain through a content-aware architecture, this Thesis demonstrates that very significant performance improvements are possible in next-generation OTT content delivery networks.
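The abstract names the MPU caching algorithm but does not spell out its mechanics. As a purely hypothetical stand-in, the sketch below shows a popularity-driven cache of the general kind MPU belongs to: each request increments an item's popularity count, and when the cache is full the currently least-popular cached item is evicted. The class name and eviction rule are assumptions for illustration, not the thesis's actual algorithm.

```python
# Hypothetical popularity-driven cache in the spirit of a "most popularly
# used" policy; the abstract does not define MPU's exact mechanics, so
# this uses a simple rule: evict the cached item with the lowest request
# count. All names here are illustrative.
from collections import defaultdict

class PopularityCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = {}                  # key -> cached object
        self.hits = defaultdict(int)     # key -> popularity (request count)

    def request(self, key, fetch):
        self.hits[key] += 1
        if key in self.store:
            return self.store[key], True          # cache hit
        value = fetch(key)                        # fetch from origin server
        if len(self.store) >= self.capacity:
            # Evict the currently least-popular cached item.
            victim = min(self.store, key=lambda k: self.hits[k])
            del self.store[victim]
        self.store[key] = value
        return value, False                       # cache miss

cache = PopularityCache(capacity=2)
for k in ["a", "a", "b", "c", "a"]:
    cache.request(k, fetch=lambda key: key.upper())
```

After this request trace, "a" (three requests) stays cached while the once-requested "b" has been evicted in favour of "c", which matches the intuition of keeping the most demanded Catch-up TV items at the edge.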
A bag of words description scheme for image quality assessment
Every day millions of images are obtained, processed, compressed, saved, transmitted and reproduced, and all of these operations can cause distortions that affect their quality. The quality of these images can be measured subjectively; however, that brings the disadvantage of requiring a considerable number of tests with individuals in order to produce a statistical analysis of an image’s perceptual quality. Several objective metrics have been developed that try to model the human perception of quality. However, in most applications the representation of human quality perception given by these metrics is far from the desired one. Therefore, this work proposes the usage of machine learning models that allow for a better approximation.
In this work, definitions for image and quality are given and some of the difficulties of the study of image quality are mentioned. Moreover, three metrics are initially explained: one uses the image’s original quality as a reference (SSIM), while the other two are no-reference metrics (BRISQUE and QAC). A comparison is made, showing a large discrepancy of values between the two kinds of metrics.
The database used for the tests is TID2013. This database was chosen due to its size and the large number of distortions it considers. A study of each type of distortion in this database is made.
Furthermore, some concepts of machine learning are introduced along with algorithms relevant in the context of this dissertation, notably K-means, KNN and SVM. Descriptor aggregation algorithms like “bag of words” and “Fisher vectors” are also mentioned.
This dissertation studies a new model that combines machine learning and a quality metric for quality estimation. The model is based on the division of images into cells, in which a specific metric is computed. With this division, it is possible to obtain local quality descriptors that are aggregated using “bag of words”. An SVM with an RBF kernel is trained and tested on the same database, and the results of the model are evaluated using cross-validation.
The results are analysed using the Pearson, Spearman and Kendall correlations and the RMSE to evaluate the representation of the model when compared with the subjective results. The model improves the results of the metric that was used and shows a new path to apply machine
learning for quality evaluation.

In our day-to-day lives images are obtained, processed, compressed, stored, transmitted and reproduced. Any of these operations can introduce distortions that harm their quality. The quality of these images can be measured subjectively, which has the disadvantage of requiring several tests, on a considerable number of individuals, in order to produce a statistical analysis of an image’s perceptual quality. Several objective metrics have been developed that in some way try to model human quality perception. However, in many applications the representation of human quality perception given by these metrics falls short of what is desirable, which is why this work proposes using pattern recognition models that allow a closer approximation.
In this work, definitions of image and quality are given and some of the difficulties of the study of image quality are mentioned. The importance of image quality as a field of study is discussed, and several quality metrics are studied.
Three metrics are explained: one that uses the original quality as a reference (SSIM) and two no-reference metrics (BRISQUE and QAC). A comparison between them is made, showing a large discrepancy of values between the two types of metrics.
The TID2013 database is used for the tests; it is often chosen for studies of metric quality due to its size and the large number of distortions it considers. This work also studies the types of distortion included in this database and how they are simulated.
Some theoretical concepts of pattern recognition are also introduced, and some algorithms relevant in the context of the dissertation are described, such as K-means, KNN and SVMs. Descriptor aggregation algorithms such as “bag of words” and “Fisher vectors” are also mentioned.
This dissertation adds pattern recognition methods to objective image quality metrics. A new technique is proposed, based on dividing images into cells in which a metric is computed. This division makes it possible to obtain local quality descriptors that are aggregated using “bag of words”. An SVM with an RBF kernel is trained and tested on the same database, and the results of the model are reported using cross-validation.
The results are analysed using the Pearson, Spearman and Kendall correlations and the RMSE, which allow the proximity between the developed metric and the subjective results to be evaluated. This model improves on the results obtained with the metric used and demonstrates a new way of applying pattern recognition models to quality assessment.
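The "bag of words" aggregation step described above can be sketched as follows: the image is split into cells, a local descriptor is computed per cell, each descriptor is assigned to its nearest codeword, and the normalized codeword histogram becomes the image-level feature fed to the SVM. The local metric here (per-cell mean and standard deviation) and the fixed two-word codebook are toy stand-ins for the per-cell quality metric and the learned (e.g. K-means) codebook the dissertation actually uses.

```python
# Toy sketch of bag-of-words aggregation of local quality descriptors.
# The cell descriptor and codebook are illustrative stand-ins.
import math

def cell_descriptor(cell):
    """Mean and standard deviation of pixel values in one cell."""
    flat = [p for row in cell for p in row]
    mean = sum(flat) / len(flat)
    var = sum((p - mean) ** 2 for p in flat) / len(flat)
    return (mean, math.sqrt(var))

def split_cells(image, cell_h, cell_w):
    for i in range(0, len(image), cell_h):
        for j in range(0, len(image[0]), cell_w):
            yield [row[j:j + cell_w] for row in image[i:i + cell_h]]

def bag_of_words(image, codebook, cell_h=2, cell_w=2):
    hist = [0] * len(codebook)
    n = 0
    for cell in split_cells(image, cell_h, cell_w):
        d = cell_descriptor(cell)
        # Assign descriptor to nearest codeword (squared Euclidean distance).
        k = min(range(len(codebook)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(d, codebook[c])))
        hist[k] += 1
        n += 1
    return [h / n for h in hist]  # normalised histogram = image feature

image = [[10, 10, 200, 200],
         [10, 10, 200, 200],
         [12, 12, 198, 198],
         [12, 12, 198, 198]]
codebook = [(10.0, 0.0), (200.0, 0.0)]   # two toy codewords
hist = bag_of_words(image, codebook)
```

The resulting fixed-length histogram is what an SVM (with an RBF kernel, in the dissertation's setup) would be trained on, regardless of the image's size or cell count.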
Real-time neural network based video super-resolution as a service: design and implementation of a real-time video super-resolution service using public cloud services
Despite the advancements in video streaming, we still find limitations when there is a need to stream real-time video at a higher resolution (e.g., in super-resolution) on mobile devices with limited resources. This thesis work aims to provide an option for addressing this challenge through a cloud service.
There were two main code components used to create this service. The first component was aiortc (the Python implementation of WebRTC), used as the streaming protocol. The second component was the Efficient Sub-Pixel Convolutional Neural Network (ESPCN) model, one of the outstanding methods for upscaling video at the present time. These two components were implemented on a virtual machine in the Microsoft Azure cloud environment with a customized configuration.
Qualitative as well as quantitative results of this work were obtained and analyzed. To obtain the qualitative results, two versions of the ESPCN model were developed; for the quantitative outcomes, three different configurations of HW/SW codecs and CPU/GPU utilization were produced and analyzed.
Besides identifying the code components mentioned above as suitable for creating an efficient cloud-based real-time video super-resolution service, another conclusion of this project is that sending or receiving information (frames) between the CPU and the GPU has a very large negative impact on the efficiency of the whole service. Hence, limiting this CPU-GPU interaction, or using only the GPU (e.g., with the NVIDIA Video Processing Framework [VPF]), is critical for an efficient service. As the quantitative results show, this issue can be avoided if a codec that only makes use of the GPU (e.g., an NVIDIA HW codec) is employed. Furthermore, the Azure cloud environment enables efficient execution of the service on diverse mobile devices. As future work, measuring the quality of the video super-resolution performed by the ESPCN model is suggested as a next step.
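The distinguishing step of ESPCN is the sub-pixel convolution (pixel shuffle) at the end of the network: the convolutional layers produce r² low-resolution feature channels, which are rearranged into one channel upscaled by factor r. A minimal NumPy sketch of that rearrangement for a single output channel:

```python
# Pixel-shuffle step used by ESPCN: r*r low-resolution channels are
# rearranged into one (H*r, W*r) upscaled channel. Single-channel sketch.
import numpy as np

def pixel_shuffle(features, r):
    """features: (r*r, H, W) array -> (H*r, W*r) upscaled image."""
    c, h, w = features.shape
    assert c == r * r, "need exactly r*r input channels"
    # (r*r, H, W) -> (r, r, H, W) -> (H, r, W, r) -> (H*r, W*r)
    return features.reshape(r, r, h, w).transpose(2, 0, 3, 1).reshape(h * r, w * r)

lr = np.arange(4).reshape(4, 1, 1).astype(float)  # 4 channels, 1x1 each
hr = pixel_shuffle(lr, 2)                          # -> one 2x2 image
```

Because all the convolutions run at low resolution and only this cheap reshuffle produces the high-resolution output, the method stays fast enough for real-time use, which is what makes it attractive for the cloud service described above.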
Monte Carlo Method with Heuristic Adjustment for Irregularly Shaped Food Product Volume Measurement
Volume measurement plays an important role in the production and processing of food products. Various methods have been proposed to measure the volume of irregularly shaped food products based on 3D reconstruction. However, 3D reconstruction comes at a high computational cost, and some of the volume measurement methods based on it have low accuracy. Another approach measures the volume of objects using the Monte Carlo method, which performs volume measurement using random points: it only requires information on whether random points fall inside or outside an object, and does not require a 3D reconstruction. This paper proposes volume measurement of irregularly shaped food products using a computer vision system, without 3D reconstruction, based on the Monte Carlo method with a heuristic adjustment. Five images of each food product were captured using five cameras and processed to produce binary images. Monte Carlo integration with heuristic adjustment was performed to measure the volume based on the information extracted from the binary images. The experimental results show that the proposed method provides high accuracy and precision compared to the water displacement method. In addition, the proposed method is more accurate and faster than the space carving method.
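The core Monte Carlo idea described above can be sketched directly: sample random points in a bounding box around the object, count the fraction that falls inside, and scale by the box volume. Here an analytic sphere plays the role of the object so the estimate can be checked; in the paper, the inside/outside test is instead answered from the five binary camera images, plus the heuristic adjustment.

```python
# Monte Carlo volume estimation: only an inside/outside test is needed,
# no 3D reconstruction. A unit sphere stands in for the food product.
import random

def monte_carlo_volume(inside, bounds, n_points, seed=0):
    """bounds: ((x0,x1),(y0,y1),(z0,z1)) bounding box around the object."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_points):
        p = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        if inside(p):
            hits += 1
    box_volume = 1.0
    for lo, hi in bounds:
        box_volume *= hi - lo
    return box_volume * hits / n_points

sphere = lambda p: p[0] ** 2 + p[1] ** 2 + p[2] ** 2 <= 1.0
est = monte_carlo_volume(sphere, ((-1, 1),) * 3, n_points=200_000)
# true volume is 4/3 * pi ~ 4.1888; the estimate converges as n grows
```

The estimate's error shrinks as 1/sqrt(n), which is why the paper pairs plain Monte Carlo sampling with a heuristic adjustment to reach high accuracy at a practical point count.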
Data-driven visual quality estimation using machine learning
Today a great deal of visual content is created and accessible, owing to improvements in technology such as smartphones and the internet. It is therefore necessary to assess the quality perceived by users in order to further improve the experience. However, only a few current quality models are designed specifically for higher resolutions, predict more than the mean opinion score, or use machine learning. One goal of this work is to train and evaluate such machine-learning models for higher resolutions on several datasets. First, an objective analysis of image quality at higher resolutions is carried out. The images were compressed with video encoders; here AV1 shows the best quality and compression. Next, the results of a crowdsourcing test are compared with a laboratory test regarding image quality. Furthermore, deep-learning-based models for predicting image and video quality are described. The deep-learning-based model is not applicable in practice for video quality prediction because of the resources it requires. For this reason, pixel-based video quality models are proposed and evaluated that use meaningful features covering image and motion aspects. These models can be used to predict mean opinion scores for videos, or even other values related to video quality, such as a rating distribution. The presented model architecture can be applied to other video problems, such as video classification, prediction of gaming video quality, classification of gaming genres, or classification of encoding parameters. An important aspect is also the processing time of such models.
Therefore, a general approach for speeding up state-of-the-art video quality models is presented, showing that a considerable share of the processing time can be saved while retaining similar prediction accuracy. The models are released as open source so that the developed frameworks can be used for further research. In addition, the presented approaches can serve as building blocks for newer media formats. Today a lot of visual content is accessible and produced, due to improvements in technology such as smartphones and the internet. This results in a need to assess the quality perceived by users to further improve the experience. However, only a few of the state-of-the-art quality models are specifically designed for higher resolutions, predict more than mean opinion score, or use machine learning. One goal of the thesis is to train and evaluate such machine learning models for higher resolutions with several datasets. At first, an objective evaluation of image quality in the case of higher resolutions is performed. The images are compressed using video encoders, and it is shown that AV1 is best considering quality and compression. This evaluation is followed by the analysis of a crowdsourcing test in comparison with a lab test investigating image quality. Afterward, deep-learning-based models for image quality prediction and an extension for video quality are proposed. However, the deep-learning-based video quality model is not practically usable because of performance constraints. For this reason, pixel-based video quality models using well-motivated features covering image and motion aspects are proposed and evaluated. These models can be used to predict mean opinion scores for videos, or even to predict other video-quality-related information, such as rating distributions.
The introduced model architecture can be applied to other video problems, such as video classification, gaming video quality prediction, gaming genre classification, or encoding parameter estimation. Furthermore, one important aspect is the processing time of such models. Hence, a generic approach to speed up state-of-the-art video quality models is introduced, which shows that a significant amount of processing time can be saved while achieving similar prediction accuracy. The models have been made publicly available as open source so that the developed frameworks can be used for further research. Moreover, the presented approaches may be usable as building blocks for newer media formats.
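Predicting a full rating distribution carries more information than a single mean opinion score (MOS). As a small illustration of the relationship between the two, the sketch below derives the MOS and the standard deviation of scores from a predicted distribution on a 5-point ACR scale (the 5-point scale is an assumption for illustration):

```python
# Deriving MOS and score standard deviation from a predicted rating
# distribution on an assumed 5-point ACR scale (1 = bad .. 5 = excellent).
def mos_stats(dist):
    """dist[i] = predicted probability of rating i+1."""
    assert abs(sum(dist) - 1.0) < 1e-9, "distribution must sum to 1"
    mos = sum((i + 1) * p for i, p in enumerate(dist))
    var = sum(p * ((i + 1) - mos) ** 2 for i, p in enumerate(dist))
    return mos, var ** 0.5

mos, sos = mos_stats([0.0, 0.1, 0.2, 0.4, 0.3])
```

A model that predicts the distribution therefore subsumes MOS prediction, while additionally exposing rater disagreement via the spread.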
AXMEDIS 2008
The AXMEDIS International Conference series aims to explore all subjects and topics related to cross-media and digital-media content production, processing, management, standards, representation, sharing, protection, and rights management, and to address the latest developments and future trends of these technologies and their applications, impacts, and exploitation. The AXMEDIS events offer venues for exchanging concepts, requirements, prototypes, research ideas, and findings that can contribute to academic research and also benefit business and industrial communities. In the internet and digital era, cross-media production and distribution represent key developments and innovations, fostered by emergent technologies, that ensure better value for money while optimising productivity and market coverage.
Persönliche Wege der Interaktion mit multimedialen Inhalten
Today the world of multimedia is almost completely device- and content-centered. It focuses its energy nearly exclusively on technical issues such as computing power, network specifics, or content and device characteristics and capabilities. In most multimedia systems, the presentation of multimedia content and the basic controls for playback are the main issues. Because of this, a very passive user experience, comparable to that of traditional TV, is most often provided. In the face of recent developments and changes in the realm of multimedia and mass media, this ”traditional” focus seems outdated. The increasing use of multimedia content on mobile devices, along with the continuous growth in the amount and variety of content available, makes an urgent re-orientation of this domain necessary. In order to highlight the depth of the increasingly difficult situation faced by users of such systems, it is only logical that these individuals be brought to the center of attention. In this thesis we consider these trends and developments by applying concepts and mechanisms to multimedia systems that were first introduced in the domain of user-centrism. Central to the concept of user-centrism is that devices should provide users with an easy way to access services and applications. Thus, the current challenge is to combine mobility, additional services, and easy access in a single, user-centric approach. This thesis presents a framework for introducing and supporting several of the key concepts of user-centrism in multimedia systems. Additionally, a new definition of a user-centric multimedia framework has been developed and implemented. To satisfy the user’s need for mobility and flexibility, our framework makes seamless media and service consumption possible. The main aim of session mobility is to help people cope with the increasing number of different devices in use.
Using a mobile agent system, multimedia sessions can be transferred between different devices in a context-sensitive way. The use of the international standard MPEG-21 guarantees extensibility and the integration of content-adaptation mechanisms. Furthermore, a concept is presented that allows for individualized and personalized selection and addresses the need to find appropriate content, all of which can be done, using this approach, in an easy and intuitive way. Especially in the realm of television, the demand that such systems cater to the needs of the audience is constantly growing. Our approach combines content-filtering methods, state-of-the-art classification techniques, and mechanisms well known from the areas of information retrieval and text mining, all utilized for the generation of recommendations in a promising new way. Additionally, concepts from the area of collaborative tagging systems are used. An extensive experimental evaluation produced several interesting findings and proves the applicability of our approach. In contrast to the ”lean-back” experience of traditional media consumption, interactive media services offer a solution that makes possible the active participation of the audience. Thus, we present a concept which enables the use of interactive media services on mobile devices in a personalized way. Finally, a use case for enriching TV with additional content and services demonstrates the feasibility of this concept.

Today's world of media and multimedia content is almost exclusively content- and device-oriented. Various systems and developments often focus primarily on the manner of content presentation and on technical specifics, which are usually device-dependent. The growing amount and variety of multimedia content and the increased use of mobile devices make a rethink in the design of multimedia systems and frameworks urgently necessary.
Instead of holding on to the rather rigid and passive concepts known from the TV domain, the user should move into the focus of multimedia concepts. To help the user cope with this ever more complex and difficult situation, a rethink of the basic paradigm of media consumption is necessary; focusing on the user counteracts the situation described. The following work draws on concepts from the field of user-centrism and transfers them to the media domain, applying them toward a more user-specific and user-oriented orientation. The focus here is on the TV domain, although most of the concepts can also be transferred to media use in general. In the following, a framework supporting the most important concepts of user-centrism in the multimedia domain is presented. To accommodate the trend toward mobile media use, the presented framework enables the use of multimedia services and content on, and across the boundaries of, different devices and networks (session mobility). By using a mobile agent platform in combination with the MPEG-21 standard, a new and flexibly extensible approach to the mobility of user sessions was realized. In connection with the steadily growing amount of content and services, this work presents a concept for the simple and individualized selection and discovery of interesting content and services in a context-specific manner. Concepts and methods of content-based filtering, current classification mechanisms, and methods from the field of text mining are employed in a new way in a multimedia recommender system. In addition, Web 2.0 methods are integrated in a tag-based collaborative component.
A comprehensive evaluation demonstrated both the feasibility and the added value of this component. Our iTV component enables more active participation in media consumption: it supports the provision and use of interactive services, accompanying media consumption, on mobile devices. Based on a scenario for enriching TV programmes with interactive services, the feasibility of this concept was demonstrated.
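The content-filtering side of the recommender described above can be illustrated with a minimal sketch: programme descriptions become term-frequency vectors, a profile is accumulated from the user's watched items, and candidates are ranked by cosine similarity to that profile. This is a toy stand-in under obvious assumptions (plain term frequencies, whitespace tokenization); the thesis's actual pipeline additionally uses classification, text-mining, and collaborative-tagging components.

```python
# Toy content-based filtering: rank candidate programmes by cosine
# similarity between their term-frequency vectors and a user profile.
import math
from collections import Counter

def tf_vector(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(watched, candidates):
    profile = Counter()                 # profile = sum of watched-item vectors
    for text in watched:
        profile.update(tf_vector(text))
    return sorted(candidates,
                  key=lambda title: cosine(profile, tf_vector(candidates[title])),
                  reverse=True)

watched = ["football match highlights", "football league review"]
candidates = {"cooking show": "recipes and cooking tips",
              "sports news": "football match results and league table"}
order = recommend(watched, candidates)
```

In a real system the same ranking step would sit behind the context-sensitive selection interface, with classifier outputs and collaborative tags feeding richer item vectors.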
A comparison of statistical machine learning methods in heartbeat detection and classification
In health care, patients with heart problems require quick responsiveness in a clinical setting or in the operating theatre. Towards that end, automated classification of heartbeats is vital, as some heartbeat irregularities are time-consuming to detect. Therefore, analysis of electrocardiogram (ECG) signals is an active area of research. The methods proposed in the literature depend on the structure of a heartbeat cycle. In this paper, we use interval- and amplitude-based features together with a few samples from the ECG signal as a feature vector. We studied a variety of classification algorithms, focused especially on a type of arrhythmia known as the ventricular ectopic beat (VEB). We compare the performance of the classifiers against algorithms proposed in the literature and make recommendations regarding features, sampling rate, and the choice of classifier to apply in a real-time clinical setting. The extensive study is based on the MIT-BIH arrhythmia database. Our main contributions are the evaluation of existing classifiers over a range of sampling rates, the recommendation of a detection methodology to employ in a practical setting, and the extension of the notion of a mixture of experts to a larger class of algorithms.
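The setup above, where interval- and amplitude-based features form the feature vector for a beat classifier, can be sketched with a simple stdlib k-NN as a stand-in for the compared algorithms. The feature choice (RR interval and R-peak amplitude) and all beat values below are synthetic illustrations, not MIT-BIH data.

```python
# Toy heartbeat classification: interval/amplitude feature vectors
# classified with k-nearest neighbours. All training beats are synthetic.
import math

def knn_predict(train, x, k=3):
    """train: list of (feature_vector, label); x: feature vector to classify."""
    nearest = sorted(train, key=lambda fv: math.dist(fv[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)   # majority vote

# (RR interval [s], R-peak amplitude [mV]) -- illustrative feature vectors:
# VEBs tend to arrive early (short RR interval) with abnormal morphology.
train = [((0.80, 1.0), "normal"), ((0.82, 1.1), "normal"),
         ((0.78, 0.9), "normal"), ((0.55, 1.6), "VEB"),
         ((0.50, 1.7), "VEB"),    ((0.58, 1.5), "VEB")]
label = knn_predict(train, (0.52, 1.65))
```

Swapping the classifier (SVM, mixture of experts, etc.) while keeping the same feature vector is exactly the kind of comparison the paper performs across sampling rates.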