    Statistical Approaches for Binary and Categorical Data Modeling

    Nowadays, a massive amount of data is generated as the development of technology and services accelerates. The demand for data clustering as a means of gaining knowledge has therefore increased in many sectors, such as medical science, risk assessment, and product sales. Moreover, binary data are widely used in various applications, including market basket data and text document analysis. Since the classic, widely used k-means method is inappropriate for clustering binary data, we propose an improvement of the K-medoids algorithm that uses binary similarity measures instead of the Euclidean distance generally deployed in clustering algorithms. In addition to the K-medoids clustering method, agglomerative hierarchical clustering methods based on Gaussian probability models have recently been shown to be efficient in different applications. However, the emergence of pattern recognition applications where the features are binary or integer-valued demands extending research efforts to such data types. We propose a hierarchical clustering framework for categorical data based on Multinomial and Bernoulli mixture models, and we compare two widely used density-based distances, namely the Bhattacharyya and Kullback-Leibler distances. The merits of our proposed clustering frameworks are shown through extensive experiments on text clustering, binary image categorization, and image categorization. The development of generative/discriminative approaches for classifying different kinds of data has also attracted scholars' attention. Considering the strengths and weaknesses of both approaches, several hybrid learning approaches combining the desirable properties of both have been developed. Our contribution is to combine Support Vector Machines (SVMs) with a Bernoulli mixture model in order to classify binary data. We propose using the Bernoulli mixture model to generate probabilistic kernels for SVMs based on information divergence. These kernels make intelligent use of unlabeled binary data to achieve good data discrimination. We evaluate the proposed hybrid learning approach by classifying binary and texture images
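
    As a rough illustration of the first contribution, the following sketch clusters binary vectors with K-medoids under the Jaccard distance. It is a minimal toy, not the thesis's implementation; the choice of Jaccard as the binary similarity measure and all names here are assumptions made for illustration.

```python
# Minimal K-medoids sketch for binary data using the Jaccard distance
# (one possible binary similarity measure; the thesis does not
# prescribe this exact choice).
import numpy as np

def jaccard_distance(a, b):
    """Jaccard distance between two binary (0/1) vectors."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(a, b).sum() / union

def k_medoids(X, k, n_iter=100, seed=0):
    """Plain alternating K-medoids; X is an (n, d) 0/1 array."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Precompute the pairwise distance matrix once.
    D = np.array([[jaccard_distance(X[i], X[j]) for j in range(n)]
                  for i in range(n)])
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assign each point to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        # Move each medoid to the cluster member minimising total distance.
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) > 0:
                costs = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[np.argmin(costs)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return medoids, labels

# Toy usage: two loosely separated groups of binary vectors.
X = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]])
medoids, labels = k_medoids(X, k=2)
```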

    Scalable image quality assessment with 2D mel-cepstrum and machine learning approach

    Measurement of image quality is of fundamental importance to numerous image and video processing applications. Objective image quality assessment (IQA) is a two-stage process comprising (a) extraction of important information while discarding the redundant, and (b) pooling the detected features using appropriate weights. These two stages are not easy to tackle due to the complex nature of the human visual system (HVS). In this paper, we first investigate image features based on the two-dimensional (2D) mel-cepstrum for the purpose of IQA. It is shown that these features are effective since they can represent the structural information that is crucial for IQA. Moreover, they are also beneficial in a reduced-reference scenario, where only partial reference image information is used for quality assessment. We address the second stage by exploiting machine learning. In our opinion, the well-established methodology of machine learning/pattern recognition has not been adequately used for IQA so far; we believe it will be an effective tool for feature pooling, since the required weights/parameters can be determined in a more convincing way via training with ground truth obtained from subjective scores. This helps to overcome the limitations of existing pooling methods, which tend to be oversimplified and lack theoretical justification. Therefore, we propose a new metric by formulating IQA as a pattern recognition problem. Extensive experiments conducted using six publicly available image databases (3211 images in total, with diverse distortions) and one video database (with 78 video sequences) demonstrate the effectiveness and efficiency of the proposed metric in comparison with seven relevant existing metrics. © 2011 Elsevier Ltd. All rights reserved
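
    To make the two-stage formulation concrete, here is a hedged sketch: a crude cepstrum-style feature extractor (2D DCT of the log magnitude spectrum, standing in for the paper's 2D mel-cepstrum) followed by a learned pooling stage using support vector regression. Everything here, including the feature definition and the choice of regressor, is an illustrative assumption rather than the paper's exact pipeline.

```python
# Stage (a): cepstrum-style feature extraction.
# Stage (b): pooling weights learned from subjective scores.
import numpy as np
from scipy.fft import dct
from sklearn.svm import SVR

def cepstral_features(img, n_coeffs=16):
    """A crude 2D cepstrum: DCT of the log magnitude spectrum,
    keeping only the low-order coefficients as the feature vector."""
    spectrum = np.abs(np.fft.fft2(img)) + 1e-8
    ceps = dct(dct(np.log(spectrum), axis=0, norm='ortho'),
               axis=1, norm='ortho')
    return ceps[:n_coeffs, :n_coeffs].ravel()

# Toy training set: random "images" with made-up subjective scores.
rng = np.random.default_rng(0)
images = [rng.random((64, 64)) for _ in range(20)]
mos = rng.uniform(1, 5, size=20)          # hypothetical mean opinion scores

X = np.array([cepstral_features(im) for im in images])
model = SVR(kernel='rbf').fit(X, mos)     # learned pooling stage
predicted_quality = model.predict(X[:1])
```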

    A contribution to pixel-based distributed video coding: side information generation, Wyner-Ziv (WZ) coding, and flexible decoding

    Modern application scenarios, such as the individual transmission of video data between mobile devices, place new demands on the video transmission system, with a particular focus on low complexity of the video encoder. This requirement can be met by means of distributed video coding. This thesis focuses on very low encoder complexity as well as on increasing the performance and improving the flexibility of the decoding process. One of the main contributions of the work concerns improving the quality of the side information through temporal interpolation
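
    A minimal sketch of the side-information idea, assuming the simplest possible temporal interpolation (plain linear blending of the two neighbouring key frames); the thesis pursues more sophisticated, motion-aware interpolation than this baseline.

```python
# Side-information generation by temporal interpolation in a
# distributed video coding decoder: the Wyner-Ziv frame is estimated
# from the two neighbouring decoded key frames.
import numpy as np

def side_information(key_prev, key_next, t=0.5):
    """Linear temporal interpolation between two decoded key frames.

    key_prev, key_next: uint8 arrays of identical shape.
    t: relative temporal position of the Wyner-Ziv frame in (0, 1).
    """
    prev = key_prev.astype(np.float32)
    nxt = key_next.astype(np.float32)
    return ((1.0 - t) * prev + t * nxt).astype(np.uint8)

# Usage: estimate the middle frame between two key frames.
f0 = np.zeros((4, 4), dtype=np.uint8)
f2 = np.full((4, 4), 100, dtype=np.uint8)
si = side_information(f0, f2)   # roughly 50 everywhere
```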

    Low complexity hardware oriented H.264/AVC motion estimation algorithm and related low power and low cost architecture design

    Degree system: new; report number: Kou 2999; type of degree: Doctor of Engineering; date conferred: 2010/3/15; Waseda University degree record number: Shin 525

    Image and Video Forensics

    Nowadays, images and videos have become the main modalities of information exchanged in everyday life, and their pervasiveness has led the image forensics community to question their reliability, integrity, confidentiality, and security. Multimedia content is generated in many different ways through the use of consumer electronics and high-quality digital imaging devices, such as smartphones, digital cameras, tablets, and wearable and IoT devices. The ever-increasing convenience of image acquisition has facilitated the instant distribution and sharing of digital images on social platforms, generating a great amount of exchanged data. Moreover, the pervasiveness of powerful image editing tools has allowed the manipulation of digital images for malicious or criminal ends, up to the creation of synthesized images and videos with deep learning techniques. In response to these threats, the multimedia forensics community has produced major research efforts in source identification and manipulation detection. In all cases where images and videos serve as critical evidence (e.g., forensic investigations, fake news debunking, information warfare, and cyberattacks), forensic technologies that help to determine the origin, authenticity, and integrity of multimedia content can become essential tools. This book collects a diverse and complementary set of articles that demonstrate new developments and applications in image and video forensics, tackling new and serious challenges in ensuring media authenticity

    Energy Data Analytics for Smart Meter Data

    The principal advantage of smart electricity meters is their ability to transfer digitized electricity consumption data to remote processing systems. The data collected by these devices make the realization of many novel use cases possible, providing benefits to electricity providers and customers alike. This book includes 14 research articles that explore and exploit the information content of smart meter data, and provides insights into the realization of new digital solutions and services that support the transition towards a sustainable energy system. This volume has been edited by Andreas Reinhardt, head of the Energy Informatics research group at Technische Universität Clausthal, Germany, and Lucas Pereira, research fellow at Técnico Lisboa, Portugal

    No-reference methods based on spatio-temporal features for objective quality assessment of digital video

    The development of no-reference video quality assessment methods is an incipient topic in the literature, and a challenging one in the sense that the results of a proposed method should correlate as closely as possible with the evaluations of the human visual system. This thesis presents three proposals for objective no-reference video quality assessment based on spatio-temporal features. The first approach uses a sigmoidal analytical model with a least-squares solution computed by the Levenberg-Marquardt method. The second and third approaches use a single-hidden-layer feedforward neural network trained with the Extreme Learning Machine algorithm. Furthermore, an extended version of the Extreme Learning Machine algorithm was developed that searches for the best parameters of the neural network iteratively, according to a simple termination criterion, with the goal of increasing the correlation between objective and subjective scores. Experimental results using cross-validation techniques indicate that the proposed methods correlate well with human visual system scores. They are therefore suitable for monitoring video quality in broadcasting systems and over IP networks, and can be implemented in devices such as set-top boxes, ultrabooks, tablets, smartphones, and Wireless Display (WiDi) devices. This work was supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
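
    For concreteness, the following is a minimal sketch of the Extreme Learning Machine idea used in the second and third approaches: a single-hidden-layer feedforward network whose input weights are random and fixed, with the output weights solved in closed form by least squares. The hyperparameters, activation function, and data here are toy assumptions, not the thesis's configuration.

```python
# Extreme Learning Machine: random hidden layer, closed-form output layer.
import numpy as np

class ELM:
    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_features = X.shape[1]
        # Random, untrained hidden-layer weights and biases.
        self.W = self.rng.normal(size=(n_features, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        # Output weights via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Toy usage: regress quality scores from spatio-temporal feature vectors.
rng = np.random.default_rng(1)
X = rng.random((50, 10))          # hypothetical feature vectors
y = rng.uniform(1, 5, size=50)    # hypothetical subjective scores
scores = ELM(n_hidden=30).fit(X, y).predict(X)
```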

    Bag-of-words representations for computer audition

    Computer audition is omnipresent in everyday life, in applications ranging from personalised virtual agents to health care. From a technical point of view, the goal is to robustly classify the content of an audio signal in terms of a defined set of labels, such as the acoustic scene, a medical diagnosis, or, in the case of speech, what is said or how it is said. Typical approaches employ machine learning (ML), which means that task-specific models are trained by means of examples. Despite recent successes in neural network-based end-to-end learning taking the raw audio signal as input, models relying on hand-crafted acoustic features are still superior in some domains, especially for tasks where data is scarce. One major issue is nevertheless that a sequence of acoustic low-level descriptors (LLDs) cannot be fed directly into many ML algorithms, as they require a static, fixed-length input. Moreover, even for dynamic classifiers, compressing the information of the LLDs over a temporal block by summarising them can be beneficial. However, the type of instance-level representation has a fundamental impact on the performance of the model. In this thesis, the so-called bag-of-audio-words (BoAW) representation is investigated as an alternative to the standard approach of statistical functionals. BoAW is an unsupervised method of representation learning, inspired by the bag-of-words method in natural language processing, which forms a histogram of the terms present in a document. The toolkit openXBOW is introduced, enabling systematic learning and optimisation of these feature representations, unified across arbitrary modalities of numeric or symbolic descriptors. A number of experiments on BoAW are presented and discussed, focussing on a large number of potential applications and corresponding databases, ranging from emotion recognition in speech to medical diagnosis. The evaluations include a comparison of different acoustic LLD sets and configurations of the BoAW generation process. The key findings are that BoAW features are a meaningful alternative to statistical functionals, offering certain benefits while preserving the advantages of functionals, such as data independence. Furthermore, it is shown that the two representations are complementary, and that their fusion improves the performance of a machine listening system
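
    A minimal sketch of the BoAW pipeline, assuming k-means as the codebook learner and hard single assignment; openXBOW itself offers many more options (soft or multiple assignments, codebook splitting, term weighting) beyond this toy.

```python
# Bag-of-audio-words: learn a codebook over frame-level LLDs, then
# represent each clip as a normalised histogram of codeword counts.
import numpy as np
from sklearn.cluster import KMeans

def boaw_histograms(clips, n_words=64, seed=0):
    """clips: list of (n_frames_i, n_llds) arrays.
    Returns an array of shape (len(clips), n_words)."""
    all_frames = np.vstack(clips)
    codebook = KMeans(n_clusters=n_words, random_state=seed,
                      n_init=10).fit(all_frames)
    hists = []
    for clip in clips:
        words = codebook.predict(clip)
        hist = np.bincount(words, minlength=n_words).astype(float)
        hists.append(hist / max(hist.sum(), 1.0))  # normalise per clip
    return np.array(hists)

# Toy usage: three clips of random MFCC-like frames of varying length.
rng = np.random.default_rng(0)
clips = [rng.random((rng.integers(50, 100), 13)) for _ in range(3)]
X = boaw_histograms(clips, n_words=16)
```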