Artificial Intelligence in the Creative Industries: A Review
This paper reviews the current state of the art in Artificial Intelligence
(AI) technologies and applications in the context of the creative industries. A
brief background of AI, and specifically Machine Learning (ML) algorithms, is
provided including Convolutional Neural Networks (CNNs), Generative Adversarial
Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement
Learning (DRL). We categorise creative applications into five groups related to
how AI technologies are used: i) content creation, ii) information analysis,
iii) content enhancement and post production workflows, iv) information
extraction and enhancement, and v) data compression. We critically examine the
successes and limitations of this rapidly advancing technology in each of these
areas. We further differentiate between the use of AI as a creative tool and
its potential as a creator in its own right. We foresee that, in the near
future, machine learning-based AI will be adopted widely as a tool or
collaborative assistant for creativity. In contrast, we observe that the
successes of machine learning in domains with fewer constraints, where AI is
the 'creator', remain modest. The potential of AI (or its developers) to win
awards for its original creations in competition with human creatives is also
limited, based on contemporary technologies. We therefore conclude that, in the
context of creative industries, maximum benefit from AI will be derived where
its focus is human centric -- where it is designed to augment, rather than
replace, human creativity.
Bridging semantic gap: learning and integrating semantics for content-based retrieval
Digital cameras have entered ordinary homes and produced an incredibly large number
of photos. As a typical example of a broad image domain, unconstrained consumer
photos vary significantly. Unlike professional or domain-specific images, the objects
in the photos are ill-posed, occluded, and cluttered with poor lighting, focus, and
exposure. Content-based image retrieval research has yet to bridge the semantic gap
between computable low-level information and high-level user interpretation.
In this thesis, we address the issue of semantic gap with a structured learning
framework to allow modular extraction of visual semantics. Semantic image regions
(e.g. face, building, sky) are learned statistically, detected directly from the image
without segmentation, reconciled across multiple scales, and aggregated spatially to
form a compact semantic index. To circumvent the ambiguity and subjectivity in a
query, a new query method that allows spatial arrangement of visual semantics is
proposed. A query is represented as a disjunctive normal form of visual query terms
and processed using fuzzy set operators.
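The query evaluation described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the standard Zadeh operators (AND as min, OR as max, NOT as 1 - x), which the abstract does not specify, and it ignores the spatial arrangement of terms. The function name `evaluate_dnf`, the term names, and the membership scores are all hypothetical.

```python
from typing import Dict, List, Tuple

# A literal is (term, negated); a conjunction is a list of literals;
# a DNF query is a list of conjunctions (an OR of ANDs).
Literal = Tuple[str, bool]
Query = List[List[Literal]]


def evaluate_dnf(query: Query, memberships: Dict[str, float]) -> float:
    """Score one image against a DNF query using Zadeh fuzzy operators.

    memberships maps a visual term to its degree of presence in [0, 1];
    terms missing from the map are treated as absent (0.0).
    """

    def literal_score(term: str, negated: bool) -> float:
        score = memberships.get(term, 0.0)
        return 1.0 - score if negated else score  # fuzzy NOT

    # Fuzzy AND (min) within each conjunction, fuzzy OR (max) across them.
    return max(
        (min(literal_score(t, n) for t, n in conj) for conj in query if conj),
        default=0.0,
    )


# Hypothetical query: (sky AND building) OR (face AND NOT sky)
query = [[("sky", False), ("building", False)], [("face", False), ("sky", True)]]
image = {"sky": 0.8, "building": 0.6, "face": 0.9}
score = evaluate_dnf(query, image)
```

Ranking a collection then amounts to sorting images by this score in descending order.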
A drawback of supervised learning is the manual labeling of regions as training
samples. In this thesis, a new learning framework to discover local semantic patterns
and to generate their samples for training with minimal human intervention has been
developed. The discovered patterns can be visualized and used in semantic indexing.
In addition, three new class-based indexing schemes are explored. The winner-take-all
scheme supports class-based image retrieval. The class relative scheme and
the local classification scheme compute inter-class memberships and local class patterns
as indexes for similarity matching respectively. A Bayesian formulation is
proposed to unify local and global indexes in image comparison and ranking,
resulting in image retrieval performance superior to that of single indexes.
Query-by-example experiments on 2400 consumer photos with 16 semantic queries
show that the proposed approaches have significantly better (18% to 55%) average
precisions than a high-dimension feature fusion approach. The thesis has paved
two promising research directions, namely the semantics design approach and the
semantics discovery approach. They form elegant dual frameworks that exploit
pattern classifiers in learning and integrating local and global image semantics.
IMAGE GEOLOCALIZATION AND ITS APPLICATION TO MEDIA FORENSICS
Image geo-localization is an important research problem. In recent years, the IARPA Finder program has gathered many researchers to develop technology to address the geo-localization task. One particularly effective approach is to use large-scale ground-level and/or overhead imagery with image matching techniques for image geo-localization. In this dissertation, we focus on two different aspects of geo-localization. First, we focus on indoor images and use geo-localization to recognize different business venues. Second, we address the vulnerability of such a computer vision system and apply geo-localization to solve media forensics problems such as content manipulation and metadata manipulation.
With the prevalence of social media platforms, media shared on the Internet can reach millions of people in a short time. The sheer amount of media available on the Internet enables many different computer vision applications. However, at the same time, people can easily share tampered media for malicious goals, such as creating panic or distorting public opinion, with little effort.
We first propose an image localization framework for extracting fine-grained location information (i.e. business venues) from images. Our framework utilizes the information available from social media websites such as Instagram and Yelp to extract a set of location-related concepts. Using these concepts with a multi-modal recognition model, we were able to extract location information based on the image content.
Secondly, to make a robust system, we address the metadata tampering detection problem, detecting discrepancies between an image and its associated metadata such as GPS and timestamp. We propose a multi-task learning model to verify an image's authenticity by detecting the discrepancy between image content and its metadata. Our model first detects meteorological properties such as weather condition, sun angle, and temperature from the image content and compares them with the information from an online weather database. To facilitate the training and evaluation of our model, we create a large-scale outdoor dataset labeled with meteorological properties.
Thirdly, we address the event verification problem by designing a convolutional neural network configuration specifically targeted at image localization. The proposed networks utilize a bilinear pooling layer and an attention module to extract detailed location information from the image content.
Fourth, we present a generative model that produces realistic image composites using adversarial learning, which can be used to further improve the image tampering detection model. Finally, we propose an object-based provenance approach to address the content manipulation problem in media forensics.
Computer vision based classification of fruits and vegetables for self-checkout at supermarkets
The field of machine learning, and, in particular, methods to improve the capability of machines to perform a wider variety of generalised tasks, are among the most rapidly growing research areas in today’s world. The current applications of machine learning and artificial intelligence can be divided into many significant fields, namely computer vision, data science, real-time analytics and Natural Language Processing (NLP). All these applications are being used to help computer-based systems operate more usefully in everyday contexts. Computer vision research is currently active in a wide range of areas such as the development of autonomous vehicles, object recognition, Content Based Image Retrieval (CBIR), image segmentation and terrestrial analysis from space (e.g. crop estimation). Despite significant prior research, the area of object recognition still has many topics to be explored. This PhD thesis focuses on using advanced machine learning approaches to enable the automated recognition of fresh produce (i.e. fruits and vegetables) at supermarket self-checkouts. This type of complex classification task is one of the most recently emerging applications of advanced computer vision approaches and is a productive research topic in this field due to the limited means of representing the features and the limited machine learning techniques for classification. Fruits and vegetables exhibit significant inter- and intra-class variance in weight, shape, size, colour and texture, which makes the classification challenging.
The applications of effective fruit and vegetable classification have significant importance in daily life, e.g. crop estimation, fruit classification, robotic harvesting, fruit quality assessment, etc. One potential application for this fruit and vegetable classification capability is supermarket self-checkouts. Increasingly, supermarkets are introducing self-checkouts in stores to make the checkout process easier and faster. However, there are a number of challenges with this, as not all goods can readily be sold with packaging and barcodes, for instance loose fresh items (e.g. fruits and vegetables). Adding barcodes to these types of items individually is impractical, and pre-packaging limits the freedom of choice when selecting fruits and vegetables and creates additional waste, hence reducing customer satisfaction. The current situation, which relies on customers correctly identifying produce themselves, leaves open the potential for incorrect billing either due to inadvertent error or due to intentional fraudulent misclassification, resulting in financial losses for the store. To address this identified problem, the main goals of this PhD work are: (a) exploring the types of visual and non-visual sensors that could be incorporated into a self-checkout system for classification of fruits and vegetables, (b) determining a suitable feature representation method for fresh produce items available at supermarkets, (c) identifying optimal machine learning techniques for classification within this context and (d) evaluating our work relative to the state-of-the-art object classification results presented in the literature.
An in-depth analysis of related computer vision literature and techniques is performed to identify and implement the possible solutions. A progressive process distribution approach is used for this project, where the task of computer vision based fruit and vegetable classification is divided into pre-processing and classification techniques. Different classification techniques have been implemented and evaluated as possible solutions for this problem. Both visual and non-visual features of fruit and vegetables are exploited to perform the classification. Novel classification techniques have been carefully developed to deal with the complex and highly variant physical features of fruit and vegetables while taking advantage of both visual and non-visual features. The capability of the classification techniques is tested individually and in ensembles to achieve higher effectiveness.
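As an illustration of testing classifiers "in ensemble manner", the sketch below combines the label predictions of several classifiers by simple majority vote. The abstract does not state which ensemble rule the thesis uses, so majority voting is an assumption here, and the function name `majority_vote` and the produce labels are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-classifier label predictions by majority vote.

    predictions[k][i] is the label classifier k assigns to item i.
    On a tie, the label seen first (from the earliest classifier) wins,
    because Counter.most_common keeps insertion order among equal counts.
    """
    combined = []
    for labels in zip(*predictions):  # labels for one item, across classifiers
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three classifiers on the same three items.
preds = [
    ["apple", "pear", "lime"],
    ["apple", "apple", "lemon"],
    ["pear", "pear", "lime"],
]
ensemble_labels = majority_vote(preds)
```

Weighted voting or averaging of class probabilities are common alternatives when the individual classifiers differ markedly in accuracy.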
Significant results have been obtained, from which it can be concluded that fruit and vegetable classification is a complex task with many challenges involved. It is also observed that a larger dataset can better capture the complex variant features of fruit and vegetables. Complex multidimensional features can be extracted from larger datasets to generalise to a higher number of classes. However, development of a larger multiclass dataset is an expensive and time-consuming process. The effectiveness of classification techniques can be significantly improved by subtracting the background occlusions and complexities. It is also worth mentioning that an ensemble of simple and less complicated classification techniques can achieve effective results even if applied to fewer features for a smaller number of classes. The combination of visual and non-visual features can reduce the difficulty a classification technique has in dealing with a higher number of classes with similar physical features. Classification of fruit and vegetables with similar physical features (i.e. colour and texture) needs careful estimation and hyper-dimensional embedding of visual features. Implementing rigorous classification penalties as a loss function can achieve this goal at the cost of time and computational requirements. There is a significant need to develop larger datasets for different fruit- and vegetable-related computer vision applications. Considering more sophisticated loss function penalties and discriminative hyper-dimensional feature embedding techniques can significantly improve the effectiveness of the classification techniques for fruit and vegetable applications.
Feature based dynamic intra-video indexing
A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy. With the advent of digital imagery and its widespread application in all areas of life, it has become an important component in the world of communication. Video content ranging from broadcast news, sports, personal videos, surveillance, movies and entertainment and similar domains is increasing exponentially in quantity, and it is becoming a challenge to retrieve content of interest from the corpora. This has led to an increased interest amongst researchers in investigating concepts of video structure analysis, feature extraction, content annotation, tagging, video indexing, querying and retrieval to fulfil the requirements. However, most of the previous work is confined to a specific domain and constrained by quality, processing and storage capabilities. This thesis presents a novel framework agglomerating the established approaches from feature extraction to browsing in one system of content based video retrieval. The proposed framework significantly fills the gap identified while satisfying the imposed constraints of processing, storage, quality and retrieval times. The output entails a framework, methodology and prototype application to allow the user to efficiently and effectively retrieve content of interest such as age, gender and activity by specifying the relevant query. Experiments have shown plausible results with an average precision and recall of 0.91 and 0.92 respectively for face detection using a Haar wavelet-based approach. Precision for age ranges from 0.82 to 0.91 and recall from 0.78 to 0.84. The recognition of gender gives better precision for males (0.89) than for females, while recall is higher for females (0.92). Activity of the subject has been detected using the Hough transform and classified using a Hidden Markov Model. A comprehensive dataset to support similar studies has also been developed as part of the research process.
A Graphical User Interface (GUI) providing a friendly and intuitive interface has been integrated into the developed system to facilitate the retrieval process. The comparison results of the intraclass correlation coefficient (ICC) show that the performance of the system closely resembles that of the human annotator. The performance has been optimised for time and error rate.
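For readers unfamiliar with the precision and recall figures quoted above, the following minimal sketch shows how the two metrics are computed from detection counts. The counts used in the example are hypothetical and are not taken from the thesis; the function name `precision_recall` is likewise an illustrative choice.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from true/false positive and false negative counts.

    precision = tp / (tp + fp): share of reported detections that are correct.
    recall    = tp / (tp + fn): share of actual instances that were detected.
    A metric whose denominator is zero is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical face-detection counts: 91 correct detections,
# 9 false alarms, 8 missed faces.
p, r = precision_recall(tp=91, fp=9, fn=8)
```

Precision penalises false alarms and recall penalises misses, which is why the abstract can report higher precision for one gender and higher recall for the other.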
Artificial Intelligence for Multimedia Signal Processing
Artificial intelligence technologies are also actively applied to broadcasting and multimedia processing technologies. A lot of research has been conducted in a wide variety of fields, such as content creation, transmission, and security, and attempts have been made in the past two to three years to improve image, video, speech, and other data compression efficiency in areas related to MPEG media processing technology. Additionally, technologies such as media creation, processing, editing, and scenario creation are very important areas of research in multimedia processing and engineering. This book contains a collection of topics spanning advanced computational intelligence algorithms and technologies for emerging multimedia signal processing, including computer vision, speech/sound/text processing, and content analysis/information mining.
Integrated analysis of audiovisual signals and external information sources for event detection in team sports video
Ph.D. DOCTOR OF PHILOSOPHY