
    Automatic image quality enhancement using deep neural networks

    Abstract. Photo retouching can significantly improve image quality and is considered an essential part of photography. Traditionally, this task has been performed manually with dedicated image-editing software. Recent research, however, has shown that neural networks outperform traditional methods in automated image enhancement. In the literature review of this thesis, several automatic neural-network-based image enhancement methods were studied, and one was chosen for closer examination and evaluation. The chosen network design has several appealing qualities, such as the ability to learn both local and global enhancements and a simple architecture built for computational efficiency. This research proposes a novel dataset generation method for automated image enhancement research and tests its usefulness with the chosen network design. The method simulates commonly occurring photographic errors, so the original high-quality images can serve as the target data. This design allows studying fixes for individual aberrations as well as their combinations; the underlying idea is that the network learns to fix these aberrations while producing aesthetically pleasing and consistent results. The quantitative evaluation showed that the network can learn to counter these errors individually, and with greater effort it can also learn to correct all of them simultaneously. Additionally, the network's capability to learn local, portrait-specific enhancement tasks was evaluated. The trained models apply the effect successfully, but the results did not reach the same accuracy as the global enhancement tasks.
According to the completed qualitative survey, images enhanced by the proposed general enhancement model show successfully improved quality, and the model performs better than some state-of-the-art image enhancement methods.
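The dataset generation idea described above, degrading clean photographs so that the originals serve as training targets, can be sketched as follows. This is a minimal illustrative sketch, not the thesis code: the specific degradations (exposure shift, sensor noise, contrast loss), their parameter ranges, and all function names are assumptions.

```python
import numpy as np

def simulate_aberrations(image, rng):
    """Degrade a clean image with simulated photographic errors.

    The clean input serves as the training target; the degraded
    copy serves as the network input.
    """
    degraded = image.astype(np.float32)
    # Exposure error: a random global gain.
    degraded *= rng.uniform(0.5, 1.5)
    # Sensor noise: additive Gaussian noise.
    degraded += rng.normal(0.0, 10.0, degraded.shape)
    # Contrast loss: compress values toward the mean.
    mean = degraded.mean()
    degraded = mean + (degraded - mean) * rng.uniform(0.6, 1.0)
    return np.clip(degraded, 0, 255).astype(np.uint8)

def make_pair(image, rng):
    """Return an (input, target) training pair from one clean image."""
    return simulate_aberrations(image, rng), image
```

Because each degradation is applied independently, the same routine can emit single-aberration pairs (for studying individual fixes) or combined-aberration pairs (for the general enhancement model) by toggling the steps.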

    Text–to–Video: Image Semantics and NLP

    When aiming to automatically translate an arbitrary text into a visual story, the main challenge is finding a semantically close visual representation whose displayed meaning remains the same as in the given text. Moreover, the appearance of an image itself largely influences how its meaningful information is conveyed to an observer. This thesis demonstrates that investigating both image semantics and the semantic relatedness between visual and textual sources enables us to tackle the challenging semantic gap and to find a semantically close translation from natural language to a corresponding visual representation. In recent years, social networking has attracted great interest, leading to an enormous and still increasing amount of data available online. Photo-sharing sites like Flickr allow users to associate textual information with their uploaded imagery. This thesis therefore exploits this huge source of user-generated data, which provides initial links between images, words, and other meaningful data. To approach visual semantics, this work presents various methods to analyze the visual structure and appearance of images in terms of meaningful similarities, aesthetic appeal, and emotional effect on an observer. In detail, our GPU-based approach efficiently finds visual similarities between images in large datasets across visual domains and identifies the various meanings of ambiguous words by exploring similarity in online search results. Further, we investigate the highly subjective aesthetic appeal of images and use deep learning to learn aesthetic rankings directly from a broad diversity of user reactions in online social behavior. To gain even deeper insight into the influence of visual appearance on an observer, we explore how simple image processing can actually change emotional perception, and we derive a simple but effective image filter.
To identify meaningful connections between written text and visual representations, we employ methods from Natural Language Processing (NLP). Extensive textual processing allows us to create semantically relevant illustrations for simple text elements as well as complete storylines. More precisely, we present an approach that resolves dependencies in textual descriptions to arrange 3D models correctly. Further, we develop a method that finds semantically relevant illustrations for texts of different types based on a novel hierarchical querying algorithm. Finally, we present an optimization-based framework capable of generating picture stories in different styles that are not only semantically relevant but also visually coherent.
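The "simple but effective image filter" mentioned above is not specified in the abstract. As an illustration of how basic image processing can shift the emotional perception of a photo, a warming filter that rebalances color channels might look like the sketch below; the RGB channel order, the per-channel-gain approach, and all names are assumptions, not the authors' actual filter.

```python
import numpy as np

def warmth_filter(image, strength=0.2):
    """Shift an RGB image toward warm tones.

    Boosts the red channel and damps the blue channel by a
    symmetric gain; warm-shifted images are commonly perceived
    as more positive or inviting.
    """
    out = image.astype(np.float32)
    out[..., 0] *= (1.0 + strength)  # boost red
    out[..., 2] *= (1.0 - strength)  # damp blue
    return np.clip(out, 0, 255).astype(np.uint8)
```

Even such a two-line channel adjustment changes the affective impression of an image without altering its semantic content, which is the point the thesis makes about simple processing and emotional perception.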

    Towards Learning Representations in Visual Computing Tasks

    Abstract. The performance of most visual computing tasks depends on the quality of the features extracted from the raw data. An insightful feature representation increases the performance of many learning algorithms by exposing the underlying explanatory factors of the output for the unobserved input. A good representation should also handle anomalies in the data, such as missing samples and noisy input caused by undesired external factors of variation, and it should reduce data redundancy. Over the years, many feature extraction processes have been invented to produce good representations of raw images and videos. These processes can be categorized into three groups. The first group contains processes that are hand-crafted for a specific task. Hand-engineering features requires domain expertise and manual labor, but the resulting feature extraction process is interpretable and explainable. The next group contains latent-feature extraction processes. While the original features lie in a high-dimensional space, the factors relevant for a task often lie on a lower-dimensional manifold. Latent-feature extraction employs hidden variables to expose underlying data properties that cannot be measured directly from the input, and it imposes a specific structure, such as sparsity or low rank, on the derived representation through sophisticated optimization techniques. The last category is that of deep features, obtained by passing raw input data with minimal pre-processing through a deep network whose parameters are computed by iteratively minimizing a task-based loss. In this dissertation, I present four pieces of work in which I create and learn suitable data representations. The first task employs hand-crafted features to perform clinically relevant retrieval of diabetic retinopathy images. The second task uses latent features to perform content-adaptive image enhancement.
The third task ranks a pair of images based on their aesthetics. The goal of the last task is to capture localized image artifacts in small datasets with patch-level labels. For the latter two tasks, I propose novel deep architectures and show significant improvement over previous state-of-the-art approaches. A suitable combination of feature representations, augmented with an appropriate learning approach, can increase performance for most visual computing tasks.
Doctoral Dissertation, Computer Science, 201
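As a concrete example of imposing sparsity on a latent representation through optimization, the classic ISTA (iterative shrinkage-thresholding) algorithm for sparse coding can be sketched as follows. This is a generic textbook method illustrating the latent-feature category, not necessarily the technique used in the dissertation, and all names are assumptions.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the L1 norm: shrink toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(D, x, lam=0.5, n_iter=100):
    """ISTA: minimize 0.5 * ||x - D z||^2 + lam * ||z||_1 over z.

    D is a dictionary of basis atoms (columns); the returned z is a
    sparse latent code for the signal x.
    """
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)           # gradient of the smooth term
        z = soft_threshold(z - grad / L, lam / L)
    return z
```

The soft-thresholding step is what injects the sparsity structure: coefficients whose contribution does not outweigh the penalty lam are driven exactly to zero, leaving a low-dimensional code on the manifold the text describes.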

    ICS Materials. Towards a re-Interpretation of material qualities through interactive, connected, and smart materials.

    The domain of materials for design is changing under the influence of increased technological advancement, miniaturization, and democratization. Materials are becoming connected, augmented, computational, interactive, active, responsive, and dynamic. These are ICS Materials, an acronym that stands for Interactive, Connected, and Smart. While labs around the world are experimenting with these new materials, there is a need to reflect on their potential and impact on design. This paper is a first step in this direction: to interpret and describe the qualities of ICS Materials, considering their experiential pattern, their expressive sensorial dimension, and their aesthetics of interaction. Through case studies, we analyse and classify these emerging ICS Materials, identifying common characteristics and challenges, e.g. their ability to change over time or their programmability by designers and users. On that basis, we argue that existing models need to be reframed and redesigned to describe ICS Materials and make their qualities emerge.

    Artificial Intelligence Technology

    This open access book aims to give readers a basic outline of today's research and technology developments in artificial intelligence (AI), help them gain a general understanding of this trend, and familiarize them with the current research hotspots, as well as some of the fundamental and common theories and methodologies that are widely accepted in AI research and application. The book is written in comprehensible and plain language, featuring clearly explained theories and concepts with extensive analysis and examples. While providing a relatively comprehensive introduction to the evolution of artificial intelligence technology, it skips some traditional findings in its narration. The book provides a detailed elaboration of the basic concepts of AI and machine learning, as well as other relevant topics, including deep learning, deep learning frameworks, the Huawei MindSpore AI development framework, the Huawei Atlas computing platform, the Huawei AI open platform for smart terminals, and the Huawei CLOUD Enterprise Intelligence application platform. As the world's leading provider of ICT (information and communication technology) infrastructure and smart terminals, Huawei offers products ranging from digital data communication, cyber security, wireless technology, data storage, cloud computing, and smart computing to artificial intelligence.

    A Study on Semantic Understanding of Images Using Image Analysis and Related Information

    Degree system: new; report number: Kō 3514; degree type: Doctor (Global Information and Telecommunication Studies); date conferred: 2012/2/8; Waseda University degree number: Shin 585