506 research outputs found

    Lossless Differential Compression for Synchronizing Arbitrary Single-Dimensional Strings

    Get PDF
    Differential compression allows expressing a modified document as differences relative to another version of the document. A compressed string requires space relative to amount of changes, irrespective of original document sizes. The purpose of this study was to answer what algorithms are suitable for universal lossless differential compression for synchronizing two arbitrary documents either locally or remotely. Two main problems in differential compression are finding the differences (differencing), and compactly communicating the differences (encoding). We discussed local differencing algorithms based on subsequence searching, hashtable lookups, suffix searching, and projection. We also discussed probabilistic remote algorithms based on both recursive comparison and characteristic polynomial interpolation of hashes computed from variable-length content-defined substrings. We described various heuristics for approximating optimal algorithms as arbitrary long strings and memory limitations force discarding information. Discussion also included compact delta encoding and in-place reconstruction. We presented results from empirical testing using discussed algorithms. The conclusions were that multiple algorithms need to be integrated into a hybrid implementation, which heuristically chooses algorithms based on evaluation of the input data. Algorithms based on hashtable lookups are faster on average and require less memory, but algorithms based on suffix searching find least differences. Interpolating characteristic polynomials was found to be too slow for general use. With remote hash comparison, content-defined chunks and recursive comparison can reduce protocol overhead. A differential compressor should be merged with a state-of-art non-differential compressor to enable more compact delta encoding. Input should be processed multiple times to allow constant a space bound without significant reduction in compression efficiency. Compression efficiently of current popular synchronizers could be improved, as our empiral testing showed that a non-differential compressor produced smaller files without having access to one of the two strings

    Entwicklung einer Fully-Convolutional-Netzwerkarchitektur für die Detektion von defekten LED-Chips in Photolumineszenzbildern

    Get PDF
    Nowadays, light-emitting diodes (LEDs) can be found in a large variety of applications, from standard LEDs in domestic lighting solutions to advanced chip designs in automobiles, smart watches and video walls. The advances in chip design also affect the test processes, where the execution of certain contact measurements is exacerbated by ever decreasing chip dimensions or even rendered impossible due to the chip design. As an instance, wafer probing determines the electrical and optical properties of all LED chips on a wafer by contacting each and every chip with a prober needle. Chip designs without a contact pad on the surface, however, elude wafer probing and while electrical and optical properties can be determined by sample measurements, defective LED chips are distributed randomly over the wafer. Here, advanced data analysis methods provide a new approach to gather defect information from already available non-contact measurements. Photoluminescence measurements, for example, record a brightness image of an LED wafer, where conspicuous brightness values indicate defective chips. To extract these defect information from photoluminescence images, a computer-vision algorithm is required that transforms photoluminescence images into defect maps. In other words, each and every pixel of a photoluminescence image must be classifed into a class category via semantic segmentation, where so-called fully-convolutional-network algorithms represent the state-of-the-art method. However, the aforementioned task poses several challenges: on the one hand, each pixel in a photoluminescence image represents an LED chip and thus, pixel-fine output resolution is required. On the other hand, photoluminescence images show a variety of brightness values from wafer to wafer in addition to local areas of differing brightness. Additionally, clusters of defective chips assume various shapes, sizes and brightness gradients and thus, the algorithm must reliably recognise objects at multiple scales. Finally, not all salient brightness values correspond to defective LED chips, requiring the algorithm to distinguish salient brightness values corresponding to measurement artefacts, non-defect structures and defects, respectively. In this dissertation, a novel fully-convolutional-network architecture was developed that allows the accurate segmentation of defective LED chips in highly variable photoluminescence wafer images. For this purpose, the basic fully-convolutional-network architecture was modifed with regard to the given application and advanced architectural concepts were incorporated so as to enable a pixel-fine output resolution and a reliable segmentation of multiple scaled defect structures. Altogether, the developed dense ASPP Vaughan architecture achieved a pixel accuracy of 97.5 %, mean pixel accuracy of 96.2% and defect-class accuracy of 92.0 %, trained on a dataset of 136 input-label pairs and hereby showed that fully-convolutional-network algorithms can be a valuable contribution to data analysis in industrial manufacturing.Leuchtdioden (LEDs) werden heutzutage in einer Vielzahl von Anwendungen verbaut, angefangen bei Standard-LEDs in der Hausbeleuchtung bis hin zu technisch fortgeschrittenen Chip-Designs in Automobilen, Smartwatches und Videowänden. Die Weiterentwicklungen im Chip-Design beeinflussen auch die Testprozesse: Hierbei wird die Durchführung bestimmter Kontaktmessungen durch zunehmend verringerte Chip-Dimensionen entweder erschwert oder ist aufgrund des Chip-Designs unmöglich. Die sogenannteWafer-Prober-Messung beispielsweise ermittelt die elektrischen und optischen Eigenschaften aller LED-Chips auf einem Wafer, indem jeder einzelne Chip mit einer Messnadel kontaktiert und vermessen wird; Chip-Designs ohne Kontaktpad auf der Oberfläche können daher nicht durch die Wafer-Prober-Messung charakterisiert werden. Während die elektrischen und optischen Chip-Eigenschaften auch mittels Stichprobenmessungen bestimmt werden können, verteilen sich defekte LED-Chips zufällig über die Waferfläche. Fortgeschrittene Datenanalysemethoden ermöglichen hierbei einen neuen Ansatz, Defektinformationen aus bereits vorhandenen, berührungslosen Messungen zu gewinnen. Photolumineszenzmessungen, beispielsweise, erfassen ein Helligkeitsbild des LEDWafers, in dem auffällige Helligkeitswerte auf defekte LED-Chips hinweisen. Ein Bildverarbeitungsalgorithmus, der diese Defektinformationen aus Photolumineszenzbildern extrahiert und ein Defektabbild erstellt, muss hierzu jeden einzelnen Bildpunkt mittels semantischer Segmentation klassifizieren, eine Technik bei der sogenannte Fully-Convolutional-Netzwerke den Stand der Technik darstellen. Die beschriebene Aufgabe wird jedoch durch mehrere Faktoren erschwert: Einerseits entspricht jeder Bildpunkt eines Photolumineszenzbildes einem LED-Chip, so dass eine bildpunktfeine Auflösung der Netzwerkausgabe notwendig ist. Andererseits weisen Photolumineszenzbilder sowohl stark variierende Helligkeitswerte von Wafer zu Wafer als auch lokal begrenzte Helligkeitsabweichungen auf. Zusätzlich nehmen Defektanhäufungen unterschiedliche Formen, Größen und Helligkeitsgradienten an, weswegen der Algorithmus Objekte verschiedener Abmessungen zuverlässig erkennen können muss. Schlussendlich weisen nicht alle auffälligen Helligkeitswerte auf defekte LED-Chips hin, so dass der Algorithmus in der Lage sein muss zu unterscheiden, ob auffällige Helligkeitswerte mit Messartefakten, defekten LED-Chips oder defektfreien Strukturen korrelieren. In dieser Dissertation wurde eine neuartige Fully-Convolutional-Netzwerkarchitektur entwickelt, die die akkurate Segmentierung defekter LED-Chips in stark variierenden Photolumineszenzbildern von LED-Wafern ermöglicht. Zu diesem Zweck wurde die klassische Fully-Convolutional-Netzwerkarchitektur hinsichtlich der beschriebenen Anwendung angepasst und fortgeschrittene architektonische Konzepte eingearbeitet, um eine bildpunktfeine Ausgabeauflösung und eine zuverlässige Sementierung verschieden großer Defektstrukturen umzusetzen. Insgesamt erzielt die entwickelte dense-ASPP-Vaughan-Architektur eine Pixelgenauigkeit von 97,5 %, durchschnittliche Pixelgenauigkeit von 96,2% und eine Defektklassengenauigkeit von 92,0 %, trainiert mit einem Datensatz von 136 Bildern. Hiermit konnte gezeigt werden, dass Fully-Convolutional-Netzwerke eine wertvolle Erweiterung der Datenanalysemethoden sein können, die in der industriellen Fertigung eingesetzt werden

    Incremental volume rendering using hierarchical compression

    Get PDF
    Includes bibliographical references.The research has been based on the thesis that efficient volume rendering of datasets, contained on the Internet, can be achieved on average personal workstations. We present a new algorithm here for efficient incremental rendering of volumetric datasets. The primary goal of this algorithm is to give average workstations the ability to efficiently render volume data received over relatively low bandwidth network links in such a way that rapid user feedback is maintained. Common limitations of workstation rendering of volume data include: large memory overheads, the requirement of expensive rendering hardware, and high speed processing ability. The rendering algorithm presented here overcomes these problems by making use of the efficient Shear-Warp Factorisation method which does not require specialised graphics hardware. However the original Shear-Warp algorithm suffers from a high memory overhead and does not provide for incremental rendering which is required should rapid user feedback be maintained. Our algorithm represents the volumetric data using a hierarchical data structure which provides for the incremental classification and rendering of volume data. This exploits the multiscale nature of the octree data structure. The algorithm reduces the memory footprint of the original Shear-Warp Factorisation algorithm by a factor of more than two, while maintaining good rendering performance. These factors make our octree algorithm more suitable for implementation on average desktop workstations for the purposes of interactive exploration of volume models over a network. This dissertation covers the theory and practice of developing the octree based Shear-Warp algorithms, and then presents the results of extensive empirical testing. The results, using typical volume datasets, demonstrate the ability of the algorithm to achieve high rendering rates for both incremental rendering and standard rendering while reducing the runtime memory requirements

    Analytical Comparison of Grid File and K-d-b-tree Structures

    Get PDF
    Computing and Information Scienc

    Deep Learning Techniques for Music Generation -- A Survey

    Full text link
    This paper is a survey and an analysis of different ways of using deep learning (deep artificial neural networks) to generate musical content. We propose a methodology based on five dimensions for our analysis: Objective - What musical content is to be generated? Examples are: melody, polyphony, accompaniment or counterpoint. - For what destination and for what use? To be performed by a human(s) (in the case of a musical score), or by a machine (in the case of an audio file). Representation - What are the concepts to be manipulated? Examples are: waveform, spectrogram, note, chord, meter and beat. - What format is to be used? Examples are: MIDI, piano roll or text. - How will the representation be encoded? Examples are: scalar, one-hot or many-hot. Architecture - What type(s) of deep neural network is (are) to be used? Examples are: feedforward network, recurrent network, autoencoder or generative adversarial networks. Challenge - What are the limitations and open challenges? Examples are: variability, interactivity and creativity. Strategy - How do we model and control the process of generation? Examples are: single-step feedforward, iterative feedforward, sampling or input manipulation. For each dimension, we conduct a comparative analysis of various models and techniques and we propose some tentative multidimensional typology. This typology is bottom-up, based on the analysis of many existing deep-learning based systems for music generation selected from the relevant literature. These systems are described and are used to exemplify the various choices of objective, representation, architecture, challenge and strategy. The last section includes some discussion and some prospects.Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 201

    Contributions in image and video coding

    Get PDF
    Orientador: Max Henrique Machado CostaTese (doutorado) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de ComputaçãoResumo: A comunidade de codificação de imagens e vídeo vem também trabalhando em inovações que vão além das tradicionais técnicas de codificação de imagens e vídeo. Este trabalho é um conjunto de contribuições a vários tópicos que têm recebido crescente interesse de pesquisadores na comunidade, nominalmente, codificação escalável, codificação de baixa complexidade para dispositivos móveis, codificação de vídeo de múltiplas vistas e codificação adaptativa em tempo real. A primeira contribuição estuda o desempenho de três transformadas 3-D rápidas por blocos em um codificador de vídeo de baixa complexidade. O codificador recebeu o nome de Fast Embedded Video Codec (FEVC). Novos métodos de implementação e ordens de varredura são propostos para as transformadas. Os coeficiente 3-D são codificados por planos de bits pelos codificadores de entropia, produzindo um fluxo de bits (bitstream) de saída totalmente embutida. Todas as implementações são feitas usando arquitetura com aritmética inteira de 16 bits. Somente adições e deslocamentos de bits são necessários, o que reduz a complexidade computacional. Mesmo com essas restrições, um bom desempenho em termos de taxa de bits versus distorção pôde ser obtido e os tempos de codificação são significativamente menores (em torno de 160 vezes) quando comparados ao padrão H.264/AVC. A segunda contribuição é a otimização de uma recente abordagem proposta para codificação de vídeo de múltiplas vistas em aplicações de video-conferência e outras aplicações do tipo "unicast" similares. O cenário alvo nessa abordagem é fornecer vídeo com percepção real em 3-D e ponto de vista livre a boas taxas de compressão. Para atingir tal objetivo, pesos são atribuídos a cada vista e mapeados em parâmetros de quantização. Neste trabalho, o mapeamento ad-hoc anteriormente proposto entre pesos e parâmetros de quantização é mostrado ser quase-ótimo para uma fonte Gaussiana e um mapeamento ótimo é derivado para fonte típicas de vídeo. A terceira contribuição explora várias estratégias para varredura adaptativa dos coeficientes da transformada no padrão JPEG XR. A ordem de varredura original, global e adaptativa do JPEG XR é comparada com os métodos de varredura localizados e híbridos propostos neste trabalho. Essas novas ordens não requerem mudanças nem nos outros estágios de codificação e decodificação, nem na definição da bitstream A quarta e última contribuição propõe uma transformada por blocos dependente do sinal. As transformadas hierárquicas usualmente exploram a informação residual entre os níveis no estágio da codificação de entropia, mas não no estágio da transformada. A transformada proposta neste trabalho é uma técnica de compactação de energia que também explora as similaridades estruturais entre os níveis de resolução. A idéia central da técnica é incluir na transformada hierárquica um número de funções de base adaptativas derivadas da resolução menor do sinal. Um codificador de imagens completo foi desenvolvido para medir o desempenho da nova transformada e os resultados obtidos são discutidos neste trabalhoAbstract: The image and video coding community has often been working on new advances that go beyond traditional image and video architectures. This work is a set of contributions to various topics that have received increasing attention from researchers in the community, namely, scalable coding, low-complexity coding for portable devices, multiview video coding and run-time adaptive coding. The first contribution studies the performance of three fast block-based 3-D transforms in a low complexity video codec. The codec has received the name Fast Embedded Video Codec (FEVC). New implementation methods and scanning orders are proposed for the transforms. The 3-D coefficients are encoded bit-plane by bit-plane by entropy coders, producing a fully embedded output bitstream. All implementation is performed using 16-bit integer arithmetic. Only additions and bit shifts are necessary, thus lowering computational complexity. Even with these constraints, reasonable rate versus distortion performance can be achieved and the encoding time is significantly smaller (around 160 times) when compared to the H.264/AVC standard. The second contribution is the optimization of a recent approach proposed for multiview video coding in videoconferencing applications or other similar unicast-like applications. The target scenario in this approach is providing realistic 3-D video with free viewpoint video at good compression rates. To achieve such an objective, weights are computed for each view and mapped into quantization parameters. In this work, the previously proposed ad-hoc mapping between weights and quantization parameters is shown to be quasi-optimum for a Gaussian source and an optimum mapping is derived for a typical video source. The third contribution exploits several strategies for adaptive scanning of transform coefficients in the JPEG XR standard. The original global adaptive scanning order applied in JPEG XR is compared with the localized and hybrid scanning methods proposed in this work. These new orders do not require changes in either the other coding and decoding stages or in the bitstream definition. The fourth and last contribution proposes an hierarchical signal dependent block-based transform. Hierarchical transforms usually exploit the residual cross-level information at the entropy coding step, but not at the transform step. The transform proposed in this work is an energy compaction technique that can also exploit these cross-resolution-level structural similarities. The core idea of the technique is to include in the hierarchical transform a number of adaptive basis functions derived from the lower resolution of the signal. A full image codec is developed in order to measure the performance of the new transform and the obtained results are discussed in this workDoutoradoTelecomunicações e TelemáticaDoutor em Engenharia Elétric

    Machine Learning for Informed Representation Learning

    Get PDF
    The way we view reality and reason about the processes surrounding us is intimately connected to our perception and the representations we form about our observations and experiences. The popularity of machine learning and deep learning techniques in that regard stems from their ability to form useful representations by learning from large sets of observations. Typical application examples include image recognition or language processing for which artificial neural networks are powerful tools to extract regularity patterns or relevant statistics. In this thesis, we leverage and further develop this representation learning capability to address relevant but challenging real-world problems in geoscience and chemistry, to learn representations in an informed manner relevant to the task at hand, and reason about representation learning in neural networks, in general. Firstly, we develop an approach for efficient and scalable semantic segmentation of degraded soil in alpine grasslands in remotely-sensed images based on convolutional neural networks. To this end, we consider different grassland erosion phenomena in several Swiss valleys. We find that we are able to monitor soil degradation consistent with state-of-the-art methods in geoscience and can improve detection of affected areas. Furthermore, our approach provides a scalable method for large-scale analysis which is infeasible with established methods. Secondly, we address the question of how to identify suitable latent representations to enable generation of novel objects with selected properties. For this, we introduce a new deep generative model in the context of manifold learning and disentanglement. Our model improves targeted generation of novel objects by making use of property cycle consistency in property-relevant and property-invariant latent subspaces. We demonstrate the improvements on the generation of molecules with desired physical or chemical properties. Furthermore, we show that our model facilitates interpretability and exploration of the latent representation. Thirdly, in the context of recent advances in deep learning theory and the neural tangent kernel, we empirically investigate the learning of feature representations in standard convolutional neural networks and corresponding random feature models given by the linearisation of the neural networks. We find that performance differences between standard and linearised networks generally increase with the difficulty of the task but decrease with the considered width or over-parametrisation of these networks. Our results indicate interesting implications for feature learning and random feature models as well as the generalisation performance of highly over-parametrised neural networks. In summary, we employ and study feature learning in neural networks and review how we may use informed representation learning for challenging tasks

    Music Synchronization, Audio Matching, Pattern Detection, and User Interfaces for a Digital Music Library System

    Get PDF
    Over the last two decades, growing efforts to digitize our cultural heritage could be observed. Most of these digitization initiatives pursuit either one or both of the following goals: to conserve the documents - especially those threatened by decay - and to provide remote access on a grand scale. For music documents these trends are observable as well, and by now several digital music libraries are in existence. An important characteristic of these music libraries is an inherent multimodality resulting from the large variety of available digital music representations, such as scanned score, symbolic score, audio recordings, and videos. In addition, for each piece of music there exists not only one document of each type, but many. Considering and exploiting this multimodality and multiplicity, the DFG-funded digital library initiative PROBADO MUSIC aimed at developing a novel user-friendly interface for content-based retrieval, document access, navigation, and browsing in large music collections. The implementation of such a front end requires the multimodal linking and indexing of the music documents during preprocessing. As the considered music collections can be very large, the automated or at least semi-automated calculation of these structures would be recommendable. The field of music information retrieval (MIR) is particularly concerned with the development of suitable procedures, and it was the goal of PROBADO MUSIC to include existing and newly developed MIR techniques to realize the envisioned digital music library system. In this context, the present thesis discusses the following three MIR tasks: music synchronization, audio matching, and pattern detection. We are going to identify particular issues in these fields and provide algorithmic solutions as well as prototypical implementations. In Music synchronization, for each position in one representation of a piece of music the corresponding position in another representation is calculated. This thesis focuses on the task of aligning scanned score pages of orchestral music with audio recordings. Here, a previously unconsidered piece of information is the textual specification of transposing instruments provided in the score. Our evaluations show that the neglect of such information can result in a measurable loss of synchronization accuracy. Therefore, we propose an OCR-based approach for detecting and interpreting the transposition information in orchestral scores. For a given audio snippet, audio matching methods automatically calculate all musically similar excerpts within a collection of audio recordings. In this context, subsequence dynamic time warping (SSDTW) is a well-established approach as it allows for local and global tempo variations between the query and the retrieved matches. Moving to real-life digital music libraries with larger audio collections, however, the quadratic runtime of SSDTW results in untenable response times. To improve on the response time, this thesis introduces a novel index-based approach to SSDTW-based audio matching. We combine the idea of inverted file lists introduced by Kurth and Müller (Efficient index-based audio matching, 2008) with the shingling techniques often used in the audio identification scenario. In pattern detection, all repeating patterns within one piece of music are determined. Usually, pattern detection operates on symbolic score documents and is often used in the context of computer-aided motivic analysis. Envisioned as a new feature of the PROBADO MUSIC system, this thesis proposes a string-based approach to pattern detection and a novel interactive front end for result visualization and analysis

    Non-disruptive use of light fields in image and video processing

    Get PDF
    In the age of computational imaging, cameras capture not only an image but also data. This captured additional data can be best used for photo-realistic renderings facilitating numerous post-processing possibilities such as perspective shift, depth scaling, digital refocus, 3D reconstruction, and much more. In computational photography, the light field imaging technology captures the complete volumetric information of a scene. This technology has the highest potential to accelerate immersive experiences towards close-toreality. It has gained significance in both commercial and research domains. However, due to lack of coding and storage formats and also the incompatibility of the tools to process and enable the data, light fields are not exploited to its full potential. This dissertation approaches the integration of light field data to image and video processing. Towards this goal, the representation of light fields using advanced file formats designed for 2D image assemblies to facilitate asset re-usability and interoperability between applications and devices is addressed. The novel 5D light field acquisition and the on-going research on coding frameworks are presented. Multiple techniques for optimised sequencing of light field data are also proposed. As light fields contain complete 3D information of a scene, large amounts of data is captured and is highly redundant in nature. Hence, by pre-processing the data using the proposed approaches, excellent coding performance can be achieved.Im Zeitalter der computergestützten Bildgebung erfassen Kameras nicht mehr nur ein Bild, sondern vielmehr auch Daten. Diese erfassten Zusatzdaten lassen sich optimal für fotorealistische Renderings nutzen und erlauben zahlreiche Nachbearbeitungsmöglichkeiten, wie Perspektivwechsel, Tiefenskalierung, digitale Nachfokussierung, 3D-Rekonstruktion und vieles mehr. In der computergestützten Fotografie erfasst die Lichtfeld-Abbildungstechnologie die vollständige volumetrische Information einer Szene. Diese Technologie bietet dabei das größte Potenzial, immersive Erlebnisse zu mehr Realitätsnähe zu beschleunigen. Deshalb gewinnt sie sowohl im kommerziellen Sektor als auch im Forschungsbereich zunehmend an Bedeutung. Aufgrund fehlender Kompressions- und Speicherformate sowie der Inkompatibilität derWerkzeuge zur Verarbeitung und Freigabe der Daten, wird das Potenzial der Lichtfelder nicht voll ausgeschöpft. Diese Dissertation ermöglicht die Integration von Lichtfelddaten in die Bild- und Videoverarbeitung. Hierzu wird die Darstellung von Lichtfeldern mit Hilfe von fortschrittlichen für 2D-Bilder entwickelten Dateiformaten erarbeitet, um die Wiederverwendbarkeit von Assets- Dateien und die Kompatibilität zwischen Anwendungen und Geräten zu erleichtern. Die neuartige 5D-Lichtfeldaufnahme und die aktuelle Forschung an Kompressions-Rahmenbedingungen werden vorgestellt. Es werden zudem verschiedene Techniken für eine optimierte Sequenzierung von Lichtfelddaten vorgeschlagen. Da Lichtfelder die vollständige 3D-Information einer Szene beinhalten, wird eine große Menge an Daten, die in hohem Maße redundant sind, erfasst. Die hier vorgeschlagenen Ansätze zur Datenvorverarbeitung erreichen dabei eine ausgezeichnete Komprimierleistung
    corecore