45 research outputs found

    Inpainting of Missing Audio Signal Samples

    Get PDF
    V oblasti zpracování signálů se v současné době čím dál více využívají tzv. řídké reprezentace signálů, tzn. že daný signál je možné vyjádřit přesně či velmi dobře aproximovat lineární kombinací velmi malého počtu vektorů ze zvoleného reprezentačního systému. Tato práce se zabývá využitím řídkých reprezentací pro rekonstrukci poškozených zvukových záznamů, ať už historických nebo nově vzniklých. Především historické zvukové nahrávky trpí zarušením jako praskání nebo šum. Krátkodobé poškození zvukových nahrávek bylo doposud řešeno interpolačními technikami, zejména pomocí autoregresního modelování. V nedávné době byl představen algoritmus s názvem Audio Inpainting, který řeší doplňování chybějících vzorků ve zvukovém signálu pomocí řídkých reprezentací. Zmíněný algoritmus využívá tzv. hladové algoritmy pro řešení optimalizačních úloh. Cílem této práce je porovnání dosavadních interpolačních metod s technikou Audio Inpaintingu. Navíc, k řešení optimalizačních úloh jsou využívány algoritmy založené na l1-relaxaci, a to jak ve formě analyzujícího, tak i syntetizujícího modelu. Především se jedná o proximální algoritmy. Tyto algoritmy pracují jak s jednotlivými koeficienty samostatně, tak s koeficienty v závislosti na jejich okolí, tzv. strukturovaná řídkost. Strukturovaná řídkost je dále využita taky pro odšumování zvukových nahrávek. Jednotlivé algoritmy jsou v praktické části zhodnoceny z hlediska nastavení parametrů pro optimální poměr rekonstrukce vs. výpočetní čas. Všechny algoritmy popsané v práci jsou na praktických příkladech porovnány pomocí objektivních metod odstupu signálu od šumu (SNR) a PEMO-Q. Na závěr je úspěšnost rekonstrukce poškozených zvukových signálů vyhodnocena.Recently, sparse representations of signals became very popular in the field of signal processing. Sparse representation mean that the signal is represented exactly or very well approximated by a linear combination of only a few vectors from the specific representation system. This thesis deals with the utilization of sparse representations of signals for the process of audio restoration, either historical or recent records. Primarily old audio recordings suffer from defects like crackles or noise. Until now, short gaps in audio signals were repaired by interpolation techniques, especially autoregressive modeling. Few years ago, an algorithm termed the Audio Inpainting was introduced. This algorithm solves the missing audio signal samples inpainting using sparse representations through the greedy algorithm for sparse approximation. This thesis aims to compare the state-of-the-art interpolation methods with the Audio Inpainting. Besides this, l1-relaxation methods are utilized for sparse approximation, while both analysis and synthesis models are incorporated. Algorithms used for the sparse approximation are called the proximal algorithms. These algorithms treat the coefficients either separately or with relations to their neighbourhood (structured sparsity). Further, structured sparsity is used for audio denoising. In the experimental part of the thesis, parameters of each algorithm are evaluated in terms of optimal restoration efficiency vs. processing time efficiency. All of the algorithms described in the thesis are compared using objective evaluation methods Signal-to-Noise ratio (SNR) and PEMO-Q. Finally, the overall conclusion and discussion on the restoration results is presented.

    Zero-padding Network Coding and Compressed Sensing for Optimized Packets Transmission

    Get PDF
    Ubiquitous Internet of Things (IoT) is destined to connect everybody and everything on a never-before-seen scale. Such networks, however, have to tackle the inherent issues created by the presence of very heterogeneous data transmissions over the same shared network. This very diverse communication, in turn, produces network packets of various sizes ranging from very small sensory readings to comparatively humongous video frames. Such a massive amount of data itself, as in the case of sensory networks, is also continuously captured at varying rates and contributes to increasing the load on the network itself, which could hinder transmission efficiency. However, they also open up possibilities to exploit various correlations in the transmitted data due to their sheer number. Reductions based on this also enable the networks to keep up with the new wave of big data-driven communications by simply investing in the promotion of select techniques that efficiently utilize the resources of the communication systems. One of the solutions to tackle the erroneous transmission of data employs linear coding techniques, which are ill-equipped to handle the processing of packets with differing sizes. Random Linear Network Coding (RLNC), for instance, generates unreasonable amounts of padding overhead to compensate for the different message lengths, thereby suppressing the pervasive benefits of the coding itself. We propose a set of approaches that overcome such issues, while also reducing the decoding delays at the same time. Specifically, we introduce and elaborate on the concept of macro-symbols and the design of different coding schemes. Due to the heterogeneity of the packet sizes, our progressive shortening scheme is the first RLNC-based approach that generates and recodes unequal-sized coded packets. Another of our solutions is deterministic shifting that reduces the overall number of transmitted packets. Moreover, the RaSOR scheme employs coding using XORing operations on shifted packets, without the need for coding coefficients, thus favoring linear encoding and decoding complexities. Another facet of IoT applications can be found in sensory data known to be highly correlated, where compressed sensing is a potential approach to reduce the overall transmissions. In such scenarios, network coding can also help. Our proposed joint compressed sensing and real network coding design fully exploit the correlations in cluster-based wireless sensor networks, such as the ones advocated by Industry 4.0. This design focused on performing one-step decoding to reduce the computational complexities and delays of the reconstruction process at the receiver and investigates the effectiveness of combined compressed sensing and network coding

    A Joint Traffic Flow Estimation and Prediction Approach for Urban Networks

    Get PDF
    Classical methods of traffic flow prediction with missing data are generally implemented in two sequential stages: a) imputing the missing data by certain imputation methods such as kNN, PPCA based methods etc.; b) using parametric or non-parametric methods to predict the future traffic flow with the completed data. However, the errors generated in missing data imputation stage will be accumulated into prediction stage, and thus will negatively influence the prediction performance when missing rate becomes large. To solve this problem, this paper proposes a Joint Traffic Flow Estimation and Prediction (JT-FEP) approach, which considers the missing data as additional unknown network parameters during a deep learning model training process. By updating missing data and the other network parameters via backward propagation, the model training error can generally be evenly distributed across the missing data and future data, thus reducing the error propagation. We conduct extensive experiments for two missing patterns i.e. Completely Missing at Random (CMAR) and Not Missing at Random (NMAR) with various missing rates. The experimental results demonstrate the superiority of JTFEP over existing methods

    New approaches for unsupervised transcriptomic data analysis based on Dictionary learning

    Get PDF
    The era of high-throughput data generation enables new access to biomolecular profiles and exploitation thereof. However, the analysis of such biomolecular data, for example, transcriptomic data, suffers from the so-called "curse of dimensionality". This occurs in the analysis of datasets with a significantly larger number of variables than data points. As a consequence, overfitting and unintentional learning of process-independent patterns can appear. This can lead to insignificant results in the application. A common way of counteracting this problem is the application of dimension reduction methods and subsequent analysis of the resulting low-dimensional representation that has a smaller number of variables. In this thesis, two new methods for the analysis of transcriptomic datasets are introduced and evaluated. Our methods are based on the concepts of Dictionary learning, which is an unsupervised dimension reduction approach. Unlike many dimension reduction approaches that are widely applied for transcriptomic data analysis, Dictionary learning does not impose constraints on the components that are to be derived. This allows for great flexibility when adjusting the representation to the data. Further, Dictionary learning belongs to the class of sparse methods. The result of sparse methods is a model with few non-zero coefficients, which is often preferred for its simplicity and ease of interpretation. Sparse methods exploit the fact that the analysed datasets are highly structured. Indeed, a characteristic of transcriptomic data is particularly their structuredness, which appears due to the connection of genes and pathways, for example. Nonetheless, the application of Dictionary learning in medical data analysis is mainly restricted to image analysis. Another advantage of Dictionary learning is that it is an interpretable approach. Interpretability is a necessity in biomolecular data analysis to gain a holistic understanding of the investigated processes. Our two new transcriptomic data analysis methods are each designed for one main task: (1) identification of subgroups for samples from mixed populations, and (2) temporal ordering of samples from dynamic datasets, also referred to as "pseudotime estimation". Both methods are evaluated on simulated and real-world data and compared to other methods that are widely applied in transcriptomic data analysis. Our methods convince through high performance and overall outperform the comparison methods

    Using reconfigurable computing technology to accelerate matrix decomposition and applications

    Get PDF
    Matrix decomposition plays an increasingly significant role in many scientific and engineering applications. Among numerous techniques, Singular Value Decomposition (SVD) and Eigenvalue Decomposition (EVD) are widely used as factorization tools to perform Principal Component Analysis for dimensionality reduction and pattern recognition in image processing, text mining and wireless communications, while QR Decomposition (QRD) and sparse LU Decomposition (LUD) are employed to solve the dense or sparse linear system of equations in bioinformatics, power system and computer vision. Matrix decompositions are computationally expensive and their sequential implementations often fail to meet the requirements of many time-sensitive applications. The emergence of reconfigurable computing has provided a flexible and low-cost opportunity to pursue high-performance parallel designs, and the use of FPGAs has shown promise in accelerating this class of computation. In this research, we have proposed and implemented several highly parallel FPGA-based architectures to accelerate matrix decompositions and their applications in data mining and signal processing. Specifically, in this dissertation we describe the following contributions: • We propose an efficient FPGA-based double-precision floating-point architecture for EVD, which can efficiently analyze large-scale matrices. • We implement a floating-point Hestenes-Jacobi architecture for SVD, which is capable of analyzing arbitrary sized matrices. • We introduce a novel deeply pipelined reconfigurable architecture for QRD, which can be dynamically configured to perform either Householder transformation or Givens rotation in a manner that takes advantage of the strengths of each. • We design a configurable architecture for sparse LUD that supports both symmetric and asymmetric sparse matrices with arbitrary sparsity patterns. • By further extending the proposed hardware solution for SVD, we parallelize a popular text mining tool-Latent Semantic Indexing with an FPGA-based architecture. • We present a configurable architecture to accelerate Homotopy l1-minimization, in which the modification of the proposed FPGA architecture for sparse LUD is used at its core to parallelize both Cholesky decomposition and rank-1 update. Our experimental results using an FPGA-based acceleration system indicate the efficiency of our proposed novel architectures, with application and dimension-dependent speedups over an optimized software implementation that range from 1.5ÃÂ to 43.6ÃÂ in terms of computation time

    Multi-frame reconstruction using super-resolution, inpainting, segmentation and codecs

    Get PDF
    In this thesis, different aspects of video and light field reconstruction are considered such as super-resolution, inpainting, segmentation and codecs. For this purpose, each of these strategies are analyzed based on a specific goal and a specific database. Accordingly, databases which are relevant to film industry, sport videos, light fields and hyperspectral videos are used for the sake of improvement. This thesis is constructed around six related manuscripts, in which several approaches are proposed for multi-frame reconstruction. Initially, a novel multi-frame reconstruction strategy is proposed for lightfield super-resolution in which graph-based regularization is applied along with edge preserving filtering for improving the spatio-angular quality of lightfield. Second, a novel video reconstruction is proposed which is built based on compressive sensing (CS), Gaussian mixture models (GMM) and sparse 3D transform-domain block matching. The motivation of the proposed technique is the improvement in visual quality performance of the video frames and decreasing the reconstruction error in comparison with the former video reconstruction methods. In the next approach, student-t mixture models and edge preserving filtering are applied for the purpose of video super-resolution. Student-t mixture model has a heavy tail which makes it robust and suitable as a video frame patch prior and rich in terms of log likelihood for information retrieval. In another approach, a hyperspectral video database is considered, and a Bayesian dictionary learning process is used for hyperspectral video super-resolution. To that end, Beta process is used in Bayesian dictionary learning and a sparse coding is generated regarding the hyperspectral video super-resolution. The spatial super-resolution is followed by a spectral video restoration strategy, and the whole process leveraged two different dictionary learnings, in which the first one is trained for spatial super-resolution and the second one is trained for the spectral restoration. Furthermore, in another approach, a novel framework is proposed for replacing advertisement contents in soccer videos in an automatic way by using deep learning strategies. For this purpose, a UNET architecture is applied (an image segmentation convolutional neural network technique) for content segmentation and detection. Subsequently, after reconstructing the segmented content in the video frames (considering the apparent loss in detection), the unwanted content is replaced by new one using a homography mapping procedure. In addition, in another research work, a novel video compression framework is presented using autoencoder networks that encode and decode videos by using less chroma information than luma information. For this purpose, instead of converting Y'CbCr 4:2:2/4:2:0 videos to and from RGB 4:4:4, the video is kept in Y'CbCr 4:2:2/4:2:0 and merged the luma and chroma channels after the luma is downsampled to match the chroma size. An inverse function is performed for the decoder. The performance of these models is evaluated by using CPSNR, MS-SSIM, and VMAF metrics. The experiments reveal that, as compared to video compression involving conversion to and from RGB 4:4:4, the proposed method increases the video quality by about 5.5% for Y'CbCr 4:2:2 and 8.3% for Y'CbCr 4:2:0 while reducing the amount of computation by nearly 37% for Y'CbCr 4:2:2 and 40% for Y'CbCr 4:2:0. The thread that ties these approaches together is reconstruction of the video and light field frames based on different aspects of problems such as having loss of information, blur in the frames, existing noise after reconstruction, existing unpleasant content, excessive size of information and high computational overhead. In three of the proposed approaches, we have used Plug-and-Play ADMM model for the first time regarding reconstruction of videos and light fields in order to address both information retrieval in the frames and tackling noise/blur at the same time. In two of the proposed models, we applied sparse dictionary learning to reduce the data dimension and demonstrate them as an efficient linear combination of basis frame patches. Two of the proposed approaches are developed in collaboration with industry, in which deep learning frameworks are used to handle large set of features and to learn high-level features from the data
    corecore