
    Efficient Approaches for Voice Change and Voice Conversion Systems

    This thesis presents the study and design of voice change and voice conversion systems. A voice change system manipulates a speaker's voice so that it is perceived as not being spoken by that speaker; a voice conversion system modifies a speaker's voice so that it is perceived as being spoken by a target speaker. The thesis has two parts. The first part develops a low-latency, low-complexity voice change system (comprising frequency/pitch scale modification and formant scale modification algorithms) that can run on 2012-era smartphones with very limited computational capability. Although some low-complexity voice change algorithms have been proposed and studied, real-time implementations are very rare. According to the experimental results, the proposed voice change system achieves the same quality as the baseline approach while requiring much less computation and meeting the real-time requirement. Moreover, the proposed system has been implemented in C and released as a commercial software application. The second part investigates a novel low-complexity voice conversion system (from a source speaker A to a target speaker B) that improves perceptual quality and speaker identity without introducing large processing latencies. The proposed scheme directly manipulates the spectrum using an effective and physically motivated method, Continuous Frequency Warping and Magnitude Scaling (CFWMS), to guarantee high perceptual naturalness and quality. In addition, a trajectory limitation strategy is proposed to prevent frame-by-frame discontinuities and further enhance speech quality. The experimental results show that the proposed method outperforms the conventional baseline solutions in both objective and subjective tests.

    A Proposal for an Arbitrary-Speaker Voice Conversion Method Using Autoencoders

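    Voice conversion is a technique that converts input speech to the voice quality of a target speaker. Methods based on the Gaussian Mixture Model (GMM) have conventionally been widely used for voice conversion, but with the recent rise of deep learning, methods based on Deep Neural Networks (DNNs) have been attracting attention. However, most GMM- and DNN-based methods address one-to-one voice conversion, and few studies handle input from arbitrary speakers; existing arbitrary-speaker voice conversion methods also suffer from lower conversion accuracy than one-to-one conversion. In addition, conventional DNN-based methods use complex networks for one-to-one and many-to-one conversion, so they require large amounts of training data and take a long time to perform the conversion. To solve these problems, this study proposes voice conversion methods using an autoencoder and a sparse autoencoder. In the proposed method, high-level features obtained by dimensionality reduction with an autoencoder are converted by a DNN into the target speaker's high-level features, which are then restored to acoustic features using the target speaker's autoencoder. In evaluation experiments comparing the proposed and conventional methods, the autoencoder-based method performed spectral conversion with slightly higher accuracy than the conventional method and shortened the conversion time. Compared with the autoencoder-based proposed method, the sparse-autoencoder-based method improved spectral conversion accuracy and the naturalness of the converted speech, and improved the accuracy of arbitrary-speaker voice conversion. The University of Electro-Communications 201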

    Reconstruction, Analysis, and Editing of Dynamically Deformed 3D Surfaces

    Dynamically deforming 3D surfaces play a major role in computer graphics. However, producing time-varying dynamic geometry at ever increasing detail is a time-consuming and costly process, and so a recent trend is to capture geometry data directly from the real world. In the first part of this thesis, I propose novel approaches for this research area. These approaches capture dense dynamic 3D surfaces from multi-camera systems in a particularly robust and accurate way. This provides highly realistic dynamic surface models for phenomena like moving garments and bulging muscles. However, re-using, editing, or otherwise analyzing dynamic 3D surface data is not yet convenient. To close this gap, the second part of this dissertation develops novel data-driven modeling and animation approaches. I first show a supervised data-driven approach for modeling human muscle deformations that scales to huge datasets and provides fine-scale, anatomically realistic deformations at a quality not attainable by previous methods. I then extend data-driven modeling to the unsupervised case, providing editing tools for a wider set of input data ranging from facial performance capture and full-body motion to muscle and cloth deformation. To this end, I introduce the concepts of sparsity and locality within a mathematical optimization framework. I also explore these concepts for constructing shape-aware functions that are useful for static geometry processing, registration, and localized editing.

    Computational and Statistical Aspects of High-Dimensional Structured Estimation

    University of Minnesota Ph.D. dissertation. May 2018. Major: Computer Science. Advisor: Arindam Banerjee. 1 computer file (PDF); xiii, 256 pages. Modern statistical learning often faces high-dimensional data, for which the number of features that should be considered is very large. Because of constraints encountered in data collection, such as cost and time, however, the samples available for applications in certain domains are few compared with the number of features. In this scenario, statistical estimation becomes much more challenging than in the large-sample regime. Since the information revealed by small samples is inadequate for finding the optimal model parameters, the estimator may end up with incorrect models that appear to fit the observed data but fail to generalize to unseen data. Owing to prior knowledge about the underlying parameters, additional structure can be imposed to effectively reduce the parameter space, in which it is easier to identify the true parameter with limited data. This simple idea has inspired the study of high-dimensional statistics since its inception. Over the last two decades, sparsity has been one of the most popular structures to exploit when estimating a high-dimensional parameter; it assumes that the number of nonzero elements in the parameter vector or matrix is much smaller than its ambient dimension. For simple scenarios such as linear models, L1-norm-based convex estimators such as the Lasso and the Dantzig selector have been widely used to find the true parameter with a reasonable amount of computation and provably small error. Recent years have also seen a variety of structures proposed beyond sparsity, e.g., group sparsity and low-rank matrix structure, which have been demonstrated to be useful in many applications. Moreover, the aforementioned estimators can be extended to leverage new types of structure by finding appropriate convex surrogates, as the L1 norm is for sparsity.
Despite their success on individual structures, current developments towards a unified understanding of various structures are still incomplete in both computational and statistical aspects. Moreover, due to the nature of the model or the parameter structure, the associated estimator can be inherently non-convex, which may need additional care when we consider such a unification of different structures. In this thesis, we aim to make progress towards a unified framework for estimation with general structures by studying the high-dimensional structured linear model and other semi-parametric and non-convex extensions. In particular, we introduce the generalized Dantzig selector (GDS), which extends the original Dantzig selector for sparse linear models. On the computational side, we develop an efficient optimization algorithm to compute the GDS. On the statistical side, we establish recovery guarantees for the GDS using certain geometric measures. We then demonstrate that those geometric measures can be bounded using simple information about the structures. These results on the GDS have been extended to the matrix setting as well. Apart from the linear model, we also investigate one of its semi-parametric extensions, the single-index model (SIM). To estimate the true parameter, we incorporate its structure into two types of simple estimators, whose estimation error can be established using similar geometric measures. In addition, we design a new semi-parametric model called the sparse linear isotonic model (SLIM), for which we provide an efficient estimation algorithm along with its statistical guarantees. Lastly, we consider non-convex estimation for structured multi-response linear models. We propose an alternating estimation procedure to estimate the parameters. In spite of the non-convexity, we show that the statistical guarantees for general structures can also be summarized by the geometric measures.
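
    The abstract above names the Lasso as a standard L1-norm-based estimator for sparse linear models. As a rough illustration of that baseline only (not of the generalized Dantzig selector or any algorithm from the dissertation), the following minimal sketch solves the Lasso by cyclic coordinate descent. The function names, the objective scaling by 1/(2n), and the toy data are all assumptions made for this example.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal map of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(rows, targets, penalty, sweeps=200):
    """Minimize (1 / (2n)) * ||y - Xw||^2 + penalty * ||w||_1.

    Illustrative helper: rows is a list of feature lists (X), targets is y.
    """
    n_samples = len(rows)
    n_features = len(rows[0])
    weights = [0.0] * n_features
    for _ in range(sweeps):
        for j in range(n_features):
            rho = 0.0    # correlation of feature j with the partial residual
            scale = 0.0  # mean squared value of feature j
            for row, target in zip(rows, targets):
                prediction = sum(w * x for w, x in zip(weights, row))
                # Residual computed as if feature j were removed from the model.
                partial_residual = target - prediction + weights[j] * row[j]
                rho += row[j] * partial_residual
                scale += row[j] ** 2
            rho /= n_samples
            scale /= n_samples
            weights[j] = soft_threshold(rho, penalty) / scale if scale > 0 else 0.0
    return weights


# Hypothetical toy data: two samples, two orthogonal features.
rows = [[1.0, 0.0], [0.0, 1.0]]
targets = [3.0, 0.5]
weights = lasso_coordinate_descent(rows, targets, penalty=0.5)
print(weights)
```

    On this toy problem the weakly supported second coefficient is set exactly to zero while the first is shrunk from 3.0 to 2.0, which is the sparsity-inducing behavior the abstract attributes to L1-based estimators.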

    Parameter recovery for transient signals

    Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 115-118). Transient signals naturally arise in numerous disciplines for which the decay rates and amplitudes carry some informational significance. Even when the decay rates are known, solving for the amplitudes results in an ill-conditioned formulation. Transient signals in the presence of noise are further complicated as the signal-to-noise ratio asymptotically decreases in time. In this thesis the Discrete-Time Transient Transform and the Discrete Transient Transform are defined in order to represent a general signal using a linear combination of decaying exponential signals. A common approach to computing a change of basis is to make use of the dual basis. Two algorithms are proposed for generating a dual basis: the first algorithm is specific to a general exponential basis, e.g., real exponential or harmonically related complex exponential bases are special cases of the general exponential basis, while the second algorithm is usable for any general basis. Several properties of a transient domain representation are discussed. Algorithms for computing numerically stable approximate transient spectra are additionally proposed. The inherent infinite bandwidth of a continuous-time transient signal motivates in part the development of a framework for recovering the decay rates and amplitudes of a discrete-time lowpass filtered transient signal. This framework takes advantage of existing parameter modeling, identification, and recovery techniques to determine the decay rates while an alternating projection method utilizing the Discrete Transient Transform determines the amplitudes. By Tarek A. Lahlou. S.M
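
    The abstract above mentions using a dual basis to compute a change of basis onto decaying exponentials. The sketch below shows the textbook construction (dual vectors obtained by applying the inverse Gram matrix to the basis) for a small real exponential basis with known decay rates. It is an assumption-laden illustration, not either of the two algorithms proposed in the thesis; the function names, decay rates, and amplitudes are hypothetical.

```python
def exponential_basis(rates, n_samples):
    """Rows are the basis signals b_k[n] = rates[k] ** n for n = 0..n_samples-1."""
    return [[r ** n for n in range(n_samples)] for r in rates]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(matrix, rhs):
    """Solve matrix @ X = rhs by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [list(row) + list(extra) for row, extra in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for row in range(size):
            if row != col:
                factor = aug[row][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [row[size:] for row in aug]


def dual_basis(basis):
    """Dual vectors d_j = sum_k (G^-1)[j][k] * b_k, so that <d_j, b_k> = 1 if j == k else 0."""
    gram = [[dot(u, v) for v in basis] for u in basis]
    return solve(gram, basis)


# Hypothetical example: three known decay rates, sixteen samples.
rates = [0.9, 0.5, 0.2]
amplitudes = [2.0, -1.0, 0.5]
basis = exponential_basis(rates, 16)
signal = [sum(a * b[n] for a, b in zip(amplitudes, basis)) for n in range(16)]

# Each amplitude is the inner product of the signal with one dual vector.
recovered = [dot(d, signal) for d in dual_basis(basis)]
print(recovered)
```

    With well-separated decay rates, as assumed here, the Gram matrix is easy to invert. As the rates move closer together it becomes ill-conditioned, which is the difficulty the abstract points to when solving for amplitudes.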

    Design of large polyphase filters in the Quadratic Residue Number System


    Temperature aware power optimization for multicore floating-point units


    Fundamentals

    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters

    NASA Tech Briefs, December 1988

    This month's technical section includes forecasts for 1989 and beyond by NASA experts in the following fields: Integrated Circuits; Communications; Computational Fluid Dynamics; Ceramics; Image Processing; Sensors; Dynamic Power; Superconductivity; Artificial Intelligence; and Flow Cytometry. The quotes provide a brief overview of emerging trends, and describe inventions and innovations being developed by NASA, other government agencies, and private industry that could make a significant impact in coming years. A second bonus feature in this month's issue is the expanded subject index that begins on page 98. The index contains cross-referenced listings for all technical briefs appearing in NASA Tech Briefs during 1988