
    Predicted and perceived quality of bit-reduced gray-scale still images


    Artificial Bandwidth Extension of Speech Signals using Neural Networks

    Although mobile wideband telephony has been standardized for over 15 years, many countries still lack a nationwide network with good coverage. As a result, many cellphone calls are still downgraded to narrowband telephony. The resulting loss of quality can be reduced by artificial bandwidth extension, an area that has seen great progress in recent years through the use of neural networks. The topic of this thesis is the enhancement of artificial bandwidth extension using neural networks, with a special focus on hands-free calls in a car, where the risk is high that the wideband connection is lost due to the vehicle's speed. Narrowband transmission removes not only the high-frequency components above 3.5 kHz but also the low-frequency components below 300 Hz. Since existing methods already estimate the low-frequency components quite well, they are not covered in this thesis. In most bandwidth extension algorithms, the narrowband signal is first separated into a spectral envelope and an excitation signal. Both parts are then extended separately and finally recombined. While the excitation can be extended with simple methods without reducing speech quality compared to wideband speech, the estimation of the spectral envelope for frequencies above 3.5 kHz is not yet satisfactorily solved: in most evaluations, current bandwidth extension algorithms recover at most 50% of the quality lost to narrowband transmission. In this work, a modification of an existing excitation extension method is proposed that achieves slight improvements without adding computational complexity. To enhance the wideband envelope estimation with neural networks, two modifications of the training process are proposed. First, the loss function is extended with a discriminative term to address the different characteristics of phoneme classes. Second, a GAN (generative adversarial network) is used for the training phase, temporarily adding a second network that evaluates the quality of the estimation. The trained neural networks are compared in subjective and objective evaluations. A final listening test addressed the scenario of a hands-free call in a car, which was simulated acoustically. With the proposed approach, the quality loss caused by the missing high-frequency components could be reduced by 60%.
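
    To make the GAN training modification concrete, the following is a minimal sketch: a generator maps narrowband envelope features to a wideband envelope estimate, and a temporary discriminator scores that estimate during training only. The layer sizes, feature dimensions, and loss weighting are illustrative assumptions, not the thesis's actual configuration.

```python
import torch
import torch.nn as nn

# Generator: narrowband envelope features -> wideband envelope estimate.
# Discriminator: scores an envelope as "true wideband" vs. "estimated".
# The dimensions (20 in, 40 out) are placeholder assumptions.
G = nn.Sequential(nn.Linear(20, 128), nn.ReLU(), nn.Linear(128, 40))
D = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce, mse = nn.BCELoss(), nn.MSELoss()

for step in range(1000):
    nb = torch.randn(32, 20)   # stand-in narrowband features
    wb = torch.randn(32, 40)   # stand-in true wideband envelopes

    # Discriminator update: separate true envelopes from estimates.
    est = G(nb).detach()
    d_loss = bce(D(wb), torch.ones(32, 1)) + bce(D(est), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: reconstruction loss plus adversarial term. The
    # second network only influences training and is dropped afterwards.
    est = G(nb)
    g_loss = mse(est, wb) + 0.01 * bce(D(est), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```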

    GRACE: Loss-Resilient Real-Time Video through Neural Codecs

    In real-time video communication, retransmitting lost packets over high-latency networks is not viable due to strict latency requirements. To counter packet losses without retransmission, two primary strategies are employed: encoder-based forward error correction (FEC) and decoder-based error concealment. The former encodes data with redundancy before transmission, yet determining the optimal redundancy level in advance proves challenging. The latter reconstructs video from partially received frames, but dividing a frame into independently coded partitions inherently compromises compression efficiency, and the lost information cannot be effectively recovered by the decoder without adapting the encoder. We present a loss-resilient real-time video system called GRACE, which preserves the user's quality of experience (QoE) across a wide range of packet losses through a new neural video codec. Central to GRACE's enhanced loss resilience is its joint training of the neural encoder and decoder under a spectrum of simulated packet losses. In lossless scenarios, GRACE achieves video quality on par with conventional codecs (e.g., H.265). As the loss rate escalates, GRACE exhibits a more graceful, less pronounced decline in quality, consistently outperforming other loss-resilient schemes. Through extensive evaluation on various videos and real network traces, we demonstrate that GRACE reduces undecodable frames by 95% and stall duration by 90% compared with FEC, while markedly boosting video quality over error concealment methods. In a user study with 240 crowdsourced participants and 960 subjective ratings, GRACE registers a 38% higher mean opinion score (MOS) than other baselines.
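
    The core idea, joint encoder/decoder training under simulated packet loss, can be sketched as follows. The toy convolutional codec, the channel-masking loss model, and all tensor shapes are assumptions for illustration; the paper's actual architecture and loss simulation are more elaborate.

```python
import torch
import torch.nn as nn

# Toy codec: the latent's channels stand in for independently sent packets.
enc = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 16, 3, padding=1))
dec = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 3, 3, padding=1))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-4)

for step in range(1000):
    frames = torch.rand(8, 3, 64, 64)      # stand-in for video frames
    latent = enc(frames)
    # Simulate packet loss: per sample, draw a loss rate from a wide range
    # and zero out that fraction of the latent before decoding.
    loss_rate = torch.rand(8, 1, 1, 1)
    mask = (torch.rand_like(latent) > loss_rate).float()
    recon = dec(latent * mask)             # decoder only sees the lossy latent
    loss = nn.functional.mse_loss(recon, frames)
    opt.zero_grad(); loss.backward(); opt.step()
```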

    Learning by correlation for computer vision applications: from Kernel methods to deep learning

    Learning to spot analogies and differences within and across visual categories is an arguably powerful approach in machine learning and pattern recognition, one directly inspired by human cognition. In this thesis, we investigate a variety of approaches that are primarily driven by correlation and tackle several computer vision applications.
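
    As one concrete instance of correlation-driven similarity in a kernel setting, a Pearson-correlation kernel between feature vectors might look like the sketch below; this is an illustrative example, not the thesis's specific formulation.

```python
import numpy as np

def correlation_kernel(X):
    """Pearson correlation between rows of X as an (n, n) kernel matrix."""
    Xc = X - X.mean(axis=1, keepdims=True)           # center each sample
    Xn = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)  # unit-normalize
    return Xn @ Xn.T

X = np.random.rand(5, 100)      # 5 samples, 100-dimensional features
K = correlation_kernel(X)       # entries in [-1, 1]; K[i, i] == 1
```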

    S.R. Ranganathan's Ontology of the Book: On a Bibliographical Conceptual Model avant la lettre

    This paper examines a conceptual model of the book advanced in the mid-20th century by the eminent Indian librarian and classification theorist S.R. Ranganathan (1892-1972), who formulated it with the aid of an ontological model drawn from Hindu philosophical thought. The analysis of this model, which has hitherto received only sporadic discussion in KO literature, unfolds in three parts. First, the paper outlines Ranganathan’s model, explains its Hindu philosophical background, and traces its development, showing that, in fact, it comprised two distinct versions – a triadic (i.e., three-entity) and a dyadic (i.e., two-entity) one – which were fully compatible with one another and which Ranganathan used in different contexts. Next, the structure of Ranganathan’s model, in both its triadic and dyadic forms, is compared with those of the contemporary bibliographic conceptual models most widely used today, IFLA-LRM (and its predecessor, FRBR) and BIBFRAME. It is shown that Ranganathan’s model bears some striking resemblances to these current models: in particular, the triadic version of Ranganathan’s model shares affinities with FRBR and IFLA-LRM, while the dyadic version is closer to BIBFRAME. Then follows a discussion of significant structural divergences between Ranganathan’s model and its latter-day counterparts, and an explanation for these differences is adduced. The paper concludes with a brief consideration of the surprising lack of historical connection between Ranganathan’s conceptual model of the book avant la lettre and current bibliographic conceptual models, as well as a reflection on the enduring relevance of Ranganathan’s model for today.

    The Psycho-logic of Universal Quantifiers

    A universally quantified sentence like every frog is green is standardly thought to express a two-place second-order relation (e.g., the set of frogs is a subset of the set of green things). This dissertation argues that as a psychological hypothesis about how speakers mentally represent universal quantifiers, this view is wrong in two respects. First, each, every, and all are not represented as two-place relations, but as one-place descriptions of how a predicate applies to a restricted domain (e.g., relative to the frogs, everything is green). Second, while every and all are represented in a second-order way that implicates a group, each is represented in a completely first-order way that does not involve grouping the satisfiers of a predicate together (e.g., relative to individual frogs, each one is green). These “psycho-logical” distinctions have consequences for how participants evaluate sentences like every circle is green in controlled settings. In particular, participants represent the extension of the determiner’s internal argument (the circles), but not the extension of its external argument (the green things). Moreover, the cognitive system they use to represent the internal argument differs depending on the determiner: Given every or all, participants show signatures of forming ensemble representations, but given each, they represent individual object-files. In addition to psychosemantic evidence, the proposed representations provide explanations for at least two semantic phenomena. The first is the “conservativity” universal: All determiners allow for duplicating their first argument in their second argument without a change in informational significance (e.g., every fish swims has the same truth-conditions as every fish is a fish that swims). This is a puzzling generalization if determiners express two-place relations, but it is a logical consequence if they are devices for forming one-place restricted quantifiers. The second is that every, but not each, naturally invites certain kinds of generic interpretations (e.g., gravity acts on every/#each object). This asymmetry can potentially be explained by details of the interfacing cognitive systems (ensemble and object-file representations). And given that the difference leads to lower-level concomitants in child-ambient speech (as revealed by a corpus investigation), children may be able to leverage it to acquire every’s second-order meaning. This case study on the universal quantifiers suggests that knowing the meaning of a word like every consists not just in understanding the informational contribution that it makes, but in representing that contribution in a particular format. And much like phonological representations provide instructions to the motor planning system, it supports the idea that meaning representations provide (sometimes surprisingly precise) instructions to conceptual systems.
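
    The conservativity generalization can be checked extensionally with a toy model in which sets stand in for predicate extensions; the helper below is a hypothetical illustration of the one-place restricted reading, not the dissertation's formal apparatus.

```python
def every(restrictor, scope):
    # One-place restricted reading: relative to the restrictor,
    # everything satisfies the scope.
    return all(x in scope for x in restrictor)

fish = {"nemo", "dory"}
swims = {"nemo", "dory", "flipper"}
green = {"kermit"}

# "every fish swims" == "every fish is a fish that swims"
assert every(fish, swims) == every(fish, fish & swims)
# Conservativity holds for false sentences too:
assert every(fish, green) == every(fish, fish & green)
```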

    Modeling and Development of Iterative Reconstruction Algorithms in Emerging X-ray Imaging Technologies

    Many promising new X-ray-based biomedical imaging technologies have emerged over the last two decades. Five novel X-ray-based imaging technologies are discussed in this dissertation: differential phase-contrast tomography (DPCT), grating-based phase-contrast tomography (GB-PCT), spectral-CT (K-edge imaging), cone-beam computed tomography (CBCT), and in-line X-ray phase contrast (XPC) tomosynthesis. For each imaging modality, one or more specific problems that prevent it from being effectively or efficiently employed in clinical applications are discussed. First, to mitigate the long data-acquisition times and large radiation doses associated with the use of analytic reconstruction methods in DPCT, we analyze the numerical and statistical properties of two classes of discrete imaging models that form the basis for iterative image reconstruction. Second, to improve image quality in grating-based phase-contrast tomography, we incorporate second-order statistical properties of the object property sinograms, including correlations between them, into the formulation of an advanced multi-channel (MC) image reconstruction algorithm that reconstructs three object properties simultaneously; to rapidly solve the MC reconstruction problem, we developed an algorithm based on the proximal point algorithm and the augmented Lagrangian method. Third, to mitigate image artifacts that arise from reduced-view and/or noisy decomposed sinogram data in K-edge imaging, we exploit the inherent sparseness of typical K-edge objects and incorporate the statistical properties of the decomposed sinograms to formulate two penalized weighted least-squares (PWLS) problems: one with a total variation (TV) penalty, and one with a weighted sum of a TV penalty and an l1-norm penalty with a wavelet sparsifying transform. We employ a fast iterative shrinkage/thresholding algorithm (FISTA) and a splitting-based FISTA to solve these two PWLS problems. Fourth, to enable advanced iterative algorithms to obtain better diagnostic images and accurate patient-positioning information for CBCT in image-guided radiation therapy within a few minutes, two accelerated variants of FISTA for PLS-based image reconstruction are proposed. The acceleration is obtained by replacing the original gradient-descent step with a sub-problem that is solved by use of the ordered-subset concept (OS-SART). In addition, we present efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units (GPUs). Finally, we employ our accelerated version of FISTA to deal with the incomplete (and often noisy) data inherent to in-line XPC tomosynthesis, which combines the concepts of tomosynthesis and in-line XPC imaging to exploit the advantages of both for biological imaging applications. We also investigate the depth-resolution properties of XPC tomosynthesis and demonstrate that its z-resolution is superior to that of conventional absorption-based tomosynthesis. To investigate all of these proposed strategies and new algorithms across the different imaging modalities, we conducted computer-simulation studies and studies with real experimental data. The proposed reconstruction methods will facilitate the clinical or preclinical translation of these emerging imaging methods.
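
    As a reference point for the FISTA-based reconstruction mentioned above, here is a minimal sketch of FISTA for an l1-penalized least-squares problem with a soft-thresholding proximal step. The system matrix, penalty weight, and iteration count are stand-ins; the dissertation's PWLS formulations, TV penalties, and GPU implementations are substantially more involved.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista(A, b, lam, n_iter=200):
    """Minimize 0.5 * ||A x - b||_2^2 + lam * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        x_new = soft_threshold(y - grad / L, lam / L)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum step
        x, t = x_new, t_new
    return x

A = np.random.randn(50, 100)               # stand-in system matrix
x_true = np.zeros(100); x_true[:5] = 1.0   # sparse ground truth
x_hat = fista(A, A @ x_true, lam=0.1)
```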

    Null Element Restoration

    Understanding the syntactic structure of a sentence is a necessary preliminary to understanding its semantics, and therefore to many practical applications. The field of natural language processing has achieved a high degree of accuracy in parsing, at least in English. However, the syntactic structures produced by the most commonly used parsers are less detailed than those found in the treebanks the parsers were trained on. In particular, these parsers typically lack the null elements used to indicate wh-movement, control, and other phenomena. This thesis presents a system for inserting these null elements into parse trees in English. It then examines the problem in Arabic, which motivates a second, joint-inference system that improves performance on English as well. Finally, it examines the application of information derived from the Google Web 1T corpus as a way of reducing certain data-sparsity issues related to wh-movement.
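
    As a toy illustration of what null-element restoration means in practice, the snippet below inserts a wh-movement trace into a parser-style bracketing that lacks it, using NLTK's Tree. The tree and the insertion site are hand-picked for the example; the thesis's system learns where such nodes belong.

```python
from nltk import Tree

# Parser-style output for "what she saw", missing the object trace that a
# treebank tree would contain.
parsed = Tree.fromstring(
    "(SBAR (WHNP-1 (WP what)) (S (NP (PRP she)) (VP (VBD saw))))")

# Restore the null element: the moved object leaves a co-indexed
# (-NONE- *T*-1) trace as the object of the verb.
vp = parsed[1][1]
vp.append(Tree("NP", [Tree("-NONE-", ["*T*-1"])]))
print(parsed)
```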