1,395 research outputs found

    A Detailed Study on Aggregation Methods used in Natural Language Interface to Databases (NLIDB)

    Get PDF
    Historically, databases have been the most crucial issue in the study of information systems, and they constitute an essential part of all information management systems. Since, it complicated due to restricting the number of potential users, particularly non-expert database users who must comprehend the database structure to submit such queries. Natural language interface (NLI), the simplest method to retrieve information, is one possibility for interacting with the database. The transformation of a natural language query into a Structured Query (SQL) in a database is known as a "Natural Language Interface to Database" (NLIDB). This study uses NLIDB to handle the works performed under various aggregations with aggregation functions, a grouping phrase, and a possessing clause. This study carefully examines the numerous systematic aggregation approaches utilized in the NLIDB. This review provides extensive information about the many methods, including query-based, pattern-based, general, keyword-based NLIDB, and grammar-based systems, to extract data for a dissertation from a generic module for use in such systems that support query execution utilizing aggregations

    Deep learning-enabled framework for automatic lens design starting point generation

    Get PDF
    We present a simple, highly modular deep neural network (DNN) framework to address the problem of automatically inferring lens design starting points tailored to the desired specifications. In contrast to previous work, our model can handle various and complex lens structures suitable for real-world problems such as Cooke Triplets or Double Gauss lenses. Our successfully trained dynamic model can infer lens designs with realistic glass materials whose optical performance compares favorably to reference designs from the literature on 80 different lens structures. Using our trained model as a backbone, we make available to the community a web application that outputs a selection of varied, high-quality starting points directly from the desired specifications, which we believe will complement any lens designer’s toolbox

    FullExpression - Emotion Recognition Software

    Get PDF
    During human evolution emotion expression became an important social tool that contributed to the complexification of societies. Human-computer interaction is commonly present in our daily life, and the industry is struggling for solutions that can analyze human emotions, in an attempt to provide better experiences. The purpose of this study was to understand if a software built using the transfer-learning technique on a deep learning model was capable of classifying human emotions, through facial expression analysis. A Convolutional Neuronal Network model was trained and used in a web application, which is available online. Several tools were created to facilitate the software development process, including the training and validation processes, and these are also available online. The data was collected after the combination of several facial expression emotion databases, such as KDEF_AKDEF, TFEID, Face_Place and jaffe. Software evaluation reveled an accuracy in identifying the correct emotions close to 80%. In addition, a comparison between the software and preliminary data from human’s performance, on recognizing facial expressed emotions, suggested that the software performed better. This work can be useful in many different domains such as marketing (to understand the effect of marketing campaigns on people’s emotional states), health (to help mental diseases diagnosis) and industry 4.0 (to create a better collaborating environment between humans and machines).Durante a evolução da espécie humana, a expressões de emoções tornou-se uma ferramenta social importante, que permitiu a criação de sociedades cada vez mais complexas. A interação entre humanos e máquinas acontece regularmente, evidenciando a necessidade da indústria desenvolver soluções que possam analisar emoções, de modo a proporcionar melhores experiências aos utilizadores. O propósito deste trabalho foi perceber se soluções de software desenvolvidas a partir da técnica de transfer-learning são capazes de classificar emoções humanas, a partir da análise de expressões faciais. Um modelo que implementa a arquitetura Convolutional Neuronal Network foi escolhido para ser treinado e utilizado na aplicação web desenvolvida neste trabalho, que está disponível online. A par da aplicação web, diferentes ferramentas foram criadas de forma a facilitar o processo de criação e avaliação de modelos Deep Learning, e estas também estão disponíveis online. Os dados foram recolhidos após a combinação de várias bases de dados de expressões de emoções (KDEF_AKDEF, TFEID, Face_Place and jaffe). A avaliação do software demostrou uma precisão na classificação de emoções próxima dos 80%. Para além disso, uma comparação entre o software e dados preliminares relativos ao reconhecimento de emoções por pessoas sugere que o software é melhor a classificar emoções. Os resultados deste trabalho podem aplicados em diversas áreas, como a publicidade (de forma a perceber os efeitos das campanhas no estado emocional das pessoas), a saúde (para um melhor diagnóstico de doenças mentais) e na indústria 4.0 (de forma a criar um melhor ambiente de colaboração entre humanos e máquinas)

    Efficient, end-to-end and self-supervised methods for speech processing and generation

    Get PDF
    Deep learning has affected the speech processing and generation fields in many directions. First, end-to-end architectures allow the direct injection and synthesis of waveform samples. Secondly, the exploration of efficient solutions allow to implement these systems in computationally restricted environments, like smartphones. Finally, the latest trends exploit audio-visual data with least supervision. In this thesis these three directions are explored. Firstly, we propose the use of recent pseudo-recurrent structures, like self-attention models and quasi-recurrent networks, to build acoustic models for text-to-speech. The proposed system, QLAD, turns out to synthesize faster on CPU and GPU than its recurrent counterpart whilst preserving the good synthesis quality level, which is competitive with state of the art vocoder-based models. Then, a generative adversarial network is proposed for speech enhancement, named SEGAN. This model works as a speech-to-speech conversion system in time-domain, where a single inference operation is needed for all samples to operate through a fully convolutional structure. This implies an increment in modeling efficiency with respect to other existing models, which are auto-regressive and also work in time-domain. SEGAN achieves prominent results in noise supression and preservation of speech naturalness and intelligibility when compared to the other classic and deep regression based systems. We also show that SEGAN is efficient in transferring its operations to new languages and noises. A SEGAN trained for English performs similarly to this language on Catalan and Korean with only 24 seconds of adaptation data. Finally, we unveil the generative capacity of the model to recover signals from several distortions. We hence propose the concept of generalized speech enhancement. First, the model proofs to be effective to recover voiced speech from whispered one. Then the model is scaled up to solve other distortions that require a recomposition of damaged parts of the signal, like extending the bandwidth or recovering lost temporal sections, among others. The model improves by including additional acoustic losses in a multi-task setup to impose a relevant perceptual weighting on the generated result. Moreover, a two-step training schedule is also proposed to stabilize the adversarial training after the addition of such losses, and both components boost SEGAN's performance across distortions.Finally, we propose a problem-agnostic speech encoder, named PASE, together with the framework to train it. PASE is a fully convolutional network that yields compact representations from speech waveforms. These representations contain abstract information like the speaker identity, the prosodic features or the spoken contents. A self-supervised framework is also proposed to train this encoder, which suposes a new step towards unsupervised learning for speech processing. Once the encoder is trained, it can be exported to solve different tasks that require speech as input. We first explore the performance of PASE codes to solve speaker recognition, emotion recognition and speech recognition. PASE works competitively well compared to well-designed classic features in these tasks, specially after some supervised adaptation. Finally, PASE also provides good descriptors of identity for multi-speaker modeling in text-to-speech, which is advantageous to model novel identities without retraining the model.L'aprenentatge profund ha afectat els camps de processament i generació de la parla en vàries direccions. Primer, les arquitectures fi-a-fi permeten la injecció i síntesi de mostres temporals directament. D'altra banda, amb l'exploració de solucions eficients permet l'aplicació d'aquests sistemes en entorns de computació restringida, com els telèfons intel·ligents. Finalment, les darreres tendències exploren les dades d'àudio i veu per derivar-ne representacions amb la mínima supervisió. En aquesta tesi precisament s'exploren aquestes tres direccions. Primer de tot, es proposa l'ús d'estructures pseudo-recurrents recents, com els models d’auto atenció i les xarxes quasi-recurrents, per a construir models acústics text-a-veu. Així, el sistema QLAD proposat en aquest treball sintetitza més ràpid en CPU i GPU que el seu homòleg recurrent, preservant el mateix nivell de qualitat de síntesi, competitiu amb l'estat de l'art en models basats en vocoder. A continuació es proposa un model de xarxa adversària generativa per a millora de veu, anomenat SEGAN. Aquest model fa conversions de veu-a-veu en temps amb una sola operació d'inferència sobre una estructura purament convolucional. Això implica un increment en l'eficiència respecte altres models existents auto regressius i que també treballen en el domini temporal. La SEGAN aconsegueix resultats prominents d'extracció de soroll i preservació de la naturalitat i la intel·ligibilitat de la veu comparat amb altres sistemes clàssics i models regressius basats en xarxes neuronals profundes en espectre. També es demostra que la SEGAN és eficient transferint les seves operacions a nous llenguatges i sorolls. Així, un model SEGAN entrenat en Anglès aconsegueix un rendiment comparable a aquesta llengua quan el transferim al català o al coreà amb només 24 segons de dades d'adaptació. Finalment, explorem l'ús de tota la capacitat generativa del model i l’apliquem a recuperació de senyals de veu malmeses per vàries distorsions severes. Això ho anomenem millora de la parla generalitzada. Primer, el model demostra ser efectiu per a la tasca de recuperació de senyal sonoritzat a partir de senyal xiuxiuejat. Posteriorment, el model escala a poder resoldre altres distorsions que requereixen una reconstrucció de parts del senyal que s’han malmès, com extensió d’ample de banda i recuperació de seccions temporals perdudes, entre d’altres. En aquesta última aplicació del model, el fet d’incloure funcions de pèrdua acústicament rellevants incrementa la naturalitat del resultat final, en una estructura multi-tasca que prediu característiques acústiques a la sortida de la xarxa discriminadora de la nostra GAN. També es proposa fer un entrenament en dues etapes del sistema SEGAN, el qual mostra un increment significatiu de l’equilibri en la sinèrgia adversària i la qualitat generada finalment després d’afegir les funcions acústiques. Finalment, proposem un codificador de veu agnòstic al problema, anomenat PASE, juntament amb el conjunt d’eines per entrenar-lo. El PASE és un sistema purament convolucional que crea representacions compactes de trames de veu. Aquestes representacions contenen informació abstracta com identitat del parlant, les característiques prosòdiques i els continguts lingüístics. També es proposa un entorn auto-supervisat multi-tasca per tal d’entrenar aquest sistema, el qual suposa un avenç en el terreny de l’aprenentatge no supervisat en l’àmbit del processament de la parla. Una vegada el codificador esta entrenat, es pot exportar per a solventar diferents tasques que requereixin tenir senyals de veu a l’entrada. Primer explorem el rendiment d’aquest codificador per a solventar tasques de reconeixement del parlant, de l’emoció i de la parla, mostrant-se efectiu especialment si s’ajusta la representació de manera supervisada amb un conjunt de dades d’adaptació.Postprint (published version
    • …
    corecore