6,281 research outputs found

    Deep learning for speech to text transcription for the portuguese language

    Get PDF
    Automatic speech recognition (ASR) is the process of transcribing audio recordings into text, i.e. to transform speech into the respective sequence of words. This process is also commonly known as speechto- text. Machine learning (ML), the ability of machines to learn from examples, is one of the most relevant areas of artificial intelligence in today’s world. Deep learning is a subset of ML which makes use of Deep Neural Networks, a particular type of Artificial Neural Networks (ANNs), which are intended to mimic human neurons, that possess a large number of layers. This dissertation reviews the state-of-the-art on automatic speech recognition throughout time, from early systems which used Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs) to the most up-to-date end-to-end (E2E) deep neural models. Considering the context of the present work, some deep learning algorithms used in state-of-the-art approaches are explained in additional detail. The current work aims to develop an ASR system for the European Portuguese language using deep learning. This is achieved by implementing a pipeline composed of stages responsible for data acquisition, data analysis, data pre-processing, model creation and evaluation of results. With the NVIDIA NeMo framework was possible to implement the QuartzNet15x5 architecture based on 1D time-channel separable convolutions. Following a data-centric methodology, the model developed yielded state-of-the-art Word Error Rate (WER) results of WER = 0.0503; Sumário: Aprendizagem profunda para transcrição de fala para texto para a Língua Portuguesa - O reconhecimento automático de fala (ASR) é o processo de transcrever gravações de áudio em texto, i.e., transformar a fala na respectiva sequência de palavras. Esse processo também é comumente conhecido como speech-to-text. A aprendizagem de máquina (ML), a capacidade das máquinas de aprenderem através de exemplos, é um dos campos mais relevantes da inteligência artificial no mundo atual. Deep learning é um subconjunto de ML que faz uso de Redes Neurais Profundas, um tipo particular de Redes Neurais Artificiais (ANNs), que se destinam a imitar neurónios humanos, que possuem um grande número de camadas Esta dissertação faz uma revisão ao estado da arte do reconhecimento automático de fala ao longo do tempo, desde os primeiros sistemas que usavam Hidden Markov Models (HMMs) e Gaussian Mixture Models (GMMs até sistemas end-to-end (E2E) mais recentes que usam modelos neuronais profundos. Considerando o contexto do presente trabalho, alguns algoritmos de aprendizagem profunda usados em abordagens de ponta são explicados mais detalhadamente. O presente trabalho tem como objetivo desenvolver um sistema ASR para a língua portuguesa europeia utilizando deep learning. Isso é conseguido por meio da implementação de um pipeline composto por etapas responsáveis pela aquisição de dados, análise dos dados, pré-processamento dos dados, criação do modelo e avaliação dos resultados. Com o framework NVIDIA NeMo foi possível implementar a arquitetura QuartzNet15x5 baseada em convoluções 1D separáveis por canal de tempo. Seguindo uma metodologia centrada em dados, o modelo desenvolvido produziu resultados de taxa de erro de palavra (WER) semelhantes aos de estado da arte de WER = 0.0503

    Examples of works to practice staccato technique in clarinet instrument

    Get PDF
    Klarnetin staccato tekniğini güçlendirme aşamaları eser çalışmalarıyla uygulanmıştır. Staccato geçişlerini hızlandıracak ritim ve nüans çalışmalarına yer verilmiştir. Çalışmanın en önemli amacı sadece staccato çalışması değil parmak-dilin eş zamanlı uyumunun hassasiyeti üzerinde de durulmasıdır. Staccato çalışmalarını daha verimli hale getirmek için eser çalışmasının içinde etüt çalışmasına da yer verilmiştir. Çalışmaların üzerinde titizlikle durulması staccato çalışmasının ilham verici etkisi ile müzikal kimliğe yeni bir boyut kazandırmıştır. Sekiz özgün eser çalışmasının her aşaması anlatılmıştır. Her aşamanın bir sonraki performans ve tekniği güçlendirmesi esas alınmıştır. Bu çalışmada staccato tekniğinin hangi alanlarda kullanıldığı, nasıl sonuçlar elde edildiği bilgisine yer verilmiştir. Notaların parmak ve dil uyumu ile nasıl şekilleneceği ve nasıl bir çalışma disiplini içinde gerçekleşeceği planlanmıştır. Kamış-nota-diyafram-parmak-dil-nüans ve disiplin kavramlarının staccato tekniğinde ayrılmaz bir bütün olduğu saptanmıştır. Araştırmada literatür taraması yapılarak staccato ile ilgili çalışmalar taranmıştır. Tarama sonucunda klarnet tekniğin de kullanılan staccato eser çalışmasının az olduğu tespit edilmiştir. Metot taramasında da etüt çalışmasının daha çok olduğu saptanmıştır. Böylelikle klarnetin staccato tekniğini hızlandırma ve güçlendirme çalışmaları sunulmuştur. Staccato etüt çalışmaları yapılırken, araya eser çalışmasının girmesi beyni rahatlattığı ve istekliliği daha arttırdığı gözlemlenmiştir. Staccato çalışmasını yaparken doğru bir kamış seçimi üzerinde de durulmuştur. Staccato tekniğini doğru çalışmak için doğru bir kamışın dil hızını arttırdığı saptanmıştır. Doğru bir kamış seçimi kamıştan rahat ses çıkmasına bağlıdır. Kamış, dil atma gücünü vermiyorsa daha doğru bir kamış seçiminin yapılması gerekliliği vurgulanmıştır. Staccato çalışmalarında baştan sona bir eseri yorumlamak zor olabilir. Bu açıdan çalışma, verilen müzikal nüanslara uymanın, dil atış performansını rahatlattığını ortaya koymuştur. Gelecek nesillere edinilen bilgi ve birikimlerin aktarılması ve geliştirici olması teşvik edilmiştir. Çıkacak eserlerin nasıl çözüleceği, staccato tekniğinin nasıl üstesinden gelinebileceği anlatılmıştır. Staccato tekniğinin daha kısa sürede çözüme kavuşturulması amaç edinilmiştir. Parmakların yerlerini öğrettiğimiz kadar belleğimize de çalışmaların kaydedilmesi önemlidir. Gösterilen azmin ve sabrın sonucu olarak ortaya çıkan yapıt başarıyı daha da yukarı seviyelere çıkaracaktır

    A Decision Support System for Economic Viability and Environmental Impact Assessment of Vertical Farms

    Get PDF
    Vertical farming (VF) is the practice of growing crops or animals using the vertical dimension via multi-tier racks or vertically inclined surfaces. In this thesis, I focus on the emerging industry of plant-specific VF. Vertical plant farming (VPF) is a promising and relatively novel practice that can be conducted in buildings with environmental control and artificial lighting. However, the nascent sector has experienced challenges in economic viability, standardisation, and environmental sustainability. Practitioners and academics call for a comprehensive financial analysis of VPF, but efforts are stifled by a lack of valid and available data. A review of economic estimation and horticultural software identifies a need for a decision support system (DSS) that facilitates risk-empowered business planning for vertical farmers. This thesis proposes an open-source DSS framework to evaluate business sustainability through financial risk and environmental impact assessments. Data from the literature, alongside lessons learned from industry practitioners, would be centralised in the proposed DSS using imprecise data techniques. These techniques have been applied in engineering but are seldom used in financial forecasting. This could benefit complex sectors which only have scarce data to predict business viability. To begin the execution of the DSS framework, VPF practitioners were interviewed using a mixed-methods approach. Learnings from over 19 shuttered and operational VPF projects provide insights into the barriers inhibiting scalability and identifying risks to form a risk taxonomy. Labour was the most commonly reported top challenge. Therefore, research was conducted to explore lean principles to improve productivity. A probabilistic model representing a spectrum of variables and their associated uncertainty was built according to the DSS framework to evaluate the financial risk for VF projects. This enabled flexible computation without precise production or financial data to improve economic estimation accuracy. The model assessed two VPF cases (one in the UK and another in Japan), demonstrating the first risk and uncertainty quantification of VPF business models in the literature. The results highlighted measures to improve economic viability and the viability of the UK and Japan case. The environmental impact assessment model was developed, allowing VPF operators to evaluate their carbon footprint compared to traditional agriculture using life-cycle assessment. I explore strategies for net-zero carbon production through sensitivity analysis. Renewable energies, especially solar, geothermal, and tidal power, show promise for reducing the carbon emissions of indoor VPF. Results show that renewably-powered VPF can reduce carbon emissions compared to field-based agriculture when considering the land-use change. The drivers for DSS adoption have been researched, showing a pathway of compliance and design thinking to overcome the ‘problem of implementation’ and enable commercialisation. Further work is suggested to standardise VF equipment, collect benchmarking data, and characterise risks. This work will reduce risk and uncertainty and accelerate the sector’s emergence

    Acoustic modelling, data augmentation and feature extraction for in-pipe machine learning applications

    Get PDF
    Gathering measurements from infrastructure, private premises, and harsh environments can be difficult and expensive. From this perspective, the development of new machine learning algorithms is strongly affected by the availability of training and test data. We focus on audio archives for in-pipe events. Although several examples of pipe-related applications can be found in the literature, datasets of audio/vibration recordings are much scarcer, and the only references found relate to leakage detection and characterisation. Therefore, this work proposes a methodology to relieve the burden of data collection for acoustic events in deployed pipes. The aim is to maximise the yield of small sets of real recordings and demonstrate how to extract effective features for machine learning. The methodology developed requires the preliminary creation of a soundbank of audio samples gathered with simple weak annotations. For practical reasons, the case study is given by a range of appliances, fittings, and fixtures connected to pipes in domestic environments. The source recordings are low-reverberated audio signals enhanced through a bespoke spectral filter and containing the desired audio fingerprints. The soundbank is then processed to create an arbitrary number of synthetic augmented observations. The data augmentation improves the quality and the quantity of the metadata and automatically creates strong and accurate annotations that are both machine and human-readable. Besides, the implemented processing chain allows precise control of properties such as signal-to-noise ratio, duration of the events, and the number of overlapping events. The inter-class variability is expanded by recombining source audio blocks and adding simulated artificial reverberation obtained through an acoustic model developed for the purpose. Finally, the dataset is synthesised to guarantee separability and balance. A few signal representations are optimised to maximise the classification performance, and the results are reported as a benchmark for future developments. The contribution to the existing knowledge concerns several aspects of the processing chain implemented. A novel quasi-analytic acoustic model is introduced to simulate in-pipe reverberations, adopting a three-layer architecture particularly convenient for batch processing. The first layer includes two algorithms: one for the numerical calculation of the axial wavenumbers and one for the separation of the modes. The latter, in particular, provides a workaround for a problem not explicitly treated in the literature and related to the modal non-orthogonality given by the solid-liquid interface in the analysed domain. A set of results for different waveguides is reported to compare the dispersive behaviour against different mechanical configurations. Two more novel solutions are also included in the second layer of the model and concern the integration of the acoustic sources. Specifically, the amplitudes of the non-orthogonal modal potentials are obtained using either a distance minimisation objective function or by solving an analytical decoupling problem. In both cases, results show that sources sufficiently smooth can be approximated with a limited number of modes keeping the error below 1%. The last layer proposes a bespoke approach for the integration of the acoustic model into the synthesiser as a reverberation simulator. Additional elements of novelty relate to the other blocks of the audio synthesiser. The statistical spectral filter, for instance, is a batch-processing solution for the attenuation of the background noise of the source recordings. The signal-to-noise ratio analysis for both moderate and high noise levels indicates a clear improvement of several decibels against the closest filter example in the literature. The recombination of the audio blocks and the system of fully tracked annotations are also novel extensions of similar approaches recently adopted in other contexts. Moreover, a bespoke synthesis strategy is proposed to guarantee separable and balanced datasets. The last contribution concerns the extraction of convenient sets of audio features. Elements of novelty are introduced for the optimisation of the filter banks of the mel-frequency cepstral coefficients and the scattering wavelet transform. In particular, compared to the respective standard definitions, the average F-score performance of the optimised features is roughly 6% higher in the first case and 2.5% higher for the latter. Finally, the soundbank, the synthetic dataset, and the fundamental blocks of the software library developed are publicly available for further research

    Binaural virtual auditory display for music discovery and recommendation

    Get PDF
    Emerging patterns in audio consumption present renewed opportunity for searching or navigating music via spatial audio interfaces. This thesis examines the potential benefits and considerations for using binaural audio as the sole or principal output interface in a music browsing system. Three areas of enquiry are addressed. Specific advantages and constraints in spatial display of music tracks are explored in preliminary work. A voice-led binaural music discovery prototype is shown to offer a contrasting interactive experience compared to a mono smartspeaker. Results suggest that touch or gestural interaction may be more conducive input modes in the former case. The limit of three binaurally spatialised streams is identified from separate data as a usability threshold for simultaneous presentation of tracks, with no evident advantages derived from visual prompts to aid source discrimination or localisation. The challenge of implementing personalised binaural rendering for end-users of a mobile system is addressed in detail. A custom framework for assessing head-related transfer function (HRTF) selection is applied to data from an approach using 2D rendering on a personal computer. That HRTF selection method is developed to encompass 3D rendering on a mobile device. Evaluation against the same criteria shows encouraging results in reliability, validity, usability and efficiency. Computational analysis of a novel approach for low-cost, real-time, head-tracked binaural rendering demonstrates measurable advantages compared to first order virtual Ambisonics. Further perceptual evaluation establishes working parameters for interactive auditory display use cases. In summation, the renderer and identified tolerances are deployed with a method for synthesised, parametric 3D reverberation (developed through related research) in a final prototype for mobile immersive playlist editing. Task-oriented comparison with a graphical interface reveals high levels of usability and engagement, plus some evidence of enhanced flow state when using the eyes-free binaural system

    The determinants of value addition: a crtitical analysis of global software engineering industry in Sri Lanka

    Get PDF
    It was evident through the literature that the perceived value delivery of the global software engineering industry is low due to various facts. Therefore, this research concerns global software product companies in Sri Lanka to explore the software engineering methods and practices in increasing the value addition. The overall aim of the study is to identify the key determinants for value addition in the global software engineering industry and critically evaluate the impact of them for the software product companies to help maximise the value addition to ultimately assure the sustainability of the industry. An exploratory research approach was used initially since findings would emerge while the study unfolds. Mixed method was employed as the literature itself was inadequate to investigate the problem effectively to formulate the research framework. Twenty-three face-to-face online interviews were conducted with the subject matter experts covering all the disciplines from the targeted organisations which was combined with the literature findings as well as the outcomes of the market research outcomes conducted by both government and nongovernment institutes. Data from the interviews were analysed using NVivo 12. The findings of the existing literature were verified through the exploratory study and the outcomes were used to formulate the questionnaire for the public survey. 371 responses were considered after cleansing the total responses received for the data analysis through SPSS 21 with alpha level 0.05. Internal consistency test was done before the descriptive analysis. After assuring the reliability of the dataset, the correlation test, multiple regression test and analysis of variance (ANOVA) test were carried out to fulfil the requirements of meeting the research objectives. Five determinants for value addition were identified along with the key themes for each area. They are staffing, delivery process, use of tools, governance, and technology infrastructure. The cross-functional and self-organised teams built around the value streams, employing a properly interconnected software delivery process with the right governance in the delivery pipelines, selection of tools and providing the right infrastructure increases the value delivery. Moreover, the constraints for value addition are poor interconnection in the internal processes, rigid functional hierarchies, inaccurate selections and uses of tools, inflexible team arrangements and inadequate focus for the technology infrastructure. The findings add to the existing body of knowledge on increasing the value addition by employing effective processes, practices and tools and the impacts of inaccurate applications the same in the global software engineering industry

    Bibliographic Control in the Digital Ecosystem

    Get PDF
    With the contributions of international experts, the book aims to explore the new boundaries of universal bibliographic control. Bibliographic control is radically changing because the bibliographic universe is radically changing: resources, agents, technologies, standards and practices. Among the main topics addressed: library cooperation networks; legal deposit; national bibliographies; new tools and standards (IFLA LRM, RDA, BIBFRAME); authority control and new alliances (Wikidata, Wikibase, Identifiers); new ways of indexing resources (artificial intelligence); institutional repositories; new book supply chain; “discoverability” in the IIIF digital ecosystem; role of thesauri and ontologies in the digital ecosystem; bibliographic control and search engines

    Individual differences in early instructed language learning

    Get PDF
    Variability in predispositions for language learning has attracted scholarly curiosity for over 100 years. Despite major changes in theoretical explanations and foreign/second language teaching paradigms, some patterns of associations between predispositions and learning outcomes seem timelessly robust. This book discusses evidence from a research project investigating individual differences in a wide variety of domains, ranging from language aptitude over general cognitive abilities to motivational and other affective and social constructs. The focus lies on young learners aged 10 to 12, a less frequently investigated age in aptitude research. The data stem from two samples of multilingual learners in German-speaking Switzerland. The target languages are French and English. The chapters of the book offer two complementary perspectives on the topic: On the one hand, cross-sectional investigations of the underlying structure of these individual differences and their association with the target languages are discussed. Drawing on factor analytical and multivariable analyses, the different components are scrutinized with respect to their mutual dependence and their relative impact on target language skills. The analyses also take into account contextual factors such as the learners’ family background and differences across the two contexts investigated. On the other hand, the potential to predict learner’s skills in the target language over time based on the many different indicators is investigated using machine learning algorithms. The results provide new insights into the stability of the individual dispositions, on the impact of contextual variables, and on empirically robust dimensions within the array of variables tested

    The Multifaceted Nature of Food and Nutrition Insecurity around the World and Foodservice Business

    Get PDF
    The international concept of food security is a situation where all people have physical, social, and economic access at all times to sufficient, safe, and nutritious food that meets their dietary needs and food preferences for an active and healthy life. All four parameters (availability, access, utilization, and stability) should therefore be measured to determine food security status.Taking into account these premises, this book aims to present original research articles and reviews concerning the following: Agriculture and food security; Agri-tourism and its potential to assist with food security; Business–science cooperation to advance food security; Competing demands and tradeoffs for land and water resources; Consumer behavior, nutritional security and food assistance programs; Food and health; Global and local analyses of food security and its drivers; Global governance and food security; Infectious and non-infectious diseases and food security; Reducing food loss and waste; Reducing risks to food production and distribution from climate change; Supply chains and food security; Technological breakthroughs to help feed the globe; Tourism food security relationship; Urbanization, food value chains, and the sustainable, secure sourcing of food; Food and service quality at food catering establishments; Consumer behavior at foodservice operations (restaurants, cafés, hotels)
    corecore