84 research outputs found

    Towards an automatic speech recognition system for use by deaf students in lectures

    According to the Royal National Institute for Deaf people there are nearly 7.5 million hearing-impaired people in Great Britain. Human-operated machine transcription systems, such as Palantype, achieve low word error rates in real-time. The disadvantage is that they are very expensive to use because of the difficulty in training operators, making them impractical for everyday use in higher education. Existing automatic speech recognition systems also achieve low word error rates, the disadvantage being that they work only for read speech in a restricted domain. Moving a system to a new domain requires a large amount of relevant data for training acoustic and language models. The adopted solution makes use of an existing continuous speech phoneme recognition system as a front-end to a word recognition sub-system. The sub-system generates a lattice of word hypotheses using dynamic programming, with robust parameter estimation obtained using evolutionary programming. Sentence hypotheses are obtained by parsing the word lattice using a beam search and contributing knowledge consisting of anti-grammar rules, which check the syntactic incorrectness of word sequences, and word frequency information. On an unseen spontaneous lecture taken from the Lund Corpus and using a dictionary containing 2,637 words, the system achieved 81.5% words correct with 15% simulated phoneme error, and 73.1% words correct with 25% simulated phoneme error. The system was also evaluated on 113 Wall Street Journal sentences. The achievements of the work are: a domain-independent method, using the anti-grammar, to reduce the word lattice search space whilst allowing normal spontaneous English to be spoken; a system designed to allow integration with new sources of knowledge, such as semantics or prosody, providing a test-bench for determining the impact of different knowledge upon word lattice parsing without the need for the underlying speech recognition hardware; and the robustness of the word lattice generation using parameters that withstand changes in vocabulary and domain.
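To make the lattice-parsing idea concrete, here is a minimal sketch of beam-search parsing over a word lattice in which hypotheses are pruned when they violate an anti-grammar rule. Everything here (the rule format, the scoring scheme, and all names) is an illustrative assumption, not the thesis's actual implementation.

```python
import heapq

# A lattice maps node -> list of (next_node, word, acoustic_score) edges.
# An anti-grammar rule (assumed format) is a tag bigram that should never
# occur in grammatical English, e.g. a determiner directly after a determiner.
ANTI_GRAMMAR = {("DET", "DET"), ("PREP", "PREP")}  # illustrative rules only

def beam_parse(lattice, pos_tags, word_freq, beam_width=10):
    """Search a word lattice for high-scoring sentence hypotheses.

    lattice   : dict mapping node -> list of (next_node, word, score)
    pos_tags  : dict mapping word -> part-of-speech tag
    word_freq : dict mapping word -> log relative frequency
    """
    beam = [(0.0, 0, [])]          # (cost, current node, words so far)
    finished = []
    while beam:
        candidates = []
        for cost, node, words in beam:
            if node not in lattice:             # reached a final node
                finished.append((cost, words))
                continue
            for nxt, word, score in lattice[node]:
                # Anti-grammar check: reject syntactically impossible pairs.
                if words and (pos_tags[words[-1]], pos_tags[word]) in ANTI_GRAMMAR:
                    continue
                # Lower cost is better: subtract acoustic score and
                # word-frequency knowledge (both in the log domain).
                candidates.append((cost - score - word_freq.get(word, -10.0),
                                   nxt, words + [word]))
        beam = heapq.nsmallest(beam_width, candidates)  # prune to beam width
    return [words for _, words in sorted(finished)]

# Tiny example: two paths, one of which trips an anti-grammar rule.
lattice = {0: [(1, "the", -1.0)], 1: [(2, "the", -0.9), (2, "cat", -1.1)]}
pos_tags = {"the": "DET", "cat": "NOUN"}
print(beam_parse(lattice, pos_tags, {"the": -2.0, "cat": -5.0}))
```

Note that the anti-grammar acts purely negatively: rather than requiring word sequences to parse under a full grammar, it only rejects sequences known to be impossible, which is what keeps the method domain-independent.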

    Semantically-enhanced advertisement recommender systems in social networks

    The main objective of this research is to study and design an advertising recommendation environment for social networks that can be enriched with semantic technologies. Although many applications and solutions for recommender systems already exist, this study designs a robust framework with adequate performance to be deployed in social networks in order to extend business goals. From this main objective, the following secondary objectives can be derived: 1. Overcome the initial limitations of the classic recommendation methods. 2. Increase the quality and precision of the recommendations and the performance of the recommender system. 3. Apply the proposed methodology appropriately. 4. Deploy the proposed framework on a real software platform. 5. Treat portability as a key aspect of software systems in the solution. 6. Take the reliability of the framework into account. 7. Provide an acceptable level of security for the framework. First, the limitations of the classic recommendation methods must be overcome. In this work, that objective is achieved through a hybrid method composed of the four basic recommendation methods (collaborative filtering, content-based, demographic and knowledge-based), combining the individual benefits of each. In particular, despite the known problems of collaborative filtering methods, namely data sparsity, scalability and cold start, it remains essential to exploit the advantages of this collaborative recommendation technique. In addition, by adding semantic techniques during the computation of the advertising recommendations, their quality and precision are increased. The semantic technology used in the framework has improved system performance and is a novel point, being one of the main contributions with respect to other similar research. Specifically, to improve the accuracy of the recommendations, the semantics of both the different information items and the customer profiles have been taken into account. Introducing semantics into the prediction provides additional insight into the underlying reasons why a customer might grant access to specific products (something that is understood and covered by the usual strategies without semantic consideration). The semantics used in this study are understood in the form of relations between concepts; as a result, extra knowledge can be extracted from the available items. Another objective of this thesis is to ensure that an appropriate methodology is followed. The research must obtain acceptable results through easy-to-use algorithms and a suitable approach. To this end, a case study is designed, and a Web application capable of computing recommendations for users is then implemented. The development of this Web application has its own difficulties and complexities, but the application is friendly and easy to use: users can easily browse online and work with the applications installed on the Web site. The evaluation of the proposed approach is carried out in this real environment.
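As a rough illustration of the hybrid method described above, the following sketch combines the four basic recommenders through a weighted linear combination. The scorers, weights and interfaces are placeholder assumptions, not the thesis's actual design.

```python
from typing import Callable, Dict

# Each component recommender maps (user_id, ad_id) to a score in [0, 1].
Scorer = Callable[[str, str], float]

def hybrid_score(user: str, ad: str, scorers: Dict[str, Scorer],
                 weights: Dict[str, float]) -> float:
    """Weighted linear combination of the four basic recommenders."""
    return sum(weights[name] * fn(user, ad) for name, fn in scorers.items())

# Stub scorers standing in for the collaborative, content-based,
# demographic and knowledge-based components; a real system would
# query the rating matrix, ad metadata, user demographics and a
# domain knowledge base respectively.
scorers = {
    "collaborative": lambda u, a: 0.8,
    "content":       lambda u, a: 0.6,
    "demographic":   lambda u, a: 0.5,
    "knowledge":     lambda u, a: 0.7,
}
weights = {"collaborative": 0.4, "content": 0.3,
           "demographic": 0.1, "knowledge": 0.2}

print(hybrid_score("user42", "ad17", scorers, weights))  # ≈ 0.69
```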
Accordingly, deploying the framework on a real software platform, in order to test it and observe its performance, is also set as an objective. This objective is very important because, without the possibility of building a prototype (proof of concept) to implement the research idea, it would not be possible to reach an adequate conclusion and achieve the objectives of the study. Thus, before developing the research idea, it was verified whether a software solution could be found for obtaining real results on the implemented framework, allowing the relevant outcome to be observed and thereby ensuring that the initial objectives and requirements of the research, in the form of final results, could be tested. Ensuring portability and reliability is another key aim of this work. In this context, portability refers to the possibility of deploying the framework on the various available platforms, including hardware, software, type of social network and type of advertising. Here, the design of the framework is platform-independent: the framework has been formulated in a general format and is very easy to adapt to the available software and hardware systems. It is even possible to deploy the framework on different operating systems, and there is no limit on the number of installed instances. Reliability, on the other hand, similar to validity, is a method for evaluating the quality of the measurement strategy used to collect information in a study. In short, for the results of a study to be considered substantial, the measurement procedure must be sound. What reliability seeks is that any critical result be more than an isolated finding and therefore be repeatable: different scientists must be able to carry out the same research, under the same conditions, and produce the same results. This strengthens the findings and ensures that the wider academic community accepts the theory. Reliability understood in this way is therefore essential for a theory to become established as accepted experimental fact. In this doctoral thesis, a total of 73 experiments were performed on the Web application, yielding a promising level of reliability. Finally, security is one of the fundamental challenges in social Web applications and constitutes a basic requirement of the framework proposed in this thesis. Security is, in fact, one of the main concerns of all software applications, and deploying the framework on a secure platform is therefore very important. To this end, the security component was included as one of the elements of the framework, composed of two levels: (i) authentication, and (ii) identity verification based on behaviour. Single Sign-On (SSO) authentication allows users to log into the system; in addition, a record of the user's behaviour when interacting with the Web application is kept and compared against the historical record. This second level of security prevents attackers from accessing unauthorized content. The combination of Semantic Web advances with Web 2.0 application design patterns has given rise to the social semantic Web, also presented as Web 3.0.
In line with this idea, a software platform is presented that effectively combines Web 2.0 concepts and Semantic Web technologies. The framework of this study combines a series of semantic-based application modules in a fully fledged social application, with the goal of capturing semantics for the purpose of information retrieval. Once the foundations and main concepts of the framework have been set out and its architecture explained, a comprehensive model of the system is demonstrated. Finally, the results of a case study are validated using standard metrics. It is shown how the system can help obtain semantically enhanced, commercially relevant data from the users of social applications and provide useful suggestions to the advertisement recommender. The capacity for knowledge contribution today is unprecedented: never before have so many creative and skilled people been connected by such an efficient, universal network. The costs of gathering and computing over their contributions have fallen to the point where new organizations with very modest budgets can provide innovative new services to large numbers of online participants. Collective intelligence is a powerful resource that can have many positive effects on social networks. The result today is a remarkable breadth of data and diversity of perspective, and a culture of mass participation that sustains a wellspring of freely accessible content. The Social Web (comprising services such as MySpace, Flickr, last.fm, and WordPress) has captured the attention of millions of users, as well as billions of dollars in investment and acquisitions. Social sites, evolving around the connections between people and their entities of interest, are running into limits in the areas of information integration, dissemination, reuse, portability, searchability, automation and demanding tasks such as querying. The Semantic Web is an ideal tool for interlinking, and performing operations on, the various person- and item-related data available from the Social Web, and has produced a variety of approaches to overcome the limits experienced in Social Web application areas. Recommendation is an effective way to reduce the cost of finding information and a powerful way to engage users. It has been widely used in many e-commerce applications, e.g., Amazon.com, CDNOW.com, eBay.com, Reel.com, and so on. Recently, many techniques have been proposed for recommendation, for instance content-based filtering, collaborative filtering, clustering models, classification models, graph models, and association-rule approaches. These approaches have been applied to conventional Web applications, which usually need to recommend only one type of item (e.g., Amazon recommends books, news.baidu.com recommends news, and movielens.com recommends films). To overcome information overload, recommender systems have become a key tool for providing users with personalized suggestions on items such as films, music, books, news, and web pages. Motivated by the many practical applications, researchers have developed algorithms and systems over the last decade, some of which have been commercialized by online vendors such as Amazon.com, Netflix.com, and IMDb.com.
These systems predict user preferences (often represented as numeric ratings) for new items based on the user's past ratings of other items. There are typically two kinds of algorithms for recommender systems: content-based methods and collaborative filtering. Content-based methods measure the similarity of the recommended item (the target item) to the ones that a target user (i.e., the user who receives recommendations) likes or dislikes, based on item properties. Collaborative filtering, on the other hand, finds users whose tastes are similar to the target user's on the basis of their past ratings, and then makes recommendations to the target user based on the opinions of those similar users. Despite these efforts, recommender systems still face many challenging issues, which place many limitations on their operation. First, improving prediction accuracy can increase user satisfaction, which in turn leads to higher profits for e-commerce sites. Second, algorithms for recommender systems suffer from several problems. For instance, in order to measure item similarity, content-based methods rely on explicit item descriptions, but such descriptions may be hard to obtain for items like ideas or opinions. Compared with the huge number of items in recommender systems, each user typically rates only a few, so the user/item rating matrix is usually extremely sparse, and it is difficult for recommender systems to accurately measure user similarities from such a limited number of reviews. A related issue is the cold-start problem: even in a system that is not particularly sparse, when a user initially joins, the system has no, or perhaps only a few, reviews from this user, and therefore cannot accurately interpret the user's preferences. To tackle these issues, two approaches have been proposed. The first is to condense the user/item rating matrix through dimensionality-reduction techniques such as Singular Value Decomposition (SVD). By clustering users or items according to their latent structure, unrepresentative users or items can be discarded, and the user/item matrix becomes denser. However, these techniques do not substantially improve the performance of recommender systems, and in some cases even degrade it. Following this approach, a kNN method is used in the framework to divide users into two groups, neighbours and non-neighbours, so that the framework considers only those neighbour users whose data is most relevant and similar to the current user. The second approach is to "enrich" the user/item rating matrix by 1) introducing default ratings or implicit user ratings, e.g., the time spent reading articles; 2) using naive rating predictions from content-based techniques; or 3) exploiting transitive relationships among users through their past transactions and feedback. These techniques improve the performance of recommender systems to some degree. In particular, a new paradigm of recommender systems has been proposed that exploits information in social networks, especially social influence.
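The kNN neighbour selection and collaborative prediction described in this passage can be sketched generically as follows. This is a standard kNN formulation with cosine similarity over co-rated items; the function names and the similarity choice are assumptions, not the thesis's exact algorithm.

```python
import numpy as np

def knn_predict(ratings: np.ndarray, user: int, item: int, k: int = 2) -> float:
    """Predict ratings[user, item] from the k most similar users.

    ratings : users x items matrix, where 0 denotes 'not rated'.
    """
    target = ratings[user]
    sims = []
    for other in range(ratings.shape[0]):
        if other == user or ratings[other, item] == 0:
            continue
        mask = (target > 0) & (ratings[other] > 0)   # co-rated items only
        if not mask.any():
            continue
        a, b = target[mask], ratings[other][mask]
        cos = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
        sims.append((cos, float(ratings[other, item])))
    top = sorted(sims, reverse=True)[:k]             # k nearest neighbours
    denom = sum(s for s, _ in top)
    if denom == 0:
        return 0.0   # cold start: no usable neighbour
    return sum(s * r for s, r in top) / denom

R = np.array([[5, 3, 0, 1],
              [4, 0, 4, 1],
              [1, 1, 5, 4],
              [0, 1, 5, 4]])
print(round(knn_predict(R, user=0, item=2), 2))      # ≈ 4.35
```

Restricting the prediction to the k nearest neighbours, as here, is what the text means by considering only the neighbour group and ignoring the rest.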
Traditional recommender systems do not take explicit social relations among users into account, yet the importance of social influence in product marketing has long been recognized. Intuitively, when we want to buy an unfamiliar product, we often consult friends who already have experience with it, since they are the people we can turn to for immediate advice, and when friends recommend a product to us we tend to accept the recommendation because their input is trustworthy. This is one reason that collaborative filtering has been used as one of the components of the recommender system. Furthermore, integrating social networks can in principle improve the performance of current recommender systems. First, in terms of prediction accuracy, the additional information about users and their friends obtained from social networks improves the understanding of user behaviours and ratings, so user preferences can be modelled and interpreted more precisely, improving prediction accuracy. Second, with friend information from social networks, it is no longer necessary to find similar users by measuring rating similarity, because the fact that two people are friends already indicates that they have things in common; the data-sparsity problem can thus be alleviated. Finally, for the cold-start problem, even if a user has no past reviews, the recommender system can still make suggestions based on the preferences of his or her friends, provided it is integrated with social networks. These intuitions and observations motivate the design of a new paradigm of recommender systems that can exploit information in social networks. The recent rise of online social networks (OSNs) gives us an opportunity to examine the role of social influence in recommender systems. With the growing popularity of Web 2.0, many OSNs such as Myspace.com, Facebook.com, and Linkedin.com have emerged. People in those networks have their own personalized space where they not only publish their biographies, hobbies, interests, blogs and so on, but also list their friends, and friends or visitors can visit these personal spaces and leave comments. OSNs provide platforms where people can put themselves on display and maintain connections with friends. As OSNs continue to gain popularity, the unprecedented amount of personal information and social relations advances social-science research that was once constrained by a lack of data. In this research, the role of explicit social relations in recommender systems is an important part of the study: for example, how user preferences or ratings correlate with those of neighbours, and how such correlations can be used to design a better recommender system. Specifically, an algorithmic framework is designed which makes recommendations based on the user's own preferences, the general acceptance of the target item, and the opinions from social networks. Real online social network data from last.fm has been crawled as a case study, and extensive analysis has been performed on this dataset.
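A schematic version of that three-part scoring (the user's own preferences, the general acceptance of the target item, and the opinions from the social network) might look like the sketch below. The weights and the fallback behaviour are invented for illustration and do not reproduce the thesis's actual formulation.

```python
def social_score(user_pref, item_popularity, friend_ratings,
                 alpha=0.5, beta=0.2, gamma=0.3):
    """Blend three knowledge sources into one recommendation score.

    user_pref       : estimate of the user's own interest, in [0, 1]
    item_popularity : general acceptance of the item, in [0, 1]
    friend_ratings  : list of ratings from the user's friends, in [0, 1]
    """
    # Friends' opinions: a simple average; falling back to popularity
    # when the user has no friends with ratings (assumed behaviour).
    friends = (sum(friend_ratings) / len(friend_ratings)
               if friend_ratings else item_popularity)
    return alpha * user_pref + beta * item_popularity + gamma * friends

# A cold-start user: no personal history, so user_pref is a neutral 0.5.
print(social_score(0.5, 0.8, [0.9, 0.7, 1.0]))  # ≈ 0.67
```

Note how the friends' term still produces a usable score for a cold-start user with no rating history, which is precisely the benefit the text attributes to integrating social networks.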
Additionally, the dataset gathered from the social network is used to evaluate the performance of the proposed framework with respect to scalability, data sparsity, and cold start. The experimental results of our framework show significant improvement over traditional collaborative filtering in these respects; for instance, the precision computed after running the case study improved by 0.7498 compared with traditional collaborative filtering. Moreover, it is proposed to use the semantics of user relationships, via their similarities, and finer-grained user ratings to improve prediction accuracy.

    A Review of Deep Learning Techniques for Speech Processing

    The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled the creation of models capable of extracting intricate features from speech data. This development has paved the way for unparalleled advancements in speech recognition, text-to-speech synthesis, automatic speech recognition, and emotion recognition, propelling the performance of these tasks to unprecedented heights. The power of deep learning techniques has opened up new avenues for research and innovation in the field of speech processing, with far-reaching implications for a range of industries and applications. This review paper provides a comprehensive overview of the key deep learning models and their applications in speech-processing tasks. We begin by tracing the evolution of speech processing research, from early approaches, such as MFCC and HMM, to more recent advances in deep learning architectures, such as CNNs, RNNs, transformers, conformers, and diffusion models. We categorize the approaches and compare their strengths and weaknesses for solving speech-processing tasks. Furthermore, we extensively cover various speech-processing tasks, datasets, and benchmarks used in the literature and describe how different deep-learning networks have been utilized to tackle these tasks. Additionally, we discuss the challenges and future directions of deep learning in speech processing, including the need for more parameter-efficient, interpretable models and the potential of deep learning for multimodal speech processing. By examining the field's evolution, comparing and contrasting different approaches, and highlighting future directions and challenges, we hope to inspire further research in this exciting and rapidly advancing field.
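As a concrete anchor for the "early approaches" mentioned in this abstract, the snippet below extracts MFCC features, the classic front-end that fed HMM-based recognizers, using the librosa library. The file path and frame parameters are illustrative; this is a generic example, not code from the paper.

```python
import librosa

# Load an utterance (path is a placeholder) and resample to 16 kHz,
# a common rate for speech models.
y, sr = librosa.load("utterance.wav", sr=16000)

# 13 Mel-frequency cepstral coefficients per frame, with a 25 ms
# window (400 samples) and a 10 ms hop (160 samples).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=400, hop_length=160)
print(mfcc.shape)  # (13, number_of_frames)
```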

    Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme

    Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologies

    Exploring the use of Technology for Assessment and Intensive Treatment of Childhood Apraxia of Speech

    Given the rapid advances in technology over the past decade, this thesis examines the potential for automatic speech recognition (ASR) technology to expedite the process of objective analysis of speech, particularly of lexical stress patterns in childhood apraxia of speech (CAS). This dissertation also investigates the potential for mobile technology to bridge the gap between current service delivery models in Australia and best-practice treatment intensity for CAS. To address these two broad aims, this thesis describes three main projects. The first is a systematic literature review summarising the development, implementation and accuracy of automatic speech analysis tools when applied to the evaluation and modification of children's speech production skills. Guided by the results of the systematic review, the second project presents data on the accuracy and clinical utility of a custom-designed lexical stress classification tool, built as part of a multi-component speech analysis system for a mobile therapy application, Tabby Talks, for use with children with CAS. The third project is a randomised controlled trial exploring the effect of different types of feedback on response to intervention for children with CAS. The intervention was designed specifically to explore the feasibility and effectiveness of using an app equipped with ASR technology to provide feedback on speech production accuracy during home practice sessions, simulating the common service delivery model in Australia. The thesis concludes with a discussion of future directions for technology-based speech assessment and intensive speech production practice, guidelines for the future development of therapy tools that include more game-based practice activities, and the contexts in which children can be transferred from predominantly clinician-delivered augmented feedback to ASR-delivered right/wrong feedback while continuing to make optimal gains in the acquisition and retention of speech production targets.

    Syntax-based machine translation using dependency grammars and discriminative machine learning

    Machine translation has undergone huge improvements since the groundbreaking introduction of statistical methods in the early 2000s, going from very domain-specific systems that still performed relatively poorly despite the painstaking crafting of thousands of ad-hoc rules, to general-purpose systems automatically trained on large collections of bilingual texts which manage to deliver understandable translations that convey the general meaning of the original input. These approaches, however, still perform well below the level of human translators, typically failing to convey detailed meaning and register, and producing translations that, while readable, are often ungrammatical and unidiomatic. This quality gap, which is considerably larger than in most other natural language processing tasks, has been the focus of research in recent years, with the development of increasingly sophisticated models that attempt to exploit the syntactic structure of human languages, leveraging the technology of statistical parsers as well as advanced machine learning methods such as margin-based structured prediction algorithms and neural networks. The translation software itself has become more complex in order to accommodate the sophistication of these advanced models: the main translation engine (the decoder) is now often combined with a pre-processor which reorders the words of the source sentence into a target-language word order, or with a post-processor that ranks and selects a translation according to a fine model from a list of candidate translations generated by a coarse model. In this thesis we investigate the statistical machine translation problem from various angles, focusing on translation from non-analytic languages whose syntax is best described by fluid non-projective dependency grammars rather than the relatively strict phrase-structure grammars or projective dependency grammars most commonly used in the literature. We propose a framework for modeling word reordering phenomena between language pairs as transitions on non-projective source dependency parse graphs. We quantitatively characterize reordering phenomena for the German-to-English language pair as captured by this framework, specifically investigating the incidence and effects of the non-projectivity of source syntax and the non-locality of word movement with respect to the graph structure. We evaluated several variants of hand-coded pre-ordering rules in order to assess the impact of these phenomena on translation quality. We propose a class of dependency-based source pre-ordering approaches that reorder sentences based on flexible models trained with SVMs and several recurrent neural network architectures. We also propose a class of translation reranking models, both syntax-free and source dependency-based, which make use of a type of neural network known as graph echo state networks, which is highly flexible and requires very few training resources, overcoming one of the main limitations of neural network models for natural language processing tasks.
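The pre-ordering idea, rewriting a source sentence into target-language word order by permuting the head and its children at each node of the dependency parse, can be sketched as follows. The exhaustive permutation search and the stub scoring function stand in for the SVM and recurrent-network models the thesis trains; all names here are illustrative.

```python
from itertools import permutations

def preorder(tree, node, score):
    """Recursively linearize a dependency tree into target-like order.

    tree  : dict mapping head word -> list of child words
    node  : current head word
    score : function rating a left-to-right ordering of head + children
    """
    children = tree.get(node, [])
    # Linearize each subtree first, bottom-up.
    sub = {c: preorder(tree, c, score) for c in children}
    best, best_s = None, float("-inf")
    # Try every placement of the head among its children (feasible only
    # for small fan-out; real systems prune or predict the order directly).
    for perm in permutations([node] + children):
        s = score(perm)
        if s > best_s:
            best, best_s = perm, s
    out = []
    for item in best:
        out.extend(sub[item] if item in sub else [item])
    return out

# Toy German-like example: a verb-final clause reordered to verb-second.
tree = {"sah": ["Peter", "Maria", "gestern"]}
score = lambda p: 1.0 if "sah" in p and p.index("sah") == 1 else 0.0  # stub
print(" ".join(preorder(tree, "sah", score)))  # Peter sah Maria gestern
```

Because the permutation is applied per head, non-local movement across the tree, of the kind the thesis quantifies for German-to-English, needs either non-projective transitions or larger reordering windows than this projective toy allows.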

    Advances in Robotics, Automation and Control

    The book presents an excellent overview of recent developments in the different areas of Robotics, Automation and Control. Through its 24 chapters, this book presents topics related to control and robot design; it also introduces new mathematical tools and techniques devoted to improving system modeling and control. An important point is the use of rational agents and heuristic techniques to cope with the computational complexity required for controlling complex systems. Through this book, we also find navigation and vision algorithms, automatic handwriting comprehension and speech recognition systems that will be included in the next generation of productive systems developed by man.

    Investigating the build-up of precedence effect using reflection masking

    The auditory processing level involved in the build-up of precedence [Freyman et al., J. Acoust. Soc. Am. 90, 874–884 (1991)] has been investigated here by employing reflection masked threshold (RMT) techniques. Given that RMT techniques are generally assumed to address lower levels of auditory signal processing, such an approach represents a bottom-up approach to the build-up of precedence. Three conditioner configurations measuring a possible build-up of reflection suppression were compared to the baseline RMT for four reflection delays ranging from 2.5 to 15 ms. No build-up of reflection suppression was observed for any of the conditioner configurations. Build-up of template (a decrease in RMT for two of the conditioners), on the other hand, was found to be delay dependent. For five of six listeners, with reflection delays of 2.5 and 15 ms, RMT decreased relative to the baseline; for 5- and 10-ms delays, no change in threshold was observed. It is concluded that the low-level auditory processing involved in RMT is not sufficient to produce a build-up of reflection suppression. This confirms suggestions that higher-level processing is involved in the build-up of the precedence effect. The observed enhancement of reflection detection (RMT) may contribute to active suppression at higher processing levels.