315 research outputs found

    Dynamic language modeling for European Portuguese

    Doctoral thesis in Informatics Engineering (Doutoramento em Engenharia Informática). Most of today's methods for transcribing and indexing broadcast audio data are manual. Broadcasters process thousands of hours of audio and video data every day in order to transcribe it, extract semantic information, and interpret, summarize, index, and selectively disseminate its content. Developing automatic and efficient support for these manual tasks has been a great challenge, and over the last decade there has been growing interest in using automatic speech recognition to provide automatic transcription and indexing of broadcast news, as well as random and relevance-based access to large broadcast news databases. However, because topics change constantly in this kind of task, new events lead to high out-of-vocabulary (OOV) word rates, that is, many words missing from the recognizer's finite vocabulary, and consequently degrade recognition performance. This is especially true for highly inflected languages such as European Portuguese. Several techniques can be exploited to reduce those errors: information specific to each news show, such as topic-based lexicons and the scripts prepared in advance by the anchor and other journalists, and other sources, such as the written news published daily on the Internet, can be added to the information sources used by the automatic speech recognizer.
This thesis explores the use of additional sources of information for vocabulary optimization and language model adaptation in a European Portuguese broadcast news transcription system. It makes three main contributions: a novel approach to vocabulary selection that uses part-of-speech (POS) tags to compensate for word usage differences across the various training corpora; daily, unsupervised language model adaptation frameworks for single-stage and multistage recognition approaches; and a new method for including new words in the system vocabulary without additional adaptation data or language model retraining.
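As a small illustration of the OOV rate mentioned in the abstract above, the following sketch computes the fraction of running words in a transcript that are missing from a recognizer vocabulary. The function name, the toy sentence, and the toy vocabulary are hypothetical and are not taken from the thesis.

```python
def oov_rate(tokens, vocabulary):
    """Return the fraction of running tokens that are not in the vocabulary."""
    if not tokens:
        return 0.0
    vocab = set(vocabulary)
    # Count every occurrence (token), not just distinct word types.
    misses = sum(1 for tok in tokens if tok not in vocab)
    return misses / len(tokens)


# Hypothetical example: two of the eight tokens are outside the vocabulary.
transcript = "o governo anunciou novas medidas para o sector".split()
vocab = {"o", "governo", "anunciou", "medidas", "para"}
print(f"OOV rate: {oov_rate(transcript, vocab):.2%}")
```

Adding words such as "novas" and "sector" to the vocabulary lowers this rate, which is the effect that vocabulary selection and new-word inclusion aim for.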

    Teacher's use of exemplification and explanations in mediating the object of learning

    A research report submitted to the Faculty of Science, University of the Witwatersrand, in partial fulfilment of the requirements for the degree of Master of Science. November 2017. This study examined teachers' use of exemplification and explanations in the teaching of algebraic expressions. In particular, the focus was on the selection and sequencing of examples, as well as on what a teacher does with these examples in their explanations to maintain the focus, or object, of learning. Adler and Ronda's (2015) Mathematical Discourse in Instruction (MDI) framework was the foundation of this study's conceptual and analytical framework and was complemented with the work of Stein (2000) and Moschovich (1999, 2015). Data were collected from two Grade 8 teachers through video recordings and transcripts. The data were then analysed according to the themes that emerged from the conceptual framework. The findings revealed that the examples themselves had the potential to restrict the object of learning and, together with the teacher's corresponding explanatory talk, could reduce or shift the object of learning from translating algebraic expressions to focusing on procedures. The findings show how each component of MDI worked separately and then together to mediate the object of learning, and the study additionally highlights how the components themselves, namely exemplification and explanatory talk, have a direct effect on each other.

    What is a Domain?: Understanding the domain term in Multi Domain Operations

    There is an ongoing debate about how to carry out Multi-Domain Operations (MDO), how best to prepare a force for them, and how to implement the concept. However, neither the domain term itself nor MDO as a resulting construct is uniformly understood among the participants in the debate. To contribute to this debate, this thesis poses the following three research questions. 1. How is the term Domain understood in the current MDO debate? a. What, if any, variation has there been in the understanding of the Domain concept since MDO was launched as a concept in 2016/17? 2. Is there a correlation between the understanding of the Domain term and the understanding of the MDO concept? Is there a causal link between the understanding of the term and the understanding of the concept? If so, which is the dependent and which is the independent variable? 3. How can these relations, and their attendant understandings of MDO, affect small and medium states, with Norway as an example? By conducting a literature review of openly available government publications, primarily from the US, UK, NATO, and Norway, combined with various articles and contributions to the debate, a picture of the different interpretations and their implications can be painted. It is quickly apparent that the domain term is far from uniformly understood, but the origins and implications of the various interpretations are less clear. In addition, the variation does not appear to follow any set geographical, organisational, or chronological lines within the timeframe addressed, primarily from 2016 up until today. Furthermore, variations in the understanding or usage of the domain term appear to correlate with the understanding of MDO. However, the direction of a possible causal link, or whether both depend on another variable, cannot be ascertained with any degree of confidence from the available source material.
This would seem to indicate that there may be several different relationships between the variables, depending on the writer in question. Given this apparently fractured understanding of both the domain term and MDO as a concept, navigating the security environment becomes challenging for small states such as Norway. This is primarily because the various interpretations lead to significantly different adaptations of one's posture and policy, and several of these adaptations may be mutually exclusive. In total, this shows that the apparent lack of a common understanding of the terms and concepts discussed in this debate contributes to potential misunderstandings in aligning force structure, doctrine, and general security policy. Given the potential cost of investing in the "wrong" structure or approach, this ambiguity is especially troubling for smaller states such as Norway, which have relatively limited capacity to pursue multiple tracks of development and force structure. However, given the varying points of view and perspectives involved, a unitary understanding of the term and the concept is unlikely. It is therefore imperative to be mindful of the perspectives of the various participants when trying to make sense of the debate.

    Consortium Proposal NFDI-MatWerk

    This is the official proposal that the NFDI consortium NFDI-MatWerk submitted to the DFG as part of its funding request for the project. Visit www.dfg.de/nfdi for more information on the German National Research Data Infrastructure (Nationale Forschungsdateninfrastruktur - NFDI) initiative. Visit www.nfdi-matwerk.de for the latest information about the NFDI-MatWerk project.

    Model morphisms (MoMo) to enable language independent information models and interoperable business networks

    MSc dissertation presented at the Faculdade de Ciências e Tecnologia of the Universidade Nova de Lisboa to obtain the Master's degree in Electrical and Computer Engineering. With the advent of globalisation, opportunities for collaboration became more evident, with the effect of enlarging business networks. In such conditions, a key to enterprise success is reliable communication with all partners. Therefore, organisations have been searching for flexible integrated environments to better manage their services and product life cycles, in which their software applications can be easily integrated independently of the platform in use. However, with so many different information models and implementation standards in use, interoperability problems arise. Moreover, organisations are themselves at different levels of technological maturity, and a solution that is good for one may be too advanced for another, or vice versa. This dissertation responds to these needs by proposing a high-level meta-model to be used across the entire business network, which abstracts individual models from their specificities and increases language independence and interoperability while keeping the integrity of all the enterprise's legacy software intact. The strategy presented allows mappings to be constructed incrementally, to achieve gradual integration. To accomplish this, the author proposes Model Driven Architecture (MDA) based technologies for developing traceable transformations and executing automatic Model Morphisms.

    Models and Methods for Network Selection and Balancing in Heterogeneous Scenarios

    The emergence of 5G technologies for wireless communications can be considered a response to the need for widespread coverage, in terms of connectivity and bandwidth, to guarantee broadband services such as streaming or on-demand programs offered by the main television networks, or new-generation services based on augmented and virtual reality (AR/VR). The study conducted for this thesis aims to solve two of the main problems that will arise with the emergence of 5G: the search for the best possible connectivity, in order to offer users the resources needed to take advantage of the new-generation services, and multicast as required by the eMBMS. The aim of the thesis is to find innovative algorithms that obtain the best connectivity and so offer users the resources needed to use 5G services in a heterogeneous scenario. It studies UF that allows the search for the best candidate network to be improved and a balance to be achieved that avoids congestion of the chosen networks. To pursue these two goals, I studied the main mathematical methods for selecting a network based on QoS parameters that depend on the type of traffic generated by users. A further goal was to improve the computational performance of these methods. Furthermore, I carried out a study to obtain an innovative algorithm for managing multicast. The algorithm that has been implemented responds to the needs of the eMBMS in realistic scenarios.
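The abstract above does not name the selection methods it studies. As one common way to rank candidate networks on QoS parameters, the sketch below uses a simple weighted sum over min-max normalized criteria. This is an illustrative stand-in, not the thesis's algorithm: the network names, QoS values, and traffic weights are hypothetical, and load balancing is not modelled.

```python
def select_network(candidates, weights, cost_criteria):
    """Rank networks by a weighted sum of min-max normalized QoS criteria.

    candidates: {network: {criterion: value}}
    weights: {criterion: weight}, chosen per traffic type
    cost_criteria: criteria where lower values are better (e.g. delay)
    """
    names = list(candidates)
    scores = {name: 0.0 for name in names}
    for crit, weight in weights.items():
        values = [candidates[n][crit] for n in names]
        lo, hi = min(values), max(values)
        span = hi - lo
        for n in names:
            if span == 0:
                norm = 1.0  # all networks are equal on this criterion
            else:
                norm = (candidates[n][crit] - lo) / span
                if crit in cost_criteria:
                    norm = 1.0 - norm  # invert so that higher is always better
            scores[n] += weight * norm
    best = max(names, key=lambda n: scores[n])
    return best, scores


# Hypothetical candidate networks and weights for a streaming-like traffic class.
candidates = {
    "wifi": {"bandwidth_mbps": 50, "delay_ms": 30, "loss_pct": 1.0},
    "lte": {"bandwidth_mbps": 30, "delay_ms": 50, "loss_pct": 0.5},
    "nr5g": {"bandwidth_mbps": 200, "delay_ms": 10, "loss_pct": 0.2},
}
weights = {"bandwidth_mbps": 0.6, "delay_ms": 0.2, "loss_pct": 0.2}
best, scores = select_network(candidates, weights, {"delay_ms", "loss_pct"})
print(best, scores)
```

Changing the weights per traffic type (for example, weighting delay more heavily for interactive AR/VR traffic) changes the ranking without changing the code.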

    Manufacturing code generation for rotational parts in a feature based product modelling environment

    An important element in the integration of CAD/CAM is the representation and handling of the data used during design and manufacturing activities. Features and product modelling techniques allow this data to be handled better and provide CAD/CAM with an excellent platform for integration. The thesis explores the use of a predefined set of features in a product modelling environment for the design and machining of rotational components. In this research, the word "features" denotes a set of functional, geometrical and technological information with a unique form. These features are predefined and consist of a limited number of elements which carry the information related to design and manufacturing activities. The thesis is divided into three main parts. The first part reviews topics related to the research, e.g. group technology, component features and CAD/CAM, and also contains a literature survey of related research. In the second part, the "features" are defined and presented. The product modelling environment is also explained, and the basic rule-based procedures used to automate the operation planning activities are presented. The last part describes the case studies used for automatic NC code generation, followed by a discussion of the results. Lastly, conclusions are drawn and ideas for further work are presented.

    Incorporating Weak Statistics for Low-Resource Language Modeling

    Automatic speech recognition (ASR) requires a strong language model to guide the acoustic model and favor likely utterances. While many tasks enjoy billions of language model training tokens, many domains which require ASR do not have readily available electronic corpora. The only source of useful language modeling data is expensive and time-consuming human transcription of in-domain audio. This dissertation seeks to quickly and inexpensively improve low-resource language modeling for use in automatic speech recognition. It first considers efficient use of non-professional human labor to best improve system performance, and demonstrates that it is better to collect more data, despite higher transcription error, than to redundantly transcribe data to improve quality. In the process of developing procedures to collect such data, this work also presents an efficient rating scheme to detect poor transcribers without gold standard data. As an alternative to this process, automatic transcripts are generated with an ASR system, and this work explores efficiently combining these low-quality transcripts with a small amount of high-quality transcripts. Standard n-gram language models are sensitive to the quality of the highest order n-gram and are unable to exploit accurate weaker statistics. Instead, a log-linear language model is introduced, which elegantly incorporates a variety of background models through MAP adaptation. This work introduces marginal class constraints which effectively capture knowledge of transcriber error and improve performance over n-gram features. Finally, this work constrains the language modeling task to keyword search of words unseen in the training text. While overall system performance is good, these words suffer the most due to a low probability in the language model. Semi-supervised learning effectively extracts likely n-grams containing these new keywords from a large corpus of audio.
By using a search metric that favors recall over precision, this method captures over 80% of the potential gain.

    Dynamic topic adaptation for improved contextual modelling in statistical machine translation

    In recent years there has been an increased interest in domain adaptation techniques for statistical machine translation (SMT) to deal with the growing amount of data from different sources. Topic modelling techniques applied to SMT are closely related to the field of domain adaptation but more flexible in dealing with unstructured text. Topic models can capture latent structure in texts and are therefore particularly suitable for modelling structure in between and beyond corpus boundaries, which are often arbitrary. In this thesis, the main focus is on dynamic translation model adaptation to texts of unknown origin, which is a typical scenario for an online MT engine translating web documents. We introduce a new bilingual topic model for SMT that takes the entire document context into account and for the first time directly estimates topic-dependent phrase translation probabilities in a Bayesian fashion. We demonstrate our model’s ability to improve over several domain adaptation baselines and further provide evidence for the advantages of bilingual topic modelling for SMT over the more common monolingual topic modelling. We also show improved performance when deriving further adapted translation features from the same model which measure different aspects of topical relatedness. We introduce another new topic model for SMT which exploits the distributional nature of phrase pair meaning by modelling topic distributions over phrase pairs using their distributional profiles. Using this model, we explore combinations of local and global contextual information and demonstrate the usefulness of different levels of contextual information, which had not been previously examined for SMT. We also show that combining this model with a topic model trained at the document-level further improves performance. Our dynamic topic adaptation approach performs competitively in comparison with two supervised domain-adapted systems. 
Finally, we shed light on the relationship between domain adaptation and topic adaptation and propose to combine multi-domain adaptation and topic adaptation in a framework that entails automatic prediction of domain labels at the document level. We show that while each technique provides complementary benefits to the overall performance, there is a certain amount of overlap between domain and topic adaptation. This can be exploited to build systems that require less adaptation effort at runtime.