67 research outputs found

    A study on creating a custom South Sotho spellchecking and correcting software desktop application

    Thesis (B. Tech.) - Central University of Technology, Free State, 200

    Dealing with spelling variation in Early Modern English texts

    Early English Books Online contains facsimiles of virtually every English work printed between 1473 and 1700: some 125,000 publications. In September 2009, the Text Creation Partnership released the second instalment of transcriptions of the EEBO collection, bringing the total number of transcribed works to 25,000. It has been estimated that this transcribed portion contains 1 billion words of running text. With such large datasets and the increasing variety of historical corpora available from the Early Modern English period, the opportunities for historical corpus linguistic research have never been greater. However, it has been observed in prior research, and quantified on a large scale for the first time in this thesis, that texts from this period contain significant amounts of spelling variation until the eventual standardisation of orthography in the 18th century. The problems caused by this historical spelling variation are the focus of this thesis. It will be shown that the high levels of spelling variation found have a significant impact on the accuracy of two widely used automatic corpus linguistic methods: part-of-speech annotation and key word analysis. The development of historical spelling normalisation methods which can alleviate these issues will then be presented. The methods are based on techniques used in modern spellchecking, with various analyses of Early Modern English spelling variation dictating how the techniques are applied. With the methods combined into a single procedure, automatic normalisation can be performed on an entire corpus of any size. Evaluation of the normalisation performance shows that, after training, 62% of required normalisations are made, with a precision rate of 95%.
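    The normalisation methods described are based on modern spellchecking techniques. A minimal sketch of one such technique, dictionary lookup with edit-distance candidate selection, is shown below; the lexicon and the example spelling are illustrative, not the thesis's actual resources:

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def normalise(word: str, lexicon: set) -> str:
    """Map a historical spelling to the closest modern form in the lexicon."""
    if word in lexicon:
        return word
    return min(lexicon, key=lambda w: levenshtein(word, w))

# Early Modern "loue" (u/v interchange) maps to modern "love".
print(normalise("loue", {"love", "leave"}))  # love
```

    A real normaliser would restrict candidates by frequency and by period-specific variant rules (u/v, i/j, final -e), but the candidate-ranking core is this comparison.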

    Split and Migrate: Resource-Driven Placement and Discovery of Microservices at the Edge

    Microservices architectures combine the use of fine-grained and independently-scalable services with lightweight communication protocols, such as REST calls over HTTP. Microservices bring flexibility to the development and deployment of application back-ends in the cloud. Applications such as collaborative editing tools require frequent interactions between the front-end running on users' machines and a back-end formed of multiple microservices. User-perceived latencies depend on their connection to microservices, but also on the interaction patterns between these services and their databases. Placing services at the edge of the network, closer to the users, is necessary to reduce user-perceived latencies. It is, however, difficult to decide on the placement of complete stateful microservices at one specific core or edge location without trading a latency reduction for some users against a latency increase for the others. We present how to dynamically deploy microservices on a combination of core and edge resources to systematically reduce user-perceived latencies. Our approach enables the splitting of stateful microservices and the placement of the resulting splits on appropriate core and edge sites. Koala, a decentralized and resource-driven service discovery middleware, enables REST calls to reach and use the appropriate split, with only minimal changes to a legacy microservices application. Locality awareness using network coordinates further enables service splits to be migrated automatically so that they follow the location of the users. We confirm the effectiveness of our approach with a full prototype and an application to ShareLatex, a microservices-based collaborative editing application.
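    The locality-awareness idea can be illustrated with a toy selection function: given a network coordinate for each site (as produced by a Vivaldi-style coordinate system), a client is routed to the service split whose site is closest in coordinate space, which approximates the lowest latency. The site names and coordinates below are hypothetical, not taken from the paper:

```python
import math

# Hypothetical sites hosting splits of a stateful microservice, each with a
# 2-D network coordinate (distances approximate round-trip latency).
SITES = {
    "core-eu":   (0.0, 0.0),
    "edge-oslo": (3.0, 4.0),
    "edge-rome": (6.0, 1.0),
}

def closest_site(client_coord, sites=SITES):
    """Pick the site with the smallest estimated latency to the client."""
    return min(sites, key=lambda name: math.dist(client_coord, sites[name]))

# A client whose coordinate is near (5.5, 1.5) is routed to edge-rome.
print(closest_site((5.5, 1.5)))  # edge-rome
```

    In the paper's setting, this decision is made by the discovery middleware rather than the client, and sites are re-evaluated as user coordinates drift, which is what triggers split migration.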

    Speech-to-text models to transcribe emergency calls

    This thesis is part of the larger project “AI-Support in Medical Emergency Calls (AISMEC)”, which aims to develop a decision support system for Emergency Medical Communication Center (EMCC) operators to better identify and respond to acute brain stroke. The system will utilize historical health data and the transcription of the emergency call to assist the EMCC operator in deciding whether or not to dispatch an ambulance, and with what priority and urgency. Our research primarily focuses on adapting the Automatic Speech Recognition (ASR) model Whisper to create a robust and accurate ASR model for transcribing Norwegian emergency calls. The model was fine-tuned on simulated emergency calls and recordings made by ourselves. Furthermore, a proof-of-concept ASR web application was developed with the goal of streamlining the manual task of transcribing emergency calls. After we demonstrated the application to the researchers involved in AISMEC and to the potential users, both expressed optimism about the potential of this solution to streamline the transcription process. As part of our research, we conducted an experiment in which we took the transcriptions suggested by the application and corrected them for accuracy. This approach showed a notable reduction in our transcription time. We also found that establishing a machine learning pipeline to fine-tune the model on historical emergency calls was feasible. Further work would involve training the model on actual emergency calls. To investigate the efficiency of the ASR web application further, a larger-scale version of the semi-automatic transcription experiment could be conducted by the professional audio transcribers at Haukeland universitetssjukehus.
    Master's Thesis in the Joint Master's Programme in Software Engineering, in collaboration with HVL.

    Exploration of documents concerning Foundlings in Fafe along XIX Century

    Integrated Master's dissertation in Informatics Engineering.
    The abandonment of children and newborns is a problem in our society. In the last few decades, the introduction of contraceptive methods, the development of social programs, and family planning were fundamental to controlling undesirable pregnancies and supporting families in need. But these developments were not enough to solve the abandonment epidemic. Anonymous abandonment has a dangerous aspect: in order to preserve the family's identity, a child is usually left in a public place at night. Since children and newborns are among the most vulnerable groups in our society, the time between the abandonment and the assistance of the child is potentially deadly. The establishment of public institutions in the past, such as the foundling wheel, was extremely important as a strategy to save lives. These institutions supported the abandoned children while simultaneously providing a safer abandonment process, without compromising the anonymity of the family. The focus of the Master's Project discussed in this dissertation is the analysis and processing of nineteenth-century documents concerning the Foundling Wheel of Fafe. The analysis of sample documents is the initial step in the development of an ontology. The ontology has a fundamental role in the organization and structuring of the information contained in these historical documents. The identification of concepts and the relationships between them culminates in a structured knowledge repository. Another important component is the development of a digital platform, where users are able to access the content stored in the knowledge repository and explore the digital archive, which incorporates the digitized versions of documents and books from these historical institutions. The development of this project is important for several reasons. Directly, the implementation of a knowledge repository and a digital platform preserves information.
    These documents are mostly unique records and, due to their age and advanced state of degradation, the substitution of physical by digital access reduces the wear and tear associated with each consultation. Additionally, the digital archive facilitates the dissemination of valuable information. Research groups and the general public are able to use the platform as a tool to discover the past, by performing biographic, cultural, or socio-economic studies over documents dated to the nineteenth century.

    The Jaro-Winkler Distance Algorithm: Autocorrect and Spelling Suggestion Features for Indonesian Script Writing at BMS TV

    Autocorrect is a software system that automatically identifies and corrects misspelled words. Nowadays the autocorrect feature is often encountered in various devices and applications, such as smartphone keyboards and Microsoft Word. The autocorrect system instantly replaces a word that it considers wrong, without notifying the user, so that users are often not aware that their writing has changed, while the replacement word is not always the one the user intended. The autocorrect feature of Microsoft Word uses English, so it cannot be applied to writing news scripts at BMS TV. Every day the News Director of BMS TV checks the scripts to be broadcast, which includes spell checking. By supporting Indonesian in autocorrect and spelling suggestion, this work is expected to help the News Director of BMS TV check and fix misspelled words automatically and suggest the correct Indonesian spelling. The software development method used is Extreme Programming, together with the Jaro-Winkler Distance algorithm. Jaro-Winkler is an algorithm for calculating the proximity between two texts. The result of this study is a system that helps the News Director of BMS TV identify misspelled words in Indonesian scripts and makes it easier for the central News Director to collect scripts from the various contributors of BMS TV. It can be concluded that the autocorrect and spelling suggestion features can handle misspelled words: in a test of 60 words covering various misspelling scenarios, the feature corrected ten words automatically and showed correct spelling suggestions for 39 words.
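    For reference, a compact Python sketch of the Jaro-Winkler similarity that the abstract describes; the example word pair is illustrative, and the BMS TV system's actual dictionary and thresholds are not shown here:

```python
def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro-Winkler similarity in [0, 1]; 1.0 means identical strings.
    (The standard boost threshold of 0.7 is omitted for brevity.)"""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    # Characters match if equal and within this window of each other.
    window = max(len1, len2) // 2 - 1
    match1, match2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not match2[j] and s2[j] == c:
                match1[i] = match2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Half the number of matched characters that are out of order.
    t, k = 0, 0
    for i in range(len1):
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    jaro = (matches / len1 + matches / len2 + (matches - t) / matches) / 3
    # Winkler boost: reward a shared prefix of up to 4 characters.
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)

# Classic example: one transposition plus a shared 3-character prefix.
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

    In a spellchecker, each out-of-vocabulary word is compared against the dictionary with this function and the highest-scoring entries become the spelling suggestions.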

    Development of Online Course System and an Open Access Online Repository

    This project was divided into two phases. The first phase comprised the development of an online course system for the institute with the help of Moodle. Moodle (Modular Object-Oriented Dynamic Learning Environment) is an open source software package for producing internet-based courses and web sites. It is an ongoing development project designed to support a social constructionist framework of education. Moodle is provided freely as open source software under the GNU General Public License; essentially, this means Moodle is copyrighted, but users have the additional freedom to modify and improve the source code. The second phase of the project was the deployment of an open access online repository system using EPrints. EPrints is an open source software package for building open access repositories that are compliant with the Open Archives Initiative Protocol for Metadata Harvesting. It shares many of the features commonly seen in document management systems, but is primarily used for institutional repositories and scientific journals. EPrints was developed at the University of Southampton School of Electronics and Computer Science and released under a GPL license.

    Smart forms: a survey to state and test the most major electronic forms technologies that are based on W3C standards

    Smart forms are efficient and powerful electronic forms that can be used for interactions between end users and web application systems. Several electronic forms software products that use W3C technologies have been released to meet the demands of users. This thesis aims to study and test the major electronic forms technologies that are based on W3C standards. It discusses the main electronic forms features and tests them with some essential applications. This research produces a deep understanding of the major electronic forms technologies that are based on W3C standards and of the important features that make an electronic form a smart form. In addition, it opens development prospects for other researchers to develop application ideas that could contribute to the electronic forms domain.

    A Conversational Bot Expert in TCP/IP

    When studying for a telecommunications degree, it can sometimes be hard to remember all the concepts or to memorize in detail how certain protocols work. To address this problem, this project studied how to create a bot to answer simple questions about the TCP/IP protocols. First of all, it was necessary to analyse general information about conversational bots and programming tools in order to choose the best possible implementation. Afterwards, we proposed different design alternatives for developing the bot. These alternatives included the creation of a new algorithm to analyse text from users and extract the main concepts for building answers to questions. Finally, we divided TeCePe's implementation into programming modules that perform each of its functionalities separately, to make its analysis and integration into the general code easier. Results from users suggest that bots like TeCePe could provide some benefits to students while studying a subject. Users usually prefer realistic human interactions and want additional features beyond the bot's main functionality in order to be encouraged to use conversational bots, which are not very popular in the education field at this moment. The main results of this project are generally favourable, as the bot developed fulfilled most requirements using all the algorithms proposed. TeCePe is fast when searching, and can correctly detect users' intention in order to output the best possible answer.