
    Dealing with spelling variation in Early Modern English texts

    Early English Books Online contains facsimiles of virtually every English work printed between 1473 and 1700, some 125,000 publications. In September 2009, the Text Creation Partnership released the second instalment of transcriptions of the EEBO collection, bringing the total number of transcribed works to 25,000. It has been estimated that this transcribed portion contains 1 billion words of running text. With such large datasets and the increasing variety of historical corpora available from the Early Modern English period, the opportunities for historical corpus linguistic research have never been greater. However, it has been observed in prior research, and quantified on a large scale for the first time in this thesis, that texts from this period contain significant amounts of spelling variation until the eventual standardisation of orthography in the 18th century. The problems caused by this historical spelling variation are the focus of this thesis. It will be shown that the high levels of spelling variation found have a significant impact on the accuracy of two widely used automatic corpus linguistic methods: Part-of-Speech annotation and key word analysis. The development of historical spelling normalisation methods which can alleviate these issues will then be presented. The methods are based on techniques used in modern spellchecking, with various analyses of Early Modern English spelling variation dictating how the techniques are applied. With the methods combined into a single procedure, automatic normalisation can be performed on an entire corpus of any size. Evaluation of the normalisation performance shows that, after training, 62% of required normalisations are made, with a precision rate of 95%.
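The idea of borrowing spellchecking techniques for normalisation can be illustrated with a minimal sketch. It is not the method developed in the thesis: it simply maps a historical spelling to the closest entry in a modern word list using generic string similarity from Python's standard `difflib` module. The tiny lexicon, the `normalise` function name, and the similarity cutoff of 0.7 are all assumptions chosen for illustration.

```python
import difflib

# Hypothetical, tiny modern word list; a real system would use a full lexicon.
MODERN_LEXICON = ["been", "love", "only", "public", "would"]


def normalise(word, lexicon=MODERN_LEXICON, cutoff=0.7):
    """Return the closest modern spelling, or the word unchanged if none is close."""
    lowered = word.lower()
    if lowered in lexicon:
        # Already a modern spelling: nothing to normalise.
        return lowered
    # get_close_matches ranks candidates by a similarity ratio in [0, 1]
    # and drops any candidate scoring below the cutoff.
    matches = difflib.get_close_matches(lowered, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


for variant in ["wolde", "beene", "publick", "loue", "onely"]:
    print(variant, "->", normalise(variant))
```

A higher cutoff makes fewer replacements but makes each one safer, which is the same trade-off between the share of required normalisations made and their precision that the evaluation figures above describe.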

    Proceedings of the 17th Annual Conference of the European Association for Machine Translation

    Proceedings of the 17th Annual Conference of the European Association for Machine Translation (EAMT

    Towards Practical Privacy-Preserving Protocols

    Protecting users' privacy in digital systems becomes more complex and challenging over time, as the amount of stored and exchanged data grows steadily and systems become increasingly involved and connected. Two techniques that address this issue are Secure Multi-Party Computation (MPC) and Private Information Retrieval (PIR), which aim to enable practical computation while keeping sensitive data private. In this thesis we present results showing how real-world applications can be executed in a privacy-preserving way. This is not only desired by users of such applications but, since 2018, also has a strong legal foundation in the European Union's General Data Protection Regulation (GDPR), which requires companies to protect the privacy of user data by design. The contributions of this thesis are split into three parts and can be summarized as follows.
    MPC Tools: Generic MPC requires in-depth background knowledge of a complex research field. To address this, we provide tools that are both efficient and usable, and that serve as a foundation for follow-up work by allowing cryptographers, researchers and developers to implement, test and deploy MPC applications. We provide an implementation framework that abstracts from the underlying protocols, optimized building blocks generated from hardware synthesis tools, and support for the direct processing of Hardware Description Languages (HDLs). Finally, we present an automated compiler for efficient hybrid protocols from ANSI C.
    MPC Applications: MPC was long deemed too expensive to be used in practice. We show several use cases of real-world applications that can operate in a privacy-preserving yet practical way when engineered properly and built on top of suitable MPC protocols. Use cases presented in this thesis are from the domain of route computation using BGP on the Internet or at Internet Exchange Points (IXPs). In both cases our protocols protect sensitive business information that is used to determine routing decisions. Another use case focuses on genomics, which is particularly critical as the human genome is connected to everyone during their entire lifespan and cannot be altered. Our system enables federated genomic databases, where several institutions can privately outsource their genome data and where research institutes can query this data in a privacy-preserving manner.
    PIR and Applications: Privately retrieving data from a database is a crucial requirement for user privacy and metadata protection, and is enabled by, among other techniques, Private Information Retrieval (PIR). We present improvements to and a generalization of a well-known multi-server PIR scheme of Chor et al., together with an implementation and evaluation thereof. We also design and implement an efficient anonymous messaging system built on top of PIR. Furthermore, we provide a scalable solution for private contact discovery that utilizes ideas from efficient two-server PIR built from Distributed Point Functions (DPFs) in combination with Private Set Intersection (PSI).
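The basic idea behind multi-server PIR can be sketched with the classic two-server, XOR-based construction. This is a minimal illustration only, not the improved or generalized scheme from the thesis. It assumes two non-colluding servers holding identical copies of a database with equal-length records, and the function names (`make_queries`, `answer`, `reconstruct`) are hypothetical.

```python
import secrets


def make_queries(n, index):
    """Client: build two bit vectors that differ only at the wanted index.

    Each vector on its own is uniformly random, so neither server learns
    which record is being requested (assuming the servers do not collude).
    """
    q1 = [secrets.randbelow(2) for _ in range(n)]
    q2 = q1.copy()
    q2[index] ^= 1  # flip exactly one position
    return q1, q2


def answer(database, query):
    """Server: XOR together all records selected by the query bits."""
    acc = bytes(len(database[0]))  # all-zero buffer of record length
    for record, bit in zip(database, query):
        if bit:
            acc = bytes(a ^ b for a, b in zip(acc, record))
    return acc


def reconstruct(a1, a2):
    """Client: XOR both answers; every record except the wanted one cancels."""
    return bytes(x ^ y for x, y in zip(a1, a2))


# Toy database; all records must have the same length in this sketch.
database = [b"rec0", b"rec1", b"rec2", b"rec3"]
q1, q2 = make_queries(len(database), 2)
result = reconstruct(answer(database, q1), answer(database, q2))
print(result)
```

In this sketch each query is as long as the database has records, so the communication cost is linear in the database size. Reducing that cost is one motivation for the DPF-based two-server approach mentioned above.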

    Preface


    Digital Histories

    Historical scholarship is currently undergoing a digital turn. All historians have experienced this change in one way or another, by writing on word processors, applying quantitative methods to digitized source materials, or using internet resources and digital tools. Digital Histories showcases this emerging wave of digital history research. It presents work by historians who – on their own or through collaborations with e.g. information technology specialists – have uncovered new, empirical historical knowledge through digital and computational methods. The topics of the volume range from the medieval period to the present day, including various parts of Europe. The chapters apply an exemplary array of methods, such as digital metadata analysis, machine learning, network analysis, topic modelling, named entity recognition, collocation analysis, critical search, and text and data mining. The volume argues that digital history is entering a mature phase, digital history ‘in action’, where its focus is shifting from the building of resources towards the making of new historical knowledge. This also involves novel challenges that digital methods pose to historical research, including awareness of the pitfalls and limitations of the digital tools and the necessity of new forms of digital source criticism. Through its combination of empirical, conceptual and contextual studies, Digital Histories is a timely and pioneering contribution taking stock of how digital research currently advances historical scholarship.

    Jems 5-2016 - Full Issue


    Minimising mathematical anxiety in teaching mathematics and assessing student’s work

    This paper builds a theoretical perspective and supports the possibility of creating a special assessment environment for students, in which mathematical knowledge and understanding can be assessed with fewer external psychological factors affecting the assessment. The concept of a zone with minimal effect of anxiety is introduced and described. Students' successful work on extending the zone by means of a carefully selected chain of questions, where only some questions are part of a real assessment, allows students to reconsider their attitudes towards mathematics and assists teachers in identifying some students' main learning difficulties as psychological in character. Further suggestions about developing and investigating special assessment environments are outlined and discussed.