126 research outputs found

    Improved treebank querying: a facelift for GrETEL

    We describe the improvements to the interface of GrETEL, an online tool for querying treebanks. We demonstrate how we employed the results of two usability tests and individual user feedback in order to create a more user-friendly interface which meets the users’ needs

    Querying large treebanks : benchmarking GrETEL indexing

    The amount of data available for research grows rapidly, yet the technology to interpret and mine these data efficiently lags behind. For instance, when using large treebanks for linguistic research, query speed leaves much to be desired. GrETEL Indexing, or GrInding, tackles this issue. The idea behind GrInding is to make the search space as small as possible before the treebank search actually starts, by pre-processing the treebank at hand. We recursively divide the treebank into smaller parts, called subtree-banks, which are then converted into database files. All subtree-banks are organized according to their linguistic dependency pattern, and labeled as such. Additionally, general patterns are linked to more specific ones. By doing so, we create millions of databases, and given a linguistic structure we know in which databases that structure can occur, which yields a significant efficiency boost. We present the results of a benchmark experiment, testing the effect of the GrInding procedure on the SoNaR-500 treebank
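
    The indexing idea in this abstract can be illustrated with a minimal sketch. This is not the GrInding implementation: an in-memory dictionary stands in for the database files, and the toy treebank, the relation labels, and the function names (`pattern_of`, `build_index`, `candidates`) are all assumptions made for illustration. Each node's child relations are filed under every sub-pattern they contain, so a general query pattern also reaches sentences with more specific patterns.

```python
from collections import defaultdict
from itertools import combinations


def pattern_of(relations):
    """Return a canonical label for a bag of dependency relations."""
    return "+".join(sorted(relations))


def build_index(treebank):
    """Map every sub-pattern of each node's child relations to sentence ids.

    `treebank` maps a sentence id to a list of nodes, where each node is
    given as the list of dependency relations of its children (hypothetical
    toy format, not the real treebank encoding).
    """
    index = defaultdict(set)
    for sent_id, nodes in treebank.items():
        for child_rels in nodes:
            ordered = sorted(child_rels)
            for size in range(1, len(ordered) + 1):
                for subset in combinations(ordered, size):
                    index[pattern_of(subset)].add(sent_id)
    return index


def candidates(index, query_rels):
    """Return the ids of sentences that can contain the query pattern."""
    return index.get(pattern_of(query_rels), set())


# Toy treebank: three sentences, each a list of nodes.
toy = {
    "s1": [["su", "hd", "obj1"], ["det", "hd"]],
    "s2": [["su", "hd"], ["det", "hd"]],
    "s3": [["hd", "mod"]],
}
index = build_index(toy)
subject_head = candidates(index, ["su", "hd"])
transitive = candidates(index, ["su", "hd", "obj1"])
```

    Only the sentences returned by `candidates` would then need a full tree search, which is the source of the speed-up the abstract describes.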

    Treebank querying with GrETEL 3 : bigger, faster, stronger

    We describe the new version of GrETEL (http://gretel.ccl.kuleuven.be/gretel3), an online tool which allows users to query treebanks by means of a natural language example (example-based search) or via a formal query (XPath search). The new release comprises an update to the interface and considerable improvements in the back-end search mechanism. The update of the front-end is based on user suggestions. In addition to an overall design update, major changes include a more intuitive query builder in the example-based search mode and a visualizer for syntax trees that is compatible with all modern browsers. Moreover, the results are presented to the user as soon as they are found, so users can browse the matching sentences before the treebank search is completed. We will demonstrate that those changes considerably improve the query procedure. The update of the back-end mainly includes optimizing the search algorithm for querying the (very) large SoNaR treebank. Querying this 500-million word treebank was already made possible in the previous version of GrETEL, but due to the complex search mechanism this often resulted in long query times or even a timeout before the search completed. The improved version of the search algorithm results in faster query times and more accurate search results, which greatly enhances the usability of the SoNaR treebank for linguistic research
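
    As a rough illustration of what a formal XPath query over a syntax tree looks like, the sketch below runs a path expression on a small hand-written tree. The XML layout and the attribute names (`cat`, `rel`, `pos`, `word`) are illustrative assumptions, the helper `noun_heads` is hypothetical, and Python's `xml.etree.ElementTree` supports only a subset of XPath, so this does not reproduce GrETEL's search back-end.

```python
import xml.etree.ElementTree as ET

# Hand-written toy tree for the Dutch sentence "zij leest een boek".
TREE = """
<alpino_ds>
  <node cat="top">
    <node cat="smain" rel="--">
      <node rel="su" pos="pron" word="zij"/>
      <node rel="hd" pos="verb" word="leest"/>
      <node cat="np" rel="obj1">
        <node rel="det" pos="det" word="een"/>
        <node rel="hd" pos="noun" word="boek"/>
      </node>
    </node>
  </node>
</alpino_ds>
"""

# Select the noun heads of all noun phrases anywhere in the tree.
QUERY = ".//node[@cat='np']/node[@rel='hd'][@pos='noun']"


def noun_heads(xml_text):
    """Return the words of all nodes matched by QUERY."""
    root = ET.fromstring(xml_text)
    return [node.get("word") for node in root.findall(QUERY)]


heads = noun_heads(TREE)
```

    In example-based search the user supplies a sentence instead, and a query of this kind is derived from its parse.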

    A Comparison of Different Punctuation Prediction Approaches in a Translation Context

    We test a series of techniques to predict punctuation and its effect on machine translation (MT) quality. Several techniques for punctuation prediction are compared: language modeling techniques, such as n-grams and long short-term memories (LSTM), sequence labeling LSTMs (unidirectional and bidirectional), and monolingual phrase-based, hierarchical and neural MT. For actual translation, phrase-based, hierarchical and neural MT are investigated. We observe that for punctuation prediction, phrase-based statistical MT and neural MT reach similar results, and are best used as a preprocessing step which is followed by neural MT to perform the actual translation. Implicit punctuation insertion by a dedicated neural MT system, trained on unpunctuated source and punctuated target, yields similar results. This research was done in the context of the SCATE project, funded by the Flemish Agency for Innovation and Entrepreneurship (IWT project 13007)

    The Italian economy under the impact of Russian sanctions

    This final qualification paper analyses the Italian economy in the context of the Russian food embargo. The main aims of the study are: to establish to what extent the Russian food embargo has affected both the national economy of Italy and the economies of individual Italian regions; to determine whether the use of sanctions in international relations is the most appropriate and, above all, legitimate means of influencing another state in the current political and economic climate; and to find out how Italy is responding to the new challenges and threats, in particular how the Italian government is mitigating the negative consequences of the Russian trade and economic restrictions. The object of the study is the development of Italian-Russian trade and economic relations in the context of mutual sanctions imposed by both Russia and the European Union. The subject of the study is the development of the Italian economy under the Russian food embargo. The paper consists of an introduction, three chapters and a conclusion. It is 51 pages long and draws on 29 sources

    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project

    Low Resources Machine Translation

    METIS-II was an EU-FET MT project running from October 2004 to September 2007, which aimed at translating free text input without resorting to parallel corpora. The idea was to use ‘basic’ linguistic tools and representations and to link them with patterns and statistics from the monolingual target-language corpus. The METIS-II project had four partners, translating from their ‘home’ languages Greek, Dutch, German, and Spanish into English. The paper outlines the basic ideas of the project, their implementation, the resources used, and the results obtained. It also gives examples of how METIS-II has continued beyond its lifetime and the original scope of the project. On the basis of the results and experiences obtained, we believe that the approach is promising and offers the potential for development in various directions

    European language equality

    This deep dive on data, knowledge graphs (KGs) and language resources (LRs) is the last of the four technology deep dives; data and the related models are the basis for technologies and solutions in the area of Language Technology (LT) for European digital language equality (DLE). This chapter focuses on the data and LRs required to achieve full DLE in Europe by 2030. The main components identified (data, KGs and LRs) are explained and used to analyse the state of the art as well as to identify gaps. All of these components need to be tackled in the future, for the widest range of languages possible, from official EU languages to dialects to non-EU languages used in Europe. For all these languages, efficient data collection and sustainable data provision need to be facilitated with fair conditions and costs. Specific technologies, methodologies and tools have been identified to enable the implementation of the vision of DLE by 2030. In addition, data-related business models and data-governance models are discussed, as they are considered a prerequisite for a working data economy that stimulates a vibrant LT landscape that can bring about European DLE