96 research outputs found

    DEXTER: A workbench for automatic term extraction with specialized corpora

    Automatic term extraction has become a priority area of research within corpus processing. Despite the extensive literature in this field, there are still some outstanding issues that should be dealt with during the construction of term extractors, particularly those oriented to support research in terminology and terminography. In this regard, this article describes the design and development of DEXTER, an online workbench for the extraction of simple and complex terms from domain-specific corpora in English, French, Italian and Spanish. In this framework, three issues contribute to placing the most important terms in the foreground. First, unlike the elaborate morphosyntactic patterns proposed by most previous research, shallow lexical filters have been constructed to discard term candidates. Second, a large number of common stopwords are automatically detected by means of a method that relies on the IATE database together with the frequency distributions of the domain-specific corpus and a general corpus. Third, the term-ranking metric, which is grounded on the notions of salience, relevance and cohesion, is guided by the IATE database to display an adequate distribution of terms. Financial support for this research has been provided by the DGI, Spanish Ministry of Education and Science, grant FFI2014-53788-C3-1-P.
    Periñán-Pascual, C. (2018). DEXTER: A workbench for automatic term extraction with specialized corpora. Natural Language Engineering, 24(2), 163-198. https://doi.org/10.1017/S1351324917000365
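    To make the corpus-comparison idea concrete, the following minimal sketch shows one way such stopword detection could work: a candidate whose relative frequency in the domain corpus does not clearly exceed its relative frequency in a general reference corpus is treated as a common word. This illustrates the general technique only; the function name and threshold are invented here and do not reproduce DEXTER's actual filters or its IATE-guided metric.

    from collections import Counter

    def detect_stopwords(domain_tokens, general_tokens, ratio_threshold=2.0):
        """Return tokens that are not markedly more frequent in the domain corpus."""
        domain_freq = Counter(domain_tokens)
        general_freq = Counter(general_tokens)
        domain_total = sum(domain_freq.values())
        general_total = sum(general_freq.values())
        stopwords = set()
        for token, count in domain_freq.items():
            rel_domain = count / domain_total
            # Add-one smoothing so words unseen in the general corpus do not divide by zero.
            rel_general = (general_freq[token] + 1) / (general_total + 1)
            if rel_domain / rel_general < ratio_threshold:
                stopwords.add(token)
        return stopwords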

    Automatic rule verification for digital building permits

    Master's dissertation in Building Information Modelling (BIM A+). The construction sector is facing major changes in client and market requirements, pushing it towards digital transformation and a data-driven industry. Governments have taken an active part in this change by supporting the digitalisation of processes such as the one for building permits and by introducing the use of building information models (BIM). Research on the digitalisation of the building permit has shown great advances regarding the extraction of rules in interpretable forms and the automation of verification; however, the reconciliation between the semantic definitions of the building model and the concepts defined in the regulations is still under discussion. Moreover, validating the correctness of the information included in building models against the regulation definitions is important to guarantee quality throughout the digital building permit process. This dissertation proposes a hybrid workflow to check both the information extracted explicitly from the BIM model and the information implicitly derived from relationships between elements, following the provisions contained in the regulations in the context of Portugal. Based on a review of the context and literature, a reengineered process was proposed, and Python code was developed using the IfcOpenShell library to support the automation of the verification process, traditionally carried out by technicians in the building permit offices. The elements developed in this document were demonstrated in a case study, showing that the hybrid validation can help to detect modelling errors and improve the correctness of the information during the initial submission of models for a building permit process. The results indicate that an automated validation of the model against regulation definitions can be introduced to improve the degree of certainty about the quality of the information contained in the Building Information Model; moreover, methods that produce results from implicit information can extend the capabilities of the IFC schema. However, the scripts developed in this work are still under active development and have limited applicability to certain IFC classes. Erasmus Mundus Joint Master Degree Programme – ERASMUS.
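    As an illustration of what such an explicit-attribute check can look like with IfcOpenShell, the sketch below flags doors narrower than a minimum clear width. The 0.80 m threshold and the file name are placeholders, not actual provisions of the Portuguese regulations, and the dissertation's own scripts are not reproduced here.

    import ifcopenshell

    def check_door_widths(ifc_path, min_width=0.80):
        """Flag every IfcDoor whose OverallWidth is missing or below min_width.

        Widths are assumed to be in metres; real code should inspect the
        file's unit assignment before comparing.
        """
        model = ifcopenshell.open(ifc_path)
        failures = []
        for door in model.by_type("IfcDoor"):
            width = door.OverallWidth  # explicit attribute; may be None if unmodelled
            if width is None or width < min_width:
                failures.append((door.GlobalId, width))
        return failures

    for guid, width in check_door_widths("building.ifc"):  # hypothetical model file
        print(f"Door {guid} fails the width rule (OverallWidth={width})")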

    Advanced fuzzy matching in the translation of EU texts

    In the translation industry today, CAT tool environments are an indispensable part of the translator's workflow. Translation memory (TM) systems constitute one of the most important features of these tools, and the question of how best to use them to make the translation process faster and more efficient legitimately arises. This research examines whether there are more efficient methods of retrieving potentially useful translation suggestions than the ones currently used in TM systems. We are especially interested in investigating whether more sophisticated algorithms and the inclusion of linguistic features in the matching process lead to a significant improvement in the quality of the retrieved matches. The dataset used, the DGT-TM, is pre-processed and parsed, and a number of matching configurations are applied to the data structures contained in the produced parse trees. We also try to improve the matching by combining the individual metrics using a regression algorithm. The retrieved matches are then evaluated by means of automatic evaluation, based on correlations and mean scores, and human evaluation, based on correlations of the derived ranks and scores. Ultimately, the goal is to determine whether the implementation of some of these fuzzy matching metrics should be considered in the framework of commercial CAT tools to improve the translation process.
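    The combination step can be pictured with a small sketch: several surface similarity metrics are computed for a query segment against a TM source segment, and a regression model learns how to weight them against human usefulness scores. The metrics, the toy training pairs and the choice of linear regression below are assumptions for illustration, not the study's actual configuration.

    from difflib import SequenceMatcher
    from sklearn.linear_model import LinearRegression

    def char_similarity(a, b):
        """Character-level edit similarity in [0, 1]."""
        return SequenceMatcher(None, a, b).ratio()

    def token_jaccard(a, b):
        """Token-overlap similarity in [0, 1]."""
        sa, sb = set(a.split()), set(b.split())
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

    def features(query, tm_source):
        return [char_similarity(query, tm_source), token_jaccard(query, tm_source)]

    # Hypothetical training pairs with human usefulness scores in [0, 1].
    pairs = [("the committee adopted the report", "the committee adopted the proposal", 0.8),
             ("the committee adopted the report", "delegations are invited to comment", 0.1)]
    X = [features(q, s) for q, s, _ in pairs]
    y = [score for _, _, score in pairs]
    model = LinearRegression().fit(X, y)

    # Score a new query against a TM entry with the combined metric.
    print(model.predict([features("the council adopted the report",
                                  "the committee adopted the report")]))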

    Flexibility in Data Management

    With the ongoing expansion of information technology, new fields of application requiring data management emerge virtually every day. In our knowledge culture, increasing amounts of data and a workforce organized in more creativity-oriented ways also radically change traditional fields of application and call established assumptions about data management into question. For instance, investigative analytics and agile software development move towards a very agile and flexible handling of data. As the primary facilitators of data management, database systems have to reflect and support these developments. However, traditional database management technology, in particular relational database systems, is built on assumptions of relatively stable application domains. The need to model all data up front in a prescriptive database schema earned relational database management systems the reputation among developers of being inflexible, dated and cumbersome to work with. Nevertheless, relational systems still dominate the database market: they are a proven, standardized and interoperable technology, well known in IT departments with a workforce of experienced and trained developers and administrators. This thesis aims at resolving the growing contradiction between the popularity and omnipresence of relational systems in companies and their increasingly bad reputation among developers. It adapts relational database technology towards more agility and flexibility. We envision a descriptive schema-comes-second relational database system, which is entity-oriented instead of schema-oriented; descriptive rather than prescriptive. The thesis provides four main contributions: (1) a flexible relational data model, which frees relational data management from having a prescriptive schema; (2) autonomous physical entity domains, which partition self-descriptive data according to their schema properties for better query performance; (3) a freely adjustable storage engine, which allows adapting the physical data layout to the properties of the data and of the workload; and (4) a self-managed indexing infrastructure, which autonomously collects and adapts index information in the presence of dynamic workloads and evolving schemas. The flexible relational data model is the thesis' central contribution: it describes the functional appearance of the descriptive schema-comes-second relational database system. The other three contributions improve components in the architecture of database management systems to increase the query performance and the manageability of descriptive schema-comes-second relational database systems. We are confident that these four contributions can help pave the way to a more flexible future for relational database management technology.
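    The schema-comes-second idea can be pictured with a small sketch: entities are inserted without any up-front schema, and a descriptive schema is derived from the data afterwards. This illustrates the concept only; the class and method names are invented here, and it is not the storage engine developed in the thesis.

    from collections import defaultdict

    class EntityStore:
        def __init__(self):
            self.entities = []

        def insert(self, entity):
            """Accept any attribute-value entity; no prescriptive schema is enforced."""
            self.entities.append(dict(entity))

        def describe_schema(self):
            """Derive a descriptive schema: attributes observed so far and their value types."""
            schema = defaultdict(set)
            for entity in self.entities:
                for attr, value in entity.items():
                    schema[attr].add(type(value).__name__)
            return dict(schema)

    store = EntityStore()
    store.insert({"name": "Alice", "age": 30})
    store.insert({"name": "Bob", "email": "bob@example.org"})  # new attribute, no ALTER TABLE
    print(store.describe_schema())  # {'name': {'str'}, 'age': {'int'}, 'email': {'str'}}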

    Learning object metadata surrogates in search result interfaces: user evaluation, design and content

    The purpose of this research was to evaluate user interaction with learning object metadata surrogates, both in terms of content and presentation. The main objectives of this study were: (1) to review the literature on learning object metadata and user-centred evaluation of metadata surrogates in the context of cognitive information retrieval (including user-centred relevance and usability research); (2) to develop a framework for the evaluation of user interaction with learning object metadata surrogates in search result interfaces; (3) to investigate the usability of metadata surrogates in the search result interfaces of learning object repositories (LORs) in terms of various presentation aspects (such as amount of information, structure and highlighting of query terms) as a means of facilitating the user relevance judgment process; (4) to investigate in depth the type of content that should be included in learning object metadata surrogates in order to facilitate the process of relevance judgment; (5) to provide a set of recommendations and guidelines for the design of learning object metadata surrogates in search result interfaces, both in terms of content and presentation. [Continues.]

    “WARES”, a Web Analytics Recommender System

    It is hard to imagine modern business without analytics; it is a trend in modern business, and even small companies and individual entrepreneurs are starting to use analytics tools, in one way or another, for their business. Not surprisingly, many different tools exist for different domains. They vary in purpose, from simple friend and visit statistics for a Facebook page to large, sophisticated systems designed for big corporations, and they may be free or paid. Some tools require special training, certification or even a degree before they can be used; others offer a simple user interface with dashboards that anyone seeing them for the first time can readily understand. In any case, anyone thinking about using analytics for their own needs faces the question: "What tool should I use, which one suits my needs, and how can I pay less and get the maximum gain?" In this work, I try to answer this question by proposing a recommender tool that helps the user with this "simple task". This paper is devoted to the creation of WARES, short for Web Analytics REcommender System. The proposed recommender system uses a hybrid approach but mostly utilizes content-based techniques for making suggestions, while using some initial user ratings as input for the "cold start" search. The system produces recommendations according to the user's needs and allows quick adjustments to the selection, without the need for expensive consultations with experts or spending many hours searching the Internet trying to find the right tool. The system itself performs an online search using some pre-cached data from an offline database, represented as an ontology of existing web analytics tools extracted during previous online searches.
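    The content-based core of such a recommender can be sketched in a few lines: each tool is described by a feature vector drawn from the ontology, a user profile is built from the few seed ratings that address the cold start, and the remaining tools are ranked by similarity to that profile. The feature names, tool entries and cosine weighting below are invented for illustration and do not reproduce WARES's ontology or algorithm.

    import math

    # Toy "ontology": each tool is described by binary features.
    TOOLS = {
        "ToolA": {"free": 1, "dashboards": 1, "realtime": 0, "enterprise": 0},
        "ToolB": {"free": 0, "dashboards": 1, "realtime": 1, "enterprise": 1},
        "ToolC": {"free": 1, "dashboards": 0, "realtime": 1, "enterprise": 0},
    }
    FEATURES = ["free", "dashboards", "realtime", "enterprise"]

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv) if nu and nv else 0.0

    def recommend(initial_ratings):
        """Build a user profile from a few seed ratings (the cold-start input)
        and rank the remaining tools by similarity to that profile."""
        profile = [sum(r * TOOLS[t][f] for t, r in initial_ratings.items()) for f in FEATURES]
        candidates = [t for t in TOOLS if t not in initial_ratings]
        vector = lambda t: [TOOLS[t][f] for f in FEATURES]
        return sorted(candidates, key=lambda t: cosine(profile, vector(t)), reverse=True)

    print(recommend({"ToolA": 1.0}))  # a user who liked a free dashboard tool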