96 research outputs found
DEXTER: A workbench for automatic term extraction with specialized corpora
[EN] Automatic term extraction has become a priority area of research within corpus processing. Despite the extensive literature in this field, there are still outstanding issues to be dealt with during the construction of term extractors, particularly those intended to support research in terminology and terminography. This article describes the design and development of DEXTER, an online workbench for the extraction of simple and complex terms from domain-specific corpora in English, French, Italian and Spanish. In this framework, three features help bring the most important terms to the foreground. First, unlike the elaborate morphosyntactic patterns proposed by most previous research, shallow lexical filters are used to discard term candidates. Second, a large number of common stopwords are detected automatically by a method that relies on the IATE database together with the frequency distributions of the domain-specific corpus and a general corpus. Third, the term-ranking metric, grounded in the notions of salience, relevance and cohesion, is guided by the IATE database to display an adequate distribution of terms.
Financial support for this research was provided by the DGI, Spanish Ministry of Education and Science, grant FFI2014-53788-C3-1-P.
Periñán-Pascual, C. (2018). DEXTER: A workbench for automatic term extraction with specialized corpora. Natural Language Engineering, 24(2), 163-198. https://doi.org/10.1017/S1351324917000365
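The contrast between domain-specific and general-corpus frequency distributions mentioned above can be sketched in a few lines. This is an illustrative "weirdness"-style ratio with a stopword filter, not DEXTER's actual salience/relevance/cohesion metric:

```python
# Illustrative sketch (not DEXTER's actual metric): rank term candidates
# by comparing their relative frequency in a domain-specific corpus
# against a general reference corpus, after discarding stopwords.
from collections import Counter

def rank_candidates(domain_tokens, general_tokens, stopwords):
    domain = Counter(t.lower() for t in domain_tokens)
    general = Counter(t.lower() for t in general_tokens)
    d_total = sum(domain.values())
    g_total = sum(general.values())
    scores = {}
    for term, freq in domain.items():
        if term in stopwords:
            continue  # shallow filter: drop known stopwords
        rel_domain = freq / d_total
        # add-one smoothing so terms unseen in the general corpus score high
        rel_general = (general[term] + 1) / (g_total + 1)
        scores[term] = rel_domain / rel_general
    return sorted(scores, key=scores.get, reverse=True)
```

Terms that are frequent in the specialized corpus but rare in the general corpus rise to the top, which is the basic intuition behind most termhood measures.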
Automatic rule verification for digital building permits
Master's dissertation in Building Information Modelling (BIM A+)
The construction sector is facing major changes in client and market requirements, pushing towards
the digital transformation and a data-driven industry. Governments have taken an active part in this
change by supporting the digitalization of processes such as building permitting and by introducing
the use of building information models (BIM). Research on the digitalization of the building permit
has shown great advancements regarding rule extraction in interpretable ways and the automation
of verification; however, the reconciliation of the building model's semantic definitions with the
concepts defined in the regulations is still under discussion. Moreover, validating the correctness of
the information included in building models against the regulation definitions is important to
guarantee quality throughout the digital building permit process.
This dissertation proposes a hybrid workflow to check both the information extracted explicitly from
the BIM model and the information implicitly derived from relationships between elements, following
the provisions contained in the regulations in the context of Portugal. Based on a context and
literature review, a process re-engineering was proposed, and Python code was developed using the
IfcOpenShell library to support the automation of the verification process, traditionally carried out by
technicians in building permit offices. The elements developed in this work were validated in a
case study, demonstrating that hybrid validation can help detect modelling errors and improve the
certainty of correctness of information during the initial submission of models for a building permit
process.
The results indicate that automated validation of the model against regulation
definitions can be introduced to improve the degree of certainty about the quality of the information
contained in the Building Information Model; moreover, the proposed methods that produce results
from implicit information can extend the capabilities of the IFC schema. However, the scripts developed
in this work are still under constant review and development and have limited applicability in
relation to certain IFC classes.
Erasmus Mundus Joint Master Degree Programme – ERASMUS
Advanced fuzzy matching in the translation of EU texts
In the translation industry today, CAT tool environments are an indispensable part of the translator's workflow. Translation memory (TM) systems are one of the most important features of these tools, and the question of how best to use them to make the translation process faster and more efficient legitimately arises. This research examines whether there are more efficient methods of retrieving potentially useful translation suggestions than those currently used in TM systems. We are especially interested in whether more sophisticated algorithms and the inclusion of linguistic features in the matching process lead to a significant improvement in the quality of the retrieved matches. The dataset used, the DGT-TM, is pre-processed and parsed, and a number of matching configurations are applied to the data structures contained in the produced parse trees. We also try to improve the matching by combining the individual metrics using a regression algorithm. The retrieved matches are then evaluated both automatically, based on correlations and mean scores, and by human evaluation, based on correlations of the derived ranks and scores. Ultimately, the goal is to determine whether some of these fuzzy matching metrics should be implemented in commercial CAT tools to improve the translation process.
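The baseline being improved upon can be illustrated with a minimal fuzzy-match retrieval sketch. This uses a character-based similarity ratio from the standard library as a stand-in for the edit-distance-style matching that TM systems typically apply; the threshold value is an assumption:

```python
# Minimal sketch of fuzzy-match retrieval from a translation memory:
# score every stored source segment against the query and return those
# above a (assumed) fuzzy-match threshold, best first.
from difflib import SequenceMatcher

def best_matches(query, tm_segments, threshold=0.7):
    """Return (segment, score) pairs at or above the threshold."""
    scored = [(seg, SequenceMatcher(None, query, seg).ratio())
              for seg in tm_segments]
    return sorted([(s, round(r, 2)) for s, r in scored if r >= threshold],
                  key=lambda p: p[1], reverse=True)
```

The research described above asks whether replacing such surface-level metrics with syntactically informed ones (computed over parse trees) retrieves more useful suggestions.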
Flexibility in Data Management
With the ongoing expansion of information technology, new fields of application requiring data management emerge virtually every day. In our knowledge culture increasing amounts of data and work force organized in more creativity-oriented ways also radically change traditional fields of application and question established assumptions about data management. For instance, investigative analytics and agile software development move towards a very agile and flexible handling of data. As the primary facilitators of data management, database systems have to reflect and support these developments. However, traditional database management technology, in particular relational database systems, is built on assumptions of relatively stable application domains. The need to model all data up front in a prescriptive database schema earned relational database management systems the reputation among developers of being inflexible, dated, and cumbersome to work with. Nevertheless, relational systems still dominate the database market. They are a proven, standardized, and interoperable technology, well-known in IT departments with a work force of experienced and trained developers and administrators.
This thesis aims at resolving the growing contradiction between the popularity and omnipresence of relational systems in companies and their increasingly bad reputation among developers. It adapts relational database technology towards more agility and flexibility. We envision a descriptive schema-comes-second relational database system, which is entity-oriented instead of schema-oriented; descriptive rather than prescriptive. The thesis provides four main contributions: (1) a flexible relational data model, which frees relational data management from having a prescriptive schema; (2) autonomous physical entity domains, which partition self-descriptive data according to their schema properties for better query performance; (3) a freely adjustable storage engine, which allows adapting the physical data layout to the properties of the data and of the workload; and (4) a self-managed indexing infrastructure, which autonomously collects and adapts index information under dynamic workloads and evolving schemas. The flexible relational data model is the thesis' central contribution. It describes the functional appearance of the descriptive schema-comes-second relational database system. The other three contributions improve components in the architecture of database management systems to increase the query performance and manageability of descriptive schema-comes-second relational database systems. We are confident that these four contributions can help pave the way to a more flexible future for relational database management technology.
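The "schema-comes-second" idea can be illustrated with a toy sketch (this is an illustration of the concept, not the thesis's actual data model or storage engine): entities are stored self-descriptively, and a descriptive schema is derived from the data afterwards rather than prescribed up front.

```python
# Toy illustration of a descriptive, schema-comes-second store:
# inserts are entity-oriented and unconstrained; the schema is a
# description derived from the data, not a prescription enforced on it.
from collections import defaultdict

class FlexibleTable:
    def __init__(self):
        self.entities = []

    def insert(self, **attributes):
        self.entities.append(attributes)  # no schema enforced on write

    def describe_schema(self):
        """Derive a descriptive schema: attribute -> set of value types."""
        schema = defaultdict(set)
        for entity in self.entities:
            for attr, value in entity.items():
                schema[attr].add(type(value).__name__)
        return dict(schema)
```

In a real system the derived schema information would additionally drive physical design decisions, which is what contributions (2)-(4) above address.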
Learning object metadata surrogates in search result interfaces: user evaluation, design and content
The purpose of this research was to evaluate user interaction with learning object metadata surrogates, both in terms of content and presentation. The main objectives of this study were: (1) to review the literature on learning object metadata and user-centred evaluation of metadata surrogates in the context of cognitive information retrieval (including user-centred relevance and usability research); (2) to develop a framework for the evaluation of user interaction with learning object metadata surrogates in search result interfaces; (3) to investigate the usability of metadata surrogates in the search result interfaces of learning object repositories (LORs) in terms of various presentation aspects (such as amount of information, structure and highlighting of query terms) as a means of facilitating the user relevance judgement process; (4) to investigate in depth the type of content that should be included in learning object metadata surrogates in order to facilitate the process of relevance judgement; (5) to provide a set of recommendations and guidelines for the design of learning object metadata surrogates in search result interfaces, both in terms of content and presentation. [Continues.]
“WARES”, a Web Analytics Recommender System
It is hard to imagine modern business without analytics; it is a trend in modern business, and even small companies and individual entrepreneurs are starting to use analytics tools, in one way or another, for their business. Not surprisingly, there are many different tools for different domains; they vary in purpose from simple friend and visit statistics for your Facebook page to big, sophisticated systems designed for large corporations, and they can be free or paid. Sometimes you need to pass special training, be a certified specialist, or even have a degree to be able to use an analytics tool; other tools offer a simple user interface with dashboards for easy understanding and availability to everyone who sees them for the first time. This work is devoted to web analytics tools. In any case, everyone who is thinking about using analytics for his or her own needs faces a question: "what tool should I use, which one suits my needs, and how can I pay less and get maximum gain?" In this work, I try to answer this question by proposing a recommender tool that will help the user with this "simple" task. This paper is devoted to the creation of WARES, short for Web Analytics REcommender System.
The proposed recommender system uses a hybrid approach but mostly utilizes content-based techniques for making suggestions, while using some of the user's ratings as input for the "cold start" search. The system produces recommendations according to the user's needs, allowing quick adjustments to the selection without the need for expensive consultations with experts or spending many hours searching the Internet trying to find the right tool. The system itself performs an online search using some pre-cached data in an offline database, represented as an ontology of existing web analytics tools extracted during the previous online search.
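The content-based core of such a recommender can be sketched briefly. The tool names and features below are invented for illustration, and the scoring is a deliberately simple feature-overlap profile rather than WARES's actual hybrid method:

```python
# Minimal content-based recommendation sketch (tool names and features
# are hypothetical): the user's initial ratings build a feature profile,
# and unrated tools are ranked by how well they match that profile.
def recommend(tools, ratings):
    """tools: name -> feature set; ratings: name -> score in [0, 1]."""
    profile = {}
    for name, score in ratings.items():
        for feature in tools[name]:
            profile[feature] = profile.get(feature, 0.0) + score
    def match(name):
        return sum(profile.get(f, 0.0) for f in tools[name])
    unrated = [n for n in tools if n not in ratings]
    return sorted(unrated, key=match, reverse=True)
```

Seeding the profile from a handful of initial ratings is one common way to soften the cold-start problem mentioned above.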