
    Terminological Methods in Lexicography: Conceptualising, Organising, and Encoding Terms in General Language Dictionaries

    General language dictionaries show inconsistencies in the uniformity and scientific rigour of their treatment of specialised lexicographic content. By analysing the presence and treatment of terms in general language dictionaries, we propose a more uniform and scientifically rigorous treatment of this content, also considering the need to compile and align future lexical resources according to interoperable standards. We start from the premise that the treatment of lexical items, whether lexical units (words in general) or terminological units (terms, i.e. words belonging to particular subject fields), must be differentiated, and we resort to terminological methods to treat the terms recorded in dictionaries. Our approach assumes that terminology – in its dual linguistic and conceptual dimension – and lexicography, as interdisciplinary domains, can be complementary. We therefore present theoretical objectives (improvement of the metalanguage and of lexicographic description based on terminological assumptions) and practical objectives (consistent representation of lexicographic data) that aim to facilitate the consistent organisation, description and modelling of lexicographic components, namely the hierarchisation of domain labels, which are markers identifying specialised lexicon. We also want to facilitate the drafting of definitions, which can be optimised and elaborated with greater scientific precision by following a terminological approach to the treatment of terms. We analysed the dictionaries developed by three academic institutions – the Academia das Ciências de Lisboa, the Real Academia Española and the Académie Française – which represent a valuable legacy of the European academic lexicographic tradition. The initial analysis includes an exhaustive survey and comparison of the domain labels used, a discussion of the options chosen, and a comparative study of the treatment of terms. We then developed a methodological proposal for the treatment of terms in general language dictionaries, exemplified with terms from two domains, GEOLOGY and FOOTBALL, taken from the 2001 edition of the dictionary of the Academia das Ciências de Lisboa. We revised the selected terms according to the terminological principles advocated, giving rise to revised or new specialised senses for the first digital edition of this dictionary. We represent and annotate the data using the TEI Lex-0 specifications, a TEI (Text Encoding Initiative) subset for encoding lexicographic data. We also highlight the importance of hierarchical domain labels instead of a flat list of domains, as they benefit data organisation, correspondence, and possible future alignments between different lexicographic resources.
    The investigation revealed that: a) the structural models of lexical resources are complex and contain information of diverse kinds; b) domain labels in general language dictionaries are flat, unbalanced, inconsistent and often outdated, and need to be organised hierarchically in order to structure specialised knowledge; c) the criteria adopted for marking terms and the formulae used in definitions are disparate; d) the treatment of terms is heterogeneous and formulated in different ways, so terminological methods can help lexicographers draft definitions; e) the application of interdisciplinary terminological and lexicographic methods, together with standards, is advantageous because it allows the construction of structured, conceptually organised, linguistically accurate and interoperable lexical databases. In short, we seek to contribute to the urgent task of solving the problems that affect the sharing, alignment and linking of lexicographic data.
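    As a rough illustration of the kind of encoding discussed above (a sketch, not the thesis's actual data), the snippet below builds one TEI Lex-0-style entry in Python and attaches a hierarchical domain label to a sense. The lemma, domain path and definition are invented, and the element names (entry, form, orth, gramGrp, pos, sense, usg, def) follow common TEI Lex-0 conventions rather than the dictionary's published encoding.

```python
# Minimal sketch of a TEI Lex-0-style entry with a hierarchical domain label.
# Element and attribute names follow common TEI Lex-0 conventions; the lemma,
# domain path and definition below are invented for illustration only.
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)

def tei(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"

def build_entry(lemma, pos, domain_path, definition):
    """Build one <entry> whose sense carries a hierarchical domain label."""
    entry = ET.Element(tei("entry"),
                       {"{http://www.w3.org/XML/1998/namespace}id": lemma})
    form = ET.SubElement(entry, tei("form"), {"type": "lemma"})
    ET.SubElement(form, tei("orth")).text = lemma
    gram = ET.SubElement(entry, tei("gramGrp"))
    ET.SubElement(gram, tei("pos")).text = pos
    sense = ET.SubElement(entry, tei("sense"))
    # Keep the broader and narrower domains together in one usage label,
    # e.g. "Sports > Football", so the hierarchy survives later alignment.
    ET.SubElement(sense, tei("usg"), {"type": "domain"}).text = " > ".join(domain_path)
    ET.SubElement(sense, tei("def")).text = definition
    return entry

entry = build_entry(
    lemma="golo",
    pos="noun",
    domain_path=["Sports", "Football"],
    definition="Point scored when the ball fully crosses the goal line between the posts.",
)
print(ET.tostring(entry, encoding="unicode"))
```

    Storing the full domain path in a single label is only one simple option; the same hierarchy could equally be modelled as references to an external domain taxonomy when resources need to be linked.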

    Machine learning to generate soil information

    This thesis is concerned with the novel use of machine learning (ML) methods in soil science research. ML adoption in soil science has increased considerably, especially in pedometrics (the use of quantitative methods to study the variation of soils). In parallel, the size of soil datasets has also increased, thanks to projects of global impact that aim to rescue legacy data and to new large-extent surveys that collect new information. While we now have large datasets and global projects, modelling is still mostly based on "traditional" ML approaches that do not take full advantage of these large data compilations. The compilation of such global datasets is severely limited by privacy concerns, and no solution has yet been implemented to facilitate the process. Considering the performance differences between the generality of global models and the specificity of local models, there is still debate about which approach is better. Whether global or local, most digital soil mapping (DSM) applications are static. Even with the large soil datasets available to date, there is not enough soil data to perform fully empirical space-time modelling. Considering these knowledge gaps, this thesis aims to introduce advanced ML algorithms and training techniques, specifically deep neural networks, for modelling large datasets at a global scale and to provide new soil information. The research presented here has been successful in applying the latest advances in ML to improve upon some of the current approaches to soil modelling with large datasets. It has also created opportunities to utilise information, such as descriptive data, that has generally been disregarded. ML methods have been embraced by the soil community and their adoption is increasing. In the particular case of neural networks, their flexibility in terms of structure and training makes them good candidates to improve on current soil modelling approaches.
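    As a toy illustration of the kind of modelling the abstract refers to (a sketch on synthetic data, not the author's models, covariates or results), the snippet below fits a small feed-forward neural network that regresses a continuous soil property on made-up environmental covariates; the covariate names and the target function are assumptions for illustration.

```python
# Toy sketch of neural-network-based soil property prediction on synthetic data
# (not the thesis's actual models or datasets): a small feed-forward network
# regressing a continuous property, e.g. organic carbon, on covariates.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic covariates standing in for elevation, slope, temperature,
# precipitation and NDVI at 2000 sampling locations.
n = 2000
X = rng.normal(size=(n, 5))
# Synthetic target: a noisy nonlinear function standing in for a soil property.
y = (2.0 * np.tanh(X[:, 0]) - 1.5 * X[:, 1] * X[:, 2]
     + 0.5 * X[:, 3] + rng.normal(scale=0.3, size=n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Standardise covariates, then fit a small multi-layer perceptron.
scaler = StandardScaler().fit(X_train)
model = MLPRegressor(hidden_layer_sizes=(64, 64, 32), activation="relu",
                     learning_rate_init=1e-3, max_iter=500, random_state=0)
model.fit(scaler.transform(X_train), y_train)

print("R^2 on held-out data:", round(model.score(scaler.transform(X_test), y_test), 3))
```

    In practice the thesis targets much larger, global datasets and deeper architectures; the sketch only shows the basic train/validate workflow that such models share.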