Search CORE

10,080 research outputs found

Many faces, many places (Term21)

Author: Carvalho Sara
Costa Rute
Khan Anas Fahad
Ostroski Anic Ana
Publication venue: ELRA
Publication date: 01/01/2022

UIDB/03213/2020 UIDP/03213/2020

Repositório da Universidade Nova de Lisboa

Many faces, many places (Term21)

Author: Carvalho Sara
Costa Rute
Khan Fahad
Ostroski Anic Ana
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2022

UIDB/03213/2020 UIDP/03213/2020
Proceedings of the LREC 2022 Workshop Language Resources and Evaluation Conference

Repositório da Universidade Nova de Lisboa

Using Linguistic Information and Machine Learning Techniques to Identify Entities from Juridical Documents

Author: A. Tikhonov
C. Cortes
D. Wilkins
E. Bick
E. Schweighofer
F. Borges
G. Salton
J. Cowie
J. Shawe-Taylor
J. Zeleznikow
N. Chomsky
P. Quaresma
S. Brüninghaus
T. Gonçalves
T. Joachims
V. Vapnik
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Information extraction from legal documents is an important and open problem. This paper describes a mixed approach that combines linguistic information and machine learning techniques. In this approach, top-level legal concepts are identified and used for document classification with Support Vector Machines. Named entities, such as locations, organizations, dates and document references, are identified using semantic information from the output of a natural language parser. This information (legal concepts and named entities) may be used to populate a simple ontology, allowing the enrichment of documents and the creation of high-level legal information retrieval systems. The proposed methodology was applied to, and evaluated on, a corpus of legal documents from the EUR-Lex site. The results obtained were good and indicate that this may be a promising approach to the legal information extraction problem.
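The abstract names Support Vector Machines for classifying documents by top-level legal concept but gives no training details. As a minimal, dependency-free sketch, the code that follows trains a linear SVM with the Pegasos stochastic sub-gradient method on binary bag-of-words features, treating one legal concept as a yes/no decision. The feature encoding, the regularisation constant `lam`, the toy documents and all function names are hypothetical; the paper's actual features and SVM implementation may differ.

```python
import random


def features(text):
    """Hypothetical encoding: binary bag-of-words plus a constant bias feature."""
    feats = {tok: 1.0 for tok in text.lower().split()}
    feats["__bias__"] = 1.0
    return feats


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_pegasos(docs, labels, lam=0.01, iterations=2000, seed=0):
    """Linear SVM trained with Pegasos (hinge loss + L2 regularisation).

    labels are +1 (document belongs to the legal concept) or -1 (it does not).
    """
    rng = random.Random(seed)
    data = [(features(d), y) for d, y in zip(docs, labels)]
    w = {}
    for t in range(1, iterations + 1):
        x, y = rng.choice(data)
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = y * dot(w, x)  # computed with the current weights
        # Regulariser gradient step: shrink every weight.
        scale = 1.0 - eta * lam
        for f in w:
            w[f] *= scale
        # Hinge-loss sub-gradient step, only when the margin is violated.
        if margin < 1.0:
            for f, v in x.items():
                w[f] = w.get(f, 0.0) + eta * y * v
    return w


def predict(w, text):
    return 1 if dot(w, features(text)) >= 0 else -1


# Toy example: "taxation" (+1) versus "fisheries" (-1) documents.
docs = ["customs duty tariff", "tax duty import", "tariff tax customs",
        "fish vessel quota", "catch quota fish", "vessel catch fishing"]
labels = [1, 1, 1, -1, -1, -1]
weights = train_pegasos(docs, labels)
print(predict(weights, "customs tax"), predict(weights, "fish catch"))
```

With several concepts, one such binary classifier per concept (one-vs-rest) is a common arrangement; whether the authors used it is not stated in the abstract.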

Crossref

Repositório Científico da Universidade de Évora

Extending the Galician Wordnet Using a Multilingual Bible Through Lexical Alignment and Semantic Annotation

Publication venue: OASIcs - OpenAccess Series in Informatics. 7th Symposium on Languages, Applications and Technologies (SLATE 2018)
Publication date: 01/01/2018

In this paper we describe the methodology and evaluation of the expansion of Galnet, the Galician wordnet, using a multilingual Bible through lexical alignment and semantic annotation. For this experiment we used the Galician, Portuguese, Spanish, Catalan and English versions of the Bible, which were annotated with part-of-speech tags and WordNet senses using FreeLing. The resulting synsets were aligned, and new variants for the Galician language were extracted. Manual evaluation showed that the approach achieves an accuracy of 96.8%.
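The abstract does not spell out how aligned synsets become new Galician variants. A minimal sketch, assuming the Bible versions are already verse-aligned and sense-annotated (for instance by FreeLing), and assuming a Galician lemma is proposed as a new variant of a synset when that synset also occurs in the same verse in at least `min_support` other versions and the lemma is not yet in Galnet. The function name, data layout, synset identifiers and threshold are hypothetical, not the authors' procedure.

```python
from collections import Counter


def propose_variants(verses, galnet, min_support=2):
    """Hypothetical candidate extraction for new Galnet variants.

    verses: list of dicts, one per aligned verse, mapping a language code
            ("gl", "pt", "es", "ca", "en") to a list of (lemma, synset_id) pairs.
    galnet: dict mapping synset_id -> set of Galician variants already in Galnet.
    Returns a Counter of (synset_id, lemma) -> number of supporting verses.
    """
    candidates = Counter()
    for verse in verses:
        others = [lang for lang in verse if lang != "gl"]
        for lemma, synset in verse.get("gl", []):
            if lemma in galnet.get(synset, set()):
                continue  # already a known variant of this synset
            # Count how many other versions use the same synset in this verse.
            support = sum(
                any(s == synset for _, s in verse[lang]) for lang in others
            )
            if support >= min_support:
                candidates[(synset, lemma)] += 1
    return candidates


verse = {
    "gl": [("pastor", "s1"), ("ovella", "s2")],
    "pt": [("pastor", "s1")],
    "es": [("pastor", "s1"), ("oveja", "s2")],
    "en": [("shepherd", "s1")],
}
print(propose_variants([verse], galnet={"s2": {"ovella"}}))
```

Counting supporting verses gives a simple ranking for the manual evaluation step the abstract mentions.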

Dagstuhl Research Online Publication Server

Knowledge Representation of Crime-Related Events: a Preliminary Approach

Author: Nogueira Vitor Beires
Publication venue: OASIcs - OpenAccess Series in Informatics. 8th Symposium on Languages, Applications and Technologies (SLATE 2019)
Publication date: 01/01/2019

Crime is reported in every daily newspaper and, in particular, in criminal investigation reports produced by police departments, creating a volume of data that must be processed by humans. Research on relation extraction (a branch of information extraction) for Portuguese has emerged over the years, but it extracts few relations and relies on a variety of computational approaches that could be improved with more recent features to achieve better performance. This paper presents ongoing work on populating the SEM (Simple Event Model) ontology with instances retrieved from crime-related documents, supported by an SVO (Subject, Verb, Object) algorithm that uses hand-crafted rules to extract events and achieves an F-measure of 0.86.
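The paper's hand-crafted rules are not given in the abstract. A minimal sketch of one plausible SVO rule, assuming sentences have already been parsed into Universal Dependencies-style tokens (`nsubj` and `obj` relations attached to a `VERB`), together with the F-measure used to score extracted triples against a gold standard. The `Token` layout, function names and example sentence are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    idx: int      # 1-based position in the sentence
    lemma: str
    upos: str     # Universal POS tag, e.g. "VERB"
    deprel: str   # dependency relation, e.g. "nsubj", "obj"
    head: int     # idx of the governing token (0 = root)


Triple = Tuple[str, str, str]


def extract_svo(sentence: List[Token]) -> List[Triple]:
    """Hypothetical rule: pair each nsubj dependent of a verb with each obj dependent."""
    triples = []
    for verb in sentence:
        if verb.upos != "VERB":
            continue
        subjects = [t.lemma for t in sentence if t.head == verb.idx and t.deprel == "nsubj"]
        objects = [t.lemma for t in sentence if t.head == verb.idx and t.deprel == "obj"]
        triples.extend((s, verb.lemma, o) for s in subjects for o in objects)
    return triples


def f_measure(predicted, gold):
    """Balanced F-measure (F1) over sets of extracted triples."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


# "O suspeito assaltou a farmácia" ("The suspect robbed the pharmacy"), lemmatised.
sentence = [
    Token(1, "o", "DET", "det", 2),
    Token(2, "suspeito", "NOUN", "nsubj", 3),
    Token(3, "assaltar", "VERB", "root", 0),
    Token(4, "o", "DET", "det", 5),
    Token(5, "farmácia", "NOUN", "obj", 3),
]
print(extract_svo(sentence))
```

Each extracted triple could then become an event instance in the ontology (for example, the verb as the event type and the subject as an actor), though the exact SEM mapping is the authors' and is not reproduced here.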

Dagstuhl Research Online Publication Server

Communicating (in) wine tourism: what are the paths for harmonising the sector and the Translation Process?

Author: Galanes Santos Iolanda
Moreira Silva Manuel
Pataco Teresa
Publication venue: Coimbra University Press
Publication date: 01/01/2023

Wine tourism is an emerging area of specialisation in which several areas of knowledge (marketing, economics, anthropology, viticulture, etc.) converge. Portugal's wine culture has a long tradition and is internationally recognised, placing it at the forefront of economic, professional and academic initiatives in this sector. Communication among specialists, and between specialists and national and international wine tourists, requires an international terminology that both respects tradition and favours inclusive, efficient and competitive trade exchanges. Our research aims to contribute to the terminological harmonisation of Portuguese wine tourism, even though no ISO/IPQ standards have been issued in this emerging transdisciplinary area. In this article, two comparable academic sub-corpora (10 theses) on wine tourism are analysed. They were compared with the Sketch Engine software, which, in addition to corpus management, can extract terms, identify keywords and represent their conceptual organisation. Our methodological approach included an analysis of the 50 most relevant terms in each corpus. Ten case studies taken from the corpora highlight the diversity of terminogenic patterns in each language, the influence of cultural factors on the specialised wine tourism terminology of both languages and, lastly, the influence of English on Portuguese wine tourism terminology. These results should be taken into account when proposing harmonised terminologies and when translating specialised wine tourism discourse.
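The abstract relies on Sketch Engine to identify keywords but does not describe the scoring. Sketch Engine documents its keyword score as Kilgarriff's "simple maths": the ratio of frequencies per million tokens in the focus and reference corpora, each smoothed by adding a constant `n` (1 by default). A minimal sketch of that score for single-word keywords, assuming pre-tokenised input; it does not reproduce Sketch Engine's multi-word term extraction, and the function names and toy tokens are hypothetical.

```python
from collections import Counter


def per_million(counts, total):
    """Normalise raw counts to frequencies per million tokens."""
    return {w: c * 1_000_000 / total for w, c in counts.items()}


def simple_maths_keywords(focus_tokens, reference_tokens, n=1.0, top=50):
    """Rank focus-corpus words by (fpm_focus + n) / (fpm_reference + n)."""
    fpm_focus = per_million(Counter(focus_tokens), len(focus_tokens))
    fpm_ref = per_million(Counter(reference_tokens), len(reference_tokens))
    scores = {
        w: (fpm_focus[w] + n) / (fpm_ref.get(w, 0.0) + n) for w in fpm_focus
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top]


focus = ["enoturismo"] * 5 + ["vinho"] * 5
reference = ["vinho"] * 5 + ["turismo"] * 5
print(simple_maths_keywords(focus, reference, top=2))
```

Raising `n` shifts the ranking towards more frequent words; lowering it favours rare words absent from the reference corpus.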

Repositório Científico do Instituto Politécnico do Porto

Privacy in text documents

Author: Dias M.
Ferreira J. C.
Maia R.
Ribeiro R.
Santos P.
Publication venue: International Business Information Management Association, IBIMA
Publication date: 01/01/2019

Sensitive data preservation is currently a manual or semi-automatic process. It suffers from several problems that affect the handling of confidential, sensitive and personal information: identifying sensitive data in documents requires human intervention, which is costly and error-prone, and in large-scale document collections an approach that depends on human expertise to identify and relate sensitive data is not feasible. DataSense will be highly exportable software that will enable organizations to identify and understand the sensitive data in their possession within unstructured text (digital documents), in order to meet legal, compliance and security requirements. The goal is to identify and classify sensitive data (Personal Data) in large-scale structured and unstructured information in a way that allows entities and organizations to understand it without compromising security or confidentiality. The DataSense project will be based on European Portuguese text documents and will combine different NLP (Natural Language Processing) approaches with advances in machine learning, such as Named Entity Recognition, Disambiguation, Co-referencing (ARE), and Automatic Learning and Human Feedback. It will also help organizations comply with regulations such as the GDPR (General Data Protection Regulation), which governs data protection in the European Union.
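DataSense is described as relying on NER and machine learning; the sketch that follows is not that system but a deliberately simple rule-based stand-in, showing how pattern matching plus validation can flag and redact two kinds of personal data in European Portuguese text: e-mail addresses and Portuguese tax numbers (NIF), whose ninth digit is a modulo-11 check digit. The regular expressions, labels and function names are hypothetical; the NIF check here ignores any further constraints on the leading digit, and overlapping matches are not handled.

```python
import re

# Hypothetical patterns for two categories of personal data.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
NIF_RE = re.compile(r"\b\d{9}\b")


def valid_nif(nif):
    """Portuguese NIF check digit: weights 9..2 on the first eight digits, mod 11."""
    total = sum(int(d) * w for d, w in zip(nif[:8], range(9, 1, -1)))
    check = 11 - total % 11
    if check >= 10:  # remainders 0 and 1 map to check digit 0
        check = 0
    return check == int(nif[8])


def find_sensitive(text):
    """Return sorted (start, end, label) spans of detected personal data."""
    spans = [(m.start(), m.end(), "EMAIL") for m in EMAIL_RE.finditer(text)]
    spans += [
        (m.start(), m.end(), "NIF")
        for m in NIF_RE.finditer(text)
        if valid_nif(m.group())  # skip 9-digit numbers with a wrong check digit
    ]
    return sorted(spans)


def redact(text):
    """Replace each detected span with its label in square brackets."""
    out, last = [], 0
    for start, end, label in find_sensitive(text):
        out.append(text[last:start])
        out.append(f"[{label}]")
        last = end
    out.append(text[last:])
    return "".join(out)


print(redact("Contacto: maria.silva@exemplo.pt, NIF 123456789, ref 123456780."))
```

The checksum step illustrates why validation matters for precision: a nine-digit reference number that fails the check is left untouched rather than flagged as personal data.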

Repositório Institucional do ISCTE-IUL

Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon

Author: Ferrández Sergio
Monachini Monica
Muñoz Rafael
Toral Antonio
Publication venue: Springer Science and Business Media LLC
Publication date: 17/06/2011

This paper proposes to advance the current state of the art in automatic Language Resource (LR) building by taking three elements into consideration: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged with Web 2.0 and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for different languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) is extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses interoperability, an important problem currently affecting Computational Linguistics, by using the ISO LMF standard to encode this lexicon. The different steps of the procedure (mapping, disambiguation, extraction, NE identification and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, to check the usefulness of the constructed resource, we apply it to a state-of-the-art Question Answering system and evaluate its impact: the NE lexicon improves the system's accuracy by 28.1%. Compared to previous approaches to building NE repositories, the current proposal represents a step forward in terms of automation, language independence, number of NEs acquired and richness of the information represented.
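The abstract says the NE lexicon is encoded with the ISO LMF standard but does not show the serialisation. A minimal sketch of how one NE entry might be written as XML, loosely following LMF's convention of expressing properties as `feat` elements with `att`/`val` attributes. The element layout, the `instanceOf` and `ontologyClass` attribute names, the identifier scheme and the synset/SUMO identifiers are all hypothetical, not the schema used by the authors.

```python
import xml.etree.ElementTree as ET


def feat(parent, att, val):
    """Add an LMF-style <feat att="..." val="..."/> child."""
    ET.SubElement(parent, "feat", att=att, val=val)


def build_lexicon(language, entries):
    """Build a small LMF-like NE lexicon.

    entries: list of (written_form, synset_id, ontology_class) tuples,
             all hypothetical fields for illustration.
    """
    resource = ET.Element("LexicalResource")
    lexicon = ET.SubElement(resource, "Lexicon")
    feat(lexicon, "language", language)
    for i, (form, synset, onto) in enumerate(entries, start=1):
        entry = ET.SubElement(lexicon, "LexicalEntry", id=f"{language}_ne_{i}")
        feat(entry, "partOfSpeech", "properNoun")
        lemma = ET.SubElement(entry, "Lemma")
        feat(lemma, "writtenForm", form)
        sense = ET.SubElement(entry, "Sense", id=f"{language}_ne_{i}_s1")
        feat(sense, "instanceOf", synset)   # hypothetical link to a WordNet synset
        feat(sense, "ontologyClass", onto)  # hypothetical link to a SUMO class
    return resource


root = build_lexicon("en", [("Lisbon", "wn:city.n.01", "sumo:City")])
print(ET.tostring(root, encoding="unicode"))
```

Keeping the links to the source wordnet and the ontology at sense level lets the same written form carry several senses, which matters for ambiguous entity names.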

DCU Online Research Access Service

Natural language processing

Author: Adams
Amsler
Bangalore
Barker
Benoît
Bian
Bondale
Carrick
Ceric
Chandrasekar
Chang
Charniak
Chen
Chowdhury
Costantino
Cowie
Craven
Dogru
Evans
Feldman
Fernandez
Gaizauskas
Glasgow
Haas
Hayes
Hedlund
Herath
Ide
Isahara
Jelinek
Jeong
Jurafsky
Kazakov
Kehler
Khoo
Kim
King
Lange
Lee
Lehmam
Lehtokangas
Lewis
Liddy
Lovis
Ma
Magnini
Mani
Manning
Marquez
Martinez
McMurchie
Meyer
Mihalcea
Mock
Moens
Morin
Narita
Nerbonne
Oard
Ogura
Oudet
Owei
Paris
Pasero
Pedersen
Perez-Carballo
Petreley
Pirkola
Poesio
Rosenfield
Roux
Say
Scarlett
Schenker
Silber
Smeaton
Smith
Sokol
Song
Sparck Jones
Staab
Stock
Tolle
Trybula
Tsuda
Vickery
Waldrop
Warner
Weigard
Wilks
Wong
Yang
Zadrozny
Zweigenbaum
Publication venue: Wiley
Publication date: 01/01/2003

Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

Crossref

University of Strathclyde Institutional Repository

OPUS - University of Technology Sydney