21 research outputs found

Named entity recognition for sensitive data discovery in Portuguese

Authors: Boné J., Dias M., Ferreira J., Maia R., Ribeiro R.
Publication venue: MDPI AG
Publication date: 01/01/2020

Protecting sensitive data is becoming increasingly important, especially as a result of the directives and laws imposed by the European Union. The effort to create automatic systems is continuous, but in most cases the processes behind them are still manual or semi-automatic. In this work, we have developed a component that can extract and classify sensitive data from unstructured text in European Portuguese. The objective was to create a system that allows organizations to understand their data and comply with legal and security requirements. We studied a hybrid approach to the problem of Named Entity Recognition for the Portuguese language. This approach combines several techniques: rule-based and lexical-based models, machine learning algorithms, and neural networks. The rule-based and lexical-based approaches were used only for a specific set of classes. For the remaining entity classes, two statistical models were tested (Conditional Random Fields and Random Forest) and, finally, a Bidirectional-LSTM approach was tried. Of the statistical models, Conditional Random Fields obtained the best results, with an F1-score of 65.50%. With the Bi-LSTM approach, we achieved a result of 83.01%. The corpora used for training and testing were the HAREM Golden Collection, the SIGARRA News Corpus, and the DataSense NER Corpus.

Repositório Institucional do ISCTE-IUL

Analyzing and Visualizing Twitter Streams based on Trending Hashtags

Author: Kaschura Manuel
Publication venue: Karlsruher Institut für Technologie
Publication date: 22/12/2020

KITopen

Main tasks of automatic text processing and approaches to solving them [Основные задачи автоматической обработки текстов и подходы к их решению]

Author: Рубашко Н. К.
Publication venue: БГУ

Section 2. Intelligent information systems. This article analyzes the main approaches to solving the automatic text processing tasks that arise when building high-tech intelligent systems that replace human labor in intellectual domains relying on natural language.

On the usefulness of linguistic knowledge in different application areas of language technology [Kielellisen tiedon hyödyllisyydestä kieliteknologian eri sovellusalueilla]

Author: Väyrynen Pertti
Publication venue: Informaatiotutkimuksen yhdistys ry
Publication date: 01/01/2002

Journal.fi

National Library of Finland DSpace Services

On the usefulness of linguistic knowledge in different application areas of language technology [Kielellisen tiedon hyödyllisyydestä kieliteknologian eri sovellusalueilla]

Author: Pertti Väyrynen
Publication venue: Informaatiotutkimuksen yhdistys ITY ry
Publication date: 01/12/2008

Directory of Open Access Journals

Synthesis of CVs Using a Context-free Grammar

Authors: Ade-Ibijola Abejide, Semusemu Darren Tafadzwa
Publication venue: ASTES Journal
Publication date: 01/01/2021

Abstract: Please refer to full text to view abstract.

University of Johannesburg Institutional Repository

Maintaining consistency between business process diagrams and textual documentation using the EPSILON model management platform

Author: van der Molen T.T.G.P.
Publication date: 01/01/2011

Repository TU/e

Pure OAI Repository

Theory and applications in information extraction from unstructured text

Author: Wu Tianhao
Publication venue: Lehigh Preserve

Lehigh University: Lehigh Preserve

Syntactic Generation of Research Thesis Sketches Across Disciplines Using Formal Grammars

Authors: Abejide Ade-Ibijola, Ismail Babajide Adewumi
Publication venue: Asosiasi Perguruan Tinggi Informatika dan Komputer (APTIKOM) Sumsel
Publication date: 01/05/2023

As part of the prerequisites for a degree at higher education institutions, postgraduate students normally carry out research and report it in a thesis or dissertation. Studies have shown that students in all disciplines struggle to write a research thesis because they do not fully understand what constitutes one. This project proposes the syntactic generation of research thesis sketches across disciplines using formal grammars. Sketching is a synthesis technique that lets users convey high-level intuitions about a synthesis problem while leaving low-level details to synthesis tools. This work extends sketching to the generation of research thesis documents. Context-free grammar rules were designed and implemented for this task. A link to 10,000 generated thesis sketches is provided.

Directory of Open Access Journals