Rembrandt - a named-entity recognition framework
Rembrandt is a named entity recognition system designed to annotate documents by classifying named entities and grounding them to unique identifiers. Rembrandt played an important role in our research on geographic IR, and has since evolved into a more capable framework in which documents can be annotated, manually curated and indexed. The goal of this paper is to present Rembrandt's simple but powerful annotation framework to the NLP community.
Generating Rembrandt: Artificial Intelligence, Copyright, and Accountability in the 3A Era--The Human-like Authors are Already Here- A New Model
Artificial intelligence (AI) systems are creative, unpredictable, independent, autonomous, rational, evolving, capable of data collection, communicative, efficient, accurate, and have free choice among alternatives. Similar to humans, AI systems can autonomously create and generate creative works. The use of AI systems in the production of works, either for personal or manufacturing purposes, has become common in the 3A era of automated, autonomous, and advanced technology. Despite this progress, there is a deep and common concern in modern society that AI technology will become uncontrollable. There is therefore a call for social and legal tools for controlling AI systems' functions and outcomes. This Article addresses the questions of the copyrightability of artworks generated by AI systems: ownership and accountability. The Article debates who should enjoy the benefits of copyright protection and who should be responsible for the infringement of rights and damages caused by AI systems that independently produce creative works. Subsequently, this Article presents the AI Multi-Player paradigm, arguing against the imposition of these rights and responsibilities on the AI systems themselves or on the different stakeholders, mainly the programmers who develop such systems. Most importantly, this Article proposes the adoption of a new model of accountability for works generated by AI systems: the AI Work Made for Hire (WMFH) model, which views the AI system as a creative employee or independent contractor of the user. Under this proposed model, ownership, control, and responsibility would be imposed on the humans or legal entities that use AI systems and enjoy its benefits. This model accurately reflects the human-like features of AI systems; it is justified by the theories behind copyright protection; and it serves as a practical solution to assuage the fears behind AI systems.
In addition, this model unveils the powers behind the operation of AI systems; hence, it efficiently imposes accountability on clearly identifiable persons or legal entities. Since AI systems are copyrightable algorithms, this Article reflects on the accountability for AI systems in other legal regimes, such as tort or criminal law and in various industries using these systems.
Artequakt: Generating tailored biographies from automatically annotated fragments from the web
The Artequakt project seeks to automatically generate narrative biographies of artists from knowledge that has been extracted from the Web and maintained in a knowledge base. An overview of the system architecture is presented here and the three key components of that architecture are explained in detail, namely knowledge extraction, information management and biography construction. Conclusions are drawn from the initial experiences of the project and future progress is detailed.
Automatic extraction of knowledge from web documents
A large amount of the digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents, across multiple sources and in a timely fashion, is therefore crucial. This paper provides an update on the Artequakt system, which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper.
Knowledge extraction from minutes of Portuguese municipalities meetings
A very relevant problem in e-government is that a great amount
of knowledge is held in unstructured natural language documents. If
that knowledge were stored using a computer-processable representation,
it would be more easily accessed. In this paper we
present the architecture, modules and initial results of a prototype
under development for extracting information from government
documents. The prototype stores the information using
a formal representation of the set of concepts and the relationships
between those concepts, that is, an ontology. The system was
tested using minutes of Portuguese Municipal Boards meetings.
Initial results are presented for an important and frequent topic
of the minutes: the subsidies granted by municipalities.
Knowledge Representation of Crime-Related Events: a Preliminary Approach
Crime is reported in every daily newspaper, and particularly in criminal investigation reports produced by police departments, creating a large amount of data to be processed by humans. Other studies on relation extraction (a branch of information retrieval) for Portuguese have appeared over the years, but they extract few relations and rely on a variety of computational methods that could be improved with more recent features to achieve better performance.
This paper presents ongoing work on populating the SEM (Simple Event Model) ontology with instances retrieved from crime-related documents, supported by an SVO (Subject, Verb, Object) algorithm that uses hand-crafted rules to extract events, achieving an F-measure of 0.86.
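As a rough illustration of the idea in the abstract above, the sketch below applies a single hand-written SVO rule to tokens that are assumed to be already part-of-speech tagged, and computes an F-measure from match counts. The tag names, the rule itself, and the function names `extract_svo` and `f_measure` are assumptions made for this example; they are not taken from the paper, whose actual rules are not described in the abstract.

```python
from typing import List, Optional, Tuple

# (word, coarse POS tag); tags are assumed to come from an upstream tagger.
Token = Tuple[str, str]

NOMINAL_TAGS = {"NOUN", "PROPN"}


def extract_svo(tokens: List[Token]) -> Optional[Tuple[str, str, str]]:
    """Apply one illustrative rule: nearest nominal on each side of the first verb."""
    verb_idx = next((i for i, (_, tag) in enumerate(tokens) if tag == "VERB"), None)
    if verb_idx is None:
        return None
    # Subject: closest nominal to the left of the verb.
    subject = next(
        (word for word, tag in reversed(tokens[:verb_idx]) if tag in NOMINAL_TAGS), None
    )
    # Object: closest nominal to the right of the verb.
    obj = next(
        (word for word, tag in tokens[verb_idx + 1:] if tag in NOMINAL_TAGS), None
    )
    if subject is None or obj is None:
        return None
    return subject, tokens[verb_idx][0], obj


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (harmonic mean of precision and recall)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


sentence = [("The", "DET"), ("suspect", "NOUN"), ("stole", "VERB"),
            ("a", "DET"), ("car", "NOUN")]
event = extract_svo(sentence)
```

A real system would need many such rules (passives, coordination, clitics in Portuguese), so this single rule is only meant to show the shape of the approach.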
Discovery of sensitive data with natural language processing
Protecting sensitive data is becoming increasingly important,
especially as a result of the directives and laws imposed by the European Union. The effort
to create automatic systems is continuous, but in most cases the processes behind them are
still manual or semi-automatic. In this work, we developed a component that extracts
and classifies sensitive data from unstructured text in European Portuguese. The
objective was to create a system that allows organizations to understand their data and comply
with legal and security requirements. We studied a hybrid approach to the problem of Named
Entity Recognition for the Portuguese language. This approach combines several techniques,
such as rule-based and lexicon-based models, machine learning algorithms and neural networks. The
rule-based and lexicon-based approaches were used only for a set of specific classes. For the remaining
classes of entities, the SpaCy and Stanford NLP tools were tested, two statistical models
(Conditional Random Fields and Random Forest) were implemented and, finally,
a Bidirectional-LSTM approach was tried. The best results were achieved with the Stanford NER model
(86.41%), from the Stanford NLP tool. Regarding the statistical models, Conditional
Random Fields obtained the best results, with an F1-score of 65.50%. With
the Bi-LSTM approach, we achieved a result of 83.01%. The corpora used for training and
testing were HAREM Golden Collection, SIGARRA News Corpus and DataSense NER Corpus.
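The hybrid arrangement described in this abstract, with rules covering a fixed set of classes and a learned model covering the rest, might be sketched as follows. Everything concrete here is an assumption for illustration: the class names `EMAIL` and `POSTAL_CODE`, the regular expressions, and the `model` callable standing in for a trained tagger are hypothetical and do not come from the paper.

```python
import re
from typing import Callable, List, Tuple

Entity = Tuple[str, str]  # (label, matched text)

# Hypothetical rule-based classes; the patterns are illustrative only.
RULES = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "POSTAL_CODE": re.compile(r"\b\d{4}-\d{3}\b"),  # Portuguese postal-code shape
}


def rule_entities(text: str) -> List[Entity]:
    """Collect every match of every rule, grouped by rule order."""
    found: List[Entity] = []
    for label, pattern in RULES.items():
        found.extend((label, match.group()) for match in pattern.finditer(text))
    return found


def hybrid_ner(text: str, model: Callable[[str], List[Entity]]) -> List[Entity]:
    """Rules own their classes; the model only contributes the remaining ones."""
    model_hits = [entity for entity in model(text) if entity[0] not in RULES]
    return rule_entities(text) + model_hits


def stub_model(text: str) -> List[Entity]:
    # Stand-in for a trained tagger (CRF, Bi-LSTM, ...); returns fixed output.
    return [("PERSON", "Ana Silva"), ("EMAIL", "ignored by the hybrid")]


entities = hybrid_ner("Ana Silva: ana.silva@example.pt, 4200-465 Porto", stub_model)
```

Filtering the model output by class keeps the two components from producing conflicting labels for the classes the rules are trusted with, which is one simple way to combine them.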