Search CORE

2,507 research outputs found

Rembrandt - a named-entity recognition framework

Author: Cardoso Nuno
Publication venue
Publication date: 01/01/2012
Field of study

Rembrandt is a named entity recognition system specially crafted to annotate documents by classifying named entities and ground them into unique identifiers. Rembrandt played an important role within our research over geographic IR, thus evolving into a more capable framework where documents can be annotated, manually curated and indexed. The goal of this paper is to present Rembrandt’s simple but powerful annotation framework to the NLP community

CiteSeerX

Repositório Comum

Generating Rembrandt: Artificial Intelligence, Copyright, and Accountability in the 3A Era--The Human-like Authors are Already Here- A New Model

Author: Yanisky Ravid Shlomit
Publication venue: FLASH: The Fordham Law Archive of Scholarship and History
Publication date: 01/01/2017
Field of study

Artificial intelligence (AI) systems are creative, unpredictable, independent, autonomous, rational, evolving, capable of data collection, communicative, efficient, accurate, and have free choice among alternatives. Similar to humans, AI systems can autonomously create and generate creative works. The use of AI systems in the production of works, either for personal or manufacturing purposes, has become common in the 3A era of automated, autonomous, and advanced technology. Despite this progress, there is a deep and common concern in modern society that AI technology will become uncontrollable. There is therefore a call for social and legal tools for controlling AI systems’ functions and outcomes. This Article addresses the questions of the copyrightability of artworks generated by AI systems: ownership and accountability. The Article debates who should enjoy the benefits of copyright protection and who should be responsible for the infringement of rights and damages caused by AI systems that independently produce creative works. Subsequently, this Article presents the AI Multi- Player paradigm, arguing against the imposition of these rights and responsibilities on the AI systems themselves or on the different stakeholders, mainly the programmers who develop such systems. Most importantly, this Article proposes the adoption of a new model of accountability for works generated by AI systems: the AI Work Made for Hire (WMFH) model, which views the AI system as a creative employee or independent contractor of the user. Under this proposed model, ownership, control, and responsibility would be imposed on the humans or legal entities that use AI systems and enjoy its benefits. This model accurately reflects the human-like features of AI systems; it is justified by the theories behind copyright protection; and it serves as a practical solution to assuage the fears behind AI systems. In addition, this model unveils the powers behind the operation of AI systems; hence, it efficiently imposes accountability on clearly identifiable persons or legal entities. Since AI systems are copyrightable algorithms, this Article reflects on the accountability for AI systems in other legal regimes, such as tort or criminal law and in various industries using these systems

Fordham University School of Law

Artequakt: Generating tailored biographies from automatically annotated fragments from the web

Author: Alani Harith
Hall Wendy
Kim Sanghee
Lewis Paul
Millard David
Shadbolt Nigel
Weal Mark
Publication venue
Publication date: 01/01/2002
Field of study

The Artequakt project seeks to automatically generate narrativebiographies of artists from knowledge that has been extracted from the Web and maintained in a knowledge base. An overview of the system architecture is presented here and the three key components of that architecture are explained in detail, namely knowledge extraction, information management and biography construction. Conclusions are drawn from the initial experiences of the project and future progress is detailed

Southampton (e-Prints Soton)

Open Research Online (The Open University)

Automatic extraction of knowledge from web documents

Author: Alani Harith
Hall Wendy
Kim Sanghee
Lewis Paul H.
Millard David E.
Shadbolt Nigel R.
Weal Mark J.
Publication venue
Publication date: 01/01/2003
Field of study

A large amount of digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents from multiple sources in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper

CiteSeerX

Southampton (e-Prints Soton)

Open Research Online (The Open University)

Generating Rembrandt: Artificial Intelligence, Copyright, and Accountability in the 3A Era—The Human-Like Authors are Already Here—A New Model

Author: Yanisky-Ravid Shlomit
Publication venue: Digital Commons at Michigan State University College of Law
Publication date: 01/01/2017
Field of study

Michigan State University College of Law: Digital Commons

Knowledge extraction from minutes of Portuguese municipalities meetings

Author: Dias Gonçalo Paiva
Rodrigues Mário
Teixeira António
Publication venue
Publication date: 01/11/2010
Field of study

A very relevant problem in e-government is that a great amount of knowledge is in natural language unstructured documents. If that knowledge was stored using a computer-processable representation it would be more easily accessed. In this paper we present the architecture, modules and initial results of a prototype under development for extracting information from government documents. The prototype stores the information using a formal representation of the set of concepts and the relationships between those concepts - an ontology. The system was tested using minutes of Portuguese Municipal Boards meetings. Initial results are presented for an important and frequent topic of the minutes: the subsidies granted by municipalities

Repositório Institucional da Universidade de Aveiro

Knowledge Representation of Crime-Related Events: a Preliminary Approach

Author: Nogueira Vitor Beires
Publication venue: OASIcs - OpenAccess Series in Informatics. 8th Symposium on Languages, Applications and Technologies (SLATE 2019)
Publication date: 01/01/2019
Field of study

The crime is spread in every daily newspaper, and particularly on criminal investigation reports produced by several Police departments, creating an amount of data to be processed by Humans. Other research studies related to relation extraction (a branch of information retrieval) in Portuguese arisen along the years, but with few extracted relations and several computer methods approaches, that could be improved by recent features, to achieve better performance results. This paper aims to present the ongoing work related to SEM (Simple Event Model) ontology population with instances retrieved from crime-related documents, supported by an SVO (Subject, Verb, Object) algorithm using hand-crafted rules to extract events, achieving a performance measure of 0.86 (F-Measure)

Dagstuhl Research Online Publication Server

Discovery of sensitive data with natural language processing

Author: Dias Mariana Rebelo
Publication venue
Publication date: 18/12/2019
Field of study

The process of protecting sensitive data is continually growing and becoming increasingly important, especially as a result of the directives and laws imposed by the European Union. The effort to create automatic systems is continuous, but in most cases, the processes behind them are still manual or semi-automatic. In this work, we have developed a component that can extract and classify sensitive data, from unstructured text information in European Portuguese. The objective was to create a system that allows organizations to understand their data and comply with legal and security purposes. We studied a hybrid approach to the problem of Named Entities Recognition for the Portuguese language. This approach combines several techniques such as rule-based/lexical-based models, machine learning algorithms and neural networks. The rule-based and lexical-based approaches were used only for a set of specific classes. For the remaining classes of entities, SpaCy and Stanford NLP tools were tested, two statistical models – Conditional Random Fields and Random Forest – were implemented and, finally, a Bidirectional- LSTM approach as experimented. The best results were achieved with the Stanford NER model (86.41%), from the Stanford NLP tool. Regarding the statistical models, we realized that Conditional Random Fields is the one that can obtain the best results, with a f1-score of 65.50%. With the Bi-LSTM approach, we have achieved a result of 83.01%. The corpora used for training and testing were HAREM Golden Collection, SIGARRA News Corpus and DataSense NER Corpus.O processo de preservação de dados sensíveis está em constante crescimento e cada vez apresenta maior importância, proveniente especialmente das diretivas e leis impostas pela União Europeia. O esforço para criar sistemas automáticos é contínuo, mas o processo é realizado na maioria dos casos de forma manual ou semiautomática. Neste trabalho desenvolvemos um componente de Extração e Classificação de dados sensíveis, que processa textos não-estruturados em Português Europeu. O objetivo consistiu em criar um sistema que permite às organizações compreender os seus dados e cumprir com fins legais de conformidade e segurança. Para resolver este problema, foi estudada uma abordagem híbrida de Reconhecimento de Entidades Mencionadas para a língua Portuguesa. Esta abordagem combina técnicas baseadas em regras e léxicos, algoritmos de aprendizagem automática e redes neuronais. As primeiras abordagens baseadas em regras e léxicos, foram utilizadas apenas para um conjunto de classes especificas. Para as restantes classes de entidades foram utilizadas as ferramentas SpaCy e Stanford NLP, testados dois modelos estatísticos — Conditional Random Fields e Random Forest – e por fim testada uma abordagem baseada em redes neuronais – Bidirectional-LSTM. Ao nível das ferramentas utilizadas os melhores resultados foram conseguidos com o modelo Stanford NER (86,41%). Através dos modelos estatísticos percebemos que o Conditional Random Fields é o que consegue obter melhores resultados, com um f1-score de 65,50%. Com a última abordagem, uma rede neuronal Bi-LSTM, conseguimos resultado de f1-score de aproximadamente 83,01%. Para o treino e teste das diferentes abordagens foram utilizados os conjuntos de dados HAREM Golden Collection, SIGARRA News Corpus e DataSense NER Corpus

Repositório Institucional do ISCTE-IUL