    Rembrandt - a named-entity recognition framework

    Rembrandt is a named-entity recognition system specially crafted to annotate documents by classifying named entities and grounding them to unique identifiers. Rembrandt played an important role in our research on geographic IR, and thus evolved into a more capable framework in which documents can be annotated, manually curated and indexed. The goal of this paper is to present Rembrandt’s simple but powerful annotation framework to the NLP community.
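The two steps the abstract describes, spotting/classifying entities and grounding them to unique identifiers, can be sketched minimally as follows. The gazetteer and the `wikipedia:` identifier scheme below are illustrative stand-ins, not Rembrandt's actual rules or knowledge base.

```python
import re

# Hypothetical gazetteer mapping surface forms to (class, unique identifier).
GAZETTEER = {
    "Lisbon": ("PLACE", "wikipedia:Lisbon"),
    "Rembrandt": ("PERSON", "wikipedia:Rembrandt"),
}

def annotate(text):
    """Return (surface, class, identifier) triples found in `text`."""
    entities = []
    for surface, (cls, ident) in GAZETTEER.items():
        # Whole-word match so "Lisbonne" does not trigger "Lisbon".
        for m in re.finditer(r"\b%s\b" % re.escape(surface), text):
            entities.append((m.group(0), cls, ident))
    return entities

print(annotate("Rembrandt lived far from Lisbon."))
```

Real systems replace the dictionary lookup with learned classifiers and disambiguate among candidate identifiers; the output shape, a classified and grounded mention, is the same.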

    Generating Rembrandt: Artificial Intelligence, Copyright, and Accountability in the 3A Era--The Human-like Authors are Already Here- A New Model

    Artificial intelligence (AI) systems are creative, unpredictable, independent, autonomous, rational, evolving, capable of data collection, communicative, efficient, accurate, and have free choice among alternatives. Similar to humans, AI systems can autonomously create and generate creative works. The use of AI systems in the production of works, either for personal or manufacturing purposes, has become common in the 3A era of automated, autonomous, and advanced technology. Despite this progress, there is a deep and common concern in modern society that AI technology will become uncontrollable. There is therefore a call for social and legal tools for controlling AI systems’ functions and outcomes. This Article addresses the questions of the copyrightability of artworks generated by AI systems: ownership and accountability. The Article debates who should enjoy the benefits of copyright protection and who should be responsible for the infringement of rights and damages caused by AI systems that independently produce creative works. Subsequently, this Article presents the AI Multi-Player paradigm, arguing against the imposition of these rights and responsibilities on the AI systems themselves or on the different stakeholders, mainly the programmers who develop such systems. Most importantly, this Article proposes the adoption of a new model of accountability for works generated by AI systems: the AI Work Made for Hire (WMFH) model, which views the AI system as a creative employee or independent contractor of the user. Under this proposed model, ownership, control, and responsibility would be imposed on the humans or legal entities that use AI systems and enjoy their benefits. This model accurately reflects the human-like features of AI systems; it is justified by the theories behind copyright protection; and it serves as a practical solution to assuage the fears behind AI systems. In addition, this model unveils the powers behind the operation of AI systems; hence, it efficiently imposes accountability on clearly identifiable persons or legal entities. Since AI systems are copyrightable algorithms, this Article reflects on the accountability for AI systems in other legal regimes, such as tort or criminal law, and in various industries using these systems.

    Artequakt: Generating tailored biographies from automatically annotated fragments from the web

    The Artequakt project seeks to automatically generate narrative biographies of artists from knowledge that has been extracted from the Web and maintained in a knowledge base. An overview of the system architecture is presented here, and the three key components of that architecture are explained in detail, namely knowledge extraction, information management and biography construction. Conclusions are drawn from the initial experiences of the project, and future work is detailed.

    Automatic extraction of knowledge from web documents

    A large amount of the digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents, drawn from multiple sources, in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system, which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper.
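Ontology-driven extraction of the kind described here can be sketched with an ontology fragment declaring which relations to fill, and simple lexico-syntactic patterns filling them from raw text. The relation names and patterns below are hypothetical simplifications, not Artequakt's actual ontology or NL pipeline.

```python
import re

# Hypothetical ontology fragment: relation name -> extraction pattern.
ONTOLOGY = {
    "born_in": re.compile(r"(?P<artist>[A-Z]\w+) was born in (?P<place>[A-Z]\w+)"),
    "died_in": re.compile(r"(?P<artist>[A-Z]\w+) died in (?P<place>[A-Z]\w+)"),
}

def extract(text):
    """Return ontology-shaped facts as (relation, artist, filler) triples."""
    facts = []
    for relation, pattern in ONTOLOGY.items():
        for m in pattern.finditer(text):
            facts.append((relation, m.group("artist"), m.group("place")))
    return facts

facts = extract("Rembrandt was born in Leiden. Rembrandt died in Amsterdam.")
print(facts)
```

The point of the ontology is that downstream components (here, biography generation) consume typed facts rather than raw sentences.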

    Knowledge extraction from minutes of Portuguese municipalities meetings

    A very relevant problem in e-government is that a great amount of knowledge is held in unstructured natural-language documents. If that knowledge were stored in a computer-processable representation, it would be more easily accessed. In this paper we present the architecture, modules and initial results of a prototype under development for extracting information from government documents. The prototype stores the information using an ontology: a formal representation of a set of concepts and the relationships between those concepts. The system was tested on minutes of Portuguese Municipal Board meetings. Initial results are presented for an important and frequent topic of the minutes: the subsidies granted by municipalities.
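The computer-processable representation the abstract aims for can be pictured as RDF-style triples over ontology concepts. The concept and property names (`onto:Subsidy`, `onto:grantedBy`, etc.) below are invented for illustration and are not the paper's actual ontology.

```python
def subsidy_triples(municipality, beneficiary, amount_eur):
    """Encode one 'subsidy granted' fact as subject-predicate-object triples."""
    subsidy = "subsidy/%s-%s" % (municipality, beneficiary)
    return [
        (subsidy, "rdf:type", "onto:Subsidy"),
        (subsidy, "onto:grantedBy", "municipality/%s" % municipality),
        (subsidy, "onto:beneficiary", beneficiary),
        (subsidy, "onto:amountEUR", amount_eur),
    ]

triples = subsidy_triples("Evora", "SportsClub", 5000)
print(triples)
```

Once facts from the minutes are in this shape, they can be queried uniformly instead of being re-read from prose.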

    Knowledge Representation of Crime-Related Events: a Preliminary Approach

    Crime is reported in every daily newspaper and, in particular, in criminal investigation reports produced by several police departments, creating a large amount of data to be processed by humans. Research on relation extraction (a branch of information extraction) for Portuguese has arisen over the years, but with few extracted relations and a variety of computational approaches that could be improved with recent techniques to achieve better performance. This paper presents ongoing work on populating the SEM (Simple Event Model) ontology with instances retrieved from crime-related documents, supported by an SVO (Subject, Verb, Object) algorithm that uses hand-crafted rules to extract events, achieving a performance of 0.86 (F-measure).
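The SVO idea can be sketched as splitting a sentence around a verb from a known list, taking the left side as subject and the right side as object. The verb list and the naive tokenization below are hypothetical simplifications of the hand-crafted rules the paper describes, which operate on Portuguese text.

```python
# Hypothetical list of event-triggering verbs (English, for illustration).
EVENT_VERBS = ["stole", "robbed", "attacked"]

def svo_events(sentence):
    """Return (subject, verb, object) triples using a naive verb split."""
    tokens = sentence.rstrip(".").split()
    events = []
    for i, tok in enumerate(tokens):
        # Require at least one token on each side of the verb.
        if tok in EVENT_VERBS and 0 < i < len(tokens) - 1:
            events.append((" ".join(tokens[:i]), tok, " ".join(tokens[i + 1:])))
    return events

print(svo_events("The suspect stole a car."))
```

Each resulting triple maps naturally onto a SEM event instance (actor, event type, object), which is how such rules support ontology population.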

    Discovery of sensitive data with natural language processing

    The process of protecting sensitive data is continually growing in importance, especially as a result of the directives and laws imposed by the European Union. The effort to create automatic systems is continuous, but in most cases the processes behind them are still manual or semi-automatic. In this work, we developed a component that can extract and classify sensitive data from unstructured text in European Portuguese. The objective was to create a system that allows organizations to understand their data and comply with legal and security requirements. We studied a hybrid approach to the problem of Named Entity Recognition for the Portuguese language. This approach combines several techniques, such as rule-based/lexicon-based models, machine learning algorithms and neural networks. The rule-based and lexicon-based approaches were used only for a set of specific classes. For the remaining entity classes, the SpaCy and Stanford NLP tools were tested, two statistical models (Conditional Random Fields and Random Forest) were implemented and, finally, a Bidirectional LSTM approach was experimented with. The best results were achieved with the Stanford NER model (86.41%) from the Stanford NLP tool. Among the statistical models, Conditional Random Fields obtained the best results, with an f1-score of 65.50%. With the Bi-LSTM approach, we achieved an f1-score of 83.01%. The corpora used for training and testing were the HAREM Golden Collection, the SIGARRA News Corpus and the DataSense NER Corpus.
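The f1-scores quoted for NER systems are typically entity-level: a prediction counts only if both its span and its class match a gold annotation exactly. A minimal sketch of that micro-averaged metric over sets of `(start, end, class)` tuples; the sample spans below are made up for illustration.

```python
def entity_f1(gold, pred):
    """Micro-averaged F1 over exact-match (start, end, class) entity tuples."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)  # true positives: exact span + class matches
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

gold = {(0, 4, "PER"), (10, 16, "LOC"), (20, 28, "ORG")}
pred = {(0, 4, "PER"), (10, 16, "LOC"), (30, 35, "ORG")}
print(entity_f1(gold, pred))  # 2 of 3 entities match exactly
```

Exact-match scoring is strict: a prediction off by one character, or with the right span but the wrong class, contributes nothing, which is why entity-level f1 is usually well below token-level accuracy.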