
    Temporal Information Processing: A Survey

    Temporal Information Processing is a subfield of Natural Language Processing that is valuable in many tasks, such as Question Answering and Summarization. The field is broad, ranging from classical theories of time and language to current computational approaches for Temporal Information Extraction. This latter trend consists of the automatic extraction of events and temporal expressions. These issues have attracted great attention, especially with the development of annotated corpora and annotation schemes, mainly TimeBank and TimeML. In this paper, we give a survey of Temporal Information Extraction from natural language texts.

    A comparative study of Chinese and European Internet companies' privacy policy based on knowledge graph

    A privacy policy is not only a means of industry self-discipline but also a way for users to protect their online privacy. The European Union (EU) General Data Protection Regulation (GDPR) took effect on May 25th, 2018, while China has no explicit personal data protection law. Using knowledge graphs, this thesis makes a comparative analysis of the privacy policies of Chinese and European Internet companies and, drawing on the relevant provisions of the GDPR, puts forward suggestions on the privacy policies of Internet companies, so as to address the problem of personal information protection to a certain extent. First, this thesis selects the process and methods for knowledge graph construction and analysis. The process consists of data preprocessing, entity extraction, storage in a graph database, and querying. Data preprocessing includes word segmentation and part-of-speech tagging, as well as text format adjustment. Entity extraction is the core of knowledge graph construction in this thesis. Based on the principle of Conditional Random Fields (CRF), the CRF++ toolkit is used for entity extraction. Subsequently, the extracted entities are converted into “.csv” format and stored in the graph database Neo4j, which generates the knowledge graph. Cypher query statements can then be used to query information in the graph database. The next part compares and analyzes the privacy policies of Internet companies in China and Europe. After sampling, the overall characteristics of the privacy policies of Chinese and European Internet companies are compared. Following the construction process described above, the “collected information” and “contact us” parts of the privacy policies are used to construct the knowledge graphs. Finally, the results of the comparative analysis are discussed further in light of the relevant content of the GDPR, and suggestions are proposed.
Although Chinese Internet companies’ privacy policies have some merits, they are far inferior to those of European Internet companies. China also needs to enact a personal data protection law suited to its national conditions. This thesis applies knowledge graphs to privacy policy research and analyzes Internet companies’ privacy policies from a comparative perspective. It also discusses the comparative results in relation to the GDPR, puts forward suggestions, and provides a reference for the formulation of China's personal information protection law.

    Arabic named entity recognition

    This doctoral thesis describes the research carried out to determine the best techniques for building an Arabic Named Entity Recognizer. Such a system would be able to identify and classify the named entities found in open-domain Arabic text. The Named Entity Recognition (NER) task helps other Natural Language Processing tasks (for example, Information Retrieval, Question Answering, Machine Translation, etc.) achieve better results thanks to the enrichment it adds to the text. The literature contains various works that investigate the NER task for a specific language or from a language-independent perspective. However, very few works studying this task for Arabic have been published so far. Arabic has a special orthography and a complex morphology, and these aspects bring new challenges for research on the NER task. A complete investigation of NER for Arabic would not only provide the techniques needed to achieve high performance, but would also provide an error analysis and a discussion of the results that benefit the NER research community. The main objective of this thesis is to meet that need. To this end, we have: 1. Prepared a study of the different aspects of Arabic related to this task; 2. Analyzed the state of the art in NER; 3. Carried out a comparison of the results obtained by different machine learning techniques; 4. Developed a method based on the combination of different classifiers, where each classifier deals with a single class of named entities and uses the feature set and machine learning technique best suited to the class of named entities in question.
Our experiments were evaluated on nine test sets.
Benajiba, Y. (2009). Arabic named entity recognition [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8318

    A history and theory of textual event detection and recognition


    Fine-grained Dutch named entity recognition

    This paper describes the creation of a fine-grained named entity annotation scheme and corpus for Dutch, and experiments on automatic recognition of named entity main types and subtypes. We give an overview of existing named entity annotation schemes and motivate our own, which describes six main types (persons, organizations, locations, products, events and miscellaneous named entities) and finer-grained information on subtypes and metonymic usage. The scheme was applied to a one-million-word subset of the Dutch SoNaR reference corpus. The classifier for main type named entities achieves a micro-averaged F-score of 84.91% and is publicly available, along with the corpus and annotations.