3,215 research outputs found

    A comparative study of Chinese and European Internet companies' privacy policy based on knowledge graph

    A privacy policy is not only a means of industry self-discipline but also a way for users to protect their online privacy. The European Union's (EU) General Data Protection Regulation (GDPR) became applicable on May 25th, 2018, while China has no explicit personal data protection law. Using knowledge graphs, this thesis compares the privacy policies of Chinese and European Internet companies and, drawing on the relevant provisions of the GDPR, puts forward suggestions for Internet companies' privacy policies, so as to address the problem of personal information protection to a certain extent. First, this thesis selects the process and methods for knowledge graph construction and analysis. The process consists of data preprocessing, entity extraction, storage in a graph database, and querying. Data preprocessing includes word segmentation and part-of-speech tagging, as well as text format adjustment. Entity extraction is the core of knowledge graph construction in this thesis. Based on the principle of Conditional Random Fields (CRF), the CRF++ toolkit is used for entity extraction. The extracted entities are then converted into “.csv” format and stored in the graph database Neo4j, which generates the knowledge graph. Cypher query statements can be used to query information in the graph database. The next part compares and analyzes the privacy policies of Internet companies in China and Europe. After sampling, the overall characteristics of the privacy policies of Chinese and European Internet companies are compared. Following the construction process described above, the “collected information” and “contact us” parts of the privacy policies are used to construct the knowledge graphs. Finally, the results of the comparative analysis are discussed further in light of the relevant content of the GDPR, and suggestions are proposed. Although Chinese Internet companies’ privacy policies have some merits, they are far inferior to those of European Internet companies. China also needs to enact a personal data protection law suited to its national conditions. This thesis applies knowledge graphs to privacy policy research and analyzes Internet companies’ privacy policies from a comparative perspective. It also discusses the comparative results against the GDPR, puts forward suggestions, and provides a reference for the formulation of China's personal information protection law.

    Text Line Segmentation of Historical Documents: a Survey

    There is a huge number of historical documents in libraries and in various National Archives that have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest. Comment: 25 pages, submitted version, to appear in International Journal on Document Analysis and Recognition; online version available at http://www.springerlink.com/content/k2813176280456k3

    A Review of Voice-Base Person Identification: State-of-the-Art

    Automated person identification and authentication systems are useful for national security, the integrity of electoral processes, the prevention of cybercrime and many access control applications. They are a critical component of information and communication technology, which is central to national development. The use of biometric systems in identification is fast replacing traditional methods such as names, personal identification number codes and passwords, since nature bestows individuals with distinct personal imprints and signatures. Different measures have been put in place for person identification, ranging from the face to fingerprints and so on. This paper highlights the key approaches and schemes developed in the last five decades for voice-based person identification systems. Voice-based recognition systems have gained interest due to their non-intrusive technique of data acquisition and their growing ability to continually study and adapt to the person’s changes. Information on the benefits and challenges of various biometric systems is also presented in this paper. The present and prominent voice-based recognition methods are discussed. It was observed that the application areas of these systems have covered intelligent monitoring, surveillance, population management, election forensics, immigration and border control.

    Personal information system

    None provided

    Content Recognition and Context Modeling for Document Analysis and Retrieval

    The nature and scope of available documents are changing significantly in many areas of document analysis and retrieval as complex, heterogeneous collections become accessible to virtually everyone via the web. The increasing level of diversity presents a great challenge for document image content categorization, indexing, and retrieval. Meanwhile, the processing of documents with unconstrained layouts and complex formatting often requires effective leveraging of broad contextual knowledge. In this dissertation, we first present a novel approach for document image content categorization, using a lexicon of shape features. Each lexical word corresponds to a scale and rotation invariant local shape feature that is generic enough to be detected repeatably and is segmentation free. A concise, structurally indexed shape lexicon is learned by clustering and partitioning feature types through graph cuts. Our idea finds successful application in several challenging tasks, including content recognition of diverse web images and language identification on documents composed of mixed machine printed text and handwriting. Second, we address two fundamental problems in signature-based document image retrieval. Facing continually increasing volumes of documents, detecting and recognizing unique, evidentiary visual entities (e.g., signatures and logos) provides a practical and reliable supplement to the OCR recognition of printed text. We propose a novel multi-scale framework to detect and segment signatures jointly from document images, based on the structural saliency under a signature production model. We formulate the problem of signature retrieval in the unconstrained setting of geometry-invariant deformable shape matching and demonstrate state-of-the-art performance in signature matching and verification. Third, we present a model-based approach for extracting relevant named entities from unstructured documents. In a wide range of applications that require structured information from diverse, unstructured document images, processing OCR text does not give satisfactory results due to the absence of linguistic context. Our approach enables learning of inference rules collectively based on contextual information from both page layout and text features. Finally, we demonstrate the importance of mining general web user behavior data for improving document ranking and other web search experience. The context of web user activities reveals their preferences and intents, and we emphasize the analysis of individual user sessions for creating aggregate models. We introduce a novel algorithm for estimating web page and web site importance, and discuss its theoretical foundation based on an intentional surfer model. We demonstrate that our approach significantly improves large-scale document retrieval performance.