9 research outputs found

    Detecting Term Relationships to Improve Textual Document Sanitization

    Nowadays, the publication of textual documents provides critical benefits to scientific research and business scenarios where information analysis plays an essential role. Nevertheless, the possible existence of identifying or confidential data in these documents motivates the use of measures to sanitize sensitive information before publication, while keeping the innocuous data unmodified. Several automatic sanitization mechanisms can be found in the literature; however, most of them evaluate the sensitivity of textual terms as if they were independent variables. At the same time, some authors have shown that there are important information disclosure risks inherent to the existence of relationships between sanitized and non-sanitized terms. Therefore, neglecting term relationships in document sanitization represents a serious privacy threat. In this paper, we present a general-purpose method to automatically detect semantically related terms that may enable the disclosure of sensitive data. The foundations of Information Theory and a corpus as large as the Web are used to assess the degree of relationship between textual terms according to the amount of information they provide about each other. Preliminary evaluation results show that our proposal significantly improves the detection recall of current sanitization schemes, thereby reducing the disclosure risk.
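    One common information-theoretic way to quantify how much two terms reveal about each other, consistent with the description above, is pointwise mutual information (PMI) estimated from web page counts. The following is a minimal sketch under that assumption; the function names, threshold and page counts are illustrative placeholders, not the paper's exact formulation.

```python
import math

def pmi_relatedness(hits_a, hits_b, hits_ab, total_pages):
    """Pointwise mutual information between two terms, estimated from
    web page counts: PMI(a, b) = log2( p(a,b) / (p(a) * p(b)) ).
    hits_a / hits_b: pages containing each term; hits_ab: pages
    containing both; total_pages: rough size of the indexed Web."""
    p_a = hits_a / total_pages
    p_b = hits_b / total_pages
    p_ab = hits_ab / total_pages
    if p_ab == 0:
        return float("-inf")  # terms never co-occur: no evidence of a relationship
    return math.log2(p_ab / (p_a * p_b))

def discloses(term, sanitized_term, hits, total_pages, threshold=3.0):
    """Flag `term` as a disclosure risk for `sanitized_term` when their
    PMI exceeds a (hypothetical) threshold, i.e. when observing one term
    gives away a lot of information about the other."""
    score = pmi_relatedness(hits[term], hits[sanitized_term],
                            hits[(term, sanitized_term)], total_pages)
    return score >= threshold

# Illustrative, made-up page counts for the pair ("AIDS", "HIV"):
hits = {"HIV": 5.2e7, "AIDS": 6.8e7, ("AIDS", "HIV"): 3.1e7}
print(discloses("AIDS", "HIV", hits, total_pages=2.5e10))  # True: strongly related
```

    In a sanitization pipeline, a term flagged this way would be removed or generalized alongside the sensitive term it points to, which is how relationship detection raises recall.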

    Using Part-of-Speech N-grams for Sensitive-Text Classification

    Freedom of Information legislation in many western democracies, including the United Kingdom (UK) and the United States of America (USA), states that citizens typically have the right to access government documents. However, certain sensitive information is exempt from release into the public domain. For example, in the UK, FOIA Exemption 27 (International Relations) excludes the release of information that might damage the interests of the UK abroad. Therefore, the process of reviewing government documents for sensitivity is essential to determine if a document must be redacted before it is archived, or closed until the information is no longer sensitive. With the increased volume of digital government documents in recent years, there is a need for new tools to assist the digital sensitivity review process. Therefore, in this paper we propose an automatic approach for identifying sensitive text in documents by measuring the amount of sensitivity in sequences of text. Using government documents reviewed by trained sensitivity reviewers, we focus on an aspect of FOIA Exemption 27 which can have a major impact on international relations, namely information supplied in confidence. We show that our approach leads to markedly increased recall of sensitive text, while achieving a very high level of precision, compared to a baseline that has been shown to be effective at identifying sensitive text in other domains.
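    To make the idea of classifying text by part-of-speech patterns concrete, the sketch below tags sentences, converts them to POS-tag sequences, and feeds tag n-grams to a linear classifier. It assumes NLTK and scikit-learn; the training sentences, labels and n-gram range are placeholders, not the authors' actual features or data.

```python
import nltk
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# One-time model downloads for tokenizer and tagger (names may vary by NLTK version).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

def pos_sequence(text):
    """Map a text to its sequence of POS tags, e.g. 'NNP VBD IN NN ...',
    so that n-grams over this string capture grammatical patterns
    rather than specific vocabulary."""
    tokens = nltk.word_tokenize(text)
    return " ".join(tag for _, tag in nltk.pos_tag(tokens))

# n-grams (here 1-3) over the POS sequence form the feature space.
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)

# Hypothetical labelled sentences (1 = sensitive, 0 = not sensitive).
train_texts = ["The ambassador disclosed the source in confidence.",
               "The report was published on the official website."]
model.fit([pos_sequence(t) for t in train_texts], [1, 0])
print(model.predict([pos_sequence("An official shared the name privately.")]))
```

    The design intuition is that sensitivity such as "information supplied in confidence" often shows up in recurring grammatical constructions, which POS n-grams can capture even when the surface vocabulary varies.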

    Principles of Security and Trust

    This open access book constitutes the proceedings of the 8th International Conference on Principles of Security and Trust, POST 2019, which took place in Prague, Czech Republic, in April 2019, held as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019. The 10 papers presented in this volume were carefully reviewed and selected from 27 submissions. They deal with theoretical and foundational aspects of security and trust, including new theoretical results, practical applications of existing foundational ideas, and innovative approaches stimulated by pressing practical problems.

    Ethical and Unethical Hacking

    The goal of this chapter is to provide a conceptual analysis of ethical hacking, comprising its history and common usage, and to attempt a systematic classification that is both compatible with common usage and normatively adequate. Subsequently, the chapter identifies a tension between common usage and a normatively adequate nomenclature. ‘Ethical hackers’ are often identified with hackers who abide by a code of ethics privileging business-friendly values. However, there is no guarantee that respecting such values is always compatible with the all-things-considered morally best act. It is recognised, however, that in terms of assessment it may be quite difficult to determine who is an ethical hacker in the ‘all things considered’ sense, while society may agree more easily on who is one in the limited, ‘business-friendly’ sense. The chapter concludes by suggesting a pragmatic best-practice approach for characterising ethical hacking, one that reaches beyond business-friendly values and helps in taking decisions that respect hackers’ individual ethics in morally debatable grey zones.

    Best Practices and Recommendations for Cybersecurity Service Providers

    This chapter outlines concrete best practices and recommendations for cybersecurity service providers, with a focus on data sharing, data protection and penetration testing. Based on a brief outline of the dilemmas that cybersecurity service providers may experience in their daily operations, it discusses the data handling policies and practices of cybersecurity vendors along five topics: customer data handling; information about breaches; threat intelligence; vulnerability-related information; and data involved when collaborating with peers, CERTs, cybersecurity research groups, etc. It then discusses specific issues of penetration testing, such as customer recruitment and execution, as well as its supervision and governance. The chapter closes with some general recommendations for improving the ethical decision-making procedures of private cybersecurity service providers.

    The Ethics of Cybersecurity

    This open access book provides the first comprehensive collection of papers offering an integrative view on cybersecurity. It discusses theories, problems and solutions concerning the relevant ethical issues involved. This work is sorely needed in a world where cybersecurity has become indispensable to protect trust and confidence in the digital infrastructure whilst respecting fundamental values like equality, fairness, freedom and privacy. The book has a strong practical focus, including case studies that outline ethical issues in cybersecurity and present guidelines and other measures to tackle those issues. It is thus relevant not only for academics but also for practitioners in cybersecurity, such as providers of security software, governmental CERTs or Chief Security Officers in companies.

    Moving towards the semantic web: enabling new technologies through the semantic annotation of social contents.

    Social Web technologies have caused an exponential growth of the documents available through the Web, making enormous amounts of textual electronic resources available. Users may be overwhelmed by such an amount of content; therefore, the automatic analysis and exploitation of all this information is of interest to the data mining community. Data mining algorithms exploit features of the entities in order to characterise, group or classify them according to their resemblance. Data by itself does not carry any meaning; it needs to be interpreted to convey information. Classical data analysis methods did not aim to “understand” the content: the data were treated as meaningless numbers on which statistics were calculated to build models that were interpreted manually by human domain experts. Nowadays, motivated by the Semantic Web, many researchers have proposed semantic-grounded data classification and clustering methods that are able to exploit textual data at a conceptual level. However, they usually rely on pre-annotated inputs to be able to semantically interpret textual data such as the content of Web pages. The usability of all these methods is related to the linkage between data and its meaning. This work focuses on the development of a general methodology able to detect the most relevant features of a particular textual resource, finding out their semantics (associating them to concepts modelled in ontologies) and detecting its main topics. The proposed methods are unsupervised (avoiding the manual annotation bottleneck), domain-independent (applicable to any area of knowledge) and flexible (able to deal with heterogeneous resources: raw text documents, semi-structured user-generated documents such as Wikipedia articles, or short and noisy tweets). The methods have been evaluated in different fields (Tourism, Oncology). This work is a first step towards the automatic semantic annotation of documents, needed to pave the way towards the Semantic Web vision.
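    The core annotation step, linking surface terms in a text to ontology concepts, can be sketched as dictionary-based matching. The toy ontology, labels and concept URIs below are entirely hypothetical; the thesis' actual methods additionally disambiguate terms and detect discussion topics, which this sketch omits.

```python
import re

ONTOLOGY = {  # hypothetical concept labels -> concept URIs
    "melanoma": "onto:Melanoma",
    "skin cancer": "onto:SkinCancer",
    "biopsy": "onto:Biopsy",
}

def annotate(text, ontology=ONTOLOGY):
    """Return (surface form, concept) pairs for every ontology label found
    in the text, trying longer labels first so that a multi-word match
    like 'skin cancer' wins over any shorter overlapping label."""
    found, lowered = [], text.lower()
    for label in sorted(ontology, key=len, reverse=True):
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            found.append((label, ontology[label]))
            lowered = lowered.replace(label, " ")  # block shorter overlaps
    return found

print(annotate("A biopsy confirmed melanoma, a form of skin cancer."))
# [('skin cancer', 'onto:SkinCancer'), ('melanoma', 'onto:Melanoma'), ('biopsy', 'onto:Biopsy')]
```

    Once terms are grounded in concepts this way, downstream clustering or topic detection can operate on concept identifiers rather than raw strings, which is what makes the approach domain-independent.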