539 research outputs found

    Representation Learning for Words and Entities

    Get PDF
    This thesis presents new methods for unsupervised learning of distributed representations of words and entities from text and knowledge bases. The first algorithm presented in the thesis is a multi-view algorithm for learning representations of words called Multiview Latent Semantic Analysis (MVLSA). By incorporating up to 46 different types of co-occurrence statistics for the same vocabulary of english words, I show that MVLSA outperforms other state-of-the-art word embedding models. Next, I focus on learning entity representations for search and recommendation and present the second method of this thesis, Neural Variational Set Expansion (NVSE). NVSE is also an unsupervised learning method, but it is based on the Variational Autoencoder framework. Evaluations with human annotators show that NVSE can facilitate better search and recommendation of information gathered from noisy, automatic annotation of unstructured natural language corpora. Finally, I move from unstructured data and focus on structured knowledge graphs. I present novel approaches for learning embeddings of vertices and edges in a knowledge graph that obey logical constraints.Comment: phd thesis, Machine Learning, Natural Language Processing, Representation Learning, Knowledge Graphs, Entities, Word Embeddings, Entity Embedding

    Trading Indistinguishability-based Privacy and Utility of Complex Data

    Get PDF
    The collection and processing of complex data, like structured data or infinite streams, facilitates novel applications. At the same time, it raises privacy requirements by the data owners. Consequently, data administrators use privacy-enhancing technologies (PETs) to sanitize the data, that are frequently based on indistinguishability-based privacy definitions. Upon engineering PETs, a well-known challenge is the privacy-utility trade-off. Although literature is aware of a couple of trade-offs, there are still combinations of involved entities, privacy definition, type of data and application, in which we miss valuable trade-offs. In this thesis, for two important groups of applications processing complex data, we study (a) which indistinguishability-based privacy and utility requirements are relevant, (b) whether existing PETs solve the trade-off sufficiently, and (c) propose novel PETs extending the state-of-the-art substantially in terms of methodology, as well as achieved privacy or utility. Overall, we provide four contributions divided into two parts. In the first part, we study applications that analyze structured data with distance-based mining algorithms. We reveal that an essential utility requirement is the preservation of the pair-wise distances of the data items. Consequently, we propose distance-preserving encryption (DPE), together with a general procedure to engineer respective PETs by leveraging existing encryption schemes. As proof of concept, we apply it to SQL log mining, useful for database performance tuning. In the second part, we study applications that monitor query results over infinite streams. To this end, -event differential privacy is state-of-the-art. Here, PETs use mechanisms that typically add noise to query results. First, we study state-of-the-art mechanisms with respect to the utility they provide. Conducting the so far largest benchmark that fulfills requirements derived from limitations of prior experimental studies, we contribute new insights into the strengths and weaknesses of existing mechanisms. One of the most unexpected, yet explainable result, is a baseline supremacy. It states that one of the two baseline mechanisms delivers high or even the best utility. A natural follow-up question is whether baseline mechanisms already provide reasonable utility. So, second, we perform a case study from the area of electricity grid monitoring revealing two results. First, achieving reasonable utility is only possible under weak privacy requirements. Second, the utility measured with application-specific utility metrics decreases faster than the sanitization error, that is used as utility metric in most studies, suggests. As a third contribution, we propose a novel differential privacy-based privacy definition called Swellfish privacy. It allows tuning utility beyond incremental -event mechanism design by supporting time-dependent privacy requirements. Formally, as well as by experiments, we prove that it increases utility significantly. In total, our thesis contributes substantially to the research field, and reveals directions for future research

    Natural Language Processing in-and-for Design Research

    Full text link
    We review the scholarly contributions that utilise Natural Language Processing (NLP) methods to support the design process. Using a heuristic approach, we collected 223 articles published in 32 journals and within the period 1991-present. We present state-of-the-art NLP in-and-for design research by reviewing these articles according to the type of natural language text sources: internal reports, design concepts, discourse transcripts, technical publications, consumer opinions, and others. Upon summarizing and identifying the gaps in these contributions, we utilise an existing design innovation framework to identify the applications that are currently being supported by NLP. We then propose a few methodological and theoretical directions for future NLP in-and-for design research

    Context based multimedia information retrieval

    Get PDF

    Representation Learning for Words and Entities

    Get PDF
    This thesis presents new methods for unsupervised learning of distributed representations of words and entities from text and knowledge bases. The first algorithm presented in the thesis is a multi-view algorithm for learning representations of words called Multiview LSA (MVLSA). Through experiments on close to 50 different views, I show that MVLSA outperforms other state-of-the-art word embedding models. After that, I focus on learning entity representations for search and recommendation and present the second algorithm of this thesis called Neural Variational Set Expansion (NVSE). NVSE is also an unsupervised learning method, but it is based on the Variational Autoencoder framework. Evaluations with human annotators show that NVSE can facilitate better search and recommendation of information gathered from noisy, automatic annotation of unstructured natural language corpora. Finally, I move from unstructured data and focus on structured knowledge graphs. Moreover, I present novel approaches for learning embeddings of vertices and edges in a knowledge graph that obey logical constraints

    Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval

    Get PDF
    Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents--or short passages--in response to keyword-based queries. Effective IR systems must deal with query-document vocabulary mismatch problem, by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms--such as a person's name or a product model number--not seen during training, and to avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections--such as the document index of a commercial Web search engine--containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks.Comment: PhD thesis, Univ College London (2020

    An intelligent system for facility management

    Get PDF
    A software system has been developed that monitors and interprets temporally changing (internal) building environments and generates related knowledge that can assist in facility management (FM) decision making. The use of the multi agent paradigm renders a system that delivers demonstrable rationality and is robust within the dynamic environment that it operates. Agent behaviour directed at working toward goals is rendered intelligent with semantic web technologies. The capture of semantics though formal expression to model the environment, adds a richness that the agents exploit to intelligently determine behaviours to satisfy goals that are flexible and adaptable. The agent goals are to generate knowledge about building space usage as well as environmental conditions by elaborating and combining near real time sensor data and information from conventional building models. Additionally further inferences are facilitated including those about wasted resources such as unnecessary lighting and heating for example. In contrast, current FM tools, lacking automatic synchronisation with the domain and rich semantic modelling, are limited to the simpler querying of manually maintained models.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Identity Management and Authorization Infrastructure in Secure Mobile Access to Electronic Health Records

    Get PDF
    We live in an age of the mobile paradigm of anytime/anywhere access, as the mobile device is the most ubiquitous device that people now hold. Due to their portability, availability, easy of use, communication, access and sharing of information within various domains and areas of our daily lives, the acceptance and adoption of these devices is still growing. However, due to their potential and raising numbers, mobile devices are a growing target for attackers and, like other technologies, mobile applications are still vulnerable. Health information systems are composed with tools and software to collect, manage, analyze and process medical information (such as electronic health records and personal health records). Therefore, such systems can empower the performance and maintenance of health services, promoting availability, readability, accessibility and data sharing of vital information about a patients overall medical history, between geographic fragmented health services. Quick access to information presents a great importance in the health sector, as it accelerates work processes, resulting in better time utilization. Additionally, it may increase the quality of care. However health information systems store and manage highly sensitive data, which raises serious concerns regarding patients privacy and safety, and may explain the still increasing number of malicious incidents reports within the health domain. Data related to health information systems are highly sensitive and subject to severe legal and regulatory restrictions, that aim to protect the individual rights and privacy of patients. Along side with these legislations, security requirements must be analyzed and measures implemented. Within the necessary security requirements to access health data, secure authentication, identity management and access control are essential to provide adequate means to protect data from unauthorized accesses. However, besides the use of simple authentication models, traditional access control models are commonly based on predefined access policies and roles, and are inflexible. This results in uniform access control decisions through people, different type of devices, environments and situational conditions, and across enterprises, location and time. Although already existent models allow to ensure the needs of the health care systems, they still lack components for dynamicity and privacy protection, which leads to not have desire levels of security and to the patient not to have a full and easy control of his privacy. Within this master thesis, after a deep research and review of the stat of art, was published a novel dynamic access control model, Socio-Technical Risk-Adaptable Access Control modEl (SoTRAACE), which can model the inherent differences and security requirements that are present in this thesis. To do this, SoTRAACE aggregates attributes from various domains to help performing a risk assessment at the moment of the request. The assessment of the risk factors identified in this work is based in a Delphi Study. A set of security experts from various domains were selected, to classify the impact in the risk assessment of each attribute that SoTRAACE aggregates. SoTRAACE was integrated in an architecture with requirements well-founded, and based in the best recommendations and standards (OWASP, NIST 800-53, NIST 800-57), as well based in deep review of the state-of-art. The architecture is further targeted with the essential security analysis and the threat model. As proof of concept, the proposed access control model was implemented within the user-centric architecture, with two mobile prototypes for several types of accesses by patients and healthcare professionals, as well the web servers that handles the access requests, authentication and identity management. The proof of concept shows that the model works as expected, with transparency, assuring privacy and data control to the user without impact for user experience and interaction. It is clear that the model can be extended to other industry domains, and new levels of risks or attributes can be added because it is modular. The architecture also works as expected, assuring secure authentication with multifactor, and secure data share/access based in SoTRAACE decisions. The communication channel that SoTRAACE uses was also protected with a digital certificate. At last, the architecture was tested within different Android versions, tested with static and dynamic analysis and with tests with security tools. Future work includes the integration of health data standards and evaluating the proposed system by collecting users’ opinion after releasing the system to real world.Hoje em dia vivemos em um paradigma móvel de acesso em qualquer lugar/hora, sendo que os dispositivos móveis são a tecnologia mais presente no dia a dia da sociedade. Devido à sua portabilidade, disponibilidade, fácil manuseamento, poder de comunicação, acesso e partilha de informação referentes a várias áreas e domínios das nossas vidas, a aceitação e integração destes dispositivos é cada vez maior. No entanto, devido ao seu potencial e aumento do número de utilizadores, os dispositivos móveis são cada vez mais alvos de ataques, e tal como outras tecnologias, aplicações móveis continuam a ser vulneráveis. Sistemas de informação de saúde são compostos por ferramentas e softwares que permitem recolher, administrar, analisar e processar informação médica (tais como documentos de saúde eletrónicos). Portanto, tais sistemas podem potencializar a performance e a manutenção dos serviços de saúde, promovendo assim a disponibilidade, acessibilidade e a partilha de dados vitais referentes ao registro médico geral dos pacientes, entre serviços e instituições que estão geograficamente fragmentadas. O rápido acesso a informações médicas apresenta uma grande importância para o setor da saúde, dado que acelera os processos de trabalho, resultando assim numa melhor eficiência na utilização do tempo e recursos. Consequentemente haverá uma melhor qualidade de tratamento. Porém os sistemas de informação de saúde armazenam e manuseiam dados bastantes sensíveis, o que levanta sérias preocupações referentes à privacidade e segurança do paciente. Assim se explica o aumento de incidentes maliciosos dentro do domínio da saúde. Os dados de saúde são altamente sensíveis e são sujeitos a severas leis e restrições regulamentares, que pretendem assegurar a proteção dos direitos e privacidade dos pacientes, salvaguardando os seus dados de saúde. Juntamente com estas legislações, requerimentos de segurança devem ser analisados e medidas implementadas. Dentro dos requerimentos necessários para aceder aos dados de saúde, uma autenticação segura, gestão de identidade e controlos de acesso são essenciais para fornecer meios adequados para a proteção de dados contra acessos não autorizados. No entanto, além do uso de modelos simples de autenticação, os modelos tradicionais de controlo de acesso são normalmente baseados em políticas de acesso e cargos pré-definidos, e são inflexíveis. Isto resulta em decisões de controlo de acesso uniformes para diferentes pessoas, tipos de dispositivo, ambientes e condições situacionais, empresas, localizações e diferentes alturas no tempo. Apesar dos modelos existentes permitirem assegurar algumas necessidades dos sistemas de saúde, ainda há escassez de componentes para accesso dinâmico e proteção de privacidade , o que resultam em níveis de segurança não satisfatórios e em o paciente não ter controlo directo e total sobre a sua privacidade e documentos de saúde. Dentro desta tese de mestrado, depois da investigação e revisão intensiva do estado da arte, foi publicado um modelo inovador de controlo de acesso, chamado SoTRAACE, que molda as diferenças de acesso inerentes e requerimentos de segurança presentes nesta tese. Para isto, o SoTRAACE agrega atributos de vários ambientes e domínios que ajudam a executar uma avaliação de riscos, no momento em que os dados são requisitados. A avaliação dos fatores de risco identificados neste trabalho são baseados num estudo de Delphi. Um conjunto de peritos de segurança de vários domínios industriais foram selecionados, para classificar o impacto de cada atributo que o SoTRAACE agrega. O SoTRAACE foi integrado numa arquitectura para acesso a dados médicos, com requerimentos bem fundados, baseados nas melhores normas e recomendações (OWASP, NIST 800-53, NIST 800-57), e em revisões intensivas do estado da arte. Esta arquitectura é posteriormente alvo de uma análise de segurança e modelos de ataque. Como prova deste conceito, o modelo de controlo de acesso proposto é implementado juntamente com uma arquitetura focada no utilizador, com dois protótipos para aplicações móveis, que providênciam vários tipos de acesso de pacientes e profissionais de saúde. A arquitetura é constituída também por servidores web que tratam da gestão de dados, controlo de acesso e autenticação e gestão de identidade. O resultado final mostra que o modelo funciona como esperado, com transparência, assegurando a privacidade e o controlo de dados para o utilizador, sem ter impacto na sua interação e experiência. Consequentemente este modelo pode-se extender para outros setores industriais, e novos níveis de risco ou atributos podem ser adicionados a este mesmo, por ser modular. A arquitetura também funciona como esperado, assegurando uma autenticação segura com multi-fator, acesso e partilha de dados segura baseado em decisões do SoTRAACE. O canal de comunicação que o SoTRAACE usa foi também protegido com um certificado digital. A arquitectura foi testada em diferentes versões de Android, e foi alvo de análise estática, dinâmica e testes com ferramentas de segurança. Para trabalho futuro está planeado a integração de normas de dados de saúde e a avaliação do sistema proposto, através da recolha de opiniões de utilizadores no mundo real

    Security-Driven Software Evolution Using A Model Driven Approach

    Get PDF
    High security level must be guaranteed in applications in order to mitigate risks during the deployment of information systems in open network environments. However, a significant number of legacy systems remain in use which poses security risks to the enterprise’ assets due to the poor technologies used and lack of security concerns when they were in design. Software reengineering is a way out to improve their security levels in a systematic way. Model driven is an approach in which model as defined by its type directs the execution of the process. The aim of this research is to explore how model driven approach can facilitate the software reengineering driven by security demand. The research in this thesis involves the following three phases. Firstly, legacy system understanding is performed using reverse engineering techniques. Task of this phase is to reverse engineer legacy system into UML models, partition the legacy system into subsystems with the help of model slicing technique and detect existing security mechanisms to determine whether or not the provided security in the legacy system satisfies the user’s security objectives. Secondly, security requirements are elicited using risk analysis method. It is the process of analysing key aspects of the legacy systems in terms of security. A new risk assessment method, taking consideration of asset, threat and vulnerability, is proposed and used to elicit the security requirements which will generate the detailed security requirements in the specific format to direct the subsequent security enhancement. Finally, security enhancement for the system is performed using the proposed ontology based security pattern approach. It is the stage that security patterns derived from security expertise and fulfilling the elicited security requirements are selected and integrated in the legacy system models with the help of the proposed security ontology. The proposed approach is evaluated by the selected case study. Based on the analysis, conclusions are drawn and future research is discussed at the end of this thesis. The results show this thesis contributes an effective, reusable and suitable evolution approach for software security
    corecore