Representation Learning for Words and Entities
This thesis presents new methods for unsupervised learning of distributed
representations of words and entities from text and knowledge bases. The first
algorithm presented in the thesis is a multi-view algorithm for learning
representations of words called Multiview Latent Semantic Analysis (MVLSA). By
incorporating up to 46 different types of co-occurrence statistics for the same
vocabulary of English words, I show that MVLSA outperforms other
state-of-the-art word embedding models. Next, I focus on learning entity
representations for search and recommendation and present the second method of
this thesis, Neural Variational Set Expansion (NVSE). NVSE is also an
unsupervised learning method, but it is based on the Variational Autoencoder
framework. Evaluations with human annotators show that NVSE can facilitate
better search and recommendation of information gathered from noisy, automatic
annotation of unstructured natural language corpora. Finally, I move from
unstructured data to focus on structured knowledge graphs. I present novel
approaches for learning embeddings of vertices and edges in a knowledge graph
that obey logical constraints.
Comment: PhD thesis; Machine Learning, Natural Language Processing,
Representation Learning, Knowledge Graphs, Entities, Word Embeddings, Entity
Embeddings
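The multi-view idea behind MVLSA can be pictured with a small sketch: each view contributes its own co-occurrence matrix over the shared vocabulary, and the views are fused into a single embedding space. The code below is only an illustrative approximation (per-view SVDs followed by a joint SVD over the concatenation), not the exact MVLSA algorithm, and all matrices are synthetic.

```python
import numpy as np

def multiview_embeddings(views, dim):
    """Fuse several co-occurrence views into one word embedding.

    views: list of (n_words x n_features) co-occurrence matrices,
           one per view, with rows aligned to the same vocabulary.
    dim:   target embedding dimensionality.
    """
    projections = []
    for X in views:
        # A per-view SVD keeps each view's dominant structure.
        U, s, _ = np.linalg.svd(X, full_matrices=False)
        projections.append(U[:, :dim] * s[:dim])
    # Concatenate the per-view projections and reduce once more,
    # so structure shared across views dominates the final space.
    stacked = np.hstack(projections)
    U, s, _ = np.linalg.svd(stacked, full_matrices=False)
    return U[:, :dim] * s[:dim]

# Toy example: a 5-word vocabulary seen through two synthetic views.
rng = np.random.default_rng(0)
emb = multiview_embeddings([rng.random((5, 8)), rng.random((5, 6))], dim=3)
print(emb.shape)  # (5, 3)
```

In the full setting each row of a view would hold real co-occurrence counts (or transformed counts) for one vocabulary word, and up to 46 such views would be stacked.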
Trading Indistinguishability-based Privacy and Utility of Complex Data
The collection and processing of complex data, like structured data or infinite streams, facilitates novel applications. At the same time, it raises privacy requirements on the part of the data owners. Consequently, data administrators use privacy-enhancing technologies (PETs), frequently based on indistinguishability-based privacy definitions, to sanitize the data. When engineering PETs, a well-known challenge is the privacy-utility trade-off. Although the literature documents a number of trade-offs, there are still combinations of involved entities, privacy definition, type of data, and application for which valuable trade-offs are missing.
In this thesis, for two important groups of applications processing complex data, we study (a) which indistinguishability-based privacy and utility requirements are relevant and (b) whether existing PETs solve the trade-off sufficiently, and (c) we propose novel PETs extending the state of the art substantially in terms of methodology as well as achieved privacy or utility. Overall, we provide four contributions divided into two parts. In the first part, we study applications that analyze structured data with distance-based mining algorithms. We reveal that an essential utility requirement is the preservation of the pairwise distances of the data items. Consequently, we propose distance-preserving encryption (DPE), together with a general procedure to engineer respective PETs by leveraging existing encryption schemes. As proof of concept, we apply it to SQL log mining, useful for database performance tuning. In the second part, we study applications that monitor query results over infinite streams. To this end, w-event differential privacy is the state of the art. Here, PETs use mechanisms that typically add noise to query results. First, we study state-of-the-art mechanisms with respect to the utility they provide. Conducting the largest benchmark to date that fulfills requirements derived from the limitations of prior experimental studies, we contribute new insights into the strengths and weaknesses of existing mechanisms. One of the most unexpected, yet explainable, results is a baseline supremacy: one of the two baseline mechanisms delivers high or even the best utility. A natural follow-up question is whether baseline mechanisms already provide reasonable utility. So, second, we perform a case study from the area of electricity grid monitoring, revealing two results. First, achieving reasonable utility is only possible under weak privacy requirements. Second, the utility measured with application-specific utility metrics decreases faster than the sanitization error, which is used as the utility metric in most studies, suggests. As a third contribution, we propose a novel differential-privacy-based privacy definition called Swellfish privacy. It allows tuning utility beyond incremental w-event mechanism design by supporting time-dependent privacy requirements. Formally, as well as by experiments, we prove that it increases utility significantly. In total, our thesis contributes substantially to the research field and reveals directions for future research.
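For intuition about the mechanisms discussed above, one of the simplest baselines in this streaming setting splits the privacy budget evenly across the timestamps of a sliding window and perturbs each exact query result with Laplace noise. The sketch below assumes unit sensitivity and is a generic illustration of such a uniform-budget baseline, not a reimplementation of any specific mechanism benchmarked in the thesis.

```python
import numpy as np

def uniform_laplace_stream(stream, epsilon, w, sensitivity=1.0):
    """Uniform baseline: split the budget epsilon evenly over each
    window of w timestamps and add Laplace noise to every exact
    query result in the stream.

    stream: iterable of true query results, one per timestamp.
    """
    rng = np.random.default_rng()
    per_step_eps = epsilon / w          # uniform budget allocation
    scale = sensitivity / per_step_eps  # Laplace scale b = sensitivity / eps_t
    return [x + rng.laplace(0.0, scale) for x in stream]

# Toy stream of counts, e.g. smart-meter readings per timestamp.
true_counts = [120, 118, 131, 140, 125]
noisy = uniform_laplace_stream(true_counts, epsilon=1.0, w=5)
print(len(noisy))  # one sanitized value per timestamp
```

The sanitization error mentioned above would then be a distance (e.g. mean absolute error) between `true_counts` and `noisy`, while an application-specific metric would measure the downstream monitoring quality instead.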
Natural Language Processing in-and-for Design Research
We review the scholarly contributions that utilise Natural Language
Processing (NLP) methods to support the design process. Using a heuristic
approach, we collected 223 articles published in 32 journals from 1991 to the
present. We present state-of-the-art NLP in-and-for design research
by reviewing these articles according to the type of natural language text
sources: internal reports, design concepts, discourse transcripts, technical
publications, consumer opinions, and others. Upon summarizing and identifying
the gaps in these contributions, we utilise an existing design innovation
framework to identify the applications that are currently being supported by
NLP. We then propose a few methodological and theoretical directions for future
NLP in-and-for design research.
Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval
Neural networks with deep architectures have demonstrated significant
performance improvements in computer vision, speech recognition, and natural
language processing. The challenges in information retrieval (IR), however, are
different from these other application areas. A common form of IR involves
ranking of documents--or short passages--in response to keyword-based queries.
Effective IR systems must deal with the query-document vocabulary mismatch
problem by modeling relationships between different query and document terms
and how they indicate relevance. Models should also consider lexical matches
when the query contains rare terms--such as a person's name or a product model
number--not seen during training, so as to avoid retrieving semantically
related but irrelevant results. In many real-life IR tasks, the retrieval involves
extremely large collections--such as the document index of a commercial Web
search engine--containing billions of documents. Efficient IR methods should
take advantage of specialized IR data structures, such as the inverted index,
to efficiently retrieve from large collections. Given an information need, the IR
system also mediates how much exposure an information artifact receives by
deciding whether it should be displayed, and where it should be positioned,
among other results. Exposure-aware IR systems may optimize for additional
objectives, besides relevance, such as parity of exposure for retrieved items
and content publishers. In this thesis, we present novel neural architectures
and methods motivated by the specific needs and challenges of IR tasks.
Comment: PhD thesis, University College London (2020)
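The inverted index mentioned above can be sketched in a few lines: it maps each term to the list of documents containing it, so a keyword query only touches the postings of its own terms rather than scanning the whole collection. A minimal illustration (term-at-a-time AND retrieval, no ranking):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def retrieve(index, query):
    """Return ids of documents matching every query term (AND semantics)."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

docs = ["neural ranking of documents",
        "inverted index for web search",
        "ranking web documents with neural models"]
index = build_inverted_index(docs)
print(retrieve(index, "neural documents"))  # [0, 2]
```

Production systems add compression, ranking scores, and skip structures on top of the postings lists, but the lookup pattern is the same: the query terms select which postings are ever read.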
An intelligent system for facility management
A software system has been developed that monitors and interprets temporally changing (internal) building environments and generates related knowledge that can assist in facility management (FM) decision making. The use of the multi-agent paradigm renders a system that delivers demonstrable rationality and is robust within the dynamic environment in which it operates. Agent behaviour directed at working toward goals is rendered intelligent with semantic web technologies. The capture of semantics through formal expression to model the environment adds a richness that the agents exploit to determine flexible and adaptable behaviours that satisfy their goals. The agent goals are to generate knowledge about building space usage as well as environmental conditions by elaborating and combining near-real-time sensor data and information from conventional building models. Further inferences are also facilitated, including those about wasted resources such as unnecessary lighting and heating. In contrast, current FM tools, lacking automatic synchronisation with the domain and rich semantic modelling, are limited to simpler querying of manually maintained models.
EThOS - Electronic Theses Online Service, United Kingdom
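As an illustration of the kind of inference described above, the following hypothetical rule flags wasted lighting by combining occupancy and lighting sensor readings. The field names and threshold are invented for the example and stand in for the semantically modelled sensor data the agents would actually consume.

```python
def wasted_lighting(readings, min_minutes=30):
    """Flag rooms where lights stayed on while the room was unoccupied.

    readings: list of dicts with keys 'room', 'occupied' (bool),
              'lights_on' (bool), and 'minutes' (duration of the reading).
    Returns rooms whose accumulated wasted minutes reach min_minutes.
    """
    waste = {}
    for r in readings:
        if r["lights_on"] and not r["occupied"]:
            waste[r["room"]] = waste.get(r["room"], 0) + r["minutes"]
    return {room: mins for room, mins in waste.items() if mins >= min_minutes}

# Hypothetical sensor log: room A1 is lit but empty twice.
log = [
    {"room": "A1", "occupied": False, "lights_on": True, "minutes": 45},
    {"room": "A2", "occupied": True, "lights_on": True, "minutes": 60},
    {"room": "A1", "occupied": False, "lights_on": True, "minutes": 20},
]
print(wasted_lighting(log))  # {'A1': 65}
```

In the described system such a rule would be expressed declaratively over the semantic model rather than hard-coded, letting agents adapt it to new sensor types without reprogramming.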
Identity Management and Authorization Infrastructure in Secure Mobile Access to Electronic Health Records
We live in an age of the mobile paradigm of anytime/anywhere access, as the
mobile device is the most ubiquitous device that people now hold. Due to their
portability, availability, ease of use, and support for communication, access
and sharing of information across the various domains and areas of our daily
lives, the acceptance and adoption of these devices is still growing. However,
due to their potential and rising numbers, mobile devices are a growing target
for attackers and, like other technologies, mobile applications are still
vulnerable.
Health information systems are composed of tools and software to collect,
manage, analyze and process medical information (such as electronic health
records and personal health records). Therefore, such systems can improve the
performance and maintenance of health services, promoting availability,
readability, accessibility and sharing of vital data about a patient's overall
medical history between geographically fragmented health services. Quick access
to information is of great importance in the health sector, as it accelerates
work processes, resulting in better time utilization. Additionally, it may
increase the quality of care.
However, health information systems store and manage highly sensitive data,
which raises serious concerns regarding patients' privacy and safety, and may
explain the still increasing number of reports of malicious incidents within
the health domain.
Data related to health information systems are highly sensitive and subject to
severe legal and regulatory restrictions that aim to protect the individual
rights and privacy of patients. Alongside this legislation, security
requirements must be analyzed and measures implemented. Among the security
requirements for accessing health data, secure authentication, identity
management and access control are essential to provide adequate means to
protect data from unauthorized access. However, besides the use of simple
authentication models, traditional access control models are commonly based on
predefined access policies and roles, and are inflexible. This results in
uniform access control decisions across people, different types of devices,
environments, situational conditions, enterprises, locations and time.
Although existing models can meet the basic needs of health care systems, they
still lack components for dynamicity and privacy protection, which leads to
unsatisfactory levels of security and prevents the patient from having full and
easy control over his or her privacy. Within this master's thesis, after a deep
survey and review of the state of the art, a novel dynamic access control model
was published: the Socio-Technical Risk-Adaptable Access Control modEl
(SoTRAACE), which can model the inherent access differences and security
requirements that are present in this thesis. To do this, SoTRAACE aggregates
attributes from various domains to help perform a risk assessment at the moment
of the request. The assessment of the risk factors identified in this work is
based on a Delphi study: a set of security experts from various domains was
selected to classify the impact of each attribute that SoTRAACE aggregates on
the risk assessment.
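The aggregation step can be pictured as a weighted sum: each contextual attribute of the request is normalized to a risk contribution and weighted by an expert-derived rating. The attribute names, weights and decision threshold below are purely hypothetical placeholders for illustration, not the factors or Delphi weights actually used by SoTRAACE.

```python
def risk_score(attributes, weights):
    """Weighted aggregation of contextual risk attributes.

    attributes: per-request risk contributions, each normalized to [0, 1]
                (0 = no risk, 1 = maximal risk).
    weights:    expert-derived (e.g. Delphi-rated) importance per attribute.
    """
    total_weight = sum(weights.values())
    return sum(attributes[name] * w for name, w in weights.items()) / total_weight

# Hypothetical attributes and weights, invented for this example.
weights = {"device_trust": 3.0, "location": 2.0, "network": 2.5, "user_role": 4.0}
request = {"device_trust": 0.2, "location": 0.6, "network": 0.9, "user_role": 0.1}

score = risk_score(request, weights)
decision = "deny" if score > 0.5 else "grant"  # illustrative threshold
print(round(score, 3), decision)
```

Because the score is a plain weighted average, adding a new attribute or risk level only means extending the two dictionaries, which mirrors the modularity claimed for the model.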
SoTRAACE was integrated into an architecture with well-founded requirements,
based on best-practice recommendations and standards (OWASP, NIST 800-53, NIST
800-57) as well as on a deep review of the state of the art. The architecture
is further subjected to an essential security analysis and a threat model. As
proof of concept, the proposed access control model was implemented within the
user-centric architecture, with two mobile prototypes for several types of
access by patients and healthcare professionals, as well as the web servers
that handle the access requests, authentication and identity management.
The proof of concept shows that the model works as expected, with transparency,
assuring privacy and data control for the user without impact on user
experience and interaction. Because the model is modular, it can clearly be
extended to other industry domains, and new risk levels or attributes can be
added. The architecture also works as expected, assuring secure multi-factor
authentication and secure data sharing/access based on SoTRAACE decisions. The
communication channel that SoTRAACE uses was also protected with a digital
certificate. Finally, the architecture was tested on different Android
versions, with static and dynamic analysis, and with security tools.
Future work includes the integration of health data standards and evaluating
the proposed system by collecting users' opinions after releasing the system to
the real world.
Security-Driven Software Evolution Using A Model Driven Approach
A high security level must be guaranteed in applications in order to mitigate risks during the deployment of information systems in open network environments. However, a significant number of legacy systems remain in use, which poses security risks to the enterprise's assets due to the poor technologies used and the lack of security concerns when they were designed. Software reengineering is a way to improve their security levels in a systematic way. Model-driven engineering is an approach in which a model, as defined by its type, directs the execution of the process. The aim of this research is to explore how a model-driven approach can facilitate software reengineering driven by security demands. The research in this thesis involves the following three phases.
Firstly, legacy system understanding is performed using reverse engineering techniques. The task of this phase is to reverse engineer the legacy system into UML models, partition the legacy system into subsystems with the help of a model slicing technique, and detect existing security mechanisms to determine whether or not the security provided in the legacy system satisfies the user's security objectives.
Secondly, security requirements are elicited using a risk analysis method. This is the process of analysing key aspects of the legacy systems in terms of security. A new risk assessment method, taking into consideration asset, threat and vulnerability, is proposed and used to elicit the security requirements, generating detailed security requirements in a specific format to direct the subsequent security enhancement.
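A common way to combine the three factors named above is a multiplicative score. The sketch below illustrates that idea with invented findings and ordinal 1-5 ratings; it is not the specific assessment method proposed in the thesis, only a generic asset-threat-vulnerability scoring.

```python
def security_risk(asset_value, threat_likelihood, vulnerability_severity):
    """Multiplicative risk estimate over the three assessed factors,
    each rated on an ordinal 1-5 scale (5 = highest)."""
    return asset_value * threat_likelihood * vulnerability_severity

def prioritize(findings, threshold=30):
    """Return findings whose risk meets the threshold, highest risk first."""
    scored = [(name, security_risk(*factors)) for name, factors in findings.items()]
    return sorted([f for f in scored if f[1] >= threshold],
                  key=lambda f: f[1], reverse=True)

# Invented findings: (asset value, threat likelihood, vulnerability severity).
findings = {
    "plain-text passwords": (5, 4, 5),
    "verbose error pages": (2, 3, 2),
    "outdated TLS library": (4, 3, 4),
}
print(prioritize(findings))  # [('plain-text passwords', 100), ('outdated TLS library', 48)]
```

The ranked output is the kind of artifact that would then be translated into the formatted security requirements driving the enhancement phase.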
Finally, security enhancement for the system is performed using the proposed ontology-based security pattern approach. In this stage, security patterns derived from security expertise and fulfilling the elicited security requirements are selected and integrated into the legacy system models with the help of the proposed security ontology.
The proposed approach is evaluated through a selected case study. Based on the analysis, conclusions are drawn and future research is discussed at the end of this thesis. The results show that this thesis contributes an effective, reusable and suitable evolution approach for software security.