Secure storage systems for untrusted cloud environments
The cloud has become established for applications that need to be scalable and highly
available. However, moving data to data centers owned and operated by a third party,
i.e., the cloud provider, raises security concerns: a cloud provider could easily
access and manipulate the data or the program flow, which prevents the cloud from being
used for certain applications, such as medical or financial ones.
Hardware vendors are addressing these concerns by developing Trusted Execution
Environments (TEEs) that make the CPU state and parts of memory inaccessible from
the host software. While TEEs protect the current execution state, they do not provide
security guarantees for data that does not fit, or does not reside, in the protected
memory area, such as data on the network or in persistent storage.
In this work, we aim to address these limitations of TEEs in three ways: first, we
extend the trust of TEEs to persistent storage; second, we extend that trust to multiple
nodes in a network; and third, we propose a compiler-based solution for accessing
heterogeneous memory regions. More specifically,
• SPEICHER extends the trust provided by TEEs to persistent storage. SPEICHER
implements a key-value interface. Its design is based on LSM data structures, but
extends them to provide confidentiality, integrity, and freshness for the stored
data. Thus, SPEICHER can prove to the client that the data has not been tampered
with by an attacker.
• AVOCADO is a distributed in-memory key-value store (KVS) that extends the
trust that TEEs provide across the network to multiple nodes, allowing KVSs to
scale beyond the boundaries of a single node. On each node, AVOCADO carefully
divides data between trusted memory and untrusted host memory, to maximize
the amount of data that can be stored on each node. AVOCADO leverages the fact
that network attacks can be modeled as crash faults, allowing it to trust other nodes
through a hardened ABD replication protocol.
• TOAST is based on the observation that modern high-performance systems
often use several different heterogeneous memory regions that are not easily
distinguishable by the programmer. The number of regions is increased by the
fact that TEEs divide memory into trusted and untrusted regions. TOAST is a
compiler-based approach to unify access to different heterogeneous memory
regions and provides programmability and portability. TOAST uses a
load/store interface to abstract most library interfaces for the different memory
regions.
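The unified load/store abstraction can be illustrated with a small sketch. The class and method names below are illustrative assumptions, not TOAST's actual interface: each backing region, trusted or untrusted, exposes the same load/store calls, so client code is written once and works against any region.

```python
from abc import ABC, abstractmethod

class MemoryRegion(ABC):
    """Uniform load/store interface over a heterogeneous memory region."""
    @abstractmethod
    def load(self, addr: int, size: int) -> bytes: ...
    @abstractmethod
    def store(self, addr: int, data: bytes) -> None: ...

class PlainRegion(MemoryRegion):
    """Ordinary (untrusted) host memory, modeled as a bytearray."""
    def __init__(self, size: int):
        self.buf = bytearray(size)
    def load(self, addr, size):
        return bytes(self.buf[addr:addr + size])
    def store(self, addr, data):
        self.buf[addr:addr + len(data)] = data

class EncryptedRegion(MemoryRegion):
    """Stand-in for a trusted region: XOR is only a placeholder for real
    authenticated encryption, used to make the region observably different."""
    def __init__(self, size: int, key: int = 0x5A):
        self.buf = bytearray(size)
        self.key = key
    def load(self, addr, size):
        return bytes(b ^ self.key for b in self.buf[addr:addr + size])
    def store(self, addr, data):
        self.buf[addr:addr + len(data)] = bytes(b ^ self.key for b in data)

def copy(dst: MemoryRegion, src: MemoryRegion, addr: int, size: int) -> None:
    # The same call site works for any pair of regions.
    dst.store(addr, src.load(addr, size))
```

The point of the sketch is that `copy` never needs to know which kind of region it touches; a compiler-based approach like TOAST lowers ordinary loads and stores to the appropriate backend in a similar spirit.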
ONLINE INTERACTIVE TOOL FOR LEARNING LOGIC
This dissertation presents the design and implementation of an online platform for solving
logic exercises, aimed at complementing the theoretical classes of logic-related
courses at NOVA University Lisbon. The platform is integrated with a
Learning Management System (LMS) using the LTI protocol, allowing instructors to
grade students’ work.
We provide an overview of related literature and detailed explanations of each component
of the platform, including the design of logic exercises and their integration with
the LMS. Additionally, we discuss the challenges and difficulties faced during the development
process.
The main contributions of this work are the platform itself, a guide on integrating an
external tool via LTI, and the implementation of that integration with the learning platform.
Our results and evaluations show that the platform is effective for enhancing online
learning experiences and improving assessment methods.
In conclusion, this dissertation provides a valuable resource for educational institutions
seeking to improve their online learning offerings and assessment practices.
Complaint Driven Training Data Debugging for Machine Learning Workflows
As the need for machine learning (ML) increases rapidly across all industry sectors, so has the interest in building ML platforms that manage and automate parts of the ML life-cycle. This has enabled companies to use ML inference as a part of their downstream analytics or their applications. Unfortunately, debugging unexpected outcomes in the results of these ML workflows remains a necessary but difficult task of the ML life-cycle. The challenge of debugging ML workflows is that it requires reasoning about the correctness of the workflow logic, the datasets used for inference and training, the models, and the interactions between them. Even if the workflow logic is correct, errors in the data used across the ML workflow can still lead to wrong outcomes. In short, developers are not just debugging the code, but also the data.
We advocate in favor of a complaint-driven approach towards specifying and debugging data errors in ML workflows. The approach takes as input user-specified complaints, expressed as constraints over the final or intermediate outputs of workflows that use trained ML models. It outputs explanations in the form of specific operator(s) or data subsets, and describes how they may be changed to address the constraint violations.
In this thesis we make the first steps towards our complaint-driven approach to data debugging. As a stepping stone, we focus our attention on complaints specified on top of relational workflows that use ML model inference and whose errors are caused by errors in the ML model’s training data. To the best of our knowledge, we contribute the first debugging system for this task, which we call Rain. In response to a user complaint, Rain ranks the ML model’s training examples based on their ability to address the user’s complaint if they were removed. Our experiments show that users can use Rain to debug training data errors by specifying complaints over aggregations of model predictions, without having to specify the correct label for each individual prediction.
Unfortunately, Rain’s latency may be prohibitive for use in interactive applications like analytical dashboards or business intelligence tools where users are likely to observe errors and complain. To address Rain’s latency problem when scaling to large ML models and training sets, we propose Rain++. Rain++ pushes the majority of Rain’s computation offline ahead of user interaction, achieving orders of magnitude online latency improvements compared to Rain.
To go beyond Rain’s and Rain++’s approach, which evaluates individual training example deletions independently, we propose MetaRain, a framework for training classifiers that detect training data corruptions in response to user complaints. Thanks to the generality of MetaRain, users can adapt the chosen classifiers to the training corruptions and the complaints they seek to resolve. Our experiments indicate that making use of this ability results in improved debugging outcomes.
Last but not least, we study the problem of updating relational workflow results in response to changes to the inference ML model used. This can be leveraged by current or future complaint-driven debugging systems that repeatedly change the model and reevaluate the relational workflow. We propose FaDE, a compiler that generates efficient code for the workflow update problem by casting it as view maintenance under input tuple deletions. Our experiments indicate that the code generated by FaDE has orders of magnitude lower latency than existing view maintenance systems.
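Rain's core ranking idea, scoring each training example by how much its removal would reduce the complaint violation, can be sketched with exact leave-one-out retraining of a deliberately tiny model. Rain itself uses influence-based approximations over real ML models; the 1-nearest-centroid classifier and all function names below are illustrative assumptions, not Rain's implementation.

```python
def centroid_predict(train, x):
    """Tiny stand-in model: predict the label of the closest class centroid."""
    cents = {}
    for xi, yi in train:
        cents.setdefault(yi, []).append(xi)
    return min(cents, key=lambda y: abs(x - sum(cents[y]) / len(cents[y])))

def complaint_violation(train, queries, expected_positive):
    """Complaint: 'exactly expected_positive of these queries should be
    predicted as label 1'. Violation is the absolute gap from that count."""
    preds = sum(centroid_predict(train, q) for q in queries)
    return abs(preds - expected_positive)

def rank_training_examples(train, queries, expected_positive):
    """Score each training example by how much deleting it (with exact
    retraining) shrinks the complaint violation; higher = more suspect."""
    base = complaint_violation(train, queries, expected_positive)
    scores = []
    for i in range(len(train)):
        loo = train[:i] + train[i + 1:]  # leave-one-out training set
        scores.append((base - complaint_violation(loo, queries, expected_positive), i))
    return sorted(scores, reverse=True)
```

For example, if one training point carries a corrupted label that pulls a class centroid toward the wrong side of a query, the ranking surfaces that point first, because its deletion alone resolves the complaint.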
Remote health monitoring systems for elderly people: a survey
This paper addresses the growing demand for healthcare systems, particularly among the elderly population. The need for these systems arises from the desire to enable patients and seniors to live independently in their homes without relying heavily on their families or caretakers. To achieve substantial improvements in healthcare, it is essential to ensure the continuous development and availability of information technologies tailored explicitly for patients and elderly individuals. The primary objective of this study is to comprehensively review the latest remote health monitoring systems, with a specific focus on those designed for older adults. To facilitate a comprehensive understanding, we categorize these remote monitoring systems and provide an overview of their general architectures. Additionally, we emphasize the standards utilized in their development and highlight the challenges encountered throughout the developmental processes. Moreover, this paper identifies several potential areas for future research, which promise further advancements in remote health monitoring systems. Addressing these research gaps can drive progress and innovation, ultimately enhancing the quality of healthcare services available to elderly individuals. This, in turn, empowers them to lead more independent and fulfilling lives while enjoying the comforts and familiarity of their own homes. By acknowledging the importance of healthcare systems for the elderly and recognizing the role of information technologies, we can address the evolving needs of this population. Through ongoing research and development, we can continue to enhance remote health monitoring systems, ensuring they remain effective, efficient, and responsive to the unique requirements of elderly individuals.
Enriching information extraction pipelines in clinical decision support systems
Official Doctoral Programme in Information and Communication Technologies (5032V01).
Multicentre health studies are important to increase the impact of medical research
findings due to the number of subjects that they are able to engage. To simplify the execution of these studies, the data-sharing process should be effortless, for instance, through the use of interoperable databases. However, achieving this interoperability is still an ongoing research topic, namely due to data governance and privacy issues. In the first stage of this work, we propose several methodologies to optimise the harmonisation pipelines of health databases. This work was focused on harmonising heterogeneous data sources into a standard data schema, namely the OMOP CDM which has been developed and promoted by the OHDSI community. We validated our proposal using data sets of Alzheimer’s disease patients from distinct institutions. In the following stage, aiming to enrich the information stored in OMOP CDM databases, we have investigated solutions to extract clinical concepts from unstructured narratives, using information retrieval and natural language processing
techniques. The validation was performed through datasets provided in scientific challenges, namely in the National NLP Clinical Challenges (n2c2). In the final stage, we aimed to simplify the protocol execution of multicentre studies, by proposing novel solutions for profiling, publishing and facilitating the discovery of databases. Some of the developed solutions are currently being used in three European projects
aiming to create federated networks of health databases across Europe.
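The harmonisation stage described above, mapping heterogeneous source records onto one standard schema, can be sketched as a per-source field mapping. The three-field target schema and the source column names below are illustrative assumptions; the actual OMOP CDM defines far richer tables and vocabularies.

```python
# Simplified, OMOP-style target schema (illustrative only).
TARGET_FIELDS = ("person_id", "birth_year", "condition")

# Per-source mappings from local column names to the target schema.
SOURCE_MAPPINGS = {
    "hospital_a": {"person_id": "pid", "birth_year": "yob", "condition": "dx"},
    "hospital_b": {"person_id": "patient", "birth_year": "birth",
                   "condition": "diagnosis"},
}

def harmonise(source: str, record: dict) -> dict:
    """Rename one source record's fields into the common target schema."""
    mapping = SOURCE_MAPPINGS[source]
    return {field: record[mapping[field]] for field in TARGET_FIELDS}
```

Once every institution's records pass through such a mapping, downstream study code can be written once against the common schema, which is the practical payoff of standardising on a CDM.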
Personal Data Stores (PDS): A Review
Internet services have collected our personal data since their inception. In the beginning, personal data collection was uncoordinated and limited to a few selected data types, such as names, ages, and birthdays. Due to the widespread use of social media, more and more personal data has been collected by different online services. We increasingly see that Internet of Things (IoT) devices are also being adopted by consumers, making it possible for companies to capture personal data (including very sensitive data) autonomously, with much less effort and at a very low cost. Current system architectures aim to collect, store, and process our personal data in the cloud, giving citizens very limited control over it. However, Personal Data Stores (PDS) have been proposed as an alternative architecture in which personal data is stored within households, giving us complete control (self-sovereignty) over our data. This paper surveys the current literature on Personal Data Stores (PDS) that enable individuals to collect, control, store, and manage their data. In particular, we provide a comprehensive review of related concepts and the expected benefits of PDS platforms. Further, we compare and analyse existing PDS platforms in terms of their capabilities and core components. Subsequently, we summarise the major challenges and issues facing PDS platforms’ development and widespread adoption.
A deep learning palpebral fissure segmentation model in the context of computer user monitoring
The intense use of computers and visual terminals is a daily practice for many people. As a consequence, there are frequent complaints of visual and non-visual symptoms, such as headaches and neck pain. These symptoms make up Computer Vision Syndrome, and among the factors related to this syndrome are: the distance between the user and the screen, the number of hours of use of the equipment, the reduction in the blink rate, and the number of incomplete blinks while using the device. Although some of these items can be controlled by ergonomic measures, controlling blinks and their efficiency is more complex. A considerable number of studies have looked at measuring blinks, but few have dealt with the presence of incomplete blinks. Conventional measurement techniques have limitations when it comes to detecting and analyzing the completeness of blinks, especially due to the different eye and blink characteristics of individuals, as well as the position and movement of the user. Segmenting the palpebral fissure can be a first step towards solving this problem, by characterizing individuals well regardless of these factors. This work investigates the development of Deep Learning models to perform palpebral fissure segmentation in situations where the eyes cover a small region of the images, such as images from a computer webcam. Training, validation and test sets were generated based on the CelebAMask-HQ and Closed Eyes in the Wild datasets. Various machine learning techniques are used, resulting in a final trained model with a Dice Coefficient metric close to 0.90 for the test data, a result similar to that obtained by models trained with images in which the eye region occupies most of the image.
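The Dice Coefficient reported above compares a predicted segmentation mask with the ground truth: for binary masks it equals 2·|A ∩ B| / (|A| + |B|), so 1.0 means perfect overlap. A minimal sketch over flattened binary masks (illustrative, not the thesis code):

```python
def dice_coefficient(pred, truth):
    """Dice = 2*|A ∩ B| / (|A| + |B|) for flat binary masks
    (1 = palpebral-fissure pixel, 0 = background)."""
    inter = sum(p and t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    if total == 0:          # both masks empty: define as perfect overlap
        return 1.0
    return 2.0 * inter / total
```

A score close to 0.90, as reported for the test data, means the predicted fissure region overlaps almost all of the annotated region relative to their combined size.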
Geographic information extraction from texts
A large volume of unstructured text, containing valuable geographic information, is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although great progress has been achieved in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss recent advances, new ideas, and concepts, and to identify research gaps in geographic information extraction.
SoK: Distributed Computing in ICN
Information-Centric Networking (ICN), with its data-oriented operation and
generally more powerful forwarding layer, provides an attractive platform for
distributed computing. This paper provides a systematic overview and
categorization of different distributed computing approaches in ICN
encompassing fundamental design principles, frameworks and orchestration,
protocols, enablers, and applications. We discuss current pain points in legacy
distributed computing, attractive ICN features, and how different systems use
them. This paper also provides a discussion of potential future work for
distributed computing in ICN.
Comment: 10 pages, 3 figures, 1 table. Accepted by ACM ICN 202