Rapid health data repository allocation using predictive machine learning
Health-related data is stored in a number of repositories that are managed and controlled by different entities. For instance, Electronic Health Records are usually administered by governments. Electronic Medical Records are typically controlled by health care providers, whereas Personal Health Records are managed directly by patients. Recently, Blockchain-based health record systems, largely regulated by technology, have emerged as another type of repository. Repositories for storing health data differ from one another based on cost, level of security and quality of performance. Not only have the types of repository increased in recent years, but so has the quantity of health data to be stored. For instance, the advent of wearable sensors that capture physiological signs has resulted in an exponential growth in digital health data. The increase in the types of repository and amount of data has driven a need for intelligent processes to select appropriate repositories as data is collected. However, the storage allocation decision is complex and nuanced. The challenges are exacerbated when health data are continuously streamed, as is the case with wearable sensors. Although patients are not always solely responsible for determining which repository should be used, they typically have some input into this decision. Patients can be expected to have idiosyncratic preferences regarding storage decisions depending on their unique contexts. In this paper, we propose a predictive model for the storage of health data that can meet patient needs and make storage decisions rapidly, in real time, even with data streaming from wearable sensors. The model is built with a machine learning classifier that learns the mapping between characteristics of health data and features of storage repositories from a training set generated synthetically from correlations evident in small samples of expert decisions.
Results from the evaluation demonstrate the viability of the machine learning technique used. © The Author(s) 2020
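The proposed pipeline (learn a mapping from health-data characteristics to a repository type, then classify incoming records in real time) can be sketched with a toy classifier. This is a minimal illustration, not the authors' model: the features, repository labels, and synthetic training rows below are all invented for the example.

```python
# Minimal sketch (not the authors' model): a k-nearest-neighbour classifier that
# maps hypothetical health-data characteristics to a repository label.
from collections import Counter

# Illustrative synthetic rows: (sensitivity, volume, stream_rate) -> repository
TRAIN = [
    ((0.9, 0.2, 0.1), "EHR"),
    ((0.8, 0.3, 0.2), "EHR"),
    ((0.6, 0.5, 0.4), "EMR"),
    ((0.5, 0.6, 0.3), "EMR"),
    ((0.3, 0.4, 0.9), "PHR"),
    ((0.2, 0.5, 0.8), "PHR"),
    ((0.7, 0.8, 0.7), "Blockchain"),
    ((0.8, 0.9, 0.6), "Blockchain"),
]

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict(features, k=3):
    """Return the majority repository label among the k nearest training rows."""
    nearest = sorted(TRAIN, key=lambda row: dist(row[0], features))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(predict((0.25, 0.45, 0.85)))  # a low-sensitivity, high-rate stream; prints "PHR"
```

In practice the training set would be generated from expert-derived correlations, as the abstract describes, and the classifier would be chosen for low-latency prediction on streaming sensor data.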
Systematic Review on Privacy Categorization
In the modern digital world, users need to make privacy and security choices
that have far-reaching consequences. Researchers are increasingly studying
people's decisions when faced with privacy and security trade-offs, the
pressing and time-consuming disincentives that influence those decisions, and
methods to mitigate them. This work presents a systematic review of the
literature on privacy categorization, which has been defined in terms of
profile, profiling, segmentation, clustering and personae. Privacy
categorization involves classifying users according to specific
prerequisites, such as their ability to manage privacy issues, or according
to which types of personal information, and how much of it, they decide or
decline to disclose. Privacy categorization has been defined and used for different
purposes. The systematic review focuses on three main research questions,
investigating: the study contexts, i.e. the motivations and research questions,
that prompt privacy categorizations; the methodologies and results of privacy
categorizations; and the evolution of privacy categorizations over time.
Ultimately, it seeks to answer whether privacy categorization is still a
meaningful research endeavour and whether it has a future.
SensorCloud: Towards the Interdisciplinary Development of a Trustworthy Platform for Globally Interconnected Sensors and Actuators
Although Cloud Computing promises to lower IT costs and increase users'
productivity in everyday life, the unattractive aspect of this new technology
is that the user no longer owns all the devices which process personal data. To
lower scepticism, the SensorCloud project investigates techniques to understand
and compensate for these adoption barriers in a scenario consisting of cloud
applications that utilize sensors and actuators placed in private places. This
work provides an interdisciplinary overview of the social and technical core
research challenges for the trustworthy integration of sensor and actuator
devices with the Cloud Computing paradigm. Most importantly, these challenges
include i) ease of development, ii) security and privacy, and iii) social
dimensions of a cloud-based system which integrates into private life. When
these challenges are tackled in the development of future cloud systems, the
attractiveness of new use cases in a sensor-enabled world will be considerably
increased for users who currently do not trust the Cloud.
Comment: 14 pages, 3 figures, published as a technical report of the Department of Computer Science of RWTH Aachen University
Intelligence at the Extreme Edge: A Survey on Reformable TinyML
The rapid miniaturization of Machine Learning (ML) for low powered processing
has opened gateways to provide cognition at the extreme edge (e.g., sensors and
actuators). Dubbed Tiny Machine Learning (TinyML), this burgeoning research
field proposes to democratize the use of ML and Deep Learning (DL) on frugal
Microcontroller Units (MCUs). MCUs are highly energy-efficient pervasive
devices capable of operating on less than a few milliwatts of power.
Nevertheless, many solutions assume that TinyML can only run inference.
However, growing interest in TinyML has led to work that makes these solutions
reformable, i.e., work that permits TinyML to improve once deployed. In line
with this, roadblocks common to MCU-based solutions, such as reduced physical
access and long deployment periods, position reformable TinyML to play a
significant part in more effective solutions. In this work, we
present a survey on reformable TinyML solutions with the proposal of a novel
taxonomy for ease of separation. Here, we also discuss the suitability of each
hierarchical layer in the taxonomy for allowing reformability. In addition to
these, we explore the workflow of TinyML and analyze the identified deployment
schemes and the scarcely available benchmarking tools. Furthermore, we discuss
how reformable TinyML can impact selected industrial areas, along with the
challenges and future directions.
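To make the notion of reformability concrete, the sketch below shows a model small enough for an MCU that keeps improving after deployment through plain SGD updates on streamed labelled samples. The task, learning rate, and training loop are invented for illustration and do not correspond to any specific system in the survey.

```python
# Illustrative sketch of "reformability": a tiny logistic-regression model that a
# deployed MCU could keep improving with streamed labelled samples (hypothetical
# setup; many real TinyML runtimes are inference-only once deployed).
import math

weights = [0.0, 0.0]   # small enough to fit comfortably in MCU RAM
bias = 0.0
LR = 0.5               # illustrative learning rate

def predict(x):
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def update(x, label):
    """One on-device SGD step on a freshly observed (features, label) pair."""
    global bias
    err = predict(x) - label
    for i, xi in enumerate(x):
        weights[i] -= LR * err * xi
    bias -= LR * err

# Labelled sensor readings arriving after deployment (an AND-like toy task)
stream = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)] * 200
for x, y in stream:
    update(x, y)

print(round(predict((1, 1)), 2), round(predict((0, 1)), 2))
```

The point of the sketch is only that the model's parameters change after deployment, which is precisely what distinguishes reformable TinyML from inference-only TinyML.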
FedCSD: A Federated Learning Based Approach for Code-Smell Detection
This paper proposes a Federated Learning Code Smell Detection (FedCSD)
approach that allows organizations to collaboratively train federated ML models
while preserving their data privacy. The approach is evaluated through three
experiments that leverage three manually validated datasets aimed at detecting
and examining different code-smell scenarios. In
experiment 1, which was concerned with a centralized training experiment,
dataset two achieved the lowest accuracy (92.30%) with fewer smells, while
datasets one and three achieved the highest accuracy with a slight difference
(98.90% and 99.5%, respectively). This was followed by experiment 2, which was
concerned with cross-evaluation, where each ML model was trained using one
dataset, which was then evaluated over the other two datasets. Results from
this experiment show a significant drop in the model's accuracy (lowest
accuracy: 63.80%) where fewer smells exist in the training dataset, which has
a noticeable reflection (technical debt) on the model's performance. Finally,
the third experiment evaluates our approach by splitting the dataset
across 10 companies. The ML model was trained at each company's site, and all
updated model weights were then transferred to the server. Ultimately, an accuracy
of 98.34% was achieved by the global model trained using 10
companies for 100 training rounds. The results reveal a slight difference in
the global model's accuracy compared to the highest accuracy of the centralized
model, which can be ignored in favour of the global model's comprehensive
knowledge, lower training cost, preservation of data privacy, and avoidance of
the technical debt problem.
Comment: 17 pages, 7 figures, journal paper
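The third experiment's workflow (local training at each company, weight transfer to a server, aggregation into a global model) resembles the standard FedAvg scheme, sketched below. This is an illustrative toy with an invented linear model and made-up client data, not the FedCSD implementation; the abstract does not state that plain element-wise averaging is the exact aggregation rule used.

```python
# Minimal FedAvg-style sketch (not the FedCSD implementation): each "company"
# computes a local weight update, and only weights, never raw data, reach the server.
def local_train(weights, local_data, lr=0.1):
    """One hypothetical local pass: nudge a linear model toward local labels."""
    new = list(weights)
    for x, y in local_data:
        pred = sum(w * xi for w, xi in zip(new, x))
        err = pred - y
        new = [w - lr * err * xi for w, xi in zip(new, x)]
    return new

def fed_avg(client_weights):
    """Server step: average the clients' updated weights element-wise."""
    n = len(client_weights)
    return [sum(ws) / n for ws in zip(*client_weights)]

global_w = [0.0, 0.0]
clients = [
    [((1.0, 0.0), 1.0)],   # company 1's private samples (invented)
    [((0.0, 1.0), 2.0)],   # company 2's private samples (invented)
]
for _ in range(100):       # training rounds
    updates = [local_train(global_w, data) for data in clients]
    global_w = fed_avg(updates)
print([round(w, 2) for w in global_w])  # prints [0.99, 1.99]
```

Each client's knowledge ends up in the global model even though neither client ever sees the other's samples, which is the privacy property the paper builds on.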
A Survey of Symbolic Execution Techniques
Many security and software testing applications require checking whether
certain properties of a program hold for any possible usage scenario. For
instance, a tool for identifying software vulnerabilities may need to rule out
the existence of any backdoor to bypass a program's authentication. One
approach would be to test the program using different, possibly random inputs.
As the backdoor may only be hit for very specific program workloads, automated
exploration of the space of possible inputs is of the essence. Symbolic
execution provides an elegant solution to the problem, by systematically
exploring many possible execution paths at the same time without necessarily
requiring concrete inputs. Rather than taking on fully specified input values,
the technique abstractly represents them as symbols, resorting to constraint
solvers to construct actual instances that would cause property violations.
Symbolic execution has been incubated in dozens of tools developed over the
last four decades, leading to major practical breakthroughs in a number of
prominent software reliability applications. The goal of this survey is to
provide an overview of the main ideas, challenges, and solutions developed in
the area, distilling them for a broad audience.
The present survey has been accepted for publication at ACM Computing
Surveys. If you are considering citing this survey, we would appreciate it if
you could use the following BibTeX entry: http://goo.gl/Hf5Fvc
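The core idea the survey covers can be shown in a few lines: represent each feasible path by its constraints over symbolic inputs, then ask a solver for concrete inputs that drive execution down that path. The toy below uses brute-force search in place of a real constraint solver (an SMT solver in practice), and the "backdoor" condition is invented for illustration.

```python
# Toy illustration of symbolic execution (not a real engine such as KLEE):
# explore both sides of a branch symbolically, collect path constraints, then
# ask a naive "solver" for concrete inputs that satisfy them.
from itertools import product

# Program under test has a hidden "backdoor": auth succeeds when x * 2 == y + 10
def paths():
    """Enumerate (path constraints, outcome) pairs, one per execution path."""
    yield ([lambda x, y: x * 2 == y + 10], "backdoor")
    yield ([lambda x, y: x * 2 != y + 10], "denied")

def solve(constraints, domain=range(-20, 21)):
    """Stand-in solver: brute-force a concrete (x, y) meeting all constraints."""
    for x, y in product(domain, domain):
        if all(c(x, y) for c in constraints):
            return (x, y)
    return None

for constraints, outcome in paths():
    print(outcome, "reachable with inputs", solve(constraints))
```

Random testing might never hit the backdoor path, whereas enumerating path constraints and solving them finds a witness input directly, which is the essence of the technique.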
Cloud-based homomorphic encryption for privacy-preserving machine learning in clinical decision support
While privacy and security concerns dominate public cloud services, Homomorphic Encryption (HE) is seen as an emerging solution that ensures secure processing of sensitive data via untrusted networks in the public cloud or by third-party cloud vendors. It relies on the fact that some encryption algorithms display the property of homomorphism, which allows data to be manipulated meaningfully while still in encrypted form, although there are major stumbling blocks to overcome before the technology is considered mature for production cloud environments. Such a framework would find particular relevance in Clinical Decision Support (CDS) applications deployed in the public cloud. CDS applications have an important computational and analytical role over confidential healthcare information, with the aim of supporting decision-making in clinical practice. Machine Learning (ML) is employed in CDS applications that typically learn and can personalise actions based on individual behaviour. A relatively simple-to-implement, common and consistent framework is sought that can overcome most limitations of Fully Homomorphic Encryption (FHE) in order to offer an expanded and flexible set of HE capabilities. In the absence of a significant breakthrough in FHE efficiency and practical use, it would appear that a solution relying on client interactions is the best known option for meeting the requirements of private CDS-based computation, so long as security is not significantly compromised. A hybrid solution is introduced that intersperses limited two-party interactions amongst the main homomorphic computations, allowing exchange of both numerical and logical cryptographic contexts in addition to resolving other major FHE limitations. Interactions involve the use of client-based ciphertext decryptions blinded by data obfuscation techniques, to maintain privacy.
This thesis explores the middle ground whereby HE schemes can provide improved and efficient arbitrary computational functionality over a significantly reduced two-party network interaction model involving data obfuscation techniques. This compromise allows the powerful capabilities of HE to be leveraged, providing a more uniform, flexible and general approach to privacy-preserving system integration, which is suitable for cloud deployment. The proposed platform is uniquely designed to make HE more practical for mainstream clinical application use, equipped with a rich set of capabilities and a potentially very complex depth of HE operations. Such a solution would be suitable for the long-term privacy-preserving processing requirements of a cloud-based CDS system, which would typically require complex combinatorial logic, workflow and ML capabilities.
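The homomorphic property the abstract relies on is easy to demonstrate with textbook (unpadded) RSA, which is multiplicatively homomorphic. This is an insecure toy for intuition only; a real CDS deployment would use an FHE or levelled-HE scheme such as BFV or CKKS, and the small primes below are purely illustrative.

```python
# Toy demonstration of homomorphism with textbook RSA (insecure; for intuition
# only): multiplying ciphertexts multiplies the underlying plaintexts, so an
# untrusted cloud can compute on data it cannot read.
p, q = 61, 53
n = p * q                           # public modulus (3233)
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+ modular inverse)

enc = lambda m: pow(m, e, n)
dec = lambda c: pow(c, d, n)

c1, c2 = enc(7), enc(6)
c_prod = (c1 * c2) % n              # computed by the cloud, on ciphertexts only
print(dec(c_prod))                  # prints 42 == 7 * 6, seen only by the key holder
```

Textbook RSA supports only multiplication; the thesis's hybrid design exists precisely because fully general computation under encryption (FHE) remains expensive, motivating the limited two-party interactions it interposes.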