Enabling Interactive Analytics of Secure Data using Cloud Kotta
Research, especially in the social sciences and humanities, is increasingly
reliant on the application of data science methods to analyze large amounts of
(often private) data. Secure data enclaves provide a solution for managing and
analyzing private data. However, such enclaves do not readily support discovery
science---a form of exploratory or interactive analysis by which researchers
execute a range of (sometimes large) analyses in an iterative and collaborative
manner. The batch computing model offered by many data enclaves is well suited
to executing large compute tasks; however it is far from ideal for day-to-day
discovery science. As researchers must submit jobs to queues and wait for
results, the high latencies inherent in queue-based, batch computing systems
hinder interactive analysis. In this paper we describe how we have augmented
the Cloud Kotta secure data enclave to support collaborative and interactive
analysis of sensitive data. Our model uses Jupyter notebooks as a flexible
analysis environment and Python language constructs to support the execution of
arbitrary functions on private data within this secure framework.
Comment: To appear in Proceedings of Workshop on Scientific Cloud Computing,
Washington, DC USA, June 2017 (ScienceCloud 2017), 7 pages
The medical science DMZ: a network design pattern for data-intensive medical science
Abstract:
Objective
We describe a detailed solution for maintaining high-capacity, data-intensive network flows (e.g., 10, 40, 100 Gbps+) in a scientific, medical context while still adhering to security and privacy laws and regulations.
Materials and Methods
High-end networking, packet-filter firewalls, network intrusion-detection systems.
Results
We describe a “Medical Science DMZ” concept as an option for secure, high-volume transport of large, sensitive datasets between research institutions over national research networks, and give 3 detailed descriptions of implemented Medical Science DMZs.
Discussion
The exponentially increasing amounts of “omics” data, high-quality imaging, and other rapidly growing clinical datasets have resulted in the rise of biomedical research “Big Data.” The storage, analysis, and network resources required to process these data and integrate them into patient diagnoses and treatments have grown to scales that strain the capabilities of academic health centers. Some data are not generated locally and cannot be sustained locally, and shared data repositories such as those provided by the National Library of Medicine, the National Cancer Institute, and international partners such as the European Bioinformatics Institute are rapidly growing. The ability to store and compute using these data must therefore be addressed by a combination of local, national, and industry resources that exchange large datasets. Maintaining data-intensive flows that comply with the Health Insurance Portability and Accountability Act (HIPAA) and other regulations presents a new challenge for biomedical research. We describe a strategy that marries performance and security by borrowing from and redefining the concept of a Science DMZ, a framework that is used in physical sciences and engineering research to manage high-capacity data flows.
Conclusion
By implementing a Medical Science DMZ architecture, biomedical researchers can leverage the scale provided by high-performance computer and cloud storage facilities and national high-speed research networks while preserving privacy and meeting regulatory requirements.
Digital curation and the cloud
Digital curation involves a wide range of activities, many of which could benefit from cloud
deployment to a greater or lesser extent. These range from infrequent, resource-intensive tasks
which benefit from the ability to rapidly provision resources to day-to-day collaborative activities
which can be facilitated by networked cloud services. Associated benefits are offset by risks
such as loss of data or service level, legal and governance incompatibilities and transfer
bottlenecks. There is considerable variability across both risks and benefits according to the
service and deployment models being adopted and the context in which activities are
performed. Some risks, such as legal liabilities, are mitigated by the use of alternative, e.g.,
private cloud models, but this is typically at the expense of benefits such as resource elasticity
and economies of scale. The Infrastructure as a Service model may provide a basis on which more
specialised software services may be provided.
There is considerable work to be done in helping institutions understand the cloud and its
associated costs, risks and benefits, and how these compare to their current working methods,
in order that the most beneficial uses of cloud technologies may be identified. Specific
proposals, echoing recent work coordinated by EPSRC and JISC, are the development of
advisory, costing and brokering services to facilitate appropriate cloud deployments, the
exploration of opportunities for certifying or accrediting cloud preservation providers, and
the targeted publicity of outputs from pilot studies to the full range of stakeholders within the
curation lifecycle, including data creators and owners, repositories, institutional IT support
professionals and senior managers.
Densifying the sparse cloud SimSaaS: The need of a synergy among agent-directed simulation, SimSaaS and HLA
Modelling & Simulation (M&S) is broadly used in real scenarios where making
physical modifications could be highly expensive. With the so-called Simulation
Software-as-a-Service (SimSaaS), researchers could take advantage of the huge
amount of resources that cloud computing provides. Even so, studying and
analysing a problem through simulation may need several simulation tools, hence
raising interoperability issues. Having this in mind, IEEE developed a standard
for interoperability among simulators named High Level Architecture (HLA).
Moreover, the multi-agent system approach has become recognised as a convenient
approach for modelling and simulating complex systems. Despite recent work on
and acceptance of these technologies, there is still little work
regarding synergies among them. Through a literature review, this paper
demonstrates this gap or, in other words, the sparse Cloud SimSaaS. The
literature review and the resulting taxonomy are the main contributions of this
paper, as they provide a research agenda illustrating future research
opportunities and trends.