7,447 research outputs found
Towards Fast and Scalable Private Inference
Privacy and security have rapidly emerged as first order design constraints.
Users now demand more protection over who can see their data (confidentiality)
as well as how it is used (control). Here, existing cryptographic techniques
for security fall short: they secure data when stored or communicated but must
decrypt it for computation. Fortunately, a new paradigm of computing exists,
which we refer to as privacy-preserving computation (PPC). Emerging PPC
technologies can be leveraged for secure outsourced computation or to enable
two parties to compute without revealing either users' secret data. Despite
their phenomenal potential to revolutionize user protection in the digital age,
the realization has been limited due to exorbitant computational,
communication, and storage overheads.
This paper reviews recent efforts on addressing various PPC overheads using
private inference (PI) in neural network as a motivating application. First,
the problem and various technologies, including homomorphic encryption (HE),
secret sharing (SS), garbled circuits (GCs), and oblivious transfer (OT), are
introduced. Next, a characterization of their overheads when used to implement
PI is covered. The characterization motivates the need for both GCs and HE
accelerators. Then two solutions are presented: HAAC for accelerating GCs and
RPU for accelerating HE. To conclude, results and effects are shown with a
discussion on what future work is needed to overcome the remaining overheads of
PI.Comment: Appear in the 20th ACM International Conference on Computing
Frontier
A conceptual framework for developing dashboards for big mobility data
Dashboards are an increasingly popular form of data visualization. Large, complex, and dynamic mobility data present a number of challenges in dashboard design. The overall aim for dashboard design is to improve information communication and decision making, though big mobility data in particular require considering privacy alongside size and complexity. Taking these issues into account, a gap remains between wrangling mobility data and developing meaningful dashboard output. Therefore, there is a need for a framework that bridges this gap to support the mobility dashboard development and design process. In this paper we outline a conceptual framework for mobility data dashboards that provides guidance for the development process while considering mobility data structure, volume, complexity, varied application contexts, and privacy constraints. We illustrate the proposed framework’s components and process using example mobility dashboards with varied inputs, end-users and objectives. Overall, the framework offers a basis for developers to understand how informational displays of big mobility data are determined by end-user needs as well as the types of data selection, transformation, and display available to particular mobility datasets
Interactive visualizations of unstructured oceanographic data
The newly founded company Oceanbox is creating a novel oceanographic forecasting system to provide oceanography as a service. These services use mathematical models that generate large hydrodynamic data sets as unstructured triangular grids with high-resolution model areas. Oceanbox makes the model results accessible in a web application. New visualizations are needed to accommodate land-masking and large data volumes.
In this thesis, we propose using a k-d tree to spatially partition unstructured triangular grids to provide the look-up times needed for interactive visualizations. A k-d tree is implemented in F# called FsKDTree. This thesis also describes the implementation of dynamic tiling map layers to visualize current barbs, scalar fields, and particle streams. The current barb layer queries data from the data server with the help of the k-d tree and displays it in the browser. Scalar fields and particle streams are implemented using WebGL, which enables the rendering of triangular grids. Stream particle visualization effects are implemented as velocity advection computed on the GPU with textures.
The new visualizations are used in Oceanbox's production systems, and spatial indexing has been integrated into Oceanbox's archive retrieval system. FsKDTree improves tree creation times by up to 4x over the C# equivalent and improves search times by up to 13x compared to the .NET C# implementation. Finally, the largest model areas can be viewed with current barbs, scalar fields, and particle stream visualizations at 60 FPS, even for the largest model areas provided by the service
Privacy-preserving patient clustering for personalized federated learning
Federated Learning (FL) is a machine learning framework that enables multiple
organizations to train a model without sharing their data with a central
server. However, it experiences significant performance degradation if the data
is non-identically independently distributed (non-IID). This is a problem in
medical settings, where variations in the patient population contribute
significantly to distribution differences across hospitals. Personalized FL
addresses this issue by accounting for site-specific distribution differences.
Clustered FL, a Personalized FL variant, was used to address this problem by
clustering patients into groups across hospitals and training separate models
on each group. However, privacy concerns remained as a challenge as the
clustering process requires exchange of patient-level information. This was
previously solved by forming clusters using aggregated data, which led to
inaccurate groups and performance degradation. In this study, we propose
Privacy-preserving Community-Based Federated machine Learning (PCBFL), a novel
Clustered FL framework that can cluster patients using patient-level data while
protecting privacy. PCBFL uses Secure Multiparty Computation, a cryptographic
technique, to securely calculate patient-level similarity scores across
hospitals. We then evaluate PCBFL by training a federated mortality prediction
model using 20 sites from the eICU dataset. We compare the performance gain
from PCBFL against traditional and existing Clustered FL frameworks. Our
results show that PCBFL successfully forms clinically meaningful cohorts of
low, medium, and high-risk patients. PCBFL outperforms traditional and existing
Clustered FL frameworks with an average AUC improvement of 4.3% and AUPRC
improvement of 7.8%
Enhancing Query Processing on Stock Market Cloud-based Database
Cloud computing is rapidly expanding because it allows users to save the development and implementation time on their work. It also reduces the maintenance and operational costs of the used systems. Furthermore, it enables the elastic use of any resource rather than estimating workload, which may be inaccurate, as database systems can benefit from such a trend. In this paper, we propose an algorithm that allocates the materialized view over cloud-based replica sets to enhance the database system\u27s performance in stock market using a Peer-to-Peer architecture. The results show that the proposed model improves the query processing time and network transfer cost by distributing the materialized views over cloud-based replica sets. Also, it has a significant effect on decision-making and achieving economic returns
Towards Scalable Real-time Analytics:: An Architecture for Scale-out of OLxP Workloads
We present an overview of our work on the SAP HANA Scale-out Extension, a novel distributed database architecture designed to support large scale analytics over real-time data. This platform permits high performance OLAP with massive scale-out capabilities, while concurrently allowing OLTP workloads. This dual capability enables analytics over real-time changing data and allows fine grained user-specified service level agreements (SLAs) on data freshness. We advocate the decoupling of core database components such as query processing, concurrency control, and persistence, a design choice made possible by advances in high-throughput low-latency networks and storage devices. We provide full ACID guarantees and build on a logical timestamp mechanism to provide MVCC-based snapshot isolation, while not requiring synchronous updates of replicas. Instead, we use asynchronous update propagation guaranteeing consistency with timestamp validation. We provide a view into the design and development of a large scale data management platform for real-time analytics, driven by the needs of modern enterprise customers
Specificity of the innate immune responses to different classes of non-tuberculous mycobacteria
Mycobacterium avium is the most common nontuberculous mycobacterium (NTM) species causing infectious disease. Here, we characterized a M. avium infection model in zebrafish larvae, and compared it to M. marinum infection, a model of tuberculosis. M. avium bacteria are efficiently phagocytosed and frequently induce granuloma-like structures in zebrafish larvae. Although macrophages can respond to both mycobacterial infections, their migration speed is faster in infections caused by M. marinum. Tlr2 is conservatively involved in most aspects of the defense against both mycobacterial infections. However, Tlr2 has a function in the migration speed of macrophages and neutrophils to infection sites with M. marinum that is not observed with M. avium. Using RNAseq analysis, we found a distinct transcriptome response in cytokine-cytokine receptor interaction for M. avium and M. marinum infection. In addition, we found differences in gene expression in metabolic pathways, phagosome formation, matrix remodeling, and apoptosis in response to these mycobacterial infections. In conclusion, we characterized a new M. avium infection model in zebrafish that can be further used in studying pathological mechanisms for NTM-caused diseases
Multitenant Containers as a Service (CaaS) for Clouds and Edge Clouds
Cloud computing, offering on-demand access to computing resources through the
Internet and the pay-as-you-go model, has marked the last decade with its three
main service models; Infrastructure as a Service (IaaS), Platform as a Service
(PaaS), and Software as a Service (SaaS). The lightweight nature of containers
compared to virtual machines has led to the rapid uptake of another in recent
years, called Containers as a Service (CaaS), which falls between IaaS and PaaS
regarding control abstraction. However, when CaaS is offered to multiple
independent users, or tenants, a multi-instance approach is used, in which each
tenant receives its own separate cluster, which reimposes significant overhead
due to employing virtual machines for isolation. If CaaS is to be offered not
just at the cloud, but also at the edge cloud, where resources are limited,
another solution is required. We introduce a native CaaS multitenancy
framework, meaning that tenants share a cluster, which is more efficient than
the one tenant per cluster model. Whenever there are shared resources,
isolation of multitenant workloads is an issue. Such workloads can be isolated
by Kata Containers today. Besides, our framework esteems the application
requirements that compel complete isolation and a fully customized environment.
Node-level slicing empowers tenants to programmatically reserve isolated
subclusters where they can choose the container runtime that suits application
needs. The framework is publicly available as liberally-licensed, free,
open-source software that extends Kubernetes, the de facto standard container
orchestration system. It is in production use within the EdgeNet testbed for
researchers
Using Crowd-Based Software Repositories to Better Understand Developer-User Interactions
Software development is a complex process. To serve the final software product to the end user, developers need to rely on a variety of software artifacts throughout the development process. The term software repository used to denote only containers of source code such as version control systems; more recent usage has generalized the concept to include a plethora of software development artifact kinds and their related meta-data.
Broadly speaking, software repositories include version control systems, technical documentation, issue trackers, question and answer sites, distribution information, etc. The software repositories can be based on a specific project (e.g., bug tracker for Firefox), or be crowd-sourced (e.g., questions and answers on technical Q&A websites). Crowd-based software artifacts are created as by-products of developer-user interactions which are sometimes referred to as communication channels. In this thesis, we investigate three distinct crowd-based software repositories that follow different models of developer-user interactions. We believe through a better understanding of the crowd-based software repositories, we can identify challenges in software development and provide insights to improve the software development process.
In our first study, we investigate Stack Overflow. It is the largest collection of programming related questions and answers. On Stack Overflow, developers interact with other developers to create crowd-sourced knowledge in the form of questions and answers. The results of the interactions (i.e., the question threads) become valuable information to the entire developer community. Prior research on Stack Overflow tacitly assume that questions receives answers directly on the platform and no need of interaction is required during the process. Meanwhile, the platform allows attaching comments to questions which forms discussions of the question. Our study found that question discussions occur for 59.2% of questions on Stack Overflow. For discussed and solved questions on Stack Overflow, 80.6% of the questions have the discussion begin before the accepted answer is submitted. The results of our study show the importance and nuances of interactions in technical Q&A.
We then study dotfiles, a set of publicly shared user-specific configuration files for software tools. There is a culture of sharing dotfiles within the developer community, where the idea is to learn from other developers’ dotfiles and share your variants. The interaction of dotfiles sharing can be viewed as developers sources information from other developers, adapt the information to their own needs, and share their adaptations back to the community. Our study on dotfiles suggests that is a common practice among developers to share dotfiles where 25.8% of the most stared users on GitHub have a dotfiles repository. We provide a taxonomy of the commonly tracked dotfiles and a qualitative study on the commits in dotfiles repositories. We also leveraged the state-of-the-art time-series clustering technique (K-shape) to identify code churn pattern for dotfile edits. This study is the first step towards understanding the practices of maintaining and sharing dotfiles.
Finally, we study app stores, the platforms that distribute software products and contain many non-technical attributes (e.g., ratings and reviews) of software products. Three major stakeholders interacts with each other in app stores: the app store owner who governs the operation of the app store; developers who publish applications on the app store; and users who browse and download applications in the app store. App stores often provide means of interaction between all three actors (e.g., app reviews, store policy) and sometimes interactions with in the same actor (e.g., developer forum). We surveyed existing app stores to extract key features from app store operation. We then labeled a representative set of app store collected by web queries. K-means is applied to the labeled app stores to detect natural groupings of app stores. We observed a diverse set of app stores through the process. Instead of a single model that describes all app stores, fundamentally, our observations show that app stores operates differently. This study provide insights in understanding how app stores can affect software development.
In summary, we investigated software repositories containing software artifacts created from different developer-user interactions. These software repositories are essential for software development in providing referencing information (i.e., Stack Overflow), improving development productivity (i.e., dotfiles), and help distributing the software products to end users (i.e., app stores)
Unified System on Chip RESTAPI Service (USOCRS)
Abstract. This thesis investigates the development of a Unified System on Chip RESTAPI Service (USOCRS) to enhance the efficiency and effectiveness of SOC verification reporting. The research aims to overcome the challenges associated with the transfer, utilization, and interpretation of SoC verification reports by creating a unified platform that integrates various tools and technologies.
The research methodology used in this study follows a design science approach. A thorough literature review was conducted to explore existing approaches and technologies related to SOC verification reporting, automation, data visualization, and API development. The review revealed gaps in the current state of the field, providing a basis for further investigation. Using the insights gained from the literature review, a system design and implementation plan were developed. This plan makes use of cutting-edge technologies such as FASTAPI, SQL and NoSQL databases, Azure Active Directory for authentication, and Cloud services. The Verification Toolbox was employed to validate SoC reports based on the organization’s standards. The system went through manual testing, and user satisfaction was evaluated to ensure its functionality and usability.
The results of this study demonstrate the successful design and implementation of the USOCRS, offering SOC engineers a unified and secure platform for uploading, validating, storing, and retrieving verification reports. The USOCRS facilitates seamless communication between users and the API, granting easy access to vital information including successes, failures, and test coverage derived from submitted SoC verification reports. By automating and standardizing the SOC verification reporting process, the USOCRS eliminates manual and repetitive tasks usually done by developers, thereby enhancing productivity, and establishing a robust and reliable framework for report storage and retrieval. Through the integration of diverse tools and technologies, the USOCRS presents a comprehensive solution that adheres to the required specifications of the SOC schema used within the organization.
Furthermore, the USOCRS significantly improves the efficiency and effectiveness of SOC verification reporting. It facilitates the submission process, reduces latency through optimized data storage, and enables meaningful extraction and analysis of report data
- …