727 research outputs found

    Privacy-preserving scoring of tree ensembles : a novel framework for AI in healthcare

    Machine learning (ML) techniques now impact a wide variety of domains. Highly regulated industries such as healthcare and finance have stringent compliance and data governance policies around data sharing. Advances in secure multiparty computation (SMC) for privacy-preserving machine learning (PPML) can help transform these regulated industries by allowing ML computations over encrypted data containing personally identifiable information (PII). Yet very little SMC-based PPML has been put into practice so far. In this paper we present the very first framework for privacy-preserving classification of tree ensembles with application in healthcare. We first describe the underlying cryptographic protocols that enable a healthcare organization to send encrypted data securely to an ML scoring service and obtain encrypted class labels without the scoring service ever seeing the input in the clear. We then describe the deployment challenges we solved to integrate these protocols into a cloud-based, scalable risk-prediction platform with multiple ML models for healthcare AI. We include system internals and evaluations of our deployment, which supports physicians in driving better clinical outcomes in an accurate, scalable, and provably secure manner. To the best of our knowledge, this is the first such applied framework with SMC-based privacy-preserving machine learning for healthcare.
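    The abstract does not spell out its cryptographic protocols. As a rough illustration of the kind of building block SMC commonly relies on, the sketch below shows additive secret sharing over a ring of integers. The ring size `MODULUS` and the function names `share`, `reconstruct`, and `add_shared` are assumptions made for this example, not names or parameters from the paper, and the sketch does not cover the comparisons that tree scoring would also need.

```python
import secrets

MODULUS = 2 ** 32  # ring size; an assumption for this sketch


def share(value, parties=2):
    """Split an integer into additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine all shares; any strict subset looks uniformly random."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally, with no communication."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


# Hypothetical usage: two sensitive values are shared, summed in shared
# form, and only the sum is ever reconstructed.
total = reconstruct(add_shared(share(72), share(118)))
```

    Addition is free in this scheme because it needs no interaction between parties; multiplications and comparisons require further protocol steps that are omitted here.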

    Ensemble deep learning: A review

    Ensemble learning combines several individual models to obtain better generalization performance. Currently, deep learning models with multilayer processing architectures show better performance than shallow or traditional classification models. Deep ensemble learning models combine the advantages of both deep learning and ensemble learning, so that the final model has better generalization performance. This paper reviews the state-of-the-art deep ensemble models and hence serves as an extensive summary for researchers. The ensemble models are broadly categorised into bagging, boosting and stacking; negative correlation based deep ensemble models; explicit/implicit ensembles; homogeneous/heterogeneous ensembles; decision fusion strategies; and unsupervised, semi-supervised, reinforcement learning, online/incremental, and multilabel based deep ensemble models. Applications of deep ensemble models in different domains are also briefly discussed. Finally, we conclude this paper with some recommendations and future research directions.
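    To make the idea of decision fusion concrete, here is a minimal sketch of two common fusion rules: hard voting over predicted labels and soft voting over averaged class probabilities. The function names and the example probabilities are illustrative assumptions and are not taken from the review.

```python
from collections import Counter


def majority_vote(labels):
    """Hard voting: return the most common label among the models."""
    return Counter(labels).most_common(1)[0][0]


def average_probabilities(prob_rows):
    """Soft voting: average per-class probabilities across models.

    prob_rows holds one probability list per model. Returns the index of
    the class with the highest mean probability, plus the mean vector.
    """
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    mean = [sum(row[c] for row in prob_rows) / n_models
            for c in range(n_classes)]
    best = max(range(n_classes), key=mean.__getitem__)
    return best, mean


# Hypothetical outputs of three models on one two-class input.
probs = [[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]
hard = majority_vote([max(range(2), key=p.__getitem__) for p in probs])
soft, _ = average_probabilities(probs)
```

    In this made-up example the two rules disagree: two of the three models prefer class 1, so hard voting picks class 1, while the single confident model pulls the averaged probabilities toward class 0.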

    Identification of video applications in protected channels using machine learning

    As encrypted traffic becomes standard and traffic obfuscation techniques become more accessible and common, companies are struggling to enforce their network usage policies and ensure optimal operational network performance. Users are more technologically knowledgeable and can circumvent web content filtering tools by using protected tunnels such as VPNs. Consequently, techniques such as DPI, which were already considered outdated due to their impracticality, become even more ineffective. Furthermore, the regulations continually being established by governments and international unions regarding citizen privacy rights make network monitoring increasingly challenging. This work presents a scalable and easily deployable network-based framework for application identification in a corporate environment, focusing on video applications. The framework should be effective regardless of the environment and network setup, with the objective of being a useful tool in the network monitoring process. The proposed framework offers a compromise between allowing network supervision and assuring workers' privacy. The evaluation indicates that we can identify web services running over a protected channel with an accuracy of 95%, using low-level packet information that does not jeopardize sensitive worker data.
    Master's in Computer and Telematics Engineering

    Confidential Boosting with Random Linear Classifiers for Outsourced User-generated Data

    User-generated data is crucial to predictive modeling in many applications. With a web/mobile/wearable interface, a data owner can continuously record data generated by distributed users and build various predictive models from the data to improve their operations, services, and revenue. Due to the large size and evolving nature of users' data, data owners may rely on public cloud service providers (Cloud) for storage and computation scalability. Exposing sensitive user-generated data and advanced analytic models to Cloud raises privacy concerns. We present a confidential learning framework, SecureBoost, for data owners that want to learn predictive models from aggregated user-generated data but offload the storage and computational burden to Cloud without having to worry about protecting the sensitive data. SecureBoost allows users to submit encrypted or randomly masked data directly to a designated Cloud. Our framework uses random linear classifiers (RLCs) as the base classifiers in the boosting framework, which dramatically simplifies the design of the proposed confidential boosting protocols while still preserving model quality. A Cryptographic Service Provider (CSP) is used to assist the Cloud's processing, reducing the complexity of the protocol constructions. We present two constructions of SecureBoost, HE+GC and SecSh+GC, which use combinations of homomorphic encryption, garbled circuits, and random masking to achieve both security and efficiency. For a boosted model, Cloud learns only the RLCs and the CSP learns only the weights of the RLCs. Finally, the data owner collects the two parts to get the complete model. We conduct extensive experiments to understand the quality of the RLC-based boosting and the cost distribution of the constructions. Our results show that SecureBoost can efficiently learn high-quality boosting models from protected user-generated data.
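    The idea of boosting over random linear classifiers can be sketched in plaintext, with no encryption at all. The code below is an AdaBoost-style loop in which each round draws a few random hyperplanes and keeps the one with the lowest weighted error. It is not the SecureBoost protocol: the function names, the Gaussian sampling of hyperplanes, the candidate count, and the toy data are all assumptions made for illustration.

```python
import math
import random


def predict_rlc(clf, x):
    """A random linear classifier: the sign of w.x + b, as +1 or -1."""
    w, b = clf
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


def weighted_error(clf, X, y, weights):
    """Total weight of the examples that clf misclassifies."""
    return sum(wt for wt, xi, yi in zip(weights, X, y)
               if predict_rlc(clf, xi) != yi)


def boost_rlcs(X, y, rounds=40, candidates=20, seed=0):
    """AdaBoost-style training; labels in y must be +1 or -1."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    weights = [1.0 / n] * n
    model = []  # list of (alpha, classifier) pairs
    for _ in range(rounds):
        best_clf, best_err = None, float("inf")
        for _ in range(candidates):
            w = [rng.gauss(0, 1) for _ in range(dim)]
            b = rng.gauss(0, 1)
            # Also try the flipped hyperplane, so the best candidate
            # is never worse than chance.
            for clf in ((w, b), ([-wi for wi in w], -b)):
                err = weighted_error(clf, X, y, weights)
                if err < best_err:
                    best_clf, best_err = clf, err
        err = min(max(best_err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, best_clf))
        # Raise the weight of misclassified examples, then renormalize.
        weights = [wt * math.exp(-alpha * yi * predict_rlc(best_clf, xi))
                   for wt, xi, yi in zip(weights, X, y)]
        total = sum(weights)
        weights = [wt / total for wt in weights]
    return model


def predict(model, x):
    """Weighted vote of all selected random linear classifiers."""
    score = sum(alpha * predict_rlc(clf, x) for alpha, clf in model)
    return 1 if score >= 0 else -1


# Hypothetical toy data: points in a square, labelled by the side of a line.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
y = [1 if a + b >= 0 else -1 for a, b in X]
model = boost_rlcs(X, y)
accuracy = sum(predict(model, xi) == yi for xi, yi in zip(X, y)) / len(X)
```

    The resulting model separates naturally into the hyperplanes and their weights `alpha`, which mirrors the abstract's statement that Cloud learns only the RLCs while the CSP learns only their weights.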

    DTi2Vec: Drug-target interaction prediction using network embedding and ensemble learning.

    Drug-target interaction (DTI) prediction is a crucial step in drug discovery and repositioning, as it reduces experimental validation costs if done right. Thus, developing in-silico methods to predict potential DTIs has become a competitive research niche, with improving prediction accuracy as one of its main focuses. Using machine learning (ML) models for this task, specifically network-based approaches, is effective and has shown great advantages over other computational methods. However, ML model development involves upstream hand-crafted feature extraction and other processes that impact prediction accuracy. Thus, network-based representation learning techniques that provide automated feature extraction, combined with traditional ML classifiers for downstream link prediction tasks, may be better-suited paradigms. Here, we present such a method, DTi2Vec, which identifies DTIs using network representation learning and ensemble learning techniques. DTi2Vec constructs a heterogeneous network and then automatically generates features for each drug and target using a node embedding technique. DTi2Vec demonstrated its ability in drug-target link prediction compared to several state-of-the-art network-based methods, using four benchmark datasets and large-scale data compiled from DrugBank. DTi2Vec showed a statistically significant increase in prediction performance in terms of AUPR. We verified the novel predicted DTIs using several databases and the scientific literature. DTi2Vec is a simple yet effective method that provides high DTI prediction performance while being scalable and computationally efficient, translating into a powerful drug repositioning tool.