17 research outputs found

    A Cooperative Game Approach

    Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Industrial Engineering, February 2021. 이덕주.
    As machine learning thrives in both academia and industry, data plays a salient role in training and validating models. Meanwhile, few works have addressed the economic evaluation of data in data exchange markets. The contribution of our work is two-fold. First, we take advantage of semi-values from cooperative game theory to model the revenue distribution problem. Second, we construct a model consisting of provider, firm, and market while considering the privacy and fairness of machine learning. We show that the Banzhaf value can be a reliable alternative to the Shapley value in calculating the contribution of each datum. We also formulate the firm's revenue maximization problem and present a numerical analysis for the case of a binary classifier with classical data examples. Assuming the firm only uses high-quality data, we analyze its behavior in four scenarios that vary the data's fairness and the cost of compensating data providers for their privacy. The Banzhaf value turns out to be more sensitive to the fairness of data than the Shapley value. We analyze the maximum revenue proportion that the firm gives away to data providers, as well as the range of the number of data points the firm would acquire. The fairer the data, the larger the revenue the firm guaranteed to data providers, and the smaller the fixed cost, the more the firm shared revenue with data providers through the variable cost.
    Contents:
    Chapter 1 Introduction: 1.1 Research Background; 1.2 Problem Description; 1.3 Organization of the Thesis
    Chapter 2 Literature Review: 2.1 Fair Machine Learning; 2.2 Private Machine Learning; 2.3 Data Valuation (2.3.1 Dataset Price Estimation; 2.3.2 Equitable Price Estimation)
    Chapter 3 Data Market Model: 3.1 Basic Assumptions and Model Settings; 3.2 Firm's Profit Maximizing Problem; 3.3 Data Valuation; 3.4 Binary Classification Setting
    Chapter 4 Analysis: 4.1 Semi-value Approximation (4.1.1 Convergence Analysis; 4.1.2 Group Data Calculation); 4.2 Binary Classification (4.2.1 Parameter Analysis; 4.2.2 Scenario Analysis: 4.2.2.1 Description, 4.2.2.2 Synthetic Data, 4.2.2.3 Shapley Value Based Valuation, 4.2.2.4 Banzhaf Value Based Valuation, 4.2.2.5 Comparative Analysis); 4.3 Data Pricing
    Chapter 5 Conclusion
    Bibliography
    Abstract in Korean
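    The abstract above contrasts two semi-values, the Shapley value and the Banzhaf value. As a rough illustration only, the sketch below computes both exactly for a tiny cooperative game. The three-player game and the function names (`semivalues`, `toy_utility`) are hypothetical and are not taken from the thesis, whose own utility is based on a trained classifier.

```python
from itertools import combinations
from math import factorial


def semivalues(players, v):
    """Exact Shapley and Banzhaf values by enumerating all coalitions.

    `v` maps a frozenset of players to a number. Enumeration is exponential
    in the number of players, so this only suits toy examples.
    """
    n = len(players)
    shapley = {p: 0.0 for p in players}
    banzhaf = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude p.
            w = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                gain = v(s | {p}) - v(s)  # marginal contribution of p
                shapley[p] += w * gain
                # Banzhaf weighs every coalition equally: 1 / 2^(n-1).
                banzhaf[p] += gain / 2 ** (n - 1)
    return shapley, banzhaf


def toy_utility(s):
    """Hypothetical utility: worth 1 only if 'a' is joined by 'b' or 'c'."""
    return 1.0 if "a" in s and ("b" in s or "c" in s) else 0.0


shapley, banzhaf = semivalues(["a", "b", "c"], toy_utility)
print(shapley)
print(banzhaf)
```

    In this toy game the Shapley values sum to the utility of the full coalition, while the Banzhaf values need not, which is one reason the two can rank or scale contributions differently.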

    Practices and challenges in clinical data sharing

    The debate on data access and privacy is an ongoing one. It is kept alive by the never-ending changes/upgrades in (i) the shape of the data collected (in terms of size, diversity, sensitivity and quality), (ii) the laws governing data sharing, (iii) the amount of free public data available on individuals (social media, blogs, population-based databases, etc.), as well as (iv) the available privacy-enhancing technologies. This paper identifies current directions, challenges and best practices in constructing a clinical data-sharing framework for research purposes. Specifically, we create a taxonomy for the framework, identify the design choices available within each taxon, and demonstrate these choices using current legal frameworks. The purpose is to devise best practices for the implementation of an effective, safe and transparent research access framework.

    Scalable and approximate privacy-preserving record linkage

    Record linkage, the task of linking multiple databases with the aim of identifying records that refer to the same entity, is occurring increasingly in many application areas. Generally, unique entity identifiers are not available in all the databases to be linked. Therefore, record linkage requires the use of personal identifying attributes, such as names and addresses, to identify matching records that need to be reconciled to the same entity. Often, it is not permissible to exchange personal identifying data across different organizations due to privacy and confidentiality concerns or regulations. This has led to the novel research area of privacy-preserving record linkage (PPRL). PPRL addresses the problem of how to link different databases to identify records that correspond to the same real-world entities, without revealing the identities of these entities or any private or confidential information to any party involved in the process, or to any external party, such as a researcher. The three key challenges that a PPRL solution in a real-world context needs to address are (1) scalability to large databases by efficiently conducting linkage; (2) achieving high quality of linkage through the use of approximate (string) matching and effective classification of the compared record pairs into matches (i.e. pairs of records that refer to the same entity) and non-matches (i.e. pairs of records that refer to different entities); and (3) provision of sufficient privacy guarantees such that the interested parties only learn the actual values of certain attributes of the records that were classified as matches, and the process is secure with regard to any internal or external adversary. In this thesis, we present extensive research in PPRL, where we have addressed several gaps and problems identified in existing PPRL approaches. First, we begin the thesis with a review of the literature and we propose a taxonomy of PPRL to characterize existing techniques. This allows us to identify gaps and research directions. In the remainder of the thesis, we address several of the identified shortcomings. One main shortcoming we address is the lack of a framework for empirical and comparative evaluation of different PPRL solutions, which has not been studied in the literature so far. Second, we propose several novel algorithms for scalable and approximate PPRL by addressing the three main challenges of PPRL. We propose efficient private blocking techniques, for both three-party and two-party scenarios, based on sorted neighborhood clustering to address the scalability challenge. Next, we propose two efficient two-party techniques for private matching and classification to address the linkage quality challenge in terms of approximate matching and effective classification. Privacy is addressed in these approaches using efficient data perturbation techniques including k-anonymous mapping, reference values, and Bloom filters. Finally, the thesis reports on an extensive comparative evaluation of our proposed solutions with several other state-of-the-art techniques on real-world datasets, which shows that our solutions outperform others in terms of all three key challenges.
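    The abstract above names Bloom filters as one perturbation technique for approximate private matching. The sketch below shows the general idea in a common form: character bigrams of a name are hashed into a bit set, and two encodings are compared with the Dice coefficient. It is an assumption-laden illustration, not the thesis's algorithm; the function names, the filter size, the number of hash functions and the shared key are all hypothetical, and the construction has not been vetted for real-world privacy.

```python
import hashlib
import hmac


def bigrams(value):
    """Character bigrams of a padded, lower-cased string."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, size=1000, num_hashes=2, key=b"hypothetical-shared-key"):
    """Return the set of bit positions set in a Bloom filter of `size` bits.

    Each bigram is hashed `num_hashes` times with a keyed hash (HMAC-SHA256);
    the parties are assumed to share `key` beforehand.
    """
    positions = set()
    for gram in bigrams(value):
        for i in range(num_hashes):
            digest = hmac.new(key, bytes([i]) + gram.encode("utf-8"),
                              hashlib.sha256).digest()
            positions.add(int.from_bytes(digest[:8], "big") % size)
    return positions


def dice(a, b):
    """Dice coefficient of two bit-position sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


smith = bloom_encode("Smith")
smyth = bloom_encode("Smyth")
jones = bloom_encode("Jones")
print(dice(smith, smyth), dice(smith, jones))
```

    Similar names share most bigrams and therefore most bit positions, so their Dice score stays high even though the parties exchange only encodings rather than the names themselves.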

    Record Linkage Techniques: Exploring and developing data matching methods to create national record linkage infrastructure to support population level research

    In a world where the growth in digital information and systems continues to expand, researchers have access to unprecedented amounts of data. These large and complex data reservoirs require creative, innovative and scalable tools to unlock the potential of this ‘big data’. Record linkage is a powerful tool in the ‘big data’ arsenal. This thesis demonstrates the value of national record linkage infrastructure and how this has been achieved for the Australian research community.

    E-Governance: Strategy for Mitigating Non-Inclusion of Citizens in Policy Making in Nigeria

    The Nigerian federation, which currently has a 36-state structure, previously adopted the Weberian public administrative system as an ideal way of running government; that system was characterized by traditional ways of doing things without recourse to Information and Communication Technology (ICT). Today e-governance is seen as a paradigm shift from the previous way of governance. Research has shown that the adoption and implementation of e-governance is more likely to bring about effective service delivery, mitigate corruption and ultimately enhance citizens’ participation in governmental affairs. However, it has been argued that infrastructure such as regular electric power and access to the Internet, together with a highly literate society, is required to implement e-governance effectively and realize its potential for improved delivery of services. Because of the difficulties currently experienced, developing nations need to prepare adequately for the implementation of e-governance on the platform of ICT. Hence, this study seeks to examine whether the adoption and implementation of e-governance in Nigeria would mitigate the hitherto non-inclusion of citizens in the formulation and implementation of government policies aimed at enhanced development. To achieve the objective of the study, data were sourced and analyzed mainly by examining the government websites of 20 states in the Nigerian federation to ascertain whether there are avenues for citizens to interact with government in the area of policy making and feedback on government actions, as a way of promoting participatory governance. The study revealed that the adoption and implementation of e-governance in the country is yet to fully take place. This is due to a lack of infrastructure, a low literacy rate and the government's inability to provide the necessary infrastructure for e-governance to materialize. The paper therefore recommends, among other things, that the Federal Government develop a sound and clear policy on the adoption and implementation of e-governance, through a deliberate effort to increase budgetary allocation towards infrastructural development and mass education of citizens.

    The Impact of e-Democracy in Political Stability of Nigeria

    The history of the Nigerian electoral process has hitherto been characterized by violence stemming from disputes over election outcomes. For instance, violence erupted across some states in Northern Nigeria when results indicated that a candidate who was popular in that part of the country was losing the election, leading to avoidable loss of lives. Besides, such disputes over election outcomes linger for a long time in litigation at the electoral tribunals, which distracts from effective governance. However, the increasingly pervasive use of ICTs in Nigeria is evident in the electoral processes, with a consequent shift in the behavior of actors in the democratic processes, thus changing the ways Nigerians react to election outcomes. This paper examines the trend in the use of ICT in the Nigerian political system and its impact on the stability of the polity. It assesses the role of ICT in recent electoral processes and compares its impact on the outcome of the process in light of previous experiences in Nigeria. Furthermore, the paper examines the challenges and risks of implementing e-Democracy in Nigeria and its relationship to the economy in the light of the socio-economic situation of the country. The paper adopted a qualitative approach to data gathering and analysis. From the findings, the paper observed that e-democracy is largely dependent on the level of ICT adoption, which is still at its lowest ebb in the country. It recognizes the challenges in the provision of ICT infrastructure and argues that appropriate low-cost infrastructure applicable to the Nigerian condition can be made available to implement e-democracy and thus arouse the interest of the populace in governance, increase the number of voters, enhance transparency, probity, accountability and participation in governance, and help stabilize the nascent democracy.