15 research outputs found

    The Information Privacy Law of Web Applications and Cloud Computing

    This article surveys and evaluates the privacy law of web applications and cloud computing. Cloud services, and web applications in particular, are subject to many different privacy law requirements. While these requirements are often perceived as ill-fitting, they can be interpreted to provide a structurally sound and coherent privacy regime. The applicable body of law can be separated into two tiers: primary privacy law and secondary privacy law. Primary privacy law is created by the providers and users of cloud services through privacy contracts, especially privacy policies. Secondary privacy law, contained, for example, in statutes and regulations, is for the most part applicable only where no valid privacy contracts exist. This supremacy of privacy contracts over statutory and other secondary privacy law enables individualized privacy protection levels and the commercial use of privacy rights according to the contracting parties’ individual wishes.

    When Enough is Enough: Location Tracking, Mosaic Theory, and Machine Learning

    Since 1967, when it decided Katz v. United States, the Supreme Court has tied the right to be free of unwanted government scrutiny to the concept of reasonable expectations of privacy.[1] An evaluation of reasonable expectations depends, among other factors, upon an assessment of the intrusiveness of government action. Historically, when making such assessments, the Court has considered police conduct with clear temporal, geographic, or substantive limits. However, in an era where new technologies permit the storage and compilation of vast amounts of personal data, things are becoming more complicated. A school of thought known as “mosaic theory” has stepped into the void, ringing the alarm that our old tools for assessing the intrusiveness of government conduct potentially undervalue privacy rights. Mosaic theorists advocate a cumulative approach to the evaluation of data collection. Under the theory, searches are “analyzed as a collective sequence of steps rather than as individual steps.”[2] The approach is based on the recognition that comprehensive aggregation of even seemingly innocuous data reveals greater insight than consideration of each piece of information in isolation. Over time, discrete units of surveillance data can be processed to create a mosaic of habits, relationships, and much more. Consequently, a Fourth Amendment analysis that focuses only on the government’s collection of discrete units of trivial data fails to appreciate the true harm of long-term surveillance—the composite.

    In the context of location tracking, the Court has previously suggested that the Fourth Amendment may (at some theoretical threshold) be concerned with the accumulated information revealed by surveillance.[3] Similarly, in the Court’s recent decision in United States v. Jones, a majority of concurring justices indicated willingness to explore such an approach.[4] In general, however, the Court has rejected any notion that technological enhancement matters to the constitutional treatment of location tracking.[5] Rather, it has found that such surveillance in public spaces, which does not require physical trespass, is equivalent to a human tail and thus not regulated by the Fourth Amendment. In this way, the Court has avoided quantitative analysis of the amendment’s protections. The Court’s reticence is built on the enticingly direct assertion that objectivity under the mosaic theory is impossible. This is true in large part because no rationale has yet been offered to objectively distinguish relatively short-term monitoring from its counterpart of greater duration. As Justice Scalia recently observed in Jones: “it remains unexplained why a 4-week investigation is ‘surely’ too long.”[6]

    This article suggests that by combining the lessons of machine learning with the mosaic theory, and applying the pairing to the Fourth Amendment, we can see the contours of a response. Machine learning makes clear that mosaics can be created; it also offers important lessons about when that is the case. Machine learning is the branch of computer science that studies systems that can draw inferences from collections of data, generally by means of mathematical algorithms. In a recent competition called “The Nokia Mobile Data Challenge,”[7] researchers evaluated machine learning’s applicability to GPS and cell phone tower data. From a user’s location history alone, the researchers were able to estimate the user’s gender, marital status, occupation, and age.[8] Algorithms developed for the competition were also able to predict a user’s likely future location by observing past location history, and those predictions improved further when the location data of friends and social contacts were added.[9]

    Machine learning of the sort on display during the Nokia competition seeks to harness the data deluge of today’s information society by efficiently organizing data, finding statistical regularities and other patterns in it, and making predictions therefrom. Machine learning algorithms can deduce information—including information that has no obvious linkage to the input data—that might otherwise have remained private given the natural limitations of manual, human-driven investigation. Analysts can “train” machine learning programs on one dataset to find similar characteristics in new datasets. When applied to the digital “bread crumbs” of data people generate, machine learning algorithms can make targeted personal predictions, and the greater the number of data points evaluated, the greater the accuracy of the results.

    In five parts, this article advances the conclusion that the duration of an investigation is relevant to its substantive Fourth Amendment treatment because duration affects the accuracy of the resulting predictions. Though it was previously difficult to explain why an investigation of four weeks was substantively different from an investigation of four hours, we now have a better understanding of the value of aggregated data when viewed through a machine learning lens. In some situations, predictions of startling accuracy can be generated from remarkably few data points; in others, accuracy increases dramatically above certain thresholds. For example, a 2012 study found that the ability to deduce ethnicity remained flat through five weeks of phone data monitoring, jumped sharply to a new plateau at that point, and then increased sharply again after twenty-eight weeks.[10] More remarkably, the accuracy of identifying a target’s significant other improved dramatically after five days’ worth of data inputs.[11] Experiments like these support the notion of a threshold, a point at which it makes sense to draw a Fourth Amendment line.

    To provide an objective basis for distinguishing between law enforcement activities of differing duration, the results of machine learning algorithms can be combined with privacy metrics such as k-anonymity or l-diversity. While reasonable minds may dispute the most suitable minimum accuracy threshold, this article makes the case that the collection of data points allowing predictions that exceed selected thresholds should be deemed unreasonable searches in the absence of a warrant.[12] Moreover, any new rules should take into account not only the data being collected but also the foreseeable improvements in the machine learning technology that will ultimately be brought to bear on it, including the application of future algorithms to older data. In 2001, the Supreme Court asked “what limits there are upon the power of technology to shrink the realm of guaranteed privacy.”[13] In this piece, we explore an answer and investigate what lessons there are in the power of technology to protect the realm of guaranteed privacy. After all, as technology takes away, it also gives. The objective understanding of data compilation and analysis revealed by machine learning provides important Fourth Amendment insights, and we should begin to consider them more closely.

    [1] Katz v. United States, 389 U.S. 347, 361 (1967) (Harlan, J., concurring).
    [2] Orin Kerr, The Mosaic Theory of the Fourth Amendment, 111 Mich. L. Rev. 311, 312 (2012).
    [3] United States v. Knotts, 460 U.S. 276, 284 (1983).
    [4] Justice Scalia, writing for the majority, left the question open. United States v. Jones, 132 S. Ct. 945, 954 (2012) (“It may be that achieving the same result [as in traditional surveillance] through electronic means, without an accompanying trespass, is an unconstitutional invasion of privacy, but the present case does not require us to answer that question.”).
    [5] Compare Knotts, 460 U.S. at 276 (rejecting the contention that an electronic beeper should be treated differently than a human tail), and Smith v. Maryland, 442 U.S. 735, 744 (1979) (approving the warrantless use of a pen register in part because the justices were “not inclined to hold that a different constitutional result is required because the telephone company has decided to automate”), with Kyllo v. United States, 533 U.S. 27, 33 (2001) (recognizing that advances in technology affect the degree of privacy secured by the Fourth Amendment).
    [6] United States v. Jones, 132 S. Ct. 945 (2012); see also Kerr, 111 Mich. L. Rev. at 329-330.
    [7] See Nokia Research Center, Mobile Data Challenge 2012 Workshop, http://research.nokia.com/page/12340.
    [8] Sanja Brdar, Dubravko Culibrk & Vladimir Crnojevic, Demographic Attributes Prediction on the Real-World Mobile Data, Nokia Mobile Data Challenge Workshop 2012.
    [9] Manlio de Domenico, Antonio Lima & Mirco Musolesi, Interdependence and Predictability of Human Mobility and Social Interactions, Nokia Mobile Data Challenge Workshop 2012.
    [10] See Yaniv Altshuler, Nadav Aharony, Michael Fire, Yuval Elovici & Alex Pentland, Incremental Learning with Accuracy Prediction of Social and Individual Properties from Mobile-Phone Data, WS3P, IEEE Social Computing (2012), Figure 10.
    [11] Id., Figure 9.
    [12] Admittedly, there are differing views on sources of authority beyond the Constitution that might justify location tracking. See, e.g., Stephanie K. Pell & Christopher Soghoian, Can You See Me Now? Toward Reasonable Standards for Law Enforcement Access to Location Data That Congress Could Enact, 27 Berkeley Tech. L.J. 117 (2012).
    [13] Kyllo, 533 U.S. at 34.
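
    A minimal sketch, on synthetic data, of the mechanism this abstract relies on: inferring a hidden demographic attribute from location-derived features, with prediction accuracy rising as the observation window grows. This is not the Nokia challenge code; the noise model and all names are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_users = 500
attribute = rng.integers(0, 2, n_users)  # hidden demographic label (0/1)

def weekly_observations(weeks):
    """Average `weeks` noisy weekly mobility summaries per user. Averaging
    over more weeks cancels more noise, which is the mechanism behind the
    duration-dependent accuracy the article describes."""
    signal = attribute[:, None] * 2.0             # true per-user signal
    noise = rng.normal(0, 4.0, (n_users, weeks))  # per-week measurement noise
    return (signal + noise).mean(axis=1, keepdims=True)

for weeks in [1, 4, 12, 28]:
    X = weekly_observations(weeks)
    acc = cross_val_score(LogisticRegression(), X, attribute, cv=5).mean()
    print(f"{weeks:>2} weeks of data -> mean CV accuracy {acc:.2f}")
```

    Running this shows accuracy climbing from near chance at one week toward roughly 0.9 at twenty-eight weeks, which is the kind of duration threshold the article proposes as a place to draw a Fourth Amendment line.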

    “I Don’t Have a Photograph, But You Can Have My Footprints.” – Revealing the Demographics of Location Data

    High-accuracy location data are routinely available to a plethora of mobile apps and web services. The availability of such data has led to a better general understanding of human mobility. However, because location data are usually not associated with demographic information, little work has been done to understand how human mobility differs across demographics. In this study we begin to fill that void. In particular, we explore how the growing number of geotagged footprints that social network users create can reveal demographic attributes, and how these footprints enable the understanding of mobility at a demographic level. Our methodology gives rise to novel opportunities in the study of mobility. We leverage publicly available geotagged photographs from a popular photo-sharing network to build a dataset on demographic mobility patterns. Our analysis of this dataset not only reproduces previous results on mobility behavior at various geographical levels but also extends the existing picture: it allows mobility modeling to be refined from entire populations to specific demographic groups. Our analysis suggests the existence of regional variations in mobility and reveals statistically significant differences in mobility between genders and ethnicities.
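
    A minimal sketch, on synthetic coordinates, of the style of analysis described here: compute a standard mobility measure (radius of gyration) per user from geotagged points, then test whether it differs significantly between two demographic groups. The data layout and group labels are hypothetical, not the paper's dataset.

```python
import numpy as np
from scipy import stats

def radius_of_gyration(latlons):
    """Root-mean-square distance (in degrees, for simplicity) of a user's
    geotagged points from their centroid, a standard mobility measure."""
    pts = np.asarray(latlons)
    centroid = pts.mean(axis=0)
    return np.sqrt(((pts - centroid) ** 2).sum(axis=1).mean())

rng = np.random.default_rng(1)
# Synthetic users, 50 geotagged photos each; group B is simulated with
# slightly wider movement so the test has a real difference to detect.
group_a = [radius_of_gyration(rng.normal(0, 0.5, (50, 2))) for _ in range(200)]
group_b = [radius_of_gyration(rng.normal(0, 0.7, (50, 2))) for _ in range(200)]

t, p = stats.ttest_ind(group_a, group_b, equal_var=False)
print(f"Welch t = {t:.2f}, p = {p:.2g}")  # small p -> significant difference
```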

    Towards Automatic Classification of Privacy Policy Text

    CMU-ISR-17-118 / CMU-LTI-17-010. Privacy policies notify Internet users about the privacy practices of websites, mobile apps, and other products and services. However, users rarely read them and struggle to understand their contents. Moreover, the entities that provide these policies are sometimes unmotivated to make them comprehensible. Recently, annotated corpora of privacy policies have been introduced to the research community, opening the door to the development of machine learning and natural language processing techniques that automate the annotation of these documents. In turn, these annotations can be passed on to interfaces (e.g., web browser plugins) that help users quickly identify and understand relevant privacy statements. We present advances in extracting privacy policy paragraphs (termed segments in this paper) and individual sentences that relate to expert-identified categories of policy contents, using methods in supervised learning. In particular, we show that relevant segments and sentences can be classified with average micro-F1 scores of 0.79 and 0.70 respectively, improving over prior work. We discuss how the techniques introduced in this paper have been used to automatically annotate the text of about 7,000 privacy policies. Our discussion highlights opportunities as well as limitations associated with our classification approach.
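
    As a rough illustration of the pipeline this abstract summarizes (supervised classification of policy segments, scored by micro-averaged F1), here is a minimal sketch. The six inline segments and three category labels are invented stand-ins for an annotated corpus; the paper's actual features and models may differ.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Toy policy segments with hand-assigned content categories.
segments = [
    "We collect your email address when you register.",
    "We share aggregate statistics with advertising partners.",
    "You may delete your account at any time.",
    "Our servers log IP addresses for security purposes.",
    "Third parties may receive your browsing history.",
    "Contact us to access or correct your personal data.",
]
labels = ["collection", "sharing", "user_control",
          "collection", "sharing", "user_control"]

# TF-IDF features feeding a linear classifier, one simple baseline for
# mapping segments to categories.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(segments, labels)

pred = clf.predict(segments)  # in practice, evaluate on held-out policies
print("micro-F1:", f1_score(labels, pred, average="micro"))
```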

    Quality assessment of online automated privacy policy generators: An empirical study

    Online Automated Privacy Policy Generators (APPGs) are tools that app developers use to quickly create the app privacy policies that privacy regulations require to accompany each mobile app. These tools bring convenience to app developers; however, the quality of their output puts developers and stakeholders at legal risk. In this paper, we conduct an empirical study to assess the quality of online APPGs. We analyze the completeness of privacy policies, determine what categories and items a complete privacy policy should cover, and assess APPGs using boilerplate apps. The assessment shows that, because APPGs perform no static or dynamic analysis of an app’s behavior, developers may encounter two types of issues. First, the generated policies can be incomplete because they do not cover all the essential items required of a privacy policy. Second, some generated privacy policies declare unnecessary personal information collection or make arbitrary commitments inconsistent with user input. Ultimately, these defects of APPGs may lead to serious legal issues. We hope that the results and insights developed in this paper can motivate the healthy and ethical development of APPGs toward generating more complete, accurate, and robust privacy policies.
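
    A minimal sketch of a completeness check in the spirit of this study's assessment: verify that a generated policy mentions each essential content category. The category list and keywords below are illustrative assumptions, far coarser than the paper's actual rubric.

```python
# Hypothetical essential categories mapped to indicative keywords.
ESSENTIAL_CATEGORIES = {
    "data_collection": ["collect", "personal information"],
    "third_party_sharing": ["third party", "share"],
    "data_retention": ["retain", "retention"],
    "user_rights": ["access", "delete", "opt out"],
    "contact": ["contact us", "email"],
}

def completeness_report(policy_text: str) -> dict:
    """Mark each essential category as covered or missing via keyword match."""
    text = policy_text.lower()
    return {cat: any(kw in text for kw in kws)
            for cat, kws in ESSENTIAL_CATEGORIES.items()}

generated_policy = "We collect personal information and may share it with partners."
report = completeness_report(generated_policy)
missing = [cat for cat, covered in report.items() if not covered]
print("Missing categories:", missing)  # an incomplete policy -> legal risk
```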

    MAPS: Scaling Privacy Compliance Analysis to a Million Apps

    The app economy is largely reliant on data collection as its primary revenue model. To comply with legal requirements, app developers are often obligated to notify users of their privacy practices in privacy policies. However, prior research has suggested that many developers are not accurately disclosing their apps’ privacy practices. Evaluating discrepancies between apps’ code and privacy policies enables the identification of potential compliance issues. In this study, we introduce the Mobile App Privacy System (MAPS) for conducting an extensive privacy census of Android apps. We designed a pipeline for retrieving and analyzing large app populations based on code analysis and machine learning techniques. In its first application, we conduct a privacy evaluation for a set of 1,035,853 Android apps from the Google Play Store. We find broad evidence of potential non-compliance: many apps do not have a privacy policy to begin with, and policies that do exist are often silent on the practices performed by apps. For example, 12.1% of apps have at least one location-related potential compliance issue. We hope that our extensive analysis will motivate app stores, government regulators, and app developers to more effectively review apps for potential compliance issues.
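
    A minimal sketch of the core comparison MAPS automates at scale: practices detected in an app's code are set against practices disclosed in its policy, and anything performed but undisclosed is flagged as a potential compliance issue. The practice names and detection inputs here are hypothetical.

```python
# Stand-ins for the pipeline's two analysis outputs: practices found by
# static analysis of the app, and practices a policy classifier says are
# disclosed in the app's privacy policy.
DETECTED_IN_CODE = {"location", "device_id", "contacts"}
DISCLOSED_IN_POLICY = {"location", "device_id"}

def potential_compliance_issues(detected: set, disclosed: set) -> set:
    """A practice performed by the code but absent from the policy is a
    potential compliance issue (the policy is 'silent' on it)."""
    return detected - disclosed

print(potential_compliance_issues(DETECTED_IN_CODE, DISCLOSED_IN_POLICY))
# -> {'contacts'}
```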
