491 research outputs found

    Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization

    Protecting vast quantities of data poses a daunting challenge for the growing number of organizations that collect, stockpile, and monetize it. The ability to distinguish data that is actually needed from data collected "just in case" would help these organizations to limit the latter's exposure to attack. A natural approach might be to monitor data use and retain only the working-set of in-use data in accessible storage; unused data can be evicted to a highly protected store. However, many of today's big data applications rely on machine learning (ML) workloads that are periodically retrained by accessing, and thus exposing to attack, the entire data store. Training set minimization methods, such as count featurization, are often used to limit the data needed to train ML workloads to improve performance or scalability. We present Pyramid, a limited-exposure data management system that builds upon count featurization to enhance data protection. As such, Pyramid uniquely introduces both the idea and proof-of-concept for leveraging training set minimization methods to instill rigor and selectivity into big data management. We integrated Pyramid into Spark Velox, a framework for ML-based targeting and personalization. We evaluate it on three applications and show that Pyramid approaches state-of-the-art models while training on less than 1% of the raw data

    Datafication and the push for ubiquitous listening in music streaming

    This article discusses Spotify’s approach to music recommendation as datafication of listening. It discusses the hybrid types of music recommendation that Spotify presents to users. The article explores how datafication is connected to Spotify’s push for the personalization and contextualization of music recommendations based on a combination of the cultural knowledge found in editorial curation and the potential for large-scale personalization found in algorithmic curation. The article draws on the concept of ubiquitous music and other understandings of the affective and functional aspects of music listening as an everyday practice to reflect upon how Spotify’s approach to datafication of listening potentially leads it to prioritize music recommendations that entice users to engage in inattentive and continuous listening. Building on this, the article seeks to contribute knowledge about how the datafication of listening potentially shapes listening practices and conceptions of relevance and quality in music recommendation

    Navigating Risk in Vendor Data Privacy Practices: An Analysis of Elsevier's ScienceDirect

    Executive Summary: As libraries transitioned from buying materials to licensing content, serious threats to privacy followed. This change shifted more control over library user data (and whether it is collected or kept at all) from the local library to third-party vendors, including personal data about what people search for and what they read. This transition has further reinforced the push by some of the largest academic publishers to move beyond content and become data analytics businesses that provide platforms of tools used throughout the research lifecycle that can collect user data at each stage. These companies have an increasing incentive to collect and monetize the rich streams of data that these platforms can generate from users. As a result, user privacy depends on the strength of privacy protections guaranteed by vendors (e.g., negotiated for in contracts), and a growing body of evidence indicates that this should be a source of concern. User tracking that would be unthinkable in a physical library setting now happens routinely through such platforms. The potential integration of this tracking with other lines of business, including research analytics tools and data brokering services, raises pressing questions for users and institutions. Elsevier provides an important case study in this dynamic. Elsevier is many academic libraries’ largest vendor for collections, and its platforms span the knowledge production process, from discovery and idea generation to publication to evaluation. Furthermore, Elsevier’s parent company, RELX, is a leading data broker. Its “risk” business, which provides services to corporations, governments, and law enforcement agencies based on expansive databases of personal data, has surpassed its Elsevier division in revenue and profitability. For these reasons, it is important to carefully consider Elsevier’s privacy practices, the risks they may pose, and proactive steps to protect users.
This analysis focuses on ScienceDirect due to its position as a leading discovery platform for research as well as the Elsevier product that researchers are most likely to interact with regularly. Based on our findings, many of ScienceDirect's data privacy practices directly conflict with library privacy standards and guidelines. The data privacy practices identified in our analysis are like the practices found in many businesses and organizations that track and harvest user data to sustain privacy-intrusive data-driven business models. The widespread data collection, user tracking and surveillance, and disclosure of user data inherent to these business models run counter to the library's commitment to user privacy as specified in the ALA Code of Ethics, Library Bill of Rights, and the IFLA Statement on Privacy in the Library Environment. Examples of current ScienceDirect practices found in our analysis that conflict with these standards include:
• Use of web beacons, cookies, and other invasive web surveillance methods to track user behavior outside and beyond the ScienceDirect website
• Extensive collection of a broad range of personal data (e.g., behavioral and location data) from ScienceDirect combined with personal data harvested from sources beyond ScienceDirect (i.e., third parties in and outside of RELX and data brokers as stated in Elsevier’s Privacy Policy and U.S. Consumer Privacy Notice)
• Collection of personal data by third parties, including search engines, social media platforms, and other personal-data aggregators and profilers such as Google, Adobe, Cloudflare, and New Relic, through extensive use of third-party trackers on the ScienceDirect site
• Disclosure of personal data to other Elsevier products and the potential for disclosure of personal data to other business units within RELX, including risk products and services sold to corporations, governments, and law enforcement agencies
• Processing and disclosure of personal data (and personal data inferred from personal data) for targeted, personalized advertising and marketing
In particular, ScienceDirect’s U.S. Consumer Privacy Notice, posted and updated in 2023, raises important concerns. The notice describes the disclosure of detailed user data (including geolocation data, sensitive personal information, and inference data used to create profiles on individuals), both for wide-ranging internal use and to external third parties, including “affiliates” and “business and joint venture partners.” The collection and disclosure of data about who someone is, where they are, and what they search for and read by the same overarching company that provides sophisticated surveillance and data brokering products to corporations, governments, and law enforcement should be alarming. These practices raise the question of whether simultaneous ownership of key academic infrastructure alongside sophisticated surveillance and data brokering businesses should be permitted at all, whether by users, by institutions, or by policymakers and regulatory authorities. Our analysis cannot definitively confirm whether personal data derived from academic products is currently being used in data brokering or “risk” products.
Nevertheless, ScienceDirect’s privacy practices highlight the need to be aware of this risk, which is not mitigated by privacy policy revisions or potential verbal assurances concerning specific data uses. Privacy policies can be changed unilaterally, and denials are not legally binding. To be meaningful, any privacy guarantee a vendor makes must be durable, verifiable, and not limited to a particular jurisdiction. As many of the largest publishers reinvent themselves as platform businesses, users and institutions should actively evaluate and address the potential privacy risks as this transition occurs rather than after it is complete. In closely analyzing the privacy practices of the leading vendor in this transition, this report highlights the need for institutions to be proactive in responding to these risks and provides initial steps for doing so. This report underscores the significant expertise and capacity required for any institution to understand even one vendor’s privacy practices, and the power asymmetry this creates between vendors and libraries. Collaborative efforts, such as SPARC’s Privacy & Surveillance Community of Practice, can play a key role in supporting future action to address the real privacy risks posed by vendors’ platforms. This report closes with options that institutions may consider to mitigate these risks over the short and longer term

    Risky Business: Creative Development in Digital Media

    Honors (Bachelor's). Screen Arts and Cultures, University of Michigan. http://deepblue.lib.umich.edu/bitstream/2027.42/134713/1/nsalka.pd

    AI Stylist: What Do I Wear? Mobile Application

    This thesis explores the process of getting dressed and how complex it is. It identifies the multiple variables that shape the decisions and motives people have when styling a complete outfit. These variables are explored through my expertise as a personal stylist and shopper, and through consultations with experts in different technology fields. The purpose of the What Do I Wear? mobile application is to assist people in getting dressed by recommending an outfit that suits the user's body shape and activity, represents their personal style, and is suitable for the weather. The final product is a mobile application that functions as a personal stylist, utilizing an artificial intelligence agent trained to use the identified variables when styling an outfit

    Business model analysis of interest-based social networking services

    Objectives of the study: This research paper sets out to examine interest-based social networking services and the underlying business models that provide the logic for value creation, delivery, and capture. The objective of this paper is to uncover the common characteristics of interest-based social networking services' business models in order to understand the necessary building blocks that need to be present for a new service to function properly. Furthermore, it aims at giving managerial suggestions for companies that are planning to launch an interest-based social networking service. Academic background and methodology: The research in this paper is based on an extensive literature review on social networks, evolution of the internet, online social networks and their respective business models. The paper continues to examine the theory behind business models, which provides the grounds for choosing a business model framework that is used in the empirical part of the thesis. The empirical part of this paper is conducted as an observatory case study. The results of the empirical study are utilized to uncover the key characteristics of interest-based social networks and to provide general guidelines for new entrants that are launching a similar service. Findings and conclusions: The research shows that interest-based social networking services use the same technical and basic functions when it comes to the design and implementation of the platform. The services are built based on the user interest graph rather than on existing social connections that are imported from other services. This interest graph works as the basis of the services and is tightly linked to the underlying business model components

    prototypical implementations ; working packages in project phase II

    In this technical report, we present the concepts and first prototypical implementations of innovative tools and methods for personalized and contextualized (multimedia) search, collaborative ontology evolution, ontology evaluation and cost models, and dynamic access and trends in distributed (semantic) knowledge. The concepts and prototypes are based on the state-of-the-art analysis and identified requirements in the CSW report IV

    Metapolitical New Right Influencers: The Case of Brittany Pettibone

    Far-right movements, activists, and political parties are on the rise worldwide. Several scholars connect this rise of the far-right at least partially to the affordances of digital media and to a new digital metapolitical battle. A lot has been written about the far-right’s adoption of trolling, harassment, and meme-culture in their metapolitical strategy, but researchers have focused less on how far-right vloggers are using the practices of influencer culture for metapolitical goals. This paper tries to fill this gap and bring new theoretical insights based on a digital ethnographic case study. By analyzing political YouTuber and #pizzagate propagator Brittany Pettibone, this paper contributes to our understanding of radicalization processes in relation to the use of digital media

    The Promise and Peril of Big Data

    The Promise and Peril of Big Data explores the implications of inferential technologies used to analyze massive amounts of data and the ways in which these techniques can positively affect business, medicine, and government. The report is the result of the Eighteenth Annual Roundtable on Information Technology

    Multiformat Communication Strategies: A Conceptual Framework and Empirical Investigation of Video Formats

    Essay One was conducted to build a more complete view of bilateral, multiformat customer–firm communication. A review of communication theory builds a foundation for effective multiformat strategies across different exchange contexts (e.g., message complexity) and timing factors (e.g., relationship duration), while accounting for both positive and negative aspects of communication richness. Four perspectives on multiformat communication during exchange events suggest pertinent propositions and produce three parsimonious tenets. First, the authors propose a communication theory foundation for relationship marketing; second, they compile and synthesize extant research. Third, they identify six fundamental communication characteristics associated with different formats. Finally, they integrate insights from the previous perspectives into a single conceptual model to provide a more comprehensive view of multiformat communication. This conceptual framework can serve as a platform that academics and managers can use to develop effective communication strategies and thereby optimize customer experiences while simultaneously reducing firm costs and enhancing customer profitability and relationships. Essays Two and Three apply the characteristic-level insights derived in Essay One to a unilateral communication context, investigating whether, when and how the video format impacts performance, with four experimental studies. Consumers are increasingly watching online product videos without sound (no audio narration). Yet, managers have few insights into developing effective video marketing strategies, in the presence of this trend. In Essay Two, the authors first identify two distinct advantages of a video watched with sound, richness (greater message understanding) and vividness (greater message visualization), both of which have a positive impact on performance (Study 1). 
Next, the authors find that the vividness effect is important for consumers with hedonic shopping goals but not for those with utilitarian shopping goals (Studies 2a and 2b). In Essay Three, the authors find the richness effect is important for consumers with utilitarian shopping goals when they are visually distracted (Study 3). Finally, the authors find that adding text captions to the video, a frequently employed strategy, can backfire (Study 4). Adding text captions to a product video lowers message understanding and purchase intentions when the video is still watched with sound. These findings have important theoretical and managerial implications