
    A deep dive into user display names across social networks

    The display names an individual uses across Online Social Networks (OSNs) typically contain abundant information redundancy, because most users adopt one main name, or similar names, across OSNs to make them easier to remember or to build an online reputation. This redundancy is of great benefit to information fusion across OSNs. In this paper, we measure the information redundancy between different display names of the same individual. Based on the cross-site linking function of Foursquare, we first develop a distributed crawler to extract the display names that individuals use on Facebook, Twitter, and Foursquare. We construct three display name datasets across the three OSNs and measure information redundancy in three ways: length similarity, character similarity, and letter distribution similarity. We also analyze how this redundant information evolves over time. Finally, we apply the measurement results to user identification across OSNs. We find that (1) more than 45% of users use the same display name across OSNs; (2) the display names of the same individual on different OSNs show high similarity; (3) the information redundancy of display names is time-independent; and (4) the AUC of user identification based on display names alone exceeds 0.9 on all three datasets.
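    The three redundancy measures named above can be sketched as simple string metrics. The abstract does not give the paper's exact definitions, so the functions below are plausible stand-ins: a length ratio, Jaccard similarity over character sets, and cosine similarity over letter-frequency vectors.

```python
import math
from collections import Counter

def length_similarity(a, b):
    # Ratio of the shorter length to the longer; 1.0 for equal-length names.
    if not a or not b:
        return 0.0
    return min(len(a), len(b)) / max(len(a), len(b))

def character_similarity(a, b):
    # Jaccard similarity over the sets of characters appearing in each name.
    sa, sb = set(a.lower()), set(b.lower())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def letter_distribution_similarity(a, b):
    # Cosine similarity of the two names' character-frequency vectors.
    ca, cb = Counter(a.lower()), Counter(b.lower())
    dot = sum(ca[ch] * cb[ch] for ch in set(ca) | set(cb))
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

print(length_similarity("john_smith", "johnsmith"))
print(character_similarity("john_smith", "johnsmith"))
print(letter_distribution_similarity("john_smith", "johnsmith"))
```

    All three scores lie in [0, 1], so they can be combined directly as features for the user-identification step the abstract describes.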

    Deep dive on politician impersonating accounts in social media

    There is an ever-growing number of users who duplicate the social media accounts of celebrities, or generally impersonate their presence, on online social media platforms such as Instagram. This has led to increasing interest in detecting fake profiles and investigating their behaviour. We begin this research by targeting a few famous politicians, including Donald J. Trump, Barack Obama, and Emmanuel Macron, and collecting their activity on Instagram over a period of 3 months using a specifically designed crawler. We then experimented with several profile characteristics, such as username, display name, biography, and profile picture, to identify impersonators among 1.5M unique users. Using publicly crawled data, our model was able to distinguish crowds of impersonators and political bots. We continue by analysing the characteristics and behaviour of these impersonators. Finally, we conclude by classifying impersonators into four categories.
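    One of the profile characteristics mentioned, the similarity of a candidate username to the genuine account's, can be sketched with Python's standard difflib. The `flag_impersonators` helper and the 0.8 threshold are illustrative assumptions, not the paper's actual detection model.

```python
from difflib import SequenceMatcher

def name_similarity(genuine, candidate):
    # Ratio in [0, 1] based on longest matching character blocks.
    return SequenceMatcher(None, genuine.lower(), candidate.lower()).ratio()

def flag_impersonators(genuine, candidates, threshold=0.8):
    # Flag usernames suspiciously close to, but not identical to, the genuine one.
    return [c for c in candidates
            if c.lower() != genuine.lower()
            and name_similarity(genuine, c) >= threshold]

handles = ["realdonaldtrump", "rea1donaldtrump", "realdonaldtrunp", "catlover42"]
print(flag_impersonators("realdonaldtrump", handles))
```

    A real system would combine this signal with the other features the abstract lists (display name, biography, profile picture) rather than relying on username distance alone.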

    From Bonehead to @realDonaldTrump : A Review of Studies on Online Usernames

    In many online services, we are identified by self-chosen usernames, also known as nicknames or pseudonyms. Usernames have been studied quite extensively within several academic disciplines, yet few existing literature reviews or meta-analyses provide a comprehensive picture of this name category. This article addresses the gap by thoroughly analyzing 103 research articles with usernames as their primary focus. Despite the great variety of approaches taken to investigate usernames, three main types of studies can be identified: (1) qualitative analyses examining username semantics, the motivations for name choices, and how names are linked to users' identities; (2) experiments testing the communicative functions of usernames; and (3) computational studies analyzing large corpora of usernames to acquire information about users and their behavior. The current review investigates the terminology, objectives, methods, data, results, and impact of these three study types in detail. Finally, research gaps and potential directions for future work are discussed. As this investigation demonstrates, more research is needed on naming practices in social media, username-related online discrimination and harassment, and username usage in conversations.

    Table Search Using a Deep Contextualized Language Model

    Pretrained contextualized language models such as BERT have achieved impressive results on various natural language processing benchmarks. Benefiting from multiple pretraining tasks and large-scale training corpora, pretrained models can capture complex syntactic word relations. In this paper, we use the deep contextualized language model BERT for the task of ad hoc table retrieval. We investigate how to encode table content given the table structure and the input length limit of BERT. We also propose an approach that incorporates features from prior literature on table retrieval and jointly trains them with BERT. In experiments on public datasets, we show that our best approach outperforms the previous state-of-the-art method and BERT baselines by a large margin under different evaluation metrics. (Accepted at SIGIR 2020 as a long paper.)
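    The core preprocessing problem the abstract describes, fitting structured table content into BERT's 512-token input limit, can be sketched as a linearization step. The whitespace tokenization and [SEP]-style markers below are simplifications; the paper's actual encoding would operate on BERT's WordPiece subtokens.

```python
def linearize_table(caption, headers, rows, max_tokens=512):
    # Flatten caption, header row, and data rows into one token sequence,
    # separating segments with [SEP]-style markers, then truncate to the
    # model's input budget (BERT accepts at most 512 wordpieces).
    tokens = caption.split() + ["[SEP]"] + list(headers) + ["[SEP]"]
    for row in rows:
        for cell in row:
            tokens.extend(str(cell).split())
        tokens.append("[SEP]")
        if len(tokens) >= max_tokens:
            break  # stop adding rows once the budget is reached
    return tokens[:max_tokens]

seq = linearize_table(
    caption="Country GDP per capita",
    headers=["country", "gdp"],
    rows=[["Norway", 89000], ["Chile", 16000]],
    max_tokens=32,
)
print(seq)
```

    Truncating whole rows from the end is only one possible strategy; the paper investigates how such encoding choices interact with retrieval quality.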

    Sampling labelled profile data for identity resolution

    Identity resolution capability for social networking profiles is important for a range of purposes, from open-source intelligence applications to forming semantic web connections. Yet replication of research in this area is hampered by the lack of access to ground-truth data linking the identities of profiles from different networks. Almost all data sources previously used by researchers are no longer available, and historic datasets are both of decreasing relevance to the modern social networking landscape and ethically troublesome regarding the preservation and publication of personal data. We present and evaluate a method which provides researchers in identity resolution with easy access to a realistically challenging labelled dataset of online profiles, drawing on four of the currently largest and most influential online social networks. We validate the comparability of samples drawn through this method and discuss the implications of this mechanism for researchers, as well as potential alternatives and extensions.

    The solid ecosystem: ready for mainstream web development?

    Companies have been collecting data from their users for years. This data is often grouped in places called data silos and may then be used for profit in many ways: building data models to predict or steer user behaviour, selling the data to other companies, among others. Moreover, the centralisation of data makes data silos appealing targets for people with malicious intentions. Security breaches violate users' privacy by exposing sensitive data such as passwords, credit card information, and personal details. One solution to this problem is to separate data from these systems, demanding a shift in the way companies create web applications. This dissertation explores different solutions and compares them, focusing on a particular project named Solid. Created by the inventor of the World Wide Web, Tim Berners-Lee, Solid takes advantage of the power of RDF to create a web of Linked Data, introducing decentralisation into software architecture at different layers. To achieve mainstream adoption, various aspects, such as the impact this technology has on the user experience and the development experience, need to be considered. This dissertation documents the development of a prototype web application built with Solid at its core and compares it with the same application developed using a more traditional technology stack. An analysis was conducted from two perspectives: the developer's and the final user's. While the former considers aspects such as development time and the diversity and quality of documentation, the latter focuses on the user experience. Based on a questionnaire presented to real users, it was concluded that the user experience of some features of these applications, such as user registration and the login process, is affected by introducing this type of decentralisation, although in many other features the difference is imperceptible. The lack of documentation for this technology at present was also noted, though it improved throughout the development of this dissertation.

    Oceanus.

    v. 36, no. 3 (1993)

    Implementing LectureBank: a Network to Connect Researchers and Scientific Event Organizers

    This report discusses the development process, design choices, and certain aspects of the implementation of key components behind LectureBank, a network connecting researchers and academic event organizers. We examine the tools involved and how the site's purpose and audience, along with best practices and expert advice, influenced the project. Additionally, we survey the latest advances in the field of web design, including new semantic HTML5 elements and microformats, and explore how we integrated and took advantage of them.

    A Deep Dive into Technical Encryption Concepts to Better Understand Cybersecurity & Data Privacy Legal & Policy Issues

    Lawyers wishing to exercise a meaningful degree of leadership at the intersection of technology and the law could benefit greatly from a deep understanding of the use and application of encryption, considering it arises in so many legal scenarios. For example, in FTC v. Wyndham the defendant failed to implement nearly every conceivable cybersecurity control, including encryption for stored data, resulting in multiple data breaches and a consequent FTC enforcement action for unfair and deceptive practices. Other examples of legal issues requiring an understanding of encryption and other technology concepts include compliance with the security requirements of GLBA and HIPAA, encryption safe harbors relative to state data breach notification laws and the CCPA, the NYDFS Cybersecurity Regulation, and PCI standards. Further, policy discussions took place in 2020 regarding encrypted DNS over HTTPS, and lawyers would certainly benefit from a better understanding of the relevant encryption concepts to assess the privacy effectiveness of emerging encryption technologies such as encrypted DNS. Finally, the need for technology education for lawyers is evidenced by North Carolina and Florida requiring one or more hours of technology CLE, and by New York moving in 2020 toward required CLE in the area of cybersecurity specifically. This article observes that there is a continuing desire for strong encryption mechanisms to advance the privacy interests of civilians' online activities and communications (e.g., messages or web browsing). Law enforcement advocates for a "front door," requiring tech platforms to maintain a decryption mechanism for online data, which they must produce when the government provides a warrant. However, privacy advocates may encourage warrant-proof encryption mechanisms, where tech platforms remove their own ability to ever decrypt.
    This extreme pro-privacy position could be supported by viewing privacy interests through a lens such as Blackstone's ratio. Just as the Blackstone ratio principle favors constitutional protections that allow ten guilty people to go free rather than letting one innocent person suffer, individual privacy rights could arguably favor largely unsurveillable encrypted communications at the risk of not detecting various criminal activity. However, given that the internet can support large-scale good or evil activity, law enforcement continues to express a desire for a front door required by legislation and subject to suitable privacy safeguards, striking a balance between strong privacy and law enforcement's need to investigate serious crimes. In the last few decades, law enforcement appears to have lost this debate for various reasons, but the debate will likely continue for years to come. For attorneys to exercise meaningful leadership in evaluating the strength of encryption technologies relative to privacy rights, they must generally understand encryption principles, how these principles are applied to data at rest (e.g., local encryption), and how they operate with respect to data in transit. Therefore, this article first explores encryption concepts with regard to data at rest and then with regard to data in transit, covering some general networking protocols as context for understanding how encryption can be applied to data in transit, protecting the data payload of a packet and/or the routing/header information (i.e., the "from" and "to" fields) of the packet. Part 1 of this article briefly explores the need for lawyers to understand encryption. Part 2 provides a mostly technical discussion of encryption concepts, with some legal concepts injected therein. Finally, Part 3 provides some high-level legal discussion relevant to encryption (including arguments for and against law enforcement's desire for a front door).
    To facilitate understanding for a non-technical legal audience, I include a variety of physical-world analogies throughout (e.g., postal analogies and the like).
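    The distinction the article draws, between encrypting a packet's payload and leaving its routing header readable, can be illustrated with a toy one-time pad. The XOR scheme below is a teaching device, not production cryptography; real systems use vetted ciphers such as AES, and the packet dictionary is purely illustrative.

```python
import secrets

def xor_bytes(data, key):
    # XOR each data byte with the corresponding key byte (a one-time pad:
    # secure only if the key is random, as long as the message, and never reused).
    return bytes(d ^ k for d, k in zip(data, key))

message = b"meet at the usual place"
key = secrets.token_bytes(len(message))   # fresh random key, same length

ciphertext = xor_bytes(message, key)      # without the key this is noise
recovered = xor_bytes(ciphertext, key)    # XOR with the same key inverts it
assert recovered == message

# A packet-style view of data in transit: the routing header stays in the
# clear (so the network can deliver it) while only the payload is encrypted.
packet = {"from": "alice", "to": "bob", "payload": ciphertext}
print(packet["from"], packet["to"], len(packet["payload"]))
```

    In the article's terms, this is payload-only protection; hiding the "from" and "to" fields as well requires wrapping the whole packet in another encrypted layer, as VPNs and onion routing do.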