4,480 research outputs found

    Inheritance patterns in citation networks reveal scientific memes

    Memes are the cultural equivalent of genes that spread across human culture by means of imitation. What makes a meme and what distinguishes it from other forms of information, however, is still poorly understood. Our analysis of memes in the scientific literature reveals that they are governed by a surprisingly simple relationship between frequency of occurrence and the degree to which they propagate along the citation graph. We propose a simple formalization of this pattern and we validate it with data from close to 50 million publication records from the Web of Science, PubMed Central, and the American Physical Society. Evaluations relying on human annotators, citation network randomizations, and comparisons with several alternative approaches confirm that our formula is accurate and effective, without a dependence on linguistic or ontological knowledge and without the application of arbitrary thresholds or filters.
    Comment: 8 two-column pages, 5 figures; accepted for publication in Physical Review
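The abstract above does not spell out its formula, so the following is only a minimal sketch of one way a score could combine frequency of occurrence with propagation along a citation graph. The function name `meme_score`, the smoothing parameter `delta`, and the exact ratio used are assumptions made for illustration, not the authors' published definition.

```python
from typing import Dict, Set


def meme_score(carriers: Set[str], cites: Dict[str, Set[str]], delta: float = 1.0) -> float:
    """Illustrative score: frequency times a propagation ratio.

    carriers: ids of papers that contain the candidate term (hypothetical input).
    cites:    maps every paper id to the set of paper ids it cites.
    delta:    assumed smoothing constant that avoids division by zero.
    """
    papers = set(cites)
    if not papers:
        return 0.0

    # Frequency: fraction of all papers that carry the term.
    frequency = len(carriers & papers) / len(papers)

    # Papers that cite at least one carrier, and those that cite none.
    cites_carrier = {p for p, refs in cites.items() if refs & carriers}
    cites_no_carrier = papers - cites_carrier

    # "Sticking": how often the term appears when a cited paper already has it.
    sticking = len(cites_carrier & carriers) / (len(cites_carrier) + delta)
    # "Sparking": how often the term appears without any carrier being cited.
    sparking = (len(cites_no_carrier & carriers) + delta) / (len(cites_no_carrier) + delta)

    return frequency * sticking / sparking


# Toy citation graph: b cites a, c cites b, a and d cite nothing.
graph = {"a": set(), "b": {"a"}, "c": {"b"}, "d": set()}
inherited = meme_score({"a", "b", "c"}, graph)  # term passed along the citation chain
scattered = meme_score({"a", "d"}, graph)       # term never passed along a citation
```

In this toy graph the term that travels along the citation chain scores higher than the term that appears only in unconnected papers, which is the qualitative behavior the abstract describes.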

    A Quantitative Approach to Understanding Online Antisemitism

    A new wave of growing antisemitism, driven by fringe Web communities, is an increasingly worrying presence in the socio-political realm. The ubiquitous and global nature of the Web has provided tools used by these groups to spread their ideology to the rest of the Internet. Although the study of antisemitism and hate is not new, the scale and rate of change of online data have impacted the efficacy of traditional approaches to measure and understand these troubling trends. In this paper, we present a large-scale, quantitative study of online antisemitism. We collect hundreds of millions of posts and images from alt-right Web communities like 4chan's Politically Incorrect board (/pol/) and Gab. Using scientifically grounded methods, we quantify the escalation and spread of antisemitic memes and rhetoric across the Web. We find the frequency of antisemitic content greatly increases (in some cases more than doubling) after major political events such as the 2016 US Presidential Election and the "Unite the Right" rally in Charlottesville. We extract semantic embeddings from our corpus of posts and demonstrate how automated techniques can discover and categorize the use of antisemitic terminology. We additionally examine the prevalence and spread of the antisemitic "Happy Merchant" meme, and in particular how these fringe communities influence its propagation to more mainstream communities like Twitter and Reddit. Taken together, our results provide a data-driven, quantitative framework for understanding online antisemitism. Our methods serve as a framework to augment current qualitative efforts by anti-hate groups, providing new insights into the growth and spread of hate online.
    Comment: To appear at the 14th International AAAI Conference on Web and Social Media (ICWSM 2020). Please cite accordingly.

    Early Warning Analysis for Social Diffusion Events

    There is considerable interest in developing predictive capabilities for social diffusion processes, for instance to permit early identification of emerging contentious situations, rapid detection of disease outbreaks, or accurate forecasting of the ultimate reach of potentially viral ideas or behaviors. This paper proposes a new approach to this predictive analytics problem, in which analysis of meso-scale network dynamics is leveraged to generate useful predictions for complex social phenomena. We begin by deriving a stochastic hybrid dynamical systems (S-HDS) model for diffusion processes taking place over social networks with realistic topologies; this modeling approach is inspired by recent work in biology demonstrating that S-HDS offer a useful mathematical formalism with which to represent complex, multi-scale biological network dynamics. We then perform formal stochastic reachability analysis with this S-HDS model and conclude that the outcomes of social diffusion processes may depend crucially upon the way the early dynamics of the process interact with the underlying network's community structure and core-periphery structure. This theoretical finding provides the foundations for developing a machine learning algorithm that enables accurate early warning analysis for social diffusion events. The utility of the warning algorithm, and the power of network-based predictive metrics, are demonstrated through an empirical investigation of the propagation of political memes over social media networks. Additionally, we illustrate the potential of the approach for security informatics applications through case studies involving early warning analysis of large-scale protest events and politically motivated cyber attacks.

    Scalable Architecture for Integrated Batch and Streaming Analysis of Big Data

    Thesis (Ph.D.) - Indiana University, Computer Sciences, 2015. As Big Data processing problems evolve, many modern applications demonstrate special characteristics. Data exists in the form of both large historical datasets and high-speed real-time streams, and many analysis pipelines require integrated parallel batch processing and stream processing. Despite the large size of the whole dataset, most analyses focus on specific subsets according to certain criteria. Correspondingly, integrated support for efficient queries and post-query analysis is required. To address the system-level requirements brought by such characteristics, this dissertation proposes a scalable architecture for integrated queries, batch analysis, and streaming analysis of Big Data in the cloud. We verify its effectiveness using a representative application domain - social media data analysis - and tackle related research challenges emerging from each module of the architecture by integrating and extending multiple state-of-the-art Big Data storage and processing systems. In the storage layer, we reveal that existing text indexing techniques do not work well for the unique queries of social data, which put constraints on both textual content and social context. To address this issue, we propose a flexible indexing framework over NoSQL databases to support fully customizable index structures, which can embed necessary social context information for efficient queries. The batch analysis module demonstrates that analysis workflows consist of multiple algorithms with different computation and communication patterns, which are suitable for different processing frameworks. To achieve efficient workflows, we build an integrated analysis stack based on YARN, and make novel use of customized indices in developing sophisticated analysis algorithms. In the streaming analysis module, the high-dimensional data representation of social media streams poses special challenges to the problem of parallel stream clustering. Due to the sparsity of the high-dimensional data, the traditional synchronization method becomes expensive and severely impacts the scalability of the algorithm. Therefore, we design a novel strategy that broadcasts the incremental changes rather than the whole centroids of the clusters to achieve scalable parallel stream clustering algorithms. Performance tests using real applications show that our solutions for parallel data loading/indexing, queries, analysis tasks, and stream clustering all significantly outperform implementations using current state-of-the-art technologies.
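The idea of broadcasting incremental changes instead of whole centroids can be illustrated with a small sketch. The abstract gives no implementation details, so the sparse dictionary representation and the helper names `centroid_delta` and `apply_delta` below are assumptions chosen for illustration, not the thesis's actual protocol.

```python
from typing import Dict

# A sparse centroid: only non-zero dimensions (e.g. hashtags or words) are stored.
SparseVec = Dict[str, float]


def centroid_delta(old: SparseVec, new: SparseVec, eps: float = 1e-12) -> SparseVec:
    """Return only the coordinates that changed between two versions of a centroid."""
    delta: SparseVec = {}
    for key in old.keys() | new.keys():
        diff = new.get(key, 0.0) - old.get(key, 0.0)
        if abs(diff) > eps:
            delta[key] = diff
    return delta


def apply_delta(centroid: SparseVec, delta: SparseVec, eps: float = 1e-12) -> SparseVec:
    """Apply a received delta to a local copy of the centroid, dropping zeroed entries."""
    updated = dict(centroid)
    for key, diff in delta.items():
        value = updated.get(key, 0.0) + diff
        if abs(value) > eps:
            updated[key] = value
        else:
            updated.pop(key, None)
    return updated


# One worker updates a centroid; only the changed dimensions need to be sent.
old = {"#a": 1.0, "#b": 2.0, "#c": 0.5}
new = {"#a": 1.0, "#b": 2.5, "#d": 0.25}
delta = centroid_delta(old, new)      # what would be broadcast
restored = apply_delta(old, delta)    # what a peer reconstructs locally
```

When centroids are high-dimensional but each update touches few dimensions, the delta is much smaller than the full centroid, which is the intuition behind the scalability claim in the abstract.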

    Identification of a New Family of Enzymes with Potential O-acetylpeptidoglycan esterase activity in both Gram-positive and Gram-negative bacteria

    Background: The metabolism of the rigid bacterial cell wall heteropolymer peptidoglycan is a dynamic process requiring continuous biosynthesis and maintenance involving the coordination of both lytic and synthetic enzymes. The O-acetylation of peptidoglycan has been proposed to provide one level of control on these activities as this modification inhibits the action of the major endogenous lytic enzymes, the lytic transglycosylases. The O-acetylation of peptidoglycan also inhibits the activity of the lysozymes which serve as the first line of defense of host cells against the invasion of bacterial pathogens. Despite this central importance, there is a dearth of information regarding peptidoglycan O-acetylation and nothing has previously been reported on its de-acetylation. Results: Homology searches of the genome databases have permitted this first report on the identification of a potential family of O-acetylpeptidoglycan esterases (Ape). These proteins, encoded in the genomes of a variety of both Gram-negative and Gram-positive bacteria, including a number of important human pathogens such as species of Neisseria, Helicobacter, Campylobacter, and Bacillus anthracis, have been organized into three families based on amino acid sequence similarities, with family 1 being further divided into three sub-families. The genes encoding these proteins are shown to be clustered with peptidoglycan O-acetyltransferases (Pat) and, in some cases, together with other genes involved in cell wall metabolism. Representative bacteria that encode the Ape proteins were experimentally shown to produce O-acetylated peptidoglycan. Conclusion: The hypothetical proteins encoded by the pat and ape genes have been organized into families based on sequence similarities. The Pat proteins have sequence similarity to Pseudomonas aeruginosa AlgI, an integral membrane protein known to participate in the O-acetylation of the exopolysaccharide alginate. As none of the bacteria that harbor the pat genes produce alginate, we propose that the Pat proteins serve to O-acetylate peptidoglycan, which is known to be a maturation event occurring in the periplasm. The Ape sequences have amino acid sequence similarity to the CAZy CE3 carbohydrate esterases, a family previously known to be composed of only O-acetylxylan esterases. They are predicted to contain the α/β hydrolase fold associated with the GDSL and TesA hydrolases and they possess the signature motifs associated with the catalytic residues of the CE3 esterases. Specific signature sequence motifs were identified for the Ape proteins which led to their organization into distinct families. We propose that by expressing both Pat and Ape enzymes, bacteria would be able to obtain a high level of localized control over the degradation of peptidoglycan through the attachment and removal of O-linked acetate. This would facilitate the efficient insertion of pores and flagella, localize spore formation, and control the level of general peptidoglycan turnover.