Inheritance patterns in citation networks reveal scientific memes
Memes are the cultural equivalent of genes that spread across human culture
by means of imitation. What makes a meme and what distinguishes it from other
forms of information, however, is still poorly understood. Our analysis of
memes in the scientific literature reveals that they are governed by a
surprisingly simple relationship between frequency of occurrence and the degree
to which they propagate along the citation graph. We propose a simple
formalization of this pattern and we validate it with data from close to 50
million publication records from the Web of Science, PubMed Central, and the
American Physical Society. Evaluations relying on human annotators, citation
network randomizations, and comparisons with several alternative approaches
confirm that our formula is accurate and effective, without a dependence on
linguistic or ontological knowledge and without the application of arbitrary
thresholds or filters.
Comment: 8 two-column pages, 5 figures; accepted for publication in Physical Review
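The abstract does not spell out the formula. As a minimal sketch only, assume a score that multiplies a term's frequency by a propagation factor. That factor compares how often the term "sticks" in papers that cite a carrier with how often it "sparks" in papers that cite none, using an assumed smoothing constant `delta`. `Paper`, `meme_score`, and the toy data are hypothetical, and the paper's exact definition may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    pid: str
    terms: set = field(default_factory=set)   # terms (e.g. n-grams) extracted from the paper
    refs: list = field(default_factory=list)  # ids of cited papers


def meme_score(meme, papers, delta=3.0):
    """Hypothetical meme score: frequency x (sticking / sparking).

    sticking: share of papers citing a meme carrier that also carry the meme.
    sparking: share of papers citing no carrier that carry the meme anyway.
    `delta` is an assumed smoothing constant guarding against tiny counts.
    """
    by_id = {p.pid: p for p in papers}
    freq = sum(meme in p.terms for p in papers) / len(papers)
    d_m_m = d_to_m = d_m_not = d_to_not = 0
    for p in papers:
        # Does this paper cite at least one paper carrying the meme?
        cites_carrier = any(r in by_id and meme in by_id[r].terms for r in p.refs)
        if cites_carrier:
            d_to_m += 1
            d_m_m += meme in p.terms
        else:
            d_to_not += 1
            d_m_not += meme in p.terms
    sticking = d_m_m / (d_to_m + delta)
    sparking = (d_m_not + delta) / (d_to_not + delta)
    return freq * sticking / sparking


# Tiny made-up citation graph: B inherits "x" from A; C cites A without it;
# "y" appears only in D, which nobody cites.
EXAMPLE = [
    Paper("A", {"x"}),
    Paper("B", {"x"}, ["A"]),
    Paper("C", set(), ["A"]),
    Paper("D", {"y"}),
]
```

A term that is frequent but never passed along the citation graph (like `"y"` here) scores zero. A term inherited by citing papers scores higher, which is the qualitative pattern the abstract describes.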
A Quantitative Approach to Understanding Online Antisemitism
A new wave of growing antisemitism, driven by fringe Web communities, is an
increasingly worrying presence in the socio-political realm. The ubiquitous and
global nature of the Web has provided tools used by these groups to spread
their ideology to the rest of the Internet. Although the study of antisemitism
and hate is not new, the scale and rate of change of online data has impacted
the efficacy of traditional approaches to measure and understand these
troubling trends. In this paper, we present a large-scale, quantitative study
of online antisemitism. We collect hundreds of millions of posts and images from
alt-right Web communities like 4chan's Politically Incorrect board (/pol/) and
Gab. Using scientifically grounded methods, we quantify the escalation and
spread of antisemitic memes and rhetoric across the Web. We find the frequency
of antisemitic content greatly increases (in some cases more than doubling)
after major political events such as the 2016 US Presidential Election and the
"Unite the Right" rally in Charlottesville. We extract semantic embeddings from
our corpus of posts and demonstrate how automated techniques can discover and
categorize the use of antisemitic terminology. We additionally examine the
prevalence and spread of the antisemitic "Happy Merchant" meme, and in
particular how these fringe communities influence its propagation to more
mainstream communities like Twitter and Reddit. Taken together, our results
provide a data-driven, quantitative framework for understanding online
antisemitism. Our methods serve as a framework to augment current qualitative
efforts by anti-hate groups, providing new insights into the growth and spread
of hate online.
Comment: To appear at the 14th International AAAI Conference on Web and Social Media (ICWSM 2020). Please cite accordingly.
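The before/after comparison behind the finding that antisemitic content more than doubles after some political events can be illustrated with a minimal sketch. It assumes posts are `(date, text)` pairs and uses a placeholder term list with naive substring matching, which is much cruder than the embedding-based methods described above. The function names and posts are hypothetical. The event day used is 8 November 2016, US Election Day.

```python
import math
from datetime import date


def term_share(posts, terms):
    """Fraction of (day, text) posts whose lowercased text contains any
    term; terms are expected in lowercase."""
    if not posts:
        return 0.0
    hits = sum(any(t in text.lower() for t in terms) for _, text in posts)
    return hits / len(posts)


def event_ratio(posts, terms, event_day, window_days=7):
    """Share of matching posts in the window after an event divided by the
    share in the window before it (the event day itself counts as 'after')."""
    before = [p for p in posts if 0 < (event_day - p[0]).days <= window_days]
    after = [p for p in posts if 0 <= (p[0] - event_day).days < window_days]
    b, a = term_share(before, terms), term_share(after, terms)
    if b == 0:
        # No baseline: infinite increase if anything matched after, else undefined.
        return math.inf if a > 0 else math.nan
    return a / b


# Made-up posts around the event day; "term_a" is a placeholder term.
EVENT = date(2016, 11, 8)
POSTS = [
    (date(2016, 11, 4), "ok"), (date(2016, 11, 5), "hello"),
    (date(2016, 11, 6), "term_a here"), (date(2016, 11, 7), "nothing"),
    (date(2016, 11, 8), "TERM_A"), (date(2016, 11, 9), "term_a again"),
    (date(2016, 11, 10), "plain"), (date(2016, 11, 11), "x"),
]
```

Here one of four posts matches before the event and two of four match after it, so `event_ratio(POSTS, ["term_a"], EVENT)` is 2.0, a doubling.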
Early Warning Analysis for Social Diffusion Events
There is considerable interest in developing predictive capabilities for
social diffusion processes, for instance to permit early identification of
emerging contentious situations, rapid detection of disease outbreaks, or
accurate forecasting of the ultimate reach of potentially viral ideas or
behaviors. This paper proposes a new approach to this predictive analytics
problem, in which analysis of meso-scale network dynamics is leveraged to
generate useful predictions for complex social phenomena. We begin by deriving
a stochastic hybrid dynamical systems (S-HDS) model for diffusion processes
taking place over social networks with realistic topologies; this modeling
approach is inspired by recent work in biology demonstrating that S-HDS offer a
useful mathematical formalism with which to represent complex, multi-scale
biological network dynamics. We then perform formal stochastic reachability
analysis with this S-HDS model and conclude that the outcomes of social
diffusion processes may depend crucially upon the way the early dynamics of the
process interacts with the underlying network's community structure and
core-periphery structure. This theoretical finding provides the foundations for
developing a machine learning algorithm that enables accurate early warning
analysis for social diffusion events. The utility of the warning algorithm, and
the power of network-based predictive metrics, are demonstrated through an
empirical investigation of the propagation of political memes over social media
networks. Additionally, we illustrate the potential of the approach for
security informatics applications through case studies involving early warning
analysis of large-scale protest events and politically motivated cyberattacks.
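The abstract does not list the network-based predictive metrics. As an illustration only, assume community labels and a core/periphery split have already been computed for the network, for example by a community-detection algorithm. The following sketch then derives a few hypothetical features describing where a diffusion's first adopters sit in that structure, which could feed a classifier. It is not the paper's S-HDS reachability analysis, and the function name and feature names are assumptions.

```python
import math
from collections import Counter


def early_spread_features(early_adopters, community_of, core_nodes):
    """Hypothetical early-warning features describing how the first adopters
    of a meme sit relative to community and core-periphery structure.

    early_adopters: node ids in order of adoption (e.g. the first k accounts)
    community_of:   dict mapping node id -> community label
    core_nodes:     set of node ids assigned to the network core
    """
    n = len(early_adopters)
    if n == 0:
        return {"n_communities": 0, "community_entropy": 0.0, "core_fraction": 0.0}
    counts = Counter(community_of[v] for v in early_adopters)
    # Shannon entropy (nats) of the adopters' community distribution:
    # 0 when all adopters share one community, larger when spread out.
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return {
        "n_communities": len(counts),
        "community_entropy": entropy,
        "core_fraction": sum(v in core_nodes for v in early_adopters) / n,
    }
```

Computing these features on only the first few adopters keeps them available early, when a warning is still useful.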
Scalable Architecture for Integrated Batch and Streaming Analysis of Big Data
Thesis (Ph.D.) - Indiana University, Computer Sciences, 2015
As Big Data processing problems evolve, many modern applications exhibit distinctive characteristics. Data exists both as large historical datasets and as high-speed real-time streams, and many analysis pipelines require integrated parallel batch and stream processing. Although the whole dataset is large, most analyses focus on specific subsets selected by certain criteria. Correspondingly, integrated support for efficient queries and post-query analysis is required.
To address the system-level requirements arising from these characteristics, this dissertation proposes a scalable architecture for integrated queries, batch analysis, and streaming analysis of Big Data in the cloud. We verify its effectiveness in a representative application domain, social media data analysis, and tackle the research challenges that emerge in each module of the architecture by integrating and extending multiple state-of-the-art Big Data storage and processing systems.
In the storage layer, we show that existing text indexing techniques do not work well for the distinctive queries on social data, which constrain both textual content and social context. To address this issue, we propose a flexible indexing framework over NoSQL databases that supports fully customizable index structures, which can embed the social context information needed for efficient queries.
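The idea of embedding social context in index postings can be sketched in a few lines. The thesis builds its framework over NoSQL databases, which this sketch does not model. Plain Python dicts stand in for the storage layer. The class name, the `context_fields` parameter, the post layout, and whitespace tokenization are all hypothetical. Each posting carries the chosen context fields, so a query constrained on both a term and a time range can be answered from the index alone.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class CustomizableTextIndex:
    """Hypothetical in-memory sketch of a term index whose postings embed
    chosen social-context fields, so a query constrained on both text and
    time can be answered without fetching the raw posts."""

    def __init__(self, context_fields=("user",)):
        self.context_fields = tuple(context_fields)
        self._times = defaultdict(list)    # term -> sorted posting timestamps
        self._entries = defaultdict(list)  # term -> postings aligned with _times

    def add(self, post):
        entry = {"id": post["id"], **{f: post[f] for f in self.context_fields}}
        for term in set(post["text"].lower().split()):  # naive tokenizer
            times = self._times[term]
            pos = bisect_right(times, post["ts"])  # keep postings time-ordered
            times.insert(pos, post["ts"])
            self._entries[term].insert(pos, entry)

    def query(self, term, start, end):
        """Postings for `term` with start <= ts <= end, in time order."""
        times = self._times.get(term)
        if not times:
            return []
        lo, hi = bisect_left(times, start), bisect_right(times, end)
        return self._entries[term][lo:hi]


# Usage with made-up posts.
idx = CustomizableTextIndex(context_fields=("user", "retweets"))
for post in [
    {"id": 1, "user": "a", "ts": 10, "retweets": 3, "text": "Flu season"},
    {"id": 2, "user": "b", "ts": 20, "retweets": 0, "text": "flu shot"},
    {"id": 3, "user": "c", "ts": 30, "retweets": 7, "text": "election day"},
]:
    idx.add(post)
```

Choosing which fields to embed is the "customizable" part. Embedding more context makes more queries index-only at the cost of a larger index.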
The batch analysis module demonstrates that analysis workflows consist of multiple algorithms with different computation and communication patterns, which are suitable for different processing frameworks. To achieve efficient workflows, we build an integrated analysis stack based on YARN, and make novel use of customized indices in developing sophisticated analysis algorithms.
In the streaming analysis module, the high-dimensional representation of social media streams poses special challenges for parallel stream clustering. Because the high-dimensional data are sparse, the traditional synchronization method becomes expensive and severely limits the scalability of the algorithm. We therefore design a novel strategy that broadcasts the incremental changes to the clusters rather than their whole centroids, yielding scalable parallel stream clustering algorithms.
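Broadcasting incremental changes instead of whole centroids can be sketched as follows. Several assumptions are not stated in the abstract. Each centroid is kept as a sum vector plus a count, chosen here because adding a sparse point then changes only that point's non-zero dimensions. Messages are plain Python tuples rather than a real broadcast channel. `SparseCentroid` and `local_delta` are hypothetical names.

```python
from collections import defaultdict


class SparseCentroid:
    """Cluster centroid stored as a running sum of sparse vectors plus a
    count, so merging new points only touches their non-zero dimensions."""

    def __init__(self):
        self.sums = defaultdict(float)  # dimension -> summed weight
        self.count = 0

    def apply_delta(self, delta_sums, delta_count):
        """Merge an incremental update broadcast by some worker."""
        for dim, value in delta_sums.items():
            self.sums[dim] += value
        self.count += delta_count

    def mean(self):
        if self.count == 0:
            return {}
        return {dim: s / self.count for dim, s in self.sums.items()}


def local_delta(points):
    """On one worker: fold the sparse points newly assigned to a cluster
    into a (delta_sums, delta_count) message to broadcast."""
    delta = defaultdict(float)
    for point in points:
        for dim, value in point.items():
            delta[dim] += value
    return dict(delta), len(points)


# Two workers broadcast deltas; each replica applies every message.
centroid = SparseCentroid()
msg_a = local_delta([{"flu": 1.0}, {"flu": 1.0, "shot": 2.0}])
msg_b = local_delta([{"vote": 4.0}])
for msg in (msg_a, msg_b):
    centroid.apply_delta(*msg)
```

Worker B's message carries a single dimension, not every dimension the centroid has accumulated, so message size tracks the sparsity of the new data rather than the density of the centroid.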
Performance tests using real applications show that our solutions for parallel data loading/indexing, queries, analysis tasks, and stream clustering all significantly outperform implementations using current state-of-the-art technologies.
Identification of a New Family of Enzymes with Potential O-acetylpeptidoglycan esterase activity in both Gram-positive and Gram-negative bacteria
Background: The metabolism of the rigid bacterial cell wall heteropolymer peptidoglycan is a dynamic process requiring continuous biosynthesis and maintenance involving the coordination of both lytic and synthetic enzymes. The O-acetylation of peptidoglycan has been proposed to provide one level of control on these activities, as this modification inhibits the action of the major endogenous lytic enzymes, the lytic transglycosylases. The O-acetylation of peptidoglycan also inhibits the activity of lysozymes, which serve as the first line of defense of host cells against invasion by bacterial pathogens. Despite this central importance, there is a dearth of information regarding peptidoglycan O-acetylation, and nothing has previously been reported on its de-acetylation.
Results: Homology searches of the genome databases have permitted this first report on the identification of a potential family of O-acetylpeptidoglycan esterases (Ape). These proteins, encoded in the genomes of a variety of both Gram-negative and Gram-positive bacteria, including a number of important human pathogens such as species of Neisseria, Helicobacter, Campylobacter, and Bacillus anthracis, have been organized into three families based on amino acid sequence similarities, with family 1 being further divided into three sub-families. The genes encoding these proteins are shown to be clustered with genes encoding peptidoglycan O-acetyltransferases (Pat) and, in some cases, with other genes involved in cell wall metabolism. Representative bacteria that encode the Ape proteins were experimentally shown to produce O-acetylated peptidoglycan.
Conclusion: The hypothetical proteins encoded by the pat and ape genes have been organized into families based on sequence similarities. The Pat proteins have sequence similarity to Pseudomonas aeruginosa AlgI, an integral membrane protein known to participate in the O-acetylation of the exopolysaccharide alginate. As none of the bacteria that harbor the pat genes produce alginate, we propose that the Pat proteins serve to O-acetylate peptidoglycan, a modification known to be a maturation event occurring in the periplasm. The Ape sequences share amino acid sequence similarity with the CAZy CE3 carbohydrate esterases, a family previously known to comprise only O-acetylxylan esterases. They are predicted to contain the α/β hydrolase fold associated with the GDSL and TesA hydrolases, and they possess the signature motifs associated with the catalytic residues of the CE3 esterases. Specific signature sequence motifs were identified for the Ape proteins, which led to their organization into distinct families. We propose that by expressing both Pat and Ape enzymes, bacteria would be able to obtain a high level of localized control over the degradation of peptidoglycan through the attachment and removal of O-linked acetate. This would facilitate the efficient insertion of pores and flagella, localize spore formation, and control the level of general peptidoglycan turnover.