354 research outputs found

    What Scale of Audience a Campaign can Reach in What Price on Twitter?

    Campaigns with commercial and spam purposes have flooded the Twitter community. To understand what scale of audience a campaign could reach, we first perform a measurement study by collecting a dataset of about 10 million tweets via the streaming API and one million search tweets for targeted topics, as well as 37,313 user accounts that were suspended by Twitter. From the dataset, we extract a spam campaign and a commercial promotion campaign accompanied by spamming activities. Then, we characterize the way in which a campaign can reach its audience, in particular revealing the features that dominate information diffusion. After identifying the accounts suspended by Twitter, we further inspect to what extent these features can help to weed out spam accounts. The retrospective inspection is also useful for uncovering the tactics that malicious accounts use to avoid being suspended. Using the measurement results, we then develop a theoretical framework based on an epidemic model to investigate the dynamics of spammers and the victims they reach in the spam campaign. With the theoretical framework, we conduct a benefit-cost analysis of the spam campaign, shedding light on how to restrict the benefit of the spam campaign.

    Detecting Incentivized Review Groups With Co-Review Graph

    Online reviews play a crucial role in today's business ecosystem (especially on e-commerce platforms) and have become the primary source of consumer opinions. To manipulate consumers’ opinions, some sellers on e-commerce platforms outsource opinion spamming, offering incentives (e.g., free products) in exchange for incentivized reviews. Incentives, by nature, are likely to drive more biased or even fake reviews. Although e-commerce platforms such as Amazon have taken initiatives to squash the incentivized review practice, sellers turn to various social networking platforms (e.g., Facebook) to outsource the incentivized reviews. The aggregation of sellers who request incentivized reviews and reviewers who seek incentives forms incentivized review groups. In this paper, we focus on the incentivized review groups in e-commerce platforms. We collect data from various social networking platforms, including Facebook, WeChat, and Douban. A measurement study of incentivized review groups is conducted with regard to group members, group activities, and products. To identify the incentivized review groups, we propose a new detection approach based on co-review graphs. Specifically, we employ a community detection method to find suspicious communities in co-review graphs. We also build a “gold standard” dataset from the data we collected, which contains information on reviewers who belong to incentivized review groups. We use the “gold standard” dataset to evaluate the effectiveness of our detection approach.

    A lake ice phenology dataset for the Northern Hemisphere based on passive microwave remote sensing

    Lake ice phenology (LIP) is an essential indicator of climate change and helps with understanding the regional characteristics of climate change impacts. Ground observation records and remote sensing retrieval products of lake ice phenology are abundant for Europe, North America, and the Tibetan Plateau, but there is a lack of data for inner Eurasia. In this work, enhanced-resolution passive microwave satellite data (PMW) were used to investigate the Northern Hemisphere Lake Ice Phenology (PMW LIP). The Freeze Onset (FO), Complete Ice Cover (CIC), Melt Onset (MO), and Complete Ice Free (CIF) dates were derived for 753 lakes, including 409 lakes for which ice phenology retrievals were available for the period 1978 to 2020 and 344 lakes for which these were available for 2002 to 2020. Verification of the PMW LIP using ground records gave correlation coefficients of 0.93 and 0.84 for CIC and CIF, respectively, and the corresponding values of the RMSE were 11.84 and 10.07 days. The lake ice phenology in this dataset was significantly correlated (P < 0.001) with that obtained from Moderate Resolution Imaging Spectroradiometer (MODIS) data: the average correlation coefficient was 0.90 and the average RMSE was 7.87 days. The minimum RMSE was 4.39 days for CIF. The PMW is not affected by the weather or the amount of sunlight and thus provides more reliable information about the freezing and thawing process than MODIS observations. The PMW LIP dataset provides the basic freeze–thaw data that is required for research into lake ice and the impact of climate change in the cold regions of the Northern Hemisphere. The dataset is available at http://www.doi.org/10.11922/sciencedb.j00076.00081. Peer reviewed

    Dial N for NXDomain: The Scale, Origin, and Security Implications of DNS Queries to Non-Existent Domains

    Non-Existent Domain (NXDomain) is one type of Domain Name System (DNS) error response, indicating that the queried domain name does not exist and cannot be resolved. Unfortunately, little research has focused on understanding why and how NXDomain responses are generated, utilized, and exploited. In this paper, we conduct the first comprehensive and systematic study on NXDomain by investigating its scale, origin, and security implications. Utilizing a large-scale passive DNS database, we identify 146,363,745,785 NXDomains queried by DNS users between 2014 and 2022. Of these 146 billion NXDomains, 91 million hold historic WHOIS records, of which 5.3 million are identified as malicious domains, including about 2.4 million blocklisted domains, 2.8 million DGA (Domain Generation Algorithms) based domains, and 90 thousand squatting domains targeting popular domains. To gain more insights into the usage patterns and security risks of NXDomains, we register 19 carefully selected NXDomains in the DNS database, each of which received more than ten thousand DNS queries per month. We then deploy a honeypot for our registered domains and collect 5,925,311 incoming queries over 6 months, from which we discover that 5,186,858 and 505,238 queries are generated by automated processes and web crawlers, respectively. Finally, we perform extensive traffic analysis on our collected data and reveal that NXDomains can be misused for various purposes, including botnet takeover, malicious file injection, and residue trust exploitation.

    Semantic-Enhanced Differentiable Search Index Inspired by Learning Strategies

    Recently, a new paradigm called Differentiable Search Index (DSI) has been proposed for document retrieval, wherein a sequence-to-sequence model is learned to directly map queries to relevant document identifiers. The key idea behind DSI is to fully parameterize traditional "index-retrieve" pipelines within a single neural model, by encoding all documents in the corpus into the model parameters. In essence, DSI needs to resolve two major questions: (1) how to assign an identifier to each document, and (2) how to learn the associations between a document and its identifier. In this work, we propose a Semantic-Enhanced DSI model (SE-DSI) motivated by Learning Strategies in the area of Cognitive Psychology. Our approach advances original DSI in two ways: (1) For the document identifier, we take inspiration from Elaboration Strategies in human learning. Specifically, we assign each document an Elaborative Description based on the query generation technique, which is more meaningful than a string of integers in the original DSI; and (2) For the associations between a document and its identifier, we take inspiration from Rehearsal Strategies in human learning. Specifically, we select fine-grained semantic features from a document as Rehearsal Contents to improve document memorization. Both the offline and online experiments show improved retrieval performance over prevailing baselines. Comment: Accepted by KDD 202

    What drives soil degradation after gravel mulching for 6 years in northwest China?

    Gravel mulch is an agricultural water conservation practice that has been widely used in the semi-arid region of northwest China, but its effectiveness is now lessening due to soil degradation caused by long-term gravel mulching. In this study, we report on a 6-year-long gravel mulch experiment conducted in the northwestern Loess Plateau to evaluate the impact of gravel mulch on soil physicochemical properties and microbial communities, with the objective of clarifying the causes of long-term gravel mulching-induced land degradation. After 6 years of mulching, we found that gravel-mulched soil contained significantly higher concentrations of total carbon and total organic carbon than non-mulched soil (control). Long-term gravel mulching significantly changed the soil microbial diversity and abundance distribution of bacterial and fungal communities. Notably, the relative abundance of Acidobacteria was significantly higher under gravel mulching than in the control (no mulching), and was significantly greater in the AG treatment (small-sized gravel, 2–5 mm) than in all other treatments. Conversely, the relative abundance of Actinobacteria was significantly lower under gravel mulching than in the control, and was lowest in the BG treatment (large-sized gravel, 40–60 mm). At the same time, the relative abundance of Bacteroidetes was significantly lower in AG yet higher in BG than in the other treatments. Of the various factors examined, on a 6-year scale, the capture of dust by gravel mulch and altered carbon and nitrogen components in soil play major contributing roles in the compositional change of soil microorganisms. These results suggest that modified soil material input from gravel mulching may be the key factor leading to soil degradation. More long-term experimental studies at different sites are now needed to elucidate the mechanisms responsible for soil degradation under gravel mulching.