Nonuniform Thickness and Weighted Distance
Nonuniform tubular neighborhoods of curves in Euclidean n-space are studied
by using weighted distance functions and generalizing the normal exponential
map. Different notions of injectivity radii are introduced to investigate
singular but injective exponential maps. A generalization of the thickness
formula is obtained for nonuniform thickness. All singularities within almost
injectivity radius are classified by the Horizontal Collapsing Property.
Examples are provided to show the distinction between the different types of
injectivity radii and to show that the standard differentiable
injectivity radius fails to be upper semicontinuous on a singular set of weight
functions.
HLOC: Hints-Based Geolocation Leveraging Multiple Measurement Frameworks
Geographically locating an IP address is of interest for many purposes. There
are two major ways to obtain the location of an IP address: querying commercial
databases or conducting latency measurements. For structural Internet nodes,
such as routers, commercial databases are limited by low accuracy, while
current measurement-based approaches overwhelm users with setup overhead and
scalability issues. In this work we present our system HLOC, aiming to combine
the ease of database use with the accuracy of latency measurements. We evaluate
HLOC on a comprehensive router data set of 1.4M IPv4 and 183k IPv6 routers.
HLOC first extracts location hints from rDNS names, and then conducts
multi-tier latency measurements. Configuration complexity is minimized by using
publicly available large-scale measurement frameworks such as RIPE Atlas. Using
these measurements, we can confirm or disprove the location hints found in domain
names. We publicly release HLOC's ready-to-use source code, enabling
researchers to easily increase geolocation accuracy with minimum overhead.
Comment: As published in TMA'17 conference:
http://tma.ifip.org/main-conference
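To make the two HLOC stages concrete, here is a minimal sketch of (1) pulling a location hint out of an rDNS name and (2) checking that hint against a latency measurement with a speed-of-light bound. This is not HLOC's code. The hint table `HINT_LOCATIONS`, the tokenising rule, the functions `extract_hints`, `haversine_km` and `hint_is_feasible`, and the assumed fibre speed of about 200 km/ms are all hypothetical choices made for illustration.

```python
import math
import re

# Hypothetical hint table: airport-style codes mapped to (latitude, longitude).
HINT_LOCATIONS = {
    "fra": (50.0379, 8.5622),    # Frankfurt airport
    "lhr": (51.4700, -0.4543),   # London Heathrow
    "jfk": (40.6413, -73.7781),  # New York JFK
}

# Assumption: signals travel through fibre at roughly 2/3 c, about 200 km per ms.
FIBRE_KM_PER_MS = 200.0
EARTH_RADIUS_KM = 6371.0


def extract_hints(rdns_name):
    """Return known location codes that appear as tokens of an rDNS name."""
    hints = []
    for token in re.split(r"[.\-]", rdns_name.lower()):
        code = token.rstrip("0123456789")  # e.g. "fra03" -> "fra"
        if code in HINT_LOCATIONS and code not in hints:
            hints.append(code)
    return hints


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def hint_is_feasible(hint_loc, probe_loc, rtt_ms):
    """Keep a hint only if it lies within the distance the measured RTT allows."""
    max_km = (rtt_ms / 2.0) * FIBRE_KM_PER_MS  # one-way time multiplied by fibre speed
    return haversine_km(hint_loc, probe_loc) <= max_km
```

For example, `extract_hints("ae-1.r01.fra03.de.example.net")` yields `["fra"]`. A 2 ms RTT from a probe in central Frankfurt bounds the target to within 200 km, which keeps the "fra" hint and would rule out "jfk". The bound can only disprove a hint; confirming one additionally requires a low RTT from a probe close to the hinted location.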
Partial Mobilization: Tracking Multilingual Information Flows Amongst Russian Media Outlets and Telegram
In response to disinformation and propaganda from Russian online media
following the Russian invasion of Ukraine, Russian outlets including Russia
Today and Sputnik News were banned throughout Europe. Many of these Russian
outlets, in order to reach their audiences, began to heavily promote their
content on messaging services like Telegram. In this work, to understand this
phenomenon, we study how 16 Russian media outlets have interacted with and
utilized 732 Telegram channels throughout 2022. To do this, we utilize a
multilingual version of the foundational model MPNet to embed articles and
Telegram messages in a shared embedding space and semantically compare content.
Leveraging a parallelized version of DP-Means clustering, we perform
paragraph-level topic/narrative extraction and time-series analysis with Hawkes
Processes. With this approach, we find that, across the websites studied, between 2.3%
(ura.news) and 26.7% (ukraina.ru) of their content originated in or resulted from
activity on Telegram. Finally, tracking the spread of individual narratives, we
measure the rate at which these websites and channels disseminate content
within the Russian media ecosystem.
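The time-series analysis relies on Hawkes processes, in which each event temporarily raises the rate of future events. A minimal sketch of a multivariate Hawkes intensity with an exponential kernel follows, together with the share of one dimension's intensity driven by another (for instance, how much of a website's posting rate is attributable to Telegram activity). The exponential kernel, the shared decay rate `beta`, and the functions `hawkes_intensity` and `share_from_source` are assumptions of this sketch, not necessarily the model specification used in the paper.

```python
import math


def _excitation(t, times, weight, beta):
    """Summed exponentially decaying influence of events strictly before t."""
    return sum(weight * math.exp(-beta * (t - ti)) for ti in times if ti < t)


def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of every dimension of a multivariate Hawkes process.

    events[j]   -- event times in dimension j (e.g. 0 = a website, 1 = Telegram)
    mu[k]       -- background rate of dimension k
    alpha[k][j] -- jump in k's intensity caused by one event in dimension j
    beta        -- decay rate shared by all kernels (an assumption of this sketch)
    """
    return [
        mu[k] + sum(_excitation(t, events[j], alpha[k][j], beta) for j in range(len(mu)))
        for k in range(len(mu))
    ]


def share_from_source(t, events, mu, alpha, beta, k, j):
    """Fraction of dimension k's intensity at time t that is driven by events in j."""
    total = hawkes_intensity(t, events, mu, alpha, beta)[k]
    return _excitation(t, events[j], alpha[k][j], beta) / total
```

In practice `mu`, `alpha` and `beta` would be fitted to the observed event times (e.g. by maximum likelihood); here they are simply passed in.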
Watch Your Language: Large Language Models and Content Moderation
Large language models (LLMs) have exploded in popularity due to their ability
to perform a wide array of natural language tasks. Text-based content
moderation is one LLM use case that has received recent enthusiasm; however,
there is little research investigating how LLMs perform in content moderation
settings. In this work, we evaluate a suite of modern, commercial LLMs (GPT-3,
GPT-3.5, GPT-4) on two common content moderation tasks: rule-based community
moderation and toxic content detection. For rule-based community moderation, we
construct 95 LLM moderation-engines prompted with rules from 95 Reddit
subcommunities and find that LLMs can be effective at rule-based moderation for
many communities, achieving a median accuracy of 64% and a median precision of
83%. For toxicity detection, we find that LLMs significantly outperform
existing commercially available toxicity classifiers. However, we also find
that recent increases in model size add only marginal benefit to toxicity
detection, suggesting a potential performance plateau for LLMs on toxicity
detection tasks. We conclude by outlining avenues for future work in studying
LLMs and content moderation.
Beyond Counting: New Perspectives on the Active IPv4 Address Space
In this study, we report on techniques and analyses that enable us to capture
Internet-wide activity at individual IP address-level granularity by relying on
server logs of a large commercial content delivery network (CDN) that serves
close to 3 trillion HTTP requests on a daily basis. Across the whole of 2015,
these logs recorded client activity involving 1.2 billion unique IPv4
addresses, the highest ever measured, in agreement with recent estimates.
Monthly client IPv4 address counts showed constant growth for years prior, but
since 2014, the IPv4 count has stagnated while IPv6 counts have grown. Thus, it
seems we have entered an era marked by increased complexity, one in which the
sole enumeration of active IPv4 addresses is of little use to characterize
recent growth of the Internet as a whole.
With this observation in mind, we consider new points of view in the study of
global IPv4 address activity. First, our analysis shows significant churn in active
IPv4 addresses: the set of active IPv4 addresses varies by as much as 25% over
the course of a year. Second, by looking across the active addresses in a
prefix, we are able to identify and attribute activity patterns to network
restructurings, user behaviors, and, in particular, various address assignment
practices. Third, by combining spatio-temporal measures of address utilization
with measures of traffic volume, and sampling-based estimates of relative host
counts, we present novel perspectives on worldwide IPv4 address activity,
including empirical observation of under-utilization in some areas, and
complete utilization, or exhaustion, in others.
Comment: in Proceedings of ACM IMC 201
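Two of the measures described above, churn between snapshots of active addresses and per-prefix utilization, can be sketched with Python's standard `ipaddress` module. The definitions here are assumptions for illustration: churn is taken as the fraction of addresses active in the earlier snapshot that are no longer active in the later one, and utilization as the fraction of a /24's addresses seen active. The paper's exact metrics and its CDN log processing are not reproduced; `churn` and `per_prefix_utilization` are hypothetical names.

```python
import ipaddress
from collections import Counter


def churn(before, after):
    """Fraction of addresses active in `before` that are no longer active in `after`."""
    if not before:
        return 0.0
    return len(before - after) / len(before)


def per_prefix_utilization(active, prefixlen=24):
    """Map each covering prefix to the fraction of its addresses seen active."""
    counts = Counter(
        ipaddress.ip_network(f"{addr}/{prefixlen}", strict=False) for addr in active
    )
    return {net: n / net.num_addresses for net, n in counts.items()}
```

For example, if one of four addresses active in January is gone in December, `churn` returns 0.25. A /24 whose utilization is close to 1.0 over time would be a candidate for the "exhaustion" end of the spectrum, and one with very low values for under-utilization.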
Twits, Toxic Tweets, and Tribal Tendencies: Trends in Politically Polarized Posts on Twitter
Social media platforms are often blamed for exacerbating political
polarization and worsening public dialogue. Many claim hyperpartisan users post
pernicious content, slanted to their political views, inciting contentious and
toxic conversations. However, what factors actually contribute to increased
online toxicity and negative interactions? In this work, we explore the role
that political ideology plays in contributing to toxicity both on an individual
user level and a topic level on Twitter. To do this, we train and open-source a
DeBERTa-based toxicity detector with a contrastive objective that outperforms
the Google Jigsaw Perspective Toxicity detector on the Civil Comments test
dataset. Then, after collecting 187 million tweets from 55,415 Twitter users,
we determine how several account-level characteristics, including political
ideology and account age, predict how often each user posts toxic content.
Running a linear regression, we find that the diversity of views and the
toxicity of the other accounts with which a user engages have a more marked
effect on their own toxicity. Namely, toxic comments are correlated with users
who engage with a wider array of political views. Performing topic analysis on
the toxic content posted by these accounts using the large language model MPNet
and a version of the DP-Means clustering algorithm, we find similar behavior
across 6,592 individual topics, with conversations on each topic becoming more
toxic as a wider diversity of users become involved.
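The topic analysis clusters MPNet embeddings with a version of DP-Means, which behaves like k-means except that it opens a new cluster whenever a point is farther than a penalty λ (squared distance) from every existing centroid, so the number of topics need not be fixed in advance. The following is a minimal serial sketch with Euclidean distance in pure Python; the paper's variant (and the parallelized one used in the Telegram study) may differ, for instance in the distance measure, and the function `dp_means` and its parameters are hypothetical.

```python
def _sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def dp_means(points, lam, max_iter=100):
    """Serial DP-Means: returns (labels, centroids) for the given penalty `lam`."""
    centroids = [_mean(points)]  # start from a single global cluster
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        # Assignment step: nearest centroid, or a new cluster if all are too far.
        for i, p in enumerate(points):
            dists = [_sqdist(p, c) for c in centroids]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] > lam:
                centroids.append(list(p))
                best = len(centroids) - 1
            if labels[i] != best:
                labels[i] = best
                changed = True
        # Update step: recompute centroids and drop clusters that became empty.
        members = {}
        for i, k in enumerate(labels):
            members.setdefault(k, []).append(points[i])
        order = sorted(members)
        remap = {old: new for new, old in enumerate(order)}
        labels = [remap[k] for k in labels]
        centroids = [_mean(members[k]) for k in order]
        if not changed:
            break
    return labels, centroids
```

With two well-separated pairs of points and λ = 4, the sketch finds two clusters; with a very large λ it collapses everything into one. Larger penalties therefore yield fewer, broader topics.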