
119 research outputs found

Nonuniform Thickness and Weighted Distance

Author: Durumeric Oguz C.
Publication date: 01/01/2008

Nonuniform tubular neighborhoods of curves in Euclidean n-space are studied by using weighted distance functions and generalizing the normal exponential map. Different notions of injectivity radii are introduced to investigate singular but injective exponential maps. A generalization of the thickness formula is obtained for nonuniform thickness. All singularities within almost injectivity radius are classified by the Horizontal Collapsing Property. Examples are provided to show the distinction between the different types of injectivity radii, as well as to show that the standard differentiable injectivity radius fails to be upper semicontinuous on a singular set of weight functions.
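For context, the uniform result that this abstract presumably generalizes is the classical thickness formula of Litherland, Simon, Durumeric and Rawdon ("Thickness of knots", 1999). For a closed C^2 curve γ, the thickness (the injectivity radius of the normal exponential map) is the smaller of the minimum radius of curvature and half the doubly critical self-distance. This is stated here only as background; the weighted, nonuniform version in the paper is not reproduced.

```latex
\tau(\gamma) \;=\; \min\Big\{\, \min_{s} \frac{1}{\kappa(s)},\;\; \tfrac{1}{2}\,\operatorname{dcsd}(\gamma) \Big\},
\qquad
\operatorname{dcsd}(\gamma) \;=\; \min\big\{\, |\gamma(s)-\gamma(t)| \;:\; s \neq t,\;
(\gamma(s)-\gamma(t)) \perp \gamma'(s),\; (\gamma(s)-\gamma(t)) \perp \gamma'(t) \,\big\}
```

Here κ(s) is the curvature at γ(s) (with 1/κ taken as +∞ where κ = 0). The doubly critical self-distance is the shortest chord between distinct points of the curve that is normal to the curve at both of its endpoints.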

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

HLOC: Hints-Based Geolocation Leveraging Multiple Measurement Frameworks

Author: Bottger
Durumeric
Edmundson
Gasser
Snyder
Wang
Wong
Zhang
Publication date: 28/06/2017

Geographically locating an IP address is of interest for many purposes. There are two major ways to obtain the location of an IP address: querying commercial databases or conducting latency measurements. For structural Internet nodes, such as routers, commercial databases are limited by low accuracy, while current measurement-based approaches overwhelm users with setup overhead and scalability issues. In this work we present our system HLOC, which aims to combine the ease of database use with the accuracy of latency measurements. We evaluate HLOC on a comprehensive router data set of 1.4M IPv4 and 183k IPv6 routers. HLOC first extracts location hints from rDNS names and then conducts multi-tier latency measurements. Configuration complexity is minimized by using publicly available large-scale measurement frameworks such as RIPE Atlas. Using these measurements, we can confirm or disprove the location hints found in domain names. We publicly release HLOC's ready-to-use source code, enabling researchers to easily increase geolocation accuracy with minimum overhead.
Comment: As published in TMA'17 conference: http://tma.ifip.org/main-conference
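HLOC's own code and hint tables are not reproduced here. The sketch below only illustrates, under stated assumptions, the general idea of the two stages described above. First, the labels of an rDNS name are matched against a tiny hypothetical table of location codes. Second, a hint is rejected when a latency measurement from a probe at a known position makes it physically impossible. This is a speed-of-light feasibility check that assumes roughly 200 km per millisecond of one-way travel in fibre. `HINT_TABLE`, `extract_hints`, `great_circle_km` and `hint_feasible` are illustrative names; they are not part of HLOC or of the RIPE Atlas API.

```python
import math
import re

# Hypothetical hint table: codes that might appear in rDNS labels, mapped to
# approximate airport coordinates (lat, lon). A real system would use a far
# larger gazetteer (airport codes, city names, operator-specific codes).
HINT_TABLE = {
    "fra": (50.0379, 8.5622),     # Frankfurt
    "lhr": (51.4700, -0.4543),    # London Heathrow
    "sjc": (37.3639, -121.9289),  # San Jose
}

FIBRE_KM_PER_MS = 200.0  # assumption: signal speed in fibre ~2/3 of c
EARTH_RADIUS_KM = 6371.0


def extract_hints(rdns_name):
    """Return the known location codes among the dot/dash-separated labels."""
    tokens = re.split(r"[.\-]", rdns_name.lower())
    return [t for t in tokens if t in HINT_TABLE]


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def hint_feasible(hint_location, probe_location, rtt_ms):
    """A hint is disproved if the target would have to be farther from the
    probe than a signal can travel in half the measured round-trip time."""
    max_km = (rtt_ms / 2.0) * FIBRE_KM_PER_MS
    return great_circle_km(hint_location, probe_location) <= max_km
```

For example, `extract_hints("ae1.core1.fra.example.net")` yields `["fra"]`. A 2 ms RTT from a probe in San Jose would then disprove that hint, because Frankfurt is thousands of kilometres away.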

arXiv.org e-Print Archive

Crossref

Partial Mobilization: Tracking Multilingual Information Flows Amongst Russian Media Outlets and Telegram

Author: Durumeric Zakir
Hanley Hans W. A.
Publication date: 25/01/2023

In response to disinformation and propaganda from Russian online media following the Russian invasion of Ukraine, Russian outlets including Russia Today and Sputnik News were banned throughout Europe. To keep reaching their audiences, many of these outlets began to heavily promote their content on messaging services like Telegram. In this work, to understand this phenomenon, we study how 16 Russian media outlets interacted with and utilized 732 Telegram channels throughout 2022. To do this, we utilize a multilingual version of the foundational model MPNet to embed articles and Telegram messages in a shared embedding space and semantically compare content. Leveraging a parallelized version of DP-Means clustering, we perform paragraph-level topic/narrative extraction and time-series analysis with Hawkes Processes. With this approach, we find that between 2.3% (ura.news) and 26.7% (ukraina.ru) of these websites' content originated from or resulted from activity on Telegram. Finally, tracking the spread of individual narratives, we measure the rate at which these websites and channels disseminate content within the Russian media ecosystem.
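The abstract relies on a parallelized DP-Means variant whose details are not given here. As a point of reference only, the following is a minimal sketch of the standard serial DP-Means algorithm (Kulis and Jordan, 2012). It works like k-means, except that a point whose squared distance to every current centroid exceeds the penalty `lam` opens a new cluster, so the number of clusters is not fixed in advance. In the paper's setting the points would be MPNet paragraph embeddings. Here they are plain Python lists, and the function names are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(coord) / n for coord in zip(*points)]


def dp_means(points, lam, max_iter=100):
    """Serial DP-Means: returns (centroids, labels)."""
    centroids = [mean(points)]  # start with one cluster holding everything
    labels = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        # Assignment step: nearest centroid, or a new cluster if too far.
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centroids]
            best = min(range(len(centroids)), key=dists.__getitem__)
            if dists[best] > lam:
                centroids.append(list(p))
                best = len(centroids) - 1
            if labels[i] != best:
                labels[i] = best
                changed = True
        # Update step: recompute centroids and drop clusters left empty.
        new_centroids, remap = [], {}
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                remap[k] = len(new_centroids)
                new_centroids.append(mean(members))
        labels = [remap[lab] for lab in labels]
        centroids = new_centroids
        if not changed:
            break
    return centroids, labels
```

A larger `lam` yields fewer, coarser topics, and a smaller one splits narratives more finely. Choosing this threshold is the main tuning decision in place of k-means' fixed `k`.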

arXiv.org e-Print Archive

Watch Your Language: Large Language Models and Content Moderation

Author: AbuHashem Yousef
Durumeric Zakir
Kumar Deepak
Publication date: 25/09/2023

Large language models (LLMs) have exploded in popularity due to their ability to perform a wide array of natural language tasks. Text-based content moderation is one LLM use case that has received recent enthusiasm; however, there is little research investigating how LLMs perform in content moderation settings. In this work, we evaluate a suite of modern, commercial LLMs (GPT-3, GPT-3.5, GPT-4) on two common content moderation tasks: rule-based community moderation and toxic content detection. For rule-based community moderation, we construct 95 LLM moderation engines prompted with rules from 95 Reddit subcommunities and find that LLMs can be effective at rule-based moderation for many communities, achieving a median accuracy of 64% and a median precision of 83%. For toxicity detection, we find that LLMs significantly outperform existing commercially available toxicity classifiers. However, we also find that recent increases in model size add only marginal benefit to toxicity detection, suggesting a potential performance plateau for LLMs on toxicity detection tasks. We conclude by outlining avenues for future work in studying LLMs and content moderation.
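The headline figures are medians, presumably taken over the per-community results of the 95 moderation engines. The sketch below shows that kind of aggregation under these assumptions. Each community's data is a pair of boolean lists, predicted "violates a rule" versus the ground-truth moderator decision. Precision is skipped for communities where nothing was flagged, since it is undefined there. The community names and values in the usage note are made up for illustration and are not the paper's data.

```python
from statistics import median


def accuracy(preds, gold):
    """Fraction of items where the prediction matches the ground truth."""
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


def precision(preds, gold):
    """Of the items flagged as violations, the fraction that truly were.
    Returns None when nothing was flagged (precision is undefined)."""
    flagged = [g for p, g in zip(preds, gold) if p]
    if not flagged:
        return None
    return sum(flagged) / len(flagged)


def summarize(per_community):
    """Median accuracy and median precision across communities.
    per_community maps a community name to (preds, gold)."""
    accs = [accuracy(p, g) for p, g in per_community.values()]
    precs = [x for x in (precision(p, g) for p, g in per_community.values())
             if x is not None]
    return median(accs), median(precs)
```

With hypothetical communities `{"a": ([True, False, True, True], [True, False, False, True]), "b": ([True, True], [True, True]), "c": ([False, False], [True, False])}`, `summarize` returns a median accuracy of 0.75 and a median precision of about 0.833. Taking medians rather than means keeps a few very easy or very hard communities from dominating the summary.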

arXiv.org e-Print Archive

Beyond Counting: New Perspectives on the Active IPv4 Address Space

Author: Adrian D.
Antonakakis M.
Durumeric Z.
Hao S.
Katz-Bassett E.
Moura G. C. M.
Wong B.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 09/09/2016

In this study, we report on techniques and analyses that enable us to capture Internet-wide activity at individual IP address-level granularity by relying on server logs of a large commercial content delivery network (CDN) that serves close to 3 trillion HTTP requests on a daily basis. Across the whole of 2015, these logs recorded client activity involving 1.2 billion unique IPv4 addresses, the highest ever measured, in agreement with recent estimates. Monthly client IPv4 address counts had grown steadily for years, but since 2014 the IPv4 count has stagnated while IPv6 counts have grown. Thus, it seems we have entered an era marked by increased complexity, one in which simply enumerating active IPv4 addresses is of little use for characterizing recent growth of the Internet as a whole. With this observation in mind, we consider new points of view in the study of global IPv4 address activity. First, our analysis shows significant churn in active IPv4 addresses: the set of active IPv4 addresses varies by as much as 25% over the course of a year. Second, by looking across the active addresses in a prefix, we are able to identify and attribute activity patterns to network restructurings, user behaviors, and, in particular, various address assignment practices. Third, by combining spatio-temporal measures of address utilization with measures of traffic volume, and sampling-based estimates of relative host counts, we present novel perspectives on worldwide IPv4 address activity, including empirical observation of under-utilization in some areas, and complete utilization, or exhaustion, in others.
Comment: in Proceedings of ACM IMC 201
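To make "churn" and per-prefix activity concrete, here is a minimal sketch under explicit assumptions. Activity is given as a set of IPv4 address strings per observation period. Churn between two snapshots is taken as the size of their symmetric difference divided by the size of their union, which is one plausible definition and not necessarily the paper's. Prefix utilization is the share of each /24's 256 addresses seen active. It uses only Python's standard `ipaddress` module, and the function names and example addresses (from documentation ranges) are illustrative.

```python
import ipaddress
from collections import Counter


def churn(before, after):
    """Fraction of addresses active in either snapshot but not in both.
    (Assumed definition: |before XOR after| / |before UNION after|.)"""
    union = before | after
    if not union:
        return 0.0
    return len(before ^ after) / len(union)


def slash24_utilization(active):
    """Map each /24 prefix to the share of its 256 addresses that are active."""
    counts = Counter(
        ipaddress.ip_network(f"{addr}/24", strict=False) for addr in active
    )
    return {str(net): n / 256 for net, n in counts.items()}
```

For example, with `jan = {"192.0.2.1", "192.0.2.2", "198.51.100.7"}` and `dec = {"192.0.2.1", "198.51.100.8"}`, three of the four distinct addresses appear in only one snapshot, so `churn(jan, dec)` is 0.75. Fully utilized /24s (a value of 1.0) would correspond to the "complete utilization" regime mentioned above.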

arXiv.org e-Print Archive

Crossref

Twits, Toxic Tweets, and Tribal Tendencies: Trends in Politically Polarized Posts on Twitter

Author: Durumeric Zakir
Hanley Hans W. A.
Publication date: 19/07/2023

Social media platforms are often blamed for exacerbating political polarization and worsening public dialogue. Many claim hyperpartisan users post pernicious content, slanted to their political views, inciting contentious and toxic conversations. However, what factors actually contribute to increased online toxicity and negative interactions? In this work, we explore the role that political ideology plays in contributing to toxicity both on an individual user level and a topic level on Twitter. To do this, we train and open-source a DeBERTa-based toxicity detector with a contrastive objective that outperforms the Google Jigsaw Perspective toxicity detector on the Civil Comments test dataset. Then, after collecting 187 million tweets from 55,415 Twitter users, we determine how several account-level characteristics, including political ideology and account age, predict how often each user posts toxic content. Running a linear regression, we find that the diversity of views and the toxicity of the other accounts with which that user engages have a more marked effect on their own toxicity. Namely, toxic comments are correlated with users who engage with a wider array of political views. Performing topic analysis on the toxic content posted by these accounts using the large language model MPNet and a version of the DP-Means clustering algorithm, we find similar behavior across 6,592 individual topics, with conversations on each topic becoming more toxic as a wider diversity of users become involved.
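The regression step can be illustrated with a plain ordinary-least-squares fit. This is a hedged sketch. The paper's actual predictors, toxicity measure, and regression specification are not given here, so the feature columns in the usage note (for instance a per-user "diversity of engaged views" score) and the toy data are hypothetical. The solver forms the normal equations and solves them by Gauss-Jordan elimination with partial pivoting. That keeps the example dependency-free, but a numerical library would normally be preferred on real data.

```python
def ols(X, y):
    """Ordinary least squares with an intercept.
    X: list of feature rows, y: list of targets.
    Solves (A^T A) b = A^T y, where A is X with a leading column of ones.
    Returns [intercept, b1, b2, ...]."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest entry in this column up.
        pivot = max(range(col, k), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every other row (Gauss-Jordan).
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]
```

For instance, with hypothetical rows `[view_diversity, partner_toxicity]` and a toxicity rate generated exactly as `0.1 + 0.5 * x1 + 0.2 * x2`, `ols` recovers the coefficients `[0.1, 0.5, 0.2]` up to floating-point error. Comparing the magnitudes of standardized coefficients is one common way to judge which characteristic has "a more marked effect".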

arXiv.org e-Print Archive