12,585 research outputs found

    Profiling user activities with minimal traffic traces

    Understanding user behavior is essential to personalize and enrich a user's online experience. While there are significant benefits to be accrued from the pursuit of personalized services based on fine-grained behavioral analysis, care must be taken to address user privacy concerns. In this paper, we consider the use of web traces with truncated URLs - each URL is trimmed to contain only the web domain - for this purpose. While such truncation removes fine-grained sensitive information, it also strips the data of many features that are crucial to the profiling of user activity. We show how to overcome the severe handicap of missing features and still filter, with high accuracy, the URLs that represent a user activity from the noisy network traffic trace (including advertisements, spam, analytics, and web scripts). This activity profiling with truncated URLs enables network operators to provide personalized services while mitigating privacy concerns by storing and sharing only truncated traffic traces. To offset the accuracy loss due to truncation, our statistical methodology leverages specialized features extracted from a group of consecutive URLs that represent a micro user action such as a web click or chat reply, which we call a burst. These bursts, in turn, are detected by a novel algorithm based on the observed characteristics of the inter-arrival time of HTTP records. We present an extensive experimental evaluation on a real dataset of mobile web traces, consisting of more than 130 million records, representing the browsing activities of 10,000 users over a period of 30 days. Our results show that the proposed methodology achieves around 90% accuracy in segregating URLs representing user activities from non-representative URLs.
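    The core of the burst-detection step can be sketched as follows: consecutive HTTP records whose inter-arrival times fall below a gap threshold are grouped into one burst. This is a minimal illustration, not the paper's algorithm - the 1-second default gap is a hypothetical value, whereas the paper derives its detection rule from observed inter-arrival statistics.

    ```python
    def detect_bursts(records, gap=1.0):
        """Group HTTP records into bursts.

        `records` is a list of (timestamp_seconds, domain) tuples sorted
        by time. Consecutive records closer together than `gap` seconds
        belong to the same burst; a larger inter-arrival time starts a
        new burst. The gap value here is purely illustrative.
        """
        bursts, current = [], []
        for rec in records:
            if current and rec[0] - current[-1][0] > gap:
                bursts.append(current)
                current = []
            current.append(rec)
        if current:
            bursts.append(current)
        return bursts
    ```

    On a trace such as `[(0.0, "a.com"), (0.2, "b.com"), (5.0, "c.com")]`, the first two records form one burst (a single micro action) and the third, arriving after a long pause, opens a second.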

    Cyber Security

    This open access book constitutes the refereed proceedings of the 17th International Annual Conference on Cyber Security, CNCERT 2021, held in Beijing, China, in July 2021. The 14 papers presented were carefully reviewed and selected from 51 submissions. The papers are organized according to the following topical sections: data security; privacy protection; anomaly detection; traffic analysis; social network security; vulnerability detection; text classification.

    From Social Data Mining to Forecasting Socio-Economic Crisis

    Socio-economic data mining has great potential for gaining a better understanding of problems that our economy and society are facing, such as financial instability, shortages of resources, or conflicts. Without large-scale data mining, progress in these areas seems hard or impossible. Therefore, a suitable, distributed data mining infrastructure and research centers should be built in Europe. It also appears appropriate to build a network of Crisis Observatories. These can be imagined as laboratories devoted to the gathering and processing of enormous volumes of data on both natural systems, such as the Earth and its ecosystem, and human techno-socio-economic systems, so as to gain early warnings of impending events. Reality mining provides the chance to adapt more quickly and more accurately to changing situations. Further opportunities arise from individually customized services, which however should be provided in a privacy-respecting way. This requires the development of novel ICT (such as a self-organizing Web), but most likely new legal regulations and suitable institutions as well. As long as such regulations are lacking on a world-wide scale, it is in the public interest that scientists explore what can be done with the huge data available. Big data do have the potential to change or even threaten democratic societies. The same applies to sudden and large-scale failures of ICT systems. Therefore, dealing with data must be done with a large degree of responsibility and care. Self-interests of individuals, companies or institutions have limits where the public interest is affected, and public interest is not a sufficient justification to violate human rights of individuals. Privacy is a high good, as confidentiality is, and damaging it would have serious side effects for society. Comment: 65 pages, 1 figure, Visioneer White Paper, see http://www.visioneer.ethz.c

    Measuring Infringement of Intellectual Property Rights

    © Crown Copyright 2014. You may re-use this information (excluding logos) free of charge in any format or medium, under the terms of the Open Government Licence. To view this licence, visit http://www.nationalarchives.gov.uk/doc/open-government-licence/ Where we have identified any third party copyright information you will need to obtain permission from the copyright holders concerned. The review is wide-ranging in scope and overall our findings evidence a lack of appreciation, among those producing research, of the high-level principles of measurement and assessment of scale. To date, the approaches adopted by industry seem designed more for internal consumption and are usually contingent on particular technologies and/or sector perspectives. Typically, there is a lack of transparency in the methodologies and data used to form the basis of claims, making much of this an unreliable basis for policy formulation. The research approaches we found are characterised by a number of features that can be summarised as a preference for reactive approaches that look to establish snapshots of an important issue at the time of investigation. Most studies are ad hoc in nature and, on the whole, we found a lack of sustained longitudinal approaches that would develop an appreciation of change. Typically the studies are designed to address specific hypotheses that might serve to support the position of the particular commissioning body. To help bring some structure to this area, we propose a framework for the assessment of the volume of infringement in each different area. The underlying aim is to draw out a common approach wherever possible in each area, rather than being drawn initially to the differences in each field. We advocate on-going survey tracking of the attitudes, perceptions and, where practical, behaviours of both perpetrators and claimants in IP infringement.
Clearly, the nature of perpetrators, claimants and enforcement differs within each IPR, but in our view the assessment for each IPR should include all of these elements. It is important to clarify that the key element of the survey structure is the adoption of a survey sampling methodology with smaller volumes of representative participation. Once selection is given the appropriate priority, a traditional offline survey will have a part to play, but as the opportunity arises, new technological methodologies, particularly for the voluntary monitoring of online behaviour, can add additional detail to the overall assessment of the scale of activity. This framework can be applied within each of the IP right sectors: copyright, trademarks, patents, and design rights. It may well be that the costs involved with this common approach could be mitigated by a syndicated approach to the survey elements. Indeed, a syndicated approach has a number of advantages in addition to cost. It could be designed to reduce any tendency either to hide inappropriate/illegal activity or, alternatively, to exaggerate its volume to fit with the theme of the survey. It also has the scope to allow for monthly assessments of attitudes rather than being vulnerable to unmeasured seasonal impacts.

    Spatial and Temporal Sentiment Analysis of Twitter data

    The public worldwide have used Twitter to express opinions. This study focuses on the spatio-temporal variation of georeferenced Tweets' sentiment polarity, with a view to understanding how opinions evolve on Twitter over space and time and across communities of users. More specifically, the question this study tested is whether sentiment polarity on Twitter exhibits specific time-location patterns. The aim of the study is to investigate the spatial and temporal distribution of georeferenced Twitter sentiment polarity within a 1 km buffer around the Curtin Bentley campus boundary in Perth, Western Australia. Tweets posted on campus were assigned to six spatial zones and four time zones. A sentiment analysis was then conducted for each zone using the sentiment analyser tool in the Starlight Visual Information System software. The Feature Manipulation Engine was employed to convert non-spatial files into spatial and temporal feature classes. The spatial and temporal distribution of Twitter sentiment polarity patterns over space and time was mapped using Geographic Information Systems (GIS). Some interesting results were identified. For example, the highest percentage of positive Tweets occurred in the social science area, while the science and engineering and dormitory areas had the highest percentage of negative postings. The number of negative Tweets increases in the library and science and engineering areas as the end of the semester approaches, reaching a peak around the exam period, while the percentage of negative Tweets drops at the end of the semester in the entertainment and sport and dormitory areas. This study will provide some insights into understanding students' and staff's sentiment variation on Twitter, which could be useful for university teaching and learning management.
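    The aggregation step described above - assigning classified Tweets to spatial and time zones and tallying polarity percentages per cell - can be sketched as below. All field and zone names are hypothetical; the study itself used the Starlight sentiment analyser and the Feature Manipulation Engine rather than this code.

    ```python
    from collections import Counter, defaultdict

    def polarity_by_zone(tweets):
        """Tally sentiment polarity per (spatial zone, time zone) cell.

        `tweets` is an iterable of dicts with 'zone', 'time_zone', and
        'polarity' keys (polarity in 'positive'/'negative'/'neutral').
        Returns, for each cell, the percentage of each polarity label.
        """
        cells = defaultdict(Counter)
        for t in tweets:
            cells[(t["zone"], t["time_zone"])][t["polarity"]] += 1
        # Convert raw counts to percentages within each cell.
        return {
            cell: {pol: 100.0 * n / sum(c.values()) for pol, n in c.items()}
            for cell, c in cells.items()
        }
    ```

    With such per-cell percentages in hand, patterns like "negative Tweets peak in the library zone around exams" become straightforward to map in a GIS.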

    European Handbook of Crowdsourced Geographic Information

    This book focuses on the study of the remarkable new source of geographic information that has become available in the form of user-generated content accessible over the Internet through mobile and Web applications. The exploitation, integration and application of these sources, termed volunteered geographic information (VGI) or crowdsourced geographic information (CGI), offer scientists an unprecedented opportunity to conduct research on a variety of topics at multiple scales and for diversified objectives. The Handbook is organized in five parts, addressing the fundamental questions: What motivates citizens to provide such information in the public domain, and what factors govern/predict its validity? What methods might be used to validate such information? Can VGI be framed within the larger domain of sensor networks, in which inert and static sensors are replaced or combined with intelligent and mobile humans equipped with sensing devices? What limitations are imposed on VGI by differential access to broadband Internet, mobile phones, and other communication technologies, and by concerns over privacy? How do VGI and crowdsourcing enable innovative applications to benefit human society? Chapters examine how crowdsourcing techniques and methods, and the VGI phenomenon, have motivated a multidisciplinary research community to identify both fields of application and quality criteria depending on the use of VGI. Besides harvesting tools and storage of these data, research has paid remarkable attention to these information resources, in an age when information and participation are among the most important drivers of development. The collection opens questions and points to new research directions in addition to the findings that each of the authors demonstrates. Despite rapid progress in VGI research, this Handbook also shows that there are technical, social, political and methodological challenges that require further study and research.

    Web User-session Inference by Means of Clustering Techniques

    This paper focuses on the definition and identification of “Web user-sessions”, aggregations of several TCP connections generated by the same source host. The identification of a user-session is non-trivial. Traditional approaches rely on threshold-based mechanisms. However, these techniques are very sensitive to the value chosen for the threshold, which may be difficult to set correctly. By applying clustering techniques, we define a novel methodology to identify Web user-sessions without requiring an a priori definition of threshold values. We define a clustering-based approach, discuss its pros and cons, and apply it to real traffic traces. The proposed methodology is applied to artificially generated traces to evaluate its benefits against traditional threshold-based approaches. We also analyze the characteristics of user-sessions extracted by the clustering methodology from real traces and study their statistical properties. Web user-sessions tend to be Poisson, but correlation may arise during periods of anomalous network/host behavior.
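    The contrast between the two approaches can be sketched as follows: a fixed-threshold baseline splits the stream of TCP connection start times wherever the gap exceeds a hand-picked value, while an adaptive variant derives the split gap from the data itself. The adaptive rule below (mean plus one standard deviation of the inter-arrival times) is a simplified stand-in for illustration only, not the clustering algorithm proposed in the paper.

    ```python
    from statistics import mean, stdev

    def sessions_fixed_threshold(starts, threshold):
        """Threshold baseline: a gap longer than `threshold` seconds
        between consecutive TCP connection start times opens a new
        user-session. `starts` is a sorted list of start times."""
        sessions, current = [], [starts[0]]
        for prev, cur in zip(starts, starts[1:]):
            if cur - prev > threshold:
                sessions.append(current)
                current = []
            current.append(cur)
        sessions.append(current)
        return sessions

    def sessions_adaptive(starts):
        """Adaptive variant: derive the split gap from the trace itself
        (mean + stdev of inter-arrival times) instead of a fixed value.
        A simplified stand-in for the paper's clustering approach."""
        gaps = [b - a for a, b in zip(starts, starts[1:])]
        return sessions_fixed_threshold(starts, mean(gaps) + stdev(gaps))
    ```

    The appeal of the adaptive route is exactly the paper's motivation: the fixed baseline's output changes sharply with the chosen threshold, while the data-driven rule needs no a priori value.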
