Profiling user activities with minimal traffic traces
Understanding user behavior is essential to personalize and enrich a user's
online experience. While there are significant benefits to be accrued from the
pursuit of personalized services based on a fine-grained behavioral analysis,
care must be taken to address user privacy concerns. In this paper, we consider
the use of web traces with truncated URLs - each URL is trimmed to only contain
the web domain - for this purpose. While such truncation removes the
fine-grained sensitive information, it also strips the data of many features
that are crucial to the profiling of user activity. We show how to overcome the
severe handicap of lack of crucial features for the purpose of filtering out
the URLs representing a user activity from the noisy network traffic trace
(including advertisement, spam, analytics, webscripts) with high accuracy. This
activity profiling with truncated URLs enables the network operators to provide
personalized services while mitigating privacy concerns by storing and sharing
only truncated traffic traces.
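
As an illustration of the truncation step described above, the following is a minimal sketch that reduces a URL to its web domain. The function name `truncate_to_domain` and the example URLs are assumptions made here for illustration; the abstract does not say how the authors implement truncation.

```python
import re
from urllib.parse import urlsplit

# Matches a leading scheme such as "http://" or "https://".
_SCHEME = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def truncate_to_domain(url: str) -> str:
    """Return only the host of a URL, dropping path, query and fragment.

    Hypothetical helper: it illustrates the kind of truncation the
    abstract describes, not the authors' actual preprocessing.
    """
    # urlsplit only recognises the host when a scheme or a leading "//"
    # is present, so bare strings like "example.com/path" get a prefix.
    if not _SCHEME.match(url) and not url.startswith("//"):
        url = "//" + url
    host = urlsplit(url).hostname  # lower-cased, without any port
    return host or ""
```

After truncation, a record such as `http://shop.example.com/cart?item=42` retains only `shop.example.com`, which is why path-based and query-based features are no longer available to the classifier.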
In order to offset the accuracy loss due to truncation, our statistical
methodology leverages specialized features extracted from a group of
consecutive URLs that represent a micro user action like web click, chat reply,
etc., which we call bursts. These bursts, in turn, are detected by a novel
algorithm which is based on our observed characteristics of the inter-arrival
time of HTTP records. We present an extensive experimental evaluation on a real
dataset of mobile web traces, consisting of more than 130 million records,
representing the browsing activities of 10,000 users over a period of 30 days.
Our results show that the proposed methodology achieves around 90% accuracy in
segregating URLs representing user activities from non-representative URLs
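
The idea of grouping consecutive records into bursts by their inter-arrival time can be sketched as follows. This is a simplified stand-in that uses a fixed gap threshold; the abstract does not give the details of the authors' burst-detection algorithm, and the name `split_into_bursts` and the `max_gap` parameter are assumptions introduced here.

```python
from typing import List, Sequence


def split_into_bursts(timestamps: Sequence[float],
                      max_gap: float = 1.0) -> List[List[float]]:
    """Group record timestamps (in seconds) into bursts.

    A new burst starts whenever the gap to the previous record exceeds
    max_gap. The fixed threshold is an assumption for illustration only.
    """
    bursts: List[List[float]] = []
    for t in sorted(timestamps):
        if bursts and t - bursts[-1][-1] <= max_gap:
            bursts[-1].append(t)   # close enough: same burst
        else:
            bursts.append([t])     # gap too large: open a new burst
    return bursts
```

Features such as burst length or the number of distinct domains per burst could then be computed per group, although which features the authors actually use is not stated in the abstract.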
Cyber Security
This open access book constitutes the refereed proceedings of the 17th International Annual Conference on Cyber Security, CNCERT 2021, held in Beijing, China, in July 2021. The 14 papers presented were carefully reviewed and selected from 51 submissions. The papers are organized according to the following topical sections: data security; privacy protection; anomaly detection; traffic analysis; social network security; vulnerability detection; text classification
Stacking-based visualization of trajectory attribute data
Visualizing trajectory attribute data is challenging because it involves showing the trajectories in their spatio-temporal context as well as the attribute values associated with the individual points of trajectories. Previous work on trajectory visualization addresses selected aspects of this problem, but not all of them. We present a novel approach to visualizing trajectory attribute data. Our solution covers space, time, and attribute values. Based on an analysis of relevant visualization tasks, we designed the visualization solution around the principle of stacking trajectory bands. The core of our approach is a hybrid 2D/3D display. A 2D map serves as a reference for the spatial context, and the trajectories are visualized as stacked 3D trajectory bands along which attribute values are encoded by color. Time is integrated through appropriate ordering of bands and through a dynamic query mechanism that feeds temporally aggregated information to a circular time display. An additional 2D time graph shows temporal information in full detail by stacking 2D trajectory bands. Our solution is equipped with analytical and interactive mechanisms for selecting and ordering of trajectories, and adjusting the color mapping, as well as coordinated highlighting and dedicated 3D navigation. We demonstrate the usefulness of our novel visualization by three examples related to radiation surveillance, traffic analysis, and maritime navigation. User feedback obtained in a small experiment indicates that our hybrid 2D/3D solution can be operated quite well
From Social Data Mining to Forecasting Socio-Economic Crisis
Socio-economic data mining has a great potential in terms of gaining a better
understanding of problems that our economy and society are facing, such as
financial instability, shortages of resources, or conflicts. Without
large-scale data mining, progress in these areas seems hard or impossible.
Therefore, a suitable, distributed data mining infrastructure and research
centers should be built in Europe. It also appears appropriate to build a
network of Crisis Observatories. They can be imagined as laboratories devoted
to the gathering and processing of enormous volumes of data on both natural
systems such as the Earth and its ecosystem, as well as on human
techno-socio-economic systems, so as to gain early warnings of impending
events. Reality mining provides the chance to adapt more quickly and more
accurately to changing situations. Further opportunities arise by individually
customized services, which however should be provided in a privacy-respecting
way. This requires the development of novel ICT (such as a self-organizing
Web), but most likely new legal regulations and suitable institutions as well.
As long as such regulations are lacking on a world-wide scale, it is in the
public interest that scientists explore what can be done with the huge data
available. Big data do have the potential to change or even threaten democratic
societies. The same applies to sudden and large-scale failures of ICT systems.
Therefore, dealing with data must be done with a large degree of responsibility
and care. Self-interests of individuals, companies or institutions have limits,
where the public interest is affected, and public interest is not a sufficient
justification to violate human rights of individuals. Privacy is a high good,
as confidentiality is, and damaging it would have serious side effects for
society. Comment: 65 pages, 1 figure, Visioneer White Paper, see
http://www.visioneer.ethz.c
Measuring Infringement of Intellectual Property Rights
© Crown Copyright 2014. You may re-use this information (excluding logos) free of charge in any format or medium, under the terms of the Open Government Licence. To view this licence, visit http://www.nationalarchives.gov.uk/doc/open-government-licence/ Where we have identified any third party copyright information you will need to obtain permission from the copyright holders concerned. The review is wide-ranging in scope and overall our findings evidence a lack of appreciation among those producing research for the high-level principles of measurement and assessment of scale. To date, the approaches adopted by industry seem more designed for internal consumption and are usually contingent on particular technologies and/or sector perspectives. Typically, there is a lack of transparency in the methodologies and data used to form the basis of claims, making much of this an unreliable basis for policy formulation. The research approaches we found are characterised by a number of features that can be summarised as a preference for reactive approaches that look to establish snapshots of an important issue at the time of investigation. Most studies are ad hoc in nature and on the whole we found a lack of sustained longitudinal approaches that would develop the appreciation of change. Typically the studies are designed to address specific hypotheses that might serve to support the position of the particular commissioning body. To help bring some structure to this area, we propose a framework for the assessment of the volume of infringement in each different area. The underlying aim is to draw out a common approach wherever possible in each area, rather than being drawn initially to the differences in each field. We advocate on-going survey tracking of the attitudes, perceptions and, where practical, behaviours of both perpetrators and claimants in IP infringement.
Clearly, the nature of perpetrators, claimants and enforcement differs within each IPR but in our view the assessment for each IPR should include all of these elements. It is important to clarify that the key element of the survey structure is the adoption of a survey sampling methodology and smaller volumes of representative participation. Once selection is given the appropriate priority, a traditional offline survey will have a part to play, but as the opportunity arises, new technological methodologies, particularly for the voluntary monitoring of online behaviour, can add additional detail to the overall assessment of the scale of activity. This framework can be applied within each of the IP right sectors: copyright, trademarks, patents, and design rights. It may well be that the costs involved with this common approach could be mitigated by a syndicated approach to the survey elements. Indeed, a syndicated approach has a number of advantages in addition to cost. It could be designed to reduce any tendency either to hide inappropriate/illegal activity or alternatively exaggerate its volume to fit with the theme of the survey. It also has the scope to allow for monthly assessments of attitudes rather than being vulnerable to unmeasured seasonal impacts
Spatial and Temporal Sentiment Analysis of Twitter data
The public have used Twitter worldwide for expressing opinions. This study focuses on spatio-temporal variation of georeferenced Tweets' sentiment polarity, with a view to understanding how opinions evolve on Twitter over space and time and across communities of users. More specifically, the question this study tested is whether sentiment polarity on Twitter exhibits specific time-location patterns. The aim of the study is to investigate the spatial and temporal distribution of georeferenced Twitter sentiment polarity within the area of 1 km buffer around the Curtin Bentley campus boundary in Perth, Western Australia. Tweets posted on campus were assigned to six spatial zones and four time zones. A sentiment analysis was then conducted for each zone using the sentiment analyser tool in the Starlight Visual Information System software. The Feature Manipulation Engine was employed to convert non-spatial files into spatial and temporal feature classes. The spatial and temporal distribution of Twitter sentiment polarity patterns was mapped using Geographic Information Systems (GIS). Some interesting results were identified. For example, the highest percentage of positive Tweets occurred in the social science area, while science and engineering and dormitory areas had the highest percentage of negative postings. The number of negative Tweets increases in the library and science and engineering areas as the end of the semester approaches, reaching a peak around an exam period, while the percentage of negative Tweets drops at the end of the semester in the entertainment and sport and dormitory area. This study will provide some insights into understanding students' and staff's sentiment variation on Twitter, which could be useful for university teaching and learning management
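
The per-zone percentages reported above amount to an aggregation over (spatial zone, time zone) pairs. The sketch below shows one way such shares could be computed once each Tweet already carries a polarity label. The function name `negative_share`, the tuple layout and the label strings are assumptions for illustration; the study itself used Starlight, the Feature Manipulation Engine and GIS tooling rather than code like this.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Tweet = Tuple[str, str, str]  # (spatial zone, time zone, polarity label)


def negative_share(tweets: Iterable[Tweet]) -> Dict[Tuple[str, str], float]:
    """Return the fraction of negative Tweets per (zone, period) pair.

    Hypothetical helper: polarity labels are assumed to be precomputed
    strings such as "positive", "negative" or "neutral".
    """
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    negatives: Dict[Tuple[str, str], int] = defaultdict(int)
    for zone, period, polarity in tweets:
        key = (zone, period)
        totals[key] += 1
        if polarity == "negative":
            negatives[key] += 1
    # Every key in totals has at least one Tweet, so no division by zero.
    return {key: negatives[key] / totals[key] for key in totals}
```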
European Handbook of Crowdsourced Geographic Information
This book focuses on the study of the remarkable new source of geographic information that has become available in the form of user-generated content accessible over the Internet through mobile and Web applications. The exploitation, integration and application of these sources, termed volunteered geographic information (VGI) or crowdsourced geographic information (CGI), offer scientists an unprecedented opportunity to conduct research on a variety of topics at multiple scales and for diversified objectives.
The Handbook is organized in five parts, addressing the fundamental questions: What motivates citizens to provide such information in the public domain, and what factors govern/predict its validity? What methods might be used to validate such information? Can VGI be framed within the larger domain of sensor networks, in which inert and static sensors are replaced or combined by intelligent and mobile humans equipped with sensing devices? What limitations are imposed on VGI by differential access to broadband Internet, mobile phones, and other communication technologies, and by concerns over privacy? How do VGI and crowdsourcing enable innovation applications to benefit human society?
Chapters examine how crowdsourcing techniques and methods, and the VGI phenomenon, have motivated a multidisciplinary research community to identify both fields of applications and quality criteria depending on the use of VGI. Besides harvesting tools and storage of these data, research has paid remarkable attention to these information resources, in an age when information and participation is one of the most important drivers of development.
The collection opens questions and points to new research directions in addition to the findings that each of the authors demonstrates. Despite rapid progress in VGI research, this Handbook also shows that there are technical, social, political and methodological challenges that require further studies and research
Web User-session Inference by Means of Clustering Techniques
This paper focuses on the definition and identification
of “Web user-sessions”, aggregations of several TCP
connections generated by the same source host. The identification
of a user-session is non-trivial. Traditional approaches rely on
threshold-based mechanisms. However, these techniques are very
sensitive to the value chosen for the threshold, which may be
difficult to set correctly. By applying clustering techniques, we
define a novel methodology to identify Web user-sessions without
requiring an a priori definition of threshold values. We define
a clustering-based approach, discuss its pros and cons,
and apply it to real traffic traces. The proposed
methodology is applied to artificially generated traces to evaluate
its benefits against traditional threshold-based approaches. We
also analyze the characteristics of user-sessions extracted by the
clustering methodology from real traces and study their statistical
properties. Web user-sessions tend to be Poisson, but correlation
may arise during periods of network/hosts anomalous behavior
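
To make the contrast with threshold-based mechanisms concrete, the sketch below derives the session boundary from the data instead of fixing it in advance: it clusters the logarithms of the gaps between connection start times into two groups with a one-dimensional 2-means procedure. This is only an illustration of the general idea. The abstract does not specify which clustering technique the authors use, and the names `two_means_split` and `sessions_by_clustering` are assumptions introduced here.

```python
import math
from typing import List, Sequence


def two_means_split(values: Sequence[float], iterations: int = 50) -> float:
    """Run 1-D 2-means and return the midpoint between the two centroids."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        boundary = (lo + hi) / 2
        small = [v for v in values if v <= boundary]
        large = [v for v in values if v > boundary]
        if not small or not large:
            break  # all values fell into a single group
        new_lo = sum(small) / len(small)
        new_hi = sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def sessions_by_clustering(starts: Sequence[float]) -> List[List[float]]:
    """Split connection start times (seconds) of one host into sessions."""
    times = sorted(starts)
    if len(times) < 3:
        return [list(times)] if times else []
    # Log-scale the gaps so short and long gaps separate more cleanly.
    gaps = [math.log(b - a + 1e-9) for a, b in zip(times, times[1:])]
    boundary = two_means_split(gaps)
    sessions = [[times[0]]]
    for t, gap in zip(times[1:], gaps):
        if gap > boundary:
            sessions.append([t])      # long gap: a new session begins
        else:
            sessions[-1].append(t)    # short gap: same session
    return sessions
```

One limitation of this simplified version is that it splits the gaps into two groups whenever they are not all equal, even if the trace really contains a single session, so it should be read as a sketch of the principle and not as a replacement for the methodology evaluated in the paper.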