4,400 research outputs found
Is Content Publishing in BitTorrent Altruistic or Profit-Driven
BitTorrent is the most popular P2P content delivery application where
individual users share various type of content with tens of thousands of other
users. The growing popularity of BitTorrent is primarily due to the
availability of valuable content without any cost for the consumers. However,
apart from required resources, publishing (sharing) valuable (and often
copyrighted) content has serious legal implications for user who publish the
material (or publishers). This raises a question that whether (at least major)
content publishers behave in an altruistic fashion or have other incentives
such as financial. In this study, we identify the content publishers of more
than 55k torrents in 2 major BitTorrent portals and examine their behavior. We
demonstrate that a small fraction of publishers are responsible for 66% of
published content and 75% of the downloads. Our investigations reveal that
these major publishers respond to two different profiles. On one hand,
antipiracy agencies and malicious publishers publish a large amount of fake
files to protect copyrighted content and spread malware respectively. On the
other hand, content publishing in BitTorrent is largely driven by companies
with financial incentive. Therefore, if these companies lose their interest or
are unable to publish content, BitTorrent traffic/portals may disappear or at
least their associated traffic will significantly reduce
User oriented access to secure biomedical resources through the grid
The life science domain is typified by heterogeneous data sets that are evolving at an exponential rate. Numerous post-genomic databases and areas of post-genomic life science research have been established and are being actively explored. Whilst many of these databases are public and freely accessible, it is often the case that researchers have data that is not so freely available and access to this data needs to be strictly controlled when distributed collaborative research is undertaken. Grid technologies provide one mechanism by which access to and integration of federated data sets is possible. Combining such data access and integration technologies with fine grained security infrastructures facilitates the establishment of virtual organisations (VO). However experience has shown that the general research (non-Grid) community are not comfortable with the Grid and its associated security models based upon public key infrastructures (PKIs). The Internet2 Shibboleth technology helps to overcome this through users only having to log in to their home site to gain access to resources across a VO â or in Shibboleth terminology a federation. In this paper we outline how we have applied the combination of Grid technologies, advanced security infrastructures and the Internet2 Shibboleth technology in several biomedical projects to provide a user-oriented model for secure access to and usage of Grid resources. We believe that this model may well become the de facto mechanism for undertaking e-Research on the Grid across numerous domains including the life sciences
User-oriented security supporting inter-disciplinary life science research across the grid
Understanding potential genetic factors in disease or development of personalised e-Health solutions require scientists to access a multitude of data and compute resources across the Internet from functional genomics resources through to epidemiological studies. The Grid paradigm provides a compelling model whereby seamless access to these resources can be achieved. However, the acceptance of Grid technologies in this domain by researchers and resource owners must satisfy particular constraints from this community - two of the most critical of these constraints being advanced security and usability. In this paper we show how the Internet2 Shibboleth technology combined with advanced authorisation infrastructures can help address these constraints. We demonstrate the viability of this approach through a selection of case studies across the complete life science spectrum
Making Thin Data Thick: User Behavior Analysis with Minimum Information
abstract: With the rise of social media, user-generated content has become available at an unprecedented scale. On Twitter, 1 billion tweets are posted every 5 days and on Facebook, 20 million links are shared every 20 minutes. These massive collections of user-generated content have introduced the human behavior's big-data.
This big data has brought about countless opportunities for analyzing human behavior at scale. However, is this data enough? Unfortunately, the data available at the individual-level is limited for most users. This limited individual-level data is often referred to as thin data. Hence, researchers face a big-data paradox, where this big-data is a large collection of mostly limited individual-level information. Researchers are often constrained to derive meaningful insights regarding online user behavior with this limited information. Simply put, they have to make thin data thick.
In this dissertation, how human behavior's thin data can be made thick is investigated. The chief objective of this dissertation is to demonstrate how traces of human behavior can be efficiently gleaned from the, often limited, individual-level information; hence, introducing an all-inclusive user behavior analysis methodology that considers social media users with different levels of information availability. To that end, the absolute minimum information in terms of both link or content data that is available for any social media user is determined. Utilizing only minimum information in different applications on social media such as prediction or recommendation tasks allows for solutions that are (1) generalizable to all social media users and that are (2) easy to implement. However, are applications that employ only minimum information as effective or comparable to applications that use more information?
In this dissertation, it is shown that common research challenges such as detecting malicious users or friend recommendation (i.e., link prediction) can be effectively performed using only minimum information. More importantly, it is demonstrated that unique user identification can be achieved using minimum information. Theoretical boundaries of unique user identification are obtained by introducing social signatures. Social signatures allow for user identification in any large-scale network on social media. The results on single-site user identification are generalized to multiple sites and it is shown how the same user can be uniquely identified across multiple sites using only minimum link or content information.
The findings in this dissertation allows finding the same user across multiple sites, which in turn has multiple implications. In particular, by identifying the same users across sites, (1) patterns that users exhibit across sites are identified, (2) how user behavior varies across sites is determined, and (3) activities that are observed only across sites are identified and studied.Dissertation/ThesisDoctoral Dissertation Computer Science 201
Security-oriented data grids for microarray expression profiles
Microarray experiments are one of the key ways in which gene activity can be identified and measured thereby shedding light and understanding for example on biological processes. The BBSRC funded Grid enabled Microarray Expression Profile Search (GEMEPS) project has developed an infrastructure which allows post-genomic life science researchers to ask and answer the following questions: who has undertaken microarray experiments that are in some way similar or relevant to mine; and how similar were these relevant experiments? Given that microarray experiments are expensive to undertake and may possess crucial information for future exploitation (both academically and commercially), scientists are wary of allowing unrestricted access to their data by the wider community until fully exploited locally. A key requirement is thus to have fine grained security that is easy to establish and simple (or ideally transparent) to use across inter-institutional virtual organisations. In this paper we present an enhanced security-oriented data Grid infrastructure that supports the definition of these kinds of queries and the analysis and comparison of microarray experiment results
An Army of Me: Sockpuppets in Online Discussion Communities
In online discussion communities, users can interact and share information
and opinions on a wide variety of topics. However, some users may create
multiple identities, or sockpuppets, and engage in undesired behavior by
deceiving others or manipulating discussions. In this work, we study
sockpuppetry across nine discussion communities, and show that sockpuppets
differ from ordinary users in terms of their posting behavior, linguistic
traits, as well as social network structure. Sockpuppets tend to start fewer
discussions, write shorter posts, use more personal pronouns such as "I", and
have more clustered ego-networks. Further, pairs of sockpuppets controlled by
the same individual are more likely to interact on the same discussion at the
same time than pairs of ordinary users. Our analysis suggests a taxonomy of
deceptive behavior in discussion communities. Pairs of sockpuppets can vary in
their deceptiveness, i.e., whether they pretend to be different users, or their
supportiveness, i.e., if they support arguments of other sockpuppets controlled
by the same user. We apply these findings to a series of prediction tasks,
notably, to identify whether a pair of accounts belongs to the same underlying
user or not. Altogether, this work presents a data-driven view of deception in
online discussion communities and paves the way towards the automatic detection
of sockpuppets.Comment: 26th International World Wide Web conference 2017 (WWW 2017
Web Tracking: Mechanisms, Implications, and Defenses
This articles surveys the existing literature on the methods currently used
by web services to track the user online as well as their purposes,
implications, and possible user's defenses. A significant majority of reviewed
articles and web resources are from years 2012-2014. Privacy seems to be the
Achilles' heel of today's web. Web services make continuous efforts to obtain
as much information as they can about the things we search, the sites we visit,
the people with who we contact, and the products we buy. Tracking is usually
performed for commercial purposes. We present 5 main groups of methods used for
user tracking, which are based on sessions, client storage, client cache,
fingerprinting, or yet other approaches. A special focus is placed on
mechanisms that use web caches, operational caches, and fingerprinting, as they
are usually very rich in terms of using various creative methodologies. We also
show how the users can be identified on the web and associated with their real
names, e-mail addresses, phone numbers, or even street addresses. We show why
tracking is being used and its possible implications for the users (price
discrimination, assessing financial credibility, determining insurance
coverage, government surveillance, and identity theft). For each of the
tracking methods, we present possible defenses. Apart from describing the
methods and tools used for keeping the personal data away from being tracked,
we also present several tools that were used for research purposes - their main
goal is to discover how and by which entity the users are being tracked on
their desktop computers or smartphones, provide this information to the users,
and visualize it in an accessible and easy to follow way. Finally, we present
the currently proposed future approaches to track the user and show that they
can potentially pose significant threats to the users' privacy.Comment: 29 pages, 212 reference
Towards a virtual research environment for paediatric endocrinology across Europe
Paediatric endocrinology is a medical specialty dealing with variations of physical growth and sexual development in childhood. Genetic anomalies that can cause disorders of sexual development in children are rare. Given this, sharing and collaboration on the small number of cases that occur is needed by clinical experts in the field. The EU-funded EuroDSD project (www.eurodsd.eu) is one such collaboration involving clinical centres and clinical and genetic experts across Europe. Through the establishment of a virtual research environment (VRE) supporting sharing of data and a variety of clinical and bioinformatics analysis tools, EuroDSD aims to provide a research infrastructure for research into disorders of sex development. Security, ethics and information governance are at the heart of this infrastructure. This paper describes the infrastructure that is being built and the inherent challenges in security, availability and dependability that must be overcome for the enterprise to succeed
- âŠ