Shall I post this now? Optimized, delay-based privacy protection in social networks
The final publication is available at Springer via http://dx.doi.org/10.1007/s10115-016-1010-4. Despite the advantages commonly attributed to social networks, such as the ease and immediacy of communicating with acquaintances and friends, significant privacy threats arise when inexperienced or even irresponsible users recklessly publish sensitive material. Yet a different, but equally significant, privacy risk stems from social networks profiling the online activity of their users based on the timestamps of the interactions between the two. To thwart this commonly neglected type of attack, this paper proposes an optimized deferral mechanism for messages in online social networks. The solution intelligently delays certain messages posted by end users so that the online activity profile observed by the attacker does not reveal any time-based sensitive information, while preserving the usability of the system. Experimental results, together with a proposed architecture implementing this approach, demonstrate the suitability and feasibility of our mechanism.
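The abstract does not spell out the optimization itself, but the idea of deferring a post so the attacker's observed profile stays uninformative can be pictured with a minimal sketch. Assumptions (not taken from the paper): the observed profile is a 24-bin hourly histogram, the target profile is uniform, the usability constraint is a maximum delay in hours, and the function name `choose_post_hour` is hypothetical.

```python
import numpy as np

def choose_post_hour(observed_hist, intended_hour, max_delay_hours=3, target=None):
    """Pick the posting hour (the intended hour or a slightly delayed one) that keeps
    the hourly activity profile seen by an observer closest to a target profile.

    observed_hist   : 24 counts of messages already observed per hour.
    intended_hour   : hour (0-23) at which the user actually wants to post.
    max_delay_hours : usability constraint, the longest tolerated delay.
    target          : target probability profile over 24 hours (uniform by default).
    """
    observed_hist = np.asarray(observed_hist, dtype=float)
    if target is None:
        target = np.full(24, 1 / 24)  # uniform target hides time-based patterns

    best_hour, best_div = intended_hour, np.inf
    for delay in range(max_delay_hours + 1):
        hour = (intended_hour + delay) % 24
        candidate = observed_hist.copy()
        candidate[hour] += 1
        p = candidate / candidate.sum()
        # KL divergence between the profile the observer would see and the target
        div = np.sum(p * np.log(p / target + 1e-12))
        if div < best_div:
            best_hour, best_div = hour, div
    return best_hour
```

For example, `choose_post_hour(hist, intended_hour=14)` would either publish at 14:00 or defer by up to three hours, whichever leaves the observed histogram least revealing under these assumptions.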
Identifying Experts in Question & Answer Portals: A Case Study on Data Science Competencies in Reddit
The irreplaceable key to the success of Question & Answer (Q&A) platforms is their users providing high-quality answers to the challenging questions posted across various topics of interest. Recently, the expert finding problem has attracted much attention in information retrieval research. In this work, we inspect the feasibility of a supervised learning model to identify data science experts in Reddit. Our method is based on manual coding results in which two data science experts labelled comments as expert, non-expert and out-of-scope. We present a semi-supervised approach using the activity behaviour of every user, combining Natural Language Processing (NLP), crowdsourced and user feature sets. We conclude that the NLP and user feature sets contribute the most to the identification of these three classes, which means that this method can generalise well within the domain. Moreover, we present different types of users, which can be helpful for detecting various kinds of users in the future.
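As a rough illustration of the kind of pipeline this abstract describes, the sketch below combines TF-IDF text features with user activity features in a three-class comment classifier. The column names, the feature list and the choice of model are assumptions for illustration only, not the paper's exact setup.

```python
# Minimal sketch: expert / non-expert / out-of-scope comment classification
# combining NLP (TF-IDF) features with hand-crafted user features.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

def build_classifier():
    features = ColumnTransformer([
        # NLP features from the comment body (column name is hypothetical)
        ("text", TfidfVectorizer(max_features=5000, ngram_range=(1, 2)), "comment_text"),
        # user activity features (hypothetical columns)
        ("user", "passthrough", ["n_comments", "n_subreddits", "account_age_days", "avg_score"]),
    ])
    return Pipeline([("features", features),
                     ("clf", RandomForestClassifier(n_estimators=300))])

# df: one row per labelled comment; "label" holds expert / non-expert / out-of-scope
# scores = cross_val_score(build_classifier(), df, df["label"], cv=5, scoring="f1_macro")
```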
A Survey on Data-Driven Evaluation of Competencies and Capabilities Across Multimedia Environments
The rapid evolution of technology directly impacts the skills and jobs needed in the next decade. Users can, intentionally or unintentionally, develop different skills by creating, interacting with, and consuming content from online environments and portals where informal learning can emerge. These environments generate large amounts of data; therefore, big data can have a significant impact on education. Moreover, the educational landscape has been shifting from a focus on contents to a focus on the competencies and capabilities that will prepare our society for an unknown future during the 21st century. Therefore, the main goal of this literature survey is to examine diverse technology-mediated environments that can generate rich data sets through the users' interaction and where data can be used to explicitly or implicitly perform a data-driven evaluation of different competencies and capabilities. We thoroughly and comprehensively surveyed the state of the art to identify and analyse digital environments, the data they produce and the capabilities they can measure and/or develop. Our survey revealed four key multimedia environments that fulfil this goal: sites for content sharing and consumption, video games, online learning platforms and social networks. Moreover, different methods were used to measure a large array of diverse capabilities such as expertise, language proficiency and soft skills. Our results demonstrate the potential of data from diverse digital environments to support the development of lifelong and lifewide 21st-century capabilities for the future society.
Fundamentos de Programación: Catálogo de ejercicios y soluciones (Programming Fundamentals: A Catalogue of Exercises and Solutions)
©2022. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0
Identifying Professional Photographers Through Image Quality and Aesthetics in Flickr
In our generation there is an undoubted rise in the use of social media, and specifically of photo and video sharing platforms. These sites have proved their ability to yield rich data sets through the users' interaction, which can be used to perform a data-driven evaluation of capabilities. Nevertheless, this study reveals the lack of suitable data sets in photo and video sharing platforms and of evaluation processes across them. Accordingly, our first contribution is the creation of one of the largest labelled data sets in Flickr, with multimodal data that has been open-sourced as part of this contribution. Based on these data, we explored machine learning models and concluded that it is feasible to predict whether a user is a professional photographer or not, using self-reported occupation labels and several feature representations drawn from the user, photo and crowdsourced sets. We also examined the relationship between the aesthetics and technical quality of a picture and the social activity around that picture. Finally, we described which characteristics differentiate professional photographers from non-professionals. To the best of our knowledge, the results presented in this work represent an important novelty for users' expertise identification, which researchers from various domains can use for different applications.
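The relationship between picture quality and social activity mentioned above can be probed with a simple rank-correlation analysis; the sketch below is illustrative only, and the column names (`aesthetic_score`, `technical_score`, `n_faves`, `n_comments`) are assumptions rather than the paper's actual feature names.

```python
# Minimal sketch: rank correlation between per-photo quality scores and social activity.
import pandas as pd
from scipy.stats import spearmanr

def quality_vs_activity(photos: pd.DataFrame):
    """photos: one row per photo with columns
       'aesthetic_score', 'technical_score', 'n_faves', 'n_comments'."""
    results = {}
    for quality in ("aesthetic_score", "technical_score"):
        for activity in ("n_faves", "n_comments"):
            rho, p_value = spearmanr(photos[quality], photos[activity])
            results[(quality, activity)] = (rho, p_value)
    return results
```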
COSMOS: A Collaborative, Seamless and Adaptive Sentinel for the Internet of Things
The Internet of Things (IoT) became established during the last decade as an emerging technology with considerable potential and applicability. Its paradigm of everything connected together has penetrated the real world, with smart devices embedded in many daily appliances. Such intelligent objects are able to communicate autonomously through existing network infrastructures, producing a tighter integration between the real world and computer-based systems. On the downside, the great benefits the IoT paradigm brings to our lives are accompanied by severe security issues, since the information exchanged among the objects frequently remains unprotected from malicious attackers. The paper at hand proposes COSMOS (Collaborative, Seamless and Adaptive Sentinel for the Internet of Things), a novel sentinel to protect smart environments from cyber threats. Our sentinel shields the IoT devices using multiple defensive rings, resulting in more accurate and robust protection. Additionally, we discuss the current deployment of the sentinel on a commodity device (i.e., a Raspberry Pi). Exhaustive experiments demonstrate that the sentinel performs reliably even under heavily stressing conditions. Each defensive layer is tested and reaches remarkable performance, thus proving the applicability of COSMOS in a distributed and dynamic scenario such as the IoT. To ease the adoption of the proposed sentinel, we further developed a friendly and easy-to-use COSMOS App, so that end users can manage their sentinel(s) directly from their own devices (e.g., a smartphone).
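The "multiple defensive rings" design can be pictured as a chain of independent inspection layers through which device traffic must pass. The sketch below is a generic illustration of that layering under assumed semantics (any layer may block), not COSMOS's actual implementation, and the layer names in the usage comment are hypothetical.

```python
# Minimal sketch: chain of defensive layers, each of which may block a packet.
from typing import Callable, Iterable

Verdict = bool                    # True = allow, False = block
Layer = Callable[[bytes], Verdict]

def sentinel(layers: Iterable[Layer]) -> Callable[[bytes], Verdict]:
    """Return an inspection function that passes a packet through every layer
    in order and blocks it as soon as any layer flags it."""
    layers = list(layers)

    def inspect(packet: bytes) -> Verdict:
        for layer in layers:
            if not layer(packet):  # any ring may reject the packet
                return False
        return True

    return inspect

# usage (hypothetical layers):
# inspect = sentinel([firewall_rules, anomaly_detector, payload_scanner])
# allowed = inspect(raw_packet_bytes)
```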
A Big Data Architecture for Early Identification and Categorization of Dark Web Sites
The dark web has become notorious for its association with illicit activities, and there is a growing need for systems that automate the monitoring of this space. This paper proposes an end-to-end scalable architecture for the early identification of new Tor sites and the daily analysis of their content. The solution is built on an open-source Big Data stack for data serving with Kubernetes, Kafka, Kubeflow and MinIO; it continuously discovers onion addresses in different sources (threat intelligence, code repositories, web-Tor gateways and Tor repositories), downloads the HTML from Tor, deduplicates the content using MinHash LSH, and categorizes it with BERTopic topic modeling (SBERT embedding, UMAP dimensionality reduction, HDBSCAN document clustering and c-TF-IDF topic keywords). In 93 days, the system identified 80,049 onion services and characterized 90% of them, addressing the challenge of Tor volatility. A disproportionate amount of repeated content was found, with only 6.1% unique sites. From the HTML files of the dark sites, 31 different low-level topics were extracted, manually labelled and grouped into 11 high-level topics. The most popular included sexual and violent content, repositories, search engines, carding, cryptocurrencies, and marketplaces. During the experiments, we identified 14 sites with 13,946 clones that shared a suspiciously similar mirroring rate per day, suggesting an extensive common phishing network. Among the related works, this study is the most representative characterization of onion services based on topics to date.
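The two content-processing steps named in the abstract (near-duplicate removal with MinHash LSH and topic extraction with BERTopic, which internally uses SBERT embeddings, UMAP and HDBSCAN) can be sketched as follows. The similarity threshold, embedding model name and function names are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: deduplicate crawled page texts, then extract topics.
from datasketch import MinHash, MinHashLSH
from bertopic import BERTopic

def deduplicate(pages, threshold=0.9, num_perm=128):
    """pages: dict mapping onion address -> extracted page text.
    Keeps only pages whose text is not a near-duplicate of one already kept."""
    lsh = MinHashLSH(threshold=threshold, num_perm=num_perm)
    unique = {}
    for onion, text in pages.items():
        m = MinHash(num_perm=num_perm)
        for token in set(text.lower().split()):
            m.update(token.encode("utf8"))
        if not lsh.query(m):          # no similar page stored yet
            lsh.insert(onion, m)
            unique[onion] = text
    return unique

def extract_topics(texts):
    """Cluster unique page texts into topics with c-TF-IDF keywords per topic."""
    topic_model = BERTopic(embedding_model="all-MiniLM-L6-v2")
    topics, _ = topic_model.fit_transform(texts)
    return topic_model, topics

# unique_pages = deduplicate(crawled_pages)
# model, topics = extract_topics(list(unique_pages.values()))
```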
Mobility in collaborative alert systems: building trust through reputation
Collaborative Intrusion Detection Networks (CIDN) are usually composed of a set of nodes working together to detect distributed intrusions that cannot be easily recognized with traditional intrusion detection architectures. In this approach, every node can potentially collaborate by providing its vision of the system and reporting the alarms detected at the network, service and/or application levels. This includes mobile nodes that enter and leave the network in an ad hoc manner. However, for this alert information to be useful in the context of CIDN networks, trust and reputation mechanisms are needed to determine the credibility of a particular mobile node and of the alerts it provides. This is the main objective of this paper, which presents an inter-domain trust and reputation model, together with an architecture for inter-domain collaboration, with the aim of improving detection accuracy in CIDN systems as users move from one security domain to another.
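The abstract does not give the reputation formula, but the general idea (combining local observations of a mobile node's alert accuracy with reports from the domains it previously visited, weighted by how much each reporting domain is trusted) can be illustrated with a minimal sketch. The weighting scheme and the function name are assumptions, not the paper's model.

```python
# Minimal sketch: trust-weighted reputation of a mobile node entering a new domain.
def updated_reputation(local_score, local_weight, foreign_reports):
    """local_score    : fraction of the node's recent alerts confirmed locally (0..1)
       local_weight   : confidence placed in local observations (0..1)
       foreign_reports: list of (reported_reputation, trust_in_reporting_domain) pairs
    Returns a reputation value in [0, 1]."""
    numerator = local_weight * local_score
    denominator = local_weight
    for reported, domain_trust in foreign_reports:
        numerator += domain_trust * reported
        denominator += domain_trust
    return numerator / denominator if denominator > 0 else 0.0

# example: strong local evidence plus two foreign domains of unequal trust
# updated_reputation(0.9, 0.6, [(0.7, 0.5), (0.4, 0.2)])
```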
SCORPION Cyber Range: Fully Customizable Cyberexercises, Gamification and Learning Analytics to Train Cybersecurity Competencies
It is undeniable that we are witnessing an unprecedented digital revolution. However, recent years have been characterized by an explosion of cyberattacks, making cybercrime one of the most profitable businesses on the planet. That is why cybersecurity training is increasingly essential to protect the assets of cyberspace. One of the most valuable tools for training cybersecurity competencies is the Cyber Range, a virtualized environment that simulates realistic networks. The paper at hand introduces SCORPION, a fully functional and virtualized Cyber Range that manages the authoring and automated deployment of scenarios. In addition, SCORPION includes several elements to improve student motivation, such as a gamification system with medals, points and rankings, among other elements. This gamification system includes an adaptive learning module that adapts the cyberexercise based on the user's performance. Moreover, SCORPION leverages learning analytics that collect and process telemetric and biometric user data, including heart rate measured through a smartwatch, which is made available to instructors through a dashboard. Finally, we developed a case study in which SCORPION obtained 82.10% in usability and 4.57 out of 5 in usefulness from the viewpoint of a student and an instructor. These positive evaluation results are promising, indicating that SCORPION can become an effective, motivating and advanced cybersecurity training tool that helps fill current gaps in this context.
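The adaptive learning module is only described at a high level; as a rough illustration, the sketch below adjusts the difficulty of the next cyberexercise from the trainee's score and completion time. The thresholds, the 1-5 difficulty scale and the function name are hypothetical, not SCORPION's actual rules.

```python
# Minimal sketch: performance-based difficulty adaptation for a cyberexercise.
def next_difficulty(current_level, score, time_taken_min, time_limit_min, levels=(1, 5)):
    """current_level : difficulty of the exercise just finished (1 = easiest)
       score         : fraction of flags/objectives achieved (0..1)
       time_taken_min: minutes the trainee needed
       time_limit_min: minutes allotted for the exercise
    """
    lowest, highest = levels
    finished_fast = time_taken_min <= 0.75 * time_limit_min
    if score >= 0.8 and finished_fast:
        return min(current_level + 1, highest)   # mastered quickly: raise difficulty
    if score < 0.4:
        return max(current_level - 1, lowest)    # struggling: lower difficulty
    return current_level                         # otherwise keep the same level
```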