A Comprehensive Bibliometric Analysis on Social Network Anonymization: Current Approaches and Future Directions
In recent decades, social network anonymization has become a crucial research
field due to its pivotal role in preserving users' privacy. However, the high
diversity of approaches introduced in relevant studies poses a challenge to
gaining a profound understanding of the field. In response to this, the current
study presents an exhaustive and well-structured bibliometric analysis of the
social network anonymization field. To begin our research, related studies from
the period of 2007-2022 were collected from the Scopus Database and then
pre-processed. Following this, VOSviewer was used to visualize the network
of authors' keywords. Subsequently, extensive statistical and network analyses
were performed to identify the most prominent keywords and trending topics.
Additionally, the application of co-word analysis through SciMAT and the
Alluvial diagram allowed us to explore the themes of social network
anonymization and scrutinize their evolution over time. These analyses
culminated in an innovative taxonomy of the existing approaches and
anticipation of potential trends in this domain. To the best of our knowledge,
this is the first bibliometric analysis in the social network anonymization
field, which offers a deeper understanding of the current state and an
insightful roadmap for future research in this domain.
Comment: 73 pages, 28 figures
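The co-word analysis mentioned in this abstract rests on counting how often two author keywords appear on the same paper. The following is a minimal sketch of that counting step only, assuming each paper is given as a list of keywords. It is not the VOSviewer or SciMAT implementation, and the function name `cooccurrence_counts` and the sample keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count how often each unordered keyword pair appears on the same paper."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalise case and drop duplicates within a single paper.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Hypothetical author-keyword lists, invented for illustration only.
papers = [
    ["privacy", "social network", "anonymization"],
    ["k-anonymity", "privacy", "social network"],
    ["differential privacy", "privacy"],
]

counts = cooccurrence_counts(papers)
for pair, n in counts.most_common(3):
    print(pair, n)
```

The resulting pair counts are the edge weights of a keyword network. Tools such as those named in the abstract add normalisation, clustering, and layout on top of this kind of matrix.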
Data Mining
The availability of big data due to computerization and automation has generated an urgent need for new techniques to analyze and convert big data into useful information and knowledge. Data mining is a promising and leading-edge technology for mining large volumes of data, looking for hidden information, and aiding knowledge discovery. It can be used for characterization, classification, discrimination, anomaly detection, association, clustering, trend or evolution prediction, and much more in fields such as science, medicine, economics, engineering, computers, and even business analytics. This book presents basic concepts, ideas, and research in data mining.
Technical Research Priorities for Big Data
To drive innovation and competitiveness, organisations need to foster the development and broad adoption of data technologies, value-adding use cases and sustainable business models. Enabling an effective data ecosystem requires overcoming several technical challenges associated with the cost and complexity of management, processing, analysis and utilisation of data. This chapter details a community-driven initiative to identify and characterise the key technical research priorities for research and development in data technologies. The chapter examines the systemic and structured methodology used to gather inputs from over 200 stakeholder organisations. The process identified five key technical research priorities in the areas of data management, data processing, data analytics, data visualisation and user interactions, and data protection, together with 28 sub-level challenges. The process also highlighted the important role of data standardisation, data engineering and DevOps for Big Data.
Process Mining Workshops
This open access book constitutes revised selected papers from the International Workshops held at the Third International Conference on Process Mining, ICPM 2021, which took place in Eindhoven, The Netherlands, during October 31–November 4, 2021. The conference focuses on the area of process mining research and practice, including theory, algorithmic challenges, and applications. The co-located workshops provided a forum for novel research ideas. The 28 papers included in this volume were carefully reviewed and selected from 65 submissions. They stem from the following workshops: 2nd International Workshop on Event Data and Behavioral Analytics (EDBA); 2nd International Workshop on Leveraging Machine Learning in Process Mining (ML4PM); 2nd International Workshop on Streaming Analytics for Process Mining (SA4PM); 6th International Workshop on Process Querying, Manipulation, and Intelligence (PQMI); 4th International Workshop on Process-Oriented Data Science for Healthcare (PODS4H); 2nd International Workshop on Trust, Privacy, and Security in Process Analytics (TPSA). One survey paper on the results of the XES 2.0 Workshop is included.
Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey
Modern language models (LMs) have been successfully employed in source code
generation and understanding, leading to a significant increase in research
focused on learning-based code intelligence, such as automated bug repair and
test case generation. Despite their great potential, language models for code
intelligence (LM4Code) are susceptible to potential pitfalls, which hinder
realistic performance and further impact their reliability and applicability in
real-world deployment. Such challenges drive the need for a comprehensive
understanding - not just identifying these issues but delving into their
possible implications and existing solutions to build more reliable language
models tailored to code intelligence. Based on a well-defined systematic
research approach, we conducted an extensive literature review to uncover the
pitfalls inherent in LM4Code. Finally, 67 primary studies from top-tier venues
have been identified. After carefully examining these studies, we designed a
taxonomy of pitfalls in LM4Code research and conducted a systematic study to
summarize the issues, implications, current solutions, and challenges of
different pitfalls for LM4Code systems. We developed a comprehensive
classification scheme that dissects pitfalls across four crucial aspects: data
collection and labeling, system design and learning, performance evaluation,
and deployment and maintenance. Through this study, we aim to provide a roadmap
for researchers and practitioners, facilitating their understanding and
utilization of LM4Code in reliable and trustworthy ways.
Exploring Societal Computing based on the Example of Privacy
Data privacy when using online systems like Facebook and Amazon has become an increasingly popular topic in the last few years. This thesis consists of the following four projects, which aim to address the issues of privacy and software engineering.
First, only a little is known about how users and developers perceive privacy and which concrete measures would mitigate their privacy concerns. To investigate privacy requirements, we conducted an online survey with closed and open questions and collected 408 valid responses. Our results show that users often reduce privacy to security, with data sharing and data breaches being their biggest concerns. Users are more concerned about the content of their documents and their personal data such as location than about their interaction data. Unlike users, developers clearly prefer technical measures like data anonymization and think that privacy laws and policies are less effective. We also observed interesting differences between people from different geographies. For example, people from Europe are more concerned about data breaches than people from North America. People from Asia/Pacific and Europe believe that content and metadata are more critical for privacy than people from North America. Our results contribute to developing a user-driven privacy framework that is based on empirical evidence in addition to the legal, technical, and commercial perspectives.
Second, a related challenge is to make privacy more understandable in complex systems that may have a variety of user interface options, which may change often. As social network platforms have evolved, the ability for users to control how and with whom information is being shared introduces challenges concerning the configuration and comprehension of privacy settings. To address these concerns, our crowdsourced approach simplifies the understanding of privacy settings by using data collected from 512 users over a 17-month period to generate visualizations that allow users to compare their personal settings to an arbitrary subset of individuals of their choosing. To validate our approach, we conducted an online survey with closed and open questions and collected 59 valid responses, after which we conducted follow-up interviews with 10 respondents. Our results showed that 70% of respondents found visualizations using crowdsourced data useful for understanding privacy settings, and 80% preferred a crowdsourced tool for configuring their privacy settings over current privacy controls.
Third, as software evolves over time, this might introduce bugs that breach users' privacy. Further, there might be system-wide policy changes that could change users' settings to be more or less private than before. We present a novel technique that can be used by end-users for detecting changes in privacy, i.e., regression testing for privacy. Using a social approach for detecting privacy bugs, we present two prototype tools. Our evaluation shows the feasibility and utility of our approach for detecting privacy bugs. We highlight two interesting case studies on the bugs that were discovered using our tools. To the best of our knowledge, this is the first technique that leverages regression testing for detecting privacy bugs from an end-user perspective.
Fourth, approaches to addressing these privacy concerns typically require substantial extra computational resources, which might be beneficial where privacy is concerned, but may have a significant negative impact with respect to Green Computing and sustainability, another major societal concern. Spending more computation time results in spending more energy and other resources, which makes the software system less sustainable. Ideally, what we would like are techniques for designing software systems that address these privacy concerns but which are also sustainable: systems where privacy could be achieved "for free", i.e., without having to spend extra computational effort. We describe how privacy can indeed be achieved for free, as an accidental and beneficial side effect of doing some existing computation, in web applications and online systems that have access to user data. We show the feasibility, sustainability, and utility of our approach and what types of privacy threats it can mitigate.
Finally, we generalize the problem of privacy and its tradeoffs. As Social Computing has increasingly captivated the general public, it has become a popular research area for computer scientists. Social Computing research focuses on online social behavior and using artifacts derived from it for providing recommendations and other useful community knowledge. Unfortunately, some of that behavior and knowledge incur societal costs, particularly with regard to Privacy, which is viewed quite differently by different populations as well as regulated differently in different locales. But clever technical solutions to those challenges may impose additional societal costs, e.g., by consuming substantial resources at odds with Green Computing, another major area of societal concern. We propose a new crosscutting research area, Societal Computing, that focuses on the technical tradeoffs among computational models and application domains that raise significant societal issues. We highlight some of the relevant research topics and open problems that we foresee in Societal Computing. We feel that these topics, and Societal Computing in general, need to gain prominence as they will provide useful avenues of research leading to increasing benefits for society as a whole.
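The third project in this abstract describes regression testing for privacy: detecting when a software or policy change leaves a user's settings less private than before. The sketch below illustrates only the comparison idea, under the assumption that settings can be captured as snapshots mapping a setting name to a visibility level. The thesis describes a social approach and two prototype tools whose design is not given here, so the function `privacy_regressions`, the visibility levels, and the sample data are all hypothetical.

```python
def privacy_regressions(before, after, order=("only_me", "friends", "public")):
    """Return settings that became less private between two snapshots.

    `order` lists visibility levels from most to least private (an assumption).
    """
    rank = {level: i for i, level in enumerate(order)}
    changes = {}
    for setting, old in before.items():
        # A setting missing from the new snapshot is treated as unchanged.
        new = after.get(setting, old)
        if rank[new] > rank[old]:
            changes[setting] = (old, new)
    return changes


# Hypothetical snapshots taken before and after a platform update.
snapshot_before = {"photos": "friends", "email": "only_me", "posts": "public"}
snapshot_after = {"photos": "public", "email": "only_me", "posts": "friends"}

regressions = privacy_regressions(snapshot_before, snapshot_after)
print(regressions)
```

Only changes toward a less private level are reported, so the `posts` setting, which became more private, is not flagged.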
Advances and Challenges of Multi-task Learning Method in Recommender System: A Survey
Multi-task learning has been widely applied in computer vision, natural
language processing and other fields, where it has achieved good performance. In
recent years, a large body of work on multi-task learning recommender systems has
appeared, but no previous literature summarizes it. To
bridge this gap, we provide a systematic literature survey of multi-task
recommender systems, aiming to help researchers and practitioners quickly
understand the current progress in this direction. In this survey, we first
introduce the background and the motivation of multi-task learning-based
recommender systems. Then we provide a taxonomy of multi-task learning-based
recommendation methods according to the different stages of multi-task learning
techniques, which include task relationship discovery, model architecture and
optimization strategy. Finally, we discuss applications and
promising future directions in this area.
Data privacy as a business opportunity: leveraging privacy maximizing features to address client privacy concerns
Data privacy is a critical concern in the era of data-driven businesses. Users are becoming
increasingly sensitive about the collection and processing of their personal data. This Master’s
thesis examines whether a firm’s data privacy policy can provide an edge over competitors.
Primary research was conducted to ascertain user preferences and behavior regarding data
privacy in the context of identified business drivers for prioritizing data privacy as well as for
mitigating associated risks and benefits. This data supplemented secondary material from the
literature review. PESTEL analysis indicated that key drivers for data privacy are legal, ethical,
financial, and technical. Moreover, expert interviews and the survey revealed that businesses
cannot avoid data privacy and confirmed the above-mentioned key drivers. Furthermore, the
drivers can be structured for transparency, trust, capabilities, and holistic processes. Data
privacy must be approached holistically as data governance to ensure efficient and responsible
data management within an organization. Hence, a concept was developed which proactively
leverages user concerns and minimizes the consequences of data breaches and non-compliance
with the GDPR.
Based on the foregoing, privacy policies can lead to unique positioning and consequently
provide a competitive advantage (CA) with the following measures: (1) explicit opt-in choices
on a consent management platform, (2) efficient Data Lifecycle Management, (3) privacy by
design, and (4) technical best practices, such as differential
privacy. These criteria, properly executed with consideration to company-specific use cases and
the internal resources and capabilities, leverage privacy maximizing features for CA.
Privaatsuskaitse tehnoloogiaid äriprotsesside kaeveks (Privacy-Enhancing Technologies for Business Process Mining)
Process mining techniques enable organizations to analyze process execution traces to identify improvement opportunities. Such techniques need the event logs (which record process execution) to be available for data analysts to perform the analysis. These logs may contain private information about the individuals for whom a process is being executed. In such cases, organizations need to deploy Privacy-Enhancing Technologies (PETs) to enable the analyst to draw conclusions from the event logs while preserving the privacy of individuals.
While PETs preserve the privacy of individuals inside the organization, they work by perturbing the event logs in a way that may lead the analysis to misleading conclusions. They may inject new behaviors into the event logs that cannot occur in real-life event logs. For example, some PETs anonymize a hospital event log by injecting a trace in which a patient visits a doctor before checking in to the hospital.
In this thesis, we propose a set of privacy-preserving approaches that we call Privacy-Preserving Process Mining (PPPM) approaches to strike a balance between the benefits an analyst can get from analyzing these event logs and the requirements imposed on them by privacy regulations (e.g., GDPR). Also, in this thesis, we propose an approach that enables organizations to jointly perform process mining over their data without sharing their private information.
The techniques proposed in this thesis are available as open-source tools. The first tool is Amun, enabling an event log publisher to anonymize their event log before sharing it with an analyst. The second tool is called Libra, which provides an enhanced utility-privacy tradeoff. The third tool is Shareprom, which enables organizations to construct process maps jointly in such a manner that no party learns the data of the other parties.
https://www.ester.ee/record=b552434
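The hospital example in this abstract, where anonymization injects a trace that cannot occur in reality, can be made concrete with a directly-follows relation, a common building block of process maps. The sketch below illustrates the problem only. It does not reproduce how Amun, Libra, or Shareprom work, and the activity names, the sample log, and the function `directly_follows` are hypothetical.

```python
def directly_follows(traces):
    """Return the set of (a, b) pairs where activity b immediately follows a."""
    pairs = set()
    for trace in traces:
        # Pair each activity with its successor in the same trace.
        pairs.update(zip(trace, trace[1:]))
    return pairs


# Hypothetical hospital event log: each trace is one patient's activity sequence.
original_log = [
    ["check_in", "see_doctor", "discharge"],
    ["check_in", "lab_test", "see_doctor", "discharge"],
]

# A trace of the kind a perturbation step might inject (invented for illustration).
injected_trace = ["see_doctor", "check_in", "discharge"]

before = directly_follows(original_log)
after = directly_follows(original_log + [injected_trace])

# Behaviour present only because of the injected trace.
new_behaviour = after - before
print(sorted(new_behaviour))
```

Any pair in `new_behaviour` would show up as an edge in a process map built from the perturbed log even though no real patient followed that path, which is the kind of misleading conclusion the abstract warns about.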