1,782 research outputs found

A Comprehensive Bibliometric Analysis on Social Network Anonymization: Current Approaches and Future Directions

Author: Gandomi Amir H.
Gharoun Hassan
Khorshidi Mohammad Sadegh
Rakhshaninejad Morteza
Yazdanjouei Hossein
Yazdanjue Navid
Publication date: 24/07/2023

In recent decades, social network anonymization has become a crucial research field due to its pivotal role in preserving users' privacy. However, the high diversity of approaches introduced in relevant studies poses a challenge to gaining a profound understanding of the field. In response to this, the current study presents an exhaustive and well-structured bibliometric analysis of the social network anonymization field. To begin our research, related studies from the period of 2007-2022 were collected from the Scopus Database and then pre-processed. Following this, VOSviewer was used to visualize the network of authors' keywords. Subsequently, extensive statistical and network analyses were performed to identify the most prominent keywords and trending topics. Additionally, the application of co-word analysis through SciMAT and the Alluvial diagram allowed us to explore the themes of social network anonymization and scrutinize their evolution over time. These analyses culminated in an innovative taxonomy of the existing approaches and anticipation of potential trends in this domain. To the best of our knowledge, this is the first bibliometric analysis in the social network anonymization field, which offers a deeper understanding of the current state and an insightful roadmap for future research in this domain.
Comment: 73 pages, 28 figures

arXiv.org e-Print Archive

Data Mining

Publication venue: IntechOpen
Publication date: 27/07/2022

The availability of big data due to computerization and automation has generated an urgent need for new techniques to analyze and convert big data into useful information and knowledge. Data mining is a promising and leading-edge technology for mining large volumes of data, looking for hidden information, and aiding knowledge discovery. It can be used for characterization, classification, discrimination, anomaly detection, association, clustering, trend or evolution prediction, and much more in fields such as science, medicine, economics, engineering, computers, and even business analytics. This book presents basic concepts, ideas, and research in data mining

Directory of Open Access Books (DOAB)

Technical Research Priorities for Big Data

Author: Auer Sören
Berre Arne J.
Curry Edward
Despenic Marija
García Robles Ana
Hasan Souleiman
Metzger Andreas
Ojo Adegboyega
Pazzaglia Jean-Christophe
Petkovic Milan
Roman Dumitru
Seidl Robert
ul Hassan Umair
Walshe Ray
Waterfeld Walter
Zillner Sonja
Publication venue: Cham : Springer International Publishing
Publication date: 01/01/2021

To drive innovation and competitiveness, organisations need to foster the development and broad adoption of data technologies, value-adding use cases and sustainable business models. Enabling an effective data ecosystem requires overcoming several technical challenges associated with the cost and complexity of management, processing, analysis and utilisation of data. This chapter details a community-driven initiative to identify and characterise the key technical research priorities for research and development in data technologies. The chapter examines the systemic and structured methodology used to gather inputs from over 200 stakeholder organisations. The result of the process identified five key technical research priorities in the areas of data management, data processing, data analytics, data visualisation and user interactions, and data protection, together with 28 sub-level challenges. The process also highlighted the important role of data standardisation, data engineering and DevOps for Big Data

Institutionelles Repositorium der Leibniz Universität Hannover

Process Mining Workshops

Publication venue: Springer Science and Business Media LLC
Publication date: 13/04/2022

This open access book constitutes revised selected papers from the International Workshops held at the Third International Conference on Process Mining, ICPM 2021, which took place in Eindhoven, The Netherlands, during October 31–November 4, 2021. The conference focuses on the area of process mining research and practice, including theory, algorithmic challenges, and applications. The co-located workshops provided a forum for novel research ideas. The 28 papers included in this volume were carefully reviewed and selected from 65 submissions. They stem from the following workshops: 2nd International Workshop on Event Data and Behavioral Analytics (EDBA); 2nd International Workshop on Leveraging Machine Learning in Process Mining (ML4PM); 2nd International Workshop on Streaming Analytics for Process Mining (SA4PM); 6th International Workshop on Process Querying, Manipulation, and Intelligence (PQMI); 4th International Workshop on Process-Oriented Data Science for Healthcare (PODS4H); and 2nd International Workshop on Trust, Privacy, and Security in Process Analytics (TPSA). One survey paper on the results of the XES 2.0 Workshop is included.

Directory of Open Access Books (DOAB)

Pitfalls in Language Models for Code Intelligence: A Taxonomy and Survey

Author: He Yiling
Li Li
Liu Yue
Qin Zhan
She Xinyu
Tantithamthavorn Chakkrit
Wang Haoyu
Zhao Yanjie
Publication date: 27/10/2023

Modern language models (LMs) have been successfully employed in source code generation and understanding, leading to a significant increase in research focused on learning-based code intelligence, such as automated bug repair, and test case generation. Despite their great potential, language models for code intelligence (LM4Code) are susceptible to potential pitfalls, which hinder realistic performance and further impact their reliability and applicability in real-world deployment. Such challenges drive the need for a comprehensive understanding - not just identifying these issues but delving into their possible implications and existing solutions to build more reliable language models tailored to code intelligence. Based on a well-defined systematic research approach, we conducted an extensive literature review to uncover the pitfalls inherent in LM4Code. Finally, 67 primary studies from top-tier venues have been identified. After carefully examining these studies, we designed a taxonomy of pitfalls in LM4Code research and conducted a systematic study to summarize the issues, implications, current solutions, and challenges of different pitfalls for LM4Code systems. We developed a comprehensive classification scheme that dissects pitfalls across four crucial aspects: data collection and labeling, system design and learning, performance evaluation, and deployment and maintenance. Through this study, we aim to provide a roadmap for researchers and practitioners, facilitating their understanding and utilization of LM4Code in reliable and trustworthy ways

arXiv.org e-Print Archive

Exploring Societal Computing based on the Example of Privacy

Author: Sheth Swapneel
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2014

Data privacy when using online systems like Facebook and Amazon has become an increasingly popular topic in the last few years. This thesis consists of the following four projects that aim to address the issues of privacy and software engineering. First, little is known about how users and developers perceive privacy and which concrete measures would mitigate their privacy concerns. To investigate privacy requirements, we conducted an online survey with closed and open questions and collected 408 valid responses. Our results show that users often reduce privacy to security, with data sharing and data breaches being their biggest concerns. Users are more concerned about the content of their documents and their personal data such as location than about their interaction data. Unlike users, developers clearly prefer technical measures like data anonymization and think that privacy laws and policies are less effective. We also observed interesting differences between people from different geographies. For example, people from Europe are more concerned about data breaches than people from North America. People from Asia/Pacific and Europe believe that content and metadata are more critical for privacy than people from North America. Our results contribute to developing a user-driven privacy framework that is based on empirical evidence in addition to the legal, technical, and commercial perspectives. Second, a related challenge is to make privacy more understandable in complex systems that may have a variety of user interface options, which may change often. As social network platforms have evolved, the ability for users to control how and with whom information is being shared introduces challenges concerning the configuration and comprehension of privacy settings.
To address these concerns, our crowdsourced approach simplifies the understanding of privacy settings by using data collected from 512 users over a 17-month period to generate visualizations that allow users to compare their personal settings to an arbitrary subset of individuals of their choosing. To validate our approach, we conducted an online survey with closed and open questions and collected 59 valid responses, after which we conducted follow-up interviews with 10 respondents. Our results showed that 70% of respondents found visualizations using crowdsourced data useful for understanding privacy settings, and 80% preferred a crowdsourced tool for configuring their privacy settings over current privacy controls. Third, as software evolves over time, changes might introduce bugs that breach users' privacy. Further, there might be system-wide policy changes that could change users' settings to be more or less private than before. We present a novel technique that can be used by end-users for detecting changes in privacy, i.e., regression testing for privacy. Using a social approach for detecting privacy bugs, we present two prototype tools. Our evaluation shows the feasibility and utility of our approach for detecting privacy bugs. We highlight two interesting case studies on the bugs that were discovered using our tools. To the best of our knowledge, this is the first technique that leverages regression testing for detecting privacy bugs from an end-user perspective. Fourth, approaches to addressing these privacy concerns typically require substantial extra computational resources, which might be beneficial where privacy is concerned, but may have significant negative impact with respect to Green Computing and sustainability, another major societal concern. Spending more computation time results in spending more energy and other resources that make the software system less sustainable.
Ideally, what we would like are techniques for designing software systems that address these privacy concerns but which are also sustainable - systems where privacy could be achieved "for free", i.e., without having to spend extra computational effort. We describe how privacy can indeed be achieved for free - as an accidental and beneficial side effect of doing some existing computation - in web applications and online systems that have access to user data. We show the feasibility, sustainability, and utility of our approach and what types of privacy threats it can mitigate. Finally, we generalize the problem of privacy and its tradeoffs. As Social Computing has increasingly captivated the general public, it has become a popular research area for computer scientists. Social Computing research focuses on online social behavior and using artifacts derived from it for providing recommendations and other useful community knowledge. Unfortunately, some of that behavior and knowledge incur societal costs, particularly with regard to Privacy, which is viewed quite differently by different populations as well as regulated differently in different locales. But clever technical solutions to those challenges may impose additional societal costs, e.g., by consuming substantial resources at odds with Green Computing, another major area of societal concern. We propose a new crosscutting research area, Societal Computing, that focuses on the technical tradeoffs among computational models and application domains that raise significant societal issues. We highlight some of the relevant research topics and open problems that we foresee in Societal Computing. We feel that these topics, and Societal Computing in general, need to gain prominence as they will provide useful avenues of research leading to increasing benefits for society as a whole.

Columbia University Academic Commons

Advances and Challenges of Multi-task Learning Method in Recommender System: A Survey

Author: Li Kan
Wang Yipeng
Yang Zhen
Yin Ruiping
Zhang Mingzhu
Publication date: 23/05/2023

Multi-task learning has been widely applied in computer vision, natural language processing, and other fields, where it has achieved good performance. In recent years, much work on multi-task learning recommender systems has been produced, but no previous literature summarizes these works. To bridge this gap, we provide a systematic literature survey of multi-task recommender systems, aiming to help researchers and practitioners quickly understand the current progress in this direction. In this survey, we first introduce the background and motivation of multi-task learning-based recommender systems. Then we provide a taxonomy of multi-task learning-based recommendation methods according to the different stages of multi-task learning techniques, including task relationship discovery, model architecture, and optimization strategy. Finally, we discuss applications and promising future directions in this area.

arXiv.org e-Print Archive

Process Mining Workshops

Publication venue: Springer Science and Business Media LLC

OAPEN Library

Data privacy as a business opportunity: leveraging privacy maximizing features to address client privacy concerns

Author: Fastje Luis
Publication date: 01/01/2023

Data privacy is a critical concern in the era of data-driven businesses. Users are becoming increasingly sensitive about the collection and processing of their personal data. This Master’s thesis examines whether a firm’s data privacy policy can provide an edge over competitors. Primary research was conducted to ascertain user preferences and behavior regarding data privacy in the context of identified business drivers for prioritizing data privacy as well as for mitigating associated risks and benefits. This data supplemented secondary material from the literature review. PESTEL analysis indicated that key drivers for data privacy are legal, ethical, financial, and technical. Moreover, expert interviews and the survey revealed that businesses cannot avoid data privacy and confirmed the above-mentioned key drivers. Furthermore, the drivers can be structured for transparency, trust, capabilities, and holistic processes. Data privacy must be approached holistically as data governance to ensure efficient and responsible data management within an organization. Hence, a concept was developed which proactively leverages user concerns and minimizes the consequences of data breaches and non-compliance with the GDPR. Based on the foregoing, privacy policies can lead to unique positioning and consequently provide a competitive advantage (CA) with the following measures: (1) explicit opt-in choices on a consent management platform, (2) efficient Data Lifecycle Management, (3) privacy by design, and (4) technical best practices, such as differential privacy. These criteria, properly executed with consideration to company-specific use cases and the internal resources and capabilities, leverage privacy maximizing features for CA.

Repositório Institucional da Universidade Católica Portuguesa

Privacy-Enhancing Technologies for Business Process Mining (Privaatsuskaitse tehnoloogiaid äriprotsesside kaeveks)

Author: Elkoumy Gamal
Publication date: 08/11/2022

Process mining techniques enable organizations to analyze process execution traces to identify improvement opportunities. Such techniques need the event logs (which record process execution) to be available for data analysts to perform the analysis. These logs may contain private information about the individuals for whom a process is being executed. In such cases, organizations need to deploy Privacy-Enhancing Technologies (PETs) to enable the analyst to draw conclusions from the event logs while preserving the privacy of individuals. While PETs preserve the privacy of individuals inside the organization, they work by perturbing the event logs in a way that may lead the analysis to misleading conclusions. They may inject new behaviors into the event logs that cannot occur in real-life event logs. For example, some PETs anonymize a hospital event log by injecting a trace in which a patient visits a doctor before checking in at the hospital. In this thesis, we propose a set of privacy-preserving approaches, which we call Privacy-Preserving Process Mining (PPPM) approaches, to strike a balance between the benefits an analyst can get from analyzing these event logs and the requirements imposed on them by privacy regulations (e.g., GDPR). We also propose an approach that enables organizations to jointly perform process mining over their data without sharing their private information. The techniques proposed in this thesis are available as open-source tools. The first tool is Amun, which enables an event log publisher to anonymize their event log before sharing it with an analyst. The second tool is Libra, which provides an enhanced utility-privacy tradeoff. The third tool is Shareprom, which enables organizations to construct process maps jointly in such a manner that no party learns the data of the other parties.
https://www.ester.ee/record=b552434

DSpace at Tartu University Library