39 research outputs found

    Ticket Automation: an Insight into Current Research with Applications to Multi-level Classification Scenarios

    Get PDF
    odern service providers often have to deal with large amounts of customer requests, which they need to act upon in a swift and effective manner to ensure adequate support is provided. In this context, machine learning algorithms are fundamental in streamlining support ticket processing workflows. However, a large part of current approaches is still based on traditional Natural Language Processing approaches without fully exploiting the latest advancements in this field. In this work, we aim to provide an overview of support Ticket Automation, what recent proposals are being made in this field, and how well some of these methods can generalize to new scenarios and datasets. We list the most recent proposals for these tasks and examine in detail the ones related to Ticket Classification, the most prevalent of them. We analyze commonly utilized datasets and experiment on two of them, both characterized by a two-level hierarchy of labels, which are descriptive of the ticket’s topic at different levels of granularity. The first is a collection of 20,000 customer complaints, and the second comprises 35,000 issues crawled from a bug reporting website. Using this data, we focus on topically classifying tickets using a pre-trained BERT language model. The experimental section of this work has two objectives. First, we demonstrate the impact of different document representation strategies on classification performance. Secondly, we showcase an effective way to boost classification by injecting information from the hierarchical structure of the labels into the classifier. Our findings show that the choice of the embedding strategy for ticket embeddings considerably impacts classification metrics on our datasets: the best method improves by more than 28% in F1- score over the standard strategy. We also showcase the effectiveness of hierarchical information injection, which further improves the results. In the bugs dataset, one of our multi-level models (ML-BERT) outperforms the best baseline by up to 5.7% in F1-score and 5.4% in accuracy

    Privacy by Design in Data Mining

    Get PDF
    Privacy is ever-growing concern in our society: the lack of reliable privacy safeguards in many current services and devices is the basis of a diffusion that is often more limited than expected. Moreover, people feel reluctant to provide true personal data, unless it is absolutely necessary. Thus, privacy is becoming a fundamental aspect to take into account when one wants to use, publish and analyze data involving sensitive information. Many recent research works have focused on the study of privacy protection: some of these studies aim at individual privacy, i.e., the protection of sensitive individual data, while others aim at corporate privacy, i.e., the protection of strategic information at organization level. Unfortunately, it is in- creasingly hard to transform the data in a way that it protects sensitive information: we live in the era of big data characterized by unprecedented opportunities to sense, store and analyze complex data which describes human activities in great detail and resolution. As a result anonymization simply cannot be accomplished by de-identification. In the last few years, several techniques for creating anonymous or obfuscated versions of data sets have been proposed, which essentially aim to find an acceptable trade-off between data privacy on the one hand and data utility on the other. So far, the common result obtained is that no general method exists which is capable of both dealing with “generic personal data” and preserving “generic analytical results”. In this thesis we propose the design of technological frameworks to counter the threats of undesirable, unlawful effects of privacy violation, without obstructing the knowledge discovery opportunities of data mining technologies. Our main idea is to inscribe privacy protection into the knowledge discovery technol- ogy by design, so that the analysis incorporates the relevant privacy requirements from the start. Therefore, we propose the privacy-by-design paradigm that sheds a new light on the study of privacy protection: once specific assumptions are made about the sensitive data and the target mining queries that are to be answered with the data, it is conceivable to design a framework to: a) transform the source data into an anonymous version with a quantifiable privacy guarantee, and b) guarantee that the target mining queries can be answered correctly using the transformed data instead of the original ones. This thesis investigates on two new research issues which arise in modern Data Mining and Data Privacy: individual privacy protection in data publishing while preserving specific data mining analysis, and corporate privacy protection in data mining outsourcing

    Evaluating Methods for Privacy-Preserving Data Sharing in Genomics

    Get PDF
    The availability of genomic data is often essential to progress in biomedical re- search, personalized medicine, drug development, etc. However, its extreme sensitivity makes it problematic, if not outright impossible, to publish or share it. In this dissertation, we study and build systems that are geared towards privacy preserving genomic data sharing. We first look at the Matchmaker Exchange, a platform that connects multiple distributed databases through an API and allows researchers to query for genetic variants in other databases through the network. However, queries are broadcast to all researchers that made a similar query in any of the connected databases, which can lead to a reluctance to use the platform, due to loss of privacy or competitive advantage. In order to overcome this reluctance, we propose a framework to support anonymous querying on the platform. Since genomic data’s sensitivity does not degrade over time, we analyze the real-world guarantees provided by the only tool available for long term genomic data storage. We find that the system offers low security when the adversary has access to side information, and we support our claims by empirical evidence. We also study the viability of synthetic data for privacy preserving data sharing. Since for genomic data research, the utility of the data provided is of the utmost importance, we first perform a utility evaluation on generative models for different types of datasets (i.e., financial data, images, and locations). Then, we propose a privacy evaluation framework for synthetic data. We then perform a measurement study assessing state-of-the-art generative models specifically geared for human genomic data, looking at both utility and privacy perspectives. Overall, we find that there is no single approach for generating synthetic data that performs well across the board from both utility and privacy perspectives

    Decision Support Systems

    Get PDF
    Decision support systems (DSS) have evolved over the past four decades from theoretical concepts into real world computerized applications. DSS architecture contains three key components: knowledge base, computerized model, and user interface. DSS simulate cognitive decision-making functions of humans based on artificial intelligence methodologies (including expert systems, data mining, machine learning, connectionism, logistical reasoning, etc.) in order to perform decision support functions. The applications of DSS cover many domains, ranging from aviation monitoring, transportation safety, clinical diagnosis, weather forecast, business management to internet search strategy. By combining knowledge bases with inference rules, DSS are able to provide suggestions to end users to improve decisions and outcomes. This book is written as a textbook so that it can be used in formal courses examining decision support systems. It may be used by both undergraduate and graduate students from diverse computer-related fields. It will also be of value to established professionals as a text for self-study or for reference

    Machine Learning Algorithms for Privacy-preserving Behavioral Data Analytics

    Get PDF
    PhD thesisBehavioral patterns observed in data generated by mobile and wearable devices are used by many applications, such as wellness monitoring or service personalization. However, sensitive information may be inferred from these data when they are shared with cloud-based services. In this thesis, we propose machine learning algorithms for data transformations to allow the inference of information required for specific tasks while preventing the inference of privacy-sensitive information. Specifically, we focus on protecting the user’s privacy when sharing motion-sensor data and web-browsing histories. Firstly, for human activity recognition using data of wearable sensors, we introduce two algorithms for training deep neural networks to transform motion-sensor data, focusing on two objectives: (i) to prevent the inference of privacy-sensitive activities (e.g. smoking or drinking), and (ii) to protect user’s sensitive attributes (e.g. gender) and prevent the re-identification of user. We show how to combine these two algorithms and propose a compound architecture that protects both sensitive activities and attributes. Alongside the algorithmic contributions, we published a motion-sensor dataset for human activity recognition. Secondly, to prevent the identification of users using their web-browsing behavior, we introduce an algorithm for privacy-preserving collaborative training of contextual bandit algorithms. The proposed method improves the accuracy of personalized recommendation agents that run locally on the user’s devices. We propose an encoding algorithm for the user’s web-browsing data that preserves the required information for the personalization of the future contents while ensuring differential privacy for the participants in collaborative training. In addition, for processing multivariate sensor data, we show how to make neural network architectures adaptive to dynamic sampling rate and sensor selection. This allows handling situations in human activity recognition where the dimensions of input data can be varied at inference time. Specifically, we introduce a customized pooling layer for neural networks and propose a customized training procedure to generalize over a large number of feasible data dimensions. Using the proposed architectural improvement, we show how to convert existing non-adaptive deep neural networks into an adaptive network while keeping the same classification accuracy. We conclude this thesis by discussing open questions and the potential future directions for continuing research in this area

    Cloud Services Brokerage for Mobile Ubiquitous Computing

    Get PDF
    Recently, companies are adopting Mobile Cloud Computing (MCC) to efficiently deliver enterprise services to users (or consumers) on their personalized devices. MCC is the facilitation of mobile devices (e.g., smartphones, tablets, notebooks, and smart watches) to access virtualized services such as software applications, servers, storage, and network services over the Internet. With the advancement and diversity of the mobile landscape, there has been a growing trend in consumer attitude where a single user owns multiple mobile devices. This paradigm of supporting a single user or consumer to access multiple services from n-devices is referred to as the Ubiquitous Cloud Computing (UCC) or the Personal Cloud Computing. In the UCC era, consumers expect to have application and data consistency across their multiple devices and in real time. However, this expectation can be hindered by the intermittent loss of connectivity in wireless networks, user mobility, and peak load demands. Hence, this dissertation presents an architectural framework called, Cloud Services Brokerage for Mobile Ubiquitous Cloud Computing (CSB-UCC), which ensures soft real-time and reliable services consumption on multiple devices of users. The CSB-UCC acts as an application middleware broker that connects the n-devices of users to the multi-cloud services. The designed system determines the multi-cloud services based on the user's subscriptions and the n-devices are determined through device registration on the broker. The preliminary evaluations of the designed system shows that the following are achieved: 1) high scalability through the adoption of a distributed architecture of the brokerage service, 2) providing soft real-time application synchronization for consistent user experience through an enhanced mobile-to-cloud proximity-based access technique, 3) reliable error recovery from system failure through transactional services re-assignment to active nodes, and 4) transparent audit trail through access-level and context-centric provenance

    Aggregating privatized medical data for secure querying applications

    Full text link
     This thesis analyses and examines the challenges of aggregation of sensitive data and data querying on aggregated data at cloud server. This thesis also delineates applications of aggregation of sensitive medical data in several application scenarios, and tests privatization techniques to assist in improving the strength of privacy and utility
    corecore