235 research outputs found

    Improvements on the k-center problem for uncertain data

    Full text link
    In real applications, there are situations where we need to model some problems based on uncertain data. This leads us to define an uncertain model for some classical geometric optimization problems and propose algorithms to solve them. In this paper, we study the kk-center problem, for uncertain input. In our setting, each uncertain point PiP_i is located independently from other points in one of several possible locations {Pi,1,,Pi,zi}\{P_{i,1},\dots, P_{i,z_i}\} in a metric space with metric dd, with specified probabilities and the goal is to compute kk-centers {c1,,ck}\{c_1,\dots, c_k\} that minimize the following expected cost Ecost(c1,,ck)=RΩprob(R)maxi=1,,nminj=1,kd(P^i,cj)Ecost(c_1,\dots, c_k)=\sum_{R\in \Omega} prob(R)\max_{i=1,\dots, n}\min_{j=1,\dots k} d(\hat{P}_i,c_j) here Ω\Omega is the probability space of all realizations R={P^1,,P^n}R=\{\hat{P}_1,\dots, \hat{P}_n\} of given uncertain points and prob(R)=i=1nprob(P^i).prob(R)=\prod_{i=1}^n prob(\hat{P}_i). In restricted assigned version of this problem, an assignment A:{P1,,Pn}{c1,,ck}A:\{P_1,\dots, P_n\}\rightarrow \{c_1,\dots, c_k\} is given for any choice of centers and the goal is to minimize EcostA(c1,,ck)=RΩprob(R)maxi=1,,nd(P^i,A(Pi)).Ecost_A(c_1,\dots, c_k)=\sum_{R\in \Omega} prob(R)\max_{i=1,\dots, n} d(\hat{P}_i,A(P_i)). In unrestricted version, the assignment is not specified and the goal is to compute kk centers {c1,,ck}\{c_1,\dots, c_k\} and an assignment AA that minimize the above expected cost. We give several improved constant approximation factor algorithms for the assigned versions of this problem in a Euclidean space and in a general metric space. Our results significantly improve the results of \cite{guh} and generalize the results of \cite{wang} to any dimension. Our approach is to replace a certain center point for each uncertain point and study the properties of these certain points. The proposed algorithms are efficient and simple to implement

    Verification in Privacy Preserving Data Publishing

    Get PDF
    Privacy preserving data publication is a major concern for both the owners of data and the data publishers. Principles like k-anonymity, l-diversity were proposed to reduce privacy violations. On the other side, no studies were found on verification on the anonymized data in terms of adversarial breach and anonymity levels. However, the anonymized data is still prone to attacks due to the presence of dependencies among quasi-identifiers and sensitive attributes. This paper presents a novel framework to detect the existence of those dependencies and a solution to reduce them. The advantages of our approach are i) privacy violations can be detected, ii) the extent of privacy risk can be measured and iii) re-anonymization can be done on vulnerable blocks of data. The work is further extended to show how the adversarial breach knowledge eventually increased when new tuples are added and an on the fly solution to reduce it is discussed. Experimental results are reported and analyzed

    Enhancing explainability and scrutability of recommender systems

    Get PDF
    Our increasing reliance on complex algorithms for recommendations calls for models and methods for explainable, scrutable, and trustworthy AI. While explainability is required for understanding the relationships between model inputs and outputs, a scrutable system allows us to modify its behavior as desired. These properties help bridge the gap between our expectations and the algorithm’s behavior and accordingly boost our trust in AI. Aiming to cope with information overload, recommender systems play a crucial role in filtering content (such as products, news, songs, and movies) and shaping a personalized experience for their users. Consequently, there has been a growing demand from the information consumers to receive proper explanations for their personalized recommendations. These explanations aim at helping users understand why certain items are recommended to them and how their previous inputs to the system relate to the generation of such recommendations. Besides, in the event of receiving undesirable content, explanations could possibly contain valuable information as to how the system’s behavior can be modified accordingly. In this thesis, we present our contributions towards explainability and scrutability of recommender systems: • We introduce a user-centric framework, FAIRY, for discovering and ranking post-hoc explanations for the social feeds generated by black-box platforms. These explanations reveal relationships between users’ profiles and their feed items and are extracted from the local interaction graphs of users. FAIRY employs a learning-to-rank (LTR) method to score candidate explanations based on their relevance and surprisal. • We propose a method, PRINCE, to facilitate provider-side explainability in graph-based recommender systems that use personalized PageRank at their core. PRINCE explanations are comprehensible for users, because they present subsets of the user’s prior actions responsible for the received recommendations. PRINCE operates in a counterfactual setup and builds on a polynomial-time algorithm for finding the smallest counterfactual explanations. • We propose a human-in-the-loop framework, ELIXIR, for enhancing scrutability and subsequently the recommendation models by leveraging user feedback on explanations. ELIXIR enables recommender systems to collect user feedback on pairs of recommendations and explanations. The feedback is incorporated into the model by imposing a soft constraint for learning user-specific item representations. We evaluate all proposed models and methods with real user studies and demonstrate their benefits at achieving explainability and scrutability in recommender systems.Unsere zunehmende Abhängigkeit von komplexen Algorithmen für maschinelle Empfehlungen erfordert Modelle und Methoden für erklärbare, nachvollziehbare und vertrauenswürdige KI. Zum Verstehen der Beziehungen zwischen Modellein- und ausgaben muss KI erklärbar sein. Möchten wir das Verhalten des Systems hingegen nach unseren Vorstellungen ändern, muss dessen Entscheidungsprozess nachvollziehbar sein. Erklärbarkeit und Nachvollziehbarkeit von KI helfen uns dabei, die Lücke zwischen dem von uns erwarteten und dem tatsächlichen Verhalten der Algorithmen zu schließen und unser Vertrauen in KI-Systeme entsprechend zu stärken. Um ein Übermaß an Informationen zu verhindern, spielen Empfehlungsdienste eine entscheidende Rolle um Inhalte (z.B. Produkten, Nachrichten, Musik und Filmen) zu filtern und deren Benutzern eine personalisierte Erfahrung zu bieten. Infolgedessen erheben immer mehr In- formationskonsumenten Anspruch auf angemessene Erklärungen für deren personalisierte Empfehlungen. Diese Erklärungen sollen den Benutzern helfen zu verstehen, warum ihnen bestimmte Dinge empfohlen wurden und wie sich ihre früheren Eingaben in das System auf die Generierung solcher Empfehlungen auswirken. Außerdem können Erklärungen für den Fall, dass unerwünschte Inhalte empfohlen werden, wertvolle Informationen darüber enthalten, wie das Verhalten des Systems entsprechend geändert werden kann. In dieser Dissertation stellen wir unsere Beiträge zu Erklärbarkeit und Nachvollziehbarkeit von Empfehlungsdiensten vor. • Mit FAIRY stellen wir ein benutzerzentriertes Framework vor, mit dem post-hoc Erklärungen für die von Black-Box-Plattformen generierten sozialen Feeds entdeckt und bewertet werden können. Diese Erklärungen zeigen Beziehungen zwischen Benutzerprofilen und deren Feeds auf und werden aus den lokalen Interaktionsgraphen der Benutzer extrahiert. FAIRY verwendet eine LTR-Methode (Learning-to-Rank), um die Erklärungen anhand ihrer Relevanz und ihres Grads unerwarteter Empfehlungen zu bewerten. • Mit der PRINCE-Methode erleichtern wir das anbieterseitige Generieren von Erklärungen für PageRank-basierte Empfehlungsdienste. PRINCE-Erklärungen sind für Benutzer verständlich, da sie Teilmengen früherer Nutzerinteraktionen darstellen, die für die erhaltenen Empfehlungen verantwortlich sind. PRINCE-Erklärungen sind somit kausaler Natur und werden von einem Algorithmus mit polynomieller Laufzeit erzeugt , um präzise Erklärungen zu finden. • Wir präsentieren ein Human-in-the-Loop-Framework, ELIXIR, um die Nachvollziehbarkeit der Empfehlungsmodelle und die Qualität der Empfehlungen zu verbessern. Mit ELIXIR können Empfehlungsdienste Benutzerfeedback zu Empfehlungen und Erklärungen sammeln. Das Feedback wird in das Modell einbezogen, indem benutzerspezifischer Einbettungen von Objekten gelernt werden. Wir evaluieren alle Modelle und Methoden in Benutzerstudien und demonstrieren ihren Nutzen hinsichtlich Erklärbarkeit und Nachvollziehbarkeit von Empfehlungsdiensten

    Clustering and its Application in Requirements Engineering

    Get PDF
    Large scale software systems challenge almost every activity in the software development life-cycle, including tasks related to eliciting, analyzing, and specifying requirements. Fortunately many of these complexities can be addressed through clustering the requirements in order to create abstractions that are meaningful to human stakeholders. For example, the requirements elicitation process can be supported through dynamically clustering incoming stakeholders’ requests into themes. Cross-cutting concerns, which have a significant impact on the architectural design, can be identified through the use of fuzzy clustering techniques and metrics designed to detect when a theme cross-cuts the dominant decomposition of the system. Finally, traceability techniques, required in critical software projects by many regulatory bodies, can be automated and enhanced by the use of cluster-based information retrieval methods. Unfortunately, despite a significant body of work describing document clustering techniques, there is almost no prior work which directly addresses the challenges, constraints, and nuances of requirements clustering. As a result, the effectiveness of software engineering tools and processes that depend on requirements clustering is severely limited. This report directly addresses the problem of clustering requirements through surveying standard clustering techniques and discussing their application to the requirements clustering process

    The State of Social Computing Research: A Literature Review and Synthesis using the Latent Semantic Analysis Approach

    Get PDF
    Social computing is an emerging research discipline. The number of publications on social computing has increased by 120% annually in the past four years. Despite the proliferation of studies in this area there is a lack of comprehensive, unified, and systematic characterization of this phenomenon. The definition and characterization of this phenomenon in the extant literature is diverse and fragmented. In this paper we attempt to bring some clarity by synthesizing and summarizing the extant literature in this area. We use Latent Semantic Analysis (LSA), a text mining and natural language processing technique, to summarize the state of social computing research. The results show that there are 27 unique dimensions which currently characterize this concept. LSA also reveals that, the 266 articles found in the literature predominantly focus on three major research themes namely, Knowledge Discovery, Knowledge Sharing, and Content Management in the Social Computing context

    Modeling Heterogeneous Statistical Patterns in High-dimensional Data by Adversarial Distributions: An Unsupervised Generative Framework

    Full text link
    Since the label collecting is prohibitive and time-consuming, unsupervised methods are preferred in applications such as fraud detection. Meanwhile, such applications usually require modeling the intrinsic clusters in high-dimensional data, which usually displays heterogeneous statistical patterns as the patterns of different clusters may appear in different dimensions. Existing methods propose to model the data clusters on selected dimensions, yet globally omitting any dimension may damage the pattern of certain clusters. To address the above issues, we propose a novel unsupervised generative framework called FIRD, which utilizes adversarial distributions to fit and disentangle the heterogeneous statistical patterns. When applying to discrete spaces, FIRD effectively distinguishes the synchronized fraudsters from normal users. Besides, FIRD also provides superior performance on anomaly detection datasets compared with SOTA anomaly detection methods (over 5% average AUC improvement). The significant experiment results on various datasets verify that the proposed method can better model the heterogeneous statistical patterns in high-dimensional data and benefit downstream applications

    In-Close, a fast algorithm for computing formal concepts

    Get PDF
    This paper presents an algorithm, called In-Close, that uses incremental closure and matrix searching to quickly compute all formal concepts in a formal context. In-Close is based, conceptually, on a well known algorithm called Close-By-One. The serial version of a recently published algorithm (Krajca, 2008) was shown to be in the order of 100 times faster than several well-known algorithms, and timings of other algorithms in reviews suggest that none of them are faster than Krajca. This paper compares In-Close to Krajca, discussing computational methods, data requirements and memory considerations. From experiments using several public data sets and random data, this paper shows that In-Close is in the order of 20 times faster than Krajca. In-Close is small, straightforward, requires no matrix pre-processing and is simple to implement.</p
    corecore