
18 research outputs found

Diverse Data Selection under Fairness Constraints

Author: McGregor Andrew
Meliou Alexandra
Moumoulidou Zafeiria
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 24th International Conference on Database Theory (ICDT 2021)
Publication date: 01/01/2021

Diversity is an important principle in data selection and summarization, facility location, and recommendation systems. Our work focuses on maximizing diversity in data selection, while offering fairness guarantees. In particular, we offer the first study that augments the Max-Min diversification objective with fairness constraints. More specifically, given a universe U of n elements that can be partitioned into m disjoint groups, we aim to retrieve a k-sized subset that maximizes the pairwise minimum distance within the set (diversity) and contains a pre-specified k_i number of elements from each group i (fairness). We show that this problem is NP-complete even in metric spaces, and we propose three novel algorithms, linear in n, that provide strong theoretical approximation guarantees for different values of m and k. Finally, we extend our algorithms and analysis to the case where groups can be overlapping.
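To make the objective in this abstract concrete, here is a minimal sketch that scores a candidate subset by its minimum pairwise distance and checks the per-group quotas k_i. It then finds the best fair subset by exhaustive search, which is only feasible for tiny inputs and is not one of the paper's linear-time approximation algorithms. The function names, the use of Euclidean distance, and the dictionary-based quota format are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import dist


def min_pairwise_distance(points):
    """Diversity of a set: the smallest distance between any two of its points.

    Assumes at least two points and Euclidean distance (an illustrative choice).
    """
    return min(dist(p, q) for p, q in combinations(points, 2))


def is_fair(subset, groups, quotas):
    """Check that the subset holds exactly quotas[g] elements of each group g.

    `subset` is a tuple of indices, `groups[i]` is the group label of element i.
    Because the caller only passes subsets of size sum(quotas.values()),
    meeting every quota also rules out elements from unlisted groups.
    """
    counts = Counter(groups[i] for i in subset)
    return all(counts[g] == need for g, need in quotas.items())


def brute_force_fair_max_min(points, groups, quotas):
    """Exhaustively search for the fair subset with the largest minimum distance.

    Exponential in the subset size; intended only to illustrate the objective.
    Returns (indices, diversity), or (None, -inf) if no fair subset exists.
    """
    k = sum(quotas.values())
    best_subset, best_value = None, float("-inf")
    for subset in combinations(range(len(points)), k):
        if not is_fair(subset, groups, quotas):
            continue
        value = min_pairwise_distance([points[i] for i in subset])
        if value > best_value:
            best_subset, best_value = subset, value
    return best_subset, best_value
```

Calling `brute_force_fair_max_min(points, groups, {"a": 1, "b": 1})` on a handful of 2-D points returns the indices of the most spread-out pair that takes one element from each group. The exhaustive loop is the part that approximation algorithms such as those in the paper are designed to avoid.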

Dagstuhl Research Online Publication Server

Crossing Generative Adversarial Networks for Cross-View Person Re-identification

Author: Wang Yang
Wu Lin
Zhang Chengyuan
Publication date: 03/01/2018

Person re-identification (re-id) refers to matching pedestrians across disjoint, non-overlapping camera views. The most effective way to match these pedestrians undergoing significant visual variations is to seek reliably invariant features that can describe the person of interest faithfully. Most existing methods are presented in a supervised manner to produce discriminative features by relying on labeled paired images in correspondence. However, annotating pair-wise images is prohibitively labor-intensive, and thus not practical in large-scale networked cameras. Moreover, seeking comparable representations across camera views demands a flexible model to address the complex distributions of images. In this work, we study the co-occurrence statistic patterns between pairs of images, and propose a crossing Generative Adversarial Network (Cross-GAN) for learning a joint distribution for cross-image representations in an unsupervised manner. Given a pair of person images, the proposed model consists of a variational auto-encoder to encode the pair into respective latent variables, a proposed cross-view alignment to reduce the view disparity, and an adversarial layer to seek the joint distribution of latent representations. The learned latent representations are well-aligned to reflect the co-occurrence patterns of paired images. We empirically evaluate the proposed model against challenging datasets, and our results show the importance of joint invariant features in improving matching rates of person re-id in comparison with semi-supervised and unsupervised state-of-the-art methods.

Comment: 12 pages. arXiv admin note: text overlap with arXiv:1702.03431 by other authors

arXiv.org e-Print Archive

University of Queensland eSpace

Diversity and Novelty: Measurement, Learning and Optimization

Author: Ahmed Faez
Publication date: 01/01/2019

The primary objective of this dissertation is to investigate research methods to answer the question: "How (and why) does one measure, learn and optimize novelty and diversity of a set of items?" The computational models we develop to answer this question also provide foundational mathematical techniques to throw light on the following three questions: 1. How does one reliably measure the creativity of ideas? 2. How does one form teams to evaluate design ideas? 3. How does one filter good ideas out of hundreds of submissions? Solutions to these questions are key to enable the effective processing of a large collection of design ideas generated in a design contest. In the first part of the dissertation, we discuss key qualities needed in design metrics and propose new diversity and novelty metrics for judging design products. We show that the proposed metrics have higher accuracy and sensitivity compared to existing alternatives in literature. To measure the novelty of a design item, we propose learning from human subjective responses to derive low dimensional triplet embeddings. To measure diversity, we propose an entropy-based diversity metric, which is more accurate and sensitive than benchmarks. In the second part of the dissertation, we introduce the bipartite b-matching problem and argue the need for incorporating diversity in the objective function for matching problems. We propose new submodular and supermodular objective functions to measure diversity and develop multiple matching algorithms for diverse team formation in offline and online cases. Finally, in the third part, we demonstrate filtering and ranking of ideas using diversity metrics based on Determinantal Point Processes as well as submodular functions. In real-world crowd experiments, we demonstrate that such ranking enables increased efficiency in filtering high-quality ideas compared to traditionally used methods.

Digital Repository at the University of Maryland

Finding important entities in graphs

Author: Mavroforakis Charalampos
Publication date: 05/02/2019

Graphs are established as one of the most prominent means of data representation. They are composed of simple entities -- nodes and edges -- and reflect the relationship between them. Their impact extends to a broad variety of domains, e.g., biology, sociology and the Web. In these settings, much of the data value can be captured by a simple question: how can we evaluate the importance of these entities? The aim of this dissertation is to explore novel importance measures that are meaningful and can be computed efficiently on large datasets. First, we focus on the spanning edge centrality, an edge importance measure recently introduced to evaluate phylogenetic trees. We propose very efficient methods that approximate this measure in near-linear time and apply them to large graphs with millions of nodes. We demonstrate that this centrality measure is a useful tool for the analysis of networks outside its original application domain. Next, we turn to importance measures for nodes and propose the absorbing random walk centrality. This measure evaluates a group of nodes in a graph according to how central they are with respect to a set of query nodes. Specifically, given a query set and a candidate group of nodes, we start random walks from the queries and measure their length until they reach one of the candidates. The most central group of nodes will collectively minimize the expected length of these random walks. We prove several computational properties of this measure and provide an algorithm whose solutions offer an approximation guarantee. Additionally, we develop efficient heuristics that allow us to use this importance measure in large datasets. Finally, we consider graphs in which each node is assigned a set of attributes. We define an important connected subgraph to be one for which the total weight of its edges is small, while the number of attributes covered by its nodes is large. To select such an important subgraph, we develop an efficient approximation algorithm based on the primal-dual schema.
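The absorbing random walk idea described in this abstract can be illustrated with a short sketch that computes, for a fixed candidate group, the expected number of steps a simple random walk takes from each query node until it first hits a candidate. This is an assumption-laden simplification: it uses an unweighted graph given as an adjacency dictionary, a plain random walk with no restarts, and a fixed-point iteration rather than any method from the dissertation. The function names are hypothetical.

```python
def expected_absorption_steps(adj, absorbing, tol=1e-12, max_iter=100_000):
    """Expected steps for a simple random walk to first reach `absorbing`.

    adj: dict mapping each node to a non-empty list of neighbours.
    absorbing: set of nodes where the walk stops (expected steps 0 there).
    Solves h(v) = 1 + average of h over the neighbours of v by repeated
    in-place updates; assumes every node can reach an absorbing node.
    """
    steps = {v: 0.0 for v in adj}
    for _ in range(max_iter):
        largest_change = 0.0
        for v in adj:
            if v in absorbing:
                continue
            updated = 1.0 + sum(steps[u] for u in adj[v]) / len(adj[v])
            largest_change = max(largest_change, abs(updated - steps[v]))
            steps[v] = updated
        if largest_change < tol:
            break
    return steps


def average_walk_length(adj, queries, candidates):
    """Mean expected walk length from the query nodes to the candidate group.

    Smaller values mean the candidate group is more central to the queries.
    """
    steps = expected_absorption_steps(adj, set(candidates))
    return sum(steps[q] for q in queries) / len(queries)
```

On a path graph 0-1-2-3 with queries 0 and 3, the candidate group {1} yields a smaller average walk length than {3}, so under this simplified measure node 1 is the more central choice. Selecting the best group of a given size would still require a search over candidate groups, which is where the approximation algorithm and heuristics mentioned in the abstract come in.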

Boston University Institutional Repository (OpenBU)

Rigorous optimization recipes for sparse and low rank inverse problems with applications in data sciences

Author: Kyrillidis Anastasios
Publication venue: Lausanne, EPFL
Publication date: 09/10/2014

Many natural and man-made signals can be described as having a few degrees of freedom relative to their size due to natural parameterizations or constraints; examples include bandlimited signals, collections of signals observed from multiple viewpoints in a network-of-sensors, and per-flow traffic measurements of the Internet. Low-dimensional models (LDMs) mathematically capture the inherent structure of such signals via combinatorial and geometric data models, such as sparsity, unions-of-subspaces, low-rankness, manifolds, and mixtures of factor analyzers, and are emerging to revolutionize the way we treat inverse problems (e.g., signal recovery, parameter estimation, or structure learning) from dimensionality-reduced or incomplete data. Assuming our problem resides in an LDM space, in this thesis we investigate how to integrate such models in convex and non-convex optimization algorithms for significant gains in computational complexity. We mostly focus on two LDMs: (i) sparsity and (ii) low-rankness. We study trade-offs and their implications to develop efficient and provable optimization algorithms, and--more importantly--to exploit convex and combinatorial optimization that can enable cross-pollination of decades of research in both

Infoscience - École polytechnique fédérale de Lausanne

29th International Symposium on Algorithms and Computation: ISAAC 2018, December 16-19, 2018, Jiaoxi, Yilan, Taiwan

Author: ISAAC <29. 2018, Jiaoxi, Yilan>
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Publication date: 01/12/2018

Digitale Bibliothek Thüringen

Online Advertising Assignment Problems Considering Realistic Constraints

Author: 김광
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/08/2020

Doctoral dissertation (Ph.D.) -- Seoul National University Graduate School: Department of Industrial Engineering, College of Engineering, 2020. 8. 문일경.

With a drastic increase in online communities, many companies have been paying attention to online advertising. The main advantages of online advertising are traceability, cost-effectiveness, reachability, and interactivity. The benefits facilitate the continuous popularity of online advertising. For Internet-based companies, a well-constructed online advertisement assignment increases their revenue. Hence, the managers need to develop their decision-making processes for assigning online advertisements on their website so that their revenue is maximized. In this dissertation, we consider online advertising assignment problems considering realistic constraints. There are three types of online advertising assignment problems: (i) Display ads problem in adversarial order, (ii) Display ads problem in probabilistic order, and (iii) Online banner advertisement scheduling for advertising effectiveness. Unlike previous assignment problems, the problems are pragmatic approaches that reflect realistic constraints and advertising effectiveness. Moreover, the algorithms the dissertation designs offer important insights into the online advertisement assignment problem. We give a brief explanation of the fundamental methodologies to solve the online advertising assignment problems in Chapter 1. At the end of this chapter, the contributions and outline of the dissertation are also presented. In Chapter 2, we propose the display ads problem in adversarial order. Deterministic algorithms with worst-case guarantees are designed, and their competitive ratios are presented. Upper bounds for the problem are also proved. We investigate the display ads problem in probabilistic order in Chapter 3. This chapter presents stochastic online algorithms with scenario-based stochastic programming and Benders decomposition for two probabilistic order models. In Chapter 4, an online banner advertisement scheduling model for advertising effectiveness is designed. We also present the solution methodologies used to obtain valid lower and upper bounds of the model efficiently. Chapter 5 offers conclusions and suggestions for future studies. The approaches to solving the problems are meaningful in both academic and industrial areas. We validate that these approaches can solve the problems efficiently and effectively by conducting computational experiments. The models and solution methodologies are expected to be convenient and beneficial when managers at Internet-based companies place online advertisements on their websites.

Table of contents:
Chapter 1 Introduction
1.1 Display Ads Problem
1.1.1 Online Algorithm
1.2 Online Banner Advertisement Scheduling Problem
1.3 Research Motivations and Contributions
1.4 Outline of the Dissertation
Chapter 2 Online Advertising Assignment Problem in Adversarial Order
2.1 Problem Description and Literature Review
2.2 Display Ads Problem in Adversarial Order
2.3 Deterministic Algorithms for Adversarial Order
2.4 Upper Bounds of Deterministic Algorithms for Adversarial Order
2.5 Summary
Chapter 3 Online Advertising Assignment Problem in Probabilistic Order
3.1 Problem Description and Literature Review
3.2 Display Ads Problem in Probabilistic Order
3.3 Stochastic Online Algorithms for Probabilistic Order
3.3.1 Two-Stage Stochastic Programming
3.3.2 Known IID model
3.3.3 Random permutation model
3.3.4 Stochastic approach using primal-dual algorithm
3.4 Computational Experiments
3.4.1 Results for known IID model
3.4.2 Results for random permutation model
3.4.3 Managerial insights for Algorithm 3.1
3.5 Summary
Chapter 4 Online Banner Advertisement Scheduling for Advertising Effectiveness
4.1 Problem Description and Literature Review
4.2 Mathematical Model
4.2.1 Objective function
4.2.2 Notations and formulation
4.3 Solution Methodologies
4.3.1 Heuristic approach to finding valid lower and upper bounds
4.3.2 Hybrid tabu search
4.4 Computational Experiments
4.4.1 Results for problems with small data sets
4.4.2 Results for problems with large data sets
4.4.3 Results for problems with standard data
4.4.4 Managerial insights for the results
4.5 Summary
Chapter 5 Conclusions and Future Research
Appendices
A Initial Sequence of the Hybrid Tabu Search
B Procedure of the Hybrid Tabu Search
C Small Example of the Hybrid Tabu Search
D Linearization Technique of Bilinear Form in R2
Bibliography

SNU Open Repository and Archive