18 research outputs found

    Diverse Data Selection under Fairness Constraints

    Diversity is an important principle in data selection and summarization, facility location, and recommendation systems. Our work focuses on maximizing diversity in data selection while offering fairness guarantees. In particular, we offer the first study that augments the Max-Min diversification objective with fairness constraints. More specifically, given a universe U of n elements that can be partitioned into m disjoint groups, we aim to retrieve a k-sized subset that maximizes the pairwise minimum distance within the set (diversity) and contains a pre-specified number k_i of elements from each group i (fairness). We show that this problem is NP-complete even in metric spaces, and we propose three novel algorithms, linear in n, that provide strong theoretical approximation guarantees for different values of m and k. Finally, we extend our algorithms and analysis to the case where groups can be overlapping.
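
    The objective and constraint described in this abstract can be made concrete with a small sketch. The code below is not one of the three algorithms from the paper: it is a plain farthest-first heuristic restricted to groups that still have quota, and it carries no approximation guarantee. The function names, the Euclidean distance, and the dictionary-based quota format are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist


def min_pairwise_distance(points):
    """Diversity of a set: the smallest distance between any two members."""
    return min((dist(p, q) for p, q in combinations(points, 2)),
               default=float("inf"))


def is_fair(selected, groups, quotas):
    """True when the selection holds exactly quotas[g] elements of each group g."""
    counts = {g: 0 for g in quotas}
    for i in selected:
        counts[groups[i]] = counts.get(groups[i], 0) + 1
    return counts == quotas


def greedy_fair_max_min(points, groups, quotas):
    """Illustrative heuristic (an assumption, not the paper's method):
    repeatedly pick the point farthest from the current selection,
    considering only groups whose quota is not yet filled."""
    remaining = dict(quotas)
    k = sum(quotas.values())
    selected = []
    while len(selected) < k:
        best, best_score = None, -1.0
        for i, p in enumerate(points):
            if i in selected or remaining.get(groups[i], 0) == 0:
                continue
            # Distance from the candidate to its nearest already-chosen point.
            score = min((dist(p, points[j]) for j in selected),
                        default=float("inf"))
            if score > best_score:
                best, best_score = i, score
        if best is None:
            raise ValueError("quotas cannot be met with the given groups")
        selected.append(best)
        remaining[groups[best]] -= 1
    return selected
```

    For four points on a line at x = 0, 1, 10, 11, with the first two in group "a", the last two in group "b", and a quota of one element per group, this heuristic selects the points at x = 0 and x = 11.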

    Crossing Generative Adversarial Networks for Cross-View Person Re-identification

    Person re-identification (re-id) refers to matching pedestrians across disjoint, non-overlapping camera views. The most effective way to match pedestrians who undergo significant visual variations is to seek reliably invariant features that describe the person of interest faithfully. Most existing methods are supervised and produce discriminative features by relying on labeled pairs of corresponding images. However, annotating image pairs is prohibitively labor-intensive, and thus not practical for large-scale camera networks. Moreover, seeking comparable representations across camera views demands a flexible model that can handle the complex distributions of images. In this work, we study the co-occurrence statistic patterns between pairs of images, and propose the Crossing Generative Adversarial Network (Cross-GAN) for learning a joint distribution for cross-image representations in an unsupervised manner. Given a pair of person images, the proposed model consists of a variational auto-encoder that encodes the pair into respective latent variables, a proposed cross-view alignment that reduces the view disparity, and an adversarial layer that seeks the joint distribution of the latent representations. The learned latent representations are well aligned and reflect the co-occurrence patterns of paired images. We empirically evaluate the proposed model on challenging datasets, and our results show the importance of joint invariant features in improving the matching rates of person re-id in comparison with semi-supervised and unsupervised state-of-the-art methods.
    Comment: 12 pages. arXiv admin note: text overlap with arXiv:1702.03431 by other authors.

    Diversity and Novelty: Measurement, Learning and Optimization

    The primary objective of this dissertation is to investigate research methods to answer the question: "How (and why) does one measure, learn and optimize novelty and diversity of a set of items?" The computational models we develop to answer this question also provide foundational mathematical techniques that shed light on the following three questions: 1. How does one reliably measure the creativity of ideas? 2. How does one form teams to evaluate design ideas? 3. How does one filter good ideas out of hundreds of submissions? Solutions to these questions are key to enabling the effective processing of a large collection of design ideas generated in a design contest. In the first part of the dissertation, we discuss key qualities needed in design metrics and propose new diversity and novelty metrics for judging design products. We show that the proposed metrics have higher accuracy and sensitivity than existing alternatives in the literature. To measure the novelty of a design item, we propose learning from human subjective responses to derive low-dimensional triplet embeddings. To measure diversity, we propose an entropy-based diversity metric, which is more accurate and sensitive than benchmarks. In the second part of the dissertation, we introduce the bipartite b-matching problem and argue the need for incorporating diversity in the objective function for matching problems. We propose new submodular and supermodular objective functions to measure diversity and develop multiple matching algorithms for diverse team formation in offline and online cases. Finally, in the third part, we demonstrate filtering and ranking of ideas using diversity metrics based on Determinantal Point Processes as well as submodular functions. In real-world crowd experiments, we demonstrate that such ranking increases the efficiency of filtering high-quality ideas compared to traditionally used methods.
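
    As a rough illustration of what an entropy-based diversity score can look like, the sketch below computes the Shannon entropy of category frequencies. The abstract does not define the dissertation's metric, so this is a generic stand-in: the function name and the assumption that every item already carries a category label are hypothetical.

```python
from collections import Counter
from math import log


def entropy_diversity(labels):
    """Shannon entropy (in nats) of the category frequencies in `labels`.

    Generic illustration only; the dissertation's own metric may differ.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * log(c / total) for c in counts.values())
```

    Under this score, a set whose items all share one category has zero diversity, and a set spread evenly over c categories reaches the maximum value log(c).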

    Finding important entities in graphs

    Graphs are established as one of the most prominent means of data representation. They are composed of simple entities (nodes and edges) and reflect the relationships between them. Their impact extends to a broad variety of domains, e.g., biology, sociology and the Web. In these settings, much of the data value can be captured by a simple question: how can we evaluate the importance of these entities? The aim of this dissertation is to explore novel importance measures that are meaningful and can be computed efficiently on large datasets. First, we focus on the spanning edge centrality, an edge importance measure recently introduced to evaluate phylogenetic trees. We propose very efficient methods that approximate this measure in near-linear time and apply them to large graphs with millions of nodes. We demonstrate that this centrality measure is a useful tool for the analysis of networks outside its original application domain. Next, we turn to importance measures for nodes and propose the absorbing random walk centrality. This measure evaluates a group of nodes in a graph according to how central they are with respect to a set of query nodes. Specifically, given a query set and a candidate group of nodes, we start random walks from the queries and measure their length until they reach one of the candidates. The most central group of nodes will collectively minimize the expected length of these random walks. We prove several computational properties of this measure and provide an algorithm whose solutions offer an approximation guarantee. Additionally, we develop efficient heuristics that allow us to use this importance measure in large datasets. Finally, we consider graphs in which each node is assigned a set of attributes. We define an important connected subgraph to be one for which the total weight of its edges is small, while the number of attributes covered by its nodes is large. To select such an important subgraph, we develop an efficient approximation algorithm based on the primal-dual schema.
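
    The quantity behind the absorbing random walk centrality, the expected length of a walk from the query nodes until it hits the candidate group, can be sketched as follows. This is a simplified illustration under assumptions that go beyond the abstract: a simple unweighted random walk with uniform neighbor choice, a connected undirected graph, and a fixed-point iteration instead of the dissertation's own algorithms. All function and parameter names are hypothetical.

```python
def expected_absorption_steps(adjacency, candidates, tol=1e-12, max_iter=100_000):
    """Expected number of steps for a simple random walk to reach `candidates`.

    `adjacency` maps each node to a list of its neighbours. Assumes every
    node has at least one neighbour and can reach some candidate node.
    """
    steps = {v: 0.0 for v in adjacency}
    for _ in range(max_iter):
        delta = 0.0
        for v, nbrs in adjacency.items():
            if v in candidates:
                continue  # absorbing nodes keep an expected length of zero
            # One step plus the average expected length over the neighbours.
            new = 1.0 + sum(steps[u] for u in nbrs) / len(nbrs)
            delta = max(delta, abs(new - steps[v]))
            steps[v] = new
        if delta < tol:
            break
    return steps


def group_centrality(adjacency, queries, candidates):
    """Average expected walk length from the queries; lower is more central."""
    steps = expected_absorption_steps(adjacency, set(candidates))
    return sum(steps[q] for q in queries) / len(queries)
```

    On the path graph 0 - 1 - 2 with query node 0 and candidate group {2}, the expected walk length satisfies h(1) = 1 + h(0)/2 and h(0) = 1 + h(1), which gives h(0) = 4. Comparing this value across candidate groups of equal size is the selection problem the abstract describes.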

    Rigorous optimization recipes for sparse and low rank inverse problems with applications in data sciences

    Many natural and man-made signals can be described as having a few degrees of freedom relative to their size due to natural parameterizations or constraints; examples include bandlimited signals, collections of signals observed from multiple viewpoints in a network of sensors, and per-flow traffic measurements of the Internet. Low-dimensional models (LDMs) mathematically capture the inherent structure of such signals via combinatorial and geometric data models, such as sparsity, unions of subspaces, low-rankness, manifolds, and mixtures of factor analyzers, and are emerging to revolutionize the way we treat inverse problems (e.g., signal recovery, parameter estimation, or structure learning) from dimensionality-reduced or incomplete data. Assuming our problem resides in an LDM space, in this thesis we investigate how to integrate such models in convex and non-convex optimization algorithms for significant gains in computational complexity. We mostly focus on two LDMs: (i) sparsity and (ii) low-rankness. We study trade-offs and their implications to develop efficient and provable optimization algorithms and, more importantly, to exploit convex and combinatorial optimization that can enable cross-pollination of decades of research in both.

    29th International Symposium on Algorithms and Computation: ISAAC 2018, December 16-19, 2018, Jiaoxi, Yilan, Taiwan


    Online Advertising Assignment Problems Considering Realistic Constraints

    Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Industrial Engineering, August 2020. Ilkyeong Moon (문일경).
    With a drastic increase in online communities, many companies have been paying attention to online advertising. The main advantages of online advertising are traceability, cost-effectiveness, reachability, and interactivity. These benefits sustain the continuing popularity of online advertising. For Internet-based companies, a well-constructed online advertisement assignment increases revenue. Hence, managers need to develop decision-making processes for assigning online advertisements on their websites so that revenue is maximized. In this dissertation, we consider online advertising assignment problems under realistic constraints. We study three types of online advertising assignment problems: (i) the display ads problem in adversarial order, (ii) the display ads problem in probabilistic order, and (iii) online banner advertisement scheduling for advertising effectiveness. Unlike previous assignment problems, these problems take a pragmatic approach that reflects realistic constraints and advertising effectiveness. Moreover, the algorithms designed in the dissertation offer important insights into the online advertisement assignment problem. Chapter 1 gives a brief explanation of the fundamental methodologies for solving online advertising assignment problems, and closes with the contributions and outline of the dissertation. In Chapter 2, we propose the display ads problem in adversarial order. Deterministic algorithms with worst-case guarantees are designed, and their competitive ratios are presented. Upper bounds for the problem are also proved. Chapter 3 investigates the display ads problem in probabilistic order. It presents stochastic online algorithms with scenario-based stochastic programming and Benders decomposition for two probabilistic order models. In Chapter 4, an online banner advertisement scheduling model for advertising effectiveness is designed. We also present the solution methodologies used to obtain valid lower and upper bounds of the model efficiently. Chapter 5 offers conclusions and suggestions for future studies. The approaches to solving the problems are meaningful in both academic and industrial areas. Computational experiments validate that these approaches solve the problems efficiently and effectively. The models and solution methodologies are expected to be convenient and beneficial when managers at Internet-based companies place online advertisements on their websites.

    Contents
    Chapter 1 Introduction
        1.1 Display Ads Problem
            1.1.1 Online Algorithm
        1.2 Online Banner Advertisement Scheduling Problem
        1.3 Research Motivations and Contributions
        1.4 Outline of the Dissertation
    Chapter 2 Online Advertising Assignment Problem in Adversarial Order
        2.1 Problem Description and Literature Review
        2.2 Display Ads Problem in Adversarial Order
        2.3 Deterministic Algorithms for Adversarial Order
        2.4 Upper Bounds of Deterministic Algorithms for Adversarial Order
        2.5 Summary
    Chapter 3 Online Advertising Assignment Problem in Probabilistic Order
        3.1 Problem Description and Literature Review
        3.2 Display Ads Problem in Probabilistic Order
        3.3 Stochastic Online Algorithms for Probabilistic Order
            3.3.1 Two-Stage Stochastic Programming
            3.3.2 Known IID model
            3.3.3 Random permutation model
            3.3.4 Stochastic approach using primal-dual algorithm
        3.4 Computational Experiments
            3.4.1 Results for known IID model
            3.4.2 Results for random permutation model
            3.4.3 Managerial insights for Algorithm 3.1
        3.5 Summary
    Chapter 4 Online Banner Advertisement Scheduling for Advertising Effectiveness
        4.1 Problem Description and Literature Review
        4.2 Mathematical Model
            4.2.1 Objective function
            4.2.2 Notations and formulation
        4.3 Solution Methodologies
            4.3.1 Heuristic approach to finding valid lower and upper bounds
            4.3.2 Hybrid tabu search
        4.4 Computational Experiments
            4.4.1 Results for problems with small data sets
            4.4.2 Results for problems with large data sets
            4.4.3 Results for problems with standard data
            4.4.4 Managerial insights for the results
        4.5 Summary
    Chapter 5 Conclusions and Future Research
    Appendices
        A Initial Sequence of the Hybrid Tabu Search
        B Procedure of the Hybrid Tabu Search
        C Small Example of the Hybrid Tabu Search
        D Linearization Technique of Bilinear Form in R2
    Bibliography