217 research outputs found

    Multi-Source Spatial Entity Extraction and Linkage


    Learning what matters - Sampling interesting patterns

    In the field of exploratory data mining, local structure in data can be described by patterns and discovered by mining algorithms. Although many solutions have been proposed to address the redundancy problems in pattern mining, most of them either provide succinct pattern sets or take the interests of the user into account, but not both. Consequently, the analyst has to invest substantial effort in identifying those patterns that are relevant to her specific interests and goals. To address this problem, we propose a novel approach that combines pattern sampling with interactive data mining. In particular, we introduce the LetSIP algorithm, which builds upon recent advances in 1) weighted sampling in SAT and 2) learning to rank in interactive pattern mining. Specifically, it exploits user feedback to directly learn the parameters of the sampling distribution that represents the user's interests. We compare the performance of the proposed algorithm to the state of the art in interactive pattern mining by emulating the interests of a user. The resulting system allows efficient and interleaved learning and sampling, and thus user-specific anytime data exploration. Finally, LetSIP demonstrates favourable trade-offs concerning both quality-diversity and exploitation-exploration when compared to existing methods. Comment: PAKDD 2017, extended version

    Interactive data analysis and its applications on multi-structured datasets

    Ph.D., Doctor of Philosophy

    Label Ranking with Probabilistic Models

    This thesis focuses on a particular form of prediction known as label ranking. In a nutshell, label ranking can be regarded as an extension of the conventional classification problem. Given a query (e.g., from a customer) and a predefined set of candidate labels (e.g., Audi, BMW, VW), classification requires a single label (e.g., BMW) as its prediction, whereas label ranking requires a complete ranking of all labels (e.g., BMW > VW > Audi). Because predictions of this kind are useful in many real-world problems, label ranking methods can be applied in several areas, including information retrieval, customer preference learning, and e-commerce. This thesis presents a selection of label ranking methods that combine machine learning with statistical ranking models. We concentrate on two statistical ranking models, the Mallows model and the Plackett-Luce model, and on two machine learning techniques, instance-based learning and the generalized linear model.
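
As a rough illustration of the Plackett-Luce model named in the abstract above, the sketch below computes the probability of a complete ranking from per-label positive "skill" parameters. It uses the standard textbook form of the model, not the thesis's own method; the function name `pl_probability` and the skill values are assumptions made up for this example.

```python
from itertools import permutations


def pl_probability(ranking, skill):
    """Probability of `ranking` under a Plackett-Luce model.

    Labels are drawn one position at a time; at each step a label is chosen
    with probability proportional to its skill among the labels not yet placed.
    """
    prob = 1.0
    remaining = list(ranking)
    for label in ranking:
        prob *= skill[label] / sum(skill[other] for other in remaining)
        remaining.remove(label)
    return prob


# Hypothetical skill parameters for the three labels used in the abstract.
skill = {"BMW": 3.0, "VW": 2.0, "Audi": 1.0}

# The most probable ranking sorts the labels by decreasing skill.
mode = sorted(skill, key=skill.get, reverse=True)

# Sanity check: the probabilities of all possible rankings sum to one.
total = sum(pl_probability(list(p), skill) for p in permutations(skill))
```

In a label ranking setting the skill parameters would depend on the query; here they are fixed constants to keep the sketch small.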

    High-Precision Localization Using Ground Texture

    Location-aware applications play an increasingly critical role in everyday life. However, satellite-based localization (e.g., GPS) has limited accuracy and can be unusable in dense urban areas and indoors. We introduce an image-based global localization system that is accurate to a few millimeters and performs reliable localization both indoors and outdoors. The key idea is to capture and index distinctive local keypoints in ground textures. This is based on the observation that ground textures including wood, carpet, tile, concrete, and asphalt may look random and homogeneous, but all contain cracks, scratches, or unique arrangements of fibers. These imperfections are persistent and can serve as local features. Our system incorporates a downward-facing camera to capture the fine texture of the ground, together with an image processing pipeline that locates the captured texture patch in a compact database constructed offline. We demonstrate the capability of our system to robustly, accurately, and quickly locate test images on various types of outdoor and indoor ground surfaces.

    RRR: Rank-Regret Representative

    Selecting the best items in a dataset is a common task in data exploration. However, the concept of "best" lies in the eyes of the beholder: different users may consider different attributes more important, and hence arrive at different rankings. Nevertheless, one can remove "dominated" items and create a "representative" subset of the data set, comprising the "best items" in it. A Pareto-optimal representative is guaranteed to contain the best item of each possible ranking, but it can be almost as big as the full data. A smaller representative can be found if we relax the requirement to include the best item for every possible user, and instead just limit the users' "regret". Existing work defines regret as the loss in score incurred by limiting consideration to the representative instead of the full data set, for any chosen ranking function. However, the score is often not a meaningful number, and users may not understand its absolute value. Sometimes small ranges in score can include large fractions of the data set. In contrast, users do understand the notion of rank ordering. Therefore, we instead use the position of items in the ranked list to define regret, and propose the rank-regret representative: the minimal subset of the data containing at least one of the top-k items of any possible ranking function. This problem is NP-complete. We use the geometric interpretation of items to bound their ranks on ranges of functions, and we utilize notions from combinatorial geometry to develop effective and efficient approximation algorithms for the problem. Experiments on real datasets demonstrate that we can efficiently find small subsets with small rank-regrets.
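
To make the rank-based notion of regret in the abstract above concrete, the sketch below estimates the rank-regret of a candidate subset by brute force. It assumes two-dimensional items, linear ranking functions with nonnegative weights, and a finite sample of weight vectors, so it only approximates the worst case over all functions and is not the paper's algorithm. The names `rank_regret` and `angle_weights` and the toy data are invented for this example.

```python
import math


def rank_regret(items, subset, weights):
    """Worst best-rank of `subset` over the sampled linear ranking functions.

    For each weight vector, items are ranked by decreasing score (rank 1 is
    best) and the best rank reached by any subset member is recorded. The
    rank-regret estimate is the largest such value.
    """
    worst = 0
    for w in weights:
        scores = [sum(wi * xi for wi, xi in zip(w, item)) for item in items]
        order = sorted(range(len(items)), key=lambda i: -scores[i])
        rank = {idx: pos + 1 for pos, idx in enumerate(order)}
        worst = max(worst, min(rank[i] for i in subset))
    return worst


def angle_weights(n):
    """Sample n >= 2 weight vectors evenly spaced between the two axes."""
    step = (math.pi / 2) / (n - 1)
    return [(math.cos(k * step), math.sin(k * step)) for k in range(n)]


# Toy data: two extreme items, one balanced item, one dominated item.
items = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7), (0.2, 0.2)]
weights = angle_weights(5)

regret_balanced = rank_regret(items, [2], weights)       # only the balanced item
regret_skyline = rank_regret(items, [0, 1, 2], weights)  # all non-dominated items
```

A subset is a valid representative for a given k when its estimated rank-regret is at most k; the minimisation over subsets, which is the hard part of the problem, is not attempted here.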

    A Study on the Differentiation of Visual Communication for Endangered Animal Protection Campaigns

    Thesis (Master's) -- Seoul National University Graduate School: College of Fine Arts, Department of Design, Design Major, 2019. 8. 김경선.

    Endangered animal-related conservation campaigns have a long history, dating back to the late 19th century, when natural resources were severely over-exploited. Over time, these campaigns have passed through eras of legislation, resource management, environmentalism, and modern landscape-scale conservation. As the situation of endangered species worsens, with an extinction rate 1000 times higher than the background rate, wildlife conservation is becoming more and more significant. As a result, the issue has gained popularity in present-day societies, accompanied by increased investment in nature documentaries and related campaigns. However, the visual communication strategy of these campaigns, which is based on violence and conflict, has remained unchanged for a long time. This research finds that more than 80% of endangered animal-related campaigns prefer to display conflicting human-nature relationships by showing violent, cruel, or bloody scenes to the audience. Moreover, scientifically confusing or incorrect information, and even ideas of radical environmentalism, are concealed in some of the campaign designs. A review of the history of western societies shows that the philosophy of binary opposition has contributed significantly to the current design preference and profoundly influences the public. Recent research in psychology also gives evidence that using horror and terror is an effective strategy for arousing the public's attention. To avoid the increasingly apparent disadvantages of this strategy, and to adapt to the changing situation of conservation education, a differentiated approach is needed.

    Through a discussion of the features, the motivation, and the influence of the current violence-conflict-based strategy, combining traditional oriental philosophy and art emerges as an ideal way to offer a different design proposal for endangered animal-related campaigns. Drawing on Confucian thought, the final project will create a more harmonious relationship between humans and nature, using symbolic visual elements to build a bridge linking humans and endangered species. Positive visual elements and nudges will also be used in the final project to encourage the audience to act more positively and actively.

    Contents:
    1. Introduction
    2. The Necessity of Diversified Endangered Animal-Related Campaigns
       2.1. The Current Situation of Endangered Animals
       2.2. A Brief History of Wildlife Conservation
    3. Analysis of the Current Design Preference of Endangered Animal-Related Campaigns
       3.1. General Analysis of Visual Element Usage
       3.2. The Utilization of Violence
       3.3. The Utilization of Negative Human Images
       3.4. Questionable Information Due to Rote Scene Generating
       3.5. Conclusion
    4. The Motivation and the Influence of the Current Design Preference
       4.1. The Motivation on Strategic Planning Level
       4.2. The Motivation on Problem Resolving Level
       4.3. The Influence of Violence-Conflict-Based Strategy
    5. Differentiated Approach in Communication of Endangered Animal-Related Campaigns
       5.1. Using Positive Visual Elements
       5.2. Applying Oriental Philosophy
       5.3. The Confluent Design Methodology
       5.4. Experimental Approaches
       5.5. The Final Project
    6. Discussions and Future Works
       6.1. The Results and Discussions
       6.2. Future Works
       6.3. Conclusion
    Bibliography
    Appendix I. Samples of Current Design Preference
    Appendix II. Final Design of the Posters
    Appendix III. Localization of the Posters
    Visual Sources Citations
    Abstract in Korean
    Acknowledgment

    Master

    Happiness Maximizing Sets under Group Fairness Constraints (Technical Report)

    Finding a happiness maximizing set (HMS) from a database, i.e., selecting a small subset of tuples that preserves the best score with respect to any nonnegative linear utility function, is an important problem in multi-criteria decision-making. When an HMS is extracted from a set of individuals to assist data-driven algorithmic decisions such as hiring and admission, it is crucial to ensure that the HMS can fairly represent different groups of candidates without bias and discrimination. However, although the HMS problem was extensively studied in the database community, existing algorithms do not take group fairness into account and may provide solutions that under-represent some groups. In this paper, we propose and investigate a fair variant of HMS (FairHMS) that not only maximizes the minimum happiness ratio but also guarantees that the number of tuples chosen from each group falls within predefined lower and upper bounds. Similar to the vanilla HMS problem, we show that FairHMS is NP-hard in three and higher dimensions. Therefore, we first propose an exact interval cover-based algorithm called IntCov for FairHMS on two-dimensional databases. Then, we propose a bicriteria approximation algorithm called BiGreedy for FairHMS on multi-dimensional databases by transforming it into a submodular maximization problem under a matroid constraint. We also design an adaptive sampling strategy to improve the practical efficiency of BiGreedy. Extensive experiments on real-world and synthetic datasets confirm the efficacy and efficiency of our proposal. Comment: Technical report, a shorter version to appear in PVLDB 16(2
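
The two quantities in the abstract above, the minimum happiness ratio and the per-group bounds, can be sketched as follows. This is a minimal illustration that assumes a finite sample of nonnegative utility vectors in place of all linear utility functions; it is neither IntCov nor BiGreedy, and the function names, group labels, and data are hypothetical.

```python
from collections import Counter


def dot(u, p):
    """Linear utility of tuple p under weight vector u."""
    return sum(ui * pi for ui, pi in zip(u, p))


def min_happiness_ratio(data, subset, utilities):
    """Smallest ratio, over the sampled utilities, between the best score in
    the subset and the best score in the whole database."""
    ratios = []
    for u in utilities:
        best_all = max(dot(u, p) for p in data)
        best_sub = max(dot(u, data[i]) for i in subset)
        ratios.append(best_sub / best_all)
    return min(ratios)


def satisfies_group_bounds(groups, subset, bounds):
    """True if the number of chosen tuples in each group lies within that
    group's (lower, upper) bound."""
    counts = Counter(groups[i] for i in subset)
    return all(lo <= counts.get(g, 0) <= hi for g, (lo, hi) in bounds.items())


# Toy database with one group label per tuple.
data = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
groups = ["a", "b", "a"]
utilities = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
bounds = {"a": (1, 1), "b": (1, 1)}

ratio_extremes = min_happiness_ratio(data, [0, 1], utilities)
fair_extremes = satisfies_group_bounds(groups, [0, 1], bounds)
```

A fair solution in this sense is a subset that passes `satisfies_group_bounds` while keeping `min_happiness_ratio` as large as possible.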