438 research outputs found

MALTS: Matching After Learning to Stretch

Author: Parikh Harsh
Rudin Cynthia
Volfovsky Alexander
Publication date: 22/09/2021

We introduce a flexible framework that produces high-quality almost-exact matches for causal inference. Most prior work in matching uses ad-hoc distance metrics, often leading to poor quality matches, particularly when there are irrelevant covariates. In this work, we learn an interpretable distance metric for matching, which leads to substantially higher quality matches. The learned distance metric stretches the covariate space according to each covariate's contribution to outcome prediction: this stretching means that mismatches on important covariates carry a larger penalty than mismatches on irrelevant covariates. Our ability to learn flexible distance metrics leads to matches that are interpretable and useful for the estimation of conditional average treatment effects. Comment: 40 pages, 5 tables, 12 figures
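The stretching idea in this abstract can be illustrated with a small sketch. It assumes the per-covariate stretch values have already been learned; the learning step itself is not reproduced here, and the names `stretched_distance`, `nearest_match`, and the example numbers are hypothetical, not taken from the paper.

```python
import math

def stretched_distance(a, b, stretch):
    """Euclidean distance after scaling each covariate difference by its stretch."""
    return math.sqrt(sum((s * (x - y)) ** 2 for s, x, y in zip(stretch, a, b)))

def nearest_match(unit, candidates, stretch):
    """Index of the candidate closest to `unit` under the stretched metric."""
    return min(
        range(len(candidates)),
        key=lambda i: stretched_distance(unit, candidates[i], stretch),
    )

# Assumed (not learned) stretches: covariate 0 is important, covariate 1 is irrelevant.
stretch = [1.0, 0.0]
treated = [0.5, 10.0]
controls = [[0.4, -50.0], [2.0, 10.0]]

best = nearest_match(treated, controls, stretch)               # ignores covariate 1
naive = nearest_match(treated, controls, [1.0, 1.0])           # unweighted baseline
```

With these made-up values the stretched metric picks the control that agrees on the important covariate, while the unweighted metric is pulled toward the control that agrees on the irrelevant one.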

arXiv.org e-Print Archive

Improving the Performance of K-Means for Color Quantization

Author: M. Emre Celebi
Publication venue: Elsevier BV
Publication date: 02/01/2011

Color quantization is an important operation with many applications in graphics and image processing. Most quantization methods are essentially based on data clustering algorithms. However, despite its popularity as a general purpose clustering algorithm, k-means has not received much respect in the color quantization literature because of its high computational requirements and sensitivity to initialization. In this paper, we investigate the performance of k-means as a color quantizer. We implement fast and exact variants of k-means with several initialization schemes and then compare the resulting quantizers to some of the most popular quantizers in the literature. Experiments on a diverse set of images demonstrate that an efficient implementation of k-means with an appropriate initialization strategy can in fact serve as a very effective color quantizer. Comment: 26 pages, 4 figures, 13 tables
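As a rough illustration of k-means used as a color quantizer, the sketch below runs plain batch (Lloyd-style) k-means on RGB tuples. It is not one of the accelerated variants or initialization schemes evaluated in the paper; the initial centers are simply supplied by the caller, and the function name `kmeans_quantize` and the sample pixels are assumptions made for this example.

```python
def kmeans_quantize(pixels, centers, iterations=10):
    """Plain batch k-means on RGB tuples; returns (palette, labels)."""
    centers = [tuple(float(c) for c in center) for center in centers]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: give each pixel the index of its nearest center.
        for i, p in enumerate(pixels):
            labels[i] = min(
                range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])),
            )
        # Update step: move each center to the mean of its assigned pixels.
        for j in range(len(centers)):
            members = [p for p, label in zip(pixels, labels) if label == j]
            if members:  # leave an empty cluster's center where it is
                centers[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centers, labels

# Four made-up pixels: two reddish, two bluish; quantize to a 2-color palette.
pixels = [(250, 0, 0), (240, 10, 0), (0, 0, 250), (10, 0, 240)]
palette, labels = kmeans_quantize(pixels, centers=[pixels[0], pixels[2]])
quantized = [palette[label] for label in labels]
```

Each pixel is replaced by its palette entry, so the output image uses at most as many colors as there are centers. The result depends on the initial centers, which is the sensitivity the abstract refers to.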

arXiv.org e-Print Archive

Crossref

A survey on online active learning

Author: Cacciarelli Davide
Kulahci Murat
Publication date: 14/03/2023

Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream. The problem of minimizing the cost associated with collecting labeled observations has gained a lot of attention in recent years, particularly in real-world applications where data is only available in an unlabeled form. Annotating each observation can be time-consuming and costly, making it difficult to obtain large amounts of labeled data. To overcome this issue, many active learning strategies have been proposed in recent decades, aiming to select the most informative observations for labeling in order to improve the performance of machine learning models. These approaches can be broadly divided into two categories: static pool-based and stream-based active learning. Pool-based active learning involves selecting a subset of observations from a closed pool of unlabeled data, and it has been the focus of many surveys and literature reviews. However, the growing availability of data streams has led to an increase in the number of approaches that focus on online active learning, which involves continuously selecting and labeling observations as they arrive in a stream. This work aims to provide an overview of the most recently proposed approaches for selecting the most informative observations from data streams in the context of online active learning. We review the various techniques that have been proposed and discuss their strengths and limitations, as well as the challenges and opportunities that exist in this area of research. Our review aims to provide a comprehensive and up-to-date overview of the field and to highlight directions for future work.
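The stream-based setting described above can be sketched as a single pass over incoming observations, where a label is requested only when the current model is uncertain. This is one simple strategy (margin-based uncertainty with a fixed labeling budget) chosen for illustration, not a method attributed to the survey; `predict_proba`, `oracle`, `margin`, and `budget` are hypothetical names.

```python
def stream_active_learning(stream, predict_proba, oracle, margin=0.2, budget=3):
    """Query labels for uncertain stream items until the labeling budget is spent."""
    labeled = []
    for x in stream:
        if budget == 0:
            break  # no labeling budget left; stop querying
        p = predict_proba(x)  # model's estimated probability of the positive class
        if abs(p - 0.5) < margin:  # close to 0.5 means the model is uncertain
            labeled.append((x, oracle(x)))  # ask the annotator for the true label
            budget -= 1
            # A real system would typically update the model here.
    return labeled

# Toy stand-ins: the "model" reads the feature as a probability, the "oracle" thresholds it.
predict_proba = lambda x: min(max(x, 0.0), 1.0)
oracle = lambda x: int(x >= 0.5)

queried = stream_active_learning(
    [0.05, 0.45, 0.95, 0.6, 0.5, 0.55], predict_proba, oracle
)
```

Unlike the pool-based case, each decision is made once, when the observation arrives, without access to later items in the stream.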

arXiv.org e-Print Archive

Methods for Real-time Visualization and Interaction with Landforms

Author: Schneider Martin
Publication venue: Universitäts- und Landesbibliothek Bonn

This thesis presents methods to enrich data modeling and analysis in the geoscience domain with a particular focus on geomorphological applications. First, a short overview of the relevant characteristics of the used remote sensing data and basics of its processing and visualization are provided. Then, two new methods for the visualization of vector-based maps on digital elevation models (DEMs) are presented. The first method uses a texture-based approach that generates a texture from the input maps at runtime taking into account the current viewpoint. In contrast to that, the second method utilizes the stencil buffer to create a mask in image space that is then used to render the map on top of the DEM. A particular challenge in this context is posed by the view-dependent level-of-detail representation of the terrain geometry. After suitable visualization methods for vector-based maps have been investigated, two landform mapping tools for the interactive generation of such maps are presented. The user can carry out the mapping directly on the textured digital elevation model and thus benefit from the 3D visualization of the relief. Additionally, semi-automatic image segmentation techniques are applied in order to reduce the amount of user interaction required and thus make the mapping process more efficient and convenient. The challenge in the adaption of the methods lies in the transfer of the algorithms to the quadtree representation of the data and in the application of out-of-core and hierarchical methods to ensure interactive performance. Although high-resolution remote sensing data are often available today, their effective resolution at steep slopes is rather low due to the oblique acquisition angle. For this reason, remote sensing data are suitable to only a limited extent for visualization as well as landform mapping purposes. 
To provide an easy way to supply additional imagery, an algorithm for registering uncalibrated photos to a textured digital elevation model is presented. A particular challenge in registering the images is posed by large variations in the photos concerning resolution, lighting conditions, seasonal changes, etc. The registered photos can be used to increase the visual quality of the textured DEM, in particular at steep slopes. To this end, a method is presented that combines several georegistered photos into textures for the DEM. The difficulty in this compositing process is to create a consistent appearance and avoid visible seams between the photos. In addition, the photos also provide a valuable means to improve landform mapping. To this end, an extension of the landform mapping methods is presented that allows the utilization of the registered photos during mapping. This way, a detailed and exact mapping becomes feasible even at steep slopes.

bonndoc – Der Publikationsserver der Universität Bonn