
    Towards a semantic and statistical selection of association rules

    The rapid growth of databases creates an urgent need for more accurate methods for understanding the stored data. In this context, association rules have been used extensively to analyze and comprehend huge amounts of data. However, the number of generated rules is too large to be efficiently analyzed and explored in any further process. Association rule selection is a classical approach to this issue, yet new, innovative approaches are still required to help decision makers. Hence, many interestingness measures have been defined to statistically evaluate and filter association rules. However, these measures present two major problems. On the one hand, they do not allow irrelevant rules to be eliminated; on the other hand, their abundance leads to heterogeneous evaluation results, which causes confusion in decision making. In this paper, we propose a two-winged approach to select statistically interesting and semantically incomparable rules. Our statistical selection helps discover interesting association rules without favoring or excluding any measure. The semantic comparability helps decide whether the considered association rules are semantically related, i.e., comparable. The outcomes of our experiments on real datasets show promising results in terms of reduction in the number of rules.
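    To illustrate why several interestingness measures can disagree about the same rule, here is a minimal sketch computing three standard measures (support, confidence, lift) for one rule. The toy transactions and the function name `rule_measures` are assumptions made for illustration; this is not the selection method proposed in the paper.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    # Count transactions containing the antecedent, the consequent, and both.
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical toy data, not taken from the paper's experiments.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk"}),
]

support, confidence, lift = rule_measures(
    transactions, frozenset({"bread"}), frozenset({"milk"})
)
print(support, confidence, lift)
```

    On this toy data the rule bread -> milk has a fairly high confidence (0.75) but a lift below 1, so a confidence-based filter would keep it while a lift-based filter would discard it. This is the kind of heterogeneity the abstract refers to.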

    RRR: Rank-Regret Representative

    Selecting the best items in a dataset is a common task in data exploration. However, the concept of "best" lies in the eyes of the beholder: different users may consider different attributes more important, and hence arrive at different rankings. Nevertheless, one can remove "dominated" items and create a "representative" subset of the data set, comprising the "best items" in it. A Pareto-optimal representative is guaranteed to contain the best item of each possible ranking, but it can be almost as big as the full data. A smaller representative can be found if we relax the requirement to include the best item for every possible user, and instead just limit the users' "regret". Existing work defines regret as the loss in score incurred by limiting consideration to the representative instead of the full data set, for any chosen ranking function. However, the score is often not a meaningful number, and users may not understand its absolute value. Sometimes small ranges in score can include large fractions of the data set. In contrast, users do understand the notion of rank ordering. Therefore, we instead use the position of the items in the ranked list to define the regret, and propose the rank-regret representative as the minimal subset of the data containing at least one of the top-k items of any possible ranking function. This problem is NP-complete. We use the geometric interpretation of items to bound their ranks on ranges of functions, and utilize combinatorial geometry notions to develop effective and efficient approximation algorithms for the problem. Experiments on real datasets demonstrate that we can efficiently find small subsets with small rank-regrets.
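    The rank-based notion of regret can be made concrete with a small sketch. It assumes linear ranking functions with non-negative weights and estimates the worst case by random sampling; the function names, the toy items, and the sampling strategy are assumptions for illustration, not the approximation algorithms developed in the paper.

```python
import random


def rank_regret(items, subset_idx, weights):
    """Best (smallest) 1-based rank reached by any subset item under one linear ranking."""
    scores = [sum(w * x for w, x in zip(weights, item)) for item in items]
    # Sort item indices by descending score; ties keep their original order.
    order = sorted(range(len(items)), key=lambda i: -scores[i])
    rank_of = {idx: pos + 1 for pos, idx in enumerate(order)}
    return min(rank_of[i] for i in subset_idx)


def sampled_max_rank_regret(items, subset_idx, n_functions=1000, seed=0):
    """Estimate the worst rank-regret over randomly sampled weight vectors.

    Sampling only gives a lower bound on the true maximum over all functions.
    """
    rng = random.Random(seed)
    dim = len(items[0])
    worst = 0
    for _ in range(n_functions):
        weights = [rng.random() for _ in range(dim)]
        worst = max(worst, rank_regret(items, subset_idx, weights))
    return worst


# Hypothetical two-attribute items; higher values are better.
items = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.2, 0.2)]

print(rank_regret(items, [2], (1.0, 0.0)))   # rank of the balanced item when only attribute 1 matters
print(sampled_max_rank_regret(items, [2]))   # sampled worst case for the single-item subset
```

    Here the single balanced item (0.6, 0.6) is never ranked worse than second under any non-negative linear function, so the subset containing only that item has a rank-regret of at most 2, even though it is not the top item for every user.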

    A search for disk-galaxy lenses in the Sloan Digital Sky Survey

    We present the first automated spectroscopic search for disk-galaxy lenses, using the Sloan Digital Sky Survey database. We follow up eight gravitational lens candidates, selected from a sample of ~40,000 candidate massive disk galaxies, using a combination of ground-based imaging and long-slit spectroscopy. We confirm two gravitational lens systems: one probable disk galaxy, and one probable S0 galaxy. The remaining systems are four promising disk-galaxy lens candidates, as well as two probable gravitational lenses whose lens galaxy might be an S0 galaxy. The redshifts of the lenses are z_lens ~ 0.1. The redshift range of the background sources is z_source ~ 0.3 - 0.7. The systems presented here are (confirmed or candidate) galaxy-galaxy lensing systems, that is, systems where the multiple images are faint and extended, allowing an accurate determination of the lens galaxy mass and light distributions without contamination from the background galaxy. Moreover, the low redshift of the (confirmed or candidate) lens galaxies is favorable for measuring rotation points to complement the lensing study. We estimate the rest-frame total mass-to-light ratio within the Einstein radius for the two confirmed lenses: we find M_tot/L_I = 5.4 ± 1.5 within 3.9 ± 0.9 kpc for SDSS J081230.30+543650.9, and M_tot/L_I = 1.5 ± 0.9 within 1.4 ± 0.8 kpc for SDSS J145543.55+530441.2 (all in solar units). Hubble Space Telescope or Adaptive Optics imaging is needed to further study the systems. Comment: ApJ, accepte