
    Discovering Attractive Products based on Influence Sets

    Skyline queries have been widely used as a practical tool for multi-criteria decision analysis and for applications involving preference queries. For example, in a typical online retail application, skyline queries can help customers select the most interesting products among a pool of available ones. Recently, reverse skyline queries have been proposed, highlighting the manufacturer's perspective, i.e., how to determine the expected buyers of a given product. In this work we develop novel algorithms for two important classes of queries involving customer preferences. We first propose a novel algorithm, termed RSA, for answering reverse skyline queries. We then introduce a new type of query, the k-Most Attractive Candidates (k-MAC) query. In this type of query, given a set of existing product specifications P, a set of customer preferences C and a set of new candidate products Q, the k-MAC query returns the set of k candidate products from Q that jointly maximizes the total number of expected buyers, measured as the cardinality of the union of the individual reverse skyline sets (i.e., influence sets). Applying existing approaches to this problem would require calculating the reverse skyline set of each candidate, which is prohibitively expensive for large data sets. We therefore propose a batched algorithm for this problem and compare its performance against a branch-and-bound variant that we devise. Both algorithms use variants of our RSA algorithm at their core. Our experimental study on both synthetic and real data sets demonstrates that our proposed algorithms outperform existing or naive solutions to the studied classes of queries.
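
    To make the dominance test and the influence-set objective concrete, here is a minimal Python sketch. It uses a simplified distance-based proxy for the reverse skyline (a customer counts as an expected buyer of a candidate if no existing product is at least as close to the customer's preferred specification in every attribute and strictly closer in one) and picks k candidates by exhaustive search; the function names and this proxy are illustrative assumptions, not the paper's RSA, batched or branch-and-bound algorithms.

    from itertools import combinations

    def dominates(a, b):
        # a dominates b: no worse in every dimension, strictly better in at least one
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def influence_set(candidate, products, customers):
        # customers for which no existing product dominates the candidate
        # with respect to per-attribute distance from the customer's preference
        buyers = set()
        for i, pref in enumerate(customers):
            cand = tuple(abs(c - p) for c, p in zip(candidate, pref))
            if not any(dominates(tuple(abs(q - p) for q, p in zip(prod, pref)), cand)
                       for prod in products):
                buyers.add(i)
        return buyers

    def naive_k_mac(candidates, products, customers, k):
        # exhaustively choose the k candidates whose joint influence set is largest
        infl = [influence_set(c, products, customers) for c in candidates]
        best = max(combinations(range(len(candidates)), k),
                   key=lambda idx: len(set().union(*(infl[i] for i in idx))))
        return [candidates[i] for i in best]

    Computing every influence set and then scanning all k-subsets, as above, is exactly the cost that the paper's batched and branch-and-bound algorithms are designed to avoid.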

    Skyline queries computation on crowdsourced-enabled incomplete database

    Data incompleteness has become a frequent phenomenon in a large number of contemporary database applications such as web autonomous databases, big data, and crowd-sourced databases. Processing skyline queries over incomplete databases imposes a number of challenges that negatively affect skyline query processing. Most importantly, the skylines derived from incomplete databases are themselves incomplete, in the sense that some of their values are missing. Retrieving skylines with missing values is undesirable, particularly for recommendation and decision-making systems. Furthermore, running skyline queries on a database with incomplete data raises a number of issues that affect skyline processing, such as the loss of the transitivity property of the dominance relation and cyclic dominance between tuples. The issue of estimating the missing values of skylines has been discussed and examined in the database literature. Most recently, several studies have suggested exploiting crowd-sourced databases in order to estimate the missing values by generating plausible values using the crowd. Crowd-sourced databases have proved to be a powerful solution for performing user-given tasks by integrating human intelligence and experience into task processing. However, task processing using the crowd incurs additional monetary cost and increases latency, and it is not always possible to produce a satisfactory result that meets the user's preferences. This paper proposes an approach for estimating the missing values of the skylines that first exploits the available data and the implicit relationships between the attributes in order to impute the missing values locally. This process aims at reducing the number of values that need to be estimated by the crowd to those for which local estimation is inappropriate. Intensive experiments on both synthetic and real datasets have been conducted. The experimental results show that the proposed approach for estimating the missing values of skylines over crowd-sourced-enabled incomplete databases is scalable and outperforms existing approaches.
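
    A minimal sketch of the local-estimation idea, assuming the data sits in a pandas DataFrame: a missing skyline attribute is imputed from attributes it correlates with in the complete part of the data, and a (stubbed) crowd task is issued only for values where no usable local model exists. The column handling, the linear model and the crowd fallback are illustrative assumptions, not the paper's estimation procedure.

    import pandas as pd                      # df below is assumed to be a pandas DataFrame
    from sklearn.linear_model import LinearRegression

    def impute_or_crowdsource(df, target_col, predictor_cols, crowd_fallback):
        # rows with all values present train the local model
        complete = df.dropna(subset=[target_col] + predictor_cols)
        # rows missing the target but with usable predictors can be imputed locally
        missing = df[df[target_col].isna() & df[predictor_cols].notna().all(axis=1)]
        if len(complete) >= 10 and len(missing) > 0:   # enough evidence for a local model
            model = LinearRegression().fit(complete[predictor_cols], complete[target_col])
            df.loc[missing.index, target_col] = model.predict(missing[predictor_cols])
        # whatever is still missing goes to the crowd (local estimation inappropriate)
        for idx in df.index[df[target_col].isna()]:
            df.loc[idx, target_col] = crowd_fallback(df.loc[idx])
        return df

    # hypothetical usage: impute a skyline tuple's missing 'price' from 'size' and 'rating'
    # impute_or_crowdsource(df, "price", ["size", "rating"], ask_crowd)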

    Coping with new Challenges in Clustering and Biomedical Imaging

    The last years have seen a tremendous increase of data acquisition in different scientific fields such as molecular biology, bioinformatics or biomedicine. Therefore, novel methods are needed for automatic processing and analysis of this large amount of data. Data mining is the process of applying methods like clustering or classification to large databases in order to uncover hidden patterns. Clustering is the task of partitioning the points of a data set into distinct groups in order to maximize the intra-cluster similarity and to minimize the inter-cluster similarity. In contrast to unsupervised learning like clustering, classification is a supervised learning problem that aims at predicting the group membership of data objects on the basis of rules learned from a training set where the group membership is known. Specialized methods have been proposed for hierarchical and partitioning clustering. However, these methods suffer from several drawbacks. In the first part of this work, new clustering methods are proposed that cope with problems of conventional clustering algorithms. ITCH (Information-Theoretic Cluster Hierarchies) is a hierarchical clustering method based on a hierarchical variant of the Minimum Description Length (MDL) principle, which finds hierarchies of clusters without requiring input parameters. As ITCH may converge only to a local optimum, we propose GACH (Genetic Algorithm for Finding Cluster Hierarchies), which combines the benefits of genetic algorithms with information theory. In this way the search space is explored more effectively. Furthermore, we propose INTEGRATE, a novel clustering method for data with mixed numerical and categorical attributes. Supported by the MDL principle, our method integrates the information provided by heterogeneous numerical and categorical attributes and thus naturally balances the influence of both sources of information. A competitive evaluation illustrates that INTEGRATE is more effective than existing clustering methods for mixed-type data. Besides clustering methods for single data objects, we provide a solution for clustering different data sets that are represented by their skylines. The skyline operator is a well-established database primitive for finding database objects which minimize two or more attributes with an unknown weighting between these attributes. In this thesis, we define a similarity measure, called SkyDist, for comparing skylines of different data sets that can directly be integrated into different data mining tasks such as clustering or classification. The experiments show that SkyDist in combination with different clustering algorithms can give useful insights into many applications. In the second part, we focus on the analysis of high-resolution magnetic resonance images (MRI) that are clinically relevant and may allow for an early detection and diagnosis of several diseases. In particular, we propose a framework for the classification of Alzheimer's disease in MR images, combining the data mining steps of feature selection, clustering and classification. As a result, a set of highly selective features discriminating patients with Alzheimer's disease from healthy people has been identified. However, the analysis of the high-dimensional MR images is extremely time-consuming. Therefore, we developed JGrid, a scalable distributed computing solution designed to allow for a large-scale analysis of MRI and thus an optimized prediction of diagnosis.
    In another study, we apply efficient algorithms for motif discovery to task-fMRI scans in order to identify patterns in the brain that are characteristic of patients with somatoform pain disorder. We find groups of brain compartments that occur frequently within the brain networks and discriminate well between healthy and diseased people.
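
    As a toy illustration of how an MDL-style criterion can make clustering parameter-free, the sketch below chooses the number of clusters by minimising an approximate description length: the cost of encoding the cluster parameters plus a Gaussian coding cost for the points. It is a flat, simplified stand-in under these assumptions, not the hierarchical ITCH or GACH procedures described above.

    import numpy as np
    from sklearn.cluster import KMeans

    def description_length(X, labels, centers):
        # model cost: cluster parameters at roughly 0.5*log2(n) bits per value
        n, d = X.shape
        k = len(centers)
        model_bits = 0.5 * k * d * np.log2(n)
        # data cost: encode each point relative to its cluster (Gaussian code length)
        data_bits = 0.0
        for j in range(k):
            pts = X[labels == j]
            if len(pts) == 0:
                continue
            var = pts.var(axis=0).mean() + 1e-9
            data_bits += 0.5 * len(pts) * d * np.log2(2 * np.pi * np.e * var)
        return model_bits + data_bits

    def mdl_choose_k(X, k_max=10):
        # the k with the shortest total description length "pays for itself"
        costs = {}
        for k in range(1, k_max + 1):
            km = KMeans(n_clusters=k, n_init=10).fit(X)
            costs[k] = description_length(X, km.labels_, km.cluster_centers_)
        return min(costs, key=costs.get)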

    Multi-Source Spatial Entity Extraction and Linkage


    Line Emitting Galaxies Beyond a Redshift of 7: An Improved Method for Estimating the Evolving Neutrality of the Intergalactic Medium

    The redshift-dependent fraction of color-selected galaxies revealing Lyman alpha emission has become the most valuable constraint on the evolving neutrality of the early intergalactic medium. However, in addition to resonant scattering by neutral gas, the visibility of Lyman alpha is also dependent on the intrinsic properties of the host galaxy, including its stellar population, dust content and the nature of outflowing gas. Taking advantage of significant progress we have made in determining the line emitting properties of z ≃ 4-6 galaxies, we propose an improved method, based on using the measured slopes of the rest-frame ultraviolet continua of galaxies, to interpret the growing body of near-infrared spectra of z > 7 galaxies in order to take into account these host galaxy dependencies. In a first application of our new method, we demonstrate its potential via a new spectroscopic survey of 7 < z < 8 galaxies undertaken with the Keck MOSFIRE spectrograph. Together with earlier published data our data provides improved estimates of the evolving visibility of Lyman alpha, particularly at redshift z ≃ 8. As a byproduct, we also present a new line emitting galaxy at a redshift z = 7.62 which supersedes an earlier redshift record. We discuss the improving constraints on the evolving neutral fraction over 6 < z < 8 and the implications for cosmic reionization. Comment: To be submitted to Ap

    Maintaining sliding window skylines on data streams


    ParetoPrep: Fast computation of Path Skylines Queries

    Computing cost-optimal paths in network data is a very important task in many application areas such as transportation networks, computer networks or social graphs. In many cases, the cost of an edge can be described by various cost criteria. For example, in a road network, possible cost criteria are distance, time, ascent, energy consumption or toll fees. In such a multicriteria network, a route or path skyline query computes the set of all paths having Pareto-optimal costs, i.e., each result path is optimal for a different set of user preferences. In this paper, we propose a new method for computing route skylines which significantly decreases processing time and memory consumption. Furthermore, our method does not rely on any precomputation or indexing and is thus suitable for dynamically changing edge costs. Our experiments demonstrate that our method outperforms state-of-the-art approaches and allows highly efficient path skyline computation without any preprocessing. Comment: 12 pages, 9 figures, technical report
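
    As a rough illustration of the path skyline idea, here is a generic label-correcting search that keeps, per node, only cost vectors that are not Pareto-dominated. The graph representation, the assumption of non-negative edge costs and the function names are illustrative, and this sketch is not the ParetoPrep algorithm.

    from collections import defaultdict, deque

    def dominates(a, b):
        # a dominates b: no worse in every criterion, strictly better in at least one
        return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

    def path_skyline(graph, source, target, num_criteria):
        # graph: {node: [(neighbour, cost_vector), ...]}, cost vectors non-negative
        labels = defaultdict(list)           # node -> non-dominated cost vectors found so far
        labels[source] = [(0.0,) * num_criteria]
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cost in graph.get(u, []):
                for lab in list(labels[u]):
                    new = tuple(a + b for a, b in zip(lab, cost))
                    if any(old == new or dominates(old, new) for old in labels[v]):
                        continue             # dominated or duplicate: prune
                    labels[v] = [old for old in labels[v] if not dominates(new, old)]
                    labels[v].append(new)
                    if v != target:
                        queue.append(v)      # v's label set changed, expand it again
        return labels[target]                # Pareto-optimal source-target cost vectors

    In a road network with cost vectors such as (distance, time, toll), labels[target] then holds one entry per Pareto-optimal route; according to the abstract, ParetoPrep's contribution is to make this computation efficient without any precomputation or indexing.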

    The star formation histories of galaxies in the Sloan Digital Sky Survey

    We present the results of a MOPED analysis of ~3 x 10^5 galaxy spectra from the Sloan Digital Sky Survey Data Release Three (SDSS DR3), with a number of improvements in data, modelling and analysis compared with our previous analysis of DR1. The improvements include: modelling the galaxies with theoretical models at a higher spectral resolution of 3 Å; better calibrated data; an extended list of excluded emission lines; and a wider range of dust models. We present new estimates of the cosmic star formation rate, the evolution of the stellar mass density and the stellar mass function from the fossil record. In contrast to our earlier work, the results show no conclusive peak in the star formation rate out to a redshift of around 2, but continue to show conclusive evidence for 'downsizing' in the SDSS fossil record. The star formation history is now in good agreement with more traditional instantaneous measures. The galaxy stellar mass function is determined over five decades of mass, and an updated estimate of the current stellar mass density is presented. We also investigate the systematic effects of changes in the stellar population modelling, the spectral resolution, the dust modelling, sky lines and the change of data set. We find that the main changes in the results are due to the improvements in the calibration of the SDSS data, changes in the initial mass function and the theoretical models used. Comment: replaced to match accepted version in MNRA

    How Digital Strategy and Management Games Can Facilitate the Practice of Dynamic Decision-Making

    This paper examines how digital strategy and management games that were initially designed for entertainment can facilitate the practice of dynamic decision-making. Based on a comparative qualitative analysis of 17 games—organized into categories derived from a conceptual model of decision-making design—this article illustrates two ways in which these games may be useful in supporting the learning of dynamic decision-making in educational practice: (1) players must take on the role of a decision-maker and resolve situations in which they must pursue different conflicting goals by making a continuous series of decisions on a variety of actions and measures; (2) three features of the games are considered to structure players’ practice of decision-making and foster processes of learning through the curation of possible decisions, the offering of lucid feedback and the modification of time. This article also highlights the games’ shortcomings from an educational perspective, as players’ decisions are restricted by the number of choices they can make within the game, and certain choices are rewarded more than others. An educational application of the games must, therefore, entail a critical reflection on players’ limited choices inside a necessarily biased system.