Discovering Attractive Products based on Influence Sets
Skyline queries have been widely used as a practical tool for multi-criteria
decision analysis and for applications involving preference queries. For
example, in a typical online retail application, skyline queries can help
customers select the most interesting products from a pool of available ones.
Recently, reverse skyline queries have been proposed, highlighting the
manufacturer's perspective, i.e. how to determine the expected buyers of a
given product. In this work we develop novel algorithms for two important
classes of queries involving customer preferences. We first propose a novel
algorithm, termed RSA, for answering reverse skyline queries. We then
introduce a new type of query, namely the k-Most Attractive Candidates (k-MAC)
query. In this type of query, given a set of existing product specifications
P, a set of customer preferences C and a set of new candidate products Q, the
k-MAC query returns the set of k candidate products from Q that jointly
maximizes the total number of expected buyers, measured as the cardinality of
the union of individual reverse skyline sets (i.e., influence sets). Applying
existing approaches to solve this problem would require calculating the reverse
skyline set for each candidate, which is prohibitively expensive for large data
sets. We thus propose a batched algorithm for this problem and compare its
performance against a branch-and-bound variant that we devise. Both of these
algorithms use variants of our RSA algorithm at their core. Our experimental
study using both synthetic and real data sets demonstrates that our proposed
algorithms outperform existing or naive solutions to our studied classes of
queries.
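The k-MAC definition above can be illustrated with a small brute-force sketch. It is not the RSA, batched, or branch-and-bound algorithm from the abstract: it computes each influence set naively and then applies the standard greedy heuristic for maximum coverage. The function names, the use of absolute per-attribute distance for dynamic dominance, and the choice to evaluate each candidate only against the existing products P are assumptions made for illustration.

```python
from typing import Sequence

Point = tuple[float, ...]


def dynamically_dominates(p: Point, q: Point, c: Point) -> bool:
    """True if p is at least as close to preference c as q on every
    attribute and strictly closer on at least one (assumed definition)."""
    dp = [abs(a - b) for a, b in zip(p, c)]
    dq = [abs(a - b) for a, b in zip(q, c)]
    return all(x <= y for x, y in zip(dp, dq)) and any(x < y for x, y in zip(dp, dq))


def influence_set(q: Point, products: Sequence[Point],
                  customers: Sequence[Point]) -> set[int]:
    """Naive reverse skyline of q: indices of customers for whom no
    existing product dynamically dominates q."""
    return {i for i, c in enumerate(customers)
            if not any(dynamically_dominates(p, q, c) for p in products)}


def greedy_k_mac(candidates: Sequence[Point], products: Sequence[Point],
                 customers: Sequence[Point], k: int) -> tuple[list[int], set[int]]:
    """Greedy max-coverage heuristic: repeatedly pick the candidate that
    adds the most customers not yet covered. Returns the chosen candidate
    indices and the union of their influence sets."""
    influence = [influence_set(q, products, customers) for q in candidates]
    chosen: list[int] = []
    covered: set[int] = set()
    for _ in range(min(k, len(candidates))):
        remaining = (i for i in range(len(candidates)) if i not in chosen)
        best = max(remaining, key=lambda i: len(influence[i] - covered))
        chosen.append(best)
        covered |= influence[best]
    return chosen, covered


# Hypothetical toy data: one existing product, four customers, three candidates.
P = [(5, 5)]
C = [(1, 1), (9, 9), (5, 6), (2, 8)]
Q = [(1, 2), (9, 8), (2, 2)]
chosen, covered = greedy_k_mac(Q, P, C, k=2)
```

On this toy input, candidates 0 and 2 attract the same two customers, so the greedy step picks candidate 1 second because it adds a customer not yet covered. The sketch recomputes every influence set from scratch, which is the cost the abstract describes as prohibitive for large data sets.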
Coping with new Challenges in Clustering and Biomedical Imaging
Recent years have seen a tremendous increase in data acquisition in scientific fields such as molecular biology, bioinformatics and biomedicine. Novel methods are therefore needed for the automatic processing and analysis of this large amount of data. Data mining is the process of applying methods like clustering or classification to large databases in order to uncover hidden patterns. Clustering is the task of partitioning the points of a data set into distinct groups so as to maximize the intra-cluster similarity and minimize the inter-cluster similarity. In contrast to unsupervised learning such as clustering, classification is a supervised learning problem that aims to predict the group membership of data objects on the basis of rules learned from a training set in which the group membership is known.
Specialized methods have been proposed for hierarchical and partitioning clustering. However, these methods suffer from several drawbacks. In the first part of this work, new clustering methods are proposed that cope with problems of conventional clustering algorithms. ITCH (Information-Theoretic Cluster Hierarchies) is a hierarchical clustering method based on a hierarchical variant of the Minimum Description Length (MDL) principle, which finds hierarchies of clusters without requiring input parameters. As ITCH may converge only to a local optimum, we propose GACH (Genetic Algorithm for Finding Cluster Hierarchies), which combines the benefits of genetic algorithms with information theory. In this way the search space is explored more effectively.
Furthermore, we propose INTEGRATE, a novel clustering method for data with mixed numerical and categorical attributes. Supported by the MDL principle, our method integrates the information provided by heterogeneous numerical and categorical attributes and thus naturally balances the influence of both sources of information. A comparative evaluation illustrates that INTEGRATE is more effective than existing clustering methods for mixed-type data. Besides clustering methods for single data objects, we provide a solution for clustering different data sets that are represented by their skylines. The skyline operator is a well-established database primitive for finding database objects which minimize two or more attributes with an unknown weighting between these attributes. In this thesis, we define a similarity measure, called SkyDist, for comparing skylines of different data sets that can be integrated directly into different data mining tasks such as clustering or classification. The experiments show that SkyDist in combination with different clustering algorithms can give useful insights into many applications.
In the second part, we focus on the analysis of high-resolution magnetic resonance images (MRI) that are clinically relevant and may allow for an early detection and diagnosis of several diseases. In particular, we propose a framework for the classification of Alzheimer's disease in MR images combining the data mining steps of feature selection, clustering and classification. As a result, a set of highly selective features discriminating patients with Alzheimer's disease from healthy people has been identified. However, the analysis of the high-dimensional MR images is extremely time-consuming. Therefore, we developed JGrid, a scalable distributed computing solution designed to allow for a large-scale analysis of MRI and thus an optimized prediction of diagnosis. In another study we apply efficient algorithms for motif discovery to task-fMRI scans in order to identify patterns in the brain that are characteristic of patients with somatoform pain disorder. We find groups of brain compartments that occur frequently within the brain networks and discriminate well between healthy and diseased people.
Skyline queries computation on crowdsourced-enabled incomplete database
Data incompleteness has become a frequent phenomenon in a large number of contemporary database applications such as web autonomous databases, big data, and crowd-sourced databases. Processing skyline queries over incomplete databases poses a number of challenges. Most importantly, the skylines derived from incomplete databases are themselves incomplete, in that some of their values are missing. Retrieving skylines with missing values is undesirable, particularly for recommendation and decision-making systems. Furthermore, running skyline queries on a database with incomplete data raises issues such as the loss of the transitivity property on which skyline techniques rely and cyclic dominance between tuples. The issue of estimating the missing values of skylines has been discussed and examined in the database literature. Most recently, several studies have suggested exploiting crowd-sourced databases to estimate the missing values by having the crowd generate plausible values. Crowd-sourced databases have proved to be a powerful solution for performing user-given tasks by integrating human intelligence and experience into task processing. However, task processing using crowd-sourcing incurs additional monetary cost and increases latency. Also, it is not always possible to produce a satisfactory result that meets the user's preferences. This paper proposes an approach for estimating the missing values of the skylines that first exploits the available data and utilizes the implicit relationships between the attributes to impute the missing values. This process aims at reducing the number of values that must be estimated using the crowd, namely those for which local estimation is inappropriate. Extensive experiments on both synthetic and real datasets have been conducted. The experimental results show that the proposed approach for estimating the missing values of the skylines over crowd-sourced enabled incomplete databases is scalable and outperforms the other existing approaches.
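The loss of transitivity and the cyclic dominance mentioned above can be reproduced with a few lines of code. The sketch below assumes a dominance definition commonly used for incomplete data, in which two tuples are compared only on the attributes that both have, and it assumes that smaller values are preferred. It does not implement the imputation approach proposed in the abstract; the function names and the sample tuples are hypothetical.

```python
from typing import Optional, Sequence

Row = Sequence[Optional[float]]  # None marks a missing value


def dominates_incomplete(p: Row, q: Row) -> bool:
    """p dominates q if, on the attributes present in both tuples, p is
    never worse and is strictly better at least once (smaller is better)."""
    shared = [(a, b) for a, b in zip(p, q) if a is not None and b is not None]
    return all(a <= b for a, b in shared) and any(a < b for a, b in shared)


def naive_skyline(rows: Sequence[Row]) -> list[Row]:
    """Keep every tuple that no other tuple dominates."""
    return [r for r in rows
            if not any(dominates_incomplete(o, r) for o in rows)]


# Three hypothetical tuples, each missing a different attribute.
a = (1, None, 5)
b = (3, 2, None)
c = (None, 4, 1)

cycle = (dominates_incomplete(a, b),
         dominates_incomplete(b, c),
         dominates_incomplete(c, a))
skyline = naive_skyline([a, b, c])
```

Here `a` dominates `b`, `b` dominates `c`, and `c` dominates `a`, so the naive skyline is empty even though none of the tuples is clearly inferior. Filling in the missing values, locally or with the crowd, is one way to break such cycles.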
Line Emitting Galaxies Beyond a Redshift of 7: An Improved Method for Estimating the Evolving Neutrality of the Intergalactic Medium
The redshift-dependent fraction of color-selected galaxies revealing Lyman
alpha emission has become the most valuable constraint on the evolving
neutrality of the early intergalactic medium. However, in addition to resonant
scattering by neutral gas, the visibility of Lyman alpha is also dependent on
the intrinsic properties of the host galaxy, including its stellar population,
dust content and the nature of outflowing gas. Taking advantage of significant
progress we have made in determining the line emitting properties of galaxies, we propose an improved method, based on using the measured
slopes of the rest-frame ultraviolet continua of galaxies, to interpret the
growing body of near-infrared spectra of galaxies in order to take into
account these host galaxy dependencies. In a first application of our new
method, we demonstrate its potential via a new spectroscopic survey of
galaxies undertaken with the Keck MOSFIRE spectrograph. Together with earlier
published data, our data provide improved estimates of the evolving visibility
of Lyman alpha, particularly at redshift . As a byproduct, we also
present a new line emitting galaxy at a redshift which supersedes an
earlier redshift record. We discuss the improving constraints on the evolving
neutral fraction over and the implications for cosmic reionization.
Comment: To be submitted to Ap
ParetoPrep: Fast computation of Path Skylines Queries
Computing cost optimal paths in network data is a very important task in many
application areas like transportation networks, computer networks or social
graphs. In many cases, the cost of an edge can be described by various cost
criteria. For example, in a road network possible cost criteria are distance,
time, ascent, energy consumption or toll fees. In such a multicriteria network,
a route or path skyline query computes the set of all paths having Pareto-optimal
costs, i.e. each result path is optimal for different user preferences.
In this paper, we propose a new method for computing route skylines which
significantly decreases processing time and memory consumption. Furthermore,
our method does not rely on any precomputation or indexing method and thus, it
is suitable for dynamically changing edge costs. Our experiments demonstrate
that our method outperforms state-of-the-art approaches and allows highly
efficient path skyline computation without any preprocessing.
Comment: 12 pages, 9 figures, technical report
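To make the notion of a path skyline concrete, the following sketch computes the Pareto-optimal cost vectors between two nodes of a small multicriteria graph. It uses a generic label-correcting search with dominance pruning, not the ParetoPrep method, and it needs no precomputation or index. The graph, the node names, the `num_criteria` parameter and the assumption that all edge costs are non-negative are illustrative choices.

```python
from collections import deque

Cost = tuple[float, ...]
Graph = dict[str, list[tuple[str, Cost]]]


def dominates(a: Cost, b: Cost) -> bool:
    """a is no worse than b on every criterion and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def path_skyline(graph: Graph, source: str, target: str,
                 num_criteria: int) -> list[tuple[Cost, list[str]]]:
    """Return one path per Pareto-optimal cost vector from source to target.
    Assumes non-negative edge costs so that the search terminates."""
    zero: Cost = (0,) * num_criteria
    labels: dict[str, list[tuple[Cost, list[str]]]] = {source: [(zero, [source])]}
    queue = deque([(source, zero, [source])])
    while queue:
        node, cost, path = queue.popleft()
        if (cost, path) not in labels[node]:
            continue  # this label was dominated after it was queued
        for nxt, edge_cost in graph.get(node, []):
            new_cost = tuple(c + e for c, e in zip(cost, edge_cost))
            current = labels.setdefault(nxt, [])
            # Skip labels that are dominated by, or equal to, a known one.
            if any(old == new_cost or dominates(old, new_cost) for old, _ in current):
                continue
            # Drop known labels that the new one dominates.
            current[:] = [(old, p) for old, p in current if not dominates(new_cost, old)]
            new_path = path + [nxt]
            current.append((new_cost, new_path))
            queue.append((nxt, new_cost, new_path))
    return sorted(labels.get(target, []))


# Hypothetical road network with (distance, time) edge costs.
roads: Graph = {
    "A": [("B", (2, 5)), ("C", (4, 1)), ("D", (5, 5))],
    "B": [("D", (2, 5)), ("C", (1, 1))],
    "C": [("D", (4, 1))],
}
result = path_skyline(roads, "A", "D", num_criteria=2)
```

In this example the route A, B, C, D costs (7, 7) and is pruned because the direct edge A, D costs (5, 5). The three remaining routes trade distance against time, so each can be the best choice under some user preference.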
The star formation histories of galaxies in the Sloan Digital Sky Survey
We present the results of a MOPED analysis of ~3 x 10^5 galaxy spectra from
the Sloan Digital Sky Survey Data Release Three (SDSS DR3), with a number of
improvements in data, modelling and analysis compared with our previous
analysis of DR1. The improvements include: modelling the galaxies with
theoretical models at a higher spectral resolution of 3 Å; better calibrated
data; an extended list of excluded emission lines, and a wider range of dust
models. We present new estimates of the cosmic star formation rate, the
evolution of stellar mass density and the stellar mass function from the fossil
record. In contrast to our earlier work the results show no conclusive peak in
the star formation rate out to a redshift around 2 but continue to show
conclusive evidence for 'downsizing' in the SDSS fossil record. The star
formation history is now in good agreement with more traditional instantaneous
measures. The galaxy stellar mass function is determined over five decades of
mass, and an updated estimate of the current stellar mass density is presented.
We also investigate the systematic effects of changes in the stellar population
modelling, the spectral resolution, dust modelling, sky lines
and the change of data set. We find that the main changes in the
results are due to the improvements in the calibration of the SDSS data,
changes in the initial mass function and the theoretical models used.
Comment: replaced to match accepted version in MNRAS
How Digital Strategy and Management Games Can Facilitate the Practice of Dynamic Decision-Making
This paper examines how digital strategy and management games that have been initially designed for entertainment can facilitate the practice of dynamic decision-making. Based on a comparative qualitative analysis of 17 games—organized into categories derived from a conceptual model of decision-making design—this article illustrates two ways in which these games may be useful in supporting the learning of dynamic decision-making in educational practice:
(1) players must take on the role of a decision-maker and resolve situations in which they must pursue different, conflicting goals by making a continuous series of decisions on a variety of actions and measures; (2) three features of the games are considered to structure players’ practice of decision-making and foster learning processes: the curation of possible decisions, the offering of lucid feedback and the modification of time. This article also highlights the games’ shortcomings from an educational perspective: players’ decisions are restricted by the number of choices they can make within the game, and certain choices are rewarded more than others. An educational application of the games must, therefore, entail a critical reflection on players’ limited choices inside a necessarily biased system.