On the Feature Discovery for App Usage Prediction in Smartphones
As the number of mobile Apps grows, they have become closely integrated
into daily life. In this paper, we develop a framework to predict the
mobile Apps that are most likely to be used given the current device status
of a smartphone. Such an App usage prediction framework is a crucial
prerequisite for fast App launching, intelligent user experience, and power
management of smartphones. By analyzing real App usage log data, we discover
two kinds of features: the Explicit Feature (EF) from the readings of
built-in sensors, and the Implicit Feature (IF) from App usage relations. The
IF is derived by constructing the proposed App Usage Graph (abbreviated
as AUG), which models App usage transitions. In light of AUG, we are able to
discover usage relations among Apps. Since users may have different usage
behaviors on their smartphones, we further propose one personalized feature
selection algorithm. We apply the minimum description length (MDL) principle
to the training data and select those features that need less length to describe the
training data. The personalized feature selection can successfully reduce the
log size and the prediction time. Finally, we adopt the kNN classification
model to predict App usage. Note that we only need to keep the features
selected by the proposed personalized feature selection algorithm,
which in turn reduces the prediction time and avoids the curse of
dimensionality when using the kNN classifier. We conduct a comprehensive
experimental study based on a real mobile App usage dataset. The results
demonstrate the effectiveness of the proposed framework and show the predictive
capability for App usage prediction. Comment: 10 pages, 17 figures, ICDM 2013 short paper
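
The abstract above describes an App Usage Graph that models App usage transitions. One plausible reading, sketched here under assumptions and not taken from the paper, is a directed graph whose edge weights count how often one App is launched right after another. The log format and the function names `build_usage_graph` and `most_likely_next` are hypothetical.

```python
from collections import Counter, defaultdict


def build_usage_graph(log):
    """Count transitions between consecutive App launches.

    `log` is a list of App names in launch order (assumed format).
    Returns a mapping {source_app: Counter({next_app: count})}.
    """
    graph = defaultdict(Counter)
    for src, dst in zip(log, log[1:]):
        graph[src][dst] += 1
    return graph


def most_likely_next(graph, current_app, k=1):
    """Return up to k Apps most often launched right after current_app."""
    followers = graph.get(current_app, Counter())
    return [app for app, _ in followers.most_common(k)]


# Illustrative, made-up launch sequence.
log = ["mail", "browser", "mail", "calendar", "mail", "browser"]
graph = build_usage_graph(log)
print(most_likely_next(graph, "mail"))
```

Transition counts like these could serve as an implicit feature alongside sensor readings; how the paper actually derives the IF from the AUG is not specified in the abstract.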
An MDL framework for sparse coding and dictionary learning
The power of sparse signal modeling with learned over-complete dictionaries
has been demonstrated in a variety of applications and fields, from signal
processing to statistical inference and machine learning. However, the
statistical properties of these models, such as under-fitting or over-fitting
given sets of data, are still not well characterized in the literature. As a
result, the success of sparse modeling depends on hand-tuning critical
parameters for each data set and application. This work aims at addressing this by
providing a practical and objective characterization of sparse models by means
of the Minimum Description Length (MDL) principle -- a well established
information-theoretic approach to model selection in statistical inference. The
resulting framework derives a family of efficient sparse coding and dictionary
learning algorithms which, by virtue of the MDL principle, are completely
parameter free. Furthermore, such a framework makes it possible to incorporate
additional prior information, such as Markovian dependencies, into existing
models, or to define completely new problem formulations, including in the
matrix analysis area, in a natural way. These virtues will be demonstrated with
parameter-free algorithms for the classic image denoising and classification
problems, and for low-rank matrix recovery in video applications.
Estimating Spectroscopic Redshifts by Using k Nearest Neighbors Regression I. Description of Method and Analysis
Context: In astronomy, new approaches to process and analyze the
exponentially increasing amount of data are inevitable. While classical
approaches (e.g. template fitting) are fine for objects of well-known classes,
alternative techniques have to be developed to determine those that do not fit.
Therefore a classification scheme should be based on individual properties
instead of fitting to a global model and thereby losing valuable information.
An important issue when dealing with large data sets is outlier detection,
which at the moment is often treated in a problem-oriented way. Aims: In this paper we
present a method to statistically estimate the redshift z based on a similarity
approach. This allows us to determine redshifts in spectra in emission as well
as in absorption without using any predefined model. Additionally we show how
an estimate of the redshift based on single features is possible. As a
consequence we are able, for example, to filter objects that show multiple
redshift components. We propose to apply this general method to all similar problems in
order to identify objects where traditional approaches fail. Methods: The
redshift estimation is performed by comparing predefined regions in the spectra
and applying a k nearest neighbor regression model for every predefined
emission and absorption region, individually. Results: We estimated a redshift
for more than 50% of the analyzed 16,000 spectra of our reference and test
sample. The redshift estimate yields a precision for every individually tested
feature that is comparable with the overall precision of the redshifts of SDSS.
In 14 spectra we find a significant shift between emission and absorption or
emission and emission lines. The results already show the immense power of this
simple machine learning approach for investigating huge databases such as the
SDSS. Comment: accepted for publication in A&
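
The abstract above applies k nearest neighbor regression separately to predefined spectral regions. As a generic sketch of that technique only, assuming Euclidean distance and an unweighted mean over the neighbors, the estimate for a query is the average target value of its k closest training points. The feature vectors, redshift values, and the name `knn_regress` are made up for illustration and do not come from the paper.

```python
import math


def knn_regress(train_x, train_y, query, k=3):
    """Predict a value for `query` as the mean target of its k nearest neighbors."""
    # Pair each training target with its Euclidean distance to the query.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Hypothetical two-dimensional features with known redshifts.
train_x = [(1.0, 0.20), (1.1, 0.25), (3.0, 0.90), (3.2, 1.00)]
train_y = [0.10, 0.12, 0.50, 0.54]

z_estimate = knn_regress(train_x, train_y, query=(1.05, 0.22), k=2)
print(round(z_estimate, 3))
```

With one such model per region, disagreement between the per-region estimates would be one way to flag objects with multiple redshift components, in the spirit of the filtering the abstract mentions.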
The Minimum Description Length Principle for Pattern Mining: A Survey
This is about the Minimum Description Length (MDL) principle applied to
pattern mining. The length of this description is kept to the minimum.
Mining patterns is a core task in data analysis and, beyond issues of
efficient enumeration, the selection of patterns constitutes a major challenge.
The MDL principle, a model selection method grounded in information theory, has
been applied to pattern mining with the aim to obtain compact high-quality sets
of patterns. After giving an outline of relevant concepts from information
theory and coding, as well as of work on the theory behind the MDL and similar
principles, we review MDL-based methods for mining various types of data and
patterns. Finally, we open a discussion on some issues regarding these methods,
and highlight currently active related data analysis problems.
Relating recharge mechanisms to chemical changes in an updip Appalachian coal mine discharge: A case study from Lambert Run, West Virginia
Impaired drainage from active and abandoned mines degrades the water quality of receiving streams and aquifers. Coal mine drainage (CMD) has been studied for decades in Appalachia, but unknowns and uncertainties are still present, including the influence of mine hydrogeology on the outflow chemistry of above-drainage mines. To evaluate the influence of recharge type on above-drainage mine chemistry, samples were collected every two weeks at a CMD outflow treatment system in Harrison County, West Virginia.
Samples were collected to measure geochemical changes taking place in the mine workings and along the flowpath of the passive treatment system. Samples were divided into two groups based on the dominant type of recharge entering the mine during sample collection. Direct recharge dominated samples had lower concentrations of hydrolysable cations at the mine outflow, causing the discharge to be both net-acidic and net-alkaline during the study period. Total rare earth element concentrations at the outflow were positively correlated to Fe, Al, and Mn, and negatively correlated to pH and discharge. During both recharge regimes, Fe, Al, Mn, and rare earth elements were removed along the treatment system flowpath. Throughout the study period, 89% of dissolved inorganic carbon in the system was degassed to the atmosphere as CO2.
This study demonstrates that varied recharge mechanisms can influence the CMD outflow chemistry, with implications for treatment system design, interpretation of routine chemical data, and extrapolation of CO2 efflux from CMD outflows for large-scale carbon balance studies.