2,101 research outputs found

    On the Feature Discovery for App Usage Prediction in Smartphones

    As the number of mobile Apps grows, they have become closely integrated into daily life. In this paper, we develop a framework to predict the mobile Apps that are most likely to be used given the current device status of a smartphone. Such an App usage prediction framework is a crucial prerequisite for fast App launching, intelligent user experience, and power management of smartphones. By analyzing real App usage log data, we discover two kinds of features: the Explicit Feature (EF), from readings of built-in sensors, and the Implicit Feature (IF), from App usage relations. The IF is derived by constructing the proposed App Usage Graph (AUG), which models App usage transitions. In light of the AUG, we are able to discover usage relations among Apps. Since users may have different usage behaviors on their smartphones, we further propose a personalized feature selection algorithm. We compute the minimum description length (MDL) from the training data and select the features that require a shorter description of the training data. The personalized feature selection can successfully reduce the log size and the prediction time. Finally, we adopt the kNN classification model to predict App usage. Because only the features selected by the proposed personalized feature selection algorithm need to be kept, the prediction time is reduced and the curse of dimensionality is avoided when using the kNN classifier. We conduct a comprehensive experimental study based on a real mobile App usage dataset. The results demonstrate the effectiveness of the proposed framework and show its predictive capability for App usage prediction. Comment: 10 pages, 17 figures, ICDM 2013 short paper

    An MDL framework for sparse coding and dictionary learning

    The power of sparse signal modeling with learned over-complete dictionaries has been demonstrated in a variety of applications and fields, from signal processing to statistical inference and machine learning. However, the statistical properties of these models, such as under-fitting or over-fitting given sets of data, are still not well characterized in the literature. As a result, the success of sparse modeling depends on hand-tuning critical parameters for each data set and application. This work aims at addressing this by providing a practical and objective characterization of sparse models by means of the Minimum Description Length (MDL) principle, a well-established information-theoretic approach to model selection in statistical inference. The resulting framework yields a family of efficient sparse coding and dictionary learning algorithms which, by virtue of the MDL principle, are completely parameter free. Furthermore, such a framework allows additional prior information, such as Markovian dependencies, to be incorporated into existing models, and completely new problem formulations, including in the matrix analysis area, to be defined in a natural way. These virtues will be demonstrated with parameter-free algorithms for the classic image denoising and classification problems, and for low-rank matrix recovery in video applications.

    Estimating Spectroscopic Redshifts by Using k Nearest Neighbors Regression I. Description of Method and Analysis

    Context: In astronomy, new approaches to process and analyze the exponentially increasing amount of data are inevitable. While classical approaches (e.g. template fitting) are fine for objects of well-known classes, alternative techniques have to be developed to determine those that do not fit. A classification scheme should therefore be based on individual properties rather than on a fit to a global model, which loses valuable information. An important issue when dealing with large data sets is outlier detection, which at present is often handled in a problem-specific way. Aims: In this paper we present a method to statistically estimate the redshift z based on a similarity approach. This allows us to determine redshifts in spectra in emission as well as in absorption without using any predefined model. Additionally, we show how an estimate of the redshift based on single features is possible. As a consequence, we are able, for example, to filter objects that show multiple redshift components. We propose to apply this general method to all similar problems in order to identify objects where traditional approaches fail. Methods: The redshift estimation is performed by comparing predefined regions in the spectra and applying a k nearest neighbor regression model to every predefined emission and absorption region individually. Results: We estimated a redshift for more than 50% of the analyzed 16,000 spectra of our reference and test sample. The redshift estimate yields a precision for every individually tested feature that is comparable with the overall precision of the redshifts of SDSS. In 14 spectra we find a significant shift between emission and absorption or emission and emission lines. The results already show the immense power of this simple machine learning approach for investigating huge databases such as the SDSS. Comment: accepted for publication in A&
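    The k nearest neighbor regression named in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `knn_regress`, the Euclidean distance, and the unweighted mean are all assumptions chosen for illustration, and the feature vectors stand in for whatever spectral regions a real application would compare.

```python
import math


def knn_regress(train_x, train_y, query, k=3):
    """Predict a target value as the mean of the k nearest training targets.

    train_x: list of equal-length numeric tuples (hypothetical feature vectors)
    train_y: list of known target values, e.g. reference redshifts
    query:   feature vector whose target is to be estimated
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every training vector.
    pairs = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    # Keep the k closest neighbours and average their targets.
    pairs.sort(key=lambda pair: pair[0])
    nearest = pairs[:k]
    return sum(y for _, y in nearest) / k


# Toy data: one-dimensional features with made-up target values.
features = [(0.0,), (1.0,), (2.0,), (10.0,)]
targets = [0.0, 0.1, 0.2, 1.0]
estimate = knn_regress(features, targets, (1.0,), k=3)
```

    Because the prediction uses only nearby examples, no global model has to be fitted, which is the property the abstract emphasizes.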

    The Minimum Description Length Principle for Pattern Mining: A Survey

    This is about the Minimum Description Length (MDL) principle applied to pattern mining. The length of this description is kept to the minimum. Mining patterns is a core task in data analysis and, beyond issues of efficient enumeration, the selection of patterns constitutes a major challenge. The MDL principle, a model selection method grounded in information theory, has been applied to pattern mining with the aim to obtain compact high-quality sets of patterns. After giving an outline of relevant concepts from information theory and coding, as well as of work on the theory behind the MDL and similar principles, we review MDL-based methods for mining various types of data and patterns. Finally, we open a discussion on some issues regarding these methods, and highlight currently active related data analysis problems
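    As a toy illustration of MDL-based model selection (not a method taken from the survey), the sketch below compares two ways of encoding a binary sequence and keeps the one with the shorter total description. The function names and the choice of spending log2(n + 1) bits on the model parameter are assumptions made for this example.

```python
import math


def uniform_code_length(bits):
    """Baseline code: one bit per symbol and no model to transmit."""
    return float(len(bits))


def bernoulli_two_part_length(bits):
    """Two-part code length L(model) + L(data | model), in bits.

    Model cost: log2(n + 1) bits to send the number of ones (0..n).
    Data cost:  the sequence encoded with the fitted frequency p = ones / n.
    """
    n = len(bits)
    ones = sum(bits)
    model_bits = math.log2(n + 1)
    data_bits = 0.0
    for count in (ones, n - ones):
        if count:  # symbols that never occur contribute zero bits
            data_bits -= count * math.log2(count / n)
    return model_bits + data_bits


def select_model(bits):
    """Return the name of the shortest description and all code lengths."""
    lengths = {
        "uniform": uniform_code_length(bits),
        "bernoulli": bernoulli_two_part_length(bits),
    }
    return min(lengths, key=lengths.get), lengths


skewed = [1] * 30 + [0] * 2   # highly regular: the model pays for itself
balanced = [0, 1] * 16        # no frequency skew: the model is wasted cost
best_skewed, _ = select_model(skewed)
best_balanced, _ = select_model(balanced)
```

    The skewed sequence is compressed well enough to cover the cost of describing the model, while the balanced one is not, so the simpler code wins there. MDL-based pattern selection rests on the same trade-off between model cost and data cost.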

    Relating recharge mechanisms to chemical changes in an updip Appalachian coal mine discharge: A case study from Lambert Run, West Virginia

    Impaired drainage from active and abandoned mines degrades the water quality of receiving streams and aquifers. Coal mine drainage (CMD) has been studied for decades in Appalachia, but unknowns and uncertainties are still present, including the influence of mine hydrogeology on the outflow chemistry of above-drainage mines. To evaluate the influence of recharge type on above-drainage mine chemistry, samples were collected every two weeks at a CMD outflow treatment system in Harrison County, West Virginia. Samples were collected to measure geochemical changes taking place in the mine workings and along the flowpath of the passive treatment system. Samples were divided into two groups based on the dominant type of recharge entering the mine during sample collection. Direct recharge dominated samples had lower concentrations of hydrolysable cations at the mine outflow, causing the discharge to be both net-acidic and net-alkaline during the study period. Total rare earth element concentrations at the outflow were positively correlated to Fe, Al, and Mn, and negatively correlated to pH and discharge. During both recharge regimes, Fe, Al, Mn, and rare earth elements were removed along the treatment system flowpath. Throughout the study period, 89% of dissolved inorganic carbon in the system was degassed to the atmosphere as CO2. This study demonstrates that varied recharge mechanisms can influence the CMD outflow chemistry, with implications in treatment system design, interpretation of routine chemical data, and extrapolation of CO2 efflux from CMD outflows for large-scale carbon balance studies