10,760 research outputs found
A Recurrent Neural Network Survival Model: Predicting Web User Return Time
The size of a website's active user base directly affects its value. Thus, it
is important to monitor and influence a user's likelihood to return to a site.
Essential to this is predicting when a user will return. Current state of the
art approaches to solve this problem come in two flavors: (1) Recurrent Neural
Network (RNN) based solutions and (2) survival analysis methods. We observe
that both techniques are severely limited when applied to this problem.
Survival models can only incorporate aggregate representations of users instead
of automatically learning a representation directly from a raw time series of
user actions. RNNs can automatically learn features, but can not be directly
trained with examples of non-returning users who have no target value for their
return time. We develop a novel RNN survival model that removes the limitations
of the state of the art methods. We demonstrate that this model can
successfully be applied to return time prediction on a large e-commerce dataset
with a superior ability to discriminate between returning and non-returning
users than either method applied in isolation.Comment: Accepted into ECML PKDD 2018; 8 figures and 1 tabl
Survival ensembles by the sum of pairwise differences with application to lung cancer microarray studies
Lung cancer is among the most common cancers in the United States, in terms
of incidence and mortality. In 2009, it is estimated that more than 150,000
deaths will result from lung cancer alone. Genetic information is an extremely
valuable data source in characterizing the personal nature of cancer. Over the
past several years, investigators have conducted numerous association studies
where intensive genetic data is collected on relatively few patients compared
to the numbers of gene predictors, with one scientific goal being to identify
genetic features associated with cancer recurrence or survival. In this note,
we propose high-dimensional survival analysis through a new application of
boosting, a powerful tool in machine learning. Our approach is based on an
accelerated lifetime model and minimizing the sum of pairwise differences in
residuals. We apply our method to a recent microarray study of lung
adenocarcinoma and find that our ensemble is composed of 19 genes, while a
proportional hazards (PH) ensemble is composed of nine genes, a proper subset
of the 19-gene panel. In one of our simulation scenarios, we demonstrate that
PH boosting in a misspecified model tends to underfit and ignore
moderately-sized covariate effects, on average. Diagnostic analyses suggest
that the PH assumption is not satisfied in the microarray data and may explain,
in part, the discrepancy in the sets of active coefficients. Our simulation
studies and comparative data analyses demonstrate how statistical learning by
PH models alone is insufficient.Comment: Published in at http://dx.doi.org/10.1214/10-AOAS426 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
SAFE: A Neural Survival Analysis Model for Fraud Early Detection
Many online platforms have deployed anti-fraud systems to detect and prevent
fraudulent activities. However, there is usually a gap between the time that a
user commits a fraudulent action and the time that the user is suspended by the
platform. How to detect fraudsters in time is a challenging problem. Most of
the existing approaches adopt classifiers to predict fraudsters given their
activity sequences along time. The main drawback of classification models is
that the prediction results between consecutive timestamps are often
inconsistent. In this paper, we propose a survival analysis based fraud early
detection model, SAFE, which maps dynamic user activities to survival
probabilities that are guaranteed to be monotonically decreasing along time.
SAFE adopts recurrent neural network (RNN) to handle user activity sequences
and directly outputs hazard values at each timestamp, and then, survival
probability derived from hazard values is deployed to achieve consistent
predictions. Because we only observe the user suspended time instead of the
fraudulent activity time in the training data, we revise the loss function of
the regular survival model to achieve fraud early detection. Experimental
results on two real world datasets demonstrate that SAFE outperforms both the
survival analysis model and recurrent neural network model alone as well as
state-of-the-art fraud early detection approaches.Comment: To appear in AAAI-201
Failure mode prediction and energy forecasting of PV plants to assist dynamic maintenance tasks by ANN based models
In the field of renewable energy, reliability analysis techniques combining the operating time of the system with the observation of operational and environmental conditions, are gaining importance over time.
In this paper, reliability models are adapted to incorporate monitoring data on operating assets, as well as information on their environmental conditions, in their calculations. To that end, a logical decision tool based on two artificial neural networks models is presented. This tool allows updating assets reliability analysis according to changes in operational and/or environmental conditions.
The proposed tool could easily be automated within a supervisory control and data acquisition system, where reference values and corresponding warnings and alarms could be now dynamically generated using the tool. Thanks to this capability, on-line diagnosis and/or potential asset degradation prediction can be certainly improved.
Reliability models in the tool presented are developed according to the available amount of failure data and are used for early detection of degradation in energy production due to power inverter and solar trackers functional failures.
Another capability of the tool presented in the paper is to assess the economic risk associated with the system under existing conditions and for a certain period of time. This information can then also be used to trigger preventive maintenance activities
- …