Structure Regularization for Structured Prediction: Theories and Experiments
While there are many studies on weight regularization, studies on structure
regularization are rare. Many existing systems for structured prediction focus on
increasing the level of structural dependencies within the model. However, this
trend could have been misdirected, because our study suggests that complex
structures are actually harmful to generalization ability in structured
prediction. To control structure-based overfitting, we propose a structure
regularization framework via \emph{structure decomposition}, which decomposes
training samples into mini-samples with simpler structures, deriving a model
with better generalization power. We show both theoretically and empirically
that structure regularization can effectively control overfitting risk and lead
to better accuracy. As a by-product, the proposed method can also substantially
accelerate the training speed. The method and the theoretical results can apply
to general graphical models with arbitrary structures. Experiments on
well-known tasks demonstrate that our method can easily beat the benchmark
systems on those highly competitive tasks, achieving state-of-the-art
accuracies with substantially faster training.
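The decomposition step described above can be sketched as follows. This is a minimal illustration for sequence-shaped samples only, assuming random contiguous splits. The function names `decompose` and `decompose_dataset` and the parameter `alpha` (number of mini-samples per training sample) are hypothetical and are not taken from the paper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def decompose(sample: Sequence[T], alpha: int, rng: random.Random) -> List[Sequence[T]]:
    """Split one sequence into at most `alpha` contiguous mini-samples.

    Assumption: cut points are drawn uniformly at random without replacement.
    """
    n = len(sample)
    pieces = max(1, min(alpha, n))  # cannot have more pieces than positions
    # Choose pieces-1 distinct cut positions strictly inside the sequence.
    cuts = sorted(rng.sample(range(1, n), pieces - 1)) if n > 1 else []
    bounds = [0] + cuts + [n]
    return [sample[a:b] for a, b in zip(bounds, bounds[1:])]


def decompose_dataset(dataset: Sequence[Sequence[T]], alpha: int, seed: int = 0) -> List[Sequence[T]]:
    """Replace every training sample by its mini-samples."""
    rng = random.Random(seed)
    return [piece for sample in dataset for piece in decompose(sample, alpha, rng)]
```

A model would then be trained on the mini-samples in place of the original samples; each mini-sample has a simpler (shorter) structure, which is the regularizing effect the abstract refers to.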
Two-stage Sampled Learning Theory on Distributions
We focus on the distribution regression problem: regressing to a real-valued
response from a probability distribution. Although there exist a large number
of similarity measures between distributions, very little is known about their
generalization performance in specific learning tasks. Learning problems
formulated on distributions have an inherent two-stage sampled difficulty: in
practice only samples from sampled distributions are observable, and one has to
build an estimate on similarities computed between sets of points. To the best
of our knowledge, the only existing method with consistency guarantees for
distribution regression requires kernel density estimation as an intermediate
step (which suffers from slow convergence issues in high dimensions), and the
domain of the distributions to be compact Euclidean. In this paper, we provide
theoretical guarantees for a remarkably simple algorithmic alternative to solve
the distribution regression problem: embed the distributions into a reproducing
kernel Hilbert space, and learn a ridge regressor from the embeddings to the
outputs. Our main contribution is to prove the consistency of this technique in
the two-stage sampled setting under mild conditions (on separable, topological
domains endowed with kernels). For a given total number of observations, we
derive convergence rates as an explicit function of the problem difficulty. As
a special case, we answer a 15-year-old open question: we establish the
consistency of the classical set kernel [Haussler, 1999; Gärtner et al., 2002]
in regression, and cover more recent kernels on distributions, including those
due to [Christmann and Steinwart, 2010].
Comment: v6: accepted at AISTATS-2015 for oral presentation; final version;
code: https://bitbucket.org/szzoli/ite/; extension to the misspecified and
vector-valued case: http://arxiv.org/abs/1411.206
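The two-step recipe in the abstract above (embed each sampled distribution, then fit a ridge regressor on the embeddings) can be sketched with NumPy. This is an illustrative sketch, not the authors' code: it assumes a Gaussian base kernel and a linear kernel on the mean embeddings (which gives the set kernel), and the names `gaussian_gram`, `set_kernel`, `fit_ridge`, `predict`, `bandwidth` and `lam` are hypothetical.

```python
import numpy as np


def gaussian_gram(X: np.ndarray, Y: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian kernel values k(x, y) for all rows x of X and y of Y."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))


def set_kernel(bag_a: np.ndarray, bag_b: np.ndarray, bandwidth: float = 1.0) -> float:
    """Inner product of the two empirical mean embeddings.

    Equals the average of k(x, y) over all pairs (x in bag_a, y in bag_b).
    """
    return float(gaussian_gram(bag_a, bag_b, bandwidth).mean())


def fit_ridge(bags, y: np.ndarray, lam: float = 1e-3, bandwidth: float = 1.0) -> np.ndarray:
    """Kernel ridge coefficients: solve (G + l * lam * I) c = y."""
    l = len(bags)
    G = np.array([[set_kernel(a, b, bandwidth) for b in bags] for a in bags])
    return np.linalg.solve(G + l * lam * np.eye(l), y)


def predict(train_bags, coef: np.ndarray, new_bag: np.ndarray, bandwidth: float = 1.0) -> float:
    """Prediction for a new bag of samples: sum_i coef_i * K(bag_i, new_bag)."""
    k = np.array([set_kernel(b, new_bag, bandwidth) for b in train_bags])
    return float(k @ coef)
```

Each bag is an array of shape (number of samples, dimension); only samples from the distributions are needed, which matches the two-stage sampled setting described in the abstract.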
Learning Theory for Distribution Regression
We focus on the distribution regression problem: regressing to vector-valued
outputs from probability measures. Many important machine learning and
statistical tasks fit into this framework, including multi-instance learning
and point estimation problems without analytical solution (such as
hyperparameter or entropy estimation). Despite the large number of available
heuristics in the literature, the inherent two-stage sampled nature of the
problem makes the theoretical analysis quite challenging, since in practice
only samples from sampled distributions are observable, and the estimates have
to rely on similarities computed between sets of points. To the best of our
knowledge, the only existing technique with consistency guarantees for
distribution regression requires kernel density estimation as an intermediate
step (which often performs poorly in practice), and the domain of the
distributions to be compact Euclidean. In this paper, we study a simple,
analytically computable, ridge regression-based alternative to distribution
regression, where we embed the distributions into a reproducing kernel Hilbert
space, and learn the regressor from the embeddings to the outputs. Our main
contribution is to prove that this scheme is consistent in the two-stage
sampled setup under mild conditions (on separable topological domains enriched
with kernels): we present an exact computational-statistical efficiency
trade-off analysis showing that our estimator is able to match the one-stage
sampled minimax optimal rate [Caponnetto and De Vito, 2007; Steinwart et al.,
2009]. This result answers a 17-year-old open question, establishing the
consistency of the classical set kernel [Haussler, 1999; Gaertner et al., 2002]
in regression. We also cover consistency for more recent kernels on
distributions, including those due to [Christmann and Steinwart, 2010].
Comment: Final version appeared at JMLR, with supplement. Code:
https://bitbucket.org/szzoli/ite/. arXiv admin note: text overlap with
arXiv:1402.175
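In symbols, the ridge regression scheme described in this abstract can be written roughly as follows. The notation is chosen here for illustration and is not taken from the paper: $k$ is the base kernel on the sample domain, $x_{i,1},\dots,x_{i,N_i}$ are the samples observed from the $i$-th distribution $P_i$, $y_i$ is its output, $\ell$ is the number of distributions, and $\lambda > 0$ is the regularization parameter. The second line assumes a linear kernel on the embeddings, which yields the set kernel.

```latex
\begin{align*}
\mu_{P} &= \int k(\cdot, x)\,\mathrm{d}P(x),
&
\mu_{\hat{P}_i} &= \frac{1}{N_i}\sum_{n=1}^{N_i} k(\cdot, x_{i,n}),
\\
K(\hat{P}_i, \hat{P}_j)
&= \bigl\langle \mu_{\hat{P}_i}, \mu_{\hat{P}_j} \bigr\rangle
= \frac{1}{N_i N_j}\sum_{n=1}^{N_i}\sum_{m=1}^{N_j} k(x_{i,n}, x_{j,m}),
\\
\hat{f} &= \operatorname*{arg\,min}_{f \in \mathcal{H}}\;
\frac{1}{\ell}\sum_{i=1}^{\ell} \bigl\| f(\mu_{\hat{P}_i}) - y_i \bigr\|^2
+ \lambda \, \|f\|_{\mathcal{H}}^2 .
\end{align*}
```

The first line is the mean embedding and its empirical estimate from samples; the last line is the regularized least-squares problem solved over the embedded distributions.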
Learning alternative ways of performing a task
A common way of learning to perform a task is to observe how it is carried out by experts. However, it is well known that for most tasks there is no unique way to perform them. This becomes more noticeable as the task grows more complex, because factors such as the skill or know-how of the expert may well affect the way she solves the task. In addition, learning from experts suffers from having only a small set of training examples, generally coming from several experts (since experts are usually a limited and expensive resource), all of which are positive examples (i.e. examples that represent successful executions of the task). Traditional machine learning techniques are not useful in such scenarios, as they require extensive training data. Starting from very few executions of the task presented as activity sequences, we introduce a novel inductive approach for learning multiple models, each representing an alternative strategy for performing a task. Through an iterative process based on generalisation and specialisation, we learn the underlying patterns that capture the different styles of performing a task exhibited by the examples. We illustrate our approach on two common activity recognition tasks: a surgical skills training task and a cooking domain. We evaluate the inferred models with respect to two metrics that measure how well the models represent the examples and capture the different forms of executing a task shown by the examples. We compare our results with the traditional process mining approach and show that a small set of meaningful examples is enough to obtain patterns that capture the different strategies that are followed to solve the tasks.
This work has been partially supported by the EU (FEDER) and the Spanish MINECO under grants TIN2014-61716-EXP (SUPERVASION) and RTI2018-094403-B-C32, and by Generalitat Valenciana under grant PROMETEO/2019/098.
David Nieves is also supported by the Spanish MINECO under FPI grant (BES-2016-078863).
Nieves, D.; Ramírez Quintana, MJ.; Montserrat Aranda, C.; Ferri Ramírez, C.; Hernández-Orallo, J. (2020). Learning alternative ways of performing a task. Expert Systems with Applications. 148:1-18. https://doi.org/10.1016/j.eswa.2020.113263