5 research outputs found

    Structure Regularization for Structured Prediction: Theories and Experiments

    While weight regularization has been studied extensively, structure regularization has received little attention. Many existing structured prediction systems focus on increasing the level of structural dependencies within the model. However, this trend may be misdirected, because our study suggests that complex structures are actually harmful to generalization ability in structured prediction. To control structure-based overfitting, we propose a structure regularization framework via structure decomposition, which decomposes training samples into mini-samples with simpler structures, deriving a model with better generalization power. We show both theoretically and empirically that structure regularization can effectively control overfitting risk and lead to better accuracy. As a by-product, the proposed method can also substantially accelerate training. The method and the theoretical results apply to general graphical models with arbitrary structures. Experiments on well-known tasks demonstrate that our method can easily beat the benchmark systems on those highly competitive tasks, achieving state-of-the-art accuracy with substantially faster training.
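The abstract does not say how samples are decomposed, so the following is only a minimal sketch under stated assumptions: each training sample is a tagged sequence, and it is cut at random positions into contiguous mini-samples. The function name `decompose` and the parameter `alpha` (number of mini-samples per sample) are hypothetical and not taken from the paper.

```python
import random


def decompose(tokens, tags, alpha, rng=None):
    """Split one tagged sequence into `alpha` contiguous mini-samples.

    Assumption (not from the abstract): cut points are drawn uniformly
    at random, and dependencies across a cut are simply dropped.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    n = len(tokens)
    if n == 0:
        return []
    alpha = max(1, min(alpha, n))  # cannot have more pieces than tokens
    cuts = sorted(rng.sample(range(1, n), alpha - 1))
    bounds = [0] + cuts + [n]
    return [(tokens[a:b], tags[a:b]) for a, b in zip(bounds, bounds[1:])]


sentence = ["Structure", "regularization", "controls", "overfitting", "risk"]
labels = ["N", "N", "V", "N", "N"]
mini_samples = decompose(sentence, labels, alpha=2)
for piece_tokens, piece_tags in mini_samples:
    print(piece_tokens, piece_tags)
```

Training would then run on the mini-samples instead of the full sequences; because each mini-sample is shorter, inference per sample is cheaper, which is consistent with the faster training the abstract reports.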

    Two-stage Sampled Learning Theory on Distributions

    We focus on the distribution regression problem: regressing to a real-valued response from a probability distribution. Although there exist a large number of similarity measures between distributions, very little is known about their generalization performance in specific learning tasks. Learning problems formulated on distributions have an inherent two-stage sampled difficulty: in practice only samples from sampled distributions are observable, and one has to build an estimate on similarities computed between sets of points. To the best of our knowledge, the only existing method with consistency guarantees for distribution regression requires kernel density estimation as an intermediate step (which suffers from slow convergence issues in high dimensions), and the domain of the distributions to be compact Euclidean. In this paper, we provide theoretical guarantees for a remarkably simple algorithmic alternative to solve the distribution regression problem: embed the distributions into a reproducing kernel Hilbert space, and learn a ridge regressor from the embeddings to the outputs. Our main contribution is to prove the consistency of this technique in the two-stage sampled setting under mild conditions (on separable, topological domains endowed with kernels). For a given total number of observations, we derive convergence rates as an explicit function of the problem difficulty. As a special case, we answer a 15-year-old open question: we establish the consistency of the classical set kernel [Haussler, 1999; Gärtner et al., 2002] in regression, and cover more recent kernels on distributions, including those due to [Christmann and Steinwart, 2010].
    Comment: v6: accepted at AISTATS-2015 for oral presentation; final version; code: https://bitbucket.org/szzoli/ite/; extension to the misspecified and vector-valued case: http://arxiv.org/abs/1411.206
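The two steps named in the abstract (embed each distribution into an RKHS, then fit a ridge regressor on the embeddings) can be sketched as follows. This is an illustrative sketch, not the authors' ITE code: the Gaussian base kernel, the linear kernel on embeddings (the set kernel), the function names, and the values of `gamma` and `lam` are all assumptions chosen for the example.

```python
import numpy as np


def gaussian_gram(a, b, gamma):
    """Pairwise Gaussian kernel values between the rows of a (n, d) and b (m, d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def set_kernel(bag_a, bag_b, gamma):
    """Inner product of two empirical mean embeddings.

    This equals the average of the base kernel over all cross pairs of points.
    """
    return gaussian_gram(bag_a, bag_b, gamma).mean()


def fit(bags, y, gamma=1.0, lam=1e-3):
    """Kernel ridge regression on bags; returns the dual coefficients."""
    n_bags = len(bags)
    gram = np.array([[set_kernel(a, b, gamma) for b in bags] for a in bags])
    return np.linalg.solve(gram + n_bags * lam * np.eye(n_bags), np.asarray(y, dtype=float))


def predict(train_bags, coef, test_bags, gamma=1.0):
    """Predict the response for each test bag from the fitted dual coefficients."""
    cross = np.array([[set_kernel(t, b, gamma) for b in train_bags] for t in test_bags])
    return cross @ coef


# Toy two-stage sampled problem: each bag is a sample from N(m, 0.5^2), the label is m.
rng = np.random.default_rng(0)
means = np.linspace(-2.0, 2.0, 20)
train_bags = [rng.normal(m, 0.5, size=(50, 1)) for m in means]
coef = fit(train_bags, means, gamma=0.5, lam=1e-3)
new_bag = rng.normal(1.0, 0.5, size=(50, 1))
print(predict(train_bags, coef, [new_bag], gamma=0.5))
```

The regressor only ever sees the bags through pairwise averages of the base kernel, so no density estimate is needed as an intermediate step.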

    Learning Theory for Distribution Regression

    We focus on the distribution regression problem: regressing to vector-valued outputs from probability measures. Many important machine learning and statistical tasks fit into this framework, including multi-instance learning and point estimation problems without analytical solution (such as hyperparameter or entropy estimation). Despite the large number of available heuristics in the literature, the inherent two-stage sampled nature of the problem makes the theoretical analysis quite challenging, since in practice only samples from sampled distributions are observable, and the estimates have to rely on similarities computed between sets of points. To the best of our knowledge, the only existing technique with consistency guarantees for distribution regression requires kernel density estimation as an intermediate step (which often performs poorly in practice), and the domain of the distributions to be compact Euclidean. In this paper, we study a simple, analytically computable, ridge regression-based alternative to distribution regression, where we embed the distributions into a reproducing kernel Hilbert space, and learn the regressor from the embeddings to the outputs. Our main contribution is to prove that this scheme is consistent in the two-stage sampled setup under mild conditions (on separable topological domains enriched with kernels): we present an exact computational-statistical efficiency trade-off analysis showing that our estimator is able to match the one-stage sampled minimax optimal rate [Caponnetto and De Vito, 2007; Steinwart et al., 2009]. This result answers a 17-year-old open question, establishing the consistency of the classical set kernel [Haussler, 1999; Gärtner et al., 2002] in regression. We also cover consistency for more recent kernels on distributions, including those due to [Christmann and Steinwart, 2010].
    Comment: Final version appeared at JMLR, with supplement. Code: https://bitbucket.org/szzoli/ite/.
    arXiv admin note: text overlap with arXiv:1402.175
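For orientation, the ridge-regression scheme described in this abstract can be written out as below. The notation is an assumption introduced for this sketch, not quoted from the paper: $k$ is the base kernel with RKHS $H$, $\hat{x}_i = \{x_{i,1}, \dots, x_{i,N_i}\}$ is the sample observed from the $i$-th distribution, $y_i$ its output, $\ell$ the number of training distributions, $\lambda > 0$ the regularization parameter, and the kernel on embeddings is taken to be linear (the set kernel case).

```latex
% Mean embedding of a distribution x and its empirical estimate from the sample \hat{x}_i
\mu_x = \int k(\cdot, u)\, \mathrm{d}x(u) \in H,
\qquad
\mu_{\hat{x}_i} = \frac{1}{N_i} \sum_{n=1}^{N_i} k(\cdot, x_{i,n}).

% Linear kernel on the embeddings (the set kernel)
K(\mu_{\hat{x}_i}, \mu_{\hat{x}_j})
  = \langle \mu_{\hat{x}_i}, \mu_{\hat{x}_j} \rangle_{H}
  = \frac{1}{N_i N_j} \sum_{n=1}^{N_i} \sum_{m=1}^{N_j} k(x_{i,n}, x_{j,m}).

% Ridge regression from the embeddings to the outputs
\hat{f} = \operatorname*{arg\,min}_{f \in \mathcal{H}(K)}
  \frac{1}{\ell} \sum_{i=1}^{\ell} \bigl\| f(\mu_{\hat{x}_i}) - y_i \bigr\|^2
  + \lambda \, \| f \|_{\mathcal{H}(K)}^2 .

% Closed form for real-valued outputs, with Gram matrix G_{ij} = K(\mu_{\hat{x}_i}, \mu_{\hat{x}_j})
\hat{f}(\mu_{\hat{t}}) = \mathbf{k}_{\hat{t}}^{\top} \, (G + \ell \lambda I)^{-1} \, \mathbf{y},
\qquad
(\mathbf{k}_{\hat{t}})_i = K(\mu_{\hat{t}}, \mu_{\hat{x}_i}).
```

The closed form is what makes the estimator "analytically computable": prediction needs only Gram-matrix entries, each an average of base-kernel values over pairs of observed points.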

    Learning alternative ways of performing a task

    A common way of learning to perform a task is to observe how it is carried out by experts. However, it is well known that for most tasks there is no unique way to perform them. This is especially noticeable the more complex the task is, because factors such as the skill or the know-how of the expert may well affect the way she solves the task. In addition, learning from experts also suffers from having a small set of training examples, generally coming from several experts (since experts are usually a limited and expensive resource), all of them being positive examples (i.e. examples that represent successful executions of the task). Traditional machine learning techniques are not useful in such scenarios, as they require extensive training data. Starting from very few executions of the task presented as activity sequences, we introduce a novel inductive approach for learning multiple models, with each one representing an alternative strategy of performing a task. By an iterative process based on generalisation and specialisation, we learn the underlying patterns that capture the different styles of performing a task exhibited by the examples. We illustrate our approach on two common activity recognition tasks: a surgical skills training task and a cooking domain. We evaluate the inferred models with respect to two metrics that measure how well the models represent the examples and capture the different forms of executing a task shown by the examples. We compare our results with the traditional process mining approach and show that a small set of meaningful examples is enough to obtain patterns that capture the different strategies that are followed to solve the tasks.
    This work has been partially supported by the EU (FEDER) and the Spanish MINECO under grants TIN2014-61716-EXP (SUPERVASION) and RTI2018-094403-B-C32, and by Generalitat Valenciana under grant PROMETEO/2019/098.
    David Nieves is also supported by the Spanish MINECO under FPI grant (BES-2016-078863).
    Nieves, D.; Ramírez Quintana, MJ.; Montserrat Aranda, C.; Ferri Ramírez, C.; Hernández-Orallo, J. (2020). Learning alternative ways of performing a task. Expert Systems with Applications. 148:1-18. https://doi.org/10.1016/j.eswa.2020.113263