    A simple multi-class boosting framework with theoretical guarantees and empirical proficiency

    There is a need for simple yet accurate white-box learning systems that train quickly and with little data. To this end, we showcase REBEL, a multi-class boosting method, and present a novel family of weak learners called localized similarities. Our framework provably minimizes the training error of any dataset at an exponential rate. We carry out experiments on a variety of synthetic and real datasets, demonstrating a consistent tendency to avoid overfitting. We evaluate our method on MNIST and standard UCI datasets against other state-of-the-art methods, showing the empirical proficiency of our method.
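
REBEL's own guarantee is stated in the paper and is not reproduced here. As a point of reference for what "minimizes the training error at an exponential rate" means, the sketch below evaluates the classic training-error bound for binary AdaBoost (Freund and Schapire): if the weak learner chosen in round $t$ has weighted error $\epsilon_t = 1/2 - \gamma_t$, the training error is at most $\prod_t 2\sqrt{\epsilon_t(1-\epsilon_t)} \le \exp(-2\sum_t \gamma_t^2)$. The function name is hypothetical, and the bound is AdaBoost's, not REBEL's.

```python
import math


def adaboost_training_error_bound(weighted_errors):
    """Classic binary AdaBoost bound on training error (not REBEL's bound).

    weighted_errors: weighted error eps_t of the weak learner chosen in each
    round; each must be below 0.5 (better than random guessing).
    Returns (product_bound, exponential_bound), where
        training error <= prod_t 2*sqrt(eps_t*(1 - eps_t))   (product_bound)
                       <= exp(-2 * sum_t gamma_t**2)         (exponential_bound)
    with edge gamma_t = 0.5 - eps_t.
    """
    product_bound = 1.0
    sum_gamma_sq = 0.0
    for eps in weighted_errors:
        if not 0.0 <= eps < 0.5:
            raise ValueError("each weak learner must have weighted error in [0, 0.5)")
        product_bound *= 2.0 * math.sqrt(eps * (1.0 - eps))
        sum_gamma_sq += (0.5 - eps) ** 2
    return product_bound, math.exp(-2.0 * sum_gamma_sq)


# A weak learner only 10 points better than chance (eps = 0.4) still
# drives the bound toward zero exponentially in the number of rounds.
for rounds in (10, 100, 500):
    prod_bound, exp_bound = adaboost_training_error_bound([0.4] * rounds)
    print(f"{rounds:4d} rounds: product bound {prod_bound:.4f}, exp bound {exp_bound:.4f}")
```

Once the bound falls below $1/m$ for $m$ training examples, the training error must be exactly zero, because it can only take values that are multiples of $1/m$.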

    Boosting Boosting

    Machine learning is becoming prevalent in all aspects of our lives. For some applications, there is a need for simple but accurate white-box systems that are able to train efficiently and with little data. "Boosting" is an intuitive method that combines many simple (possibly inaccurate) predictors to form a powerful, accurate classifier. Boosted classifiers are intuitive, easy to use, and exhibit the fastest speeds at test time when implemented as a cascade. However, they have a few drawbacks: training decision trees is a relatively slow procedure, and, from a theoretical standpoint, no simple unified framework for cost-sensitive multi-class boosting exists. Furthermore, (axis-aligned) decision trees may be inadequate in some situations, thereby stalling training; and even where they are sufficiently useful, they do not capture the intrinsic nature of the data, as they tend to form boundaries that overfit. My thesis focuses on remedying these three drawbacks of boosting. Ch. III outlines a method (called QuickBoost) that trains identical classifiers an order of magnitude faster than before, based on a provable bound. In Ch. IV, a unified framework for cost-sensitive multi-class boosting (called REBEL) is proposed, both advancing theory and demonstrating empirical gains. Finally, Ch. V describes a novel family of weak learners (called Localized Similarities) that come with theoretical guarantees and outperform decision trees and neural networks (as well as several other commonly used classification methods) on a range of datasets. The culmination of my work is an easy-to-use, fast-training, cost-sensitive multi-class boosting framework whose functionality is interpretable (since each weak learner is a simple comparison of similarity) and whose performance is better than that of neural networks and other competing methods. It is the tool that everyone should have in their toolbox and the first one they try.
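
The abstract does not state QuickBoost's bound. One simple bound of this kind, shown here purely as an illustration and not as the thesis's algorithm, follows from nonnegative sample weights: a stump's weighted error on any subset of the training samples can never exceed its weighted error on the full set. So if the best stump for a feature, fitted on only the heaviest samples, already has an error no lower than the best full-data error found so far, that feature can be skipped without changing the chosen stump. The function names and the `subset_size` parameter are hypothetical.

```python
import numpy as np


def best_stump_error(x, y, w):
    """Minimum weighted error of a one-feature threshold stump.

    x: feature values, y: labels in {-1, +1}, w: nonnegative sample weights.
    Tries both polarities and every threshold between distinct sorted values.
    """
    order = np.argsort(x, kind="stable")
    xs, ys, ws = x[order], y[order], w[order]
    # Cumulative weight of positives / negatives left of each split point.
    pos_left = np.concatenate(([0.0], np.cumsum(ws * (ys > 0))))
    neg_left = np.concatenate(([0.0], np.cumsum(ws * (ys < 0))))
    pos_total, neg_total = pos_left[-1], neg_left[-1]
    # Predict +1 right of the split: errors = left positives + right negatives.
    err_plus_right = pos_left + (neg_total - neg_left)
    # Predict -1 right of the split: errors = left negatives + right positives.
    err_minus_right = neg_left + (pos_total - pos_left)
    # A threshold cannot separate tied feature values.
    valid = np.concatenate(([True], xs[1:] != xs[:-1], [True]))
    return min(err_plus_right[valid].min(), err_minus_right[valid].min())


def select_feature_with_pruning(X, y, w, subset_size):
    """Best-stump feature search that skips features via a subset lower bound.

    Returns (best feature index, its weighted error, number of features
    that needed a full-data evaluation).
    """
    heavy = np.argsort(-w, kind="stable")[:subset_size]  # heaviest samples
    best_err, best_j, full_evals = np.inf, None, 0
    for j in range(X.shape[1]):
        # Error on the heavy subset is a lower bound on the full-data error.
        lower = best_stump_error(X[heavy, j], y[heavy], w[heavy])
        if lower >= best_err:
            continue  # cannot beat the current best stump: skip it
        full_evals += 1
        err = best_stump_error(X[:, j], y, w)
        if err < best_err:
            best_err, best_j = err, j
    return best_j, best_err, full_evals


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))
y = np.where(X[:, 7] + 0.3 * rng.normal(size=500) > 0, 1, -1)
w = rng.random(500)
w /= w.sum()
j, err, n_full = select_feature_with_pruning(X, y, w, subset_size=50)
print(f"best feature {j}, weighted error {err:.3f}, full evaluations {n_full} of 50")
```

Because the bound only skips features that provably cannot win, the selected stump matches an unpruned search; how much time is saved depends on how early a strong feature is found.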

    Predicting Census Survey Response Rates With Parsimonious Additive Models and Structured Interactions

    In this paper we consider the problem of predicting survey response rates using a family of flexible and interpretable nonparametric models. The study is motivated by the US Census Bureau's well-known ROAM application, which uses a linear regression model trained on the US Census Planning Database data to identify hard-to-survey areas. A crowdsourcing competition (Erdman and Bates, 2016) organized around ten years ago revealed that machine learning methods based on ensembles of regression trees led to the best performance in predicting survey response rates; however, the corresponding models could not be adopted for the intended application due to their black-box nature. We consider nonparametric additive models with a small number of main and pairwise interaction effects using $\ell_0$-based penalization. From a methodological viewpoint, we study both computational and statistical aspects of our estimator, and discuss variants that incorporate strong hierarchical interactions. Our algorithms (open-sourced on GitHub) extend the computational frontiers of existing algorithms for sparse additive models, making it possible to handle datasets relevant to the application we consider. We discuss and interpret findings from our model on the US Census Planning Database. In addition to being useful from an interpretability standpoint, our models lead to predictions that appear to be better than those of popular black-box machine learning methods based on gradient boosting and feedforward neural networks, suggesting that it is possible to have models with the best of both worlds: good accuracy and interpretability. Comment: 40 pages, 7 figures
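
The paper's estimator applies $\ell_0$-based penalization to additive main and interaction effects. To illustrate only the $\ell_0$ ingredient, here is a minimal sketch for plain linear least squares, $\min_\beta \tfrac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_0$, solved by proximal gradient descent (iterative hard thresholding). This is a swapped-in simplification, not the authors' algorithm or their open-source code; the function name, `lam`, and the synthetic data are assumptions.

```python
import numpy as np


def l0_penalized_least_squares(X, y, lam, n_iter=1000):
    """Proximal gradient (iterative hard thresholding) for
        minimize_beta  0.5 * ||y - X beta||_2^2 + lam * ||beta||_0.
    The problem is nonconvex, so this finds a stationary point, not
    necessarily the global minimizer.
    """
    # Step size slightly below 1/L, where L = ||X||_2^2 is the Lipschitz
    # constant of the gradient of the smooth least-squares term.
    step = 0.99 / np.linalg.norm(X, 2) ** 2
    # Proximal map of step*lam*||.||_0: keep z_i only if z_i**2 > 2*step*lam.
    threshold = np.sqrt(2.0 * step * lam)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = beta - step * (X.T @ (X @ beta - y))  # gradient step
        beta = np.where(np.abs(z) > threshold, z, 0.0)  # hard thresholding
    return beta


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 30))
beta_true = np.zeros(30)
beta_true[[2, 7, 19]] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.01 * rng.normal(size=100)
beta_hat = l0_penalized_least_squares(X, y, lam=5.0)
print("selected features:", np.flatnonzero(beta_hat))
```

Unlike an $\ell_1$ (Lasso) penalty, the $\ell_0$ penalty does not shrink the coefficients it keeps; it only decides which ones are zero, which is why the surviving estimates stay close to the unpenalized fit on the selected support.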

    Feature Selection with Annealing for Forecasting Financial Time Series

    Stock market and cryptocurrency forecasting is very important to investors, who aspire to achieve even the slightest improvement to their buy or hold strategies in order to increase profitability. However, obtaining accurate and reliable predictions is challenging (accuracy does not equate to reliability), especially in financial time-series forecasting, owing to its complex and chaotic tendencies. To mitigate this complexity, this study provides a comprehensive method for forecasting financial time series based on tactical input-output feature mapping techniques using machine learning (ML) models. During the prediction process, selecting the relevant indicators is vital to obtaining the desired results, yet in the financial field limited attention has been paid to this problem with ML solutions. We investigate the use of feature selection with annealing (FSA) for the first time in this field, and we apply the least absolute shrinkage and selection operator (Lasso) method to select features from more than 1,000 candidates obtained from 26 technical classifiers with different periods and lags. Boruta (BOR) feature selection, a wrapper method, is used as a baseline for comparison. Logistic regression (LR), extreme gradient boosting (XGBoost), and long short-term memory (LSTM) networks are then applied to the selected features for forecasting, using 10 different financial datasets containing cryptocurrencies and stocks. The dependent variables consisted of daily logarithmic returns and trends. The mean squared error (for regression), the area under the receiver operating characteristic curve, and classification accuracy were used to evaluate model performance, and the statistical significance of the forecasting results was tested using paired t-tests. Experiments indicate that the FSA algorithm increased the performance of ML models, regardless of problem type. Comment: 37 pages, 1 figure and 12 tables
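
The abstract does not spell out how FSA works. A minimal sketch of the idea as we understand the original FSA method: run gradient descent on a loss while progressively removing the features with the smallest coefficients, following an annealing schedule, until only `k` features remain. The exact schedule form, the squared-error loss (the study also forecasts trends, which would call for a classification loss), and the values of `mu`, `lr`, and `n_iter` are assumptions; the function names are hypothetical.

```python
import numpy as np


def fsa_num_kept(e, p, k, n_iter, mu):
    """Annealing schedule: number of features kept after iteration e (1-based).

    Assumed form: M_e = k + (p - k) * max(0, (n_iter - 2e) / (2*e*mu + n_iter)),
    which reaches k at e = n_iter / 2 and stays there.
    """
    frac = max(0.0, (n_iter - 2 * e) / (2 * e * mu + n_iter))
    return k + int((p - k) * frac)


def fsa_least_squares(X, y, k, n_iter=300, mu=100.0, lr=0.1):
    """Feature selection with annealing on a mean-squared-error loss.

    Alternates a gradient step on the surviving coefficients with removal
    of the features whose |coefficient| is smallest, following the schedule.
    Returns (indices of the k selected features, their coefficients).
    """
    n, p = X.shape
    active = np.arange(p)  # original indices of surviving features
    beta = np.zeros(p)
    for e in range(1, n_iter + 1):
        Xa = X[:, active]
        beta = beta - lr * (Xa.T @ (Xa @ beta - y)) / n  # gradient step
        m = fsa_num_kept(e, p, k, n_iter, mu)
        if m < len(active):
            keep = np.sort(np.argsort(-np.abs(beta))[:m])  # largest |beta| survive
            active, beta = active[keep], beta[keep]
    return active, beta


rng = np.random.default_rng(2)
X = rng.normal(size=(200, 50))
beta_true = np.zeros(50)
beta_true[[1, 10, 25]] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.normal(size=200)
selected, coef = fsa_least_squares(X, y, k=3)
print("selected:", selected, "coefficients:", np.round(coef, 2))
```

Removing features gradually, rather than all at once after a single fit, gives the surviving coefficients time to adjust before each cut; a larger `mu` front-loads the removals.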

    From Dakar to Brasilia: Monitoring Unesco's Education Goals

    The active participation of Brazilian civil society, coupled with the 2007 education development plan launched by the Brazilian government, provides an interesting example of the influence of the Dakar Goals. The two domestic initiatives share the name, spirit and direction proposed in Dakar in 2000. Here we analyse changes in Brazilian policies and indicators related to the Dakar Education Goals since their creation, and note that: (i) enrolment increased over the relevant period; (ii) access to primary education was nearly universal by 2000; (iii) the number of over-aged youth and adult students fell considerably during the period, but access did not expand; (iv) illiteracy has been falling at a rate which, if sustained, will enable us to meet the goal; (v) gender discrimination did not take place in Brazil; (vi) most pupil proficiency indicators have progressively deteriorated from what was already a low standard. In summary, quantity indicators improved over the period while most quality indicators worsened.

    The digital and multilingual competences. A good teaching practice for the EFL classroom of Secondary Education

    In the context of modern education, the European Commission has recommended 8 key competences for lifelong learning. Given the enhancing role that digital competence plays in the development of all the others, nurturing this competence becomes paramount. From the perspective of EFL education, the use of ICTs in the classroom is directly linked to an improvement in communication skills, because technology affords numerous opportunities, in the form of digital resources and platforms, for educators to experiment with. On the other hand, students seem more engaged and motivated to take part in lessons: they want to interact actively in group activities and let their creative voices be heard. With that in mind, the paper presents a 10-session proposal, conceived as a good teaching practice and aimed at 4th-year ESO EFL students. Master's Degree in Teaching for Compulsory Secondary Education and Bachillerato, Vocational Training and Language Teaching

    The role of government in the development of ethnic entrepreneurs: The qualitative study on Vietnamese ethnic entrepreneur in Finland

    Entrepreneurship has become the most readily available means of economic and social survival for foreigners, who face diverse challenges. However, the recognition of ethnic entrepreneurship is still limited in Finland in comparison with other countries. Moreover, while a large number of studies related to the development of ethnic entrepreneurship have been published, only a limited number of articles focus on the institutional environment and its effect on ethnic entrepreneurship. Therefore, this study aims to investigate the role of Finnish governmental policies and programs in the development of ethnic entrepreneurs during the initial stages of the entrepreneurial process. The qualitative study of Vietnamese ethnic entrepreneurs was conducted in the Turku region, Finland. To answer the research questions, the literature on ethnic entrepreneurship, the institutional environment, and the initial stages of the entrepreneurial process was reviewed carefully. All the factors were combined and illustrated in a synthesis, which served as the guideline for the empirical study. The research approach chapter outlines the methodology and data collection process. A qualitative method based on interviews was chosen as the main research methodology. Five Vietnamese ethnic entrepreneurs with different backgrounds and business experience were selected for data collection. Next, the modified synthesis was presented to reflect the empirical findings on how Finnish governmental policies and programs actually affected the growth of Vietnamese ethnic entrepreneurs during their early stages of business. Finally, the theoretical conclusion, including the managerial implications and limitations as well as suggestions for further research, was presented.
This dissertation provides fundamental knowledge about the role of governmental policies and programs in the development of ethnic entrepreneurship, and practical knowledge about the experience of Vietnamese immigrants in the Turku region. It suggests that more empirical studies of the role of government in other regions should be conducted.