18 research outputs found

    TakeAPeek Learning: Chaining classifiers for multiple regression

    In multi-target regression problems, the objective is to predict the numeric values of multiple dependent variables Y from the numeric values of the independent variables X using linear regression. One issue with this type of algorithm is that it does not account for the codependency between the target variables Y. The objective of this thesis is to adapt the existing multiple regression algorithm to use chained classifiers. With chained classifiers, the prediction of each target variable Yn is not made independently of the other Y variables: each prediction model includes the values of the previously predicted Y variables. In doing so, the model takes the codependence between the target variables into account, which should lead to more accurate results. This thesis builds on an existing article about multi-label classification methods for multi-target regression. I will demonstrate the effectiveness of the base algorithm using 12 datasets and prove the influence that the order in which the target variables are predicted has on the final mean error over all target variables.
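The chaining idea described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names (`fit_chain`, `predict_chain`), the tiny ridge term, and the choice to train each link on the true values of earlier targets are all assumptions made for illustration. Each link is an ordinary linear regression whose inputs are the original features plus the targets that precede it in the chain order.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve_linear_system(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_linear(rows: Matrix, y: Vector, ridge: float = 1e-9) -> Vector:
    """Least squares via the normal equations; returns [bias, w1, w2, ...].

    The small ridge term (an assumption, applied to every coefficient
    including the bias) only guards against a singular system.
    """
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict_linear(weights: Vector, row: Vector) -> float:
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


def fit_chain(x_rows: Matrix, y_rows: Matrix, order: List[int]) -> List[Vector]:
    """Fit one linear model per target, following the given chain order."""
    models = []
    augmented = [list(r) for r in x_rows]
    for target in order:
        y = [row[target] for row in y_rows]
        models.append(fit_linear(augmented, y))
        # Assumption: later links are trained on the TRUE values of earlier targets.
        for row, value in zip(augmented, y):
            row.append(value)
    return models


def predict_chain(models: List[Vector], order: List[int], x_row: Vector) -> Vector:
    """Predict all targets; each link sees the predictions of earlier links."""
    features = list(x_row)
    preds = [0.0] * len(order)
    for weights, target in zip(models, order):
        value = predict_linear(weights, features)
        preds[target] = value
        features.append(value)  # feed the prediction to the next link
    return preds


# Toy data (hypothetical): the second target depends on the first one.
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
Y = [[x[0] ** 2, 2 * x[0] ** 2 + x[0]] for x in X]
chain = fit_chain(X, Y, order=[0, 1])
print(predict_chain(chain, [0, 1], [2.0]))
```

Changing `order` changes which targets are available as inputs to which links, which is the effect on the final error that the abstract says the thesis investigates.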

    Predicting rice phenotypes with meta and multi-target learning

    The features in some machine learning datasets can naturally be divided into groups. This is the case with genomic data, where features can be grouped by chromosome. In many applications it is common for these groupings to be ignored, as interactions may exist between features belonging to different groups. However, including a group that does not influence a response introduces noise when fitting a model, leading to suboptimal predictive accuracy. Here we present two general frameworks for the generation and combination of meta-features when feature groupings are present. Furthermore, we make comparisons to multi-target learning, given that one is typically interested in predicting multiple phenotypes. We evaluated the frameworks and multi-target learning approaches on a genomic rice dataset where the regression task is to predict plant phenotype. Our results demonstrate that there are use cases for both the meta and multi-target approaches, given that overall, they significantly outperform the base case.

    A Probabilistic Assessment of Soil Erosion Susceptibility in a Head Catchment of the Jemma Basin, Ethiopian Highlands

    Soil erosion is one of the most important global issues, with serious effects on agriculture and water quality, especially in developing countries such as Ethiopia, where rapid population growth and climatic changes widely affect mountainous areas. The Meskay catchment is a head catchment of the Jemma Basin draining into the Blue Nile (Central Ethiopia) and is characterized by high relief energy. Thus, it is exposed to high degradation dynamics, especially in the lower parts of the catchment. In this study, we aim at a geomorphological assessment of soil erosion susceptibilities. First, a geomorphological map was generated based on remote sensing observations. In particular, we mapped three categories of landforms related to (i) sheet erosion, (ii) gully erosion, and (iii) badlands using a high-resolution digital elevation model (DEM). The map was validated by a detailed field survey. Subsequently, we used the three categories as dependent variables in a probabilistic modelling approach, the maximum entropy model (MaxEnt), to derive the spatial distribution of the specific process susceptibilities. The independent variables were derived from a set of spatial attributes describing the lithology, terrain, and land cover based on remote sensing data and DEMs. As a result, we produced three separate susceptibility maps for sheet erosion, gully erosion, and badlands. The resulting susceptibility maps showed good to excellent prediction performance. Moreover, to explore the mutual overlap of the three susceptibility maps, we generated a combined map as a color composite in which each color represents one component of water erosion. The combined map yields useful information for land-use managers and planning purposes.

    I'm simply the best, better than all the rest: Narcissistic leaders and corporate fundraising success

    We examine the relationship between leader grandiose narcissism, composed of admiration and rivalry, and corporate fundraising success in a sample of 2377 organizational leaders. To examine a large sample of leaders, we applied a machine-learning algorithm to predict leaders' personality scores based on leaders' Twitter profiles. We found that admiration was positively related to corporate fundraising success (in '000s), while rivalry was negatively related to it. Analyses also showed that, contrary to our initial expectation, leader gender does not moderate this relationship. We discuss and compare our findings to previous work on narcissism and crowdfunding.

    An Efficient Feature Subset Selection Algorithm for Classification of Multidimensional Dataset

    Multidimensional medical data classification has recently received increased attention from researchers working on machine learning and data mining. In a multidimensional dataset (MDD), each instance is associated with multiple class values. Because of this complexity, selecting features and building a classifier from an MDD are typically more expensive and time-consuming. Therefore, we need a robust feature selection technique that selects a single optimal subset of the features of the MDD for further analysis or for designing a classifier. In this paper, an efficient feature selection algorithm is proposed for the classification of MDD. The proposed multidimensional feature subset selection (MFSS) algorithm yields a unique feature subset for further analysis or for building a classifier, and it has a computational advantage on MDD compared with existing feature selection algorithms. The proposed work is applied to benchmark multidimensional datasets. The number of features was reduced to 3% minimum and 30% maximum by using the proposed MFSS. In conclusion, the study results show that MFSS is an efficient feature selection algorithm that does not affect classification accuracy even with the reduced number of features. The proposed MFSS algorithm is also suitable for both problem transformation and algorithm adaptation, and it has great potential in applications that generate multidimensional datasets.