18 research outputs found

    Studying the Potential of Multi-Target Classification to Characterize Combinations of Classes with Skewed Distribution

    The identification of subpopulations with particular characteristics with respect to a disease is important for personalized diagnostics and therapy design. For some diseases, the outcome is described by more than one target variable. An example is tinnitus: the perceived loudness of the phantom signal and the level of distress caused by it are both relevant targets for diagnosis and therapy. In this work, we study the potential of multi-target classification for identifying the screening variables that best separate the different subpopulations of patients, paying particular attention to subpopulations with discordant value combinations of loudness and distress. We analyse the screening data of 1344 tinnitus patients from the University Hospital Regensburg, including questions from 7 questionnaires, and report on the performance of our workflow in target separation and in ranking the questionnaires' variables by their discriminative power.

    Ontology of core data mining entities

    In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines the most essential data mining entities in a three-layered ontological structure comprising a specification, an implementation and an application layer. It provides a representational framework for describing the mining of structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications and use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. The ontology has been thoroughly assessed following the practices of ontology engineering, is fully interoperable with many domain resources and is easy to extend.

    Production of Secondary Metabolites in Extreme Environments: Food- and Airborne Wallemia spp. Produce Toxic Metabolites at Hypersaline Conditions

    The food- and airborne fungal genus Wallemia comprises seven xerophilic and halophilic species: W. sebi, W. mellicola, W. canadensis, W. tropicalis, W. muriae, W. hederae and W. ichthyophaga. All listed species are adapted to low water activity and can contaminate food preserved with high amounts of salt or sugar. In relation to food safety, the effect of high salt and sugar concentrations on the production of secondary metabolites by this toxigenic fungus was investigated. The secondary metabolite profiles of 30 strains of the listed species were examined using general growth media, known to support the production of secondary metabolites, supplemented with different concentrations of NaCl, glucose and MgCl2. In more than two hundred extracts, approximately one hundred different compounds were detected using high-performance liquid chromatography-diode array detection (HPLC-DAD). Although the genome data analysis of W. mellicola (previously W. sebi sensu lato) and W. ichthyophaga revealed a low number of secondary metabolite clusters, a substantial number of secondary metabolites were detected under different conditions. Machine learning analysis of the obtained dataset showed that NaCl has a higher influence on the production of secondary metabolites than the other tested solutes. Mass spectrometric analysis of selected extracts revealed that NaCl in the medium affects the production of some compounds with substantial biological activities (wallimidione, walleminol, walleminone, UCA 1064-A and UCA 1064-B). In particular, an increase in NaCl concentration from 5% to 15% in the growth media increased the production of the toxic metabolites wallimidione, walleminol and walleminone.

    Impact of multi-output and stacking methods on feed efficiency prediction from genotype using machine learning algorithms

    Feeding represents the largest economic cost in meat production; therefore, selection to improve traits related to feed efficiency is a goal in most livestock breeding programs. Residual feed intake (RFI), that is, the difference between the actual feed intake and the feed intake expected from the animal's requirements, has been used as the selection criterion to improve feed efficiency since it was proposed by Koch in 1963. In growing pigs, it is computed as the residual of the multiple regression model of daily feed intake (DFI) on average daily gain (ADG), backfat thickness (BFT), and metabolic body weight (MW). Recently, prediction using single-output machine learning algorithms with information from SNPs as predictor variables has been proposed for genomic selection in growing pigs, but as in other species, the prediction quality achieved for RFI has been generally poor. However, it has been suggested that it could be improved through multi-output or stacking methods. For this purpose, four strategies were implemented to predict RFI. Two of them compute RFI indirectly from the predicted values of its components, obtained from (i) individual predictions (multiple single-output strategy) or (ii) simultaneous predictions (multi-output strategy). The other two predict RFI directly, using (iii) the individual predictions of its components as predictor variables jointly with the genotype (stacking strategy), or (iv) only the genotypes as predictors of RFI (single-output strategy). The single-output strategy was considered the benchmark. This research aimed to test the former three hypotheses using data recorded from 5828 growing pigs and 45,610 SNPs. For all the strategies, two different learning methods were fitted: random forest (RF) and support vector regression (SVR). A nested cross-validation (CV) with an outer 10-fold CV and an inner threefold CV for hyperparameter tuning was implemented to test all strategies. 
This scheme was repeated using as predictor variables different subsets with an increasing number (from 200 to 3000) of the most informative SNPs identified with RF. Results showed that the highest prediction performance was achieved with 1000 SNPs, although the stability of feature selection was poor (0.13 points out of 1). For all SNP subsets, the benchmark showed the best prediction performance. Using RF as the learner and the 1000 most informative SNPs as predictors, the mean (SD) of the 10 values obtained in the test sets were: 0.23 (0.04) for the Spearman correlation, 0.83 (0.04) for the zero–one loss, and 0.33 (0.03) for the rank distance loss. We conclude that the information on the predicted components of RFI (DFI, ADG, MW, and BFT) does not improve the quality of the prediction of this trait relative to that obtained with the single-output strategy.
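
As a minimal sketch of the RFI definition given in this abstract (the residual of a multiple regression of DFI on ADG, BFT and MW), the following pure-Python example fits the regression by ordinary least squares and returns one residual per animal. The function names and the six toy records are illustrative assumptions and do not come from the study.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def residual_feed_intake(dfi, adg, bft, mw):
    """Return RFI per animal: observed DFI minus DFI expected from ADG, BFT, MW."""
    # Design matrix with an intercept column.
    rows = [[1.0, a, b, w] for a, b, w in zip(adg, bft, mw)]
    k = 4
    # Normal equations (X'X) beta = X'y for ordinary least squares.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, dfi)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return [y - sum(bi * xi for bi, xi in zip(beta, r)) for r, y in zip(rows, dfi)]


# Made-up records for six animals (illustrative values only, not study data).
adg = [0.80, 0.95, 0.70, 1.05, 0.88, 0.92]
bft = [12.0, 15.0, 11.0, 14.0, 16.0, 13.0]
mw = [20.0, 22.5, 19.0, 24.0, 21.0, 23.5]
dfi = [2.1, 2.6, 1.9, 2.8, 2.5, 2.4]

rfi = residual_feed_intake(dfi, adg, bft, mw)
```

A negative value marks an animal that ate less than the regression expects for its growth, backfat and metabolic weight. Because the model includes an intercept, the residuals sum to zero up to rounding.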

    An adaptive computer system for recommending learning objects in a constructivist learning environment: ALECA

    A growing number of learning environments support active learning and take into account student characteristics, preferences and activities. In this paper, we present the concept of a learning recommender system that combines knowledge from pedagogy and recommender systems. We analyse how combining different learning style models influences the preferred types of multimedia materials. The results reveal that students prefer well-structured learning texts with colour discrimination, and that the hemispheric learning style model is the most important criterion in determining student preferences for different multimedia learning materials. In the second part of our research, we describe an approach to alleviating the new user (cold start) problem, which improves the accuracy of a system for recommending learning materials in environments where the system has no prior information about learners. Our findings present the concept of an adaptive learning system, with an analysis of its possible effects on learning practice.

    Semi-supervised Predictive Clustering Trees for (Hierarchical) Multi-label Classification

    Semi-supervised learning (SSL) is a common approach to learning predictive models using not only labeled examples, but also unlabeled examples. While SSL for the simple tasks of classification and regression has received a lot of attention from the research community, it has not been properly investigated for complex prediction tasks with structurally dependent variables. This is the case for multi-label classification and hierarchical multi-label classification tasks, which may require additional information, possibly coming from the underlying distribution in the descriptive space provided by unlabeled examples, to better handle the challenging task of predicting multiple class labels simultaneously. In this paper, we investigate this aspect and propose a (hierarchical) multi-label classification method based on semi-supervised learning of predictive clustering trees. We also extend the method towards ensemble learning and propose a method based on the random forest approach. Extensive experimental evaluation conducted on 23 datasets shows significant advantages of the proposed method and its extension with respect to their supervised counterparts. Moreover, the method preserves interpretability and reduces the time complexity of classical tree-based models.

    Comparison between CART regression trees and linear regression

    Abstract: Linear regression is the most widely used statistical method for predicting values of continuous variables because it is easy to interpret, but in many situations the assumptions required to apply the model are not met, and some users tend to force them, leading to erroneous conclusions. CART regression trees are an alternative that requires no assumptions about the data to be analysed and whose results are also easy to interpret. In this work, the predictive performance of linear regression and CART is compared through simulation. In general, when the correct linear regression model is fitted to the data, the prediction error of linear regression is always smaller than that of CART. When a linear regression model is fitted to the data incorrectly, the prediction error of CART is smaller than that of linear regression only when the amount of data is sufficiently large.
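
The kind of simulation this abstract describes can be sketched in a few lines, under loud assumptions: the data-generating model, noise level and sample sizes below are invented for illustration, and a single-split regression tree (a stump) stands in for a full CART tree, which would normally grow deeper. The sketch covers only the first case from the abstract, where the linear model is correctly specified.

```python
import random


def fit_line(xs, ys):
    """Fit simple linear regression by least squares; return a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_stump(xs, ys):
    """Fit a one-split regression tree: pick the threshold with the lowest SSE."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, ml, mr)
    _, threshold, ml, mr = best
    return lambda x: ml if x <= threshold else mr


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted model."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def simulate(n, seed):
    """Draw n points from an assumed linear model y = 2x + 1 + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


train_x, train_y = simulate(200, seed=1)
test_x, test_y = simulate(200, seed=2)
linear_mse = mse(fit_line(train_x, train_y), test_x, test_y)
stump_mse = mse(fit_stump(train_x, train_y), test_x, test_y)
```

Comparing `linear_mse` with `stump_mse` on the held-out sample gives one replicate of the comparison. Reproducing the second finding would require generating data from a nonlinear model and varying the sample size, which this sketch does not attempt.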

    Data-Driven Structuring of the Output Space Improves the Performance of Multi-Target Regressors

    The task of multi-target regression (MTR) is concerned with learning predictive models capable of predicting multiple target variables simultaneously. MTR has attracted increasing attention within the research community in recent years, yielding a variety of methods. The methods can be divided into two main groups: problem transformation and problem adaptation. The former transform an MTR problem into simpler (typically single-target) problems and apply known approaches, while the latter adapt the learning methods to handle the multiple target variables directly and learn better models that predict all of the targets simultaneously. Studies have identified the latter group of methods as having a competitive advantage over the former, probably because it exploits the interrelations of the multiple targets. In the related task of multi-label classification, it has recently been shown that organizing the multiple labels into a hierarchical structure can improve predictive performance. In this paper, we investigate whether organizing the targets into a hierarchical structure can improve the performance for MTR problems. More precisely, we propose to structure the multiple target variables into a hierarchy of variables, thus translating the task of MTR into a task of hierarchical multi-target regression (HMTR). We use four data-driven methods for devising the hierarchical structure, which cluster either the real values of the targets or the feature importance scores with respect to the targets. The evaluation of the proposed methodology on 16 benchmark MTR datasets reveals that structuring the multiple target variables into a hierarchy improves the predictive performance of the corresponding MTR models. The results also show that data-driven methods produce hierarchies that can improve the predictive performance even more than expert-constructed hierarchies. 
Finally, the improvement in predictive performance is more pronounced for the datasets with very large numbers (more than a hundred) of targets.

    A model of an adaptive system for recommending learning objects in a constructivist learning environment

    Computer-based multimedia learning environments support the idea that people learn better and more deeply when appropriate pictures (i.e., animations, video, static graphics) are added to text or narration. There are many adaptive learning systems that adapt learning materials to student properties, preferences, and activities. Adaptive learning environments mostly support only traditional concepts of learning. There is a need to design and develop an e-learning system that embodies the principles of the constructivist learning approach. The solution lies in recommender systems, which suggest items of interest to users based on their preferences (i.e., previous ratings). If there are no ratings for a certain user or item/object, the situation is called the cold start problem, which leads to unreliable recommendations. Researchers mostly avoid tackling the absolute cold start in recommender systems. The topic of this dissertation is the design of a recommender system with a novel approach to avoiding the cold start problem. Approaches to solving the new-user cold start problem can be divided into two main groups: the first performs additional inquiries to gather more information about the users, and the second uses dedicated algorithms for users in the cold start state. Following the first approach, we relate combinations of different learning styles (taking into account four different learning style models) to preferred multimedia types. We explore a decision model aimed at proposing learning material of an appropriate multimedia type. The study includes 272 student participants. The resulting decision model shows that students prefer well-structured learning texts with colour discrimination, and that the hemispheric learning style model is the most important criterion in deciding student preferences for different multimedia learning materials. 
To provide a more accurate and reliable model for recommending different multimedia types, more learning style models must be combined. Kolb's classification and the VAK classification allow us to learn whether students prefer an active role in the learning process, and what multimedia type they prefer. The results also show that there is an obvious need to combine learning style models in order to get a wider view of the student's characteristics: approach to problem solving, cognitive modes, way of thinking, and dominant mode of perceiving information. On the other hand, the model recommends the same multimedia material regardless of the learning topic. In the second part of our research, we designed and developed a novel approach for alleviating the cold start problem by imputing missing values into the input matrix, thereby improving recommendation performance. Our approach has three steps: 1) finding users similar to the given user in the cold start state; 2) selecting relevant attributes for the imputation process; 3) aggregating ratings into the input matrix for the user in the cold start state. We separate our approach into solving the absolute cold start problem and solving the partial cold start problem. For the absolute cold start problem, the results of our experiments indicate that all our proposed methods improve recommendations for non-negative matrix factorization with stochastic gradient descent (NG). For semi-non-negative matrix factorization with missing data (SN), the combinations FR-ME (imputing an attribute's mean value into the attributes that have the highest frequency of the most frequent values) and SD-MF (imputing an attribute's most frequent value into the attributes that have the lowest standard deviation) improve recommendations for users in the absolute cold start state. 
For non-negative matrix factorization with alternating least squares (NS) and matrix factorization by data fusion (DF), none of the variations of the proposed parameters (methods) improves recommendations in the absolute cold start state. In the next stage of our research, we evaluated our approach to solving the partial cold start problem. Statistical analysis of the experimental evaluation of our approach on the artificial domain showed that each parameter significantly improved the recommendations of matrix factorization methods. The methods that yield improvements in recommendation accuracy compared with raw matrix factorization are those that consider 25 % of similar users (25-*-*-*), select an attribute according to frequency (*-FR-*-*) or RReliefF (*-RR-*-*), and impute a value aggregated by the mean (ME) or predicted by regression trees (RT). For further investigation we chose two method combinations (25-FR-ME-* and 25-RR-RT-*), which were expected to work well, and compared them with other strategies on real domains. Among all approaches evaluated on the artificial domain, we also chose the best-performing method with the highest average rank: a method that considers 50 % of similar users, selects an attribute for imputation according to RReliefF, and imputes a value predicted by linear regression (50-RR-LR-*). All three selected combinations were evaluated on two real domains: Jester and PEFbase. The evaluation showed that method 25-FR-ME-* combined with matrix factorization NG performed statistically better than the raw matrix factorization algorithms (DF, NG, NS and SN) on real domains for users in the partial cold start state. The results demonstrate the advantage of imputation approaches in terms of better recommendation accuracy. At the same time, they show that imputing missing values has no negative impact on recommendations for users who are not in the cold start state.
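
The three imputation steps named in this abstract (find similar users, select attributes, aggregate ratings) can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's implementation: similarity is assumed to be Euclidean distance between hypothetical numeric user profiles, attributes are selected by lowest standard deviation (the SD criterion), values are aggregated by the mean (ME), and all names and data are invented.

```python
import math
from statistics import mean, pstdev


def impute_cold_start(new_profile, profiles, ratings, user_share=0.25, n_items=2):
    """Return {item_index: imputed rating} for a user who has no ratings yet.

    profiles: numeric profile vector per known user (hypothetical input).
    ratings: rating row per known user; None marks a missing rating.
    """
    # Step 1: keep the closest share of known users (assumed Euclidean distance).
    order = sorted(
        range(len(profiles)),
        key=lambda u: math.dist(new_profile, profiles[u]),
    )
    k = max(1, int(user_share * len(profiles)))
    neighbours = order[:k]

    # Step 2: select the items whose neighbour ratings vary least (SD criterion).
    spread = {}
    for item in range(len(ratings[0])):
        values = [ratings[u][item] for u in neighbours if ratings[u][item] is not None]
        if values:
            spread[item] = (pstdev(values), values)
    chosen = sorted(spread, key=lambda i: (spread[i][0], i))[:n_items]

    # Step 3: impute the mean neighbour rating for each chosen item (ME).
    return {item: mean(spread[item][1]) for item in chosen}


# Invented toy data: eight known users, two profile features, four items.
profiles = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [6.0, 5.0],
            [5.0, 6.0], [9.0, 9.0], [8.0, 9.0], [9.0, 8.0]]
ratings = [[4, 2, 5, 1], [4, 4, 5, 3], [1, 5, 2, None], [2, 5, 1, 3],
           [1, 4, 2, 4], [3, 1, None, 5], [3, 2, 4, 5], [2, 1, 4, 4]]

imputed = impute_cold_start([0.0, 0.1], profiles, ratings)
```

With `user_share=0.25` the two nearest of the eight users are kept, which mirrors the "25 % of similar users" setting. The imputed values would then be written into the new user's row of the rating matrix before matrix factorization is run.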