    Skill set profile clustering: the empty K-means algorithm with automatic specification of starting cluster centers

    While students’ skill set profiles can be estimated with formal cognitive diagnosis models [8], their computational complexity makes simpler proxy skill estimates attractive [1, 4, 6]. These estimates can be clustered to generate groups of similar students. Hierarchical agglomerative clustering or k-means clustering is often used, which requires specifying 2^K clusters for K skills. The number of skill set profiles/clusters can therefore quickly become computationally intractable. Moreover, not all profiles may be present in the population. We present a flexible version of k-means that allows for empty clusters. We also specify a method to determine efficient starting centers based on the Q-matrix. Combining the two substantially improves the clustering results and allows for analysis of data sets previously thought impossible to analyze.
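The idea of a k-means that tolerates empty clusters can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes that a cluster with no assigned points simply keeps its starting center, and it uses the 2^K corners of the unit hypercube as starting centers in place of the Q-matrix-based centers the abstract mentions (that derivation is not given here). The function names `corner_centers` and `kmeans_allow_empty` are hypothetical.

```python
from itertools import product


def corner_centers(n_skills):
    """Assumed starting centers: all 2^K binary skill profiles (hypercube corners)."""
    return [[float(v) for v in corner] for corner in product((0, 1), repeat=n_skills)]


def kmeans_allow_empty(points, centers, n_iter=100):
    """Lloyd-style k-means in which clusters are allowed to stay empty.

    Assumption for this sketch: an empty cluster keeps its current center
    instead of being re-seeded or triggering an error.
    """
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(
                range(len(centers)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centers[j])),
            )
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not change
        labels = new_labels
        # Update step: only non-empty clusters move.
        for j in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Four students, two skills; only two of the four possible profiles occur.
points = [[0.1, 0.2], [0.0, 0.1], [0.9, 0.95], [1.0, 0.8]]
labels, centers = kmeans_allow_empty(points, corner_centers(2))
```

In this toy run the clusters at the corners (0, 1) and (1, 0) receive no students and stay where they started, which mirrors the abstract's point that not all profiles need be present in the population.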

    Techniques of linear prediction, with application to oceanic and atmospheric fields in the tropical Pacific

    The problem of constructing optimal linear prediction models by multivariance regression methods is reviewed. It is well known that as the number of predictors in a model is increased, the skill of the prediction grows, but the statistical significance generally decreases. For predictions using a large number of candidate predictors, strategies are therefore needed to determine optimal prediction models which properly balance the competing requirements of skill and significance. The popular methods of coefficient screening or stepwise regression represent a posteriori predictor selection methods and therefore cannot be used to recover statistically significant models by truncation if the complete model, including all predictors, is statistically insignificant. Higher significance can be achieved only by a priori reduction of the predictor set. To determine the maximum number of predictors which may be meaningfully incorporated in a model, a model hierarchy can be used in which a series of best fit prediction models is constructed for a (prior defined) nested sequence of predictor sets, the sequence being terminated when the significance level either falls below a prescribed limit or reaches a maximum value. The method requires a reliable assessment of model significance. This is characterized by a quadratic statistic which is defined independently of the model skill or artificial skill. As an example, the method is applied to the prediction of sea surface temperature anomalies at Christmas Island (representative of sea surface temperatures in the central equatorial Pacific) and variations of the central and east Pacific Hadley circulation (characterized by the second empirical orthogonal function (EOF) of the meridional component of the trade wind anomaly field) using a general multiple‐time‐lag prediction matrix. 
The ordering of the predictors is based on an EOF sequence, defined formally as orthogonal variables in the composite space of all (normalized) predictors, irrespective of their different physical dimensions, time lag, and geographic position. The choice of a large set of 20 predictors at 12 time lags yields significant predictability only for forecast periods of 3 to 5 months. However, a prior reduction of the predictor set to 4 predictors at 10 time lags leads to 95% significant predictions with skill values of the order of 0.4 to 0.7 up to 6 or 8 months. For infinitely long time series the construction of optimal prediction models reduces essentially to the problem of linear system identification. However, the model hierarchies normally considered for the simulation of general linear systems differ in structure from the model hierarchies which appear to be most suitable for constructing pure prediction models. Thus the truncation imposed by statistical significance requirements can result in rather different models for the two cases. The relation between optimal prediction models and linear dynamical models is illustrated by the prediction of east‐west sea level changes in the equatorial Pacific from wind field anomalies. It is shown that the optimal empirical prediction is statistically consistent in this case with both the first‐order relaxation and damped oscillator models recently proposed by McWilliams and Gent (but with somewhat different model parameters than suggested by the authors). Thus the data do not allow a distinction between the two physical models; the simplest acceptable model is the first‐order damped response. Finally, the problem of estimating forecast skill is discussed. It is usually stated that the forecast skill is smaller than the true skill, which in turn is smaller than the hindcast skill, by an amount which in both cases is approximately equal to the artificial skill. 
However, this result applies to the mean skills averaged over the ensemble of all possible hindcast data sets, given the true model. Under the more appropriate side condition of a given hindcast data set and an unknown true model, the estimation of the forecast skill represents a problem of statistical inference and is dependent on the assumed prior probability distribution of true models. The Bayesian hypothesis of a uniform prior distribution yields an average forecast skill equal to the hindcast skill, but other (equally acceptable) assumptions yield lower forecast skills more compatible with the usual hindcast‐averaged expression.

    Clustering student skill set profiles in a unit hypercube using mixtures of multivariate betas

    This paper presents a finite mixture of multivariate betas as a new model-based clustering method tailored to applications where the feature space is constrained to the unit hypercube. The mixture component densities are taken to be conditionally independent, univariate unimodal beta densities (from the subclass of reparameterized beta densities given by Bagnato and Punzo 2013). The EM algorithm used to fit this mixture is discussed in detail, and results from both this beta mixture model and the more standard Gaussian model-based clustering are presented for simulated skill mastery data from a common cognitive diagnosis model and for real data from the Assistment System online mathematics tutor (Feng et al. 2009). The multivariate beta mixture appears to outperform the standard Gaussian model-based clustering approach, as would be expected on the constrained space. Fewer components are selected (by BIC-ICL) in the beta mixture than in the Gaussian mixture, and the resulting clusters seem more reasonable and interpretable. This article is in technical report form; the final publication is available at http://www.springerlink.com/openurl.asp?genre=article&id=doi:10.1007/s11634-013-0149-z

    IT Capital, Job Content and Educational Attainment

    Based on a large data set containing information on occupations between 1979 and 1999, this study explores the "black box" surrounding the skill-biased technological change hypothesis by analyzing the mechanisms that induce information technologies to be complementary to employees with higher skill levels. Using direct, multidimensional measures of occupational skill requirements, the analysis shows that IT capital substitutes for repetitive manual and repetitive cognitive skills, whereas it complements analytical and interactive skills. These changes in the within-occupation task mix result in an increased deployment of employees with high levels of education who have comparative advantages in performing non-repetitive cognitive tasks. Keywords: skill-biased technological change, job task content, vocational education

    Optimal Technology and Development

    Skill-intensive technologies seem to be adopted by rich countries rather than poor ones. Related to that observation, the ratio of wages of skilled to unskilled workers (the skill premium) shows two important features over time and across countries. In the US the skill premium decreased during the first half of the 20th century and increased after 1950, following a U-shaped pattern. On the other hand, the same measure across countries around 1990 is hump-shaped when countries are ordered by GDP per worker. By modeling the decisions for factor accumulation and technology adoption, this paper gives a systematic explanation as to why we see ever more skill-intensive technologies being adopted both over time in the US and across countries. The model developed here endogenously generates predictions for the skill premium that are consistent with both the US and international observations under the same set of parameter values. Keywords: skill biased technological change; skill premium; endogenous technology; inequality

    Skill set profile clustering based on student capability vectors computed from online tutoring data

    In educational research, a fundamental goal is identifying which skills students have mastered, which skills they have not, and which skills they are in the process of mastering. As the number of examinees, items, and skills increases, the estimation of even simple cognitive diagnosis models becomes difficult. To address this, we introduce a capability matrix showing, for each student and each skill, the proportion correct on all items the student tried that involve that skill. We apply variations of common clustering methods to this matrix and discuss conditioning on sparse subspaces. We demonstrate the feasibility and scalability of our method on several simulated datasets and illustrate the difficulties inherent in real data using a subset of online mathematics tutor data. We also comment on the interpretability and application of the results for teachers.
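The capability matrix described in this abstract can be illustrated with a short sketch. It assumes a binary Q-matrix (`q_matrix[i][k]` is 1 when item i involves skill k) and a response table in which `None` marks an item the student did not try; the function name `capability_matrix`, the data layout, and the use of `None` for a skill with no attempted items are all assumptions made for illustration, not details taken from the paper.

```python
def capability_matrix(responses, q_matrix):
    """Proportion correct per student and skill, over attempted items only.

    responses[s][i]: 1 (correct), 0 (incorrect), or None (item i not tried).
    q_matrix[i][k]:  1 if item i involves skill k, else 0.
    Returns B with B[s][k] in [0, 1], or None when student s tried no item
    involving skill k (an assumed convention for this sketch).
    """
    n_skills = len(q_matrix[0])
    matrix = []
    for row in responses:
        capabilities = []
        for k in range(n_skills):
            # Scores on items that involve skill k and were actually tried.
            tried = [
                score
                for score, item_skills in zip(row, q_matrix)
                if item_skills[k] == 1 and score is not None
            ]
            capabilities.append(sum(tried) / len(tried) if tried else None)
        matrix.append(capabilities)
    return matrix


# Three items, two skills: item 0 -> skill 0, item 1 -> both, item 2 -> skill 1.
q_matrix = [[1, 0], [1, 1], [0, 1]]
responses = [
    [1, 0, None],        # tried items 0 and 1 only
    [None, 1, 1],        # tried items 1 and 2 only
    [None, None, None],  # tried nothing
]
B = capability_matrix(responses, q_matrix)
```

Each row of `B` lies in the unit hypercube (where defined), so rows can be passed to a clustering routine once missing entries are handled.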

    The Skill Balancing Act: Determinants of and Returns to Balanced Skills

    Entrepreneurs are found to have balanced skill sets and most have worked in small firms before starting their own business. In light of this, we compare the skill sets of employees working in businesses of different sizes to the skill sets of entrepreneurs using a rich data set on the applied skills of individuals. This data set allows us to construct an indicator that measures skill balance in the quantity (skill scope) and quality (skill level) dimensions. Our results show that employees working in large businesses tend to have a lower skill balance than those working in small businesses; yet, the skill balance of entrepreneurs remains the largest. The impact of human capital formation on skill balance also varies among employees of different business sizes and entrepreneurs. Finally, the estimated returns to balanced skills are largest for entrepreneurs whereas, for employees, these returns decrease as business size increases. However, we find no relationship between balancing skills at lower skill levels and income, indicating that both dimensions, skill level and skill scope, are relevant. We end by discussing the policy implications that can be drawn from our results in regard to skill balance. Keywords: entrepreneurship, returns to human capital, balanced skill set, jack-of-all-trades