79 research outputs found

    Adaptive item selection under matroid constraints

    Full text link
    The shadow testing approach (STA; van der Linden & Reese, 1998) is considered the state of the art in constrained item selection for computerized adaptive tests. The present paper shows that certain types of constraints (e.g., bounds on categorical item attributes) induce a matroid on the item bank. This observation is used to devise item selection algorithms that are based on matroid optimization and lead to optimal tests, as the STA does. In particular, a single matroid constraint can be treated optimally by an efficient greedy algorithm that selects the most informative item preserving the integrity of the constraints. A simulation study shows that for applicable constraints, the optimal algorithms realize a decrease in standard error (SE) corresponding to a reduction in test length of up to 10% compared to the maximum priority index (Cheng & Chang, 2009) and up to 30% compared to Kingsbury and Zara\u27s (1991) constrained computerized adaptive testing

    Simultaneous constrained adaptive item selection for group-based testing

    Get PDF
    By tailoring test forms to the test-taker\u27s proficiency, Computerized Adaptive Testing (CAT) enables substantial increases in testing efficiency over fixed forms testing. When used for formative assessment, the alignment of task difficulty with proficiency increases the chance that teachers can derive useful feedback from assessment data. The application of CAT to formative assessment in the classroom, however, is hindered by the large number of different items used for the whole class; the required familiarization with a large number of test items puts a significant burden on teachers. An improved CAT procedure for group-based testing is presented, which uses simultaneous automated test assembly to impose a limit on the number of items used per group. The proposed linear model for simultaneous adaptive item selection allows for full adaptivity and the accommodation of constraints on test content. The effectiveness of the group-based CAT is demonstrated with real-world items in a simulated adaptive test of 3,000 groups of test-takers, under different assumptions on group composition. Results show that the group-based CAT maintained the efficiency of CAT, while a reduction in the number of used items by one half to two-thirds was achieved, depending on the within-group variance of proficiencies. (DIPF/Orig.

    Supervised clustering of streaming data for email batch detection

    Get PDF
    We address the problem of detecting batches of emails that have been created according to the same template. This problem is motivated by the desire to filter spam more effectively by exploiting collective information about entire batches of jointly generated messages. The application matches the problem setting of supervised clustering, because examples of correct clusterings can be collected. Known decoding procedures for supervised clustering are cubic in the number of instances. When decisions cannot be reconsidered once they have been made – owing to the streaming nature of the data – then the decoding problem can be solved in linear time. We devise a sequential decoding procedure and derive the corresponding optimization problem of supervised clustering. We study the impact of collective attributes of email batches on the effectiveness of recognizing spam emails. 1

    Joint Optimization of an Autoencoder for Clustering and Embedding

    Get PDF
    Incorporating k-means-like clustering techniques into (deep) autoencoders constitutes an interesting idea as the clustering may exploit the learned similarities in the embedding to compute a non-linear grouping of data at-hand. Unfortunately, the resulting contributions are often limited by ad-hoc choices, decoupled optimization problems and other issues. We present a theoretically-driven deep clustering approach that does not suffer from these limitations and allows for joint optimization of clustering and embedding. The network in its simplest form is derived from a Gaussian mixture model and can be incorporated seamlessly into deep autoencoders for state-of-the-art performance

    Mining Positional Data Streams

    Get PDF
    Abstract. We study frequent pattern mining from positional data streams. Existing approaches require discretised data to identify atomic events and are not applicable in our continuous setting. We propose an efficient trajectory-based preprocessing to identify similar movements and a distributed pattern mining algorithm to identify frequent trajectories. We empirically evaluate all parts of the processing pipeline

    Evaluating one-shot tournament predictions

    Get PDF
    We introduce the Tournament Rank Probability Score (TRPS) as a measure to evaluate and compare pre-tournament predictions, where predictions of the full tournament results are required to be available before the tournament begins. The TRPS handles partial ranking of teams, gives credit to predictions that are only slightly wrong, and can be modified with weights to stress the importance of particular features of the tournament prediction. Thus, the Tournament Rank Prediction Score is more flexible than the commonly preferred log loss score for such tasks. In addition, we show how predictions from historic tournaments can be optimally combined into ensemble predictions in order to maximize the TRPS for a new tournament

    Systematic feature evaluation for gene name recognition

    Get PDF
    In task 1A of the BioCreAtIvE evaluation, systems had to be devised that recognize words and phrases forming gene or protein names in natural language sentences. We approach this problem by building a word classification system based on a sliding window approach with a Support Vector Machine, combined with a pattern-based post-processing for the recognition of phrases. The performance of such a system crucially depends on the type of features chosen for consideration by the classification method, such as pre- or postfixes, character n-grams, patterns of capitalization, or classification of preceding or following words. We present a systematic approach to evaluate the performance of different feature sets based on recursive feature elimination, RFE. Based on a systematic reduction of the number of features used by the system, we can quantify the impact of different feature sets on the results of the word classification problem. This helps us to identify descriptive features, to learn about the structure of the problem, and to design systems that are faster and easier to understand. We observe that the SVM is robust to redundant features. RFE improves the performance by 0.7%, compared to using the complete set of attributes. Moreover, a performance that is only 2.3% below this maximum can be obtained using fewer than 5% of the features

    Insights from Classifying Visual Concepts with Multiple Kernel Learning

    Get PDF
    Combining information from various image features has become a standard technique in concept recognition tasks. However, the optimal way of fusing the resulting kernel functions is usually unknown in practical applications. Multiple kernel learning (MKL) techniques allow to determine an optimal linear combination of such similarity matrices. Classical approaches to MKL promote sparse mixtures. Unfortunately, so-called 1-norm MKL variants are often observed to be outperformed by an unweighted sum kernel. The contribution of this paper is twofold: We apply a recently developed non-sparse MKL variant to state-of-the-art concept recognition tasks within computer vision. We provide insights on benefits and limits of non-sparse MKL and compare it against its direct competitors, the sum kernel SVM and the sparse MKL. We report empirical results for the PASCAL VOC 2009 Classification and ImageCLEF2010 Photo Annotation challenge data sets. About to be submitted to PLoS ONE.Comment: 18 pages, 8 tables, 4 figures, format deviating from plos one submission format requirements for aesthetic reason

    Can we assess mental health through social media and smart devices? addressing bias in methodology and evaluation.

    Get PDF
    Predicting mental health from smartphone and social media data on a longitudinal basis has recently attracted great interest, with very promising results being reported across many studies. Such approaches have the potential to revolutionise mental health assessment, if their development and evaluation follows a real world deployment setting. In this work we take a closer look at state-of-the-art approaches, using different mental health datasets and indicators, different feature sources and multiple simulations, in order to assess their ability to generalise. We demonstrate that under a pragmatic evaluation framework, none of the approaches deliver or even approach the reported performances. In fact, we show that current state-of-the-art approaches can barely outperform the most naive baselines in the real-world setting, posing serious questions not only about their deployment ability, but also about the contribution of the derived features for the mental health assessment task and how to make better use of such data in the future
    • …
    corecore