
    Dimensionality Reduction via Matrix Factorization for Predictive Modeling from Large, Sparse Behavioral Data

    Matrix factorization is a popular technique for engineering features for use in predictive models; it is viewed as a key part of the predictive analytics process and is used in many different domain areas. The purpose of this paper is to investigate matrix-factorization-based dimensionality reduction as a design artifact in predictive analytics. With the rise in availability of large amounts of sparse behavioral data, this investigation comes at a time when traditional techniques must be reevaluated. Our contribution is based on two lines of inquiry: we survey the literature on dimensionality reduction in predictive analytics, and we undertake an experimental evaluation comparing using dimensionality reduction versus not using dimensionality reduction for predictive modeling from large, sparse behavioral data. Our survey of the dimensionality reduction literature reveals that, despite mixed empirical evidence as to the benefit of computing dimensionality reduction, it is frequently applied in predictive modeling research and application without either comparing to a model built using the full feature set or utilizing state-of-the-art predictive modeling techniques for complexity control. This presents a concern, as the survey reveals complexity control as one of the main reasons for employing dimensionality reduction. This lack of comparison is troubling in light of our empirical results. We experimentally evaluate the efficacy of dimensionality reduction in the context of a collection of predictive modeling problems from a large-scale published study. We find that utilizing dimensionality reduction improves predictive performance only under certain, rather narrow, conditions.
Specifically, under default regularization (complexity control) settings, dimensionality reduction helps for the more difficult predictive problems (where the predictive performance of a model built using the original feature set is relatively lower), but it actually decreases the performance on the easier problems. More surprisingly, employing state-of-the-art methods for selecting regularization parameters actually eliminates any advantage that dimensionality reduction has! Since the value of building accurate predictive models for business analytics applications has been well-established, the resulting guidelines for the application of dimensionality reduction should lead to better research and managerial decisions.
NYU Stern School of Business
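The comparison the paper calls for can be sketched in a few lines: fit the same learner once on the full sparse feature set and once on SVD-reduced features, with cross-validated regularization in both cases. This is a minimal illustration on synthetic data, not the paper's actual experimental setup; the data, dimensions, and component count here are arbitrary choices.

```python
# Minimal sketch: full sparse feature set vs. SVD-reduced features,
# each with cross-validated regularization (complexity control).
# Data is synthetic (random sparse matrix, random labels) for illustration.
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
X = sparse_random(500, 2000, density=0.01, random_state=0, format="csr")
y = rng.randint(0, 2, 500)  # synthetic labels, illustration only
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Baseline: model built on the original (full) feature set,
# with regularization strength C selected by cross-validation.
full = LogisticRegressionCV(Cs=10, max_iter=1000).fit(X_tr, y_tr)

# Matrix-factorization-based dimensionality reduction, then the same learner.
svd = TruncatedSVD(n_components=50, random_state=0).fit(X_tr)
red = LogisticRegressionCV(Cs=10, max_iter=1000).fit(svd.transform(X_tr), y_tr)

print(full.score(X_te, y_te), red.score(svd.transform(X_te), y_te))
```

The key design point mirrored here is that the reduced-feature model is never evaluated in isolation: it is always compared against the full-feature baseline under the same, tuned complexity control.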

    Design agents and the need for high-dimensional perception

    Designed artefacts may be quantified by any number of measures. This paper aims to show that in doing so, the particular measures used may matter very little, but as many as possible should be taken. A set of building plans is used to demonstrate that arbitrary measures of their shape serve to classify them into neighbourhood types, and the accuracy of classification increases as more are used, even if the dimensionality of the space in which classification occurs is held constant. It is further shown that two autonomous agents may independently choose sets of attributes by which to represent the buildings, yet arrive at similar judgements as more are used. This has several implications for studying or simulating design. It suggests that quantitative studies of collections of artefacts may be made without requiring extensive knowledge of the best possible measures—often impossible in real, ill-defined, design situations. It suggests a means by which the generation of novelty can be explained in a group of agents with different ways of seeing a given event. It also suggests that communication can occur without the need for predetermined codes or protocols, introducing the possibility of alternative human-computer interfaces that may be useful in design.
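The agent-agreement claim can be illustrated with a toy experiment: two "agents" each sample a different random subset of available measures for the same set of objects, and their pairwise-similarity judgements tend to agree more as each takes more measures. This is a hypothetical sketch with Gaussian stand-in data, not the paper's building-plan study.

```python
# Hypothetical sketch: two agents independently choose random measurement
# subsets of the same objects; as subset size k grows, their pairwise
# distance judgements tend to correlate more strongly.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
objects = rng.normal(size=(40, 200))  # 40 artefacts, 200 possible measures

def agreement(k):
    # Each agent picks its own k measures, then ranks all object pairs
    # by distance; agreement is the rank correlation of those judgements.
    a = objects[:, rng.choice(200, k, replace=False)]
    b = objects[:, rng.choice(200, k, replace=False)]
    return spearmanr(pdist(a), pdist(b))[0]

print(agreement(5), agreement(100))  # agreement tends to rise with k
```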

    Bullying and school disruption assessment: studies with Portuguese adolescent students

    Problem Statement: The question of bullying and school disruptive behavior has emerged as a powerful issue in the Portuguese educational context. The lack of evaluation instruments with studied psychometric characteristics has constituted a problem. Purpose of Study: School disruption and bullying assessment in Portuguese adolescents was the focus of this research. Research Methods: The psychometric qualities — internal consistency and external validity — were analyzed in different scales. Findings: The analyses carried out confirm the scales as reliable and valid instruments. Conclusions: These instruments may be a useful avenue for teachers, psychologists and other education professionals.

    The Intrinsic Dimensionality of Attractiveness: A Study in Face Profiles

    The study of human attractiveness with pattern analysis techniques is an emerging research field. One still largely unresolved problem is which facial features are relevant to attractiveness, how they combine together, and the number of independent parameters required for describing and identifying harmonious faces. In this paper, we present a first study of this problem, applied to face profiles. First, according to several empirical results, we hypothesize the existence of two well separated manifolds of attractive and unattractive face profiles. Then, we analyze their intrinsic dimensionality with manifold learning techniques. Finally, we show that the profile data can be reduced, with various techniques, to the intrinsic dimensions, largely without losing their ability to discriminate between attractive and unattractive faces.
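The pipeline described — estimate an intrinsic dimensionality, reduce to it, then check that class discrimination survives — can be sketched as follows. This is a minimal stand-in using PCA explained variance as the dimensionality estimate and synthetic data in place of face-profile measurements; the paper's actual manifold learning techniques and data are not reproduced here.

```python
# Minimal sketch: estimate an "intrinsic dimension" via PCA explained
# variance, reduce to it, and verify class separability is preserved.
# Synthetic stand-in: 3 latent factors lifted to 60 observed dimensions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))
X = latent @ rng.normal(size=(3, 60)) + 0.01 * rng.normal(size=(300, 60))
y = (latent[:, 0] > 0).astype(int)  # hypothetical binary class labels

# Smallest number of components capturing 95% of the variance.
pca = PCA().fit(X)
cum = np.cumsum(pca.explained_variance_ratio_)
intrinsic = int(np.searchsorted(cum, 0.95)) + 1

# Reduce to the estimated intrinsic dimension and test discrimination.
X_red = PCA(n_components=intrinsic).fit_transform(X)
score = cross_val_score(LogisticRegression(), X_red, y, cv=5).mean()
print(intrinsic, score)
```

With data generated from three latent factors, the estimate recovers a dimension near three, and a classifier on the reduced data still separates the classes — the same qualitative check the abstract describes.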

    An investigation of a deep learning based malware detection system

    We investigate a Deep Learning based system for malware detection. In the investigation, we experiment with different combinations of Deep Learning architectures, including Auto-Encoders and Deep Neural Networks with varying numbers of layers, on the Malicia malware dataset, on which earlier studies obtained an accuracy of 98% with an acceptable False Positive Rate of 1.07%. Those results, however, relied on extensive hand-crafted, domain-specific features and the corresponding feature engineering and design effort. Our proposed approach improves on the previous best results (99.21% accuracy and a False Positive Rate of 0.19%), indicating that Deep Learning based systems could deliver an effective defense against malware. Since Deep Learning is good at automatically extracting higher-level conceptual features from the data, Deep Learning based systems could provide an effective, general and scalable mechanism for detection of existing and unknown malware.
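The core contrast with feature-engineered baselines is that the network learns its own representation from raw feature vectors. A minimal sketch of that idea, using a small multilayer network on synthetic binary vectors standing in for the Malicia samples (which are not bundled here); the architecture and sizes are illustrative assumptions, not the paper's configuration.

```python
# Hypothetical sketch: a small multi-layer neural network trained directly
# on raw binary static-feature vectors, with no hand-crafted features.
# Data is a synthetic stand-in for a malware/benign dataset.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 300)).astype(float)  # raw feature vectors
w = rng.normal(size=300)
y = (X @ w > np.median(X @ w)).astype(int)  # synthetic malware/benign labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two hidden layers learn intermediate representations automatically,
# replacing manual feature engineering.
clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300,
                    random_state=0).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))
```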