
    Near-Optimal Algorithms for Differentially-Private Principal Components

    Principal components analysis (PCA) is a standard tool for identifying good low-dimensional approximations to data in high dimension. Many data sets of interest contain private or sensitive information about individuals. Algorithms which operate on such data should be sensitive to the privacy risks in publishing their outputs. Differential privacy is a framework for developing tradeoffs between privacy and the utility of these outputs. In this paper we investigate the theory and empirical performance of differentially private approximations to PCA and propose a new method which explicitly optimizes the utility of the output. We show that the sample complexity of the proposed method differs from the existing procedure in the scaling with the data dimension, and that our method is nearly optimal in terms of this scaling. We furthermore illustrate our results, showing that on real data there is a large performance gap between the existing method and our method. (37 pages, 8 figures; final version to appear in the Journal of Machine Learning Research; preliminary version was at NIPS 201)
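The abstract contrasts a new utility-optimized method with an existing procedure. A common baseline for differentially private PCA (not the paper's proposed method) is input perturbation: add symmetric Gaussian noise to the empirical second-moment matrix and eigendecompose the result. A minimal sketch, assuming rows are scaled to L2 norm at most 1 and using an illustrative Gaussian-mechanism noise scale:

```python
import numpy as np

def dp_pca_input_perturbation(X, k, epsilon, delta, rng=None):
    """Baseline (epsilon, delta)-DP PCA via input perturbation:
    perturb the empirical second-moment matrix with symmetric
    Gaussian noise, then return its top-k eigenvectors.
    Assumes each row of X has L2 norm <= 1, so changing one of the
    n rows moves the matrix by O(1/n) in Frobenius norm; the noise
    scale below is illustrative, not a tuned constant."""
    rng = np.random.default_rng(rng)
    n, d = X.shape
    A = X.T @ X / n                      # empirical second-moment matrix
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) / (n * epsilon)
    E = rng.normal(scale=sigma, size=(d, d))
    E = (E + E.T) / np.sqrt(2)           # symmetrize the noise
    vals, vecs = np.linalg.eigh(A + E)
    return vecs[:, np.argsort(vals)[::-1][:k]]   # top-k eigenvectors

# toy usage: 500 points in 10 dimensions, top-2 private subspace
X = np.random.default_rng(0).normal(size=(500, 10))
X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)  # norm bound
V = dp_pca_input_perturbation(X, k=2, epsilon=1.0, delta=1e-5, rng=1)
print(V.shape)  # (10, 2)
```

The utility gap the paper measures is between subspaces like `V` and the non-private top-k eigenvectors; as dimension `d` grows, the noise term increasingly dominates the signal, which is where the sample-complexity scaling matters.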

    Analyzing Learners Behavior in MOOCs: An Examination of Performance and Motivation Using a Data-Driven Approach

    Massive Open Online Courses (MOOCs) have been experiencing increasing use and popularity in highly ranked universities in recent years. The opportunity of accessing high quality courseware content within such platforms, while eliminating the burden of educational, financial and geographical obstacles, has led to a rapid growth in participant numbers. The increasing number and diversity of participating learners has opened up new horizons to the research community for the investigation of effective learning environments. Learning Analytics has been used to investigate the impact of engagement on student performance. However, an extensive literature review indicates that there is little research on the impact of MOOCs, particularly in analyzing the link between behavioral engagement and motivation as predictors of learning outcomes. In this study, we consider a dataset which originates from online courses provided by Harvard University and the Massachusetts Institute of Technology, delivered through the edX platform [1]. Two sets of empirical experiments are conducted using both statistical and machine learning techniques. Statistical methods are used to examine the association between engagement level and performance, including the consideration of learner educational backgrounds. The results indicate a significant gap between the success and failure outcome learner groups, where successful learners are found to read and watch course material to a higher degree. Machine learning algorithms are used to automatically detect learners who are lacking in motivation at an early time in the course, thus providing instructors with insight regarding student withdrawal.
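The early-detection task described above is, in essence, a supervised classifier trained on early-course engagement features to predict later withdrawal. A minimal sketch with invented synthetic data and hypothetical feature names (`videos`, `pages`), not the edX dataset or the authors' actual model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
videos = rng.poisson(8, n)                 # videos watched in early weeks
pages = rng.poisson(20, n)                 # course pages read in early weeks
# synthetic ground truth: low early engagement plus noise -> withdrawal
dropout = (videos + 0.4 * pages + rng.normal(0, 3, n) < 12).astype(float)

# standardize features, then fit logistic regression by gradient descent
feats = np.column_stack([(videos - videos.mean()) / videos.std(),
                         (pages - pages.mean()) / pages.std()])
X = np.column_stack([np.ones(n), feats])   # prepend intercept column
w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X @ w))       # predicted withdrawal probability
    w -= 0.1 * X.T @ (p - dropout) / n     # gradient step on log-loss
acc = (((1.0 / (1.0 + np.exp(-X @ w))) > 0.5) == (dropout > 0.5)).mean()
print(f"training accuracy: {acc:.2f}")
```

In a real deployment the labels come from a completed prior course run, and the fitted model is applied to the current run's early clickstream so instructors can intervene before withdrawal happens.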

    Diagnostics of Data-Driven Models: Uncertainty Quantification of PM7 Semi-Empirical Quantum Chemical Method.

    We report an evaluation of the semi-empirical quantum chemical method PM7 from the perspective of uncertainty quantification. Specifically, we apply Bound-to-Bound Data Collaboration, an uncertainty quantification framework, to characterize (a) the variability of PM7 model parameter values consistent with the uncertainty in the training data and (b) uncertainty propagation from the training data to the model predictions. Experimental heats of formation of a homologous series of linear alkanes are used as the property of interest. The training data are chemically accurate, i.e., they have very low uncertainty by the standards of computational chemistry. The analysis does not find evidence of PM7 consistency with the entire data set considered, as no single set of parameter values is found that captures the experimental uncertainties of all training data. A set of parameter values for PM7 was able to capture the training data within ±1 kcal/mol, but not to the smaller level of uncertainty in the reported data. Nevertheless, PM7 was found to be consistent for subsets of the training data. In such cases, uncertainty propagation from the chemically accurate training data to the predicted values preserves error within the bounds of chemical accuracy if predictions are made for molecules of comparable size. Otherwise, the error grows linearly with the relative size of the molecules.
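The dataset-consistency question at the heart of this analysis is a feasibility problem: does any point in parameter space produce predictions inside every experimental error bar? A toy analogue, using an invented linear surrogate f(n) = a + b·n for heat of formation versus carbon count in place of PM7, with fabricated data purely for illustration:

```python
import numpy as np

# carbon counts and invented "measured" heats of formation (kcal/mol);
# the third point deviates slightly from the linear trend on purpose
n_c = np.array([1, 2, 3, 4, 5, 6])
y = np.array([-18.0, -23.0, -28.5, -33.0, -38.0, -43.0])
u_tight, u_loose = 0.2, 1.0          # two candidate uncertainty bounds

# brute-force scan of the (a, b) parameter box of the surrogate model
a_grid, b_grid = np.meshgrid(np.linspace(-20.0, -5.0, 301),
                             np.linspace(-7.0, -2.0, 301))
pred = a_grid[..., None] + b_grid[..., None] * n_c   # predictions per grid point

def consistent(u):
    # the feasible set is non-empty iff some parameter pair fits
    # every data point to within the uncertainty bound u
    return bool(np.any(np.all(np.abs(pred - y) <= u, axis=-1)))

print(consistent(u_tight), consistent(u_loose))  # False True
```

This mirrors the abstract's finding: the toy model is consistent with the data at a ±1 kcal/mol tolerance but inconsistent at the tighter, "chemically accurate" tolerance, because no single parameter pair can absorb the mid-series deviation. Bound-to-Bound Data Collaboration poses the same question for PM7 via constrained optimization rather than a grid scan.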