8,407 research outputs found
Near-Optimal Algorithms for Differentially-Private Principal Components
Principal components analysis (PCA) is a standard tool for identifying good
low-dimensional approximations to data in high dimension. Many data sets of
interest contain private or sensitive information about individuals. Algorithms
which operate on such data should be sensitive to the privacy risks in
publishing their outputs. Differential privacy is a framework for developing
tradeoffs between privacy and the utility of these outputs. In this paper we
investigate the theory and empirical performance of differentially private
approximations to PCA and propose a new method which explicitly optimizes the
utility of the output. We show that the sample complexity of the proposed
method differs from the existing procedure in the scaling with the data
dimension, and that our method is nearly optimal in terms of this scaling. We
furthermore illustrate our results, showing that on real data there is a large
performance gap between the existing method and our method.Comment: 37 pages, 8 figures; final version to appear in the Journal of
Machine Learning Research, preliminary version was at NIPS 201
Analyzing Learners Behavior in MOOCs: An Examination of Performance and Motivation Using a Data-Driven Approach
Massive Open Online Courses (MOOCs) have been experiencing increasing use and popularity in highly ranked universities in recent years. The opportunity of accessing high quality courseware content within such platforms, while eliminating the burden of educational, financial and geographical obstacles has led to a rapid growth in participant numbers. The increasing number and diversity of participating learners has opened up new horizons to the research community for the investigation of effective learning environments. Learning Analytics has been used to investigate the impact of engagement on student performance. However, extensive literature review indicates that there is little research on the impact of MOOCs, particularly in analyzing the link between behavioral engagement and motivation as predictors of learning outcomes. In this study, we consider a dataset, which originates from online courses provided by Harvard University and Massachusetts Institute of Technology, delivered through the edX platform [1]. Two sets of empirical experiments are conducted using both statistical and machine learning techniques. Statistical methods are used to examine the association between engagement level and performance, including the consideration of learner educational backgrounds. The results indicate a significant gap between success and failure outcome learner groups, where successful learners are found to read and watch course material to a higher degree. Machine learning algorithms are used to automatically detect learners who are lacking in motivation at an early time in the course, thus providing instructors with insight in regards to student withdrawal
Diagnostics of Data-Driven Models: Uncertainty Quantification of PM7 Semi-Empirical Quantum Chemical Method.
We report an evaluation of a semi-empirical quantum chemical method PM7 from the perspective of uncertainty quantification. Specifically, we apply Bound-to-Bound Data Collaboration, an uncertainty quantification framework, to characterize (a) variability of PM7 model parameter values consistent with the uncertainty in the training data and (b) uncertainty propagation from the training data to the model predictions. Experimental heats of formation of a homologous series of linear alkanes are used as the property of interest. The training data are chemically accurate, i.e., they have very low uncertainty by the standards of computational chemistry. The analysis does not find evidence of PM7 consistency with the entire data set considered as no single set of parameter values is found that captures the experimental uncertainties of all training data. A set of parameter values for PM7 was able to capture the training data within ±1 kcal/mol, but not to the smaller level of uncertainty in the reported data. Nevertheless, PM7 was found to be consistent for subsets of the training data. In such cases, uncertainty propagation from the chemically accurate training data to the predicted values preserves error within bounds of chemical accuracy if predictions are made for the molecules of comparable size. Otherwise, the error grows linearly with the relative size of the molecules
Recommended from our members
Real-time decoding of question-and-answer speech dialogue using human cortical activity.
Natural communication often occurs in dialogue, differentially engaging auditory and sensorimotor brain regions during listening and speaking. However, previous attempts to decode speech directly from the human brain typically consider listening or speaking tasks in isolation. Here, human participants listened to questions and responded aloud with answers while we used high-density electrocorticography (ECoG) recordings to detect when they heard or said an utterance and to then decode the utterance's identity. Because certain answers were only plausible responses to certain questions, we could dynamically update the prior probabilities of each answer using the decoded question likelihoods as context. We decode produced and perceived utterances with accuracy rates as high as 61% and 76%, respectively (chance is 7% and 20%). Contextual integration of decoded question likelihoods significantly improves answer decoding. These results demonstrate real-time decoding of speech in an interactive, conversational setting, which has important implications for patients who are unable to communicate
- …