
    Differentially Private ANOVA Testing

    Modern society generates an incredible amount of data about individuals, and releasing summary statistics about this data in a manner that provably protects individual privacy would offer a valuable resource for researchers in many fields. We present the first algorithm for analysis of variance (ANOVA) that preserves differential privacy, allowing this important statistical test to be conducted (and the results released) on databases of sensitive information. In addition to our private algorithm for the F test statistic, we show a rigorous way to compute p-values that accounts for the added noise needed to preserve privacy. Finally, we present experimental results quantifying the statistical power of this differentially private version of the test, finding that a sample of several thousand observations is frequently enough to detect variation between groups. The differentially private ANOVA algorithm is a promising approach for releasing a common test statistic that is valuable across the sciences and social sciences. Comment: Accepted, camera-ready version presented at the 1st International Conference on Data Intelligence and Security (ICDIS) 201
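
For orientation, the classical (non-private) one-way ANOVA F statistic that this abstract builds on can be sketched as follows. This is only the standard textbook statistic; the noise mechanism and the adjusted p-value computation of the paper are not described in the abstract and are not reproduced here. The function name `one_way_f` is a hypothetical choice for illustration.

```python
from typing import Sequence


def one_way_f(groups: Sequence[Sequence[float]]) -> float:
    """Classical one-way ANOVA F statistic (no privacy noise added)."""
    n = sum(len(g) for g in groups)   # total number of observations
    k = len(groups)                   # number of groups
    grand_mean = sum(sum(g) for g in groups) / n

    # Between-group sum of squares: group size times the squared
    # deviation of each group mean from the grand mean.
    ssa = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    # Within-group sum of squares: squared deviation of each
    # observation from its own group mean.
    sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    # Ratio of mean squares, with k - 1 and n - k degrees of freedom.
    return (ssa / (k - 1)) / (sse / (n - k))


if __name__ == "__main__":
    print(one_way_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]))
```

A large F value indicates that variation between group means is large relative to variation within groups, which is the signal the test in the abstract is designed to detect.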

    Pengaruh Kondisi Penyimpanan terhadap Pertumbuhan Jamur pada Gambir

    Gambir is the extract of dried leaves and twigs of the Uncaria gambier (Hunter) Roxb plant. Gambir frequently declines in quality because unfavourable storage conditions accelerate mold growth. Mold growth prevention was studied using treatments of package type (A), consisting of cardboard paper (A1), plastic sack (A2), and jute sack (A3), and storage condition (B), consisting of open space at 25–29°C with 70% humidity (B1) and closed space/warehouse at 24–26°C with 80% humidity (B2). Mold growth, mold type, and water content were observed for 12 weeks. Microscopic identification of the molds found Aspergillus niger, Aspergillus fumigatus, and Penicillium. The storage condition affected water content and mold growth. The lowest rate of water content increase during storage was found in the A3B1 treatment (0.16%) and the highest in the A2B1 treatment (0.64%). The lowest weekly mold growth rate was found in the A3B1 treatment (78,330 colonies/g), in open-space storage at 25–29°C and 70% humidity using jute sacks. Jute sack packaging is better for storing gambir in terms of water content increase and mold growth.

    ANOVA for diffusions and Itô processes

    Itô processes are the most common form of continuous semimartingales, and include diffusion processes. This paper is concerned with the nonparametric regression relationship between two such Itô processes. We are interested in the quadratic variation (integrated volatility) of the residual in this regression, over a unit of time (such as a day). A main conceptual finding is that this quadratic variation can be estimated almost as if the residual process were observed, the difference being that there is also a bias which is of the same asymptotic order as the mixed normal error term. The proposed methodology, "ANOVA for diffusions and Itô processes," can be used to measure the statistical quality of a parametric model and, nonparametrically, the appropriateness of a one-regressor model in general. On the other hand, it also helps quantify and characterize the trading (hedging) error in the case of financial applications. Comment: Published at http://dx.doi.org/10.1214/009053606000000452 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Strongly Hierarchical Factorization Machines and ANOVA Kernel Regression

    High-order parametric models that include terms for feature interactions are applied to various data mining tasks, where ground truth depends on interactions of features. However, with sparse data, the high-dimensional parameters for feature interactions often face three issues: expensive computation, difficulty in parameter estimation, and lack of structure. Previous work has proposed approaches which can partially resolve the three issues. In particular, models with factorized parameters (e.g. Factorization Machines) and sparse learning algorithms (e.g. FTRL-Proximal) can tackle the first two issues but fail to address the third. For unstructured parameters, constraints or complicated regularization terms are applied so that hierarchical structures can be imposed. However, these methods make the optimization problem more challenging. In this work, we propose Strongly Hierarchical Factorization Machines and ANOVA kernel regression, where all three issues can be addressed without making the optimization problem more difficult. Experimental results show the proposed models significantly outperform the state-of-the-art in two data mining tasks: cold-start user response time prediction and stock volatility prediction. Comment: 9 pages, to appear in SDM'1
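
As background on the "expensive computation" issue mentioned above, the second-order interaction term of a standard Factorization Machine can be sketched as below. This illustrates only the ordinary factorized pairwise term and its well-known linear-time rewrite, not the Strongly Hierarchical variant proposed in the paper, whose details the abstract does not give. The function names and the toy inputs are hypothetical.

```python
def fm_pairwise_naive(x, V):
    """Sum over pairs i < j of <V[i], V[j]> * x[i] * x[j], in O(d^2 * k)."""
    d = len(x)
    total = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            total += dot * x[i] * x[j]
    return total


def fm_pairwise_fast(x, V):
    """Same quantity in O(d * k), using for each latent factor f:
    sum_{i<j} a_i * a_j = 0.5 * ((sum_i a_i)^2 - sum_i a_i^2),
    where a_i = V[i][f] * x[i].
    """
    k = len(V[0])
    total = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        total += s * s - s_sq
    return 0.5 * total


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]                        # toy feature vector
    V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy latent factors, k = 2
    print(fm_pairwise_naive(x, V), fm_pairwise_fast(x, V))
```

Both functions compute the same value; the second avoids the explicit double loop over feature pairs, which is what makes factorized interaction models practical on sparse, high-dimensional data.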

    Cuestionario-ANOVA

    Partially funded by PIE13-02

    Upaya PT. Perkebunan Nusantara VIII Dalam Mengembangkan Ekspor Teh Hitam Ke Malaysia

    Indonesia is the world's fifth-largest tea exporter by volume, after Sri Lanka, Kenya, China, and India. Compared with other kinds of tea, black tea is the most produced, at about 78%, followed by green tea at 20%, with oolong tea and white tea making up the remaining 2%. The problem faced by PT. Perkebunan Nusantara VIII is the large number of competitors in black tea exports, both domestic and foreign. In this research, the writer uses the Independence Liberalism Perspective, meaning that the interdependence between people and government is influenced by what happens everywhere, by the actions of actors in other countries

    Biomarker Detection in Association Studies: Modeling SNPs Simultaneously via Logistic ANOVA

    In genome-wide association studies, the primary task is to detect biomarkers in the form of Single Nucleotide Polymorphisms (SNPs) that have nontrivial associations with a disease phenotype and some other important clinical/environmental factors. However, the extremely large number of SNPs compared to the sample size inhibits the application of classical methods such as multiple logistic regression. Currently the most commonly used approach is still to analyze one SNP at a time. In this paper, we propose to consider the genotypes of the SNPs simultaneously via a logistic analysis of variance (ANOVA) model, which expresses the logit transformed mean of SNP genotypes as the summation of the SNP effects, effects of the disease phenotype and/or other clinical variables, and the interaction effects. We use a reduced-rank representation of the interaction-effect matrix for dimensionality reduction, and employ the L1-penalty in a penalized likelihood framework to filter out the SNPs that have no associations. We develop a Majorization-Minimization algorithm for computational implementation. In addition, we propose a modified BIC criterion to select the penalty parameters and determine the rank number. The proposed method is applied to a Multiple Sclerosis data set and simulated data sets and shows promise in biomarker detection.

    Pseudo Bayesian Estimation of One-way ANOVA Model in Complex Surveys

    We devise survey-weighted pseudo posterior distribution estimators under 2-stage informative sampling of both primary clusters and secondary nested units for a one-way ANOVA population generating model as a simple canonical case where population model random effects are defined to be coincident with the primary clusters. We consider estimation on an observed informative sample under both an augmented pseudo likelihood that co-samples random effects, as well as an integrated likelihood that marginalizes out the random effects from the survey-weighted augmented pseudo likelihood. This paper includes a theoretical exposition that enumerates easily verified conditions for which estimation under the augmented pseudo posterior is guaranteed to be consistent at the true generating parameters. We reveal in simulation that both approaches produce asymptotically unbiased estimation of the generating hyperparameters for the random effects when a key condition on the sum of within-cluster weighted residuals is met. We present a comparison with frequentist EM and a method that requires pairwise sampling weights. Comment: 46 pages, 9 figure