
    An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets

    As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from what would be expected in a random dataset with the same number of transactions and the same individual item frequencies. These itemsets can then be flagged as statistically significant with a small false discovery rate. We present extensive experimental results to substantiate the effectiveness of our methodology. Comment: A preliminary version of this work was presented in ACM PODS 2009. 20 pages, 0 figures.

    Eigenvector localization as a tool to study small communities in online social networks

    We present and discuss a mathematical procedure for identification of small "communities" or segments within large bipartite networks. The procedure is based on spectral analysis of the matrix encoding network structure. The principal tool here is localization of eigenvectors of the matrix, by means of which the relevant network segments become visible. We exemplified our approach by analyzing the data related to product reviewing on Amazon.com. We found several segments, a kind of hybrid community of densely interlinked reviewers and products, which we were able to meaningfully interpret in terms of the type and thematic categorization of reviewed items. The method provides a complementary approach to other ways of community detection, which typically aim at identification of large network modules.

    Randomized Comparison of Two Internet-Supported Natural Family Planning Methods (Preliminary Findings)

    The aims of this study were to determine and compare efficacy, satisfaction, ease of use, and motivation in using an internet-based method of Natural Family Planning (NFP) that utilizes either electronic hormonal fertility monitoring (EHFM) or cervical-mucus monitoring (CMM). Four hundred fifty women (mean age 30.1) and their male partners (mean age 31.9) who sought to avoid pregnancy were randomized into either an EHFM (N=228) or CMM NFP group (N=222). Both groups utilized a Web site that provided NFP instructions, an electronic charting system, and support from professional nurses. Participants were assessed for satisfaction, ease of use, and motivation in use of their respective NFP method at 1, 3, and 6 months. Unintended pregnancies were validated by pregnancy evaluations and urine tests. Correct and total pregnancy rates were determined by survival analysis. Correct and total 12-month unintended pregnancy rates for the combined participants (N=450) were 1 and 9 per 100 couple users (Std. Error = .01 and .02), respectively. The EHFM participants (N=228), however, had a typical unintended pregnancy rate of 6 (Std. Error = .03) compared to the CMM group (N=222) pregnancy rate of 13 (Std. Error = .04) per 100 users over 12 months of use. The mean satisfaction/ease of use score for the EHFM group at 6 months of use was 46.1 compared to 42.9 for the CMM group (p < .07). Motivation to avoid pregnancy was stronger for the CMM group compared to the EHFM group at 3 and 6 months of use (37.9 and 38.8 versus 33.7 and 33.4, p < .01). Although both NFP methods were highly effective methods of family planning delivered through a nurse-supported Web site, at this time, the unintended pregnancy rate was lower for the EHFM group and compared well with hormonal contraception. Although acceptability of the EHFM NFP was high, motivation to avoid pregnancy with that group decreased over time.

    Issues in Statistical Inference

    The APA Task Force’s treatment of research methods is critically examined. The present defense of the experiment rests on showing that (a) the control group cannot be replaced by the contrast group, (b) experimental psychologists have valid reasons to use non-randomly selected subjects, (c) there is no evidential support for the experimenter expectancy effect, (d) the Task Force misrepresented the role of inductive and deductive logic, and (e) the validity of experimental data does not require appealing to the effect size or statistical power.

    Preliminary investigation of flexibility in learning color-reward associations in gibbons (Hylobatidae)

    Previous studies in learning set formation have shown that most animal species can learn to learn, with subsequent novel problems being solved in fewer presentations than when they first encounter a task. Gibbons (Hylobatidae) have generally struggled with these tasks and do not show the learning-to-learn pattern found in other species. This is surprising given their phylogenetic position and level of cortical development. However, there have been conflicting results, with some studies demonstrating higher-level learning abilities in these small apes. This study attempts to clarify whether gibbons can in fact use knowledge gained during one learning task to facilitate performance on a similar but novel problem, which would be a precursor to development of a learning set. We tested 16 captive gibbons' ability to associate color cues with provisioned food items in two experiments where they experienced a period of learning followed by experimental trials during which they could potentially use knowledge gained in their first learning experience to facilitate solution in subsequent novel tasks. Our results are similar to most previous studies in that there was no evidence of gibbons being able to use previously acquired knowledge to solve a novel task. However, once the learning association was made, the gibbons performed well above chance. We found no differences across color associations, indicating learning was not affected by the particular color/reward association. However, there were variations in learning performance with regard to genera. The hoolock (Hoolock leuconedys) and siamang (Symphalangus syndactylus) learned the fastest and the lar group (Hylobates sp.) learned the slowest. We caution that these results could be due to the small sample size and to the captive environment in which these gibbons were raised. However, it is likely that environmental variability in the native habitats of the subjects tested could facilitate the evolution of flexible learning in some genera. Further comparative study is necessary in order to incorporate realistic cognitive variables into foraging models.