
An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets

Author: Kirsch Adam
Mitzenmacher Michael
Pietracaprina Andrea
Pucci Geppino
Upfal Eli
Vandin Fabio
Publication date: 01/01/2009

As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from what would be expected in a random dataset with the same number of transactions and the same individual item frequencies. These itemsets can then be flagged as statistically significant with a small false discovery rate. We present extensive experimental results to substantiate the effectiveness of our methodology.
Comment: A preliminary version of this work was presented at ACM PODS 2009. 20 pages, 0 figures.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Università di Padova
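
As a rough illustration of the threshold idea in the Kirsch et al. abstract, the Python sketch below compares the observed number of k-itemsets with support at least s against the average number found in random datasets with the same number of transactions and the same item frequencies. It is not the authors' procedure: the null model (each item included independently with its observed frequency), the Monte Carlo estimate, and the stopping rule (observed count more than `ratio` times the expected count) are assumptions, and all function names are hypothetical.

```python
import itertools
import random
from collections import Counter


def count_at_least(transactions, k, s):
    """Number of k-itemsets whose support (transactions containing them) is >= s."""
    supports = Counter()
    for t in transactions:
        for itemset in itertools.combinations(sorted(set(t)), k):
            supports[itemset] += 1
    return sum(1 for c in supports.values() if c >= s)


def random_dataset(transactions, rng):
    """Assumed null model: same number of transactions; each item is included
    independently with probability equal to its observed frequency."""
    n = len(transactions)
    freq = Counter(item for t in transactions for item in set(t))
    return [{item for item, f in freq.items() if rng.random() < f / n} for _ in range(n)]


def expected_count(transactions, k, s, trials=200, seed=0):
    """Monte Carlo estimate of the mean of count_at_least(...) under the null model."""
    rng = random.Random(seed)
    total = sum(count_at_least(random_dataset(transactions, rng), k, s) for _ in range(trials))
    return total / trials


def first_deviating_threshold(transactions, k, thresholds, ratio=2.0, trials=200):
    """Hypothetical stopping rule: smallest s whose observed count exceeds
    `ratio` times the expected count (a crude stand-in, not the paper's test)."""
    for s in sorted(thresholds):
        observed = count_at_least(transactions, k, s)
        if observed > 0 and observed > ratio * expected_count(transactions, k, s, trials):
            return s
    return None
```

In a toy dataset where items "a" and "b" always appear together in 40 of 100 transactions, a low threshold such as s = 10 is not flagged, because random datasets with the same item frequencies produce several pairs with support at least 10; a high threshold such as s = 30 is flagged, because random data almost never produces a pair that frequent.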

Eigenvector localization as a tool to study small communities in online social networks

Author: Barabási A.-L.
Barry A.
Benkler Y.
Berger P. L.
Bollobás B.
Bornholdt S.
Bouchaud J.-P.
Cattuto C.
Christakis N. A.
Chung F. R. K.
Danon L.
Degenne A.
Diestel R.
Donetti L.
Erdős P.
Fortunato S.
Hacking I.
Lancichinetti A.
Lash S.
Latour B.
Mehta M. L.
Newman M.
Rheingold H.
Scott J.
Slanina F.
Publication venue: World Scientific Pub Co Pte Lt
Publication date: 25/05/2011

We present and discuss a mathematical procedure for identifying small "communities" or segments within large bipartite networks. The procedure is based on spectral analysis of the matrix encoding the network structure. The principal tool is the localization of eigenvectors of this matrix, which makes the relevant network segments visible. We illustrate the approach by analyzing product-review data from Amazon.com. We found several segments, hybrid communities of densely interlinked reviewers and products, which we could meaningfully interpret in terms of the type and thematic categorization of the reviewed items. The method complements other community-detection approaches, which typically aim to identify large network modules.

arXiv.org e-Print Archive

Crossref
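
The abstract does not specify which matrix is analyzed or how localization is measured. The Python sketch below assumes the symmetric adjacency matrix of the bipartite reviewer-product graph and uses the inverse participation ratio (IPR), the sum of x_i^4 over a normalized eigenvector x, as a stand-in localization score: it equals 1 for a vector concentrated on a single node and 1/n for a vector spread evenly over n nodes. Eigenvectors are computed with a plain cyclic Jacobi iteration so that only the standard library is needed; the function names and thresholds are hypothetical, and the paper's actual criterion may differ.

```python
import math


def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigenvalues and unit eigenvectors of a real symmetric matrix (cyclic Jacobi).
    Returns (values, vectors) where vectors[k] belongs to values[k]."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p, q:  A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p, q:     A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    values = [a[i][i] for i in range(n)]
    vectors = [[v[k][i] for k in range(n)] for i in range(n)]
    return values, vectors


def bipartite_adjacency(n_left, n_right, edges):
    """Symmetric adjacency matrix; left nodes are 0..n_left-1, right nodes follow."""
    n = n_left + n_right
    a = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        a[i][n_left + j] = a[n_left + j][i] = 1.0
    return a


def ipr(vec):
    """Inverse participation ratio: 1 if concentrated on one node, 1/n if uniform."""
    norm2 = sum(x * x for x in vec)
    return sum(x ** 4 for x in vec) / norm2 ** 2


def localized_modes(adj, min_ipr=0.3, min_weight=0.1):
    """(eigenvalue, IPR, supporting nodes) for every eigenvector with IPR >= min_ipr."""
    values, vectors = jacobi_eigen(adj)
    modes = []
    for lam, vec in zip(values, vectors):
        score = ipr(vec)
        if score >= min_ipr:
            nodes = [i for i, x in enumerate(vec) if x * x >= min_weight]
            modes.append((lam, score, nodes))
    return modes
```

In a toy graph with one isolated reviewer-product edge next to a dense 3 x 3 block, the eigenvectors for eigenvalues +1 and -1 sit entirely on the two nodes of that edge (IPR 0.5), which is the kind of small, sharply localized segment the abstract describes. Degenerate eigenvalues (such as the zero eigenvalues of the dense block) have no unique eigenvectors, so their IPR values depend on the solver and should be read with care.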

Randomized Comparison of Two Internet-Supported Natural Family Planning Methods (Preliminary Findings)

Author: Fehring Richard J.
Schneider Mary
Publication venue: e-Publications@Marquette
Publication date: 01/01/2012

The aims of this study were to determine and compare efficacy, satisfaction, ease of use, and motivation in using an internet-based method of Natural Family Planning (NFP) that utilizes either electronic hormonal fertility monitoring (EHFM) or cervical-mucus monitoring (CMM). Four hundred fifty women (mean age 30.1) and their male partners (mean age 31.9) who sought to avoid pregnancy were randomized into either an EHFM (N=228) or a CMM NFP group (N=222). Both groups used a Web site that provided NFP instructions, an electronic charting system, and support from professional nurses. Participants were assessed for satisfaction, ease of use, and motivation in using their respective NFP method at 1, 3, and 6 months. Unintended pregnancies were validated by pregnancy evaluations and urine tests. Correct-use and total pregnancy rates were determined by survival analysis. Correct-use and total 12-month unintended pregnancy rates for the combined participants (N=450) were 1 and 9 per 100 couple users (Std. Error = .01 and .02), respectively. The EHFM participants (N=228), however, had a typical-use unintended pregnancy rate of 6 (Std. Error = .03) per 100 users over 12 months of use, compared with 13 (Std. Error = .04) for the CMM group (N=222). The mean satisfaction/ease-of-use score at 6 months was 46.1 for the EHFM group and 42.9 for the CMM group (p < .07). Motivation to avoid pregnancy was stronger in the CMM group than in the EHFM group at 3 and 6 months of use (37.9 and 38.8 versus 33.7 and 33.4, p < .01). Although both NFP methods, delivered through a nurse-supported Web site, were highly effective methods of family planning, at this time the unintended pregnancy rate was lower for the EHFM group and compared well with hormonal contraception. Although acceptability of the EHFM NFP method was high, motivation to avoid pregnancy in that group decreased over time.

epublications@Marquette
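
The abstract reports 12-month rates "per 100 couple users" from survival analysis without naming the estimator. One standard way to turn follow-up data into such a rate is the Kaplan-Meier estimator: the rate is 100 x (1 - S(12)), where S(t) is the estimated probability of no pregnancy by month t. The Python sketch below assumes that estimator; the `Couple` record, its fields, and the function names are hypothetical, and the records in the usage note are toy values, not study data.

```python
from dataclasses import dataclass


@dataclass
class Couple:
    months: float    # months of follow-up (hypothetical field)
    pregnant: bool   # True if follow-up ended in an unintended pregnancy, False if censored


def kaplan_meier(records):
    """Kaplan-Meier steps [(t, S(t))] for the probability of no pregnancy by time t."""
    survival = 1.0
    steps = []
    for t in sorted({r.months for r in records if r.pregnant}):
        at_risk = sum(1 for r in records if r.months >= t)  # still followed at time t
        events = sum(1 for r in records if r.pregnant and r.months == t)
        survival *= 1.0 - events / at_risk
        steps.append((t, survival))
    return steps


def rate_per_100(records, horizon=12.0):
    """Cumulative pregnancy rate per 100 users by `horizon` months: 100 * (1 - S(horizon))."""
    survival = 1.0
    for t, s in kaplan_meier(records):
        if t <= horizon:
            survival = s
    return 100.0 * (1.0 - survival)
```

For example, with four toy records `[Couple(3, True), Couple(6, False), Couple(9, True), Couple(12, False)]`, the survival estimate drops to 3/4 at month 3 and to 3/4 x 1/2 at month 9, so `rate_per_100` gives 62.5. Couples who leave follow-up without a pregnancy (censored) still count in the at-risk totals up to the time they leave, which is why this differs from a simple proportion.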


Measuring category intuitiveness in unconstrained categorization tasks

Author: Akaike
Amotz Perlman
Anderson
Ashby
Ashby
Ashby
Barrett
Billman
Brown
Chapman
Chater
Colreavy
Compton
Compton
Corter
Darren J. Edwards
Demetras
Elman
Emmanuel M. Pothos
Estes
Feldman
Feldman
Fiser
Gopnik
Gosselin
Gureckis
Hahn
Hampton
Handel
Handel
Handel
Heller
Hines
John V. McDonnell
Johnson
Jones
Ken Kurtz
Kurtz
Love
Malt
Malt
Mareschal
Medin
Medin
Medin
Medin
Mervis
Milton
Milton
Minda
Morgan
Murphy
Murphy
Murphy
Nelson
Nelson
Nosofsky
Nosofsky
Peter Hines
Pitt
Pothos
Pothos
Pothos
Quinn
Rand
Reber
Regehr
Rips
Rosch
Sanborn
Schyns
Smith
Stewart
Todd M. Bailey
Vanpaemel
Publication venue: Elsevier BV
Publication date: 01/01/2011

What makes a category seem natural or intuitive? In this paper, an unsupervised categorization task was used to examine observer agreement on the categorization of nine different stimulus sets, designed to capture different intuitions about classification structure. The main empirical index of category intuitiveness was the frequency of the preferred classification for each stimulus set. With 169 participants in a within-participants design, the most frequent classification was produced over 50 times for some stimulus sets and no more than two or three times for others. The main empirical finding was that cluster tightness was more important than cluster separation in determining category intuitiveness. The results were considered in relation to the following models of unsupervised categorization: DIVA, the rational model, the simplicity model, SUSTAIN, an unsupervised version of the Generalized Context Model (UGCM), and a simple geometric model based on similarity. DIVA, the geometric approach, SUSTAIN, and the UGCM provided good, though not perfect, fits. Overall, the present work highlights several theoretical and practical issues regarding unsupervised categorization and reveals weaknesses in some of the corresponding formal models.

City Research Online

Crossref

Online Research @ Cardiff

Cronfa at Swansea University
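
Counting "the frequency of the preferred classification" requires treating two classifications as the same whenever they group the stimuli identically, regardless of the group labels a participant happened to use. A minimal Python sketch, assuming each participant's classification is stored as a list of group labels with one entry per stimulus (a hypothetical data layout), canonicalizes labels by order of first appearance and then counts. The function names are hypothetical.

```python
from collections import Counter


def canonical(labels):
    """Relabel groups by order of first appearance, so ['b', 'b', 'a'] and [7, 7, 3] match."""
    mapping = {}
    return tuple(mapping.setdefault(label, len(mapping)) for label in labels)


def preferred_classification(classifications):
    """Return (most frequent partition, number of participants who produced it).
    Ties go to the partition encountered first (Counter.most_common ordering)."""
    counts = Counter(canonical(c) for c in classifications)
    partition, freq = counts.most_common(1)[0]
    return partition, freq
```

For instance, the classifications `['a', 'a', 'b']` and `['x', 'x', 'y']` both become `(0, 0, 1)` and are counted together, while `[1, 2, 2]` becomes `(0, 1, 1)`, a different partition.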

Issues in Statistical Inference

Author: Chow Dr. Siu L.
Publication date: 01/01/2002

The APA Task Force’s treatment of research methods is critically examined. The present defense of the experiment rests on showing that (a) the control group cannot be replaced by the contrast group, (b) experimental psychologists have valid reasons to use non-randomly selected subjects, (c) there is no evidential support for the experimenter expectancy effect, (d) the Task Force misrepresented the role of inductive and deductive logic, and (e) the validity of experimental data does not depend on appeals to effect size or statistical power.

CiteSeerX

CogPrints Cognitive Sciences Eprint Archive

Preliminary investigation of flexibility in learning color-reward associations in gibbons (Hylobatidae)

Author: Abordo
Barton
Barton
Barton
Boinski
Brockman
Caine
Clutton-Brock
Cunningham
Cunningham
Cunningham
Curran
Deegan
Fedor
Forbes
Garber
Garson
Gossette
Groves
Harlow
Hemingway
Horton
Jacobs
Jacobs
Lucas
MacArthur
Mootnick
Nagle
Osorio
Overdorff
Passingham
Regan
Regan
Riley
Rumbaugh
Stephens
Tomasello
Van Schaik
Publication date: 02/04/2015

Previous studies of learning set formation have shown that most animal species can learn to learn, solving subsequent novel problems in fewer presentations than when they first encounter a task. Gibbons (Hylobatidae) have generally struggled with these tasks and do not show the learning-to-learn pattern found in other species. This is surprising given their phylogenetic position and level of cortical development. However, results have been conflicting, with some studies demonstrating higher-level learning abilities in these small apes. This study attempts to clarify whether gibbons can in fact use knowledge gained during one learning task to facilitate performance on a similar but novel problem, which would be a precursor to the development of a learning set. We tested 16 captive gibbons' ability to associate color cues with provisioned food items in two experiments, in which a learning period was followed by experimental trials where the gibbons could potentially use knowledge gained in their first learning experience to facilitate the solution of subsequent novel tasks. As in most previous studies, we found no evidence that gibbons could use previously acquired knowledge to solve a novel task. However, once the learning association was made, the gibbons performed well above chance. We found no differences across color associations, indicating that learning was not affected by the particular color/reward association. However, learning performance varied across genera: the hoolock (Hoolock leuconedys) and siamang (Symphalangus syndactylus) learned the fastest and the lar group (Hylobates sp.) learned the slowest. We caution that these results could be due to the small sample size and to the captive environment in which these gibbons were raised.
However, it is likely that environmental variability in the native habitats of the subjects tested could facilitate the evolution of flexible learning in some genera. Further comparative study is necessary to incorporate realistic cognitive variables into foraging models.

Abertay Research Portal

Crossref