An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets

Author: Kirsch Adam
Mitzenmacher Michael
Pietracaprina Andrea
Pucci Geppino
Upfal Eli
Vandin Fabio
Publication date: 01/01/2009

As advances in technology allow for the collection, storage, and analysis of vast amounts of data, the task of screening and assessing the significance of discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support threshold s* for a dataset, such that the number of itemsets with support at least s* represents a substantial deviation from what would be expected in a random dataset with the same number of transactions and the same individual item frequencies. These itemsets can then be flagged as statistically significant with a small false discovery rate. We present extensive experimental results to substantiate the effectiveness of our methodology.

Comment: A preliminary version of this work was presented in ACM PODS 2009. 20 pages, 0 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

Archivio istituzionale della ricerca - Università di Padova

In search of grammaticalization in synchronic dialect data: General extenders in north-east England

Author: Levey S
Pichler H
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/11/2011

In this paper, we draw on a socially stratified corpus of dialect data collected in north-east England to test recent proposals that grammaticalization processes are implicated in the synchronic variability of general extenders (GEs), i.e., phrase- or clause-final constructions such as "and that" and "or something". Combining theoretical insights from the framework of grammaticalization with the empirical methods of variationist sociolinguistics, we operationalize key diagnostics of grammaticalization (syntagmatic length, decategorialization, semantic-pragmatic change) as independent factor groups in the quantitative analysis of GE variability. While multivariate analyses reveal rapid changes in apparent time to the social conditioning of some GE variants in our data, they do not reveal any evidence of systematic changes in the linguistic conditioning of variants in apparent time that would confirm an interpretation of ongoing grammaticalization. These results lead us to questio

University of Salford Institutional Repository

Methodology in mission statement research: where are we, and where should we go? An analysis of 20 years of empirical research

Author: Desmidt Sebastian
Heene Aimé
Publication venue: Universiteit Gent
Publication date: 01/01/2006

Ghent University Academic Bibliography

Beyond the mask of deference: Exploring the relationship between ruptures and transference in a single-case study

Author: Carli De
L
Lang E.
Locati F.
Parolin &
Tarasconi P.
Publication venue: 'PAGEPress Publications'
Publication date: 01/01/2016

Archivio istituzionale della ricerca - Università di Padova

Deconstructing comprehensibility: identifying the linguistic influences on listeners' L2 comprehensibility ratings

Author: Isaacs T
Trofimovich P
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 15/08/2012

Comprehensibility, a major concept in second language (L2) pronunciation research that denotes listeners’ perceptions of how easily they understand L2 speech, is central to interlocutors’ communicative success in real-world contexts. Although comprehensibility has been modeled in several L2 oral proficiency scales—for example, the Test of English as a Foreign Language (TOEFL) or the International English Language Testing System (IELTS)—shortcomings of existing scales (e.g., vague descriptors) reflect limited empirical evidence as to which linguistic aspects influence listeners’ judgments of L2 comprehensibility at different ability levels. To address this gap, a mixed-methods approach was used in the present study to gain a deeper understanding of the linguistic aspects underlying listeners’ L2 comprehensibility ratings. First, speech samples of 40 native French learners of English were analyzed using 19 quantitative speech measures, including segmental, suprasegmental, fluency, lexical, grammatical, and discourse-level variables. These measures were then correlated with 60 native English listeners’ scalar judgments of the speakers’ comprehensibility. Next, three English as a second language (ESL) teachers provided introspective reports on the linguistic aspects of speech that they attended to when judging L2 comprehensibility. Following data triangulation, five speech measures were identified that clearly distinguished between L2 learners at different comprehensibility levels. Lexical richness and fluency measures differentiated between low-level learners; grammatical and discourse-level measures differentiated between high-level learners; and word stress errors discriminated between learners of all levels

UCL Discovery

Explore Bristol Research

From buildings to cities: techniques for the multi-scale analysis of urban form and function

Author: Crooks A.
Smith D.
Publication venue: Centre for Advanced Spatial Analysis (UCL)
Publication date: 01/07/2010

The built environment is a significant factor in many urban processes, yet direct measures of built form are seldom used in geographical studies. Representation and analysis of urban form and function could provide new insights and improve the evidence base for research. So far progress has been slow due to limited data availability, computational demands, and a lack of methods to integrate built environment data with aggregate geographical analysis. Spatial data and computational improvements are overcoming some of these problems, but there remains a need for techniques to process and aggregate urban form data. Here we develop a Built Environment Model of urban function and dwelling type classifications for Greater London, based on detailed topographic and address-based data (sourced from Ordnance Survey MasterMap). The multi-scale approach allows the Built Environment Model to be viewed at fine scales for local planning contexts, and at city-wide scales for aggregate geographical analysis, allowing an improved understanding of urban processes. This flexibility is illustrated in two examples, urban function analysis and residential type analysis, where both local-scale urban clustering and city-wide trends in density and agglomeration are shown. While we demonstrate the multi-scale Built Environment Model to be a viable approach, a number of accuracy issues are identified, including the limitations of 2D data, inaccuracies in commercial function data and problems with temporal attribution. These limitations currently restrict the more advanced applications of the Built Environment Model.

UCL Discovery

Semantic Stability in Social Tagging Streams

Author: Huberman Bernardo A.
Singer Philipp
Strohmaier Markus
Wagner Claudia
Publication date: 05/11/2013

One potential disadvantage of social tagging systems is that due to the lack of a centralized vocabulary, a crowd of users may never manage to reach a consensus on the description of resources (e.g., books, users or songs) on the Web. Yet, previous research has provided interesting evidence that the tag distributions of resources may become semantically stable over time as more and more users tag them. At the same time, previous work has raised an array of new questions such as: (i) How can we assess the semantic stability of social tagging systems in a robust and methodical way? (ii) Does semantic stabilization of tags vary across different social tagging systems and ultimately, (iii) what are the factors that can explain semantic stabilization in such systems? In this work we tackle these questions by (i) presenting a novel and robust method which overcomes a number of limitations in existing methods, (ii) empirically investigating semantic stabilization processes in a wide range of social tagging systems with distinct domains and properties and (iii) detecting potential causes for semantic stabilization, specifically imitation behavior, shared background knowledge and intrinsic properties of natural language. Our results show that tagging streams which are generated by a combination of imitation dynamics and shared background knowledge exhibit faster and higher semantic stability than tagging streams which are generated via imitation dynamics or natural language streams alone

arXiv.org e-Print Archive

Crossref

SSOAR - Social Science Open Access Repository

MAnnheim DOCument Server

Applied Research Automatic Self-Talk Questionnaire for Sports (ASTQS): Development and Preliminary Validation of a Measure Identifying the Structure of Athletes’ Self-Talk

Author: Chroni Stiliani
Hatzigeorgiadis Antonis
Papaioannou Athanasios
Theodorakis Yannis
Zourbanos Nikos
Publication venue: 'Human Kinetics'
Publication date: 01/01/2009

The aim of the present investigation was to develop an instrument assessing the content and the structure of athletes’ self-talk. The study was conducted in three stages. In the first stage, a large pool of items was generated and content analysis was used to organize the items into categories. Furthermore, item-content relevance analysis was conducted to help identify the most appropriate items. In Stage 2, the factor structure of the instrument was examined by a series of exploratory factor analyses (Sample A: N = 507), whereas in Stage 3 the results of the exploratory factor analysis were retested through confirmatory factor analyses (Sample B: N = 766) and, at the same time, concurrent validity was assessed. The analyses revealed eight factors: four positive (psych up, confidence, anxiety control and instruction), three negative (worry, disengagement and somatic fatigue) and one neutral (irrelevant thoughts). The findings of the study provide evidence regarding the multidimensionality of self-talk, suggesting that the ASTQS seems to be a psychometrically sound instrument that could help us develop cognitive-behavioral theories and interventions to examine and modify athletes’ self-talk.

Northumbria Research Link

University of Thessaly Institutional Repository

Extreme Weather Events in Europe: preparing for climate change adaptation

Author: Benestad Rasmus
Cubasch Ulrich
Donat M.
Duarte Santos Filipe
Fischer Erich
Gunnar Kvamstø Nils
Hov Øystein
Höppe Peter
Iversen Trond
Kundzewicz W. Zbigniew
Leckebusch C. Gregor
Murlis John
Rezacova Daniela
Rios David
Schädler Bruno
Ulbrich Uwe
Veisz Ottó Bálint
Zerefos Christos
Publication venue: Norwegian Meteorological Institute
Publication date: 01/01/2013

University of Birmingham Research Portal

Repository of the Academy's Library

Bern Open Repository and Information System (BORIS)