135 research outputs found

    Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions

    Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, robustness, and/or speed. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast to O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multiprocessor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
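As a sketch of the two-stage template described above, the following NumPy snippet draws a Gaussian test matrix, builds an orthonormal basis for the sampled range, and finishes with a small deterministic SVD. The function name, the oversampling parameter p, and the test problem are illustrative choices, not code from the paper:

```python
import numpy as np

def randomized_svd(A, k, p=10):
    """Rank-k SVD via a randomized range finder; p is oversampling."""
    m, n = A.shape
    # Stage A: sample the range of A with a Gaussian test matrix.
    Omega = np.random.default_rng(0).standard_normal((n, k + p))
    Y = A @ Omega                       # m x (k+p) sketch of the range
    Q, _ = np.linalg.qr(Y)              # orthonormal basis for the sketch
    # Stage B: compress A to the subspace, then decompose deterministically.
    B = Q.T @ A                         # (k+p) x n reduced matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]

# Usage: approximate a matrix of exact rank 5.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_svd(A, k=5)
err = np.linalg.norm(A - (U * s) @ Vt)
```

Because this test matrix has exact rank 5, the rank-5 approximation recovers it to machine precision; for general matrices the error is governed by the analysis in the paper.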

    Dimension-adaptive bounds on compressive FLD Classification

    Efficient dimensionality reduction by random projections (RP) is gaining popularity, so the learning guarantees achievable in RP spaces are of great interest. In the finite-dimensional setting, it has been shown for the compressive Fisher Linear Discriminant (FLD) classifier that, for good generalisation, the required target dimension grows only as the log of the number of classes and is not adversely affected by the number of projected data points. However, these bounds depend on the dimensionality d of the original data space. In this paper we give further guarantees that remove d from the bounds under certain conditions of regularity on the data density structure. In particular, if the data density does not fill the ambient space, then the error of compressive FLD is independent of the ambient dimension and depends only on a notion of ‘intrinsic dimension’.
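As a rough numerical illustration of the compressive FLD setting (not the paper's proof technique), the sketch below projects two synthetic Gaussian classes with a scaled Gaussian random matrix and fits an FLD rule in the compressed space. All dimensions, class separations, and variable names are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class problem: ambient dimension d, target dimension k.
d, k, n = 500, 50, 200
mu = np.zeros(d)
mu[:10] = 5.0                               # class means differ in 10 coords
X0 = rng.standard_normal((n, d))            # class 0
X1 = rng.standard_normal((n, d)) + mu       # class 1

# Gaussian random projection, scaled so norms are roughly preserved.
R = rng.standard_normal((k, d)) / np.sqrt(k)
Z0, Z1 = X0 @ R.T, X1 @ R.T

# Fisher Linear Discriminant in the compressed space: w = S^{-1}(m1 - m0).
m0, m1 = Z0.mean(axis=0), Z1.mean(axis=0)
S = np.cov(np.vstack([Z0 - m0, Z1 - m1]).T)  # pooled covariance estimate
w = np.linalg.solve(S, m1 - m0)
b = w @ (m0 + m1) / 2                        # midpoint decision threshold

accuracy = (np.sum(Z0 @ w <= b) + np.sum(Z1 @ w > b)) / (2 * n)
```

Even though the classifier never sees the ambient space, the projected classes remain well separated, which is the phenomenon the paper's bounds quantify.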

    Solving k-means on High-dimensional Big Data

    In recent years, there have been major efforts to develop data stream algorithms that process their input in one pass over the data with low memory requirements. For the k-means problem, this has led to the development of several (1+ε)-approximations (under the assumption that k is a constant), but also to the design of algorithms that are extremely fast in practice and compute solutions of high accuracy. However, when not only the length of the stream but also the dimensionality of the input points is high, current methods reach their limits. We propose two algorithms, piecy and piecy-mr, based on the recently developed data stream algorithm BICO, that can process high-dimensional data in one pass and output a solution of high quality. While piecy is suited for high-dimensional data with a medium number of points, piecy-mr is meant for high-dimensional data that comes in a very long stream. We provide an extensive experimental study of piecy and piecy-mr that shows the strength of the new algorithms. Comment: 23 pages, 9 figures, published at the 14th International Symposium on Experimental Algorithms - SEA 201
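The piecy and piecy-mr pipelines build on BICO and are not reproduced here; the following generic one-pass sketch, with running-mean centre updates over mini-batches, merely illustrates the streaming model that such algorithms target. All sizes and names are illustrative:

```python
import numpy as np

def one_pass_kmeans(stream, k):
    """Generic one-pass k-means: running-mean centre updates per mini-batch.

    Illustrates the streaming model only; this is NOT piecy/BICO.
    """
    centers, counts = None, np.zeros(k)
    for batch in stream:
        if centers is None:
            centers = batch[:k].astype(float).copy()  # seed from first batch
            batch = batch[k:]
        if len(batch) == 0:
            continue
        # Assign each point in the batch to its nearest current centre.
        d2 = ((batch[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        idx = d2.argmin(axis=1)
        for j, x in zip(idx, batch):
            counts[j] += 1
            centers[j] += (x - centers[j]) / counts[j]  # running mean
    return centers

# Usage: 3 well-separated clusters in 100 dimensions, streamed in batches.
rng = np.random.default_rng(0)
true = 10.0 * rng.standard_normal((3, 100))
data = np.vstack([t + rng.standard_normal((300, 100)) for t in true])
data = data[np.arange(900).reshape(3, 300).T.ravel()]  # interleave clusters
centers = one_pass_kmeans((data[i:i + 100] for i in range(0, 900, 100)), 3)
```

Each batch is seen exactly once and only k centres plus k counts are retained, which is the memory profile that makes the streaming setting attractive.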

    Toward criteria for pragmatic measurement in implementation research and practice: a stakeholder-driven approach using concept mapping

    Background: Advancing implementation research and practice requires valid and reliable measures of implementation determinants, mechanisms, processes, strategies, and outcomes. However, researchers and implementation stakeholders are unlikely to use measures if they are not also pragmatic. The purpose of this study was to establish a stakeholder-driven conceptualization of the domains that comprise the pragmatic measure construct. It built upon a systematic review of the literature and semi-structured stakeholder interviews that generated 47 criteria for pragmatic measures, and aimed to further refine that set of criteria by identifying conceptually distinct categories of the pragmatic measure construct and providing quantitative ratings of the criteria’s clarity and importance. Methods: Twenty-four stakeholders with expertise in implementation practice completed a concept mapping activity wherein they organized the initial list of 47 criteria into conceptually distinct categories and rated their clarity and importance. Multidimensional scaling, hierarchical cluster analysis, and descriptive statistics were used to analyze the data. Findings: The 47 criteria were meaningfully grouped into four distinct categories: (1) acceptable, (2) compatible, (3) easy, and (4) useful. Average ratings of clarity and importance at the category and individual criteria level will be presented. Conclusions: This study advances the field of implementation science and practice by providing clear and conceptually distinct domains of the pragmatic measure construct. Next steps will include a Delphi process to develop consensus on the most important criteria and the development of quantifiable pragmatic rating criteria that can be used to assess measures.

    Incremental dimension reduction of tensors with random index

    We present an incremental, scalable and efficient dimension reduction technique for tensors that is based on sparse random linear coding. Data is stored in a compactified representation of fixed size, which makes memory requirements low and predictable. Component encoding and decoding are performed on-line without computationally expensive re-analysis of the data set. The range of tensor indices can be extended dynamically without modifying the component representation. This idea originates from a mathematical model of semantic memory and a method known as random indexing in natural language processing. We generalize the random-indexing algorithm to tensors and present signal-to-noise ratio simulations for representations of vectors and matrices. We also present a mathematical analysis of the approximate orthogonality of high-dimensional ternary vectors, a property that underpins this and other similar random-coding approaches to dimension reduction. To further demonstrate the properties of random indexing we present results of a synonym identification task. The method presented here has some similarities with random projection and Tucker decomposition, but it performs well only at high dimensionality (n > 10^3). Random indexing is useful for a range of complex practical problems, e.g., in natural language processing, data mining, pattern recognition, event detection, graph searching and search engines. Prototype software is provided. It supports encoding and decoding of tensors of order >= 1 in a unified framework, i.e., vectors, matrices and higher-order tensors. Comment: 36 pages, 9 figures
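A minimal sketch of the order-1 (vector) case, assuming sparse ternary index vectors of the kind analyzed above; the dimension n = 10^4, sparsity s = 20, and the encode/decode convention are illustrative choices, not the prototype software:

```python
import numpy as np

def index_vector(n, s, rng):
    """Sparse ternary index vector: s nonzeros, half +1 and half -1."""
    v = np.zeros(n)
    pos = rng.choice(n, size=s, replace=False)
    v[pos[: s // 2]] = 1.0
    v[pos[s // 2 :]] = -1.0
    return v

n, s = 10_000, 20
rng = np.random.default_rng(0)

# Index vectors for three items; in high dimension they are nearly orthogonal.
ia, ib, ic = (index_vector(n, s, rng) for _ in range(3))

# Encode two values into one fixed-size vector (the order-1 tensor case).
memory = 3.0 * ia + 5.0 * ib        # ic is deliberately not stored

# Decode on-line by a normalised inner product with an item's index vector.
decode = lambda iv: memory @ iv / s
a_hat, b_hat, c_hat = decode(ia), decode(ib), decode(ic)
```

Decoding a stored item returns its value plus small crosstalk from the other items; the crosstalk shrinks as the inner products between distinct high-dimensional ternary vectors concentrate near zero, which is the approximate-orthogonality property the paper analyzes.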

    Regularized Linear Inversion with Randomized Singular Value Decomposition

    In this work, we develop efficient solvers for linear inverse problems based on the randomized singular value decomposition (RSVD). This is achieved by combining RSVD with classical regularization methods, e.g., truncated singular value decomposition, Tikhonov regularization, and general Tikhonov regularization with a smoothness penalty. One distinct feature of the proposed approach is that it explicitly preserves the structure of the regularized solution, in the sense that it always lies in the range of a certain adjoint operator. We provide error estimates between the approximation and the exact solution under the canonical source condition, and interpret the approach through the lens of convex duality. Extensive numerical experiments are provided to illustrate the efficiency and accuracy of the approach. Comment: 20 pages, 4 figures
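A compact sketch of the combination on a toy smoothing problem: a randomized range finder supplies approximate SVD factors, and Tikhonov regularization is applied through the standard filter factors s_i/(s_i^2 + λ). The operator, noise level, rank, and λ are invented for illustration and do not come from the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ill-posed problem: a Gaussian blur (smoothing) operator on [0, 1].
n = 100
t = np.linspace(0.0, 1.0, n)
A = np.exp(-50.0 * (t[:, None] - t[None, :]) ** 2) / n   # smoothing kernel
x_true = np.sin(2 * np.pi * t)
b = A @ x_true + 1e-4 * rng.standard_normal(n)           # noisy observation

# RSVD of A: Gaussian sketch, orthonormal basis, small deterministic SVD.
k, p = 15, 10                                            # rank, oversampling
Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k + p)))
Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
U = Q @ Ub

# Tikhonov regularization via filter factors s_i / (s_i^2 + lam).
lam = 1e-6
x_reg = Vt.T @ ((s / (s**2 + lam)) * (U.T @ b))

rel_err = np.linalg.norm(x_reg - x_true) / np.linalg.norm(x_true)
```

Note that x_reg is a combination of the columns of Vt.T, i.e., it lies in the (approximate) row space of A, mirroring the range-structure preservation the abstract highlights.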

    Operationalizing the ‘pragmatic’ measures construct using a stakeholder feedback and a multi-method approach

    Context: Implementation science measures are rarely used by stakeholders to inform and enhance clinical program change. Little is known about what makes implementation measures pragmatic (i.e., practical) for use in community settings; thus, the present study’s objective was to generate a clinical stakeholder-driven operationalization of a pragmatic measures construct. Evidence acquisition: The pragmatic measures construct was defined using (1) a systematic literature review to identify dimensions of the construct, using the PsycINFO and PubMed databases, and (2) interviews with an international stakeholder panel (N = 7) who were asked about their perspectives on pragmatic measures. Evidence synthesis: Combined results from the systematic literature review and stakeholder interviews revealed a final list of 47 short statements (e.g., feasible, low cost, brief) describing pragmatic measures, which will allow for the development of a rigorous, stakeholder-driven conceptualization of the pragmatic measures construct. Conclusions: Results revealed significant overlap between terms related to the pragmatic construct in the existing literature and the stakeholder interviews. However, a number of terms were unique to each methodology. This underscores the importance of understanding stakeholder perspectives on criteria measuring the pragmatic construct. These results will be used to inform future phases of the project, in which stakeholders will determine the relative importance and clarity of each dimension of the pragmatic construct, as well as their priorities for the pragmatic dimensions. Taken together, these results will be incorporated into a pragmatic rating system for existing implementation science measures to support implementation science and practice.

    LASSI-L detects early cognitive changes in pre-motor manifest Huntington’s disease: a replication and validation study

    Background and objectives: Cognitive decline is an important early sign in pre-motor manifest Huntington’s disease (preHD) and is characterized by deficits across multiple domains including executive function, psychomotor processing speed, and memory retrieval. Prior work suggested that the Loewenstein-Acevedo Scale for Semantic Interference and Learning (LASSI-L), a verbal learning task that simultaneously targets these domains, could capture early cognitive changes in preHD. The current study aimed to replicate, validate and further analyze the LASSI-L in preHD using larger datasets. Methods: The LASSI-L was administered to 50 participants (25 preHD and 25 healthy controls) matched for age, education, and sex in a longitudinal study of disease progression, and performance was compared to the MMSE, Trail A & B, SCWT, SDMT, Semantic Fluency (Animals), and CVLT-II. Performance was then compared to a separate age- and education-matched cohort of 25 preHD participants. Receiver operating characteristic (ROC) curves and practice effects (12-month interval) were investigated. Group comparisons were repeated using a preHD subgroup restricted to participants predicted to be far from diagnosis (Far subgroup), based on the CAG-Age-Product scaled (CAPs) score. Construct validity was assessed through correlations with previously established measures of subcortical atrophy. Results: PreHD performance on all sections of the LASSI-L was significantly different from controls. The proactive semantic interference section (PSI) was sensitive (p = 0.0001, d = 1.548), similar across preHD datasets (p = 1.0), reliable on test-retest over 12 months (Spearman’s rho = 0.88, p < 0.00001) and associated with an excellent area under the ROC curve (AUROC) of 0.855. In the preHD Far subgroup comparison, PSI was the only cognitive assessment to survive FDR < 0.05 (p = 0.03). The number of intrusions on PSI was negatively correlated with caudate volume. Discussion: The LASSI-L is a sensitive, reliable, efficient tool for detecting cognitive decline in preHD. By using a unique verbal learning test paradigm that simultaneously targets executive function, processing speed and memory retrieval, the LASSI-L outperforms many other established tests and captures early signs of cognitive impairment. With further longitudinal validation, the LASSI-L could prove to be a useful biomarker for clinical research in preHD.

    An updated protocol for a systematic review of implementation-related measures

    Background: Implementation science is the study of strategies used to integrate evidence-based practices into real-world settings (Eccles and Mittman, Implement Sci. 1(1):1, 2006). Central to the identification of replicable, feasible, and effective implementation strategies is the ability to assess the impact of contextual constructs and intervention characteristics that may influence implementation, but several measurement issues make this work quite difficult. For instance, it is unclear which constructs have no measures and which measures have any evidence of psychometric properties like reliability and validity. As part of a larger set of studies to advance implementation science measurement (Lewis et al., Implement Sci. 10:102, 2015), we will complete systematic reviews of measures that map onto the Consolidated Framework for Implementation Research (Damschroder et al., Implement Sci. 4:50, 2009) and the Implementation Outcomes Framework (Proctor et al., Adm Policy Ment Health. 38(2):65-76, 2011), the protocol for which is described in this manuscript. Methods: Our primary databases will be PubMed and Embase. Our search strings will comprise five levels: (1) the outcome or construct term; (2) terms for measure; (3) terms for evidence-based practice; (4) terms for implementation; and (5) terms for mental health. Two trained research specialists will independently review all titles and abstracts, followed by full-text review for inclusion. The research specialists will then conduct measure-forward searches using the “cited by” function to identify all published empirical studies using each measure. The measure and associated publications will be compiled in a packet for data extraction. Data relevant to our Psychometric and Pragmatic Evidence Rating Scale (PAPERS) will be independently extracted and then rated using a worst-score-counts methodology reflecting “poor” to “excellent” evidence. Discussion: We will build a centralized, accessible, searchable repository through which researchers, practitioners, and other stakeholders can identify psychometrically and pragmatically strong measures of implementation contexts, processes, and outcomes. By facilitating the employment of psychometrically and pragmatically strong measures identified through this systematic review, the repository will enhance the cumulativeness, reproducibility, and applicability of research findings in the rapidly growing field of implementation science.