
    Housing and tax-deferred retirement accounts

    Assets in tax-deferred retirement accounts (TDA) and housing are two major components of household portfolios. In this paper, we develop a life-cycle model to examine the interaction between households' use of TDA and their housing decisions. The model generates life-cycle patterns of home ownership and the composition of net worth that are broadly consistent with the data from the Survey of Consumer Finances. We find that TDA promotes home ownership, as households take advantage of the preferential tax treatment of both TDA and home ownership. They substitute TDA assets for home equity by accumulating wealth in TDA and making smaller down payments (taking out bigger mortgages); consequently, they become homeowners earlier in their lives. On the other hand, housing-related policies, such as a minimum down payment requirement and mortgage interest deductibility, affect households' housing decisions more than their use of TDA.

    Beyond the Dataset: Understanding Sociotechnical Aspects of the Knowledge Discovery Process Among Modern Data Professionals

    Data professionals are among the most sought-after professionals in today’s industry. Although skill sets and training can vary among these professionals, there is some consensus that a combination of technical and analytical skills is necessary. In fact, a growing number of dedicated undergraduate, graduate, and certificate programs now offer such core skills to train modern data professionals. Despite the rapid growth of the data profession, we have few insights into what it is like to be a data professional on the job beyond having specific technical and analytical skills. We used the Knowledge Discovery Process (KDP) as a framework to understand the sociotechnical and collaborative challenges that data professionals face. We carried out 20 semi-structured interviews with data professionals across seven different domains. Our results indicate that the KDP in practice is highly social, collaborative, and dependent on domain knowledge. To address the sociotechnical gap, the need for a translator within the KDP has emerged. The main contribution of this thesis is in providing empirical insights into the work of data professionals, highlighting the sociotechnical challenges that they face on the job. We also propose a new analytic approach that combines thematic analysis and cognitive work analysis (CWA) on the same dataset. This research has implications for improving the productivity of data professionals and for designing future tools and training materials for the next generation of data professionals.

    Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks

    The last decade of machine learning has seen drastic increases in scale and capabilities. Deep neural networks (DNNs) are increasingly being deployed in the real world. However, they are difficult to analyze, raising concerns about using them without a rigorous understanding of how they function. Effective tools for interpreting them will be important for building more trustworthy AI by helping to identify problems, fix bugs, and improve basic understanding. In particular, "inner" interpretability techniques, which focus on explaining the internal components of DNNs, are well-suited for developing a mechanistic understanding, guiding manual modifications, and reverse engineering solutions. Much recent work has focused on DNN interpretability, and rapid progress has thus far made a thorough systematization of methods difficult. In this survey, we review over 300 works with a focus on inner interpretability tools. We introduce a taxonomy that classifies methods by what part of the network they help to explain (weights, neurons, subnetworks, or latent representations) and whether they are implemented during (intrinsic) or after (post hoc) training. To our knowledge, we are also the first to survey a number of connections between interpretability research and work in adversarial robustness, continual learning, modularity, network compression, and studying the human visual system. We discuss key challenges and argue that the status quo in interpretability research is largely unproductive. Finally, we highlight the importance of future work that emphasizes diagnostics, debugging, adversaries, and benchmarking in order to make interpretability tools more useful to engineers in practical applications.

    crs: A package for nonparametric spline estimation in R

    crs is a library for R written by Jeffrey S. Racine (maintainer) and Zhenghua Nie. This add-on package provides a collection of functions for spline-based nonparametric estimation of regression functions with both continuous and categorical regressors. Currently, the crs package integrates data-driven methods for selecting the spline degree, the number of knots, and the necessary bandwidths for nonparametric conditional mean, IV, and quantile regression. A function for multivariate density spline estimation with mixed data is also currently in the works. As a bonus, the authors have also provided the first simple R interface to the NOMAD (‘nonsmooth mesh adaptive direct search’) optimization solver, which can also be applied to other mixed-integer optimization problems that users may encounter in other settings. Although the crs package shares some of the same functionalities as its kernel-based counterpart—the np package by the same author—it currently lacks some of the features the np package provides, such as hypothesis testing and semiparametric estimation. However, what it lacks in breadth, crs makes up in speed. A Monte Carlo experiment in this review uncovers sizable speed gains compared to its np counterpart, with a marginal loss in terms of goodness of fit. Therefore, the package will be extremely useful for applied econometricians interested in employing nonparametric techniques using large amounts of data with a small number of discrete covariates.
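
    The core idea the package automates, selecting the spline degree and number of knots by a data-driven criterion, can be sketched outside of R. The Python example below (using scikit-learn's SplineTransformer with cross-validation, purely as an illustration and not the crs API) shows how degree and knot count might be chosen for a regression spline.

```python
# Illustration only: data-driven choice of spline degree and knot count,
# in the spirit of crs' regression splines (this is not the crs package).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(500, 1))
y = np.sin(2 * x[:, 0]) + 0.3 * rng.standard_normal(500)

pipe = make_pipeline(SplineTransformer(), LinearRegression())
grid = GridSearchCV(
    pipe,
    param_grid={
        "splinetransformer__degree": [1, 2, 3],       # candidate spline degrees
        "splinetransformer__n_knots": [4, 6, 8, 12],  # candidate knot counts
    },
    cv=5,
)
grid.fit(x, y)
print(grid.best_params_)
```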

    Productivity and reallocation: evidence from ecuadorian firm-level data

    Ecuador, a developing small open economy, serves as an important case study for aggregate productivity growth and input reallocation. Since little is known about the economic performance of Ecuador with its crisis and reforms between 1998 and 2007, this paper uses a comprehensive microdata set from Ecuador’s National Statistics and Census Institute to study Ecuadorian firm dynamics in that period. We find that the reallocation of factor inputs (2.6 percent) and technical efficiency growth (3.2 percent) on the intensive margin are the dominant sources of aggregate productivity growth. Net entry, as a channel of reallocation on the extensive margin, generally has minor effects (–0.1 percent) and contributes to productivity growth only in the later recovery period (2002–04).
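
    Decompositions of this kind typically split share-weighted aggregate productivity into an unweighted-mean (technical efficiency) term and a reallocation (covariance) term. A minimal sketch of one standard variant, the Olley-Pakes cross-sectional decomposition, is shown below on hypothetical firm-level data; it illustrates the general accounting, not necessarily the exact decomposition used in the paper.

```python
# Olley-Pakes style decomposition on hypothetical firm-level data:
# aggregate = unweighted mean + sum of (share deviation) * (productivity deviation).
import numpy as np

rng = np.random.default_rng(1)
n_firms = 1000
productivity = rng.normal(1.0, 0.3, n_firms)   # firm-level (log) productivity
shares = rng.dirichlet(np.ones(n_firms))       # activity shares, summing to one

aggregate = np.sum(shares * productivity)      # share-weighted aggregate productivity
within = productivity.mean()                   # unweighted mean (technical efficiency)
reallocation = np.sum((shares - shares.mean()) * (productivity - productivity.mean()))

print(f"aggregate     = {aggregate:.4f}")
print(f"within (mean) = {within:.4f}")
print(f"reallocation  = {reallocation:.4f}")
assert np.isclose(aggregate, within + reallocation)
```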

    Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

    We analyze the growth of dataset sizes used in machine learning for natural language processing and computer vision, and extrapolate these trends using two methods: projecting the historical growth rate forward, and estimating the compute-optimal dataset size for future predicted compute budgets. We then estimate the total stock of unlabeled data available on the internet over the coming decades and compare it with this projected growth in data usage. Our analysis indicates that the stock of high-quality language data will be exhausted soon, likely before 2026. By contrast, the stock of low-quality language data and image data will be exhausted only much later: between 2030 and 2050 for low-quality language data, and between 2030 and 2060 for images. Our work suggests that the current trend of ever-growing ML models that rely on enormous datasets might slow down unless data efficiency is drastically improved or new sources of data become available.
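
    A back-of-the-envelope version of the first extrapolation method, projecting a constant growth rate in data usage against a fixed stock, looks like the sketch below. The usage, growth, and stock figures are placeholder assumptions for illustration, not the paper's estimates.

```python
# Toy projection: when does data usage, growing at a constant yearly factor,
# exhaust a fixed stock? All numbers are placeholders, not the paper's estimates.
import math

current_usage_tokens = 1e12    # assumed tokens used to train frontier models today
yearly_growth_factor = 2.0     # assumed ~2x growth in training data per year
stock_tokens = 1e14            # assumed total stock of usable text

years_to_exhaustion = math.log(stock_tokens / current_usage_tokens, yearly_growth_factor)
print(f"stock exhausted in roughly {years_to_exhaustion:.1f} years at this growth rate")
```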

    Modeling and predicting mobile phone touchscreen transcription typing using an integrated cognitive architecture

    This is an Accepted Manuscript of an article published by Taylor & Francis in the International Journal of Human–Computer Interaction on 2017-09-07, available online: http://dx.doi.org/10.1080/10447318.2017.1373463
    Modeling typing performance has value for both the theory and design practice of human-computer interaction. Previous models have simulated desktop keyboard transcription typing performance; however, with the increasing prevalence of smartphones, new models are needed to account for mobile phone touchscreen typing. In the current study, we built a model of mobile phone touchscreen typing in an integrated cognitive architecture and tested the model by comparing simulation results with human results. The results showed that the model could simulate and predict interkey time performance in both number typing (Experiment 1) and sentence typing (Experiment 2) tasks. The model produced results similar to the human data and captured the effects of digit/letter position and interkey distance on interkey time. The current work demonstrates the predictive power of the model without adjusting any parameters to fit the human data. The results from this study provide new insights into the mechanisms of mobile typing performance and support future work simulating and predicting detailed human performance in more complex mobile interaction tasks.
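
    The paper's model is built in an integrated cognitive architecture, which is not reproduced here; as a much simpler illustration of how interkey distance and key size can enter a predicted interkey time, the sketch below applies Fitts' law with placeholder coefficients.

```python
# Not the paper's cognitive-architecture model: a minimal Fitts'-law sketch of how
# interkey distance and key width can map to a predicted movement (interkey) time.
import math

def fitts_movement_time(distance_mm: float, key_width_mm: float,
                        a: float = 0.05, b: float = 0.10) -> float:
    """Predicted movement time in seconds; a and b are placeholder coefficients."""
    return a + b * math.log2(distance_mm / key_width_mm + 1.0)

# Example: taps on a touchscreen keyboard with 6 mm keys at three interkey distances.
for distance in (6.0, 18.0, 36.0):
    t = fitts_movement_time(distance, 6.0)
    print(f"distance {distance:4.1f} mm -> predicted interkey time {t:.3f} s")
```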

    Data Science in Stata 16: Frames, Lasso, and Python Integration

    Stata is one of the most widely used software packages for data analysis, statistics, and model fitting by economists, public policy researchers, and epidemiologists, among others. Stata's recent release of version 16 in June 2019 includes an up-to-date methodological library and user-friendly implementations of various cutting-edge techniques. In the newest release, Stata has implemented several changes and additions, including:
    • Lasso
    • Multiple data sets in memory
    • Meta-analysis
    • Choice models
    • Python integration
    • Bayes: multiple chains
    • Panel-data ERMs
    • Sample-size analysis for CIs
    • Panel-data mixed logit
    • Nonlinear DSGE models
    • Numerical integration
    This review covers the most salient innovations in Stata 16. It is the first release that brings along an implementation of machine-learning tools. The three innovations we consider are: (1) multiple data sets in memory, (2) lasso for causal inference, and (3) Python integration.
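
    Of these, the lasso-for-inference tools implement ideas such as double selection. As a language-agnostic illustration of that technique (not Stata syntax, and not Stata's implementation), the Python sketch below runs a double-selection lasso with scikit-learn and statsmodels on simulated data.

```python
# Double-selection lasso (Belloni-Chernozhukov-Hansen style), shown purely as an
# illustration of the technique behind lasso-for-inference; this is not Stata code.
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(2)
n, p = 500, 50
X = rng.standard_normal((n, p))           # high-dimensional controls
d = X[:, 0] + rng.standard_normal(n)      # treatment, confounded through X[:, 0]
y = 1.5 * d + 2.0 * X[:, 0] + rng.standard_normal(n)

# Step 1: lasso of the outcome on the controls; Step 2: lasso of the treatment on them.
sel_y = np.flatnonzero(LassoCV(cv=5).fit(X, y).coef_)
sel_d = np.flatnonzero(LassoCV(cv=5).fit(X, d).coef_)
controls = np.union1d(sel_y, sel_d)

# Step 3: OLS of the outcome on the treatment plus the union of selected controls.
Z = sm.add_constant(np.column_stack([d, X[:, controls]]))
print("estimated treatment effect:", round(sm.OLS(y, Z).fit().params[1], 3))
```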

    Algorithmic progress in language models

    We investigate the rate at which algorithms for pre-training language models have improved since the advent of deep learning. Using a dataset of over 200 language model evaluations on Wikitext and Penn Treebank spanning 2012–2023, we find that the compute required to reach a set performance threshold has halved approximately every 8 months, with a 95% confidence interval of around 5 to 14 months, substantially faster than hardware gains per Moore's Law. We estimate augmented scaling laws, which enable us to quantify algorithmic progress and determine the relative contributions of scaling models versus innovations in training algorithms. Despite the rapid pace of algorithmic progress and the development of new architectures such as the transformer, our analysis reveals that the increase in compute made an even larger contribution to overall performance improvements over this time period. Though limited by noisy benchmark data, our analysis quantifies the rapid progress in language modeling, shedding light on the relative contributions from compute and algorithms.
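
    The headline halving time implies a simple "effective compute" multiplier from algorithmic progress alone over any given window; a small sketch of that arithmetic, using the roughly 8-month central estimate quoted above and an arbitrary three-year horizon, is given below.

```python
# Effective-compute gain implied by a fixed halving time for required compute.
# Uses the ~8-month central estimate from the abstract; the horizon is arbitrary.
halving_time_months = 8.0
horizon_months = 36.0  # a three-year window, chosen only for illustration

# Required compute for fixed performance halves every period, so algorithmic
# progress alone supplies a 2**(horizon / halving_time) effective-compute gain.
multiplier = 2.0 ** (horizon_months / halving_time_months)
print(f"~{multiplier:.0f}x less compute needed after {horizon_months:.0f} months")
```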