68 research outputs found

    Scalable learning for geostatistics and speaker recognition

    With improved data acquisition methods, the amount of data being collected has increased severalfold. One objective in data collection is to learn useful underlying patterns. To work with data at this scale, methods not only need to be effective on the underlying data, but also have to be scalable to larger data collections. This thesis focuses on developing scalable and effective methods targeted at two domains in particular: geostatistics and speaker recognition. We first focus on kernel-based learning methods and develop a GPU-based parallel framework for this class of problems. An improved numerical algorithm that utilizes this GPU parallelization to further enhance the computational performance of kernel regression is proposed. These methods are then demonstrated on problems arising in geostatistics and speaker recognition. In geostatistics, data is often collected at scattered locations, and factors like instrument malfunction lead to missing observations. Applications often require the ability to interpolate this scattered spatiotemporal data onto a regular grid, continuously over time. This problem can be formulated as a regression problem, and one of the most popular geostatistical interpolation techniques, kriging, is analogous to a standard kernel method: Gaussian process regression. Kriging is computationally expensive and needs major modifications and accelerations in order to be used practically. The GPU framework developed for kernel methods is extended to kriging, and the GPU's texture memory is further exploited for enhanced computational performance. Speaker recognition deals with the task of verifying a person's identity based on samples of his/her speech - "utterances". This thesis focuses on the text-independent setting and develops three new recognition frameworks for this problem. We propose a kernelized Rényi-distance-based similarity scoring for speaker recognition. While its performance is promising, it does not generalize well with limited training data and therefore does not compare well to state-of-the-art recognition systems. These systems compensate for variability in the speech data due to the message, channel variability, noise, and reverberation. State-of-the-art systems model each speaker as a mixture of Gaussians (GMM) and compensate for this variability (termed "nuisance"). We propose a novel discriminative framework using a latent variable technique, partial least squares (PLS), for improved recognition. The kernelized version of this algorithm is used to build a state-of-the-art speaker ID system that shows results competitive with the best systems reported in NIST's 2010 Speaker Recognition Evaluation.
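The analogy drawn above between kriging and Gaussian process regression can be made concrete. Below is a minimal numpy sketch of GP (simple-kriging) interpolation of scattered observations onto a regular grid; the squared-exponential kernel, its length scale, and the noise jitter are illustrative assumptions, not the thesis's actual configuration.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between point sets a (n,d) and b (m,d)."""
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-0.5 * sq / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise=1e-6, length_scale=1.0):
    """Posterior mean at X_test: the kriging interpolant of scattered data."""
    K = rbf_kernel(X_train, X_train, length_scale) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train, length_scale)
    # This dense n x n solve is the O(n^3) bottleneck that motivates
    # the GPU acceleration described in the abstract.
    alpha = np.linalg.solve(K, y_train)
    return K_star @ alpha

# Interpolate a 1-D field sampled at scattered locations onto a regular grid.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (30, 1))
y = np.sin(X).ravel()
grid = np.linspace(0, 10, 101)[:, None]
y_grid = gp_predict(X, y, grid, length_scale=1.5)
```

With a small noise term the posterior mean passes (nearly) through the observed values, which is exactly the exact-interpolation property of simple kriging.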

    Machine Learning in Image Analysis and Pattern Recognition

    This book charts the progress in applying machine learning, including deep learning, to a broad range of image analysis and pattern recognition problems and applications. In this book, we have assembled original research articles making unique contributions to the theory, methodology, and applications of machine learning in image analysis and pattern recognition.

    STK /WST 795 Research Reports

    These documents contain the honours research reports for each year for the Department of Statistics. Honours Research Reports - University of Pretoria, 20XX. Statistics. BSc (Hons) Mathematical Statistics, BCom (Hons) Statistics, BCom (Hons) Mathematical Statistics. Unrestricted.

    A Novel Nonparametric Test for Heterogeneity Detection and Assessment of Fluid Removal Among CRRT Patients in ICU

    Over the past decade, acute kidney injury (AKI) has occurred among 20%-50% of patients admitted to the intensive care unit (ICU) in the United States. Continuous renal replacement therapy (CRRT) has become a popular treatment method among these critically ill patients. However, there are multiple complications in implementing this treatment, including discrepancies between practiced and prescribed fluid removal, possibly related to the heterogeneity among these patients. Within mixture modeling, several techniques exist for detecting heterogeneity, each with its own limitations. In this dissertation, a novel nonparametric ‘d test’ is used to detect heterogeneity among CRRT patients in the ICU. Along with heterogeneity detection, this dissertation also seeks to understand ongoing issues with fluid removal and discrepancies in treatment implementation.
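The 'd test' itself is the dissertation's novel contribution and is not reproduced here. As generic context for the mixture-modeling approaches it is contrasted with, the following numpy sketch fits a two-component Gaussian mixture by EM and uses BIC to flag heterogeneity against a single-Gaussian fit; the synthetic data and all parameter choices are illustrative assumptions.

```python
import numpy as np

def fit_gmm2(x, n_iter=200):
    """EM for a 1-D two-component Gaussian mixture; returns params and log-lik."""
    mu = np.array([x.min(), x.max()], float)
    sd = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        r = dens / dens.sum(1, keepdims=True)
        # M-step: responsibility-weighted parameter updates
        n_k = r.sum(0)
        pi, mu = n_k / len(x), (r * x[:, None]).sum(0) / n_k
        sd = np.sqrt((r * (x[:, None] - mu) ** 2).sum(0) / n_k)
    dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    return mu, sd, pi, np.log(dens.sum(1)).sum()

def bic(loglik, n_params, n):
    return n_params * np.log(n) - 2 * loglik

# Synthetic "fluid-removal discrepancies" drawn from two patient subgroups.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 300)])

mu1, s1 = x.mean(), x.std()
ll1 = np.sum(-0.5 * ((x - mu1) / s1) ** 2 - np.log(s1 * np.sqrt(2 * np.pi)))
_, _, _, ll2 = fit_gmm2(x)
heterogeneous = bic(ll2, 5, len(x)) < bic(ll1, 2, len(x))
```

A limitation of this parametric route, noted in the abstract, is that the verdict depends on the assumed component family; a nonparametric test avoids that assumption.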

    Machine learning and Monte Carlo based data analysis methods in cosmic dust research

    This work applies miscellaneous algorithms from the fields of machine learning and computational numerics to the research field of cosmic dust. The task is to determine the scientific and technical potential of the different methods. Here, the methods are applied to two different projects: the meteor camera system Canary Island Long-Baseline Observatory (CILBO) and the Cassini in-situ dust telescope, the Cosmic-Dust-Analyzer (CDA).

    Neural density estimation and likelihood-free inference

    I consider two problems in machine learning and statistics: the problem of estimating the joint probability density of a collection of random variables, known as density estimation, and the problem of inferring model parameters when their likelihood is intractable, known as likelihood-free inference. The contribution of the thesis is a set of new methods for addressing these problems that are based on recent advances in neural networks and deep learning. The first part of the thesis is about density estimation. The joint probability density of a collection of random variables is a useful mathematical description of their statistical properties, but can be hard to estimate from data, especially when the number of random variables is large. Traditional density-estimation methods such as histograms or kernel density estimators are effective for a small number of random variables, but scale badly as the number increases. In contrast, models for density estimation based on neural networks scale better with the number of random variables, and can incorporate domain knowledge in their design. My main contribution is Masked Autoregressive Flow, a new model for density estimation based on a bijective neural network that transforms random noise to data. At the time of its introduction, Masked Autoregressive Flow achieved state-of-the-art results in general-purpose density estimation. Since its publication, Masked Autoregressive Flow has contributed to the broader understanding of neural density estimation, and has influenced subsequent developments in the field. The second part of the thesis is about likelihood-free inference. Typically, a statistical model can be specified either as a likelihood function that describes the statistical relationship between model parameters and data, or as a simulator that can be run forward to generate data. 
Specifying a statistical model as a simulator can offer greater modelling flexibility and can produce more interpretable models, but can also make inference of model parameters harder, as the likelihood of the parameters may no longer be tractable. Traditional techniques for likelihood-free inference such as approximate Bayesian computation rely on simulating data from the model, but often require a large number of simulations to produce accurate results. In this thesis, I cast the problem of likelihood-free inference as a density-estimation problem, and address it with neural density models. My main contribution is the introduction of two new methods for likelihood-free inference: Sequential Neural Posterior Estimation (Type A), which estimates the posterior, and Sequential Neural Likelihood, which estimates the likelihood. Both methods use a neural density model to estimate the posterior/likelihood, and a sequential training procedure to guide simulations. My experiments show that the proposed methods produce accurate results, and are often orders of magnitude faster than alternative methods based on approximate Bayesian computation.
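The change-of-variables computation underlying Masked Autoregressive Flow can be sketched in a few lines: each dimension is an affine transform of standard noise whose shift and log-scale depend only on the preceding dimensions, so the Jacobian is triangular and the density is cheap to evaluate. The linear "conditioner" below is a stand-in assumption for the model's masked neural networks.

```python
import numpy as np

def conditioner(x_prev, w):
    """Stand-in for MAF's masked network: maps x_{<i} to (mu_i, alpha_i).
    A linear map with a bounded log-scale -- an illustrative assumption."""
    h = float(x_prev @ w) if len(x_prev) else 0.0
    return h, 0.1 * np.tanh(h)

def log_density(x, w):
    """Autoregressive change of variables: u_i = (x_i - mu_i) * exp(-alpha_i),
    log p(x) = log N(u; 0, I) - sum_i alpha_i. Because the transform is
    autoregressive, the Jacobian is triangular and its log-determinant is
    just the sum of the per-dimension log-scales alpha_i."""
    log_p = 0.0
    for i in range(len(x)):
        mu, alpha = conditioner(x[:i], w[:i])
        u = (x[i] - mu) * np.exp(-alpha)
        log_p += -0.5 * (u ** 2 + np.log(2 * np.pi)) - alpha
    return log_p

rng = np.random.default_rng(1)
x = rng.normal(size=3)
w = rng.normal(size=3)
lp = log_density(x, w)
```

With the conditioner weights set to zero the transform is the identity and the density reduces to a standard normal, which is a convenient sanity check on the Jacobian bookkeeping.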

    On the Formation and Economic Implications of Subjective Beliefs and Individual Preferences

    The conceptual framework of neoclassical economics posits that individual decision-making processes can be represented as maximization of some objective function. In this framework, people's goals and desires are expressed through the means of preferences over outcomes; in addition, in choosing according to these objectives, people employ subjective beliefs about the likelihood of unknown states of the world. For instance, in the subjective expected utility paradigm, people linearly combine their probabilistic beliefs and preferences over outcomes to form an expected utility function. Much of the parsimony and power of theoretical economic analysis stems from the striking generality and simplicity of this framework. At the same time, the crucial importance of preferences and beliefs in our conceptual apparatus in combination with the heterogeneity in choice behavior that is observed across many economic contexts raises a number of empirical questions. For example, how much heterogeneity do we observe in core preference or belief dimensions that are relevant for a broad range of economic behaviors? If such preferences and beliefs exhibit heterogeneity, then what are the origins of this heterogeneity? How do beliefs and preferences form to begin with? And how does variation in beliefs and preferences translate into economically important heterogeneity in choice behavior? This thesis is organized around these broad questions and hence seeks to contribute to the goal of providing an improved empirical understanding of the foundations and economic implications of individual decision-making processes. The content of this work reflects the deep belief that understanding and conceptualizing decision-making requires economists to embrace ideas from a broad range of fields. 
Accordingly, this thesis draws insights and techniques from the literatures on behavioral and experimental economics, cultural economics, household finance, comparative development, cognitive psychology, and anthropology. Chapters 1 through 3 combine methods from experimental economics, household finance, and cognitive psychology to investigate the effects of bounded rationality on the formation and explanatory power of subjective beliefs. Chapters 4 through 6 use tools from cultural economics, anthropology, and comparative development to study the cross-country variation in economic preferences as well as its origins and implications. The formation of beliefs about payoff-relevant states of the world crucially hinges on an adequate processing of incoming information. However, oftentimes, the information people receive is rather complex in nature. Chapters 1 and 2 investigate how boundedly rational people form beliefs when their information is subject to sampling biases, i.e., when the information pieces people receive are either not mutually independent or systematically selected. Chapter 1 is motivated by Akerlof and Shiller's popular narrative that from time to time some individuals or even entire markets undergo excessive belief swings, which refers to the idea that sometimes people are overly optimistic and sometimes overly pessimistic over, say, the future development of the stock market. In particular, Akerlof and Shiller argue that such "exuberance" or excessive pessimism might be driven by the pervasive "telling and re-telling of stories". In fact, many real information structures such as the news media generate correlated rather than mutually independent signals, and hence give rise to severe double-counting problems. However, clean evidence on how people form beliefs in correlated information environments is missing. 
Chapter 1, which is joint work with Florian Zimmermann, provides clean experimental evidence that many people neglect such double-counting problems in the updating process, so that beliefs are excessively sensitive to well-connected information sources and follow an overshooting pattern. In addition, in an experimental asset market, correlation neglect not only drives overoptimism and overpessimism at the individual level, but also gives rise to a predictable pattern of over- and underpricing. Finally, investigating the mechanisms underlying the strong heterogeneity in the presence of the bias, a series of treatment manipulations reveals that many people struggle with identifying double-counting problems in the first place, so that exogenous shifts in subjects' focus have large effects on beliefs. Chapter 2 takes as its starting point the public debate about increased political polarization in the United States, which refers to the fact that political beliefs tend to drift apart over time across social and political groups. Popular narratives by, e.g., Sunstein, Bishop, and Pariser posit that such polarization is driven by people selecting into environments in which they are predominantly exposed to information that confirms their prior beliefs. This pattern introduces a selection problem into the belief formation process, which may result in polarization if people fail to take the non-representativeness of their signals into account. However, again, we do not have meaningful evidence on how people actually form beliefs in such "homophilous" environments. Thus, Chapter 2 shows experimentally that many people do not take into account how their own prior decisions shape their informational environment, but rather largely base their views on their local information sample. In consequence, beliefs excessively depend on people's priors and tend to be too extreme, akin to the concerns about "echo chambers" driving irrational belief polarization across social groups.
Strikingly, the distribution of individuals' naivete follows a pronounced bimodal structure - people either fully account for the selection problem or do not adjust for it at all. Allowing for interaction between these heterogeneous updating types induces little learning: neither the endogenous acquisition of advice nor exogenously induced dissent leads to a convergence of beliefs across types, suggesting that the belief heterogeneity induced by selected information may persist over time. Finally, the paper provides evidence that selection neglect is conceptually closely related to correlation neglect in that both cognitive biases appear to be driven by selective attentional patterns. Taken together, chapters 1 and 2 show that many people struggle with processing information that is subject to sampling issues. What is more, the chapters also show that these biases might share common cognitive foundations, hence providing hope for a unified attention-based theory of boundedly rational belief formation. While laboratory experimental techniques are a great tool to study the formation of beliefs, they cannot shed light on the relationship between beliefs and economically important choices. In essentially all economic models, beliefs mechanically map into choice behavior. However, it is not evident that people's beliefs play the same role in generating observed behavior across heterogeneous individuals: while some people's decision process might be well-approximated by the belief- and preference-driven choice rules envisioned by economic models, other people might use, e.g., simple rules of thumb instead, implying that their beliefs should be largely irrelevant for their choices. That is, bounded rationality might not only affect the formation of beliefs, but also the mapping from beliefs to choices.
In Chapter 3, Tilman Drerup, Hans-Martin von Gaudecker, and I take up this conjecture in the context of measurement error problems in household finance: while subjective expectations are important primitives in models of portfolio choice, their direct measurement often yields imprecise and inconsistent measures, which is typically treated as a pure measurement error problem. In contrast to this perspective, we argue that individual-level variation in the precision of subjective expectations measures can actually be productively exploited to gain insights into whether economic models of portfolio choice provide an adequate representation of individual decision processes. Using a novel dataset on experimentally measured subjective stock market expectations and real stock market decisions collected from a large probability sample of the Dutch population, we estimate a semiparametric double index model to explore this conjecture. Our results show that investment decisions exhibit little variation in economic model primitives when individuals provide error-ridden belief statements. In contrast, they predict strong variation in investment decisions for individuals who report precise expectation measures. These findings indicate that the degree of precision in expectations data provides useful information to uncover heterogeneity in choice behavior, and that boundedly rational beliefs need not necessarily map into irrational choices. In the standard neoclassical framework, people's beliefs only serve the purpose of achieving a given set of goals. In many applications of economic interest, these goals are well-characterized by a small set of preferences, i.e., risk aversion, patience, and social preferences. Prior research has shown that these preferences vary systematically in the population, and that they are broadly predictive of those behaviors economic theory supposes them to. 
At the same time, this empirical evidence stems from often fairly special samples in a given country, hence precluding an analysis of how general the variation in and predictive power of preferences are across cultural, economic, and institutional backgrounds. In addition, it is conceivable that preferences vary not just at an individual level, but also across entire populations - if so, what are the deep historical or cultural origins of this variation, and what are its (aggregate) economic implications? Chapters 4 through 6 take up these questions by presenting and analyzing the Global Preference Survey (GPS), a novel globally representative dataset on risk and time preferences, positive and negative reciprocity, altruism, and trust for 80,000 individuals, drawn as representative samples from 76 countries around the world, representing 90 percent of both the world's population and global income. In joint work with Armin Falk, Anke Becker, Thomas Dohmen, David Huffman, and Uwe Sunde, Chapter 4 presents the GPS data and shows that the global distribution of preferences exhibits substantial variation across countries, which is partly systematic: certain preferences appear in combination, and follow distinct economic, institutional, and geographic patterns. The heterogeneity in preferences across individuals is even more pronounced and varies systematically with age, gender, and cognitive ability. Around the world, the preference measures are predictive of a wide range of individual-level behaviors including savings and schooling decisions, labor market and health choices, prosocial behaviors, and family structure. We also shed light on the cultural origins of preference variation around the globe using data on language structure. The magnitude of the cross-country variation in preferences is striking and raises the immediate question of what brought it about.
Chapter 5 presents joint work with Anke Becker and Armin Falk in which we use the GPS to show that the migratory movements of our early ancestors thousands of years ago have left a footprint in the contemporary cross-country distributions of preferences over risk and social interactions. Across a wide range of regression specifications, differences in preferences between populations are significantly increasing in the length of time elapsed since the respective groups shared common ancestors. This result obtains for risk aversion, altruism, positive reciprocity, and trust, and holds for various proxies for the structure and timing of historical population breakups, including genetic and linguistic data or predicted measures of migratory distance. In addition, country-level preference endowments are non-linearly associated with migratory distance from East Africa, i.e., genetic diversity. In combination with the relationships between language structure and preferences established in Chapter 4, these results point to the importance of very long-run events for understanding the global distribution of some of the key economic traits. Given these findings on the very deep roots of the cross-country variation in preferences, an interesting - and conceptually different - question is whether such country-level preference profiles might have systematic aggregate economic implications. Indeed, according to standard dynamic choice theories, patience is a key driving factor behind the accumulation of productive resources and hence ultimately of income not just at an individual, but also at a macroeconomic level. Using the GPS data on patience, Chapter 6 (joint work with Thomas Dohmen, Armin Falk, David Huffman, and Uwe Sunde) investigates the empirical relevance of this hypothesis in the context of a micro-founded development framework. Around the world, patient people invest more into human and physical capital and have higher incomes. 
At the macroeconomic level, we establish a significant reduced-form relationship between patience and contemporary income as well as medium- and long-run growth rates, with patience explaining a substantial fraction of development differences across countries and subnational regions. In line with a conceptual framework in which patience drives income through the accumulation of productive resources, average patience also strongly correlates with aggregate human and physical capital accumulation as well as investments into productivity. Taken together, this thesis has a number of unifying themes and insights. First, consistent with the vast heterogeneity in observed choices, people exhibit a large amount of variation in beliefs and preferences, and in how they combine these into choice rules. Second, at least part of this heterogeneity is systematic and has identifiable sources: preferences over risk, time, and social interactions appear to have very deep historical or cultural origins, but also systematically vary with individual characteristics; belief heterogeneity, on the other hand, is partly driven by bounded rationality and its systematic, predictable effects on information-processing. Third, and finally, this heterogeneity in beliefs and preferences is likely to have real economic implications: across cultural and institutional backgrounds, preferences correlate with the types of behaviors that economic models envision them to, not just across individuals, but also at the macroeconomic level; subjective beliefs are predictive of behavior, too, albeit with the twist that certain subgroups of the population do not appear to entertain stable belief distributions to begin with. In sum, (I believe that) much insight is to be gained from further exploring these fascinating topics.

    When is it Our Time?: An Event History Model of Lesbian, Gay, and Bisexual Rights Policy Adoption

    Gays and lesbians have long struggled for their rights as citizens, yet only recently has their struggle been truly politicized in a way that fosters mobilization. When and why social movements coalesce despite the many obstacles to collective action are fundamental questions in comparative politics. While examining social movements is worthwhile, it is important to examine not only when and why a social movement forms, but also when and why a social movement is successful. This dissertation tackles the latter of these objectives, focusing on when and why social movements have success in terms of their duration from the time of their formation until their desired policy output is produced.