    Scaling Nonparametric Bayesian Inference via Subsample-Annealing

    We describe an adaptation of the simulated annealing algorithm to nonparametric clustering and related probabilistic models. This new algorithm learns nonparametric latent structure over a growing and constantly churning subsample of training data, where the portion of data subsampled can be interpreted as the inverse temperature beta(t) in an annealing schedule. Gibbs sampling at high temperature (i.e., with a very small subsample) can more quickly explore sketches of the final latent state by (a) making longer jumps around latent space (as in block Gibbs) and (b) lowering energy barriers (as in simulated annealing). We prove that subsample annealing reduces mixing time from N^2 to N in a simple clustering model and from exp(N) to N in another class of models, where N is data size. Empirically, subsample-annealing outperforms naive Gibbs sampling in accuracy per wall-clock time, and can scale to larger datasets and deeper hierarchical models. We demonstrate improved inference on million-row subsamples of US Census data and network log data and a 307-row hospital rating dataset, using a Pitman-Yor generalization of the Cross Categorization model. Comment: To appear in AISTATS 201
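
The subsample schedule described in this abstract can be illustrated with a rough sketch. The abstract does not give the schedule or the churn rule, so the linear beta(t), the swap count, and the names `subsample_sizes` and `churn` below are all assumptions made for illustration, and the Gibbs sweep itself is left as a placeholder comment.

```python
import random

def subsample_sizes(n_rows, n_steps, beta_start=0.01):
    """Hypothetical linear schedule: beta(t) rises from beta_start to 1.

    The subsample size at step t is beta(t) * n_rows, so a small beta
    (high temperature) means a very small subsample.
    """
    sizes = []
    for t in range(n_steps):
        beta = beta_start + (1.0 - beta_start) * t / max(n_steps - 1, 1)
        sizes.append(min(n_rows, max(1, round(beta * n_rows))))
    return sizes

def churn(current, n_rows, target_size, n_swap, rng):
    """Drop up to n_swap rows, then add random outside rows up to target_size."""
    kept = set(current)
    for row in rng.sample(sorted(kept), min(n_swap, len(kept))):
        kept.remove(row)
    outside = [r for r in range(n_rows) if r not in kept]
    n_add = max(0, target_size - len(kept))
    kept.update(rng.sample(outside, n_add))
    return kept

rng = random.Random(0)
subsample = set()
for size in subsample_sizes(n_rows=1000, n_steps=5):
    subsample = churn(subsample, 1000, size, n_swap=3, rng=rng)
    # Placeholder: a Gibbs sweep over the rows in `subsample` would go here.
```

With this schedule the last step uses every row, which corresponds to ordinary Gibbs sampling on the full dataset.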

    Survey- and fishery-derived estimates of Pacific cod (Gadus macrocephalus) biomass: implications for strategies to reduce interactions between groundfish fisheries and Steller sea lions (Eumetopias jubatus)

    Survey- and fishery-derived biomass estimates have indicated that the harvest indices for Pacific cod (Gadus macrocephalus) within a portion of Steller sea lion (Eumetopias jubatus) critical habitat in February and March 2001 were five to 16 times greater than the annual rate for the entire Bering Sea-Aleutian Islands stock. A bottom trawl survey yielded a cod biomass estimate of 49,032 metric tons (t) for the entire area surveyed, of which less than half (23,329 t) was located within the area used primarily by the commercial fishery, which caught 11,631 t of Pacific cod. Leslie depletion analyses of fishery data yielded biomass estimates of approximately 14,500 t (95% confidence intervals of approximately 9,000–25,000 t), which are within the 95% confidence interval on the fished area survey estimate (12,846–33,812 t). These data indicate that Leslie analyses may be useful in estimating local fish biomass and harvest indices for certain marine fisheries that are well constrained spatially and relatively short in duration (weeks). In addition, fishery effects on prey availability within the time and space scales relevant to foraging sea lions may be much greater than the effects indicated by annual harvest rates estimated from stock assessments averaged across the range of the target spe
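
For readers unfamiliar with Leslie depletion analysis, a minimal sketch follows. It assumes the textbook form of the method, in which catch per unit effort (CPUE) declines linearly with cumulative catch, CPUE_t = q * (B0 - K_t), so a least-squares line gives catchability q and initial biomass B0. The abstract does not say which variant the authors fitted, the input series below is synthetic, and `harvest_index` assumes the simple definition catch divided by biomass, which the abstract does not spell out.

```python
def leslie_estimate(catches, efforts):
    """Estimate (q, B0) from per-period catch and effort series.

    Fits CPUE_t = q*B0 - q*K_t by ordinary least squares, where K_t is
    the cumulative catch taken before period t (an assumed convention).
    """
    cpue = [c / e for c, e in zip(catches, efforts)]
    cumulative = []
    total = 0.0
    for c in catches:
        cumulative.append(total)
        total += c
    n = len(cpue)
    mean_k = sum(cumulative) / n
    mean_u = sum(cpue) / n
    sxx = sum((k - mean_k) ** 2 for k in cumulative)
    sxy = sum((k - mean_k) * (u - mean_u) for k, u in zip(cumulative, cpue))
    slope = sxy / sxx
    intercept = mean_u - slope * mean_k
    q = -slope            # catchability is the negated slope
    b0 = intercept / q    # initial biomass is the x-intercept
    return q, b0

def harvest_index(catch, biomass):
    """Assumed definition: fraction of the estimated biomass that was caught."""
    return catch / biomass

# Synthetic depletion series (not data from the paper): B0 = 1000, q = 0.001.
catches = [100.0, 90.0, 81.0, 72.9]
efforts = [100.0, 100.0, 100.0, 100.0]
q_hat, b0_hat = leslie_estimate(catches, efforts)

# Figures quoted in the abstract: 11,631 t caught, 23,329 t surveyed in the fished area.
survey_based_index = harvest_index(11631, 23329)
```

Under the assumed definition, the quoted catch amounts to roughly half of the survey biomass estimate for the fished area.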

    Using Data Visualization to Inform Machine Learning Approaches

    Machine learning with big data is a complicated task to tackle. Using data visualizations, one can find trends, anomalies, and patterns to help select the appropriate approach to the problem in machine learning. Using 2D visualizations, we’ve displayed flight data on interactive maps, visualizing density and property changes in an area. We’ve also used frequency histograms to view the quantitative properties of each point to look for trends. Using scatterplots, anomalies in data collection were found. Other plots confirmed previously found trends and initial thoughts about the data. These visualizations helped inform a machine learning approach to our problem and avoided major pitfalls further down the road.

    The Internship: Bridge Between Marketplace and Liberal Arts Education in the Catholic Tradition

    Internships can be distinctive pedagogical opportunities within a Catholic liberal arts education. The applied marketplace experience provided by an internship, properly understood, is consistent with the Catholic understanding of education. The value of internships for Catholic higher education can be illustrated by focusing on communication and rhetorical studies. This essay consists of a selected review of literature situating internships within liberal arts education, followed by the articulation of a Thomistic framework for rhetorical education.

    AnimalQTLdb: a livestock QTL database tool set for positional QTL information mining and beyond

    The Animal Quantitative Trait Loci (QTL) database (AnimalQTLdb) is designed to house all publicly available QTL data on livestock animal species, so that researchers can easily locate and compare QTL within species. Database tools have also been added to link the QTL data to other types of genomic information, such as radiation hybrid (RH) maps, fingerprinted contig (FPC) physical maps, linkage maps and comparative maps to the human genome. Currently, this database contains data on 1287 pig, 630 cattle and 657 chicken QTL, which are dynamically linked to the respective RH, FPC and human comparative maps. We plan to apply the tool to other animal species, and to add more structural genome information for alignment, in an attempt to aid comparative structural genome studies.

    Pareto superior dimension of rotating savings and credit associations (ROSCAs) in Ghana: Evidence from Asunafo North Municipality of Ghana

    This study investigates the characteristics of participants in rotating savings and credit associations (ROSCAs) in Ghana who join the association because of its Pareto superior allocation. Some scholars, such as Dejene and Van den Brink, have hypothesized that people join ROSCAs because of their Pareto superior allocation. The study employed primary data analysis to achieve its main objective. Of the 400 ROSCA participants sampled from Asunafo North Municipality of Ghana, 71.75% joined the association because of its Pareto superior allocation. A Probit model was used to predict the probability of joining the association due to its superior allocation. The dependent variable took the value of one when a respondent joined the association due to its superior allocation and zero otherwise. Married participants, participants with no or a low level of education, unemployed participants, and participants who save more of their income are more likely to join a ROSCA due to its Pareto superior allocation. It was recommended that ROSCA participants with access to formal financial institutions (participants living in urban sectors), participants who have accounts at formal financial institutions, rich participants, and aged participants should be educated on how a ROSCA constitutes a Pareto superior allocation. Keywords: Pareto Superiority, Rotating Savings and Credit Association. JEL: B26, D53, E44, G10, G34
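
As a reminder of how a Probit model turns respondent characteristics into a probability, here is a minimal sketch. A Probit model sets P(y = 1 | x) = Phi(b0 + b . x), where Phi is the standard normal CDF. The coefficient values and feature names below are hypothetical placeholders chosen only to mirror the signs reported in the abstract; they are not estimates from the study.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_probability(intercept, coefs, features):
    """P(y = 1 | x) = Phi(intercept + sum of coef * feature)."""
    z = intercept + sum(coefs[name] * features[name] for name in coefs)
    return std_normal_cdf(z)

# Hypothetical coefficients (illustrative only, not from the paper).
coefs = {"married": 0.4, "unemployed": 0.3, "share_of_income_saved": 1.0}
intercept = 0.1

married_saver = {"married": 1, "unemployed": 0, "share_of_income_saved": 0.2}
single_saver = {"married": 0, "unemployed": 0, "share_of_income_saved": 0.2}

p_married = probit_probability(intercept, coefs, married_saver)
p_single = probit_probability(intercept, coefs, single_saver)
```

A positive coefficient raises the predicted probability, which is the sense in which the abstract describes married or unemployed participants as "more likely" to join for this reason.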