911 research outputs found

    Scaling Nonparametric Bayesian Inference via Subsample-Annealing

    We describe an adaptation of the simulated annealing algorithm to nonparametric clustering and related probabilistic models. This new algorithm learns nonparametric latent structure over a growing and constantly churning subsample of training data, where the portion of data subsampled can be interpreted as the inverse temperature beta(t) in an annealing schedule. Gibbs sampling at high temperature (i.e., with a very small subsample) can more quickly explore sketches of the final latent state by (a) making longer jumps around latent space (as in block Gibbs) and (b) lowering energy barriers (as in simulated annealing). We prove that subsample annealing reduces mixing time from N^2 to N in a simple clustering model and from exp(N) to N in another class of models, where N is data size. Empirically, subsample annealing outperforms naive Gibbs sampling in accuracy per wall-clock time, and can scale to larger datasets and deeper hierarchical models. We demonstrate improved inference on million-row subsamples of US Census data and network log data and a 307-row hospital rating dataset, using a Pitman-Yor generalization of the Cross Categorization model. Comment: To appear in AISTATS 201
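The growing, churning subsample described in this abstract can be illustrated with a minimal sketch. It covers the schedule only and is not the authors' implementation: the linear schedule beta(t) = t/T, the `churn` parameter, and the `add_row` / `remove_row` callbacks (standing in for a Gibbs assignment step and its inverse) are all assumptions made for illustration.

```python
import math
import random


def subsample_anneal(num_rows, add_row, remove_row, num_steps, churn=1, seed=0):
    """Run a growing, churning subsample schedule over row indices.

    add_row(i) and remove_row(i) are hypothetical callbacks: a real sampler
    would Gibbs-sample a latent assignment for row i in add_row and forget
    that assignment in remove_row.
    """
    rng = random.Random(seed)
    inside = []                      # rows currently in the subsample
    outside = list(range(num_rows))  # rows currently held out
    rng.shuffle(outside)

    for t in range(1, num_steps + 1):
        beta = t / num_steps  # assumed linear inverse-temperature schedule
        target = max(1, math.ceil(beta * num_rows))

        # Churn: evict a few random rows so they can be re-assigned later.
        for _ in range(min(churn, len(inside))):
            i = rng.randrange(len(inside))
            inside[i], inside[-1] = inside[-1], inside[i]
            row = inside.pop()
            remove_row(row)
            outside.insert(rng.randrange(len(outside) + 1), row)

        # Grow: add held-out rows until the subsample reaches the target size.
        while len(inside) < target and outside:
            row = outside.pop()
            add_row(row)
            inside.append(row)

    return inside
```

At the final step beta equals 1, so every row ends up in the subsample. Early steps operate on a small subsample, which corresponds to the high-temperature regime the abstract describes.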

    Survey- and fishery-derived estimates of Pacific cod (Gadus macrocephalus) biomass: implications for strategies to reduce interactions between groundfish fisheries and Steller sea lions (Eumetopias jubatus)

    Survey- and fishery-derived biomass estimates have indicated that the harvest indices for Pacific cod (Gadus macrocephalus) within a portion of Steller sea lion (Eumetopias jubatus) critical habitat in February and March 2001 were five to 16 times greater than the annual rate for the entire Bering Sea-Aleutian Islands stock. A bottom trawl survey yielded a cod biomass estimate of 49,032 metric tons (t) for the entire area surveyed, of which less than half (23,329 t) was located within the area used primarily by the commercial fishery, which caught 11,631 t of Pacific cod. Leslie depletion analyses of fishery data yielded biomass estimates of approximately 14,500 t (95% confidence intervals of approximately 9,000–25,000 t), which are within the 95% confidence interval on the fished area survey estimate (12,846–33,812 t). These data indicate that Leslie analyses may be useful in estimating local fish biomass and harvest indices for certain marine fisheries that are well constrained spatially and relatively short in duration (weeks). In addition, fishery effects on prey availability within the time and space scales relevant to foraging sea lions may be much greater than the effects indicated by annual harvest rates estimated from stock assessments averaged across the range of the target spe
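For readers unfamiliar with the Leslie depletion analysis mentioned in this abstract, the following is a minimal sketch of the textbook form of the method, not the authors' code. It assumes catch per unit effort (CPUE) declines linearly with cumulative prior catch, CPUE_t = q(N0 - K_{t-1}), so an ordinary least-squares fit gives catchability q (the negated slope) and initial biomass N0 (intercept divided by q). The function names and the synthetic data generator are hypothetical.

```python
def leslie_depletion(catches, efforts):
    """Estimate initial biomass N0 and catchability q by the Leslie method.

    catches[t] and efforts[t] are the catch and fishing effort in period t.
    """
    if len(catches) != len(efforts) or len(catches) < 3:
        raise ValueError("need at least three periods of matching catch and effort")

    cpue = [c / e for c, e in zip(catches, efforts)]

    # Cumulative catch taken before each period (K_{t-1}).
    cumulative = []
    total = 0.0
    for c in catches:
        cumulative.append(total)
        total += c

    # Ordinary least squares of CPUE on cumulative prior catch.
    n = len(cpue)
    mean_x = sum(cumulative) / n
    mean_y = sum(cpue) / n
    sxx = sum((x - mean_x) ** 2 for x in cumulative)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cumulative, cpue))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    q = -slope
    if q <= 0:
        raise ValueError("CPUE does not decline with cumulative catch")
    return intercept / q, q


def simulate_depletion(n0, q, effort, periods):
    """Generate noise-free synthetic catches from a closed population."""
    remaining = n0
    catches, efforts = [], []
    for _ in range(periods):
        catch = q * effort * remaining
        catches.append(catch)
        efforts.append(effort)
        remaining -= catch
    return catches, efforts
```

Calling `leslie_depletion(*simulate_depletion(1000.0, 0.01, 10.0, 6))` recovers the simulated biomass and catchability, because the synthetic data follow the model exactly. Real fishery data are noisy, and the method assumes a closed population over the depletion period, which is why the abstract restricts it to spatially constrained, short-duration fisheries.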

    Relational database models and other software and their importance in data analysis, storage, and communication

    The integration of computer technology into research is a continually evolving process. There are many different areas of computer technology. The two main areas discussed here are computer software and databases, each of which offers multiple languages to choose from for implementation. In the current project, the languages used for computer software were the programming language Java and the scripting language PHP. The software package used for the database was MySQL. The program written in Java was a graphical user interface (GUI) used to visualize files in the CAP3 format. The PHP script was used to create a website that serves as the interface for connecting to and working with the database. MySQL was used to create the database for the Porcine Reproductive and Respiratory Syndrome (PRRS) Host Genomic Consortium. This database was designed to hold data generated by the Big Pig Project as well as data that will be generated by a new project. Because the data from the Big Pig Project are housed in one database, it is relatively easy to create the file used for statistical analysis. Analysis was performed on viral levels, cytokine levels (interleukin (IL) 8, IL1b, and interferon gamma (IFNγ)), and persistence of the Porcine Reproductive and Respiratory Syndrome virus (PRRSv). Results from this analysis indicate that interactions among cytokines affect persistence of viral infection, or vice versa. Further analysis of the data indicated that swine leukocyte antigen (SLA) genes were associated with cytokine (IL8, IL1b, and IFNγ) response in swine to infection with the PRRS virus. These analyses and results represent an example of using databases and computer software.
    Computer technology and research will continue to evolve, and their integration will continue to grow, become a major component of research, and allow for new, inventive ways to study and analyze data

    Genome-Wide Characterization of the Effects of Nucleic Acid Modifying Enzymes: Cytidine Deaminases and DNA Methylation

    Activation-induced cytidine deaminase (AID) is essential for two processes of immunoglobulin diversification in germinal center B cells: somatic hypermutation (SHM), in which mutations are introduced into immunoglobulin (Ig) genes, and class-switch recombination (CSR), in which genomic constant regions are recombined to encode antibodies of different isotypes. Both of these processes require AID-catalyzed C-to-U lesions at the Ig loci, which are resolved to generate point mutations or double-stranded DNA breaks in the cases of SHM and CSR, respectively. Despite over a decade of intense study, a number of open issues remain surrounding AID. The diversity of findings regarding AID’s role in DNA demethylation raises the question of the scope of its involvement in this process. Additionally, while it is clear that AID-mediated damage occurs, the effects of this damage on the average B cell have not been characterized. Finally, the issue of whether AID is able to edit RNA in vivo has never been rigorously addressed in the literature. In each of these cases, the advent of high-throughput sequencing provides methods for genome-wide characterization of AID’s effects. This thesis presents the application of a number of genome-scale, sequencing-based methods to characterize the effects of AID deficiency and overexpression on the activated B cell: mRNA-Seq and miRNA-Seq allow for measurements of RNA expression and editing, while reduced-representation bisulfite sequencing (RRBS) assays DNA methylation. These analyses confirmed AID’s known role in immunoglobulin isotype switching, while also demonstrating that it has little other effect on gene expression. Additionally, no evidence of AID-dependent mRNA or miRNA editing could be detected. Finally, RRBS data failed to support a role for AID in the regulation of DNA methylation. 
    Thus, despite evidence of its additional activities in other systems, antibody diversification appears to be AID’s sole physiological function in activated B cells. Following the conclusion of my studies of AID’s effects in B cells, I applied similar genomics tools to two amenable topics in nucleic acid modifications. First, I used mRNA-Seq to attempt to determine the substrate of the orphan cytidine deaminase Apolipoprotein B mRNA-editing enzyme, catalytic polypeptide 2 (APOBEC2). Next, I used whole-genome bisulfite sequencing to explore the distribution of 5-methylcytosine in Trypanosoma brucei. In both of these cases, results were inconclusive but suggest future directions for investigation

    Waddle - Always-canonical Intermediate Representation

    Program transformations that can rely on canonical properties of the program undergoing optimization can be written to be more robust and efficient than an equivalent but generalized transformation that also handles non-canonical programs. If a required canonical property has been broken by an earlier transformation, it must be rebuilt (often from scratch). This additional work can be a dominating factor in compilation time when many transformations are applied over large programs. This dissertation introduces a methodology for constructing program transformations so that the program remains in an always-canonical form as it is mutated, making only local changes to restore broken properties

    Using Data Visualization to Inform Machine Learning Approaches

    Machine learning with big data is a complicated task to tackle. Using data visualizations, one can find trends, anomalies, and patterns to help select the appropriate approach to the problem in machine learning. Using 2D visualizations, we’ve displayed flight data on interactive maps, visualizing density and property changes in an area. We’ve also used frequency histograms to view the quantitative properties of each point to look for trends. Using scatterplots, anomalies in data collection were found. Other plots confirmed previously found trends and initial thoughts about the data. These visualizations helped inform a machine learning approach to our problem and avoided major pitfalls further down the road

    The Internship: Bridge Between Marketplace and Liberal Arts Education in the Catholic Tradition

    Internships can be distinctive pedagogical opportunities within a Catholic liberal arts education. The applied marketplace experience provided by an internship, properly understood, is consistent with the Catholic understanding of education. The value of internships for Catholic higher education can be illustrated by focusing on communication and rhetorical studies. This essay consists of a selected review of literature situating internships within liberal arts education, followed by the articulation of a Thomistic framework for rhetorical education

    Crystalline optical cavity at 4 K with thermal noise limited instability and ultralow drift

    Crystalline optical cavities are the foundation of today's state-of-the-art ultrastable lasers. Building on our previous silicon cavity effort, we now achieve the fundamental thermal noise-limited stability for a 6 cm long silicon cavity cooled to 4 Kelvin, reaching $6.5\times10^{-17}$ from 0.8 to 80 seconds. We also report for the first time a clear linear dependence of the cavity frequency drift on the incident optical power. The lowest fractional frequency drift of $-3\times10^{-19}$/s is attained at a transmitted power of 40 nW, with an extrapolated drift approaching zero in the absence of optical power. These demonstrations provide a promising direction to reach a new performance domain for stable lasers, with stability better than $1\times10^{-17}$ and fractional linear drift below $1\times10^{-19}$/s

    AnimalQTLdb: a livestock QTL database tool set for positional QTL information mining and beyond

    The Animal Quantitative Trait Loci (QTL) database (AnimalQTLdb) is designed to house all publicly available QTL data on livestock animal species, so that researchers can easily locate and compare QTL within species. Database tools have also been added to link the QTL data to other types of genomic information, such as radiation hybrid (RH) maps, fingerprinted contig (FPC) physical maps, linkage maps and comparative maps to the human genome. Currently, this database contains data on 1287 pig, 630 cattle and 657 chicken QTL, which are dynamically linked to respective RH, FPC and human comparative maps. We plan to apply the tool to other animal species, and add more structural genome information for alignment, in an attempt to aid comparative structural genome studies