Scaling Nonparametric Bayesian Inference via Subsample-Annealing
We describe an adaptation of the simulated annealing algorithm to
nonparametric clustering and related probabilistic models. This new algorithm
learns nonparametric latent structure over a growing and constantly churning
subsample of training data, where the portion of data subsampled can be
interpreted as the inverse temperature beta(t) in an annealing schedule. Gibbs
sampling at high temperature (i.e., with a very small subsample) can more
quickly explore sketches of the final latent state by (a) making longer jumps
around latent space (as in block Gibbs) and (b) lowering energy barriers (as in
simulated annealing). We prove that subsample annealing speeds up mixing
time from N^2 to N in a simple clustering model and from exp(N) to N in
another class of models, where N is the data size. Empirically, subsample
annealing outperforms naive Gibbs sampling in accuracy per unit of
wall-clock time, and it scales to larger datasets and
deeper hierarchical models. We demonstrate improved inference on million-row
subsamples of US Census data and network log data and a 307-row hospital rating
dataset, using a Pitman-Yor generalization of the Cross Categorization model.
Comment: To appear in AISTATS 201
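The schedule described above can be sketched in a few lines. This is a toy illustration only, not the paper's Cross Categorization model: it uses one-dimensional data, two fixed component means, and zero-temperature (argmax) assignments, but it shows the two key moving parts of subsample annealing, the growing subsample whose size tracks beta(t) and the constant churn of its membership.

```python
import random

def subsample_annealing(data, n_steps, seed=0):
    """Toy subsample-annealed Gibbs loop for 1-D two-cluster assignment.

    beta(t) = t / n_steps plays the inverse temperature: it sets the
    fraction of the data in the active subsample, which grows from a
    tiny seed set (high temperature) to the full dataset (beta = 1).
    Each step also churns the subsample by swapping one point in/out.
    """
    rng = random.Random(seed)
    n = len(data)
    order = list(range(n))
    rng.shuffle(order)
    active, pool = order[:2], order[2:]   # start with a tiny subsample
    labels = {}
    means = [-1.0, 1.0]                   # fixed component means (a sketch)

    def gibbs_sweep():
        for i in active:
            # assign each active point to its higher-likelihood component
            logp = [-(data[i] - m) ** 2 / 2 for m in means]
            labels[i] = max(range(2), key=lambda k: logp[k])

    for t in range(1, n_steps + 1):
        beta = t / n_steps                # annealing schedule beta(t)
        target = max(2, int(beta * n))    # subsample size ~ beta * N
        while len(active) < target and pool:
            active.append(pool.pop())
        if pool:                          # churn: swap one point in/out
            j = rng.randrange(len(active))
            pool.append(active[j])
            active[j] = pool.pop(0)
        gibbs_sweep()
    return labels
```

At small beta a single Gibbs sweep touches only a handful of points, so early steps cheaply explore coarse "sketches" of the final clustering; by the last step the subsample is the full dataset and every point carries a label.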
Survey- and fishery-derived estimates of Pacific cod (Gadus macrocephalus) biomass: implications for strategies to reduce interactions between groundfish fisheries and Steller sea lions (Eumetopias jubatus)
Survey- and fishery-derived biomass estimates have
indicated that the harvest indices for Pacific cod (Gadus macrocephalus) within a portion of Steller sea lion (Eumetopias jubatus) critical habitat in February and March 2001 were five to 16 times greater than the annual rate for the entire Bering Sea-Aleutian Islands stock. A bottom
trawl survey yielded a cod biomass estimate of 49,032 metric tons (t) for the entire area surveyed, of which
less than half (23,329 t) was located within the area used primarily by the commercial fishery, which caught 11,631 t of Pacific cod. Leslie depletion analyses of fishery data yielded biomass estimates of approximately 14,500 t (95% confidence intervals of approximately 9,000–25,000 t), which
are within the 95% confidence interval on the fished area survey estimate (12,846–33,812 t). These data indicate
that Leslie analyses may be useful in estimating local fish biomass and harvest indices for certain marine fisheries that are well constrained spatially and relatively short in duration (weeks). In addition, fishery effects on prey availability within the time and space scales relevant
to foraging sea lions may be much greater than the effects indicated by annual harvest rates estimated from stock assessments averaged across the range of the target species.
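The Leslie depletion analysis mentioned above regresses catch-per-unit-effort (CPUE) against cumulative prior catch: under the model CPUE_t = q(B0 - K_t), the fitted slope is -q and the intercept is q*B0, so initial biomass is recovered as B0 = intercept / q. A minimal sketch follows; the function name and the synthetic catch/CPUE numbers are illustrative (chosen near the abstract's ~14,500 t scale), not data from the study.

```python
import numpy as np

def leslie_biomass(catch, cpue):
    """Leslie depletion estimate of initial biomass B0.

    Model: CPUE_t = q * (B0 - K_t), where K_t is the cumulative catch
    taken before period t. Regressing CPUE on K_t gives slope -q and
    intercept q * B0, so B0 = intercept / q.
    """
    catch = np.asarray(catch, dtype=float)
    cpue = np.asarray(cpue, dtype=float)
    k = np.concatenate([[0.0], np.cumsum(catch)[:-1]])  # catch before t
    slope, intercept = np.polyfit(k, cpue, 1)
    q = -slope
    b0 = intercept / q
    return b0, q
```

Because the method assumes a closed, spatially well-constrained population fished down over a short window, it matches the weeks-long, localized fisheries the abstract describes.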
Relational database models and other software and their importance in data analysis, storage, and communication
The integration of computer technology into research is a continually evolving process. There are many areas of computer technology; the two discussed here are computer software and databases. Both offer multiple languages from which to choose when implementing these technologies. In the current project, the languages used for the software were the programming language Java and the scripting language PHP, and the database package was MySQL. The program written in Java was a graphical user interface (GUI) used to visualize files in the CAP3 format. The PHP script was used to create a website that serves as the interface for connecting to and working with the database. MySQL was used to create the database for the Porcine Reproductive and Respiratory Syndrome (PRRS) Host Genomic Consortium. This database was designed to hold data generated by the Big Pig Project as well as data that will be generated by a new project. Because the data from the Big Pig Project are housed in one database, it is relatively easy to create the files used for statistical analysis. Analyses were performed on viral levels, cytokine levels (interleukin [IL] 8, IL1b, and interferon gamma [IFNγ]), and persistence of the PRRS virus (PRRSv). Results from this analysis indicate that cytokines interact to affect the persistence of viral infection, or vice versa. Further analysis indicated that swine leukocyte antigen (SLA) genes were associated with the cytokine (IL8, IL1b, and IFNγ) response of swine to infection with the PRRS virus. These analyses and results are an example of using databases and computer software.
Computer technology and research will continue to evolve, and their integration will continue to grow, becoming a major component of research and allowing for new, inventive ways to study and analyze data.
Genome-Wide Characterization of the Effects of Nucleic Acid Modifying Enzymes: Cytidine Deaminases and DNA Methylation
Activation-induced cytidine deaminase (AID) is essential for two processes of immunoglobulin diversification in germinal center B cells: somatic hypermutation (SHM), in which mutations are introduced into immunoglobulin (Ig) genes, and class-switch recombination (CSR), in which genomic constant regions are recombined to encode antibodies of different isotypes. Both of these processes require AID-catalyzed C-to-U lesions at the Ig loci, which are resolved to generate point mutations or double-stranded DNA breaks in the cases of SHM and CSR, respectively. Despite over a decade of intense study, a number of open issues remain surrounding AID. The diversity of findings regarding AID’s role in DNA demethylation raises the question of the scope of its involvement in this process. Additionally, while it is clear that AID-mediated damage occurs, the effects of this damage on the average B cell have not been characterized. Finally, the issue of whether AID is able to edit RNA in vivo has never been rigorously addressed in the literature. In each of these cases, the advent of high-throughput sequencing provides methods for genome-wide characterization of AID’s effects. This thesis presents the application of a number of genome-scale, sequencing-based methods to characterize the effects of AID deficiency and overexpression on the activated B cell: mRNA-Seq and miRNA-Seq allow for measurements of RNA expression and editing, while reduced-representation bisulfite sequencing (RRBS) assays DNA methylation. These analyses confirmed AID’s known role in immunoglobulin isotype switching, while also demonstrating that it has little other effect on gene expression. Additionally, no evidence of AID-dependent mRNA or miRNA editing could be detected. Finally, RRBS data failed to support a role for AID in the regulation of DNA methylation. 
Thus, despite evidence of its additional activities in other systems, antibody diversification appears to be AID’s sole physiological function in activated B cells. Following the conclusion of my studies of AID’s effects in B cells, I applied similar genomics tools to two amenable topics in nucleic acid modifications. First, I used mRNA-Seq to attempt to determine the substrate of the orphan cytidine deaminase Apolipoprotein B mRNA-editing enzyme, catalytic polypeptide 2 (APOBEC2). Next, I used whole-genome bisulfite sequencing to explore the distribution of 5-methylcytosine in Trypanosoma brucei. In both of these cases, results were inconclusive but suggest future directions for investigation
Waddle - Always-canonical Intermediate Representation
Program transformations that can rely on canonical properties of the program undergoing optimization can be written to be more robust and efficient than equivalent, generalized transformations that must also handle non-canonical programs. If a canonical property is required but was broken by an earlier transformation, it must be rebuilt (often from scratch). This additional work can be a dominating factor in compilation time when many transformations are applied to large programs. This dissertation introduces a methodology for constructing program transformations so that the program remains in an always-canonical form as it is mutated, making only local changes to restore broken properties.
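The "local repair instead of global rebuild" idea can be illustrated with one classic canonical property: a control-flow graph with no critical edges (edges from a multi-successor block to a multi-predecessor block). In the sketch below, every mutation immediately re-checks only the edges it could have affected and splits any that turned critical, so the graph never leaves canonical form. This is a hedged illustration of the general approach, not Waddle's actual IR, data structures, or API.

```python
class CFG:
    """Tiny CFG that keeps the canonical property "no critical edges"
    (an edge from a multi-successor block to a multi-predecessor block)
    by repairing locally on every mutation, never rebuilding globally."""

    def __init__(self):
        self.succ, self.pred, self._n = {}, {}, 0

    def add_block(self, name=None):
        if name is None:
            name, self._n = f"split{self._n}", self._n + 1
        self.succ.setdefault(name, set())
        self.pred.setdefault(name, set())
        return name

    def add_edge(self, a, b):
        self.succ[a].add(b)
        self.pred[b].add(a)
        # local repair: only edges touching a or b can have turned critical
        candidates = [(a, b)]
        candidates += [(p, b) for p in list(self.pred[b])]
        candidates += [(a, s) for s in list(self.succ[a])]
        for x, y in candidates:
            if (y in self.succ[x]
                    and len(self.succ[x]) > 1 and len(self.pred[y]) > 1):
                self._split(x, y)

    def _split(self, x, y):
        # restore canonicity locally: x -> s -> y replaces x -> y
        s = self.add_block()
        self.succ[x].discard(y); self.pred[y].discard(x)
        self.succ[x].add(s); self.pred[s].add(x)
        self.succ[s].add(y); self.pred[y].add(s)

    def is_canonical(self):
        return all(not (len(self.succ[x]) > 1 and len(self.pred[y]) > 1)
                   for x in self.succ for y in self.succ[x])
```

The repair inspects only edges incident to the mutated blocks, which is the contrast the abstract draws: cost proportional to the change, not to the whole program.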
Using Data Visualization to Inform Machine Learning Approaches
Machine learning with big data is a complicated task to tackle. Data visualizations reveal trends, anomalies, and patterns that help in selecting an appropriate machine learning approach. Using 2D visualizations, we displayed flight data on interactive maps, visualizing density and property changes in an area. We also used frequency histograms to view the quantitative properties of each point and look for trends. Using scatterplots, we found anomalies in the data collection; other plots confirmed previously found trends and initial impressions of the data. These visualizations informed a machine learning approach to our problem and helped us avoid major pitfalls further down the road.
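The anomalies that stand out in a scatterplot or histogram can also be flagged numerically before training. The helper below is an illustrative sketch (not the project's actual pipeline) using Tukey's IQR fences, the same rule box plots visualize.

```python
import numpy as np

def flag_anomalies(values, k=1.5):
    """Flag points outside the Tukey fences (Q1 - k*IQR, Q3 + k*IQR).

    A histogram or scatterplot of `values` makes such outliers visible;
    this reproduces that visual check numerically so flagged points can
    be excluded (or examined) before model training.
    """
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return (v < lo) | (v > hi)
```

Pairing a quick plot with a rule like this keeps the visual and the automated screening consistent.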
The Internship: Bridge Between Marketplace and Liberal Arts Education in the Catholic Tradition
Internships can be distinctive pedagogical opportunities within a Catholic liberal arts education. The applied marketplace experience provided by an internship, properly understood, is consistent with the Catholic understanding of education. The value of internships for Catholic higher education can be illustrated by focusing on communication and rhetorical studies. This essay consists of a selected review of literature situating internships within liberal arts education, followed by the articulation of a Thomistic framework for rhetorical education
Crystalline optical cavity at 4 K with thermal noise limited instability and ultralow drift
Crystalline optical cavities are the foundation of today's state-of-the-art
ultrastable lasers. Building on our previous silicon cavity effort, we now
achieve the fundamental thermal-noise-limited stability for a 6 cm long silicon
cavity cooled to 4 Kelvin, sustained over averaging times from 0.8 to 80 seconds.
We also report for the first time a clear linear dependence of the cavity
frequency drift on the incident optical power. The lowest fractional frequency
drift of /s is attained at a transmitted power of 40 nW, with
an extrapolated drift approaching zero in the absence of optical power. These
demonstrations provide a promising direction to reach a new performance domain
for stable lasers, with stability better than and fractional
linear drift below /s
AnimalQTLdb: a livestock QTL database tool set for positional QTL information mining and beyond
The Animal Quantitative Trait Loci (QTL) database (AnimalQTLdb) is designed to house all publicly available QTL data on livestock species, from which researchers can easily locate and compare QTL within a species. Database tools have also been added to link the QTL data to other types of genomic information, such as radiation hybrid (RH) maps, fingerprinted contig (FPC) physical maps, linkage maps, and comparative maps to the human genome. Currently, the database contains data on 1287 pig, 630 cattle, and 657 chicken QTL, which are dynamically linked to the respective RH, FPC, and human comparative maps. We plan to apply the tool to other animal species and to add more structural genome information for alignment, to aid comparative structural genome studies.