289 research outputs found

    The Tessera D&R computational environment: Designed experiments for R-Hadoop performance and Bitcoin analysis

    D&R (divide and recombine) is a statistical framework that makes the analysis of large complex data feasible and practical. The analyst selects a division method to divide the data into subsets, applies an analytic method to each subset independently with no communication among subsets, and selects a recombination method that is applied to the outputs across subsets to form a result of the analytic method for the entire data. The computational tasking of D&R is nearly embarrassingly parallel, so D&R can readily exploit distributed, parallel computational environments, such as our D&R computational environment, Tessera. In the first part of this dissertation, I present a study of the performance of the Tessera D&R computational environment through designed experiments. The base of the D&R computational environment is RHIPE, the R and Hadoop Integrated Programming Environment. R is a widely used interactive language for data analysis. Hadoop is a distributed, parallel computational environment consisting of a distributed file system (HDFS) and a distributed compute engine (MapReduce). RHIPE is a merger of R and Hadoop. The D&R framework enables fast, embarrassingly parallel computation on a cluster for large complex data, which can lead to small elapsed times when applying analytic methods to all of the data. However, the elapsed time depends on many factors. The system we study is very complex, and so are the effects of the factors: there are interactions, but they are not well understood. We therefore run a full factorial experiment with replicates to understand them. In the second part of this dissertation, I present an analysis of the Bitcoin transaction data utilizing the Tessera D&R computational environment. Bitcoin is a decentralized digital currency system. There is no central authority in the Bitcoin system to issue new money or validate the transfer of money; both of these tasks are accomplished through the joint work of participants in the Bitcoin network. In the past two years, the Bitcoin system has become very popular, mostly due to its ease of use and the anonymity embedded in the system. The ease of use of Bitcoin is straightforward. The anonymity of the Bitcoin system, on the other hand, is rather debatable and has drawn much attention in its user community as well as the research community. We admit that a certain level of anonymity exists in the Bitcoin system, but it might not be as invulnerable as one would hope. For one thing, the entire history of Bitcoin transactions is publicly available, which provides an opportunity for passive analysis of Bitcoin usage such as ours. I present here a study of the general statistical properties of the usage of Bitcoin transactions and the usage of Bitcoin addresses. We have also built profiles for a few groups of popular addresses whose members share similar behavior. Furthermore, we provide a passive analysis of the anonymity of the Bitcoin system by proposing a classification model to identify payment and change in the majority of the Bitcoin transactions.
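    The divide, apply, and recombine steps described in the abstract above can be illustrated with a minimal Python sketch. It is not the Tessera or RHIPE API: the function names, the round-robin division, the per-subset mean, and the averaging recombination are all assumptions chosen only to show the shape of the computation.

```python
from statistics import mean


def divide(records, n_subsets):
    """Division method (assumed): deal records round-robin into n_subsets."""
    return [records[i::n_subsets] for i in range(n_subsets)]


def analyze(subset):
    """Analytic method (assumed): the mean of one subset."""
    return mean(subset)


def recombine(outputs):
    """Recombination method (assumed): average the per-subset outputs."""
    return mean(outputs)


def divide_and_recombine(records, n_subsets):
    subsets = divide(records, n_subsets)
    # Each subset is analyzed independently, with no communication among
    # subsets, so this loop could be distributed across a cluster.
    outputs = [analyze(s) for s in subsets]
    return recombine(outputs)


if __name__ == "__main__":
    print(divide_and_recombine(list(range(1, 13)), 3))
```

    With equal-sized subsets, averaging the subset means reproduces the mean of all the data; for other analytic methods the recombined result is generally an approximation of the all-data result.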

    Infinitely many new solutions for singularly perturbed Schrödinger equations

    This paper deals with the existence of solutions for the following perturbed Schrödinger equation \begin{equation*} -\varepsilon^{2} \Delta u + V(x)u = |u|^{p-2}u \, \, \text{ in } \, \, \mathbb{R}^{N}, \end{equation*} where $\varepsilon$ is a parameter, $N \geq 3$, $p \in (2, \frac{2N}{N-2})$, and $V(x)$ is a potential function in $\mathbb{R}^{N}$. We demonstrate an interesting "dichotomy" phenomenon for concentrating solutions of the above Schrödinger equation. More specifically, we construct infinitely many new solutions with peaks located both in the bounded domain and near infinity, which fulfills the profile of the concentration compactness. Moreover, this approach can be extended to solve other related problems.

    AE-GPT: Using Large Language Models to Extract Adverse Events from Surveillance Reports-A Use Case with Influenza Vaccine Adverse Events

    Though vaccines are instrumental in global health, mitigating infectious diseases and pandemic outbreaks, they can occasionally lead to adverse events (AEs). Recently, large language models (LLMs) have shown promise in effectively identifying and cataloging AEs within clinical reports. Utilizing data from the Vaccine Adverse Event Reporting System (VAERS) from 1990 to 2016, this study focuses on AEs to evaluate LLMs' capability for AE extraction. A variety of prevalent LLMs, including GPT-2, GPT-3 variants, GPT-4, and Llama 2, were evaluated using the influenza vaccine as a use case. The fine-tuned GPT-3.5 model (AE-GPT) stood out with an averaged micro F1 score of 0.704 for strict match and 0.816 for relaxed match. The encouraging performance of AE-GPT underscores LLMs' potential in processing medical data and indicates a significant stride towards advanced AE detection that is presumably generalizable to other AE extraction tasks.
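    The abstract above reports micro F1 under strict and relaxed matching without defining either. The sketch below shows one common reading and is an assumption, not the paper's evaluation code: an entity is a hypothetical `(start, end, label)` tuple, a strict match needs identical boundaries and label, and a relaxed match needs the same label and any overlap. Micro averaging pools the counts over all reports before computing F1.

```python
def overlaps(a, b):
    """True when two half-open spans (start, end, ...) share a character."""
    return a[0] < b[1] and b[0] < a[1]


def micro_f1(gold_docs, pred_docs, relaxed=False):
    """Micro-averaged F1 over documents, with counts pooled across them."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        unmatched = list(gold)  # gold entities not yet claimed by a prediction
        for p in pred:
            hit = next(
                (g for g in unmatched
                 if g[2] == p[2]
                 and (overlaps(g, p) if relaxed else g[:2] == p[:2])),
                None,
            )
            if hit is None:
                fp += 1
            else:
                tp += 1
                unmatched.remove(hit)  # each gold entity is matched once
        fn += len(unmatched)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [[(0, 5, "AE"), (10, 15, "AE")]]
    pred = [[(0, 5, "AE"), (11, 15, "AE")]]
    print(micro_f1(gold, pred), micro_f1(gold, pred, relaxed=True))
```

    Under this reading the relaxed score can never be lower than the strict one, which is consistent with the ordering of the two reported figures.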

    Few-shot Image Generation Using Discrete Content Representation

    Few-shot image generation and few-shot image translation are two related tasks, both of which aim to generate new images for an unseen category from only a few images. In this work, we make the first attempt to adapt a few-shot image translation method to the few-shot image generation task. Few-shot image translation disentangles an image into a style vector and a content map. An unseen style vector can be combined with different seen content maps to produce different images. However, this approach needs to store seen images to provide content maps, and the unseen style vector may be incompatible with seen content maps. To adapt it to the few-shot image generation task, we learn a compact dictionary of local content vectors by quantizing continuous content maps into discrete content maps instead of storing seen images. Furthermore, we model the autoregressive distribution of the discrete content map conditioned on the style vector, which can alleviate the incompatibility between content map and style vector. Qualitative and quantitative results on three real datasets demonstrate that our model can produce images of higher diversity and fidelity for unseen categories than previous methods. Comment: This paper is accepted by ACM MM 202
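    The quantization step mentioned in the abstract above, turning a continuous content map into a discrete one through a dictionary of local content vectors, can be sketched as a nearest-neighbour lookup. This is a generic vector-quantization illustration under stated assumptions, not the paper's model: the function name, the squared Euclidean distance, and the fixed hand-written dictionary are hypothetical, whereas the paper learns its dictionary.

```python
def quantize(content_map, codebook):
    """Map each local content vector to the index of its nearest codebook entry.

    content_map: 2-D grid (list of rows) of equal-length vectors.
    codebook: list of vectors (the dictionary), assumed fixed here.
    """
    def sqdist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        [min(range(len(codebook)), key=lambda k: sqdist(vec, codebook[k]))
         for vec in row]
        for row in content_map
    ]


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (1.0, 1.0)]
    content_map = [
        [(0.1, 0.2), (0.9, 0.8)],
        [(0.6, 0.7), (0.2, 0.1)],
    ]
    print(quantize(content_map, codebook))
```

    The resulting grid of integer indices is the kind of discrete content map over which an autoregressive distribution could then be modelled; storing indices plus a small dictionary replaces storing the seen images themselves.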