The Tessera D&R computational environment: Designed experiments for R-Hadoop performance and Bitcoin analysis
D&R (divide and recombine) is a statistical framework that makes the analysis of large complex data feasible and practical. The analyst selects a division method to divide the data into subsets, applies an analytic method to each subset independently with no communication among subsets, and selects a recombination method that is applied to the outputs across subsets to form a result of the analytic method for the entire data. The computational tasking of D&R is nearly embarrassingly parallel, so D&R can readily exploit distributed, parallel computational environments, such as our D&R computational environment, Tessera. In the first part of this dissertation, I present a study of the performance of the Tessera D&R computational environment through designed experiments. The base of the D&R computational environment is RHIPE, the R and Hadoop Integrated Programming Environment. R is a widely used interactive language for data analysis. Hadoop is a distributed, parallel computational environment consisting of a distributed file system (HDFS) and a distributed compute engine (MapReduce). RHIPE is a merger of R and Hadoop. The D&R framework enables fast, embarrassingly parallel computation on a cluster for large complex data, which can lead to small elapsed times when the analytic methods are applied to all of the data. However, the elapsed time depends on many factors. The system we study is very complex, and so are the effects of the factors: there are interactions, but they are not well understood. We therefore run a full factorial experiment with replicates to understand them. In the second part of this dissertation, I present an analysis of the Bitcoin transaction data utilizing the Tessera D&R computational environment. Bitcoin is a decentralized digital currency system.
There is no central authority in the Bitcoin system to issue new money or validate the transfer of money; both of these tasks are accomplished through the joint work of participants in the Bitcoin network. In the past two years, the Bitcoin system has become very popular, mostly due to its ease of use and the anonymity embedded in the system. The ease of use of Bitcoin is straightforward. The anonymity of the Bitcoin system, on the other hand, is rather debatable and has drawn much attention in its user community as well as the research community. We admit that a certain level of anonymity exists in the Bitcoin system, but it might not be as invulnerable as one would hope. For one thing, the entire history of Bitcoin transactions is publicly available, which provides an opportunity for passive analysis of Bitcoin usage such as ours. I present here a study of the general statistical properties of the usage of Bitcoin transactions and of Bitcoin addresses. We have also built profiles for a few groups of popular addresses whose members share similar behavior. Furthermore, we provide a passive analysis of the anonymity of the Bitcoin system by proposing a classification model to identify payment and change in the majority of Bitcoin transactions.
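The divide, apply, and recombine steps described in the abstract above, together with a full factorial design with replicates, can be illustrated with a minimal Python sketch. It is not the Tessera or RHIPE implementation (those run on R and Hadoop); the round-robin division, the mean as the analytic method, and the factor names `block_size_mb` and `n_subsets` are all assumptions made for illustration.

```python
from itertools import product
from statistics import mean


def divide(records, n_subsets):
    """Division method (assumed): deal records into n_subsets groups round-robin."""
    return [records[i::n_subsets] for i in range(n_subsets)]


def analyze(subset):
    """Analytic method applied to one subset, with no access to other subsets."""
    return {"n": len(subset), "mean": mean(subset)}


def recombine(outputs):
    """Recombination method (assumed): size-weighted average of the subset means."""
    total = sum(o["n"] for o in outputs)
    return sum(o["n"] * o["mean"] for o in outputs) / total


def full_factorial(factors, replicates):
    """List every combination of factor levels, repeated `replicates` times."""
    names = list(factors)
    runs = []
    for levels in product(*(factors[name] for name in names)):
        for rep in range(replicates):
            runs.append(dict(zip(names, levels), replicate=rep))
    return runs


data = [float(x) for x in range(1, 101)]
subsets = divide(data, 4)
outputs = [analyze(s) for s in subsets]  # each call is independent, so it could run in parallel
dr_mean = recombine(outputs)

# Hypothetical factors; the abstract does not name the factors actually studied.
design = full_factorial({"block_size_mb": [64, 128], "n_subsets": [4, 8, 16]}, replicates=3)
```

Because `analyze` never looks outside its own subset, the middle step maps directly onto independent tasks on a cluster, which is the property the abstract calls nearly embarrassingly parallel.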
Infinitely many new solutions for singularly perturbed Schrödinger equations
This paper deals with the existence of solutions for the following perturbed
Schrödinger equation \begin{equation*} -\varepsilon^{2} \Delta u + V(x)u =
|u|^{p-2}u \quad \text{in } \mathbb{R}^{N}, \end{equation*} where $\varepsilon$
is a parameter, [...], and $V(x)$ is a
potential function in $\mathbb{R}^{N}$. We demonstrate an interesting ``dichotomy''
phenomenon for concentrating solutions of the above Schrödinger equation.
More specifically, we construct infinitely many new solutions with peaks
located both in the bounded domain and near infinity, which fulfills the
profile of the concentration compactness. Moreover, this approach can be
extended to solve other related problems.
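As a brief illustration of why solutions of the equation above are called concentrating, consider the standard rescaling $u(x) = v(x/\varepsilon)$ with $y = x/\varepsilon$. This step is a common textbook device and is an assumption here; it is not stated in the abstract.

```latex
\begin{equation*}
-\varepsilon^{2} \Delta_{x} u(x) = -\Delta_{y} v(y)
\quad\Longrightarrow\quad
-\Delta v + V(\varepsilon y)\, v = |v|^{p-2} v \quad \text{in } \mathbb{R}^{N}.
\end{equation*}
```

A profile $v$ of fixed width in the $y$ variable corresponds to a peak of width of order $\varepsilon$ in the original $x$ variable.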
AE-GPT: Using Large Language Models to Extract Adverse Events from Surveillance Reports-A Use Case with Influenza Vaccine Adverse Events
Though vaccines are instrumental in global health, mitigating infectious
diseases and pandemic outbreaks, they can occasionally lead to adverse events
(AEs). Recently, Large Language Models (LLMs) have shown promise in effectively
identifying and cataloging AEs within clinical reports. Utilizing data from the
Vaccine Adverse Event Reporting System (VAERS) from 1990 to 2016, this study
particularly focuses on AEs to evaluate LLMs' capability for AE extraction. A
variety of prevalent LLMs, including GPT-2, GPT-3 variants, GPT-4, and Llama 2,
were evaluated using the influenza vaccine as a use case. The fine-tuned GPT-3.5
model (AE-GPT) stood out with an averaged micro F1 score of 0.704 for strict match
and 0.816 for relaxed match. The encouraging performance of AE-GPT
underscores LLMs' potential in processing medical data, indicating a
significant stride towards advanced AE detection that is presumably generalizable
to other AE extraction tasks.
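The difference between strict and relaxed matching in a micro-averaged F1 score can be illustrated with a minimal Python sketch. The abstract does not define the two criteria, so the definitions below are assumptions: strict means identical span and label, relaxed means the same label with overlapping spans. The spans and the `"AE"` label are made-up illustration data, not VAERS content.

```python
def overlaps(a, b):
    """True if half-open character spans a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def matches(pred, gold, relaxed):
    """Entities are (start, end, label) tuples; labels must always agree."""
    if pred[2] != gold[2]:
        return False
    return overlaps(pred, gold) if relaxed else pred[:2] == gold[:2]


def micro_f1(docs, relaxed=False):
    """Pool true/false positives and false negatives over all documents, then compute F1."""
    tp = fp = fn = 0
    for gold, pred in docs:
        unmatched = list(gold)
        for p in pred:
            hit = next((g for g in unmatched if matches(p, g, relaxed)), None)
            if hit is None:
                fp += 1
            else:
                tp += 1
                unmatched.remove(hit)  # each gold entity can be matched only once
        fn += len(unmatched)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (gold, predicted) annotations for two reports.
docs = [
    ([(0, 5, "AE"), (10, 18, "AE")], [(0, 5, "AE"), (11, 18, "AE")]),
    ([(3, 9, "AE")], [(3, 9, "AE"), (20, 25, "AE")]),
]
strict_f1 = micro_f1(docs, relaxed=False)
relaxed_f1 = micro_f1(docs, relaxed=True)
```

In this toy data the prediction `(11, 18)` misses the gold span `(10, 18)` by one character, so it counts as an error under strict matching and as a hit under relaxed matching, which is why the relaxed score is the higher of the two.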
Few-shot Image Generation Using Discrete Content Representation
Few-shot image generation and few-shot image translation are two related
tasks, both of which aim to generate new images for an unseen category with
only a few images. In this work, we make the first attempt to adapt a few-shot
image translation method to the few-shot image generation task. Few-shot image
translation disentangles an image into a style vector and a content map. An unseen
style vector can be combined with different seen content maps to produce
different images. However, it needs to store seen images to provide content
maps, and the unseen style vector may be incompatible with seen content maps. To
adapt it to the few-shot image generation task, we learn a compact dictionary of
local content vectors by quantizing continuous content maps into discrete
content maps instead of storing seen images. Furthermore, we model the
autoregressive distribution of the discrete content map conditioned on the style
vector, which can alleviate the incompatibility between content map and style
vector. Qualitative and quantitative results on three real datasets demonstrate
that our model can produce images of higher diversity and fidelity for unseen
categories than previous methods. Comment: This paper is accepted by ACM MM 202
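The quantization step mentioned in the abstract above, which turns a continuous content map into a discrete one via a dictionary of local content vectors, can be illustrated with a minimal Python sketch. This is a plain nearest-neighbor lookup under stated assumptions, not the paper's model: the dictionary here is fixed and hand-written (the paper learns it), the vectors are two-dimensional toy values, and the autoregressive modeling conditioned on the style vector is not shown.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(content_map, dictionary):
    """Replace each local content vector with the index of its nearest dictionary entry."""
    return [
        [
            min(range(len(dictionary)), key=lambda k: squared_distance(vec, dictionary[k]))
            for vec in row
        ]
        for row in content_map
    ]


def dequantize(index_map, dictionary):
    """Look the indices back up to recover an approximate continuous content map."""
    return [[dictionary[k] for k in row] for row in index_map]


# Hypothetical dictionary with three entries and a 2x2 continuous content map.
dictionary = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
content_map = [
    [[0.1, -0.1], [0.9, 0.2]],
    [[0.2, 0.8], [0.6, 0.1]],
]
index_map = quantize(content_map, dictionary)
approx_map = dequantize(index_map, dictionary)
```

Storing only `index_map` and the shared dictionary is what removes the need to keep seen images around as a source of content maps.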