Exploring the synergistic potential of quantum annealing and gate model computing for portfolio optimization
Portfolio optimization is one of the most studied problems for demonstrating
the near-term applications of quantum computing. However, large-scale problems
cannot be solved on today's quantum hardware. In this work, we extend upon a
study to use the best of both quantum annealing and gate-based quantum
computing systems to enable solving large-scale optimization problems
efficiently on the available hardware. The existing work uses a method called
Large System Sampling Approximation (LSSA) that involves dividing the large
problem into several smaller problems and then combining the multiple solutions
to approximate the solution to the original problem. This paper introduces a
novel technique to modify the sampling step of LSSA. We divide the portfolio
optimization problem into sub-systems of smaller sizes by selecting a diverse
set of assets that act as representatives of the entire market and capture the
highest correlations among assets. We conduct tests on real-world stock data
from the Indian stock market on up to 64 assets. Our experimentation shows that
the hybrid approach performs at par with the traditional classical optimization
methods with a good approximation ratio. We also demonstrate the effectiveness
of our approach on a range of portfolio optimization problems of different
sizes. We present the effects of different parameters on the proposed method
and compare its performance with the earlier work. Our findings suggest that
hybrid annealer-gate quantum computing can be a valuable tool for portfolio
managers seeking to optimize their investment portfolios in the near future. Comment: 12 pages, 4 figures, 1 table
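The divide-and-combine idea behind LSSA can be sketched entirely classically. The snippet below is a minimal illustration, not the paper's method: it partitions assets into contiguous groups (whereas the paper selects a diverse set of representative assets capturing the highest correlations), brute-forces each small sub-QUBO in place of a quantum annealer or gate-model sampler, and stitches the sub-solutions into one candidate portfolio. The names `solve_subproblem` and `lssa_approximate` are invented for illustration.

```python
import itertools

def solve_subproblem(Q, idx):
    """Brute-force the QUBO restricted to the assets in idx.
    Stands in for the quantum annealer / gate-model sampler."""
    best, best_e = None, float("inf")
    for bits in itertools.product([0, 1], repeat=len(idx)):
        e = sum(Q[idx[a]][idx[b]] * bits[a] * bits[b]
                for a in range(len(idx)) for b in range(len(idx)))
        if e < best_e:
            best, best_e = bits, e
    return dict(zip(idx, best))

def lssa_approximate(Q, group_size=4):
    """Divide the full n-asset problem into sub-systems of at most
    group_size assets and combine the sub-solutions."""
    n = len(Q)
    solution = {}
    for start in range(0, n, group_size):
        idx = list(range(start, min(start + group_size, n)))
        solution.update(solve_subproblem(Q, idx))
    return [solution[i] for i in range(n)]
```

The key trade-off the abstract describes is visible here: each sub-problem fits on small hardware, and the quality of the combined solution depends on how well the grouping captures cross-group correlations.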
Campaign Ad - Betty Sutton
Campaign ad for Political Science class - PSCI 217 - Media and Politics
PlantDoc: A Dataset for Visual Plant Disease Detection
India loses 35% of the annual crop yield due to plant diseases. Early
detection of plant diseases remains difficult due to the lack of lab
infrastructure and expertise. In this paper, we explore the possibility of
computer vision approaches for scalable and early plant disease detection. The
lack of availability of sufficiently large-scale non-lab data set remains a
major challenge for enabling vision based plant disease detection. Against this
background, we present PlantDoc: a dataset for visual plant disease detection.
Our dataset contains 2,598 data points in total across 13 plant species and up
to 17 classes of diseases, involving approximately 300 human hours of effort in
annotating internet scraped images. To show the efficacy of our dataset, we
learn 3 models for the task of plant disease classification. Our results show
that modelling using our dataset can increase the classification accuracy by up
to 31%. We believe that our dataset can help reduce the entry barrier of
computer vision techniques in plant disease detection. Comment: 5 pages, 6 figures, 3 tables
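Training a classifier on a dataset like this typically starts from a folder-per-class layout. The sketch below assumes such a layout for illustration (the abstract does not specify PlantDoc's on-disk format) and builds labelled (path, label) pairs with a deterministic train/validation split, using only the standard library.

```python
import os
import random

def index_dataset(root, seed=0, val_frac=0.2):
    """Index a folder-per-class image dataset: each subdirectory of
    root is one class. Returns (train, val, label_map)."""
    classes = sorted(d for d in os.listdir(root)
                     if os.path.isdir(os.path.join(root, d)))
    label = {c: i for i, c in enumerate(classes)}
    samples = [(os.path.join(root, c, f), label[c])
               for c in classes
               for f in sorted(os.listdir(os.path.join(root, c)))]
    rng = random.Random(seed)      # fixed seed -> reproducible split
    rng.shuffle(samples)
    n_val = int(len(samples) * val_frac)
    return samples[n_val:], samples[:n_val], label
```

The (path, label) pairs can then be fed to any image-classification training loop.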
Sampling Semantic Data Stream: Resolving Overload and Limited Storage Issues
Semantic Web technologies are increasingly used to exploit relations between data. At the same time, real-time systems such as social networks, sensors, cameras, and weather services continuously generate data, so both the data and the links between them are becoming extremely vast. Such huge quantities of data need to be analyzed and processed, as well as stored if necessary. In this paper, we propose sampling operators that drop RDF triples from the incoming data, thereby reducing the load on existing engines such as CQELS and C-SPARQL, which are able to deal with big and linked data. As a result, processing effort, time, and required storage space are reduced remarkably. We propose Uniform Random Sampling, Reservoir Sampling, and Chain Sampling operators, which may be implemented depending on the application
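Of the three operators named, Reservoir Sampling is the most standard: it maintains a uniform random sample of fixed size k over a stream of unknown length in O(k) memory, which is exactly the overload-and-limited-storage setting the abstract describes. A minimal version over a stream of triples (the triple representation here is an assumption; the paper targets RDF triples):

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of
    unknown length, replacing stored items with probability k/(i+1)."""
    rng = random.Random(seed)
    reservoir = []
    for i, triple in enumerate(stream):
        if i < k:
            reservoir.append(triple)       # fill phase
        else:
            j = rng.randint(0, i)          # uniform in [0, i]
            if j < k:                      # happens with prob. k/(i+1)
                reservoir[j] = triple
    return reservoir
```

A Uniform Random Sampling operator would instead keep each incoming triple independently with a fixed probability, trading a fixed sample size for constant per-item work.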
Revisiting Prompt Engineering via Declarative Crowdsourcing
Large language models (LLMs) are incredibly powerful at comprehending and
generating data in the form of text, but are brittle and error-prone. There has
been an advent of toolkits and recipes centered around so-called prompt
engineering-the process of asking an LLM to do something via a series of
prompts. However, for LLM-powered data processing workflows, in particular,
optimizing for quality, while keeping cost bounded, is a tedious, manual
process. We put forth a vision for declarative prompt engineering. We view LLMs
like crowd workers and leverage ideas from the declarative crowdsourcing
literature-including leveraging multiple prompting strategies, ensuring
internal consistency, and exploring hybrid-LLM-non-LLM approaches-to make
prompt engineering a more principled process. Preliminary case studies on
sorting, entity resolution, and imputation demonstrate the promise of our
approach.
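The crowdsourcing analogy suggests a concrete aggregation primitive: treat each prompting strategy as a separate worker and take a majority vote for internal consistency. The sketch below is an illustration of that idea, not the paper's system; `llm` is a hypothetical callable standing in for an LLM query.

```python
from collections import Counter

def majority_vote(answers):
    """Aggregate answers from several prompting strategies (analogous
    to multiple crowd workers) into one label plus an agreement score."""
    counts = Counter(answers)
    winner, n = counts.most_common(1)[0]
    return winner, n / len(answers)

def ask_with_ensemble(llm, item, prompts):
    """llm: hypothetical callable (prompt, item) -> answer string."""
    answers = [llm(p, item) for p in prompts]
    return majority_vote(answers)
```

Low agreement scores flag items worth escalating to a more expensive strategy (or a non-LLM method), which is how quality can be optimized while keeping cost bounded.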
LLM-Assisted Code Cleaning For Training Accurate Code Generators
Natural language to code generation is an important application area of LLMs
and has received wide attention from the community. The majority of relevant
studies have exclusively concentrated on increasing the quantity and functional
correctness of training sets while disregarding other stylistic elements of
programs. More recently, data quality has garnered a lot of interest and
multiple works have showcased its importance for improving performance. In this
work, we investigate data quality for code and find that making the code more
structured and readable leads to improved code generation performance of the
system. We build a novel data-cleaning pipeline that uses these principles to
transform existing programs by 1.) renaming variables, 2.) modularizing and
decomposing complex code into smaller helper sub-functions, and 3.) inserting
natural-language based plans via LLM based transformations. We evaluate our
approach on two challenging algorithmic code generation benchmarks and find
that fine-tuning CodeLLaMa-7B on our transformed modularized programs improves
the performance by up to 30% compared to fine-tuning on the original dataset.
Additionally, we demonstrate improved performance from using a smaller amount
of higher-quality data, finding that a model fine-tuned on the entire original
dataset is outperformed by a model trained on 15% of our cleaned dataset. Even
in comparison to closed-source models, our models outperform the much larger
AlphaCode models.
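Step 1 of the pipeline, renaming variables, can be done mechanically once a mapping from cryptic to descriptive names is available. In the paper that mapping comes from an LLM; the sketch below supplies it explicitly and uses Python's `ast` module (an assumption; the paper's target language and tooling are not stated in the abstract) to apply it safely, touching only identifier nodes rather than raw text.

```python
import ast

class RenameVars(ast.NodeTransformer):
    """Rewrite identifier nodes according to a rename mapping."""
    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

def rename_variables(source, mapping):
    """Parse, rename, and unparse a program (requires Python 3.9+)."""
    tree = ast.parse(source)
    tree = RenameVars(mapping).visit(tree)
    return ast.unparse(tree)
```

Operating on the AST rather than on strings avoids accidentally rewriting substrings inside literals or unrelated names, which matters when the transformation is applied at dataset scale.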