LearnedSort as a learning-augmented SampleSort: Analysis and Parallelization
This work analyzes and parallelizes LearnedSort, the novel algorithm that
sorts using machine learning models based on the cumulative distribution
function. LearnedSort is analyzed under the lens of algorithms with
predictions, and it is argued that LearnedSort is a learning-augmented
SampleSort. A parallel LearnedSort algorithm is developed combining LearnedSort
with the state-of-the-art SampleSort implementation, IPS4o. Benchmarks on
synthetic and real-world datasets demonstrate improved parallel performance for
parallel LearnedSort compared to IPS4o and other sorting algorithms.
Comment: Published in SSDBM 202
A Case Study on Record Matching of Individuals in Historical Archives of Indigenous Databases
Digitization of historical records has produced a significant amount of data
for analysis and interpretation. A critical challenge is the ability to relate
historical information across different archives to allow for the data to be
framed in the appropriate historical context. This paper presents a real-world
case study on historical information integration and record matching with the
goal to improve the historical value of archives containing data in the period
1800 to 1920. The archives contain unique information about Métis and
Indigenous people in Canada and interactions with European settlers. The
archives contain thousands of records that have increased relevance when
relationships and interconnections are discovered. The contribution is a record
linking approach suitable for historical archives and an evaluation of its
effectiveness. Experimental results demonstrate potential for discovering
historical linkage with high precision enabling new historical discoveries.
Comment: Published in 20th International Conference on Information & Knowledge
Engineering (IKE'21
Using Assignment Incentives to Reduce Student Procrastination and Encourage Code Review Interactions
Procrastination causes student stress, reduced learning and performance, and
results in very busy help sessions immediately before deadlines. A key
challenge is encouraging students to complete assignments earlier rather than
waiting until right before the deadline, so that the focus shifts to the learning
objectives rather than just meeting deadlines. This work presents an incentive
system encouraging students to complete assignments many days before deadlines.
Completed assignments are code reviewed by staff to check correctness and provide
feedback, which results in more student-instructor interactions and may help
reduce student use of generative AI. The incentives result in a change in
student behavior with 45% of assignments completed early and 30% up to 4 days
before the deadline. Students receive real-time feedback with no increase in
marking time.
Comment: 6 pages, to be published in 2023 International Conference on
Computational Science and Computational Intelligence Research Track on
Education (CSCI-RTED) IEEE CP
Detecting Argumentative Fallacies in the Wild: Problems and Limitations of Large Language Models
Previous work on the automatic identification of fallacies in natural language text has typically approached the problem in constrained experimental setups that make it difficult to understand the applicability and usefulness of the proposals in the real world. In this paper, we present the first analysis of the limitations that these data-driven approaches could show in real situations. For that purpose, we first create a validation corpus consisting of natural language argumentation schemes. Second, we provide new empirical results to the emerging task of identifying fallacies in natural language text. Third, we analyse the errors observed outside of the testing data domains considering the new validation corpus. Finally, we point out some important limitations observed in our analysis that should be taken into account in future research in this topic. Specifically, if we want to deploy these systems in the Wild
ChatEd: A Chatbot Leveraging ChatGPT for an Enhanced Learning Experience in Higher Education
With the rapid evolution of Natural Language Processing (NLP), Large Language
Models (LLMs) like ChatGPT have emerged as powerful tools capable of
transforming various sectors. Their vast knowledge base and dynamic interaction
capabilities represent significant potential in improving education by
operating as a personalized assistant. However, the possibility of generating
incorrect, biased, or unhelpful answers is a key challenge to resolve when
deploying LLMs in an education context. This work introduces an innovative
architecture that combines the strengths of ChatGPT with a traditional
information retrieval based chatbot framework to offer enhanced student support
in higher education. Our empirical evaluations underscore the high promise of
this approach.
Comment: To appear at INTED2024 - 18th annual International Technology,
Education and Development Conferenc
An Efficient B-tree Implementation for Memory-Constrained Embedded Systems
Embedded devices collect and process significant amounts of data in a variety
of applications including environmental monitoring, industrial automation and
control, and other Internet of Things (IoT) applications. Storing data
efficiently is critically important, especially when the device must perform
local processing on the data. The most widely used data structure for high
performance query and insert is the B-tree. However, existing implementations
consume too much memory for small embedded devices and often rely on operating
system support. This work presents an extremely memory efficient implementation
of B-trees for embedded devices that functions on the smallest devices and does
not require an operating system. Experimental results demonstrate that the
B-tree implementation can run on devices with as little as 4 KB of RAM while
efficiently processing thousands of records.
Comment: Published in the 19th International Conference on Embedded Systems,
Cyber-physical Systems, and Applications (ESCS'21). Code is available at
https://github.com/ubco-d
Student Mastery or AI Deception? Analyzing ChatGPT's Assessment Proficiency and Evaluating Detection Strategies
Generative AI systems such as ChatGPT have a disruptive effect on learning
and assessment. Computer science requires practice to develop skills in problem
solving and programming that are traditionally developed using assignments.
Generative AI has the capability of completing these assignments for students
with high accuracy, which dramatically increases the potential for academic
integrity issues and students not achieving desired learning outcomes. This
work investigates the performance of ChatGPT by evaluating it across three
courses (CS1, CS2, databases). ChatGPT completes almost all introductory
assessments perfectly. Existing detection methods, such as MOSS and JPlag
(based on similarity metrics) and GPTzero (AI detection), have mixed success in
identifying AI solutions. Evaluating instructors and teaching assistants using
heuristics to distinguish between student and AI code shows that their
detection is not sufficiently accurate. These observations emphasize the need
for adapting assessments and improved detection methods.
Comment: 7 pages, Published in 2023 International Conference on Computational
Science and Computational Intelligence Research Track on Education, IEEE CP