LearnedSort as a learning-augmented SampleSort: Analysis and Parallelization
This work analyzes and parallelizes LearnedSort, the novel algorithm that
sorts using machine learning models based on the cumulative distribution
function. LearnedSort is analyzed under the lens of algorithms with
predictions, and it is argued that LearnedSort is a learning-augmented
SampleSort. A parallel LearnedSort algorithm is developed combining LearnedSort
with the state-of-the-art SampleSort implementation, IPS4o. Benchmarks on
synthetic and real-world datasets show improved performance for parallel
LearnedSort compared to IPS4o and other sorting algorithms.
Comment: Published in SSDBM 202
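The "learning-augmented SampleSort" view can be illustrated with a minimal sequential sketch. It assumes a piecewise-linear empirical CDF built from a random sample as a stand-in for the learned model; the function names and parameters (`fit_cdf_model`, `learned_sample_sort`, `num_buckets`, `sample_size`) are hypothetical, and this is not the LearnedSort or IPS4o code. The model's predicted rank fraction picks each key's bucket, which is the role splitter search plays in SampleSort.

```python
import bisect
import random


def fit_cdf_model(sample):
    """Piecewise-linear approximation of the empirical CDF of `sample`.

    Stand-in for the learned model (assumption): the sorted sample points
    are the knots, and ranks are linearly interpolated between them.
    """
    knots = sorted(sample)
    n = len(knots)

    def cdf(x):
        i = bisect.bisect_right(knots, x)  # number of knots <= x
        if i == 0:
            return 0.0
        if i == n:
            return 1.0
        lo, hi = knots[i - 1], knots[i]  # lo <= x < hi, so hi > lo
        return (i - 1 + (x - lo) / (hi - lo)) / (n - 1)

    return cdf


def learned_sample_sort(data, num_buckets=16, sample_size=256, seed=0):
    """Sequential sketch: model-predicted ranks replace splitter search."""
    data = list(data)
    if len(data) <= 1:
        return data
    rng = random.Random(seed)
    cdf = fit_cdf_model(rng.sample(data, min(sample_size, len(data))))
    buckets = [[] for _ in range(num_buckets)]
    for x in data:
        # The CDF is monotone, so bucket indices never decrease as keys grow;
        # this is what makes concatenating the sorted buckets correct.
        b = min(int(cdf(x) * num_buckets), num_buckets - 1)
        buckets[b].append(x)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))  # base-case sort per bucket
    return result


rng = random.Random(1)
values = [rng.gauss(0.0, 1.0) for _ in range(2000)]
assert learned_sample_sort(values) == sorted(values)
```

Because the buckets are independent once keys are distributed, each one could in principle be sorted by a separate thread; the sketch keeps everything sequential for clarity.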
A Case Study on Record Matching of Individuals in Historical Archives of Indigenous Databases
Digitization of historical records has produced a significant amount of data
for analysis and interpretation. A critical challenge is the ability to relate
historical information across different archives to allow for the data to be
framed in the appropriate historical context. This paper presents a real-world
case study on historical information integration and record matching with the
goal of improving the historical value of archives containing data from the period
1800 to 1920. The archives contain unique information about Métis and
Indigenous people in Canada and interactions with European settlers. The
archives contain thousands of records that have increased relevance when
relationships and interconnections are discovered. The contribution is a record
linking approach suitable for historical archives and an evaluation of its
effectiveness. Experimental results demonstrate potential for discovering
historical linkage with high precision, enabling new historical discoveries.
Comment: Published in 20th International Conference on Information & Knowledge
Engineering (IKE'21)
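A generic record-linking step can be sketched as follows. This is an illustration of common record-linkage ideas, not the paper's approach: it assumes records are dictionaries with hypothetical `name`, `birth_year`, and `place` fields, uses the standard library's `difflib.SequenceMatcher` ratio for name similarity after stripping accents, blocks on a birth-year tolerance, and combines scores with arbitrary weights (0.7/0.3) and an arbitrary 0.8 threshold.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and strip accents (e.g. 'Lépine' -> 'lepine')."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c)).strip()


def match_score(r1, r2, year_tolerance=2):
    """Weighted similarity in [0, 1]; weights are illustrative assumptions."""
    # Blocking: transcribed birth years may be slightly off, but records
    # further apart than the tolerance are not compared at all.
    if abs(r1["birth_year"] - r2["birth_year"]) > year_tolerance:
        return 0.0
    name_sim = SequenceMatcher(None, normalize(r1["name"]), normalize(r2["name"])).ratio()
    same_place = 1.0 if normalize(r1["place"]) == normalize(r2["place"]) else 0.0
    return 0.7 * name_sim + 0.3 * same_place


def link_records(archive_a, archive_b, threshold=0.8):
    """Return (index_a, index_b, score) for every pair at or above threshold."""
    links = []
    for i, r1 in enumerate(archive_a):
        for j, r2 in enumerate(archive_b):
            score = match_score(r1, r2)
            if score >= threshold:
                links.append((i, j, score))
    return links


# Hypothetical records, not taken from the archives in the paper.
archive_a = [{"name": "Jean-Baptiste Lépine", "birth_year": 1842, "place": "Red River"}]
archive_b = [
    {"name": "Jean Baptiste Lepine", "birth_year": 1843, "place": "Red River"},
    {"name": "Marie Lépine", "birth_year": 1870, "place": "Batoche"},
]
links = link_records(archive_a, archive_b)
```

Here the first record in `archive_b` links despite the hyphen, accent, and one-year difference, while the second is excluded by the birth-year block before any string comparison happens.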
Using Assignment Incentives to Reduce Student Procrastination and Encourage Code Review Interactions
Procrastination causes student stress, reduced learning and performance, and
results in very busy help sessions immediately before deadlines. A key
challenge is encouraging students to complete assignments earlier rather than
waiting until right before the deadline, so that the focus shifts to the learning
objectives rather than just meeting deadlines. This work presents an incentive
system encouraging students to complete assignments many days before deadlines.
Completed assignments are code reviewed by staff to check correctness and provide
feedback, which results in more student-instructor interactions and may help
reduce student use of generative AI. The incentives result in a change in
student behavior with 45% of assignments completed early and 30% up to 4 days
before the deadline. Students receive real-time feedback with no increase in
marking time.
Comment: 6 pages, To be published in 2023 International Conference on
Computational Science and Computational Intelligence Research Track on
Education (CSCI-RTED) IEEE CP
Detecting Argumentative Fallacies in the Wild: Problems and Limitations of Large Language Models
Previous work on the automatic identification of fallacies in natural language text has typically approached the problem in constrained experimental setups that make it difficult to understand the applicability and usefulness of the proposals in the real world. In this paper, we present the first analysis of the limitations that these data-driven approaches could show in real-world situations. For that purpose, we first create a validation corpus consisting of natural language argumentation schemes. Second, we provide new empirical results for the emerging task of identifying fallacies in natural language text. Third, we analyse the errors observed outside the domains of the test data using the new validation corpus. Finally, we point out some important limitations observed in our analysis that should be taken into account in future research on this topic. Specifically, if we want to deploy these systems in the Wild
An Efficient B-tree Implementation for Memory-Constrained Embedded Systems
Embedded devices collect and process significant amounts of data in a variety
of applications including environmental monitoring, industrial automation and
control, and other Internet of Things (IoT) applications. Storing data
efficiently is critically important, especially when the device must perform
local processing on the data. The most widely used data structure for high
performance query and insert is the B-tree. However, existing implementations
consume too much memory for small embedded devices and often rely on operating
system support. This work presents an extremely memory efficient implementation
of B-trees for embedded devices that functions on the smallest devices and does
not require an operating system. Experimental results demonstrate that the
B-tree implementation can run on devices with as little as 4 KB of RAM while
efficiently processing thousands of records.
Comment: Published in the 19th International Conference on Embedded Systems,
Cyber-physical Systems, and Applications (ESCS'21). Code is available at
https://github.com/ubco-d
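To make the memory constraint concrete, here is a minimal sketch of a single fixed-size B-tree page. It assumes a hypothetical layout that is not necessarily the one in the published code: 512-byte pages holding a 4-byte record count followed by sorted (int32 key, uint32 value) pairs, so one page holds (512 - 4) // 8 = 63 records. A lookup is a binary search performed directly on the page buffer, so only one page-sized buffer needs to be in RAM at a time; splitting full nodes and interior pages are omitted.

```python
import struct

PAGE_SIZE = 512                  # assumed page size in bytes
HEADER = struct.Struct("<I")     # record count (little-endian uint32)
RECORD = struct.Struct("<iI")    # (int32 key, uint32 value), no padding
MAX_RECORDS = (PAGE_SIZE - HEADER.size) // RECORD.size


def build_leaf_page(pairs):
    """Pack (key, value) pairs, sorted by key, into one fixed-size page buffer."""
    pairs = sorted(pairs)
    if len(pairs) > MAX_RECORDS:
        raise ValueError("page overflow: a B-tree would split this node")
    page = bytearray(PAGE_SIZE)
    HEADER.pack_into(page, 0, len(pairs))
    for slot, (key, value) in enumerate(pairs):
        RECORD.pack_into(page, HEADER.size + slot * RECORD.size, key, value)
    return page


def search_leaf_page(page, key):
    """Binary search the page in place; return the value or None."""
    (count,) = HEADER.unpack_from(page, 0)
    lo, hi = 0, count - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        k, v = RECORD.unpack_from(page, HEADER.size + mid * RECORD.size)
        if k == key:
            return v
        if k < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None


page = build_leaf_page([(k, k * 10) for k in range(0, 126, 2)])  # 63 records
print(search_leaf_page(page, 42))  # 420
```

Fixed-size records keep every slot addressable by arithmetic, which avoids per-record pointers and dynamic allocation, at the cost of wasting space when keys or values are small.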
Student Mastery or AI Deception? Analyzing ChatGPT's Assessment Proficiency and Evaluating Detection Strategies
Generative AI systems such as ChatGPT have a disruptive effect on learning
and assessment. Computer science requires practice to develop skills in problem
solving and programming that are traditionally developed using assignments.
Generative AI has the capability of completing these assignments for students
with high accuracy, which dramatically increases the potential for academic
integrity issues and students not achieving desired learning outcomes. This
work investigates the performance of ChatGPT by evaluating it across three
courses (CS1, CS2, databases). ChatGPT completes almost all introductory
assessments perfectly. Existing detection methods, such as MOSS and JPlag
(based on similarity metrics) and GPTZero (AI detection), have mixed success in
identifying AI solutions. An evaluation of instructors and teaching assistants using
heuristics to distinguish between student and AI code shows that their
detection is not sufficiently accurate. These observations emphasize the need
for adapting assessments and improving detection methods.
Comment: 7 pages, Published in 2023 International Conference on Computational
Science and Computational Intelligence Research Track on Education, IEEE CP
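Similarity-based detectors compare submissions by fingerprints of their token streams. The sketch below illustrates winnowing, the fingerprinting technique MOSS is based on, in simplified form; it is not how MOSS or JPlag are implemented. Assumptions: a crude regex tokenizer for Python source that replaces non-keyword identifiers with `ID`, 5-token k-grams hashed with `zlib.crc32`, a window of 4, and Jaccard similarity over the selected hashes.

```python
import keyword
import re
import zlib

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    """Crude tokenizer; non-keyword identifiers become 'ID' so renaming
    variables does not change the token stream (assumes Python source)."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        is_name = tok[0].isalpha() or tok[0] == "_"
        tokens.append("ID" if is_name and not keyword.iskeyword(tok) else tok)
    return tokens


def fingerprints(code, k=5, window=4):
    """Winnowing: hash every k-gram of tokens, keep the minimum per window."""
    tokens = tokenize(code)
    hashes = [zlib.crc32(" ".join(tokens[i:i + k]).encode())
              for i in range(len(tokens) - k + 1)]
    if len(hashes) < window:
        return set(hashes)
    return {min(hashes[i:i + window]) for i in range(len(hashes) - window + 1)}


def similarity(code_a, code_b):
    """Jaccard similarity of the two fingerprint sets."""
    fa, fb = fingerprints(code_a), fingerprints(code_b)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


original = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
renamed = "def acc(values):\n    t = 0\n    for v in values:\n        t += v\n    return t\n"
print(similarity(original, renamed))  # 1.0: identical after identifier renaming
```

Renaming identifiers does not change the score, which is why such tools catch simple disguises; a freshly generated AI solution, however, shares no copied token sequences with other submissions, which is consistent with the mixed detection results reported above.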
Estimation of Average Annual Daily Bicycle Count Using Bike-Share GPS Data and Bike Counter Data for an Urban Active Transportation Network
In 2018, the City of Kelowna entered into a license agreement with Dropbike
to operate a dockless bike-share pilot in and around the downtown core. The
bikes were tracked by the user's cell phone GPS through the Dropbike app. The
City's Active Transportation team recognized that this GPS data could help
reveal the routes used by cyclists, which would then inform decision-making
for infrastructure improvements. Using OSMnx and NetworkX, the map of Kelowna
was converted into a graph network that was used to snap inaccurate, infrequent
GPS points to the nearest street intersection, calculate the potential paths taken
by cyclists, and count the number of trips per street segment through a comparison
of different path-finding models. These counts were combined with data from four
counters around downtown, and a mixed effects statistical model and a least squares
optimization were used to estimate the relationship between the traffic
patterns in the bike-share and counter data. Using this relationship, which is based on
sparse input from physical counting stations and bike-share data,
estimates and visualizations of the average annual daily bicycle volume in downtown
Kelowna were produced. The analysis, modelling, and visualization helped to
better understand how the bike network was being used in the urban center,
including non-traditional routes such as laneways and highway crossings.
Comment: Published in 17th International Conference on Data Science
(ICDATA'21)
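The snapping, routing, and counting pipeline can be sketched with the standard library alone. The study used OSMnx to build the real street network and NetworkX for path-finding; this sketch substitutes a hypothetical 4×4 grid of 100 m blocks and a hand-written Dijkstra so it runs anywhere. The snapping rule (nearest intersection by straight-line distance), the GPS traces, the counter values, and the single scale factor fitted by least squares are all illustrative assumptions, a much simpler stand-in for the mixed effects model and optimization described above.

```python
import heapq
import math
from collections import Counter


def grid_graph(width, height, block_m=100.0):
    """Hypothetical street grid: {intersection: {neighbour: length_m}}."""
    graph = {(x, y): {} for x in range(width) for y in range(height)}
    for (x, y) in graph:
        for nb in ((x + 1, y), (x, y + 1)):
            if nb in graph:
                graph[(x, y)][nb] = block_m
                graph[nb][(x, y)] = block_m
    return graph


def nearest_node(graph, point):
    """Snap a GPS point (already in grid coordinates) to the closest intersection."""
    return min(graph, key=lambda node: math.dist(node, point))


def shortest_path(graph, src, dst):
    """Dijkstra's algorithm; returns the list of intersections from src to dst."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist[node]:
            continue  # stale heap entry
        for nb, length in graph[node].items():
            if d + length < dist.get(nb, math.inf):
                dist[nb], prev[nb] = d + length, node
                heapq.heappush(heap, (d + length, nb))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def count_segments(graph, trips):
    """Count trips per undirected street segment."""
    counts = Counter()
    for gps_points in trips:
        snapped = [nearest_node(graph, p) for p in gps_points]
        for a, b in zip(snapped, snapped[1:]):
            if a != b:
                path = shortest_path(graph, a, b)
                for u, v in zip(path, path[1:]):
                    counts[frozenset((u, v))] += 1
    return counts


def fit_scale(share_counts, counter_daily):
    """Least-squares beta minimizing sum((y - beta * x) ** 2) over counter sites."""
    num = sum(share_counts[seg] * y for seg, y in counter_daily.items())
    den = sum(share_counts[seg] ** 2 for seg in counter_daily)
    return num / den


# Hypothetical data: three sparse GPS traces and two counter sites.
G = grid_graph(4, 4)
trips = [
    [(0.1, 0.0), (2.9, 0.2)],
    [(0.2, 0.1), (1.9, -0.1)],
    [(1.1, 0.1), (0.9, 1.2), (1.0, 2.9)],
]
counts = count_segments(G, trips)
counter_daily = {frozenset({(0, 0), (1, 0)}): 300, frozenset({(1, 1), (1, 2)}): 140}
beta = fit_scale(counts, counter_daily)
estimate = {seg: beta * c for seg, c in counts.items()}
```

Scaling every segment's bike-share count by the fitted factor extends the sparse counter observations to segments with no physical counter; the fit fails with a division by zero if no counter site carries any bike-share trips.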