865 research outputs found

    LearnedSort as a learning-augmented SampleSort: Analysis and Parallelization

    This work analyzes and parallelizes LearnedSort, the novel algorithm that sorts using machine learning models based on the cumulative distribution function. LearnedSort is analyzed under the lens of algorithms with predictions, and it is argued that LearnedSort is a learning-augmented SampleSort. A parallel LearnedSort algorithm is developed combining LearnedSort with the state-of-the-art SampleSort implementation, IPS4o. Benchmarks on synthetic and real-world datasets demonstrate improved parallel performance for parallel LearnedSort compared to IPS4o and other sorting algorithms. Comment: Published in SSDBM 202
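
The link between CDF-based sorting and SampleSort described in this abstract can be illustrated with a minimal sketch. It is not the LearnedSort or IPS4o implementation: the learned model is replaced by an empirical CDF built from a sorted random sample, and the function name `cdf_bucket_sort` and its parameters are hypothetical.

```python
import bisect
import random


def cdf_bucket_sort(values, num_buckets=16, sample_size=64, seed=0):
    """Sort by routing each value to a bucket chosen from an estimated CDF.

    Assumption: the CDF estimate is an empirical CDF over a random sample,
    used here as a stand-in for a learned model.
    """
    values = list(values)
    if len(values) <= 1:
        return values

    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))

    def predicted_cdf(x):
        # Fraction of sampled elements <= x, a value in [0, 1].
        return bisect.bisect_right(sample, x) / len(sample)

    buckets = [[] for _ in range(num_buckets)]
    for x in values:
        # Scale the CDF estimate to a bucket index and clamp the top end.
        idx = min(int(predicted_cdf(x) * num_buckets), num_buckets - 1)
        buckets[idx].append(x)

    # The estimate is monotone in x, so bucket order respects value order;
    # sorting each bucket and concatenating yields a sorted result.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result
```

Because the sample determines the bucket boundaries, the partitioning step behaves like the splitter selection in a SampleSort; a more accurate CDF estimate gives more evenly sized buckets.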

    A Case Study on Record Matching of Individuals in Historical Archives of Indigenous Databases

    Digitization of historical records has produced a significant amount of data for analysis and interpretation. A critical challenge is the ability to relate historical information across different archives so that the data can be framed in the appropriate historical context. This paper presents a real-world case study on historical information integration and record matching with the goal of improving the historical value of archives containing data from the period 1800 to 1920. The archives contain unique information about Métis and Indigenous people in Canada and their interactions with European settlers. The archives contain thousands of records that have increased relevance when relationships and interconnections are discovered. The contribution is a record linking approach suitable for historical archives and an evaluation of its effectiveness. Experimental results demonstrate potential for discovering historical linkage with high precision, enabling new historical discoveries. Comment: Published in 20th International Conference on Information & Knowledge Engineering (IKE'21

    Using Assignment Incentives to Reduce Student Procrastination and Encourage Code Review Interactions

    Procrastination causes student stress, reduces learning and performance, and results in very busy help sessions immediately before deadlines. A key challenge is encouraging students to complete assignments earlier rather than waiting until right before the deadline, so that the focus is on the learning objectives rather than just meeting deadlines. This work presents an incentive system encouraging students to complete assignments many days before deadlines. Completed assignments are code reviewed by staff to check correctness and provide feedback, which results in more student-instructor interactions and may help reduce student use of generative AI. The incentives result in a change in student behavior, with 45% of assignments completed early and 30% up to 4 days before the deadline. Students receive real-time feedback with no increase in marking time. Comment: 6 pages, To be published in 2023 International Conference on Computational Science and Computational Intelligence Research Track on Education (CSCI-RTED) IEEE CP

    Detecting Argumentative Fallacies in the Wild: Problems and Limitations of Large Language Models

    Previous work on the automatic identification of fallacies in natural language text has typically approached the problem in constrained experimental setups that make it difficult to understand the applicability and usefulness of the proposals in the real world. In this paper, we present the first analysis of the limitations that these data-driven approaches could show in real situations. For that purpose, we first create a validation corpus consisting of natural language argumentation schemes. Second, we provide new empirical results for the emerging task of identifying fallacies in natural language text. Third, we analyse the errors observed outside of the testing data domains considering the new validation corpus. Finally, we point out some important limitations observed in our analysis that should be taken into account in future research on this topic. Specifically, if we want to deploy these systems in the Wild

    An Efficient B-tree Implementation for Memory-Constrained Embedded Systems

    Embedded devices collect and process significant amounts of data in a variety of applications including environmental monitoring, industrial automation and control, and other Internet of Things (IoT) applications. Storing data efficiently is critically important, especially when the device must perform local processing on the data. The most widely used data structure for high performance query and insert is the B-tree. However, existing implementations consume too much memory for small embedded devices and often rely on operating system support. This work presents an extremely memory efficient implementation of B-trees for embedded devices that functions on the smallest devices and does not require an operating system. Experimental results demonstrate that the B-tree implementation can run on devices with as little as 4 KB of RAM while efficiently processing thousands of records. Comment: Published in the 19th International Conference on Embedded Systems, Cyber-physical Systems, and Applications (ESCS'21). Code is available at https://github.com/ubco-d

    Student Mastery or AI Deception? Analyzing ChatGPT's Assessment Proficiency and Evaluating Detection Strategies

    Generative AI systems such as ChatGPT have a disruptive effect on learning and assessment. Computer science requires practice to develop skills in problem solving and programming that are traditionally developed using assignments. Generative AI has the capability of completing these assignments for students with high accuracy, which dramatically increases the potential for academic integrity issues and students not achieving desired learning outcomes. This work investigates the performance of ChatGPT by evaluating it across three courses (CS1, CS2, databases). ChatGPT completes almost all introductory assessments perfectly. Existing detection methods, such as MOSS and JPlag (based on similarity metrics) and GPTzero (AI detection), have mixed success in identifying AI solutions. An evaluation of instructors and teaching assistants using heuristics to distinguish between student and AI code shows that their detection is not sufficiently accurate. These observations emphasize the need for adapting assessments and improved detection methods. Comment: 7 pages, Published in 2023 International Conference on Computational Science and Computational Intelligence Research Track on Education, IEEE CP

    Estimation of Average Annual Daily Bicycle Count Using Bike-Share GPS Data and Bike Counter Data for an Urban Active Transportation Network

    In 2018, the City of Kelowna entered into a license agreement with Dropbike to operate a dockless bike-share pilot in and around the downtown core. The bikes were tracked by the user's cell phone GPS through the Dropbike app. The City's Active Transportation team recognized that this GPS data could help understand the routes used by cyclists, which would then inform decision-making for infrastructure improvements. Using OSMnx and NetworkX, the map of Kelowna was converted into a graph network to map inaccurate, infrequent GPS points to the nearest street intersection, calculate the potential paths taken by cyclists, and count the number of trips by street segment through the comparison of different path-finding models. Combined with the data from four counters around downtown, a mixed effects statistical model and a least squares optimization were used to estimate a relationship between the different traffic patterns of the bike-share and counter data. Using this relationship, based on sparse data input from physical counting stations and bike-share data, estimations and visualizations of the annual daily bicycle volume in downtown Kelowna were produced. The analysis, modelling and visualization helped to better understand how the bike network was being used in the urban center, including non-traditional routes such as laneways and highway crossings. Comment: Published in 17th International Conference on Data Science (ICDATA'21
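
The map-matching and trip-counting steps described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it uses a plain-Python Dijkstra search on a toy graph instead of OSMnx and NetworkX, Euclidean distance instead of geographic distance, and a single shortest-path model rather than a comparison of several. All function names and the toy data are hypothetical.

```python
import heapq
import math
from collections import Counter


def nearest_node(point, coords):
    """Return the node whose coordinates are closest to point (Euclidean)."""
    return min(coords, key=lambda n: math.dist(point, coords[n]))


def shortest_path(adj, source, target):
    """Dijkstra on an adjacency dict {node: {neighbor: length}}."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return []  # no route between the two nodes
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


def count_trips_by_segment(trips, coords, adj):
    """Count how often each undirected street segment is traversed."""
    counts = Counter()
    for trip in trips:
        # Snap every GPS point to its nearest intersection.
        nodes = [nearest_node(p, coords) for p in trip]
        for a, b in zip(nodes, nodes[1:]):
            if a == b:
                continue  # consecutive points snapped to the same node
            path = shortest_path(adj, a, b)
            for u, v in zip(path, path[1:]):
                counts[frozenset((u, v))] += 1
    return counts


# Toy network: three intersections on a line plus one side street.
coords = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (1, 1)}
adj = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0, "D": 1.0},
    "C": {"B": 1.0},
    "D": {"B": 1.0},
}
trips = [[(0.1, 0.1), (1.9, -0.1)], [(0.0, 0.2), (1.1, 0.9)]]
segment_counts = count_trips_by_segment(trips, coords, adj)
```

In this toy example both trips pass through segment A-B, so it is counted twice, while B-C and B-D are each counted once.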