Advancing Dynamic-Time Warp Techniques for Correcting Eye Tracking Data in Reading Source Code
Background: Automated eye tracking data correction algorithms such as Dynamic-Time Warp have always made a trade-off between the ability to handle regressions (jumps back) and distortions (fixation drift). At the same time, eye movement in code reading is characterized by non-linearity and regressions.
Objective: In this paper, we present a family of hybrid algorithms that aim to handle both regressions and distortions with high accuracy.
Method: Through simulations with synthetic data, we replicate known eye movement phenomena to assess our algorithms against the Warp algorithm as a baseline. Furthermore, we use three real datasets to evaluate the algorithms in correcting data from reading source code and to see whether the proposed algorithms generalize to correcting data from reading natural language text.
Results: Our results demonstrate that most proposed algorithms match or outperform baseline warp in correcting both synthetic and real data. Also, we show the prevalence of regressions in reading source code.
Conclusion: Our results highlight our hybrid algorithms as an improvement to Dynamic-Time Warp in handling regressions with higher accuracy and better runtime.
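For readers unfamiliar with the baseline, the following is a minimal sketch of classic dynamic time warping applied to fixation correction. It is not the hybrid algorithms from the paper, and it is not taken from the authors' code. The function names `dtw_path` and `correct_fixations`, the `(x, y)` tuple representation, and the choice to snap only the y coordinate to the matched word's line are all assumptions made for illustration.

```python
import math


def dtw_path(fixations, targets):
    """Align two sequences of (x, y) points with classic DTW.

    Returns (total_cost, path), where path is a list of
    (fixation_index, target_index) pairs in reading order.
    """
    n, m = len(fixations), len(targets)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(fixations[i - 1], targets[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # next fixation, same word
                                 cost[i][j - 1],      # same fixation, next word
                                 cost[i - 1][j - 1])  # advance both
    # Backtrack from the end of both sequences to recover the alignment.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1)]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


def correct_fixations(fixations, word_centers):
    """Snap each fixation's y coordinate to the line of its aligned word.

    Hypothetical helper: if a fixation is aligned to several words,
    the last one in the path wins.
    """
    _, path = dtw_path(fixations, word_centers)
    corrected = list(fixations)
    for fi, wi in path:
        corrected[fi] = (fixations[fi][0], word_centers[wi][1])
    return corrected


if __name__ == "__main__":
    # Two lines of three words each; fixations drift downward by a few pixels.
    words = [(10, 100), (20, 100), (30, 100), (10, 120), (20, 120), (30, 120)]
    fixations = [(10, 106), (20, 106), (30, 107), (10, 126), (20, 127), (30, 126)]
    print(correct_fixations(fixations, words))
```

Because the alignment only moves forward through the word sequence, this plain version matches fixations to words in reading order, which is one way to see why regressions are hard for it.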
Subnational mapping of HIV incidence and mortality among individuals aged 15–49 years in sub-Saharan Africa, 2000–18 : a modelling study
Background: High-resolution estimates of HIV burden across space and time provide an important tool for tracking and monitoring the progress of prevention and control efforts and assist with improving the precision and efficiency of targeting efforts. We aimed to assess HIV incidence and HIV mortality for all second-level administrative units across sub-Saharan Africa.
Methods: In this modelling study, we developed a framework that used the geographically specific HIV prevalence data collected in seroprevalence surveys and antenatal care clinics to train a model that estimates HIV incidence and mortality among individuals aged 15–49 years. We used a model-based geostatistical framework to estimate HIV prevalence at the second administrative level in 44 countries in sub-Saharan Africa for 2000–18 and sought data on the number of individuals on antiretroviral therapy (ART) by second-level administrative unit. We then modified the Estimation and Projection Package (EPP) to use these HIV prevalence and treatment estimates to estimate HIV incidence and mortality by second-level administrative unit.
Findings: The estimates suggest substantial variation in HIV incidence and mortality rates both between and within countries in sub-Saharan Africa, with 15 countries having a ten-times or greater difference in estimated HIV incidence between the second-level administrative units with the lowest and highest estimated incidence levels. Across all 44 countries in 2018, HIV incidence ranged from 2·8 (95% uncertainty interval 2·1–3·8) in Mauritania to 1585·9 (1369·4–1824·8) cases per 100 000 people in Lesotho and HIV mortality ranged from 0·8 (0·7–0·9) in Mauritania to 676·5 (513·6–888·0) deaths per 100 000 people in Lesotho. Variation in both incidence and mortality was substantially greater at the subnational level than at the national level and the highest estimated rates were accordingly higher. Among second-level administrative units, Guijá District, Gaza Province, Mozambique, had the highest estimated HIV incidence (4661·7 [2544·8–8120·3]) cases per 100 000 people in 2018 and Inhassunge District, Zambezia Province, Mozambique, had the highest estimated HIV mortality rate (1163·0 [679·0–1866·8]) deaths per 100 000 people. Further, the rate of reduction in HIV incidence and mortality from 2000 to 2018, as well as the ratio of new infections to the number of people living with HIV was highly variable. Although most second-level administrative units had declines in the number of new cases (3316 [81·1%] of 4087 units) and number of deaths (3325 [81·4%]), nearly all appeared well short of the targeted 75% reduction in new cases and deaths between 2010 and 2020.
Interpretation: Our estimates suggest that most second-level administrative units in sub-Saharan Africa are falling short of the targeted 75% reduction in new cases and deaths by 2020, which is further compounded by substantial within-country variability. These estimates will help decision makers and programme implementers expand access to ART and better target health resources to higher burden subnational areas.
From Novice to Expert: Analysis of Token Level Effects in a Longitudinal Eye Tracking Study - Replication Package
Replication Package for ICPC21 "From Novice to Expert: Analysis of Token Level Effects in a Longitudinal Eye Tracking Study".
EMIP Toolkit: A Python Library for Customized Post-processing of the Eye Movements in Programming Dataset
This is the source code and filtered dataset associated with the EMIP WS submission titled:
EMIP Toolkit: A Python Library for Customized Post-processing of the Eye Movements in Programming Dataset
Paper: https://www.researchgate.net/publication/350485560_EMIP_Toolkit_A_Python_Library_for_Customized_Post-processing_of_the_Eye_Movements_in_Programming_Dataset
Video presentation: https://youtu.be/wFdGyM6qUlE
GitHub repository (with the latest EMIP Toolkit code): https://github.com/nalmadi/EMIP-Toolki
How Readable is Model-generated Code? Examining Readability and Visual Inspection of GitHub Copilot
Background: Recent advancements in large language models have motivated the practical use of such models in code generation and program synthesis. However, little is known about the effects of such tools on code readability and visual attention in practice.
Objective: In this paper, we focus on GitHub Copilot to address the issues of readability and visual inspection of model generated code. Readability and low complexity are vital aspects of good source code, and visual inspection of generated code is important in light of automation bias.
Method: Through a human experiment (n=21) we compare model generated code to code written completely by human programmers. We use a combination of static code analysis and human annotators to assess code readability, and we use eye tracking to assess the visual inspection of code.
Results: Our results suggest that model generated code is comparable in complexity and readability to code written by human pair programmers. At the same time, eye tracking data suggests, to a statistically significant level, that programmers direct less visual attention to model generated code.
Conclusion: Our findings highlight that reading code is more important than ever, and programmers should beware of complacency and automation bias with model generated code.
Please cite:
Al Madi, Naser. "How readable is model-generated code? examining readability and visual inspection of github copilot." Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering. 2022.
Dataset of Underrepresented Languages in Eye Tracking Research
A number of factors come together to limit the diversity of eye-tracking research, where the majority of studies are conducted with stimuli in the English language. Studying eye movement over other languages is important considering that each language provides unique insights into human cognition. Recently, there have been valuable efforts to present datasets from other languages, yet these efforts focused mostly on European languages. In this paper we highlight issues that limit diversity in eye tracking research on reading, and we present our work in collecting an open-access multilingual reading dataset of underrepresented languages. Utilizing a high-frequency research eye tracker (EyeLink 1000 Plus), we record eye tracking data of native and second language readers of English, Spanish, Chinese, Hindi, Russian, Arabic, Japanese, Kazakh, Urdu, and Vietnamese. The dataset includes demographics, language proficiency self-reporting, and answers to comprehension questions. The current version of the dataset, which we make publicly available, consists of 97 trials by 40 participants. With the goal of increasing the number of participants and included languages, we aim to make studying underrepresented languages more accessible to researchers and tool makers.
Read full paper here: https://www.researchgate.net/profile/Naser-Al-Madi/publication/370553492_A_Dataset_of_Underrepresented_Languages_in_Eye_Tracking_Research/links/645583685762c95ac3766dfe/A-Dataset-of-Underrepresented-Languages-in-Eye-Tracking-Research.pd
Replication - Combining Automation and Expertise: A Semi-automated Approach to Correcting Eye Tracking Data in Reading Tasks
A replication package for the experiment portion of the paper. The tool is available at the following repository:
https://github.com/nalmadi/fix