Grad students in the English Department publish article on AI and student writing
English graduate students Davide Pafumi, Frank Onuh, Iftekhar Khalid, and Morgan Pearce have collaborated with Barbara Bordalejo and Daniel O'Donnell to publish an article on the impact of generative artificial intelligence on commonly used writing tools.
The article, “‘Scarlet Cloak and the Forest Adventure’: a preliminary study of the impact of AI on commonly used writing tools,” was published February 7, 2025, in the International Journal of Educational Technology in Higher Education.
The paper, which was developed from a project under the supervision of adjunct department member Dr. Bordalejo, compares tools and assesses their impact by using them on texts written prior to the advent of popular chatbots such as ChatGPT. The paper is part of a larger series of projects by students in the Department's Humanities Innovation Lab.
You can read the paper (Open Access) here.
Abstract
This paper explores the growing complexity of detecting and differentiating generative AI from other AI interventions. Initially prompted by the observation that tools like Grammarly were being flagged by AI detection software, it examines how popular writing aids such as Grammarly, EditPad, and Writefull, and AI models such as ChatGPT and Microsoft Bing Copilot, affect human-generated texts, and how accurately current AI-detection systems, including Turnitin and GPTZero, can assess texts for use of these tools. To provide a dataset, the authors applied different AI-enhanced tools to a number of texts of different styles written prior to the development of consumer AI tools and evaluated their impact through key metrics such as readability, perplexity, and burstiness. The results highlight that widely used writing aids, even those not primarily generative, can trigger false positives in AI detection tools. The findings reveal that tools like Grammarly that subtly enhance readability also trigger detection and increase false positives, especially for non-native speakers. In general, paraphrasing tools score low in AI detection software, allowing the changes to go mostly unnoticed. Similarly, Microsoft Bing Copilot and Writefull, when applied to the selected texts, evaded AI detection fairly consistently. Compounding this problem, traditional AI detectors like Turnitin and GPTZero struggle to reliably differentiate between legitimate paraphrasing and AI generation, undermining their utility for enforcing academic integrity. The study concludes by urging educators to focus on managing interactions with AI in academic settings rather than banning its use outright. It calls for the creation of policies and guidelines that acknowledge the evolving role of AI in writing, emphasizing the need to interpret detection scores cautiously to avoid penalizing students unfairly.
In addition, encouraging openness about how AI is used in writing could alleviate concerns in the research and writing process for both students and academics. The paper recommends a shift toward teaching responsible AI usage rather than pursuing rigid bans or relying on detection metrics that may not accurately capture misconduct.