Many schools now have an AI policy. Many have color-coded systems telling students what they can and cannot do with generative AI across different assessment contexts. Some have gone further: scaffolded workflows, prompt-logging requirements, declaration forms, tiered frameworks distinguishing AI use for brainstorming from AI use for drafting. The effort has been genuine. The intention is serious. And yet, for the most part, none of it is working the way we hoped it would.
That is the uncomfortable argument at the center of a body of scholarship that I think deserves far wider attention than it has received up to this point.
In a 2025 paper published in Assessment & Evaluation in Higher Education, Thomas Corbin, Phillip Dawson, and Danny Liu introduce a powerful distinction that can be used, if we are brave enough, to reframe how school leaders and teachers think about AI governance. They draw a line between two fundamentally different kinds of responses to generative AI in assessment: discursive changes and structural changes.
Discursive changes are modifications that rely solely on communicating instructions, rules, or guidelines to students, leaving the underlying mechanics of the assessment task unchanged. As the authors put it, discursive changes “operate through what might be called linguistic commands.” They “attempt to elicit compliance through language alone, without corresponding mechanisms to enforce those boundaries.” Structural changes alter the nature, format, or mechanics of the task itself, and their power, as Corbin et al. argue, “lies in their independence from voluntary student compliance”: rather than asking students to follow communicated rules, structural changes “create assessment environments where the desired behaviour emerges naturally from the assessment design.”
The distinction sounds straightforward. Its meaning and implications are not.
Consider the traffic light system, now common across schools and universities internationally. Red means no AI use permitted. Yellow means limited, assistive use. Green means full integration encouraged. NYC Public Schools deployed this framework in March 2026 to organize AI risks into prohibited, cautionary, and approved categories. The College of Staten Island applied the same three-tier structure at the university level that same month. These systems are carefully designed and, in Corbin et al.'s terms, entirely discursive: they communicate rules without enforcing them. Researchers writing in Assessment & Evaluation in Higher Education liken them to traffic lights without cameras or highway patrols. A red-coded assessment is only red if the student decides to treat it that way.
Scaffolded AI workflows are the more sophisticated cousin of the traffic light. Frameworks like the AI Assessment Scale, developed by Perkins and colleagues, offer educators a progression of permission levels with specific language calibrated to each. The architecture is more nuanced than a stoplight, but the fundamental problem is identical. The authors of the scale’s second version acknowledged as much, noting that “permitting any use of AI effectively permits all use of AI,” because the distinctions between levels cannot be enforced in practice.
Policy and institutional guidance sit at the most authoritative end of this spectrum: declaration forms, academic integrity warnings, cover sheet disclosures. One study cited by Corbin et al. found that at a major institution, up to 74% of students did not complete AI declaration requirements appropriately. The policy existed. The compliance did not.
What stoplights, scaffolded frameworks, and institutional policy share is not a failure of design or intention but a failure of category. All three are operating at the level of communication, and communication, however clear, however detailed, however earnest, cannot do what only structure can do.
Corbin and others have named this the enforcement illusion, and the traffic light metaphor is their sharpest illustration of it. Real traffic lights work not because they are visible but because they are embedded in systems of physical and institutional enforcement: cameras that detect violations, penalties with genuine consequences, infrastructure that makes stopping the default behavior at a dangerous intersection. When authorities identify such an intersection, they do not post clearer guidelines about when drivers should stop; they install enforcement.
Institutional AI traffic lights borrow the visual logic of that infrastructure while possessing none of its enforcement capacity. We are using the language of structural change to describe what are purely discursive interventions, and the metaphor is not just inapt but actively misleading, because it generates the appearance of security where little exists. When assessment validity rests on student compliance with unenforceable rules, we are not protecting the integrity of our credentials so much as assuming it, quietly, at scale.
A companion paper by Corbin, Bearman, Boud, and Dawson, also in Assessment & Evaluation in Higher Education, frames the broader challenge through Rittel and Webber’s concept of the wicked problem: one with no correct solution, no stopping rule, and no way to test approaches without real consequences for real students. Their interviews with twenty university teachers responsible for assessment design reveal educators who are not failing to find the right answer so much as confronting a problem that resists the category of right answer altogether. That framing matters here because it explains why discursive responses feel sufficient. Wicked problems invite governance responses (policies, frameworks, guidelines) because those are the instruments institutions reach for when they need to act and be seen acting. The enforcement illusion is not cynical. It is a predictable institutional response to a genuinely unfamiliar problem. Predictable, however, is not the same as adequate.
The discursive/structural distinction clarifies what category our current efforts belong to and what category they cannot reach. Most of what schools have built in response to generative AI is discursive. That work has value, but it does not protect assessment validity.
Two currents in the field are running closer to structural ground. Teacher-researchers are developing richer understandings of how students learn with and against AI as a cognitive scaffold, work that feeds directly into process-oriented instructional and assessment design. Winstone, Gravett, and Elkington’s Black Box Assessment framework sits at this edge, building assessment structures that make the learning process itself visible and therefore harder to outsource. Practitioners and researchers focused on assessment security are working on oral examinations, authenticated checkpoints, evidentiary chains built across a unit rather than staked on a single submission. Both currents are aware of each other, though whether they are in genuine conversation is less clear. Corbin et al.’s vocabulary gives both communities something to build from.
Structural assessment reform also assumes an instructional foundation, some account of what students are actually doing when they engage with AI as a cognitive tool and what genuine competence in that engagement looks like over time. My own work on discipline-specific AI literacy develops five orientations students can take toward AI output: Critic, Verifier, Interlocutor, Editor, Architect. The framework is transdisciplinary in structure but always instantiated in the specific epistemic demands of a discipline. What the Critic role requires of a history student is sourcing. What it requires of a literature student is a prior interpretive encounter of her own.
Whether such frameworks function discursively or structurally is itself an open question. Named as orientations and communicated to students, they look discursive. Embedded in how tasks are sequenced across a year, built into the conditions under which independence becomes possible rather than merely expected, they begin to function structurally. Corbin et al.’s distinction may be just as generative applied to instructional design as it is to assessment. That is where I am looking next.
Nick Potkalitsky, Ph.D.
Check out some of our favorite Substacks:
Mike Kentz’s AI EduPathways: Insights from one of our most insightful, creative, and eloquent AI educators in the business!!!
Terry Underwood’s Learning to Read, Reading to Learn: The most penetrating investigation of the intersections between compositional theory, literacy studies, and AI on the internet!!!
Suzi’s When Life Gives You AI: A cutting-edge exploration of the intersection of computer science, neuroscience, and philosophy
Alejandro Piad Morffis’s The Computerist Journal: Unmatched investigations into coding, machine learning, computational theory, and practical AI applications
Michael Woudenberg’s Polymathic Being: Polymathic wisdom brought to you every Sunday morning with your first cup of coffee
Rob Nelson’s AI Log: Incredibly deep and insightful essays about AI’s impact on higher ed, society, and culture.
Michael Spencer’s AI Supremacy: The most comprehensive and current analysis of AI news and trends, featuring numerous intriguing guest posts
Daniel Bashir’s The Gradient Podcast: The top interviews with leading AI experts, researchers, developers, and linguists.
Daniel Nest’s Why Try AI?: The most amazing updates on AI tools and techniques
Jason Gulya’s The AI Edventure: An important exploration of cutting-edge innovations in AI-responsive curriculum and pedagogy



I really like this framing! It’s useful not only for education, but for any industry grappling with how to establish guidelines around AI use.
Cross-posting my comment from LinkedIn 😊
Hey Nick, regarding the point that “A red-coded assessment is only red if the student decides to treat it that way”: this is indeed an issue if a pure labelling exercise is taking place, but our interpretation of “red” assessments at Queen Mary University of London is structurally secure assessments, e.g. invigilated exams or vivas.
Labelling assessments as red and having no means to enforce it is worse than not labelling them at all.
Here’s our approach / interpretation:
Red - structurally secure assessments
Amber - “open” assessments, AI use optional, may require structural redesign of assessments to be authentic, challenging, meaningful
Green - AI required, as part of embedded AI literacy
While the balance will vary between programmes, I feel most of the hard work is going to have to happen within “Amber”.
In short, we’re using the discursive labelling approach to highlight where all the structural redesign needs to happen.