AlgoGPT — Learning with AI for Data Structures & Algorithms

01 Learn with AI

SC1007 doesn't teach around AI — it teaches students to learn with it.

NTU's College of Computing & Data Science is embedding AI into how computing is taught, practised and assessed through its Learn with AI framework. In SC1007, students explore core data structures and algorithms in an AI-supported environment that offers guided hints and feedback (never finished solutions) as they solve problems. Practice is generated dynamically and matched to each student's skill level, so every student in a cohort of over a thousand can learn at their own pace while building the judgement to evaluate AI-generated code.

AI systems can generate code remarkably quickly, but understanding why that code works remains fundamental. In our courses, students learn to use these tools responsibly while still making sound design decisions, reasoning through problems, and verifying the correctness of their solutions.

Dr Newton Fernando — Faculty lead, SC1007 Data Structures & Algorithms · NTU CCDS

The learning cycle

Every problem moves a student through three modes of thinking — the structured approach at the heart of SC1007's redesign. AlgoGPT supports each stage with Socratic questions, never finished code.

Explain

Articulate the reasoning

Students put their algorithmic logic into words before any code is written, externalising the mental model so gaps surface early.

Implement

Build it independently

Students write and refine the solution themselves, receiving tiered Socratic nudges only on request — direction, never the answer.

Evaluate

Judge the AI's code

Students review AI-assisted output critically, verifying it against the fundamentals — the skill that matters most in an AI-shaped world.

One lab, two halves

Each two-hour lab is built around the Zone of Proximal Development — students first build reasoning unaided, then extend it with adaptive AI support. The same AI-ON / AI-OFF principle carries into assessment: independent mastery is always verified separately from effective tool use.

Hour 01 · AI-OFF

Tutor-led conceptual work

Students design and analyse linked lists, stacks, queues, trees and graphs by hand — developing problem decomposition and debugging with no GenAI in the room.

Hour 02 · AI-ON

AI-supported pair programming

Students pair with AlgoGPT in Driver and Navigator modes while Teaching Assistants act as meta-coaches — guiding responsible, reflective use of every piece of AI feedback.

"Artificial intelligence is a multiplier. But if the multiplicand is zero, the outcome is zero."

— Professor Luke Ong, Dean, NTU College of Computing & Data Science

Learn with AI pilot

SC1007 · Data Structures & Algorithms
SC2000 · Probability & Statistics
SC2006 · Software Engineering
SC4052 · Cloud Computing

02 The Challenge

In the age of AI, the villain isn't a hard concept — it's frictionless learning.

Standard large language models act as a black box: they hand over working solutions instantly, letting students bypass the very struggle that builds deep understanding. Learning theory is clear that durable knowledge requires effortful, active construction — and frictionless assistance short-circuits it.

“Being denied the answer forced me to actually think. For the first time, I felt I owned the code, not the AI.”

— SC1007 STUDENT

Standard AI

AI as Answer Key

Immediate, complete solutions. Speed without comprehension — and a learner who never owns the logic.

AlgoGPT

AI as Socratic Mentor

Structured friction by design. Hints, questions and deliberately flawed code that force reasoning — turning answers into earned insight.

03 The Platform

More than a chatbot — a classroom operating system for DSA courses.

AlgoGPT closes the loop between teaching and practice. Professors generate syllabus-aligned questions; students solve them in the same environment where labs are scheduled, submissions graded, and progress profiled — all in one place.

MODULE 01

Diagnostic Profiling

An adaptive Week-2 quiz establishes each learner's baseline mastery across core topics, then generates a personalised profile with targeted recommendations.

MODULE 02

Conceptual MCQs

Theory-based multiple-choice components reinforce understanding alongside coding practice, ensuring conceptual clarity supports implementation.

MODULE 03

Personalised Practice

Targeted sets are generated from each student's struggle patterns, drawn from a repository of several hundred problems that includes 150 widely used DSA interview questions.

MODULE 04

Dual-Mode Labs

The Driver–Navigator pair-programming environment turns every lab session into structured, reflective, friction-first practice.

MODULE 05

Progress Dashboards

Submission logs give a real-time pulse of the cohort, letting instructors detect unproductive frustration and intervene early.

MODULE 06

Institution-Ready

A closed lab system with access codes, enrolment, and session tracking — built around the professor → TA → student hierarchy.

04 The Pedagogical Engine

Two role-reversing agents that operationalise pedagogy.

Instead of a single chatbot, AlgoGPT runs two specialised agents. The Driver withholds solutions and coaches through tiered Socratic feedback; the Navigator reverses the roles, writing deliberately flawed code for the student to review.

**Figure 1** The core interaction loop. Driver Mode (top): the student writes code and receives non-directive, multi-tiered Socratic feedback. Navigator Mode (bottom): the student instructs in natural language and the agent intentionally generates subtly flawed code to provoke critical review.

The Driver

Socratic Coach · Synthesis Mode

Students write code; the AI intervenes only on a student-initiated request. Rather than offering fixes, the Driver decomposes feedback into four escalating tiers — functioning like a senior instructor who scaffolds through questions, never supplying runnable code.

1. High-level analysis: Frame the problem without revealing the path.

2. Conceptual hints: Surface the relevant idea, such as a tree, a hash, or an invariant.

3. Guiding questions: Prompt the student to locate their own gap.

4. Implementation suggestions: Direction, never the code itself.
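
The four tiers above can be pictured as a small escalation ladder. The following is a minimal sketch, not AlgoGPT's implementation: it assumes one tier is released per student-initiated request, and the names `NudgeTier`, `DriverSession` and `request_nudge` are hypothetical. In the real system the content of each tier would come from the language model.

```python
from dataclasses import dataclass
from enum import IntEnum


class NudgeTier(IntEnum):
    """The four escalating tiers described above (identifiers are illustrative)."""
    HIGH_LEVEL_ANALYSIS = 1
    CONCEPTUAL_HINT = 2
    GUIDING_QUESTION = 3
    IMPLEMENTATION_SUGGESTION = 4


@dataclass
class DriverSession:
    """Hypothetical per-problem record of how many nudges a student has requested."""
    nudges_requested: int = 0

    def request_nudge(self) -> NudgeTier:
        # Assumption: each request escalates by one tier and stops at tier 4,
        # so no amount of asking ever unlocks runnable code.
        self.nudges_requested += 1
        return NudgeTier(min(self.nudges_requested, len(NudgeTier)))


session = DriverSession()
tiers = [session.request_nudge() for _ in range(5)]
print([tier.name for tier in tiers])
```

Capping at the last tier is the design point of the sketch: the ladder has no rung that returns a finished solution.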

**Figure 2** Driver interface: the active code editor, the student-initiated "Get Nudge" trigger, and the multi-tiered feedback panel that decomposes support into analysis, hint, question, and suggestion.

The Navigator

Critical Evaluator · Review Mode

The roles reverse: students articulate logic in natural language while the AI writes code in a locked editor. The Navigator is prompted to introduce plausibly wrong rather than randomly wrong outputs — errors a student might make themselves — which they must Accept or Reject.

↳ Calibrated flaw injection: Subtle bugs targeted at the learner's misconception.

↳ Externalised reasoning: Natural-language instruction forces a clear mental model.

↳ Evaluate before accepting: An "Accept / Reject" gate operationalises critical verification.
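
To make the Accept / Reject gate concrete, here is a hedged sketch of how a verdict could be scored against hidden tests. Everything in it is an assumption for illustration: `flawed_max` stands in for Navigator output with a plausible off-by-one bug, and `review_is_sound` is a hypothetical helper, not part of AlgoGPT.

```python
from typing import Callable, List, Tuple


def flawed_max(values: List[int]) -> int:
    """Stand-in for Navigator output with a plausible bug: the last element is skipped."""
    best = values[0]
    for i in range(1, len(values) - 1):  # off-by-one: the bound should be len(values)
        if values[i] > best:
            best = values[i]
    return best


def review_is_sound(candidate: Callable[[List[int]], int],
                    tests: List[Tuple[List[int], int]],
                    student_accepts: bool) -> bool:
    """Return True when the student's verdict agrees with the hidden tests."""
    code_is_correct = all(candidate(args) == expected for args, expected in tests)
    return student_accepts == code_is_correct


hidden_tests = [([3, 1, 2], 3), ([1, 2, 9], 9)]
print(review_is_sound(flawed_max, hidden_tests, student_accepts=False))
print(review_is_sound(flawed_max, hidden_tests, student_accepts=True))
```

The first test passes despite the bug, which is what makes the flaw plausible rather than random: only an input whose maximum sits in the last position exposes it.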

**Figure 3** Navigator interface: the agent establishes its role, the student instructs in natural language, code is injected into the locked editor with intentional flaws, and the student must validate it via Accept or Reject.

Normal Mode

Conversational Baseline

A conversational AI interface is available across every lab. Students may ask about concepts or requirements, but the AI still withholds direct code solutions. It serves as a within-system baseline — controlling for AI availability while isolating the incremental effect of friction.

○ Always reachable: A floor of support beneath every learner.

○ Comparative baseline: Lets researchers measure the friction effect directly.

○ Graceful fallback: Cognitive walls become detours, not dead ends.

05 System Architecture

A theory-driven, multi-agent system — engineered to scale.

AlgoGPT is built on a stateless backend with LangGraph agent orchestration and Azure-hosted GPT-4o models providing pedagogically constrained reasoning. Each agent has one job; together they manage cognitive load end to end.

**Figure 4** The full system: a Next.js / React frontend with a Monaco editor, a FastAPI backend, a LangGraph-orchestrated agent layer, GPT-4o reasoning with problem-context RAG, and PostgreSQL storage for submissions and chat logs.

Learner Profiling Agent

Runs the diagnostic quiz, classifies mastery, and generates each student's personalised learner profile.

Code Analysis Agent

Inspects submissions to detect misconceptions and the specific reasoning gap behind a failed attempt.

Response Validation Agent

Guards the Socratic protocol — ensuring feedback never leaks runnable solutions to the student.
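
A guard of this kind could start from a simple heuristic filter before any model-based check. The sketch below is an assumption-laden illustration, not the Response Validation Agent itself: it assumes student code is Python, and `leaks_runnable_code` is a hypothetical name. It flags replies that contain a fenced block or a paragraph that parses as Python statements.

```python
import ast
import re
import textwrap

# Markdown fence markers, built without writing a literal fence in this example.
FENCE_MARKERS = ("`" * 3, "~" * 3)
RUNNABLE_NODES = (ast.FunctionDef, ast.For, ast.While, ast.If, ast.Assign, ast.Return)


def leaks_runnable_code(response: str) -> bool:
    """Heuristic: True if a tutor reply appears to contain runnable Python."""
    if any(marker in response for marker in FENCE_MARKERS):
        return True
    # Prose almost never parses as Python, so try each paragraph in turn.
    for paragraph in re.split(r"\n\s*\n", response):
        try:
            tree = ast.parse(textwrap.dedent(paragraph))
        except SyntaxError:
            continue
        if any(isinstance(node, RUNNABLE_NODES) for node in ast.walk(tree)):
            return True
    return False


print(leaks_runnable_code("What invariant should hold after each iteration?"))
print(leaks_runnable_code("Try this:\n\ndef push(stack, x):\n    stack.append(x)"))
```

A filter like this would only be a first line of defence; pseudocode or code in another language would pass it, so a production guard would need a stronger check.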

Instructional Chatbot

Delivers real-time, non-directive feedback through reflective prompts and scaffolding strategies.

Adaptive Task Scheduler

Sequences problems and difficulty, moving learners along a personalised topic progression.

Problem-Context RAG

Grounds every agent response in the exact problem statement and course knowledge base.

Frontend: Next.js / React · Monaco Editor
Backend: FastAPI · LangGraph orchestration · ATLAS NALA integration
Reasoning: Azure GPT-4o · Stateless API · Code Runner Service · LSP Backend Service
Storage: PostgreSQL · Azure Container Registry

06 The Evidence

35,000+ data points prove that desirable difficulty works.

Across a 14-week semester, AlgoGPT's friction modes produced significantly higher-quality behavioural patterns than standard AI assistance.

Eventual success rate by mode

Lower "frictionless" passing in friction modes reflects deeper cognitive effort — not failure.

Normal (baseline): 88.0%

Driver (Socratic): 58.6%

Navigator (evaluator): 55.0%

Session outcomes

26% ran to time expiry — an "immersion effect", not abandonment.

60.2% · Completed

26.4% · Cognitive immersion

8.1% · Skipped

Performance & persistence by interaction mode

Driver mode raises iterative engagement; Navigator mode shows the "efficiency of insight" — fewer attempts, deeper pre-submission analysis.

Metric	Normal · Baseline	Driver · Socratic	Navigator · Evaluator
Eventual success rate	88.0%	58.6%	55.0%
Mean attempts per session	3.07	3.24	2.36
Attempts to first accept	2.46	2.66	1.80
Mean tests passed / submission	3.51 / 10	5.13 / 10	3.57 / 10

87.0%

of 5,474 Driver sessions completed without a single AI nudge — proof against AI-dependency.

|r_b| = 0.28

effect size, Normal vs Driver (Mann–Whitney U, p < 1×10⁻¹⁹⁰).
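
For readers unfamiliar with the statistic, the rank-biserial correlation can be derived directly from the Mann–Whitney U statistic. The sketch below uses made-up attempt counts rather than the study's data, and the function name `rank_biserial` is hypothetical; it counts pairwise wins, which is simple but quadratic in the sample sizes.

```python
from typing import Sequence


def rank_biserial(x: Sequence[float], y: Sequence[float]) -> float:
    """Rank-biserial correlation from the Mann-Whitney U statistic.

    U counts the pairs in which an x value exceeds a y value (ties count half);
    r_b = 2U / (n_x * n_y) - 1, which lies in [-1, 1].
    """
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    return 2.0 * u / (len(x) * len(y)) - 1.0


# Made-up values for two groups, for illustration only.
normal = [1, 2, 2, 3]
driver = [2, 3, 4, 5]
print(abs(rank_biserial(normal, driver)))
```

The sign depends only on which group is passed first, which is why the magnitude is the quantity usually reported.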

t = 6.97

highest significance in the study — students checking their own understanding.

Engagement deepens over the term

Across the 14-week semester, 35,433 submissions were processed without meaningful attrition. Lab-level success climbed steadily as students adapted to friction-first practice — engagement increased rather than eroded.

Lab 1: Linked Lists · Lab 2: Stacks & Queues · Lab 3: Binary Trees

07 Pedagogical Foundations

Friction, embedded structurally.

Cognitive Load Theory

Minimise extraneous copying

By withholding runnable code, AlgoGPT removes the shortcut that otherwise consumes working memory without building skill — redirecting effort toward germane load.

Self-Regulated Learning

Promote metacognitive monitoring

Students plan, monitor, reflect, and check their own understanding. Reflective self-evaluation and metacognitive monitoring were the survey's two strongest measured dimensions.

Cognitive Forcing Functions

Evaluate before accepting

Navigator Mode's Accept / Reject gate makes students commit to a judgement before trusting AI output — reducing uncritical over-reliance.

Self-Regulated Learning survey — N = 155

Reflective self-evaluation: M = 3.46

Metacognitive monitoring: M = 3.41

Metacognitive planning: M = 3.40

Cognitive scaffolding: M = 3.24

Affective outcomes: M = 3.23

Overall mean 3.39 / 5.0; all 12 items significantly above the neutral midpoint. "Checking understanding" gave the study's largest effect (t = 6.97).
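
A comparison against the neutral midpoint is typically a one-sample t-test. The sketch below shows how such a t statistic is computed, assuming a 1 to 5 response scale with a midpoint of 3.0; the responses are made up, and `one_sample_t` is a hypothetical helper rather than the study's analysis code.

```python
import math
import statistics
from typing import Sequence


def one_sample_t(scores: Sequence[float], midpoint: float = 3.0) -> float:
    """t statistic for the null hypothesis that the mean equals the midpoint."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    return (mean - midpoint) / (sd / math.sqrt(n))


# Made-up responses on a 1-5 scale, not the survey's data.
responses = [4, 3, 4, 5, 3, 4, 2, 4]
print(round(one_sample_t(responses), 2))
```

With a fixed mean difference, the statistic grows with the square root of the sample size, so a modest gap above the midpoint can still yield a large t at N = 155.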

08 Adaptive & Equitable

A personal mentor — calibrated to where each learner actually struggles.

AlgoGPT profiles every learner and detects struggle at a fine grain, so 24/7 Socratic coaching reaches every student in a diverse cohort, not only those with access to private tutoring.

Diagnostic skill distribution

Week-2 profiling quiz — N = 760 students classified by the Learner Profiling Agent.

Beginner: 53.3% · 405

Unclassified: 32.0% · 243

Intermediate: 14.5% · 110

Advanced: 0.3% · 2

A mostly-beginner cohort — the profile that benefits most from progressive, confidence-building scaffolding.

Algorithmic struggle detection

A 9-level ordinal scale moves beyond code correctness to capture cognitive difficulty.

36.0% · Moderate struggles (Levels 3–6): 263 students with recurring reasoning challenges.

33.7% · No explicit struggle: 246 students with stable conceptual fluency.

22.9% · Severe struggles (Levels 7–9): 167 students needing structured intervention.

7.4% · Minor confusion (Levels 1–2): brief, self-resolved uncertainty.
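
The banding of the ordinal scale can be written as a small lookup. This is a sketch under one stated assumption: level 0 is used here to encode "no explicit struggle", which the source does not specify. The function name `struggle_band` is hypothetical.

```python
def struggle_band(level: int) -> str:
    """Map a struggle level to the bands listed above.

    Assumption: 0 means no explicit struggle; 1-9 follow the stated level ranges.
    """
    if not 0 <= level <= 9:
        raise ValueError(f"level must be between 0 and 9, got {level}")
    if level == 0:
        return "No explicit struggle"
    if level <= 2:
        return "Minor confusion"
    if level <= 6:
        return "Moderate struggles"
    return "Severe struggles"


for level in (0, 2, 5, 8):
    print(level, struggle_band(level))
```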

Infrastructure equity

A lightweight, web-based platform — no high-end hardware, no installs. World-class scaffolding on any device.

The 24/7 tutor

Immediate, personalised feedback for learners who cannot afford private academic support — available at every hour.

Naturally differentiated

Scaffolded support for struggling learners; critical evaluation for advanced ones — diverse entry points, one system.

09 Scale, Sustainability & Roadmap

Proven at scale — and built to evolve with every cohort.

A stateless architecture scales from ten students to ten thousand on standard infrastructure. The pedagogy is domain-agnostic, and the roadmap turns today's uniform friction into tomorrow's adaptive scaffolding.

Horizontal scaling

Stateless sessions carry no server-side memory overhead, so the backend scales horizontally on Azure container infrastructure behind the React frontend.

Domain-agnostic blueprint

The tiered-nudge protocol is tied to structured reasoning, not a language — ready for Systems Programming, Maths, Physics and beyond.

Quality at scale

Automated, AI-blind test cases keep grading objective; submission logs let instructors detect unproductive frustration early.

On the roadmap

Performance-triggered adaptive scaffolding

Calibrating the intensity of "friction" to a student's real-time struggle, so productive struggle never tips into unproductive confusion.

Diagram-based explanations

Mermaid.js integration to render data-structure and algorithm visuals inside the feedback loop, anchoring abstract concepts.

Audio-based feedback

Whisper-based models to let students give and receive feedback by voice — humanising the pair-programming experience.

Controlled learning-gains study

Pre/post assessments and difficulty-controlled comparisons to move from evidence of engagement to measured learning gains.

11 Research & Recognition

Peer-reviewed and validated by the global academic community.

The peer-reviewed deployment analysed 907 students and 14,958 submissions across 27 DSA problems; the figures throughout this site reflect the expanded post-publication dataset of 35,433 submissions.

ACM DIS

Beyond the Perfect Assistant: Provoking Learning with Flawed AI Partners

V. Balakrishnan, L. K. Kway, A. H. Sandeep, E. F. J. T. Wong, C. A. Ong, O. N. N. Fernando. “Beyond the Perfect Assistant: Provoking Learning with Flawed AI Partners,” DIS Companion ’26: Companion Publication of the 2026 ACM Designing Interactive Systems Conference, 13–17 June 2026, Singapore, pp. 1–5.

2026

NIE RPIC

On the Evolution of Pair Programming in AI-Scaffolded Learning

Accepted — NIE Redesigning Pedagogy International Conference 2026