Why legacy code is hard - and why AI helps
Most organisations run on yesterday’s software: sturdy, revenue-critical, stubborn. Over years, sometimes decades, teams ship hotfixes, swap tooling, merge products, separate again, and hurry on. Knowledge evaporates. Documentation drifts. Context takes up residence in people’s heads and nowhere else. One day, a “simple” change lands on the backlog, and the room goes quiet. Even routine edits feel like defusing a bomb.
AI changes the starting position. Instead of months of slow reverse-engineering, you can assemble a navigable picture of how the system behaves today, sooner: what calls what, who writes where, why that edge case exists. Not magic. Not autopilot. A faster, clearer understanding that lets humans change things safely.
What AI actually does here (and what it doesn’t)
Useful: it spots patterns in vast codebases, turns structure into maps, writes plain-language explanations, fetches the right snippets on demand, suggests tests from observed logic, and keeps docs aligned with the code.
Not useful as a replacement for engineers: outputs still need human verification, design judgment, and domain context. Think power tools for understanding, not “auto-modernise my mainframe.”
Core capabilities, in detail
- Automated mapping & discovery
Goal: transform sprawl into structure.
What happens: parsers and analysers produce graphs: modules, packages, callers↔callees, data flows, external integrations, scheduled jobs. Click from a table to its writers; from an API endpoint to all the work it triggers downstream.
Why it matters: an early whole-system view shrinks guesswork and accelerates planning (see the call-graph sketch after this list).
- Plain-language summarisation
Goal: explain unfamiliar code in clean, accurate prose.
What happens: AI reads functions, classes, configuration, and optionally commit history to produce crisp descriptions: purpose, inputs/outputs, side effects, and caveats.
Why it matters: new engineers land faster; experienced ones recover intent; conversations shift from “what is this?” to “how should we change it?”
- Business rule extraction
Goal: surface the rules the system actually enforces.
What happens: logic that looks like policy (eligibility thresholds, proration, fees, exception handling) gets pulled into human-readable lists tied to specific files and lines.
Why it matters: stakeholders see policy as implemented, enabling safer changes and better audits.
- Impact analysis
Goal: understand ripple effects before you move a single line.
What happens: given a function, module, schema, or message contract, AI enumerates probable callers/callees, affected tables, jobs, and integration points, complete with source links.
Why it matters: fewer surprises during refactors and migrations; clearer, smaller, safer change plans.
- Test case generation
Goal: build a safety net first.
What happens: suggested unit/integration tests based on branches, boundaries, and typical data paths. Engineers curate, complete, and add domain-specific scenarios.
Why it matters: you lock in current behaviour so regressions shout immediately when you start extracting or rewriting.
- Documentation refresh
Goal: make docs trustworthy again.
What happens: current structure and summaries become living documentation: module overviews, sequence descriptions, API notes, tables of readers/writers, and “how to change X” guides, each pointing back to exact source lines.
Why it matters: docs become reviewable, versioned, and less likely to drift from reality.
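To make the mapping capability concrete, here is a minimal call-graph sketch for a Python codebase, using only the standard library’s ast module. It is an illustration under simplifying assumptions (direct calls by bare name, no cross-file resolution), not a product; mixed estates need language-specific parsers and proper symbol resolution.

```python
# Minimal call-graph sketch using Python's standard-library ast module.
# Assumptions: direct calls by bare name only; no cross-file resolution.
import ast
from collections import defaultdict
from pathlib import Path

def build_call_graph(root: Path) -> dict[str, set[str]]:
    """Map each function (file:name) to the names it appears to call."""
    graph: dict[str, set[str]] = defaultdict(set)
    for path in root.rglob("*.py"):
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                caller = f"{path}:{node.name}"
                for child in ast.walk(node):
                    # Record simple `foo(...)` calls; attribute calls and
                    # dynamic dispatch need a real analyser.
                    if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                        graph[caller].add(child.func.id)
    return graph

if __name__ == "__main__":
    for caller, callees in sorted(build_call_graph(Path("src")).items()):
        print(caller, "->", ", ".join(sorted(callees)))
```

The same shape scales up to the clickable maps described above: nodes become modules, tables, and jobs; edges become calls, writes, and triggers.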
Where this shines
• Monolith → services, where seams must be cut without breaking behaviour.
• Batch/mainframe estates with intricate jobs, schedules, and data flows.
• High-risk migrations that demand a verified understanding before any lift-and-shift.
• High turnover or looming retirements, when institutional knowledge is thinning.
Known limits and guardrails
Accuracy: AI can be confidently wrong; engineers and SMEs must review.
Scale: giant repositories need indexing, and copying and pasting into a chatbot won’t scale.
Security: analyse proprietary code in trusted environments; follow IP/data policies.
Traceability: every summary and rule should link to files/lines, commits, schemas.
Change control: treat AI outputs (docs/tests) like code: reviewed, versioned, auditable.
A practical, repeatable implementation plan
Phase 0 - Preparation (1–2 weeks)
• Pick a bounded domain (module, service boundary, batch flow).
• Define success signals: faster answers to key questions; baseline tests for top flows; an extraction plan for one seam.
• Confirm access: source control, builds, schemas, job schedules.
Phase 1 - Index & map (1–3 weeks)
• Parse and index source; build dependency/call/data-flow graphs (a rough indexing sketch follows this phase).
• Pull in schemas, ETL specs, job schedules, config.
• Produce a high-level system map and a top-10 flows list by business criticality.
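As a rough illustration of this phase’s indexing step, the sketch below scans source files for SQL statements and records which file:line reads or writes each table. The regexes and the .py filter are simplifying assumptions; a production indexer would use real SQL and language parsers rather than pattern matching.

```python
# Crude reader/writer index for database tables, built by regex-scanning
# source. Illustrative only: regexes miss dynamic SQL, ORMs, stored procs.
import re
from collections import defaultdict
from pathlib import Path

WRITE = re.compile(r"\b(INSERT\s+INTO|UPDATE|DELETE\s+FROM)\s+(\w+)", re.IGNORECASE)
READ = re.compile(r"\bFROM\s+(\w+)", re.IGNORECASE)  # also catches DELETE FROM; fine for a sketch

def index_table_access(root: Path) -> dict[str, dict[str, list[str]]]:
    """table -> {"readers": ["file:line", ...], "writers": [...]}"""
    index: dict[str, dict[str, list[str]]] = defaultdict(
        lambda: {"readers": [], "writers": []}
    )
    for path in root.rglob("*.py"):
        for lineno, line in enumerate(path.read_text().splitlines(), start=1):
            for _, table in WRITE.findall(line):
                index[table]["writers"].append(f"{path}:{lineno}")
            for table in READ.findall(line):
                index[table]["readers"].append(f"{path}:{lineno}")
    return index
```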
Phase 2 - Explain & verify (2–4 weeks)
• Generate summaries for target modules/functions and high-risk areas.
• Extract candidate business rules with code citations.
• SMEs review and correct; maintain a “known-wrong” set to improve future runs (a minimal record format is sketched below).
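One lightweight way to keep that review traceable is to store each candidate rule with its citation and an SME verdict. The record below is a sketch; the field names and example values are assumptions, and the shape is the point: every rule carries a citation and a verdict that feeds back into future runs.

```python
# Hypothetical record for an extracted business rule under SME review.
from dataclasses import dataclass

@dataclass
class ExtractedRule:
    text: str                 # plain-language statement of the rule
    citation: str             # file:line range backing the claim
    verdict: str = "pending"  # pending | confirmed | known-wrong
    note: str = ""            # SME correction, kept for future runs

rules = [ExtractedRule(
    text="Orders above 10,000 require manual approval",  # illustrative
    citation="billing/approval.py:42-57",                 # illustrative
)]
rules[0].verdict = "confirmed"
known_wrong = [r for r in rules if r.verdict == "known-wrong"]
```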
Phase 3 - Tests & docs (2–4 weeks)
• Convert verified behaviour into unit/integration tests, boundaries and negatives included (sketched after this phase).
• Create living docs with links to source; wire docs/tests into CI so drift is visible.
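These are characterisation tests: they pin down what the code does today, right or wrong. A hedged pytest sketch, where calculate_premium and all expected values are hypothetical stand-ins for outputs observed from the real system:

```python
# Characterisation-test sketch: lock in current behaviour before any
# refactor. The module, function, and expected values are hypothetical;
# real expectations come from observing the running system, not a spec.
import pytest
from premiums import calculate_premium  # hypothetical legacy module

@pytest.mark.parametrize("age, coverage, expected", [
    (25, 100_000, 412.50),   # typical path (observed output, assumed here)
    (17, 100_000, None),     # below minimum age: current code returns None
    (65, 0, 0.0),            # zero-coverage boundary
])
def test_calculate_premium_matches_current_behaviour(age, coverage, expected):
    # Wrong-but-current behaviour gets pinned too, then flagged for SMEs,
    # rather than being silently "fixed" mid-refactor.
    assert calculate_premium(age, coverage) == expected
```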
Phase 4 - Modernise safely (ongoing)
• Use maps + tests to refactor, extract seams, or migrate components.
• Track regressions, lead time, and mean time to diagnose.
• Iterate: each slice feeds new knowledge back into the index and docs.
Roles and responsibilities
Tech lead/architect: selects domains, defines seams, sets guardrails.
Senior engineers: verify summaries, curate tests, implement changes.
Domain SMEs/product owners: confirm rules and intent.
Platform/DevEx: maintain indexing pipelines, CI hooks, secure compute.
Compliance/security: oversee data handling, access, and audit trails.
What “good” looks like (signals, not vanity metrics)
Mean time to understanding drops: “what breaks if we change X?” is answered quickly.
Critical flows gain executable tests and up-to-date docs.
Defects are contained earlier because tests reflect reality.
Confidence increases: teams take smaller steps and avoid code freezes.
Docs are trusted because they link to the source and get PR-reviewed.
Architecture pattern (tool-agnostic)
Ingestion & parsing → Structural index (ASTs, calls, deps, lineage, endpoints) →
Semantic index (embeddings for code/comments/diagrams) →
Query & reasoning (combine structural + semantic retrieval; sketched after this pipeline) →
Generation (summaries, rules, impacts, tests, diagrams) →
Human review (approve/correct/reject; feed corrections back) →
Publication & CI hooks (docs in repo; tests in CI; drift signals raised).
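A toy sketch of that query & reasoning stage, combining exact structural facts with fuzzy semantic retrieval. Here embed() is a placeholder for a real embedding model, and the two indexes are assumed to come from the earlier stages:

```python
# Toy retrieval layer: semantic ranking picks likely-relevant code, then
# the structural index supplies exact facts about it for generation.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: a real system calls an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    return rng.standard_normal(64)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve_context(question: str,
                     call_graph: dict[str, set[str]],
                     snippet_vectors: dict[str, np.ndarray],
                     k: int = 3) -> list[str]:
    """Combine semantic similarity with exact structural neighbours."""
    q = embed(question)
    # Semantic: rank indexed snippets by similarity to the question.
    top = sorted(snippet_vectors,
                 key=lambda s: cosine(q, snippet_vectors[s]),
                 reverse=True)[:k]
    # Structural: attach exact call-graph facts for the retrieved code.
    return [f"{fn} calls {sorted(call_graph.get(fn, set()))}" for fn in top]
```

The design choice matters: the structural index keeps answers grounded in facts the generator cannot hallucinate, while semantic retrieval handles the fuzziness of natural-language questions.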
Example prompts and checks
Mapping: “List all functions that write to Orders.TotalAmount; include file paths and line ranges.”
Rules: “Summarise discount rules in DiscountService; cite exact conditionals.”
Impact: “If Customer.Status becomes AccountState, what code paths and queries require review?”
Tests: “Propose unit tests for CalculatePremium with edge cases and expected assertions.”
Docs: “Describe PaymentCapture from API call to DB commit; link to source.”
Verification checklist: Does the answer cite files/lines? Do tests pass now? Did an SME confirm domain intent? What edge cases are missing? (A minimal citation check is sketched below.)
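Part of that checklist automates well. A small sketch that confirms every file:line citation in an AI answer actually resolves before a human spends review time on it; the path pattern assumes Python-style files and should be adjusted per estate:

```python
# Pre-review gate: reject AI answers whose citations don't resolve.
import re
from pathlib import Path

# Matches citations like "billing/approval.py:42". Assumption: Python
# paths; widen the pattern for other languages.
CITATION = re.compile(r"(?P<path>[\w./-]+\.py):(?P<line>\d+)")

def unresolved_citations(answer: str, repo: Path) -> list[str]:
    """Return every citation in `answer` that does NOT point at a real
    file and line within `repo`."""
    bad = []
    for m in CITATION.finditer(answer):
        file = repo / m["path"]
        if not file.is_file() or int(m["line"]) > len(file.read_text().splitlines()):
            bad.append(m.group(0))
    return bad

# Usage: fail fast, then hand the surviving answer to a human reviewer.
# assert not unresolved_citations(ai_answer, Path("."))
```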
FAQs
Can AI rewrite our legacy code automatically?
It can suggest refactors and generate adapters or boilerplate, but safe modernisation still needs human design, review, and testing.
Do we need tests first?
No. Creating tests is an early and valuable output; humans confirm which tests represent the intended behaviour.
Is this safe for confidential code?
Yes, when analysis runs in secure environments under your policies. Avoid sending proprietary source to ungoverned services.
Will this replace architects or senior engineers?
No. It compresses discovery time, allowing experts to focus on decisions and design.
Summary
AI won’t modernise your system for you. It will, reliably, help you see the system you actually have, explain how it behaves, and change it with a safety net. That’s the difference between risky guesses and deliberate improvement.