What We've Learned

Practical lessons from training enterprise engineering teams to work with AI.

NEW

Why Tool Proficiency Is Not AI Transformation

The enterprise AI training market is selling stage one of a four-stage journey. Here's what the other three stages look like — and why they're the only ones that matter.

READ →

NEW

The Economics of AI Transformation: Build vs Buy vs Train

Every enterprise faces the same decision: hire an AI team, outsource to consultants, or transform the engineers you already have. The math overwhelmingly favors one option.

READ →

01

From Zero to Power User: A Training-Led Approach to AI Adoption

The four-stage methodology we use to take engineering teams from first AI interaction to full autonomy. How it works, what each stage covers, and how we measure it.

18 MIN READ →

02

Power Users vs Casual Users: What Actually Makes the Difference

The specific practices that separate developers who get 2-3x productivity gains from those who barely notice AI is there. It is a workflow change, not a tool change.

12 MIN READ →

03

Measuring AI Transformation: Why DORA Metrics Matter

Why "are we using AI?" is the wrong question. How to measure whether your engineering team is actually getting better, using the same metrics elite teams track.

11 MIN READ →

04

Case Study: Training 150 Engineers at a Middle East Food Franchise Giant

How a 50,000-employee restaurant franchise trained 150 engineers from its 200-member engineering team to move from zero AI adoption to hands-free coding with Claude Code in 12 weeks.

15 MIN READ →

05

Case Study: Modernizing 44 Legacy Codebases at an Indian Enterprise Software Company

How a 25-year-old DMS/SFA provider with 1 million lines of Java trained their team to use Claude Code for legacy modernization, cutting incidents by 80%.

16 MIN READ →

From Zero to Power User: A Training-Led Approach to AI Adoption

March 2026 · Timo Team · 18 min read

The shift nobody talks about

AI coding tools started as autocomplete. Suggest a few lines. Developer accepts or rejects. That is not how productive teams use them anymore.

Today, experienced developers treat AI as a development partner. They describe what they want in natural language. The AI generates the implementation. The developer reviews, adjusts, and ships. The developer's primary job shifts from writing code to directing and validating code.

Three capabilities make this possible:

| Capability | What It Does | How Power Users Use It |
| --- | --- | --- |
| Inline Completions | Suggests code as you type, line by line or block by block | Boilerplate, utility functions, API handlers. 40-50% less typing on routine code. |
| AI Chat | Conversational AI inside the IDE — explain, refactor, debug, generate | Code comprehension, test generation, debugging. Used 20-30 times per day by power users. |
| Agent Mode | Autonomous multi-step execution — AI plans, edits multiple files, runs terminal commands, iterates on errors | Full feature implementation from a single prompt. Creates files, writes code, runs tests, fixes failures. |

This is a fundamental change in what it means to be a productive engineer. The question is not "can you write code?" It is "can you direct an AI agent to produce correct, tested, production-ready code?"

What a power user's day actually looks like

This is the workflow that productive teams have converged on. It is not theoretical. We see this across every enterprise engagement we run.

Morning: Pick up a ticket

Developer reads a ticket or feature request. Opens AI chat: "Here's the requirement. Suggest an implementation approach." The AI outlines the approach — files to create, functions to write, tests needed. Developer reviews, adjusts, then says: "Proceed."

Building: TODO-driven development

Developer writes TODO comments describing what each piece should do. The AI generates the implementation for each TODO. Developer reviews each block, accepts or modifies. For larger features, agent mode handles the entire flow — creating files, editing imports, updating configs.

Testing: AI-generated test suites

"Generate unit tests for this module covering edge cases." The AI produces test files, developer reviews coverage. "This test is failing. Here's the error. Fix it." The AI debugs its own output. Testing and building happen in the same flow instead of as separate manual steps.
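To illustrate the output (the `slugify` utility here is invented), a "generate unit tests covering edge cases" prompt typically yields a suite along these lines, which the developer reviews for coverage rather than writing by hand:

```python
import unittest

# Invented module-under-test, standing in for a real utility.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())

# The kind of edge-case suite an AI test-generation prompt produces.
class TestSlugify(unittest.TestCase):
    def test_basic(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_empty_string(self):
        self.assertEqual(slugify(""), "")

    def test_collapses_whitespace(self):
        self.assertEqual(slugify("  Mixed   Case  Title "), "mixed-case-title")

unittest.main(argv=["tests"], exit=False)
```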

Review and ship

AI generates the PR description from the diff. Reviewer uses AI to review the PR: "Summarize changes. Flag any security concerns." Merge triggers the CI/CD pipeline automatically. The entire cycle from ticket to production is AI-assisted at every step.

The four-stage developer journey

Every developer we train progresses through four stages. Each stage is assessed before moving to the next. Nobody stays at the basics. This is the methodology we use across all enterprise engagements.

Stage 1: Foundations — "I can use AI tools"

What developers learn:

Install and configure AI tools in their IDE. Master inline completions — the Tab-accept workflow. Use AI chat to explain unfamiliar code in their own repositories. Refactor functions using chat suggestions. Submit their first pull requests with AI assistance.

What developers do (not watch):

Write functions using inline completions on their actual codebase. Explain unfamiliar code blocks using AI chat. Refactor a complex function with AI guidance. Submit pull requests. Everything happens on their real code, not sandbox exercises.

Assessment: Practical evaluation. Can they use the tools independently? If yes, they move on.

Stage 2: Productivity — "I am faster with AI"

What developers learn:

Agent mode — give the AI a task, it plans and executes across multiple files. TODO-driven development — write TODO scaffolding, AI generates implementation. Test generation — produce test suites for untested modules. Code review with AI — review real PRs, identify issues, suggest fixes. CI/CD fundamentals with AI assistance.

What developers do:

Convert a real ticket into TODO comments and have the AI generate working code. Generate a test suite for an untested module. Review a colleague's PR using AI and provide structured feedback. Build their first CI/CD workflow with AI assistance.

Assessment: Are they measurably faster? Are they using agent mode and chat, not just autocomplete? Measured, not assumed.

Stage 3: Mastery — "I build with AI"

What developers learn:

Build complete features end-to-end using agent mode. Configure quality gates, security scanning, and branch protection with AI. Use AI for debugging production issues — log analysis, root cause identification, fix generation. Write custom instructions for their team's coding standards.

What developers do:

Build a complete feature from ticket to production using AI at every step. Configure quality gates on their repository. Debug a real failure using AI chat. Create a custom instructions file for their team's stack and conventions.

Assessment: Can they build a feature end-to-end with AI? Can they configure project-level AI settings? Higher bar than Stage 2.

Stage 4: Power User — "I teach others"

What developers learn:

Train-the-trainer — teach AI workflows to new team members. Advanced prompt engineering for complex code generation. Custom instructions at organization and repository level. Integration with project management tools and team workflows.

What developers do:

Deliver a demo to their team on a workflow they have mastered. Write the custom instructions file their team will use going forward. Complete a full feature build (ticket to deploy) entirely with AI assistance. Become the person others go to when they get stuck.

Assessment: Can they train someone else? This is the test that matters. If a developer can teach the workflow, they own it.

How we measure success

We do not measure success by training hours completed or satisfaction surveys. We measure it with DORA metrics, tracked from day one of every engagement.

| Metric | What It Measures | Why It Matters |
| --- | --- | --- |
| Deployment Frequency | How often code reaches production | Higher frequency = smaller changes = lower risk |
| Lead Time for Changes | Time from first commit to production | Shorter lead time = faster delivery |
| Mean Time to Recovery | Time to restore service after a failure | Faster recovery = more resilient team |
| Change Failure Rate | % of deployments causing failures | Lower failure rate = higher quality |

We capture baselines before training begins. This is the "before photograph." Without it, there is no way to prove improvement. Post-engagement, all four metrics are tracked continuously.

Across our engagements, we target and track: 25% productivity improvement, 15% efficiency gains, and 80% incident reduction. These are measured against baselines, not training hours.

Why training-led beats tool-led

Most organizations approach AI adoption by buying licenses and hoping developers figure it out. Some run a workshop. Maybe a lunch-and-learn. Then they wonder why adoption stalls at 20%.

The problem is not the tool. The problem is the workflow. AI tools are powerful, but they require a fundamentally different way of working. Writing TODO comments instead of code. Reviewing AI output instead of writing from scratch. Using agent mode for multi-file changes instead of editing one file at a time.

Nobody learns this from a one-hour workshop. They learn it by doing it on their own codebase, with someone who has done it before standing next to them.

That is what Timo does. We embed into your engineering team for 8-16 weeks. We take every developer through all four stages. We set up the measurement infrastructure so you can see the improvement. And when we leave, your team keeps doing it without us.

The methodology works because it is built from real engagements, not theory. Every stage, every assessment, every metric comes from watching hundreds of developers make this transition. We know where they get stuck. We know what accelerates them. We know what makes the change permanent.

By the end of an engagement, 80%+ of developers are actively using AI daily. Not because they were told to. Because the workflow is better.

Power Users vs Casual Users: What Actually Makes the Difference

March 2026 · Timo Team · 12 min read

Every engineering team we work with has the same distribution. A few developers take to AI tools immediately and become dramatically more productive. Most use the tools occasionally, get some value, but never change how they work. The gap between these two groups is enormous — and it has nothing to do with intelligence, seniority, or technical skill.

It is a workflow difference. Here is what separates them.

What power users actually do

We have trained hundreds of engineers across enterprise environments. The developers who get the most value share six specific practices.

1. Custom instructions per project

Power users write project-specific instructions that tell the AI how their team writes code. Coding standards, framework preferences, naming conventions, architectural patterns. These instruction files live in the repository, so every developer on the team gets consistent output. New team members automatically get project-specific suggestions from day one.
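As an illustration (the contents below are invented, not a template; the filename depends on the tool — GitHub Copilot reads `.github/copilot-instructions.md`, Claude Code reads `CLAUDE.md`), a minimal project instructions file might look like:

```markdown
<!-- Example instructions file; every rule here is illustrative. -->
# Project conventions

- TypeScript, strict mode. No `any`; prefer explicit return types.
- REST handlers live in `src/api/`, one file per resource.
- Error handling goes through the shared `AppError` class, never raw `throw new Error`.
- Every new module ships with a matching `*.spec.ts` test file.
- Naming: camelCase for functions, PascalCase for types, kebab-case for filenames.
```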

Casual users skip this entirely. They get generic output instead of project-specific code. Then they complain the AI "doesn't understand our codebase."

2. Agent mode for multi-file changes

Power users give the AI a task and let it plan and execute across multiple files. "Build a user authentication flow with JWT tokens and tests." The AI creates files, writes code, runs tests, fixes failures. One prompt, multiple files changed, tests passing.

Casual users edit one file at a time. They use AI like a faster keyboard. They miss the biggest productivity gain these tools offer.

3. Critical review of every suggestion

This one is counterintuitive. Power users are more skeptical of AI output, not less. They accept suggestions line by line, not whole blocks. They read what the AI wrote. They catch errors before they become bugs.

Casual users accept inline suggestions without reading them. They hit Tab on everything. They end up with code they do not understand and bugs they cannot diagnose.

4. Context priming

Before asking for code, power users describe the codebase structure. They explain what exists, what the constraints are, what the conventions are. They give the AI the same context a new team member would need.

Casual users skip context entirely. They ask for code in a vacuum and get generic results. Then they conclude AI tools are not useful for "real" codebases.

5. Chat for debugging before manual inspection

When something breaks, power users open AI chat first. "Why does this return null when input is empty?" "This test is failing with this error. Explain why and suggest a fix." They use AI to narrow down the problem before they start adding log statements.

Casual users never use AI for debugging. They stick to manual inspection — the same process they have used for years. They leave the most time-saving capability on the table.

6. Task chaining

Power users chain tasks in the same session, building on context. "Refactor this function for performance." Then: "Now add tests for the refactored version." Then: "Generate documentation." Each task builds on the previous one. The AI retains context across the chain.

Casual users start fresh every time. They lose context between interactions. Every question is isolated. They never build momentum.

The output difference

Teams that adopt the power user workflow report completing features 2-3x faster while maintaining or improving code quality. That is not a typo. Two to three times faster.

How? Because testing and review happen in the same AI-assisted flow instead of as separate manual steps. A developer who uses AI for the full cycle — understand ticket, plan implementation, write code, generate tests, review changes, create PR description — compresses hours of work into a continuous flow.

The quality improvement comes from consistency. AI-generated tests cover edge cases developers skip when they are tired or rushed. AI-assisted review catches patterns that human review misses. Custom instructions enforce standards that code review alone cannot.

It is a workflow change, not a tool change

This is the most important insight from our work: the difference between power users and casual users is not about the tool. Both groups have the same licenses. The same IDE. The same AI capabilities.

The difference is workflow. Power users changed how they work. They write TODO comments instead of code. They review AI output instead of writing from scratch. They chain tasks instead of starting fresh. They prime context instead of asking in a vacuum.

Casual users bolted AI onto their existing workflow. They use it where it fits without changing anything. That gives them maybe 10-15% improvement. Not bad, but nowhere near what is possible.

The role of structured ticketing

One thing we have noticed across engagements: teams with well-structured tickets adopt AI workflows faster.

When a ticket clearly describes the requirement, acceptance criteria, and constraints, a developer can hand that directly to an AI agent. "Here's the ticket. Suggest an implementation approach." The AI has everything it needs.

When a ticket says "fix the login bug" with no context, the developer has to do all the context-gathering manually before AI can help. The bottleneck is not the AI. It is the input.

This is why we work on ticketing practices as part of our engagements. Better tickets lead to better AI-assisted development, which leads to faster delivery. The upstream investment pays off at every step downstream.

The role of persistent context

The other accelerator is persistent context. Custom instructions, project documentation, memory files that carry information across sessions. When AI retains context about your codebase, architecture decisions, and team conventions, every interaction starts from a higher baseline.

Without persistent context, every AI interaction starts from zero. The developer explains the same things repeatedly. The AI makes the same mistakes repeatedly. It is like working with a new junior developer every day who has no memory of yesterday.

With persistent context, the AI remembers. It knows your conventions. It knows your architecture. It knows what was tried before and why it did not work. That compounding knowledge is what turns a useful tool into a reliable development partner.
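A sketch of what one persisted entry can capture (the format and details are assumptions for illustration; tools like Claude Code reload markdown memory files like this at the start of each session):

```markdown
<!-- Illustrative memory entry; the structure is an assumption, not a spec. -->
## 2026-03-12 — payments retry logic
- Decision: retries use exponential backoff, max 3 attempts.
- Tried and rejected: queue-based retry — added latency the ordering flow can't absorb.
- Convention: payment errors map to internal failure codes, never raw HTTP statuses.
```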

How to make the transition

If your team is mostly casual users, here is what actually moves the needle:

Start with custom instructions. Write one file that describes your team's coding standards. Put it in the repository. This single change improves every AI suggestion for every developer on the team.

Require agent mode for any change touching 3+ files. Force the workflow change. Developers who try agent mode once usually do not go back.

Make AI-assisted review the default. Before submitting a PR, run the diff through AI: "Review this for bugs, security issues, and style violations." Build it into the process, not as an optional step.

Track who is using what. AI adoption dashboards show daily active users, suggestion acceptance rates, agent mode sessions. If someone is only using autocomplete, they need coaching on the full workflow, not more license seats.

The transition from casual user to power user takes about 4-6 weeks with structured guidance. Without it, most developers plateau at casual usage and stay there indefinitely.

Measuring AI Transformation: Why DORA Metrics Matter

March 2026 · Timo Team · 11 min read

The wrong question

"Are our developers using AI?" Every engineering leader asks this first. It is the wrong question.

Usage alone tells you nothing. A developer can use AI every day and produce the same output at the same quality at the same speed. A team can have 100% adoption and zero improvement in delivery metrics.

The right question is: "Are we getting better?"

Better means shipping faster. Recovering from failures quicker. Breaking things less often. Delivering more frequently. Those are measurable outcomes, and there is a well-established framework for tracking them.

The four DORA metrics, explained plainly

DORA (DevOps Research and Assessment) identified four metrics that predict software delivery performance. Elite teams score high on all four. Struggling teams score low on all four. There is a strong correlation between these metrics and business outcomes — revenue, customer satisfaction, employee retention.

Here is what they are and why each one matters for AI transformation specifically.

| Metric | What It Measures | Why It Matters for AI Adoption |
| --- | --- | --- |
| Deployment Frequency | How often code reaches production | AI-assisted development should produce smaller, more frequent deployments. If frequency does not increase, the workflow change is not happening. |
| Lead Time for Changes | Time from first commit to production deployment | AI should compress development time. If lead time stays flat, developers are not using AI for the full cycle — just parts of it. |
| Mean Time to Recovery (MTTR) | Time to restore service after a failure | AI-assisted debugging should cut recovery time. Developers who use AI for log analysis and root cause identification recover faster. |
| Change Failure Rate | % of deployments that cause production failures | AI-generated tests and AI-assisted review should catch issues before production. If failure rate does not drop, the quality gates are not working. |

These four metrics work as a system. You cannot game them individually. If you deploy more frequently but your failure rate increases, that shows up. If your lead time drops but recovery time increases, that shows up. The four together give an honest picture of engineering health.

Establish baselines before you start

This is non-negotiable. Before any training, any tool rollout, any workflow change — capture your current state.

We call this the "before photograph." Without it, you have no way to prove improvement. You will be stuck saying "it feels faster" in leadership reviews instead of showing a chart that says "deployment frequency increased 4x."

How to capture baselines:

Pull deployment history from your existing CI/CD system. Jenkins, GitLab CI, Azure DevOps — whatever you use. Extract deployment frequency, build duration, pass/fail rates. Capture PR merge times from your version control system. Record incident response times from your monitoring tools.

This does not need to be perfect. It needs to exist. A rough baseline is infinitely better than no baseline.
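As a minimal sketch of the arithmetic (the records and field names below are invented; real data would come from your CI system's export), a rough baseline takes only a few lines:

```python
from datetime import date

# Invented deployment records standing in for a CI export
# (Jenkins, GitLab CI, and Azure DevOps all expose equivalents).
deploys = [
    {"day": date(2026, 3, 2),  "failed": False, "lead_time_days": 4.0},
    {"day": date(2026, 3, 9),  "failed": True,  "lead_time_days": 5.5},
    {"day": date(2026, 3, 16), "failed": False, "lead_time_days": 3.5},
]

span_weeks = (max(d["day"] for d in deploys) - min(d["day"] for d in deploys)).days / 7
deployment_frequency = len(deploys) / span_weeks              # deploys per week
change_failure_rate = sum(d["failed"] for d in deploys) / len(deploys)
avg_lead_time_days = sum(d["lead_time_days"] for d in deploys) / len(deploys)

print(deployment_frequency, round(change_failure_rate, 2), round(avg_lead_time_days, 1))
# → 1.5 0.33 4.3
```

Two weeks of history and three numbers is already a usable "before photograph."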

We capture baselines in the first two weeks of every engagement. It takes effort. Teams resist it because they want to start training immediately. But we do not skip it. Ever. The baseline is what makes everything after it credible.

What a dashboard should track

Three categories of data give you full visibility across the development lifecycle.

Operational: "Where is my code right now?"

Pipeline Status Board — Every repository, every environment. Green (passed), red (failed), yellow (in progress). This is the daily standup view.

Pull Request Flow — Open PRs, average time-to-review, time-to-merge, blocked PRs. Shows bottlenecks before they become problems.

Environment Status — Which version is deployed in dev, staging, production for each repository. Release managers and QA leads check this constantly.

Quality Gate Status — Code analysis scores, security findings, test coverage percentage per repository. Shows whether quality is improving or degrading.

Build Health — Build success rate over time, average build duration, flaky test tracking. Infrastructure reliability at a glance.

Engineering Health: "Are we getting better?"

This is the DORA dashboard. All four metrics, tracked continuously, compared against baselines.

Views that matter:

Executive summary — updated weekly, one page, four numbers with trend arrows. Per-repository detail — updated daily, shows which repositories are improving and which are not. Trend analysis — 4-week rolling window, shows the trajectory. Team comparison — shows how different teams are progressing relative to each other.

The audience for this dashboard is engineering leadership. It answers the question they actually care about: is the investment working?

AI Adoption: "Is the tool being used effectively?"

This is where usage metrics belong — as a supporting indicator, not the primary measure.

| Metric | What It Tells You | Alert Threshold |
| --- | --- | --- |
| Daily Active Users | How many developers use AI tools each day | Flag if below 70% for 3 consecutive days |
| Suggestion Acceptance Rate | % of AI suggestions accepted by developers | Flag if team average below 25% |
| Agent Mode Sessions | Number of agent mode sessions per week | Power user indicator — low count means casual usage |
| Chat Conversations per Day | How often developers use AI chat | Engagement indicator |
| Usage by Technology | AI activity broken down by stack | Identifies adoption gaps by team or technology |

Target: 80%+ daily active usage by end of engagement. We hit this consistently when the four-stage training methodology is followed.

Real numbers from Timo engagements

These targets are not speculative. They come from active engagements where we track all of the above.

| Metric | Baseline | Week 6 | Week 12 |
| --- | --- | --- | --- |
| AI-assisted commits | 0% | 34% | 68% |
| Agent review compliance | — | 91% | 100% |
| Deployment frequency | 1x/week | 2x/week | 4x/week |
| Incident volume | baseline | -40% | -80% |

The key targets we track across engagements:

25% productivity improvement — measured by deployment frequency and lead time reduction.

15% efficiency gains — measured by time savings in development cycle.

80% incident reduction — measured by change failure rate and incident volume against baseline.

These are board-level numbers. They are what leadership wants to see, and they are what we commit to tracking from day one.

Why most organizations measure the wrong things

The most common AI adoption metrics we see organizations track:

— Number of AI licenses purchased
— Training hours completed
— Survey responses on satisfaction
— Number of AI-generated lines of code

None of these tell you whether your engineering team got better. You can buy 500 licenses, train everyone, get great survey scores, and have zero improvement in delivery metrics. We have seen it happen.

DORA metrics are the antidote. They are not AI-specific. They measure engineering performance. If AI adoption is working, DORA metrics improve. If they do not improve, the adoption is not working — regardless of what the usage dashboards say.

How to get started

If your organization is rolling out AI tools or considering it, do these things before you buy a single license:

1. Capture DORA baselines now. Pull your deployment frequency, lead time, MTTR, and change failure rate from your existing systems. This takes a week. Do it.

2. Set up a dashboard before training starts. Even a basic one. Four numbers, updated weekly. If the numbers are not visible, they will not be tracked.

3. Define success criteria in terms of outcomes, not usage. "80% of developers using AI daily" is an input metric. "4x deployment frequency" is an outcome metric. Both matter, but the outcome is what justifies the investment.

4. Review weekly. DORA metrics reviewed weekly create accountability. Monthly reviews let problems fester for too long. Weekly cadence catches issues while they are still small.

The organizations that measure well, transform well. The ones that measure usage and call it success usually end up wondering where the ROI went.

Training 150 Engineers at a Middle East Food Franchise Giant

March 2026 · Timo Team · 15 min read

The client

One of the largest food franchise operators in the Middle East. 50,000+ employees across multiple countries. Over 2,000 restaurants. A 200-member engineering team building and maintaining the digital infrastructure that runs ordering, delivery, loyalty, payments, and kitchen operations across every brand in the portfolio.

They had a problem that most large enterprises share: their engineering team was productive, but not productive enough for what the business needed. Feature delivery took too long. Legacy mobile apps were expensive to maintain. Their iOS codebases were written in patterns from 2016, and every change required navigating thousands of lines of undocumented code. They were migrating from Bitbucket and Jenkins to GitHub Enterprise and needed to upskill their entire team simultaneously.

The problem

The engineering leadership had already procured GitHub Copilot Enterprise licenses for the full team. But license procurement is not adoption. Three months after rollout, usage data told a familiar story:

  • Most developers used Copilot as autocomplete — accepting inline suggestions without reading them
  • Almost nobody used Chat or Agent Mode
  • No custom instructions existed for any repository
  • No structured workflow for AI-assisted development
  • Zero change in DORA metrics since license activation

The tools were there. The methodology was not. This is the gap that separates organizations that buy AI tools from organizations that transform with them.

What we did

We embedded into their engineering org for 12 weeks. Not as consultants who build things and leave. As trainers who teach their team to build things with AI and then get out of the way.

The engagement covered 150 engineers across 3 batches. We ran two parallel tracks:

Track 1: Migration as learning. The team was migrating 8 repositories from Bitbucket to GitHub Enterprise. Instead of having a migration team do it while developers watched, we made the migration itself the training exercise. Every developer migrated their own repositories using Claude Code. Converting Jenkins pipelines to GitHub Actions became an Agent Mode exercise. Writing post-migration validation tests became a test generation exercise. By the time all 8 repositories were migrated, every developer had hands-on experience with Claude Code's core capabilities — not from a classroom, but from real work on their own codebase.

Track 2: Daily development workflow. Once a repository was on GitHub, the developer's daily workflow changed. We introduced a structured loop:

```
# The daily development loop after TIMO training
$ claude --task "Implement loyalty points expiry for KSA region"
  Loading CLAUDE.md (project rules + coding standards)
  Loading memory/ (context from 28 prior sessions)
  Scope: src/loyalty/expiry/ + 4 related files
  Generating implementation...
  ✓ Implementation complete. 3 files changed.

$ claude --review task-1247
  Tests: 34/34 passing
  Rule violations: 0
  Security scan: clear
  ✓ Approved. Merged to main. Context persisted.
```

Engineers stopped writing boilerplate. They started describing intent in natural language, reviewing AI output through structured quality gates, and maintaining persistent context across sessions. Claude Code remembered their codebase patterns, their team's coding standards, and the decisions made in previous sessions. Every new task started with context, not from scratch.

The four-stage progression

We ran every engineer through four stages, assessed at each gate:

| Stage | What They Could Do | Assessment |
| --- | --- | --- |
| Foundations | Use Claude Code for code comprehension, inline completions, basic refactoring. Submit PRs on GitHub. | 5-question quiz, ≥3/5 |
| Productivity | Agent mode for multi-file changes. TODO-driven development. Test generation. AI-assisted code review. | 5-question quiz, ≥3/5 |
| Mastery | End-to-end feature builds. CI/CD pipeline authoring with AI. Custom instructions per repository. Production debugging. | 5-question quiz, ≥4/5 |
| Power User | Train-the-Trainer capability. Advanced prompt engineering. Multi-agent orchestration. Full autonomy. | 5-question quiz, ≥4/5 |

We identified 11 Train-the-Trainer champions across all 3 batches. These are the developers who sustain Claude Code adoption after the engagement ends. They run internal training, write custom instructions, and coach developers who fall behind. The capability stays because the people who maintain it are already on the team.

Custom instructions: the force multiplier

One of the highest-impact changes was setting up project-specific custom instructions for Claude Code. These are rules stored in the repository that tell Claude how this team writes code.

For the iOS team: "This is a Swift project using CocoaPods. Follow Swift API Design Guidelines. Use async/await, not completion handlers. All UI code uses UIKit, not SwiftUI."

For the backend team: "TypeScript strict mode. All functions must have JSDoc comments. No console.log in production code. Use the existing repository patterns for error handling."

Every developer on the team gets consistent Claude output. New team members get project-specific suggestions from day one. The AI behaves like a team member who has read the style guide, not a generic code generator.

The results

We tracked DORA metrics from day one. Baselines were captured from Jenkins build history and Bitbucket deployment records before migration. Post-training metrics were tracked continuously from GitHub Actions data.

| Metric | Before Training | Week 6 | Week 12 |
| --- | --- | --- | --- |
| AI-assisted commits | 0% | 38% | 72% |
| Deployment frequency | 1x/week | 2.5x/week | 4x/week |
| Lead time for changes | 4.2 days | 2.1 days | 1.1 days |
| Agent review compliance | — | 91% | 100% |
| Incident volume | baseline | -42% | -78% |
| Daily active Claude users | 0/200 | 112/200 | 168/200 (84%) |

Deployment frequency quadrupled. Lead time dropped from 4.2 days to 1.1 days. Incidents fell by 78%. And 84% of the 200-member engineering team was using Claude Code daily by week 12 — not because they were told to, but because the structured workflow was faster than their old one.

What actually changed

The numbers are the outcome. The actual change was behavioral.

Before: developers wrote every line manually, tested sporadically, and shipped when QA said they could. Code review was a bottleneck. Documentation did not exist. Every new developer spent weeks reading undocumented code to understand the system.

After: developers describe intent to Claude Code. Claude generates implementation following project-specific rules. Developers review through quality gates. Tests are generated alongside code, not after. Documentation is generated alongside code, not never. PRs include AI-generated descriptions. Reviewers use Claude to analyze diffs. The entire loop — ticket to production — runs through structured AI workflows.

The team did not just adopt a tool. They adopted a discipline. That discipline is what makes the change permanent. Tools can be abandoned. Disciplines — once embedded in how a team works — tend to stick.

The engagement in numbers

| Metric | Value |
| --- | --- |
| Engineers trained | 150 across 3 batches |
| Duration | 12 weeks |
| Repositories migrated | 8 (with full commit history) |
| CI/CD pipelines created | 32 (GitHub Actions) |
| Train-the-Trainer champions | 11 |
| Training hours delivered | 252 |
| Ongoing dependency on Timo | None |

The team operates independently on GitHub Enterprise with Claude Code. No ongoing consulting contract. No vendor dependency. The 11 TTT champions train new hires. The custom instructions evolve with the codebase. The methodology is theirs.

Modernizing 44 Legacy Codebases at an Indian Enterprise Software Company

March 2026 · Timo Team · 16 min read

The client

A 25-year-old enterprise software company headquartered in Chennai, India. They build Distribution Management Systems (DMS) and Sales Force Automation (SFA) platforms for FMCG companies. Their software runs at scale: 100,000+ DMS users, 40,000+ field sales reps, 93,000 distributors, and over 5 million retail stores reached across India, the Middle East, Africa, and Southeast Asia.

Their client list reads like a who's-who of consumer goods: 60+ blue-chip brands spanning FMCG, manufacturing, banking, telecom, and hospitality. They are a market leader with deep domain expertise built over two decades.

They also have a problem that comes with 25 years of success: technical debt at scale.

The problem

Here is what 25 years of enterprise software looks like under the hood:

| Dimension | Reality |
| --- | --- |
| Enterprise codebases | 44 separate Java deployments, one per major customer |
| Total code volume | ~1 million lines in the main monolith |
| Actual business logic | 200,000-300,000 lines (the rest is ceremony) |
| Dead code | 30-40% identified as unused |
| SaaS tenants | 30 on a single codebase |
| Support volume | 30,000 incidents + 20,000 requests per quarter |
| Documentation | Poor to non-existent |

The 44 enterprise codebases are the core challenge. Each major customer gets a bespoke deployment. Over the years, these deployments diverged. Some customers are running versions from 2010 that have never been upgraded. The codebases share 80%+ logic but are maintained independently. Knowledge lives in the heads of senior engineers, not in documentation. When someone leaves, context leaves with them.

The stack is 95% Java, with React and Angular frontends, iOS and Android mobile apps (some migrating to Flutter), MySQL and PostgreSQL databases, and a new TiDB data warehouse initiative. Most of it runs on AWS, with a few enterprise customers on Azure or on-premises infrastructure.

Support is entirely reactive. There is no proactive monitoring. When something breaks, a ticket appears in Freshdesk, and an engineer investigates from scratch. With 30,000 incidents per quarter, that is a lot of investigating from scratch.

What the board wanted

The CEO and board had committed to specific targets by September 2026:

  • 25% improvement in employee productivity
  • 15% improvement in engineering team efficiency
  • Incident volume reduced to 20% of current levels (30,000 → 6,000 per quarter)
  • SaaS platform fully transformed

These are not aspirational targets. They are board-committed KPIs with a 6-month deadline. The question was not whether to transform, but how to transform 44 codebases and an engineering team simultaneously without stopping delivery.

Our approach: three tracks, one methodology

We structured the engagement around three parallel tracks, all using the Timo methodology: training the team to use Claude Code on their own codebases, not building things for them.

Track 1: Support and Operations. The highest-impact, fastest-payback track. We trained the support engineering team to use Claude Code for incident triage and resolution. The workflow:

```
# AI-assisted incident triage trained on 44 codebases
$ claude --triage FRESH-28471
Classification: DMS distributor sync failure
Codebase: enterprise-dabur-v3
Root cause: Stale cache in DistributorSyncService
Similar incidents: 12 in last 30 days across 3 customers
Validated fix available: yes
Review gate: passed ✓
Fix deployed to staging. Regression tests passing.
```

The key insight: Claude Code's persistent memory means the AI accumulates knowledge about incident patterns across all 44 codebases. After the first month, the agent can identify root causes that would take a human engineer hours to trace through undocumented code. It cross-references incidents across customer deployments and identifies systemic issues, not just individual tickets.
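The cross-referencing step is conceptually simple: group incidents by a root-cause signature and flag any signature that recurs across multiple customer deployments. A hypothetical sketch (the `signature` and `codebase` keys, and the threshold, are assumptions for illustration, not the actual triage implementation):

```python
from collections import defaultdict

def find_systemic_issues(incidents, min_codebases=2):
    """Group incidents by root-cause signature and flag issues that
    recur across multiple customer deployments.

    `incidents` is a list of dicts with hypothetical keys:
    'signature' (e.g. a normalized stack-trace fingerprint) and
    'codebase' (the customer deployment it came from).
    """
    by_signature = defaultdict(list)
    for inc in incidents:
        by_signature[inc["signature"]].append(inc["codebase"])

    systemic = {}
    for sig, codebases in by_signature.items():
        distinct = set(codebases)
        if len(distinct) >= min_codebases:
            systemic[sig] = {
                "occurrences": len(codebases),
                "codebases": sorted(distinct),
            }
    return systemic
```

The value is in the grouping, not the algorithm: once incidents from 44 deployments land in one index, a fix validated for one customer becomes a candidate fix for every customer hitting the same signature.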

Track 2: Enterprise Platform. The legacy modernization track. 44 codebases, 1 million lines. The approach:

  1. Diagnose: Catalogue all 44 codebases. Run static analysis. Cluster by similarity. We found that 44 "separate" products were actually 5 canonical products with customer-specific overrides. The divergence was a maintenance problem, not a product problem.
  2. Clean: AI-led dead code removal. Claude Code analyzed each codebase, identified unused classes, unreachable methods, and redundant abstractions. 30-40% of the code was removed without changing any behavior. For the first time, engineers could see the actual system underneath the scaffolding.
  3. Document: Claude Code generated documentation for what remained. This was not generic Javadoc. Claude had persistent context about the business domain — DMS workflows, distributor hierarchies, FMCG-specific logic. The generated documentation explained business intent, not just code structure.
  4. Extract and rewrite: Separate the 200K lines of business logic from 800K lines of ceremony. Build a modernization blueprint. POC: one codebase rewritten in Python/FastAPI with Claude Code doing 70-80% of the implementation. The rewritten version: 7-10x smaller, functionally equivalent, fully tested.
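The similarity clustering in the Diagnose step can be sketched with a set-similarity pass: represent each codebase as a set of normalized artifacts (file hashes, say) and greedily merge codebases that overlap heavily. A hypothetical illustration, not the actual static-analysis tooling; the 0.8 threshold and file-hash representation are assumptions:

```python
def jaccard(a, b):
    """Similarity between two codebases, each represented as a set
    of normalized file hashes (a stand-in for real static analysis)."""
    return len(a & b) / len(a | b)

def cluster_codebases(codebases, threshold=0.8):
    """Greedy single-pass clustering: each codebase joins the first
    cluster whose representative it resembles closely enough,
    otherwise it starts a new cluster."""
    clusters = []  # list of (representative_file_set, member_names)
    for name, files in codebases.items():
        for rep_files, members in clusters:
            if jaccard(files, rep_files) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((files, [name]))
    return [members for _, members in clusters]
```

Applied to the 44 deployments, this kind of pass is what collapses "44 separate products" into a handful of canonical products plus customer-specific overrides.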

Track 3: SaaS Business. The 30-tenant platform needed a different approach. Here, the challenge was not legacy code but velocity — shipping features for 30 customers without breaking anything. We trained the SaaS team on Claude Code's structured workflow: ticket-driven AI development, auto-generated test suites covering all 30 tenant configurations, and AI-managed deployment pipelines with quality gates at every environment stage.

Training the team on Claude Code

Across all three tracks, the training followed the same four-stage progression. But the content was different for each track because the codebases were different.

For the support team, Foundations meant using Claude Code to comprehend unfamiliar code in customer-specific deployments. For the enterprise team, it meant navigating a 1M-line Java monolith. For the SaaS team, it meant understanding multi-tenant architecture and the blast radius of changes.

The progression was the same everywhere:

| Stage | Support Team | Enterprise Team | SaaS Team |
| --- | --- | --- | --- |
| Foundations | Code comprehension across customer codebases | Navigate 1M-line monolith with Claude | Understand multi-tenant impact analysis |
| Productivity | AI-assisted triage and fix generation | Dead code removal, doc generation | Feature development with AI agents |
| Mastery | Cross-codebase pattern recognition | Business logic extraction and rewrite | CI/CD pipeline authoring with AI |
| Power User | Predictive incident prevention | Full modernization workflow | Multi-agent orchestration for releases |

Custom instructions were critical. Each of the 44 codebases got its own instruction file telling Claude the specific patterns, naming conventions, and business logic for that customer's deployment. The SaaS platform got instructions covering all 30 tenant configurations. When Claude generates code or fixes for a specific customer, it follows that customer's rules — not generic Java best practices.

The metrics

We established DORA baselines in the first month by analyzing existing deployment records, incident logs, and commit history. Here is where things stood at the 3-month mark, with 6-month targets:

| Metric | Baseline | Month 3 | Month 6 Target |
| --- | --- | --- | --- |
| Employee productivity | baseline | +12% | +25% |
| Deployment frequency | 1x/week | 2.4x/week | 4x/week |
| Incident volume (quarterly) | 30,000 | 16,200 | ≤6,000 |
| Dead code removed | 0% | 34% | 40%+ |
| Codebases documented | 0/44 | 18/44 | 44/44 |
| Daily active Claude users | 0 | 61% of team | 80%+ |

The incident reduction is the headline number, but the documentation metric matters more long-term. 18 of 44 codebases now have AI-generated documentation that explains business intent, not just code structure. For a company where knowledge lived exclusively in people's heads for 25 years, this is transformational. When an engineer leaves, the knowledge does not leave with them anymore.

The compression effect

The POC rewrite of one enterprise codebase validated the compression thesis:

| Metric | Legacy (Java) | Rewrite (Python/FastAPI) |
| --- | --- | --- |
| Total lines of code | ~180,000 | ~22,000 |
| Business logic lines | ~45,000 | ~18,000 |
| Ceremony/boilerplate | ~135,000 (75%) | ~4,000 (18%) |
| Test coverage | 12% | 89% |
| Compression ratio | | 8.2x |

8.2x code compression. The rewritten version has 89% test coverage versus 12% in the legacy version. Claude Code wrote approximately 75% of the new implementation, with human engineers handling business logic decisions, edge cases, and domain-specific validation rules.
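The headline ratios follow directly from the line counts, which makes them easy to sanity-check:

```python
# Line counts from the POC comparison
legacy_total, rewrite_total = 180_000, 22_000
legacy_ceremony, rewrite_ceremony = 135_000, 4_000

compression = legacy_total / rewrite_total        # ~8.2x
legacy_share = legacy_ceremony / legacy_total     # 0.75, i.e. 75% ceremony
rewrite_share = rewrite_ceremony / rewrite_total  # ~0.18, i.e. 18% ceremony

print(f"{compression:.1f}x, {legacy_share:.0%} ceremony -> {rewrite_share:.0%}")
```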

The speed multiplier is even more dramatic: the traditional estimate for rewriting this codebase was 18-24 months with a 20-person team. The AI-assisted approach completed the POC in 6 weeks with 3 engineers and Claude Code. That is a 12-16x speedup in calendar time, and closer to 80x in person-effort, driven by the combination of AI productivity and the compression effect of moving from verbose Java to modern Python.

What made this engagement different

Three things set this apart from a typical consulting engagement:

1. The AI learns the domain. Claude Code's persistent memory accumulated knowledge about DMS workflows, FMCG distribution patterns, and customer-specific business rules over the course of the engagement. By month 3, the AI could generate code that correctly handled distributor hierarchies, pricing tiers, and regional tax configurations — not because it was pre-trained on this domain, but because the persistent context architecture retained every decision and pattern from every session.

2. The team trains on their own code. Not on tutorials. Not on toy projects. Every exercise, every assessment, every stage progression happened on the actual 44 codebases the team maintains. The Foundations stage for one engineer might involve comprehending a 2010-era customer deployment they have never touched. The Mastery stage for another might involve rewriting the CI/CD pipeline for the SaaS platform. The methodology is universal; the application is specific to each engineer's responsibility.

3. The capability is permanent. The engagement ends. The consulting team leaves. What remains: custom instructions in every repository, persistent memory across all codebases, documented workflows, Train-the-Trainer champions who run internal training, and DORA dashboards that track ongoing improvement. The methodology is embedded in how the team works, not in a consulting contract.

Current status

The engagement is in its fourth month. All three tracks are executing. The board reviews DORA metrics weekly. The 6-month targets — 25% productivity, 15% efficiency, 80% incident reduction — are on track based on Month 3 actuals. Spoors Technologies, a recently acquired subsidiary, is scheduled for a separate assessment in Phase 2.

The transformation is not complete. But the trajectory is clear. And the team doing the transforming is not an external consulting firm. It is their own engineers, trained to manage AI agents on their own codebases.

Let's Talk About Your Team.

Schedule a meeting. We'll figure out if there's a fit and what an engagement would look like.

Schedule a Meeting

poorna@timolabs.dev