Why Healthcare.gov Tracked 200+ Risks But Missed the 3 That Mattered (Prioritizing Risks)
It’s October 1, 2013. 12:01 AM Eastern Time.
Healthcare.gov goes live. The most important government website launch in decades. Millions of Americans are waiting to sign up for health insurance. The President’s signature achievement hangs in the balance.
The Centers for Medicare and Medicaid Services had spent months preparing. They had contractors. They had consultants. They had a risk management process. What they didn’t have was a properly prioritized list of risks.
McKinsey had delivered a comprehensive risk assessment back in March. A 14-slide presentation. A top-10 list of potential problems. Over 200 project documents reviewed. Forty interviews across federal agencies.
The team tracked everything. Budget variance. Schedule slips. Contractor deliverables. Security audits. Testing timelines.
They felt prepared.
At 12:03 AM, the first users hit the site.
By 2:00 AM, Healthcare.gov crashed.
Expected traffic: 50,000 users. Actual traffic: 250,000 users.
On day one, exactly six people successfully enrolled in health insurance. Six.
By the weekend, the site was taken down. Completely unusable.
The final cost? Over 2 billion dollars. Started at 93 million.
Here’s what haunts me about this story. They didn’t fail because they didn’t identify risks. They had identified hundreds of them. They didn’t fail because they didn’t track risks. They tracked them religiously.
They failed because they spent months managing risks that didn’t matter while the three risks that would destroy them sat in their documents, acknowledged but not prioritized.
The Illusion of Control
Your risk register has 47 entries. Color coded. Prioritized. Owners assigned. You review it weekly.
You feel like you’re managing risk.
But here’s the question nobody asks: Are you tracking the RIGHT risks?
Healthcare.gov had identified risks around budget overruns. Contractor performance. Schedule delays. Security compliance. All the usual suspects.
In September 2013, one month before launch, they had documented 45 critical defects and 324 serious defects.
And they were still adding new requirements.
Let me say that again. With 45 critical problems identified, they weren’t fixing them. They were adding more features.
Why? Because they were tracking the wrong risks. They were managing what was easy to measure instead of what would actually kill them.
I did the exact same thing on a banking system migration three years ago.
We had a beautiful risk register. Fifty-three risks. All properly categorized. High, medium, low priority based on standard formulas. Impact times likelihood. The textbook approach.
We spent hours every week reviewing that register. Updating statuses. Checking mitigation plans.
We had risks for things like “vendor might deliver late” and “testing environment might have downtime” and “team member might leave during critical phase.”
The project failed anyway.
Not because of any of the risks we ranked highest. Because of one we had identified but filed away as medium priority: “Legacy system documentation might be incomplete.”
Might be incomplete? It didn’t exist. The guy who built the system 15 years ago had retired. Nobody knew how half of it worked.
That single risk destroyed us. Six months of rework. Budget blown. Timeline obliterated.
But it was only ranked medium priority in our register because we used the standard formula. Likelihood times impact. It seemed unlikely that the lack of documentation would be a showstopper.
Until it was.
The Three Risks That Actually Matter
After Healthcare.gov crashed, the White House brought in a crisis team. Jeff Zients, former OMB director. Todd Park, White House CTO. Mikey Dickerson from Google.
They didn’t wade through the register of 200-plus tracked risks. They identified three problems that were actually killing the project:
Problem 1: No single person in charge.
Fifty-five different contractors. Multiple federal agencies. Nobody with authority to make decisions. McKinsey had flagged this in March: “No single empowered decision-making authority.”
It was in the risk assessment. But it wasn’t treated as an existential threat.
Problem 2: No end-to-end integration testing at scale.
They had tested individual components. But they never tested the whole system with 250,000 concurrent users. They never asked “what breaks when everyone shows up at once?”
Problem 3: No system integrator.
Fifty-five contractors building different pieces. Nobody responsible for making sure the pieces worked together. The main contractor, CGI, thought they were just another vendor. CMS thought CGI was the integrator.
That miscommunication cost 2 billion dollars.
Three risks. That’s what killed Healthcare.gov. Not the hundreds of other things in their tracking system.
Why We Track the Wrong Risks
Here’s why this happens to everyone. Including you. Including me.
We track risks that are easy to measure.
Budget variance? Easy. Compare actual spend to planned spend. Update the spreadsheet.
Schedule delays? Easy. Compare milestone dates to baseline. Update the Gantt chart.
Vendor performance? Easy. Count deliverables. Track response times. Update the dashboard.
“No single empowered decision maker”? How do you measure that? How do you track that weekly? How do you put that in a dashboard?
You can’t. So it stays in position 23 of your risk register while you spend your time tracking the measurable stuff.
We also prioritize risks using formulas that lie to us.
Impact times likelihood. It’s the standard approach. Taught in every PM certification course.
But it misses the risks that actually kill projects.
Healthcare.gov’s lack of integration testing had huge impact. But what was the likelihood? Hard to say. It’s not like they flipped a coin. They either tested at scale or they didn’t.
The formula breaks down when the risk isn’t probabilistic. When it’s binary.
And the formula completely ignores two factors that determine which risks destroy you:
Speed of escalation: How fast does this risk go from “problem” to “disaster”?
Detection difficulty: How hard is it to know this risk is about to hit you?
Healthcare.gov’s scalability problem had fast escalation (minutes, not days) and difficult detection (you only find out when the system crashes under real load).
That’s the combination that kills projects. But the standard formula misses it completely.
The Real Risk Matrix
After my banking disaster, I rebuilt how I evaluate risks.
I still use impact and likelihood. But I add two more dimensions:
Escalation speed: If this risk hits, how fast does it become a crisis?
- Slow: Days or weeks to become critical
- Medium: Hours to become critical
- Fast: Minutes to become critical
Detection difficulty: How hard is it to know this risk is materializing?
- Easy: Clear warning signs, measurable indicators
- Medium: Requires active monitoring to detect
- Hard: Only discover when it’s already hit you
The risks that kill projects score high on impact, high on escalation speed, and high on detection difficulty.
Those are your tier one risks. The ones that deserve 80 percent of your attention.
Everything else? Track them. But don’t let them distract you from the risks that will actually destroy you.
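The four-dimension evaluation above is simple enough to sketch in code. A minimal version follows; the 1-to-3 numeric scales and the strict tier-one cutoff are my own illustrative assumptions, not a standard formula:

```python
from dataclasses import dataclass

# Illustrative scales (assumptions, not a standard):
# impact, likelihood: 1 = low, 2 = medium, 3 = high
# escalation_speed:   1 = slow (weeks), 2 = medium (hours), 3 = fast (minutes)
# detection:          1 = easy, 2 = medium, 3 = hard (only known after impact)

@dataclass
class Risk:
    name: str
    impact: int
    likelihood: int
    escalation_speed: int
    detection: int

    def is_tier_one(self) -> bool:
        # Tier one: high impact, fast escalation, hard detection.
        # Likelihood deliberately plays no role here: binary, structural
        # risks are exactly where impact-times-likelihood breaks down.
        return (self.impact == 3
                and self.escalation_speed == 3
                and self.detection == 3)

register = [
    Risk("Vendor might deliver late",       impact=2, likelihood=2,
         escalation_speed=1, detection=1),
    Risk("Legacy docs might be incomplete", impact=3, likelihood=2,
         escalation_speed=3, detection=3),
    Risk("Team member leaves mid-project",  impact=2, likelihood=2,
         escalation_speed=1, detection=1),
]

tier_one = [r.name for r in register if r.is_tier_one()]
print(tier_one)  # only the documentation risk survives the filter
```

Notice what the filter does to the banking example: a “medium priority” risk under impact times likelihood becomes the only tier one entry once escalation speed and detection difficulty count.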
Let me show you what this looks like in practice.
Finding Your Real Risks
I ran this framework on a healthcare app project last year.
We had the usual risk register. Thirty-eight entries. Budget risks. Resource risks. Technology risks.
Then I added the two new dimensions. Escalation speed and detection difficulty.
Most of our risks were slow escalation and easy detection. “Developer might leave” gives you weeks of warning. You can see resignation coming. You have time to react.
But three risks scored differently:
Risk 1: “HIPAA compliance might be incomplete”
- Impact: Catastrophic (can’t launch without compliance)
- Likelihood: Medium (first time doing healthcare)
- Escalation speed: Fast (audit fails, launch stops immediately)
- Detection difficulty: Hard (only know during official audit)
Risk 2: “API rate limits might be lower than assumed”
- Impact: High (app becomes unusable)
- Likelihood: Medium (third party API, unclear documentation)
- Escalation speed: Fast (hits limit, app stops working immediately)
- Detection difficulty: Medium (need production scale testing to find out)
Risk 3: “Client and team define ‘patient portal’ differently”
- Impact: High (six months of rework)
- Likelihood: High (classic assumption risk)
- Escalation speed: Medium (discovery during final demo)
- Detection difficulty: Hard (only surfaces when showing actual system)
Those became our tier one risks. We spent 80 percent of our risk management time on those three.
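Encoded on the same kind of scale, those three risks float to the top of the register when you rank by impact, escalation speed, and detection difficulty together. The mapping of labels to numbers and the multiplicative weighting below are one possible scheme, an assumption of mine rather than a fixed method:

```python
# Categorical levels mapped to numbers (assumption: 1-3 scale,
# with "Catastrophic" folded in alongside "High" for simplicity).
LEVEL = {"Low": 1, "Medium": 2, "High": 3, "Catastrophic": 3,
         "Slow": 1, "Fast": 3, "Easy": 1, "Hard": 3}

register = [
    # (name, impact, likelihood, escalation speed, detection difficulty)
    ("HIPAA compliance might be incomplete",      "Catastrophic", "Medium", "Fast",   "Hard"),
    ("API rate limits lower than assumed",        "High",         "Medium", "Fast",   "Medium"),
    ("Client and team define portal differently", "High",         "High",   "Medium", "Hard"),
    ("Developer might leave",                     "Medium",       "Medium", "Slow",   "Easy"),
    ("Testing environment downtime",              "Low",          "Medium", "Slow",   "Easy"),
]

def composite(risk):
    # Multiplying the three killer dimensions rewards risks that are
    # bad on all of them at once; likelihood is only a tiebreaker.
    name, impact, likelihood, speed, detect = risk
    return (LEVEL[impact] * LEVEL[speed] * LEVEL[detect],
            LEVEL[likelihood])

ranked = sorted(register, key=composite, reverse=True)
tier_one = [name for name, *_ in ranked[:3]]
print(tier_one)  # the three tier one risks rank above the routine entries
```

The routine risks (“Developer might leave,” “Testing environment downtime”) score low precisely because they escalate slowly and announce themselves early, which is why they belong in the other 20 percent of your attention.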
For HIPAA compliance, we brought in an external auditor at month two. Not month ten. Month two. Cost us 15,000 dollars. Caught twelve compliance gaps before we’d written the code.
For API rate limits, we tested at scale in week four. Not week forty. Week four. Discovered the limits were half what we thought. Redesigned our architecture before we’d built it wrong.
For the definition gap, we built a clickable prototype in week one. Had the client walk through every screen. Found fifteen major disconnects between what we were building and what they expected.
Three risks. We caught all three early. The project succeeded.
Compare that to Healthcare.gov. They identified the risks. They just didn’t prioritize them.
Why Scrum Makes This Easier
Traditional project management makes you set risk priorities once at the beginning. Then you track them for months.
Scrum forces you to re-evaluate priorities every single sprint.
Every sprint review is a detection mechanism. Is the integration working? Are the pieces talking to each other? Are we building what stakeholders actually want?
You find out in two weeks. Not six months.
Every sprint retrospective is a risk audit. What almost went wrong? What did go wrong? What are we learning about our real risks?
Every sprint planning session is a prioritization check. Given what we learned last sprint, what risks need attention this sprint?
The feedback loops catch the fast escalating, hard to detect risks before they kill you.
Healthcare.gov used waterfall. They set priorities in 2012. They tracked them until October 2013. The risks that mattered didn’t get reassessed until after the crash.
In Scrum, those risks would have surfaced by sprint three. Not month fifteen.
Your Next Move
Stop reading. Open your risk register right now.
Look at your top five risks. The ones getting the most attention.
Now ask yourself these four questions:
Question 1: If this risk hits, how fast does it become a crisis?
If the answer is “we’d have weeks to respond,” it’s probably not a tier one risk.
Question 2: How hard is it to detect this risk approaching?
If the answer is “we’d see it coming from a mile away,” it’s probably not a tier one risk.
Question 3: Are we spending risk management time on things that are easy to measure instead of things that could kill us?
Be honest. Are you tracking budget variance because it’s important or because it’s easy?
Question 4: Which risks are we assuming won’t happen just because they’re hard to quantify?
Healthcare.gov assumed “someone will handle the integration” because that risk was hard to quantify and nobody wanted to own it. That assumption cost 2 billion dollars.
Find your three tier one risks. The ones that are high impact, fast escalation, and hard to detect.
Then spend 80 percent of your risk management time on those three.
Not on the forty-seven other risks in your register. Not on updating spreadsheets. Not on reviewing dashboards.
On the three risks that will actually kill your project.
Healthcare.gov had consultants. They had risk assessments. They had tracking systems.
They still lost 2 billion dollars and crashed on day one.
Because they tracked 200 risks but didn’t prioritize the 3 that mattered.
Don’t make the same mistake.
The System That Finds Your Three Risks Automatically
Here’s what I learned after Healthcare.gov, after my banking disaster, after watching dozens of projects fail from wrong priorities.
You don’t need better risk identification. You already identify plenty of risks.
You don’t need better tracking tools. Your spreadsheets work fine.
What you need is a system that forces you to re-evaluate priorities constantly. A system that makes it impossible to spend months tracking the wrong risks.
That’s exactly what Scrum does.
Every two weeks, sprint reviews surface integration problems immediately. You’re not waiting until month fifteen to discover fifty-five contractors aren’t coordinating. You find out in sprint three.
Every retrospective asks “what almost killed us this sprint?” Those fast-escalating, hard-to-detect risks? They show up in retros before they show up as disasters.
Every sprint planning forces the question: “Given what we learned, what are our real risks right now?” Your priorities get challenged every two weeks, not every six months.
The feedback loops aren’t just about building better software. They’re about catching the tier one risks before they catch you.
I’ve taught hundreds of teams how to use Scrum’s ceremonies as risk detection mechanisms. The teams that get it right don’t just deliver better products. They avoid the disasters that kill everyone else’s projects.
Because they know which three risks actually matter. And they know it early enough to do something about it.
Want to master the art of finding your tier one risks before they destroy your project? In my Risk Management in Scrum course, you’ll learn exactly how to identify high-impact, fast-escalating, hard-to-detect risks using Scrum’s built-in feedback loops. You’ll discover why sprint ceremonies are the most powerful risk detection tools you have, and how to use them to re-prioritize risks every two weeks instead of tracking the wrong ones for months. Healthcare.gov spent six months ignoring McKinsey’s warnings. Scrum teams find their killer risks in six days. Learn the system that makes wrong priorities impossible.