Lessons from 200 Million Lines of Code
What happens when you try to map the technical knowledge of 2,500 people across 150 products. The tools were the easy part.
You bump into a colleague in the hallway. It’s a digital services company, 2,500 people worldwide. The conversation follows the same pattern every time: “What client are you on? What stack? What’s been hard lately?”
Three questions. That’s it. And in those three questions lives the entire knowledge graph of a company.
I spent two years trying to turn that hallway conversation into a system. Not to control knowledge. To help people grow.
Everyone has experience. Everyone can grow their experience.
We had 150+ products built for clients across the digital product industry, mostly web technologies. Thousands of developers, each carrying lessons from the codebases they’d touched. Experience is the sum of lessons learned. The more clients you work with, the more products you ship, the more contexts you get exposed to, the better you become at identifying patterns and crystallizing what good looks like.
But that experience was scattered. Trapped in people’s heads. Locked in repos nobody cross-referenced. A developer who’d solved a gnarly caching problem on one project had no way of knowing that three teams over, someone was struggling with the exact same thing.
My mission was simple: help identify what good looks like by building environments where quality and collaboration come first. The open source mindset. Meritocracy over politics.
The question was: could we map what 2,500 people collectively knew?
Three dimensions
It turns out you can index a codebase along three axes that tell you almost everything about a product’s technical DNA:
Programming Languages. GitHub Linguist is open source and powers the language stats on every GitHub repo. We used it to extract the languages each of our 150+ products used. The catch: it only reports percentages, not absolute counts. Knowing a repo is “83% TypeScript” doesn’t tell you whether that’s 800 lines or 800,000.
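The gap is easy to close once any other tool gives you a total line count. A minimal sketch of the arithmetic (the percentages and total below are invented for illustration, not real Linguist output):

```python
# Hedged sketch: Linguist reports per-language percentages; pair them with a
# total line count from any counter to estimate absolute lines per language.
# The figures below are illustrative, not real Linguist output.
def estimate_lines(percentages, total_lines):
    """percentages: {language: share in %} -> {language: estimated lines}"""
    return {lang: round(total_lines * pct / 100)
            for lang, pct in percentages.items()}

repo = {"TypeScript": 83.0, "CSS": 12.0, "HTML": 5.0}
estimated = estimate_lines(repo, 800_000)  # "83% TypeScript" of 800k lines
```

Crude, but it turns “83% TypeScript” into a number you can actually compare across products.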
Lines of Code. Not blank lines, not comments. Actual code. We used SCC (Sloc, Cloc and Code), Ben Boyter’s open source counter written in Go. It’s fast. Absurdly fast. It scans files directly instead of parsing them, which meant we could crunch repos that CLOC would choke on. It also gave us complexity scores, which turned out to be more useful than raw line counts.
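For machine-readable results, scc can emit JSON (`scc --format json`). A hedged sketch of consuming it; the field names (“Name”, “Code”, “Complexity”) match recent scc versions but treat them as assumptions, and the payload below is invented:

```python
import json

# Hedged sketch: consume scc's JSON summary. Field names ("Name", "Count",
# "Code", "Complexity") match recent scc versions; the sample payload is
# illustrative, not real output from one of our repos.
def complexity_density(languages):
    """Per language: (name, code lines, complexity per line of code)."""
    return [(l["Name"], l["Code"], round(l["Complexity"] / l["Code"], 3))
            for l in languages]

sample = json.loads('[{"Name": "TypeScript", "Count": 412, "Code": 91000, '
                    '"Complexity": 10500}, {"Name": "Go", "Count": 38, '
                    '"Code": 12000, "Complexity": 2100}]')
report = complexity_density(sample)
```

Dividing complexity by code lines is what made the scores comparable across repos of very different sizes.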
Open Source Dependencies. We built a custom dependency parser for JavaScript/TypeScript, Java, PHP, Go, Python, and Ruby. This was the dimension nobody expected to care about, and the one that produced the most interesting conversations.
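A sketch of what the JavaScript/TypeScript leg of such a parser boils down to, reading a package.json manifest; note it extracts only names and version ranges, never code (the manifest below is invented):

```python
import json

# Hypothetical sketch: the JavaScript/TypeScript leg of a dependency parser.
# It reads package.json and keeps only names and version ranges -- metadata.
def parse_package_json(text):
    manifest = json.loads(text)
    deps = {}
    for section in ("dependencies", "devDependencies"):
        deps.update(manifest.get(section, {}))
    return sorted(deps.items())

manifest = '''{"dependencies": {"react": "^17.0.2", "express": "^4.18.2"},
               "devDependencies": {"typescript": "~4.9.5"}}'''
found = parse_package_json(manifest)
```

The other ecosystems follow the same pattern with their own manifests: pom.xml or build.gradle for Java, composer.json for PHP, go.mod for Go, requirements.txt or pyproject.toml for Python, Gemfile for Ruby.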
When you combine all three, you get a map. Not just of code, but of collective craftsmanship.
What 200 million lines actually looks like
The number is disorienting at first. 200 million lines of code across 150 products sounds like an abstraction. It’s not.
It’s a Docker container that crashes because you just fed it a 10+ GB folder. It’s discovering that one repo has a million lines of C++ that nobody knew about, hidden behind a build step that nobody questioned. It’s realizing that inquirer.js is better than commander.js when you need humans to interact with your indexing scripts, because the people running them aren’t developers.
Writing the code to count code was half the work. The other half was making the results useful for the people doing the actual building.
The real lessons
Professional similarities exceed differences. This surprised me the most. Across 150 products, built by hundreds of different developers for dozens of different clients, the patterns converged. The same architectural decisions. The same dependency choices. The same mistakes. We’re more alike than our framework wars suggest. That’s good news. It means the lessons transfer.
Codebases are finite. This sounds obvious, but it changed how I think about onboarding. Every codebase, no matter how intimidating, is a finite thing you can understand. When you know the languages, the line count, and the dependencies, the mountain becomes a map. Maps are navigable. Mountains are not.
Complexity comparisons only work within the same language. A complexity score of 47 in TypeScript and 47 in Go mean completely different things. We learned this the hard way by generating reports that made Go services look simpler than they were.
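One way to avoid that mistake is to compare complexity only within a language group, for example via z-scores. A hypothetical sketch, not our actual reporting code, with made-up repo names and scores:

```python
from statistics import mean, pstdev

# Hypothetical sketch: normalize complexity within each language group so a
# score is only ever compared against repos written in the same language.
def zscores_by_language(scores):
    """scores: {language: {repo: raw complexity}} -> z-scores, same shape."""
    out = {}
    for lang, repos in scores.items():
        mu = mean(repos.values())
        sigma = pstdev(repos.values())
        out[lang] = {repo: 0.0 if sigma == 0 else round((v - mu) / sigma, 2)
                     for repo, v in repos.items()}
    return out

normalized = zscores_by_language({
    "TypeScript": {"shop": 47, "cms": 95, "api": 23},
    "Go": {"gateway": 47, "worker": 12},
})
```

The same raw score of 47 lands below the TypeScript mean but well above the Go mean, which is exactly the distinction our early reports missed.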
Opening up is worth it. Exposing your codebase for analysis means admitting what’s in there. The messy parts, the technical debt, the million lines of C++ nobody mentioned. Some teams hesitated. The ones who opened up learned faster, improved faster, delivered better products.
A personal note
I gave this talk in July 2023. I’d recently been laid off from a company that handled it brutally. Some of those people were in the audience.
One of the slides mentioned “The Brilliant Jerk.” The person who hoards knowledge as leverage, who makes themselves indispensable by being the only one who understands the system. I’d seen that pattern up close. I’d felt its consequences personally.
But the talk wasn’t about that. The talk was about the opposite. About building systems where knowledge flows freely, where people grow through shared experience, where no single person’s departure can cripple a team. Not because you strip their power. Because you make the whole organization stronger.
Standing on that stage and talking about craftsmanship, about helping people grow, about meritocracy over politics. That was the answer I chose.
The privacy part (this matters)
One thing I want to be clear about: we never read the code. We never hoarded IP. No source code ever left the secure systems it was hosted on.
Everything we indexed was metadata. “150,000 lines of JavaScript.” “230 Python files.” “Depends on React 17, Express 4, PostgreSQL.” Numbers, not contents. Shapes, not secrets.
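Concretely, an index entry could be as small as this record (an illustrative schema with a made-up product name, not the actual one):

```python
from dataclasses import dataclass

# Illustrative schema, not the real one: everything stored is a count or a
# name -- shapes, never source code.
@dataclass
class ProductIndexEntry:
    product: str
    lines_by_language: dict   # e.g. {"JavaScript": 150_000}
    files_by_language: dict   # e.g. {"Python": 230}
    dependencies: list        # e.g. ["react@17", "express@4", "postgresql"]

entry = ProductIndexEntry(
    product="acme-storefront",  # hypothetical product name
    lines_by_language={"JavaScript": 150_000},
    files_by_language={"Python": 230},
    dependencies=["react@17", "express@4", "postgresql"],
)
```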
This was part of the secret sauce. You can map an entire organization’s technical capability without ever seeing a single line of proprietary code. Language counts, dependency graphs, complexity scores. That’s enough to understand what a team knows, what they’ve built, and where they can grow. You don’t need to read the code to understand the codebase.
That constraint wasn’t a limitation. It was a feature. It meant every team could participate without worrying about IP exposure. It meant clients were never at risk. And it meant the system scaled to 150 products without a single security conversation becoming a blocker.
The hallway test
Here’s what I keep coming back to. We built tooling. We wrote parsers. We indexed 200 million lines of code and generated reports and dashboards and dependency graphs.
But the hallway conversation still works better.
“What client are you on? What stack? What’s been hard lately?”
The tools tell you what people typed. The conversations tell you what they learned. 30 craftspeople with a combined 240 years of experience, each carrying 100,000 lines of lessons in their heads. The knowledge was already there.
The hard part was never counting. It was connecting.