The incident didn’t start with malware or a zero-day. It started with a spreadsheet. What began as a routine firmware vulnerability review quickly revealed a deeper problem—one that would later shape how we think about the firmware genome across large-scale industrial device fleets.
The Moment a Spreadsheet Stopped Being Enough
We were on a late-evening call with the operations team of a renewable energy company. Their fleet was a few thousand industrial gateways scattered across wind farms and substations. The kind of kit that just sits there for years, quietly doing its job, until suddenly everyone remembers it exists.
They followed the standard playbook. Pull a representative set of firmware images. Send them through an SBOM and vulnerability scanner. Import everything into a dashboard and export it as Excel for “management visibility.”
The security manager shared his screen and scrolled. Six thousand eight hundred and something vulnerabilities. Rows of CVE IDs, severity scores, package names, and versions. It looked serious. It looked like progress. Then the operations lead, the person who owned uptime and maintenance windows, unmuted his mic.
“Okay,” he said, “I have three crews, two maintenance windows this month, and about four thousand gateways in the field. Which devices am I touching first?” He let that hang for a moment, then added: “And how many of these vulnerabilities are even alive in the code that’s running, versus old copies buried somewhere nobody calls?”
We looked at the spreadsheet. The spreadsheet had nothing to say. That was the first crack: we had built a list, but we did not have ground truth.
From Spreadsheet to Reality: Lab Investigation
A couple of weeks later, I was in a lab with a stack of the same gateways on a bench. Same hardware, same firmware, same vendor. No dashboards, no slides, just the devices and a rough plan: take one of those images apart and see how much of the spreadsheet survived contact with reality.
We picked one firmware from the “critical” column of the report and started unpacking. Tear open the update package, carve out the filesystems and partitions, mount what would mount, dump what would not. The usual forensics dance.
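For a sense of what that dance looks like in code, here is a minimal sketch of the first step, assuming binwalk is installed on the analysis machine; the image name and the crude filesystem heuristics are illustrative, not taken from the actual engagement.

```python
# Minimal sketch: unpack a firmware update package and see what falls out.
# Assumes binwalk is on the PATH; the image name below is a made-up placeholder.
import subprocess
from pathlib import Path

image = Path("gateway_fw_2.4.1.bin")         # hypothetical firmware image in the cwd
out_dir = Path(f"_{image.name}.extracted")   # binwalk's default extraction folder

# Recursively carve embedded filesystems, archives, and blobs out of the image.
subprocess.run(["binwalk", "--extract", "--matryoshka", str(image)], check=True)

# Take a first look at what was carved: root filesystems, partitions, raw blobs.
for path in sorted(out_dir.iterdir()):
    if path.is_dir() and (path / "bin").exists():
        print(f"possible root filesystem: {path}")
    elif path.is_file():
        print(f"carved object: {path} ({path.stat().st_size} bytes)")
```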
For a while, it looked familiar. Root filesystem. BusyBox. OpenSSL. Some vendor daemons, a web interface, and log files.
Exactly the sort of thing a scanner is good at naming.
Then we hit the bits the scanner had barely noticed. Alongside the Linux world, there was a real-time OS image handling the control tasks. No package manager. No friendly metadata. Just proprietary binaries and a thin scattering of strings.
Tucked elsewhere was a bare-metal blob talking to a microcontroller, seemingly in charge of safety interlocks. And inside several different executables we started seeing the same strange patterns: identical functions, similar constants, same habits. A vendor SDK library that had been compiled straight into multiple binaries.
On the whiteboard we’d written “Firmware = file.” At some point that stopped feeling right. By the time we were done, we crossed it out and wrote:
Firmware = ecosystem
What we were really uncovering was a shared Firmware Genome—Linux, RTOS, bare metal, and reused code evolving together across products and device generations.
Once you see firmware that way, you can’t really go back.
The Hidden Firmware Genome: Beyond Declared Components
Around this time, a new advisory landed: a serious bug in a widely used embedded TCP/IP stack. Let’s call it NetStack, to keep things generic.
The original SBOM-driven scan said NetStack was present in two firmware images. Both Linux-based, both tagged as “high priority.”
The vendor’s plan practically wrote itself: patch those two images, push an update, tick the box. The lab was the perfect place to test whether that picture was accurate.
We took all the firmware images we had from that customer and went looking for NetStack again, but this time we ignored package names and version strings. Instead, we looked for the code itself.
Patterns in functions, constants, and layout. The kind of signature that survives a change of filename or a missing version banner.
Every time we found it, we wrote down where: which file, which architecture, which firmware build, which device model.
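As a rough illustration of that approach, the sketch below searches carved binaries for byte-level markers of the library instead of trusting package metadata. The marker bytes and strings are invented stand-ins; real fingerprints would be derived from known NetStack builds and matched at the function and constant level rather than as raw substrings.

```python
# Look for the code itself, not what the package database claims.
from pathlib import Path

# Hypothetical signatures: distinctive internals that survive static linking and
# renaming. These values are illustrative placeholders, not real NetStack artifacts.
NETSTACK_MARKERS = [
    b"netstack_tcp_input",           # internal symbol name that tends to survive
    b"\x5a\xa5\x00\x10\x00\x20",     # fragment of a protocol constant table
    b"malformed TCP option lenght",  # a vendor typo that travels with the code
]

def scan_firmware_tree(root: Path):
    """Yield (file, marker) for every carved binary that carries a NetStack marker."""
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        data = path.read_bytes()
        for marker in NETSTACK_MARKERS:
            if marker in data:
                yield path, marker

for hit, marker in scan_firmware_tree(Path("extracted_firmwares")):
    print(f"{hit}  <-  {marker!r}")
```

Even this crude matching is enough to separate “the metadata says so” from “the bytes are actually there.”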
It did not show up in two images. It showed up in nine.
Seven of those never appeared in the original spreadsheet. In those firmwares, NetStack was compiled directly into vendor binaries, sitting there quietly without a clean package entry or a polite “hello, my name is” version string.
On the dashboard, those seven images had been “clean.”
On the bench, the same vulnerable code was right there in the binaries. This was a textbook example of how the Firmware Genome extends far beyond what SBOMs and vulnerability dashboards can see.
We rang the security manager back and showed him the expanded list. For a moment, he just stared at it.
“So, the report undercounted the problem,” he said.
“And we were about to tell management we had it contained.”
NetStack stopped being just another line in a long report. It became proof of something deeper: if you only listen to what components declare about themselves, you will almost always underestimate your real attack surface.
Mapping the Ecosystem: Profiling and Connections
Of course, finding a handful of library patterns is one thing. Making sense of thousands of files across dozens of images is another.
On the lab machine we now had a mess: executables and libraries for different architectures, pieces of various operating systems, bootloaders, blobs, little stubs of code that might do nothing or might do something very important.
If we left it as a pile of files, it would quickly become unmanageable.
So, we started doing something boring but essential. Every binary we cared about got a profile attached to it.
What architecture is this? Little-endian or big-endian? Does it look like a user-space program, a kernel module, something from an RTOS, something closer to bare metal? Where did it sit in the firmware? What did its hashes look like?
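A stripped-down version of that profiling step might look like the sketch below. The record shape and helper names are our own, and the architecture table only lists the handful of machine types that mattered for these fleets.

```python
# Attach a small, boring profile to every binary we care about.
import hashlib
import struct
from dataclasses import dataclass
from pathlib import Path

@dataclass
class BinaryProfile:
    path: str         # where it sat inside the firmware
    arch: str
    endianness: str
    kind: str         # rough guess: executable, shared object, kernel module, raw blob
    sha256: str

# A few ELF e_machine values; the real table is much longer.
ELF_MACHINES = {0x03: "x86", 0x08: "mips", 0x14: "ppc", 0x28: "arm",
                0x3e: "x86_64", 0xb7: "aarch64"}
ELF_TYPES = {1: "elf-relocatable (kernel module?)", 2: "elf-executable",
             3: "elf-shared-object"}

def profile(path: Path) -> BinaryProfile:
    data = path.read_bytes()
    sha256 = hashlib.sha256(data).hexdigest()
    if data[:4] == b"\x7fELF" and len(data) >= 0x14:
        endianness = "little" if data[5] == 1 else "big"
        fmt = "<H" if endianness == "little" else ">H"
        e_type = struct.unpack_from(fmt, data, 0x10)[0]
        e_machine = struct.unpack_from(fmt, data, 0x12)[0]
        arch = ELF_MACHINES.get(e_machine, f"unknown({e_machine:#x})")
        kind = ELF_TYPES.get(e_type, "elf-other")
    else:
        arch, endianness, kind = "unknown", "unknown", "raw-blob"  # RTOS / bare metal land
    return BinaryProfile(str(path), arch, endianness, kind, sha256)
```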
Individually, these details are not exciting. Together, they are maps.
With that map, simple questions suddenly became answerable. Where are all the PowerPC bootloaders hiding across all the devices you’ve ever shipped? Where does this specific library fingerprint show up again under different filenames?
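Continuing the sketch above, and assuming each profile has also been tagged with the firmware build and device model it came from, those questions turn into simple filters over the collected records:

```python
# `profiles` is the list produced by running profile() over every carved binary;
# the bootloader-name heuristic and the grouping below are illustrative, not a schema.
from collections import defaultdict
from pathlib import Path

profiles = [profile(p) for p in Path("extracted_firmwares").rglob("*") if p.is_file()]

# "Where are all the PowerPC bootloaders hiding across everything we've shipped?"
ppc_bootloaders = [p for p in profiles
                   if p.arch == "ppc" and "boot" in Path(p.path).name.lower()]

# "Where does this exact code show up again under a different filename?"
names_by_hash = defaultdict(set)
for p in profiles:
    names_by_hash[p.sha256].add(Path(p.path).name)
reused = {h: names for h, names in names_by_hash.items() if len(names) > 1}
```

Exact-hash reuse is only the simplest case; the fuzzier fingerprinting from the NetStack hunt is what catches recompiled copies of the same code.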
Once these connections emerged, we stopped talking about individual firmware versions and started looking at the Firmware Genome as a connected system of code that moves, mutates, and reappears across a product portfolio.
Shared DNA Across Products: The Automotive Case
We saw this even more clearly with a different customer, this time in automotive.
On paper, these were three separate systems: an infotainment head unit, a telematics control unit, and a small body control module living quietly on the CAN bus. Different suppliers, different projects, different hardware.
In the binaries, they were related.
The same proprietary crypto routines appeared in all three, compiled for different architectures and sprinkled into different places. Error strings, complete with identical typos, echoed across codebases that were supposedly independent.
At that point, the pattern was unmistakable. The Firmware Genome was being reused, forked, and propagated across systems assumed to be isolated.
Tracing it back, you could see the shape of an old SDK that had been handed from program to program, modified a little each time, never quite retired.
When a vulnerability advisory later came out for that family of crypto, the question was no longer “does supplier X use this library?”
We already knew exactly which binaries carried it, which firmware builds they lived in, and which ECUs those builds were running on.
Firmware had revealed itself for what it really is: a shared genome that evolves, forks, and reappears in unexpected places.
Portfolio Perspective: Unique Binaries and Real Risk
By then, we weren’t thinking in terms of single images at all. The interesting unit was the portfolio.
Take all the binaries across all firmwares, across all devices, and treat them as one population. Then the questions become different.
- How many truly unique binaries do you have, versus the same ones reused with different badges?
- Which of them are exposed enough that a bug in them would matter? Which ones sit in the boot chain? Which ones sit in safety-relevant paths?
- Which ones are in code that handles untrusted input from networks or physical interfaces?
Looking at risk through the lens of the Firmware Genome made these questions answerable in concrete terms.
During one internal exercise, we asked for all binaries that matched the vulnerable NetStack pattern, that were present in firmware versions still deployed in the field, and that sat in code paths reachable from the outside network.
From thousands of binaries, we ended up with a short list of eleven.
That list—not the original six-thousand-line spreadsheet—was what the operations lead really needed on his desk.
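That triage can be expressed as a filter over the same kind of profile data, joined with two more pieces of information: which builds are still deployed and which binaries sit on externally reachable paths. A hedged sketch, with every field name and flag below being our own assumption rather than any standard schema:

```python
from dataclasses import dataclass

# Hypothetical portfolio-level record: code-level evidence (the NetStack fingerprint)
# joined with deployment and exposure facts that only the operator can supply.
@dataclass
class PortfolioEntry:
    sha256: str
    device_model: str
    firmware_build: str
    matches_netstack: bool     # fingerprint hit in the bytes, not an SBOM line
    still_deployed: bool       # this build is running somewhere in the field
    network_reachable: bool    # binary sits on a path fed by the outside network

def triage(entries: list[PortfolioEntry]) -> list[PortfolioEntry]:
    """Reduce thousands of binaries to the ones worth spending a maintenance window on."""
    return [e for e in entries
            if e.matches_netstack and e.still_deployed and e.network_reachable]
```

The filter itself is trivial; the work is in making sure every field is backed by evidence from the binaries and the fleet rather than by what a component declares about itself.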
Changing the Conversation: From Lists to Maps
That whole experience changed how we talked about firmware inside our own work.
Before, “firmware analysis” conjured up images of unpacking files, running a scanner, counting CVEs, and exporting charts.
After, it was much harder to pretend that was enough.
The spreadsheet from that first call wasn’t useless. It just answered a much smaller question than everyone thought.
What mattered were the things beneath it: whether we could see the ecosystem hiding in each image, whether we could recognize shared DNA between “unrelated” products, and whether we could tell the difference between a vulnerable function sitting in dead code and the same function sitting on a live request path.
Understanding the Firmware Genome is what finally allowed us to make those distinctions.
The techniques and tools are one part of that, but they’re not the main story. What really changed were the questions we started asking.
- Not “How many CVEs did we find?”
- Not “Which tool has the best score?”
Instead, questions like these:
- What code is really running on our devices, and how does it spread from product to product?
- Where are we relying on components whose real footprint we can’t see?
- Which vulnerabilities are just names in a report, and which ones sit in code that can actually be reached in the way our devices are used?
- When we say, “we’ve patched this issue,” can we prove that the old code is gone from all the places it hid in the past?
And maybe the hardest one of all:
- If someone handed us a spreadsheet like the one from that first call, would we trust it enough to bet our maintenance windows, our uptime, and our reputation on it?
Those are not questions a scanner can answer for you.
But they are exactly the questions that turn a firmware inventory into a map of where real risk lives.