Why Scanners Fail in Practice: Lessons from the Shai-Hulud Attacks on NPM

9.12.2025 | 8 minutes reading time

2025 marks the year supply chain security stopped being a theoretical risk and became a practical nightmare for anyone managing a package.json file. The recent attack waves on the NPM ecosystem demonstrated this vividly, turning trusted libraries into attack vectors that compromised pipelines before the code even hit production.

First, the compromise of several popular NPM packages, including chalk and debug, showed how easily a phishing attack on a single developer can have widespread implications. Only a week later, the first Shai-Hulud wave introduced a self-replicating worm and large-scale credential theft from developer machines. Most recently, Shai-Hulud 2 (a.k.a. Sha1-Hulud) took this strategy to the extreme and added the perfidious behavior of deleting user files in case of a takedown attempt.

How can development and infosec teams tackle this situation and identify compromises? When this question is discussed, Software Composition Analysis (SCA), through dependency scanning and/or Software Bills of Materials (SBOMs), usually comes up. The reasoning: once you know all your dependencies, it should be easy to automatically identify compromised ones and start the mitigation process from there.

There are some inherent limitations of this approach in the case of Shai-Hulud:

  • The malicious payload does not need to be deployed as part of a release, but runs during the build process. This means that any execution can be a risk, including CI pipelines for any branch. However, SCA often only runs on the main branch or release artifacts.
  • Since the malware's targets include developer machines, it only needs to be installed locally, far removed from the environments where SCA typically runs.
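
One way to narrow the gap described in these two points is to run a lightweight check wherever lockfiles live, including developer machines and feature-branch checkouts. The following is a minimal sketch, not a replacement for a real scanner: it assumes npm lockfiles in the v2/v3 layout (a top-level "packages" object keyed by node_modules paths), and the KNOWN_BAD set is purely illustrative, containing only the two versions examined later in this article. The function names are made up for this example.

```python
import json
from pathlib import Path

# Illustrative blocklist: the two versions used in this article's demo project.
# A real list would have to be maintained from an advisory feed.
KNOWN_BAD = {("ansi-regex", "6.2.1"), ("kill-port", "2.0.2")}


def packages_in_lockfile(lock_path):
    """Yield (name, version) for every entry of an npm lockfile (v2/v3 layout)."""
    data = json.loads(Path(lock_path).read_text(encoding="utf-8"))
    for key, entry in data.get("packages", {}).items():
        if not key:  # the empty key is the root project itself
            continue
        # Keys look like "node_modules/a/node_modules/@scope/b"; keep the last package.
        name = entry.get("name") or key.rsplit("node_modules/", 1)[-1]
        version = entry.get("version")
        if version:
            yield name, version


def find_compromised(root, known_bad=KNOWN_BAD):
    """Return (lockfile, name, version) for every blocklisted entry below root."""
    hits = []
    for lock_path in sorted(Path(root).rglob("package-lock.json")):
        for name, version in packages_in_lockfile(lock_path):
            if (name, version) in known_bad:
                hits.append((str(lock_path), name, version))
    return hits
```

Calling find_compromised on a workspace directory returns an empty list when nothing on the blocklist is pinned. Note that this only inspects metadata; it says nothing about whether an install script has already run.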

Despite these limits, one would expect composition analysis to serve as one line of defense and provide valuable insights into the propagation of the compromised packages.

Evaluating dependency scanning tools

When running Trivy on an allegedly affected project, I was surprised that it could not identify any issues. I then did a poll on Mastodon to figure out if I was missing something:

shaihulud-poll-1.png

Of the 50 participants, the majority (68 %) shared my own expectations.

This prompted us to dig deeper and investigate the behavior of the most popular open-source SCA and dependency scanning tools: Trivy by Aqua Security, Grype by Anchore, OSV-Scanner by Google, and OWASP Dependency-Track. Due to their restricted availability, we did not investigate commercial tools such as Snyk, Aikido, or GitLab Ultimate Dependency Scanning.

For this comparison, I set up a demo project with affected versions of ansi-regex and kill-port:

{
  "dependencies": {
    "ansi-regex": "6.2.1",
    "kill-port": "2.0.2"
  }
}

ansi-regex was affected by the first major wave of NPM package compromises in September 2025, while kill-port was compromised as part of Shai-Hulud 2.

Since those versions had quickly been removed from NPM (and, of course, contain malware), we did not actually install them, but set up fake package.json and package-lock.json files. We verified that all scanners picked up these fake versions from the metadata files.
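
The article does not show how the fake lockfile was produced. As a rough sketch of one possible approach, the following builds a minimal lockfileVersion 3 structure from exact version pins without contacting the registry. The helper name and the project name are hypothetical, and the "resolved" and "integrity" fields are deliberately left out; whether a given scanner accepts entries without them is an assumption that would need checking per tool.

```python
import json


def fake_lockfile(package_json, project_name="scanner-demo"):
    """Build a minimal lockfileVersion 3 structure from exact version pins."""
    deps = package_json.get("dependencies", {})
    packages = {"": {"name": project_name, "dependencies": dict(deps)}}
    for name, version in deps.items():
        # No "resolved" or "integrity" fields: the tarballs are never fetched.
        packages[f"node_modules/{name}"] = {"version": version}
    return {
        "name": project_name,
        "lockfileVersion": 3,
        "requires": True,
        "packages": packages,
    }


manifest = {"dependencies": {"ansi-regex": "6.2.1", "kill-port": "2.0.2"}}
lockfile_text = json.dumps(fake_lockfile(manifest), indent=2)
```

Writing lockfile_text to package-lock.json next to the package.json gives scanners metadata to read while nothing malicious is ever downloaded or executed.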

Unless otherwise noted, the default configurations of all tools were used. All tests were performed with current vulnerability information as of 2025-12-03.

Trivy

As mentioned above, Trivy did not identify any issues. This held true whether scanning the project directly (trivy fs) or scanning a pre-generated SBOM (trivy sbom).

Report Summary

┌───────────────────┬──────┬─────────────────┐
│      Target       │ Type │ Vulnerabilities │
├───────────────────┼──────┼─────────────────┤
│ package-lock.json │ npm  │        0        │
└───────────────────┴──────┴─────────────────┘

Grype

Grype did find both issues as Critical when run on the SBOM generated with Trivy (grype bom.json):

NAME        INSTALLED  TYPE  VULNERABILITY        SEVERITY  EPSS  RISK
ansi-regex  6.2.1      npm   GHSA-jvhh-2m83-6w29  Critical  N/A   N/A
kill-port   2.0.2      npm   GHSA-3j2r-p9f6-rw66  Critical  N/A   N/A

Executing it directly on the project folder (grype .) also reported both issues, along with many false positives:

NAME                   INSTALLED  FIXED IN  TYPE  VULNERABILITY        SEVERITY  EPSS           RISK
json5                  1.0.1      1.0.2     npm   GHSA-9c47-m6qq-7p4h  High      37.3% (97th)   27.2
json5                  2.2.1      2.2.2     npm   GHSA-9c47-m6qq-7p4h  High      37.3% (97th)   27.2
trim-newlines          1.0.0      3.0.1     npm   GHSA-7p7h-4mm5-852v  High      1.3% (78th)    0.9

# [...]

ansi-regex             6.2.1                npm   GHSA-jvhh-2m83-6w29  Critical  N/A            N/A
kill-port              2.0.2                npm   GHSA-3j2r-p9f6-rw66  Critical  N/A            N/A

Note the high-severity alerts for json5 and trim-newlines. These (and many other omitted ones) are present because Grype descended into the node_modules directory, read the package.json metadata of all modules, and incorrectly identified their devDependencies as part of our project. This could probably be fixed through configuration, but we stuck to the default and did not bother with that.

OSV-Scanner

OSV-Scanner scored (almost) perfectly, identifying the issues both directly in the project directory and in the SBOM from Trivy. However, it could not provide information on severity or fixed versions:

╭─────────────────────────────────┬──────┬───────────┬────────────┬─────────┬───────────────┬──────────╮
│ OSV URL                         │ CVSS │ ECOSYSTEM │ PACKAGE    │ VERSION │ FIXED VERSION │ SOURCE   │
├─────────────────────────────────┼──────┼───────────┼────────────┼─────────┼───────────────┼──────────┤
│ https://osv.dev/MAL-2025-46966  │      │ npm       │ ansi-regex │ 6.2.1   │ --            │ bom.json │
│ https://osv.dev/MAL-2025-191116 │      │ npm       │ kill-port  │ 2.0.2   │ --            │ bom.json │
╰─────────────────────────────────┴──────┴───────────┴────────────┴─────────┴───────────────┴──────────╯

OWASP Dependency-Track

Finally, let's have a look at OWASP Dependency-Track. In contrast to the other tools, it is not invoked from the command line, but runs as a web application. It also cannot perform its own composition analysis, but always needs to be provided with an existing SBOM. For this purpose, I once again used the SBOM file from Trivy.
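
To see what such a tool actually receives, it can help to list the components an SBOM declares. The sketch below assumes the SBOM is a CycloneDX JSON document (the format Dependency-Track consumes) with a top-level "components" array. The sample document is hand-written for illustration and is not Trivy's actual output.

```python
import json


def sbom_components(bom):
    """Return (name, version, purl) for each library component of a CycloneDX JSON SBOM."""
    return [
        (c.get("name"), c.get("version"), c.get("purl"))
        for c in bom.get("components", [])
        if c.get("type") == "library"
    ]


# Hand-written sample, reduced to the fields used above.
sample_bom = json.loads("""
{
  "bomFormat": "CycloneDX",
  "components": [
    {"type": "application", "name": "package-lock.json"},
    {"type": "library", "name": "ansi-regex", "version": "6.2.1",
     "purl": "pkg:npm/ansi-regex@6.2.1"},
    {"type": "library", "name": "kill-port", "version": "2.0.2",
     "purl": "pkg:npm/kill-port@2.0.2"}
  ]
}
""")

components = sbom_components(sample_bom)
```

If the compromised versions show up in this list, the composition step worked; whether they are then flagged depends entirely on the vulnerability data the SBOM is matched against.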

In its default configuration, Dependency-Track could not identify any vulnerabilities:

shaihulud-dependencytrack-1.png

However, Dependency-Track allows you to configure its data sources for vulnerability information. Besides the built-in NVD CVE feed, the openly available additional options include GitHub Advisories and Open Source Vulnerabilities (OSV, the database source behind OSV-Scanner).

Enabling GitHub Advisories changed nothing: still no vulnerabilities were detected. Although the OSV data source is only in beta, enabling it actually made a difference:

shaihulud-dependencytrack-2.png

Exploring data sources

What are the different sources of vulnerability information typically accessed by dependency scanning tools?

The best-known is the Common Vulnerabilities and Exposures (CVE) list, which most tools consume via the National Vulnerability Database (NVD). In addition to its ongoing general data quality issues, there is a major catch here for compromised NPM packages: for (almost?) all of them, nobody bothered to issue CVEs! We can therefore rule out this data source for identifying Shai-Hulud and the like.

GitHub, which also happens to run the NPM package registry, provides its own Security Advisory database (GHSA). Whenever GitHub removed a compromised package version, it also issued a corresponding advisory. However, these are special malware advisories and therefore well hidden: to find them, you have to add the filter type:malware to your search query.

GitHub's stated reasoning is as follows:

Our malware advisories are mostly about substitution attacks. During this type of attack, an attacker publishes a package to the public registry with the same name as a dependency that users rely on from a third party or private registry, with the hope that the malicious version is consumed. [...] Users who have their dependencies appropriately scoped should not be affected by malware.

While this makes sense for substitution attacks, it falls apart once real, trustworthy packages get compromised with malware.

Malware advisories are not returned from the GHSA API by default, which is probably the reason why they are not picked up by OWASP Dependency-Track and Trivy (which also uses GHSAs as its data source for NPM packages). Grype appears to handle this differently and includes the malware advisories by default.
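
The difference comes down to a single query parameter. The following sketch builds a request against GitHub's global security advisories REST endpoint and asks for malware advisories explicitly. The parameter names (type, ecosystem, affects) and the ghsa_id response field reflect our reading of GitHub's REST API documentation and should be verified before relying on them; the function names are made up for this example, and unauthenticated requests are rate-limited.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ADVISORIES_URL = "https://api.github.com/advisories"


def advisory_url(package, version, advisory_type="malware"):
    """Build a query URL for advisories affecting one npm package version."""
    params = {
        "ecosystem": "npm",
        "affects": f"{package}@{version}",
        # Without this parameter only reviewed (non-malware) advisories come back.
        "type": advisory_type,
    }
    return f"{ADVISORIES_URL}?{urlencode(params)}"


def fetch_ghsa_ids(url):
    """Perform the request and return the GHSA IDs in the response (needs network access)."""
    request = Request(url, headers={"Accept": "application/vnd.github+json"})
    with urlopen(request, timeout=30) as response:
        return [advisory["ghsa_id"] for advisory in json.load(response)]
```

Comparing the results for advisory_type="malware" and advisory_type="reviewed" on the same package version is a quick way to check which kind of advisory a tool would need to request in order to see it.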

The third major data source is Google's Open Source Vulnerabilities (OSV) project. While it "only" aggregates vulnerability information from other sources, including GHSA, it does include GitHub malware advisories in its main feed. From there, the information does find its way to OSV-Scanner and (optionally) OWASP Dependency-Track.
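
OSV can also be queried directly, which is handy for spot-checking a single package version without installing a scanner. This is a minimal sketch against the public OSV query endpoint (POST /v1/query). The helper names are hypothetical, and treating the MAL- prefix as the marker for malware entries is an assumption based on the identifiers shown in the OSV-Scanner output above.

```python
import json
from urllib.request import Request, urlopen

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_query(name, version, ecosystem="npm"):
    """Request body for a single package version lookup."""
    return {"package": {"name": name, "ecosystem": ecosystem}, "version": version}


def advisory_ids(response):
    """All advisory IDs in an OSV query response (empty if nothing matched)."""
    return [vuln["id"] for vuln in response.get("vulns", [])]


def malware_ids(response):
    """Only the IDs that look like malware entries (assumed MAL- prefix)."""
    return [i for i in advisory_ids(response) if i.startswith("MAL-")]


def query_osv(name, version):
    """Send the query to OSV and return the decoded JSON (needs network access)."""
    body = json.dumps(build_query(name, version)).encode("utf-8")
    request = Request(OSV_QUERY_URL, data=body,
                      headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```

For example, malware_ids(query_osv("kill-port", "2.0.2")) would list any malware entries OSV currently holds for that version.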

Some might argue that malware infections should not be part of vulnerability databases, since they are not exploitable vulnerabilities, but rather instances where the compromise has already occurred in the supply chain. We view this as a purely theoretical distinction. This is supported by the fact that there is indeed a CWE (Common Weakness Enumeration, a categorization system from the CVE ecosystem) entry for replicating malicious code (CWE-509).

Ultimately, these dependencies are blatantly insecure. Of course, we want to learn about them from a security vulnerability feed! If anything, the risk is higher than that of a vulnerable, but unexploited package.

Endpoint protection to the rescue?

In light of the limitations discussed initially and dependency scanners being a somewhat mixed bag, should we move to another line of defense? After all, in the Shai-Hulud waves, a huge part of the risk stems from the malware getting installed on developer machines. Couldn't an endpoint protection solution (antivirus/EDR) identify it and prevent further damage?

To gauge expectations, I ran another poll on Mastodon, where the majority once again shared this view:

shaihulud-poll-2.png

As endpoint protection is notoriously hard to test, we had a look at the VirusTotal results for the well-known payloads of Shai-Hulud 2. VirusTotal primarily checks static signatures, whereas an advanced EDR would ideally catch the malicious behavior. However, if the file signature isn't even flagged, the first line of defense is already broken. After all, detection in this case basically comes down to spotting files with a few, well-known checksums.
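
To illustrate how little is needed for this kind of static detection, here is a minimal checksum matcher. It is a sketch only: known_hashes must be supplied by the caller from a trusted indicator-of-compromise source (no real hashes are included here), and the optional names filter is a hypothetical convenience for limiting the search to specific file names.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_known_payloads(root, known_hashes, names=None):
    """Return paths below root whose SHA-256 is in known_hashes.

    If names is given, only files with one of those file names are hashed.
    """
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        if names is not None and path.name not in names:
            continue
        if sha256_of(path) in known_hashes:
            hits.append(str(path))
    return sorted(hits)
```

A check like this only catches byte-identical copies of a payload. Any repacked or modified variant would slip through, which is why behavioral detection matters as a second layer.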

This was the result for one of the malicious files as of 2025-11-26, two days after the initial detection:

shaihulud-virustotal-1.png

When looking at the live results now, detection rates have improved slightly, but several major players are still not detecting it.

Things looked even more dire for another one of the malicious payloads on 2025-11-26:

shaihulud-virustotal-2.png

In this case, live results have also improved, but as of now, the malware is still not detected by around half of the scanners.

Conclusion

While GitHub, as the operator of NPM, was quick to remove package versions affected by large-scale compromises (at least in most instances), the information sharing around these removals leaves room for improvement, from notes on the packages' NPM pages to issuing (the right kind of) security advisories. This makes dependency scanners less reliable than they could be, since detection depends heavily on the specific data sources used.

We were surprised to learn how many scanners could not identify the compromises, be it at the software composition or the endpoint layer. While the right scanners usually identify the issues, or can at least be configured to do so, this is not without pitfalls, and not every product delivers a satisfying result. This is particularly concerning given the large impact of and widespread attention on the recent NPM compromises. One can only wonder what happens in the case of more subtle attacks that receive less public attention.
