Dependency Mapping With Graphs


Preamble ramble

Like all things in this world, it started with a meme.


August 17, 2020

It’s a hilarious example of what we all know is true about the modern digital ecosystem: it’s built on the backs of a small handful of under-resourced projects. These projects are maintained by weekend warriors, developers with some niche interest, and/or just a guy who wrote a thing 10 years ago that has somehow become his passion project. Some of these packages are used in tens or even hundreds of thousands of installs, underpinning billions of dollars in commercial revenue, and yet only a small fraction of that money gets funneled back to the maintainers of those core software packages.

…yet if one of those projects were to have a critical flaw, it would have an outsized impact on the security of the world around us.

These are just some examples of the vulnerabilities that rocked the security community, and in every one of these cases the affected project was highly utilized yet under-resourced relative to the critical infrastructure it provided.


Why does any of this matter, right? Well, a while back I began wondering whether I could identify software packages that pose a higher risk to the global security ecosystem: packages that are among the most highly linked in their respective ecosystems. Could I come up with a framework generic enough to be built upon? Is this a project where I can finally learn how to use graph databases?!?
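
One simple way to make "most highly linked" concrete is to count, for each package, how many other packages depend on it directly (its in-degree in the dependency graph). This is a minimal sketch, assuming a plain in-memory Python dict of made-up packages rather than real registry data or a graph database; the package names, `DEPENDENCIES`, and `direct_dependent_counts` are all hypothetical and only for illustration.

```python
from collections import Counter

# Hypothetical toy data: each package maps to the packages it depends on directly.
# These names are illustrative only, not real registry data.
DEPENDENCIES: dict[str, list[str]] = {
    "web-app": ["http-lib", "json-lib", "logger"],
    "cli-tool": ["json-lib", "logger"],
    "http-lib": ["tls-lib", "logger"],
    "json-lib": [],
    "logger": ["string-pad"],
    "tls-lib": ["string-pad"],
    "string-pad": [],
}


def direct_dependent_counts(deps: dict[str, list[str]]) -> Counter:
    """Count how many packages depend directly on each package (its in-degree)."""
    # Start every known package at zero so leaf "apps" still show up in the result.
    counts = Counter({pkg: 0 for pkg in deps})
    for dependencies in deps.values():
        # set() so a dependency listed twice by the same package is counted once
        for dep in set(dependencies):
            counts[dep] += 1
    return counts


if __name__ == "__main__":
    # Print the three most depended-upon packages in the toy graph.
    for pkg, n in direct_dependent_counts(DEPENDENCIES).most_common(3):
        print(pkg, n)
```

Direct in-degree is only a first approximation: a tiny package like the hypothetical `string-pad` can look unimportant by this count while still sitting underneath much of the graph transitively.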

So, today is the day I begin my quest to come up with solutions to these (non-trivial) problems, and hopefully learn some new things along the way.

My Gameplan

  1. In order to solve the complex dependency graph problem, I’m going to have to learn to use graph databases.
  2. I’m going to iterate a bit on a data model that works for a package repository/build repository that’s easy to parse.
  3. I’ll add weights for things like complexity, age of project, number of contributors, etc.
  4. Based on my outputs, I’ll do a really light impact analysis of a hypothetical package or two with a critical CVE (10/10 on the CVSS scale)
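
For the impact analysis in step 4, one rough measure of a vulnerable package's reach is the set of every package that depends on it, directly or transitively. This is a minimal sketch, assuming a hypothetical in-memory dict rather than a graph database: reverse the dependency edges, then walk them breadth-first from the vulnerable package. The package names and the functions `reverse_edges` and `transitive_dependents` are made up for illustration, and the weights from step 3 are deliberately left out.

```python
from collections import defaultdict, deque

# Hypothetical toy data: package -> packages it depends on directly.
DEPENDENCIES: dict[str, list[str]] = {
    "web-app": ["http-lib", "json-lib", "logger"],
    "cli-tool": ["json-lib", "logger"],
    "http-lib": ["tls-lib", "logger"],
    "json-lib": [],
    "logger": ["string-pad"],
    "tls-lib": ["string-pad"],
    "string-pad": [],
}


def reverse_edges(deps: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each package to the set of packages that depend on it directly."""
    dependents: dict[str, set[str]] = defaultdict(set)
    for pkg, dependencies in deps.items():
        for dep in dependencies:
            dependents[dep].add(pkg)
    return dependents


def transitive_dependents(deps: dict[str, list[str]], vulnerable: str) -> set[str]:
    """Breadth-first walk over reversed edges: everything that pulls in `vulnerable`."""
    dependents = reverse_edges(deps)
    seen: set[str] = set()
    queue = deque([vulnerable])
    while queue:
        current = queue.popleft()
        # .get avoids inserting empty entries into the defaultdict
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    # A dependency cycle could lead back to the starting package; exclude it.
    seen.discard(vulnerable)
    return seen


if __name__ == "__main__":
    affected = transitive_dependents(DEPENDENCIES, "string-pad")
    print(len(affected), sorted(affected))
```

A graph database would express the same question as a variable-length path query; the in-memory version just makes the traversal explicit.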

My expectation is that there are a small number of projects that meet the following criteria

We’ll see what happens after that. I’ve got quite a bit of work ahead of me.