[ Beneath the Waves ]

YFVS Sidebar 2: Likelihood Ratings

Ben Lincoln

 

One of the most common questions regarding vulnerabilities (or negative scenarios in general) is "how likely is it to occur?".

This seemingly simple question is actually extremely difficult (and/or expensive) to answer with any degree of accuracy for most real-world scenarios. There are far too many variables to take into account.

I am presently aware of one way to provide a reasonably accurate response to this type of question, and it's the one that insurance companies use: collect enormous quantities of real-world data and use it to build actuarial tables. This is a more-or-less black box model that provides statistical figures for how frequently a given scenario has occurred historically (e.g. "people who drive green Hondas are x% more likely to be issued speeding tickets than the average customer") without the need to necessarily understand the cause of that correlation.
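
As a rough illustration of that frequency-based approach (and nothing like what an actual insurer would use), here is a minimal Python sketch. The record fields, the "green Honda" grouping, and the sample data are all hypothetical placeholders of mine; the point is only that the rate comes from counting historical outcomes rather than from anyone's judgment.

# Minimal sketch of the frequency-based ("actuarial") approach described above:
# given a set of historical records, estimate how often an event occurred for a
# particular group versus the population as a whole. Field names and data are
# hypothetical.

from dataclasses import dataclass

@dataclass
class DriverRecord:
    vehicle: str              # e.g. "green Honda"
    got_speeding_ticket: bool

def historical_rate(records, predicate):
    """Fraction of records matching `predicate` in which the event occurred."""
    matching = [r for r in records if predicate(r)]
    if not matching:
        return None  # no data for this group, so no rate can be given
    return sum(r.got_speeding_ticket for r in matching) / len(matching)

# Tiny stand-in for the "enormous quantities of real-world data".
records = [
    DriverRecord("green Honda", True),
    DriverRecord("green Honda", False),
    DriverRecord("blue Ford", False),
    DriverRecord("blue Ford", True),
    DriverRecord("blue Ford", False),
]

group_rate = historical_rate(records, lambda r: r.vehicle == "green Honda")
overall_rate = historical_rate(records, lambda r: True)
print(f"green Hondas: {group_rate:.0%}  all drivers: {overall_rate:.0%}")

The model knows nothing about why green-Honda drivers get more tickets; it only reports how often they have in the past.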

There are two main prerequisites to using that actuarial model:

  1. Access to large amounts of reasonably-accurate real-world data regarding the types of scenarios being discussed.
  2. One or more people or systems which can take that data and calculate (with reasonable accuracy) the probability of a given scenario occurring.

In information technology-related discussions, the first element is nearly always missing, and usually the second one is as well. This means that any resulting "likelihood" rating is effectively someone's opinion, but it's been alchemically transmuted into a number, so it looks like it has some sort of weight to it.

I find this to be a dangerous way to do things. If your raw data has so little basis in hard fact, it doesn't matter how accurately you process it: the results are going to be unreliable. Without reliable results, not only are poor decisions likely to be made, but when real-world events don't line up with the "likelihood" rating, the credibility of whoever issued that rating suffers.

An even worse scenario is when the figure is labeled "probability". Please do not do this if you are just making a percentage up. Every time someone puts the formal label "probability" on a number that they fabricated instead of calculating properly, one more enchanted brick is removed from the sorcerous Great Wall which prevents the enraged, furious spirits of Gauss and Fermat (among others) from rushing into this world to avenge themselves upon the human race. The secrets which allowed that Wall to be built have been lost to us — such damage is now irrevocable.

When I designed the Yield-Focused Vulnerability Score (YFVS) model, I tried to provide something that was sort of like a "likelihood" rating, but grounded in quantifiable information: what prerequisites does a vulnerability require in order to be exploited? Are there many? On average, are they difficult, expensive, or time-consuming to obtain? To a lesser extent, will exploiting the vulnerability create a lot of "noise" that is likely to be noticed? The numbers I assigned to each of the possible answers are somewhat arbitrary, based on my own experience (it is, after all, a draft/work-in-progress), but because the raw data (the way the questions were answered) is preserved, the calculation can be corrected in future versions if necessary, without having to go back and re-collect the input data for each vulnerability. More importantly, unless assessors misunderstand the questions or the answers, different people should all end up scoring the same vulnerability the same way.
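
To show what I mean about preserving the raw data, here is a minimal sketch. The question names, answer options, and weights below are placeholders I made up for illustration, not the actual YFVS questions or values; the structural point is that the answers recorded for a vulnerability are stored separately from the weight table, so the weights can change without anyone re-assessing the vulnerability.

# Hypothetical weight table ("version 1" of the model). Question names, answer
# options, and numbers are illustrative placeholders, not actual YFVS values.
WEIGHTS_V1 = {
    "prerequisite_count": {"none": 1.0, "few": 0.7, "many": 0.4},
    "prerequisite_cost":  {"low": 1.0, "moderate": 0.6, "high": 0.3},
    "noise_generated":    {"quiet": 1.0, "noticeable": 0.8, "loud": 0.5},
}

# Hypothetical revised weights ("version 2"): same questions, different numbers.
WEIGHTS_V2 = {
    "prerequisite_count": {"none": 1.0, "few": 0.6, "many": 0.2},
    "prerequisite_cost":  {"low": 1.0, "moderate": 0.5, "high": 0.2},
    "noise_generated":    {"quiet": 1.0, "noticeable": 0.7, "loud": 0.4},
}

def score(answers, weights):
    """Combine the preserved raw answers using whichever weight table is current."""
    result = 1.0
    for question, answer in answers.items():
        result *= weights[question][answer]
    return result

# The raw data that gets preserved for each vulnerability: the answers themselves.
vuln_answers = {
    "prerequisite_count": "few",
    "prerequisite_cost": "moderate",
    "noise_generated": "quiet",
}

print("v1 score:", score(vuln_answers, WEIGHTS_V1))  # 0.42
print("v2 score:", score(vuln_answers, WEIGHTS_V2))  # 0.30

The two weight tables produce different scores from the same stored answers, which is the property described above: correcting the calculation in a later version does not require re-collecting the input data, and two people who answer the questions the same way get the same score.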

If someone is dead set on using a personal estimate of "likelihood" as a data point, I won't lie down in front of a bulldozer to stop them, but hopefully I've explained why I think it's a bad idea.

 