Interpreting the 2019/2020 APT29 MITRE ATT&CK Vendor Evaluation Results

May 6, 2020 | 6 min read


This year, MITRE’s ATT&CK-based evaluation focused on demonstrating 30 separate industry technology defense capabilities against a series of attacks simulating the advanced Tactics, Techniques, and Procedures (TTPs) of a Russian government-sponsored offensive cyber operations group known as Advanced Persistent Threat 29 (“APT29”). Within days of MITRE posting the results, waves of material came in from the vendors claiming success based on those results. If you’ve seen the raw results from MITRE (which never explicitly score anything as “good” or “bad”), you’re probably wondering:

  1. Is the vendor response just good marketing/hype?
  2. What’s an impartial view of who did well and who didn’t?
  3. How do I interpret the MITRE results for myself, and what do I do with that information?

First, the MITRE results are not easy to interpret because MITRE’s goal isn’t the same as that of Gartner or Forrester, which offer vendor and capability reviews. MITRE isn’t trying to rank; it’s trying to identify how well a tool’s detection and alerting capabilities align with APT TTPs. You can definitely derive *some* good and bad outcomes from the MITRE evaluation, but what BlueVoyant found most interesting were some common themes across all “scores.”

Again, “score” may not be the most accurate term in this case because it implies winners and losers, but for the sake of simplicity, we’ll use it anyway.

Looking at the image below as an example, the X-axis is the attack lifecycle, representing the phases in which the red team executes the simulated attack. The Y-axis is the number of possible detections, aka attacker “procedures” or “techniques,” present in each phase that a tool could, and hopefully DOES, detect.

The color coding is where it gets interesting. There are six different “scores,” each assigned a color to highlight the type of detection the tool used to respond. The following is how we at BlueVoyant interpret these scores - it does NOT represent any opinion from MITRE or the vendors who were evaluated.

  • "None," or dark blue, means there was no alert for that particular attack procedure. There may have been “an alert,” but it was associated with some other signal on the endpoint - perhaps a misfire, or an alert from a completely different procedure conducted at the same time. This could be considered “a miss,” but keep in mind that there may still be an actionable alert, from another attack procedure, which a SOC could use to respond and contain, potentially with the same outcome. So it may not be a total “miss.”
  • "Telemetry" means the tool collected enough data to find the attack procedure but never generated an alert. You may not think this is great, but keep in mind that any leading SOC MUST be doing constant threat research to identify advanced procedures using telemetry alone, crafting its own detections from raw data.
  • "MSSP" means you generated something, but it needed human analysis and enrichment to actually identify it. This may or may not be good. If your SOC is using advanced automation for enrichment of alerts and investigations to augment NGAV/EDR solutions, this may actually be a very good thing, because you’re still detecting and responding at machine speed, you just need the SOC and SOC platform on top of the endpoint tool to get it right.
  • "General" means there was a generic alert associated with the attack procedure, but the alert was not particularly contextualized or rich. As such, the mode and motivation of the attack weren’t immediately apparent from the alert - the implication being that response based on this type of alert would not be as fast or efficient as desired. Again, similar to “MSSP,” if your SOC’s automation is strong, it may make up for the lack of enrichment here.
  • "Tactic" means that you grasped what the adversary is doing with the attack, in terms of where they are in their phase of attack and what systems they are currently breaching. This means you can more readily respond with this additional insight.
  • "Technique" is the richest outcome. This means the tool successfully alerted and provided insight into what the attack was trying to accomplish (e.g., using SMB to spread laterally). This gives the SOC the most prescriptive insight into how to respond.
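To make the taxonomy above concrete, here is a minimal sketch of how one might tally a vendor’s results by detection category per attack phase. The phase names and per-procedure results are hypothetical placeholders, not real MITRE data; the category labels mirror the six types described above.

```python
from collections import Counter

# Hypothetical per-procedure results for one vendor, keyed by attack-lifecycle
# phase (the X-axis in MITRE's chart). Each list entry is the detection
# category recorded for one attacker procedure in that phase.
results = {
    "Initial Compromise": ["Technique", "Telemetry", "None"],
    "Lateral Movement": ["Tactic", "General", "Telemetry", "MSSP"],
}

# Flatten all phases and count how often each detection category appears.
counts = Counter(cat for cats in results.values() for cat in cats)

print(counts["Telemetry"])  # 2
print(counts["None"])       # 1
```

A tally like this is a starting point for reading the raw results, but as argued below, simply stacking these counts into a ranking misses the point of the evaluation.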

Ok, so the right thing to do with this information is immediately start stacking scores in each category and using that to say good, better, or best, right?

We disagree. In general, our view is that while the best tools on the market are excellent for cyber defense, no one tool should be considered the SOC’s go-to for APT defense. While our team at BlueVoyant loves great alerts, which make the SOC’s life easier, what we’re really looking for is total coverage. What do we mean by total coverage? We’re looking for scores in every category except “None,” because our SOC performs its own enrichment, contextualization, and alerting using raw telemetry and our threat insights - alerts are icing on the cake.
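The “total coverage” idea can be sketched as a single metric: the fraction of attack procedures with *any* signal other than “None.” The function and sample scores below are illustrative assumptions, not a method MITRE or any vendor publishes.

```python
def total_coverage(detections):
    """Fraction of attack procedures with any signal other than "None".

    Treats Telemetry, MSSP, General, Tactic, and Technique equally, on the
    premise that a SOC doing its own enrichment can act on raw telemetry too.
    """
    if not detections:
        return 0.0
    covered = sum(1 for d in detections if d != "None")
    return covered / len(detections)

# Illustrative category results for a hypothetical vendor.
scores = ["Technique", "Telemetry", "None", "MSSP",
          "General", "Tactic", "Telemetry", "None"]

print(f"{total_coverage(scores):.0%}")  # 75%
```

Note the deliberate design choice: unlike a weighted ranking, this metric does not reward “Technique” over “Telemetry,” which matches the argument that raw telemetry is actionable when the SOC layers its own detections on top.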

As we review the outcomes of MITRE’s ATT&CK assessments, we focus on what technologies, or combinations of technologies, provide the relevant context, enrichment, decoration, and raw data required for our analysts to find adversaries. When choosing tools to deploy in front of analysts, we gravitate towards those that provide accurate signals that something malicious or unusual has occurred, whether via an alert or enriched telemetry. Seldom do we rely on a single tool or technology to identify the scope and breadth of an incident. However, it’s important that our detection technology… detect, but reconstructing the narrative of an incident requires correlating events with additional data sources such as firewalls, authentication records, email logs, and other secondary sources. When identifying the technology that best suits your needs, consider not only how it performs on its own, but how it fits into your larger detection spectrum and how it bolsters your detection-in-depth strategy.

Ultimately, the MITRE ATT&CK adversary emulations provide potential buyers with a tangible example of how the tested technologies perform against targeted attacks and inventive methodologies. The adversary emulations are not exhaustive, and may not represent risks present in a particular environment, but they can give the reader an understanding of adversary and vendor capabilities. It is important to note that MITRE provides adversary emulation capabilities for anyone to test their existing toolsets, as well as the ATT&CK Navigator to assist users in identifying technology gaps. Keep this in mind as you consider your detection stack - no vendor is capable of detecting everything, nor is a particular technology type (e.g., EDR, NGAV, IPS, etc.). Use these excellent emulations and assessments to identify the tools that do well in your focus or risk area, and augment them with an overall detection gap closure roadmap using the tools provided by MITRE.

So what does that mean for the rest of the organizations out there? In our opinion, while this assessment against APT29’s TTPs is outstanding, the bigger takeaway is to ask yourself, “Am I using all of my security data in a way that gives me the right coverage against threats to my organization?” And the only way to answer that is with a detailed walkthrough of frameworks like MITRE ATT&CK to see where your SOC has gaps in its detections.