Merging Threat Clusters
by thesilence | 2024-07-29
In a previous blog, we defined a threat cluster and described how clusters are created and grow over time. As clusters expand, they may touch or overlap with indicators or activity associated with another cluster. When this happens, we should review our data to see if the two clusters actually represent a single set of related activity. This blog describes how we can identify overlapping clusters, things to consider when deciding whether to merge clusters, and the specific steps to take when merging two clusters in Synapse.
Merge Overview
Merging threat clusters (or expanding individual clusters) is a form of attribution. When we associate activity (more commonly, the indicators or evidence representing that activity) with a cluster, we attribute the activity to the threat actor (individual or group) represented by the cluster, even if we do not yet know the actor's real-world identity.
We grow individual clusters by adding new or unattributed activity to the cluster. We merge clusters by combining two previously independent sets of activity. The decision-making process in both cases is similar. By growing and / or merging clusters, we build a more detailed and comprehensive understanding of threats over time.
When growing or merging clusters, we focus on our own assessments and where they may overlap - not on third party reporting. (Whether or not Trend Micro's "Earth Lusca" is the same as Recorded Future's "TAG-22" is a separate problem.)
Identifying Merge Candidates
In Synapse, a simple way to identify overlaps is to look for nodes that are associated with more than one cluster, based on the nodes' tags. We can review these candidates as we come across them, but ideally we will proactively search for them. A node associated with more than one cluster may mean the clusters can be merged, but it can also be a sign of mis-tagged data or analytical errors. The sooner we identify these issues, the easier it is to determine the reason for the overlap and clean up our data.
We can use Storm to:
select all of the threat clusters (risk:threat nodes) that we are tracking;
pivot to the tags (syn:tag nodes) used to annotate threat activity;
pivot to all nodes that have any of those tags;
de-duplicate the results; and
identify any resulting nodes that have more than one own tag (i.e., own tags associated with more than one threat cluster):
risk:threat:reporter:name=vertex :tag -> syn:tag -> * | uniq | +{ -> #cno.threat.*.own }>1
Tip
We could save this query as an Optic Bookmark for convenience, or modify it to run as a cron job and periodically flag overlapping clusters for review.
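As a minimal sketch of the cron option (the schedule, and the review tag int.review.overlap used to flag candidates, are arbitrary choices for illustration; check the cron.add help in your Synapse version for the exact options):
cron.add --daily 06:00 {
    risk:threat:reporter:name=vertex :tag -> syn:tag -> * | uniq | +{ -> #cno.threat.*.own }>1
    // flag the overlapping nodes for analyst review (hypothetical tag convention)
    [ +#int.review.overlap ]
}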
We are searching for clusters that overlap based on own tags, which annotate things we assess are owned or controlled by (i.e., exclusive to) a threat cluster. By definition, an exclusive indicator should not be associated with more than one cluster, so we want to review these cases. (See our previous blog for details on our distinction between own and use tags, and why this matters.)
Merge Review Process
Once we have our candidates, we can begin our review by working through the following steps.
Review the existing clusters
Before we even consider merging our clusters, we need to review them independently to ensure each one is as complete and correct as possible. There are any number of reasons why a cluster may be untidy. These include:
Resource constraints. We created the original cluster for tracking purposes, but never got around to examining it more closely.
Differences in analysis workflows. Individual analysts vary in the level of detail used to build out a cluster and the threshold for making clustering assessments.
Errors or inconsistencies. We all make mistakes when clustering (and applying the associated tags). It's easy to tag something in error or forget to remove a tag when an assessment changes.
New or updated data. We should ensure that our clustering reflects the most current information available to us, in case new evidence affects any prior clustering decisions.
Our review of the clusters may result in changes that show the clusters do not overlap after all. This is okay - we have cleaned up our data and our assessments, and that is always a good thing!
If we still have overlaps, we should start reviewing the clusters and noting similarities. We'll discuss various kinds of evidence to consider, from the most conclusive or compelling to the least.
Start with "own" overlaps
When associating activity with a threat cluster, Vertex distinguishes between things a cluster "owns" (controls or uses exclusively) and things the cluster "uses" (but which may also be used by other clusters). This distinction is reflected in the tags we apply to nodes associated with a cluster - for example, cno.threat.t212.own vs. cno.threat.t212.use (see our previous blog for additional discussion of this concept).
Nodes that have more than one own tag are often indicators (such as files/hashes, FQDNs, or various kinds of infrastructure). Because these are typically atomic IOCs or observable data (as opposed to abstract patterns of behavior or inferred objectives), they are more straightforward to evaluate.
Because an own tag indicates something owned or controlled by a cluster, there are only a few explanations for own overlaps:
One (or both) of the tags is wrong. Perhaps we attributed a node to two clusters by mistake. Ideally, our independent review of the clusters should have ruled out this possibility.
The indicators in fact overlap. So far, our tagging appears to be correct, and the clusters may need to be merged. Continue to review additional evidence.
We can examine these specific overlaps in Synapse:
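For example, assuming our two candidates are the (hypothetical) clusters T212 and T5376 used throughout the rest of this blog, a query along the following lines lifts the nodes that both clusters claim to own:
#cno.threat.t212.own +#cno.threat.t5376.own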
We can also get a sense for the overall connectivity between the clusters in Force Graph mode.
There is a third option to consider with respect to own overlaps: it is possible that our assumptions about threat actor behavior are wrong. That is, something that we thought was exclusive actually isn't. For example:
a mutex we believed was unique to malware variants used by a specific threat cluster is in fact the default mutex encoded into a malware builder used by multiple clusters. In this case, the own tags we applied to the mutex need to be use tags instead.
a self-registered FQDN used by a threat cluster expired and was re-registered by a different cluster. Here the FQDN may be exclusive to each cluster, but only during the time window when each controlled it, so our own tags may need associated timestamps to show when each cluster controlled the FQDN (see the sketch below).
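As a sketch of that second case (the FQDN and dates are made up for illustration), Storm allows a time interval on a tag, so the own assessment can be bounded to the window when a given cluster controlled the domain:
inet:fqdn=files.example.com [ +#cno.threat.t212.own=(2022/01/15, 2022/11/02) ]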
These cases are less common, but they do happen. What we know about the threat landscape (based on observation and past experience) is true...until it isn't any more. This is why the entire body of evidence is important in merge decisions. A single own overlap is notable, but not convincing. Multiple own overlaps make a merge more convincing. So does a direct overlap path from one cluster's seed node, through multiple own overlaps, to the other cluster's seed node. But we should still examine additional evidence. (See our previous blogs for a discussion of Vertex's tagging conventions.)
Consider "use" overlaps
Next we should consider any use tag overlaps between the candidates. Overlapping use tags may indicate shared resources or behaviors that imply the clusters may be the same. Depending on your analysis needs and tagging practices, things used by a cluster may include IP addresses (IPv4 or IPv6), mutexes, file paths or file names (PDB paths, staging directories, or the names of malware, scripts, or output files), C2 configuration parameters, file resource or section hashes, infrastructure providers (domain registrars or hosting providers), DNS name servers / name server FQDNs, and more.
Because use tags are not exclusive, we need to look at this data in multiple ways. Say we are considering merging two clusters, T212 and T5376. What are all the indicators / evidence used by each cluster? This gives us an overall profile of each candidate.
#cno.threat.t212.use
#cno.threat.t5376.use
How much overlap exists for things used by both clusters? More overlap implies greater similarity.
#cno.threat.t212.use +#cno.threat.t5376.use
For any overlaps, how many other clusters also use those same resources? Resources that are widely used make poor correlation points. The fact that our candidate clusters have the resource in common is notable, but is less significant in cases where the resource is also used by many other clusters. (In the example Storm below, we are excluding overlapping use tags where the object - indicator, resource, etc. - is used by more than five clusters, a value we selected arbitrarily.)
#cno.threat.t212.use +#cno.threat.t5376.use -{ -> #cno.threat.*.use }>5
Finally, for things that overlap, does their use also overlap in time? Use of the same resource (such as an IP address or a particular domain registrar) at the same time is more convincing than the same resource used at different times. (In the example Storm below, we lift the T212 use indicators that have any associated timestamps, assign the timestamp value to the variable $time, and then filter to those results also used by T5376 where the T5376 timestamps overlap with those of T212.)
#cno.threat.t212.use@=(1970/01/01, now) $time=#cno.threat.t212.use +#cno.threat.t5376.use@=$time
Keep in mind that in order to determine how common or widely used a shared indicator is, we need to track that indicator's use consistently across all of our threat activity. Our tracking will never be perfect, but if some analysts are tagging or recording certain data and other analysts are not, our data will be skewed. It is important to have agreed-upon practices and workflows; Synapse's automation features can help ensure consistency (e.g., in applying tags, "pushing" tags to related nodes, or ensuring tag timestamps are set when appropriate).
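As one illustration of this kind of automation (a sketch only - the forms and tag are examples, and trigger options may vary by Synapse version), a trigger can push a cluster tag from a tagged file to its SHA256 hash so the two are never tagged inconsistently:
trigger.add tag:add --form file:bytes --tag cno.threat.t212.own --query {
    // when the own tag is applied to a file, copy it to the file's SHA256 node
    :sha256 -> hash:sha256 [ +#cno.threat.t212.own ]
}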
Consider other similarities
Overlaps based on tags often indicate related IOCs or tactical behaviors. Clusters may have other similarities that are more operational or strategic in nature. These include malware, tools, or techniques used by a cluster; targeting associated with a cluster (e.g., countries or industries); or a cluster's assessed goals.
Similar to use tags, all of these elements help build a profile of a cluster, but are rarely unique to a cluster. Once again we need to examine these aspects of each cluster individually, but also look at the specific overlap between our clusters and how widely these characteristics are shared across all clusters.
For example, we can ask about the malware or tools used by an individual cluster (in this case, T212):
risk:threat:org:name=t212 -(uses)> risk:tool:software
To compare two clusters, we can use the Synapse intersect command to query them both and only show those elements they have in common. This query shows the malware or tools used by both T212 and T5376:
risk:threat:org:name*in=(t212, t5376) | intersect { -(uses)> risk:tool:software }
We can identify tools that are common across many of our clusters (in this case, any tool used by more than five clusters):
risk:tool:software +{ <(uses)- risk:threat +:reporter:name=vertex }>5
And we can combine the two queries to find tools used by both of our clusters while filtering out any tools used by more than five clusters:
risk:threat:org:name*in=(t212, t5376) | intersect { -(uses)> risk:tool:software } | -{ <(uses)- risk:threat +:reporter:name=vertex }>5
Even if a tool (or technique, or goal) is shared by multiple clusters, we can still consider it as part of the merge decision for our two candidates. The shared technique represents a similarity, and more points of similarity support a decision in favor of the merge. But similarities that are also shared with other groups should be given less weight. For example, it is worth noting that our candidates both use Cobalt Strike or leverage spear phishing; but it is also worth noting that many other clusters use these tools and techniques as well.
A few notes about overlaps
Most of the characteristics available to us when comparing our clusters (use tags, techniques…) are not exclusive things that could only be used by one threat actor. The overlaps we identify might be due to the two clusters representing the same activity. But there could also be other explanations, including:
Independent clusters may appear similar as more groups deliberately make use of generic tools and native commands (e.g., "living off the land binaries" (lolbins)).
Independent groups may follow similar methodologies or "playbooks" that guide their activity.
Independent groups may receive the same tasking and therefore target the same sectors, technologies, or data.
A group may deliberately imitate another group (e.g., as a false flag). Threat actors make use of threat reporting just as defenders do!
When considering a merge (or clustering activity in general), it's important to consider alternative explanations (analysis of competing hypotheses, or ACH) for what we see.
Present your evidence
Now we need to present our case! Formulating our arguments forces us to think through (and be prepared to defend!) our reasoning. A list of bullets representing our key points will suffice - no need to write an essay! Ideally, every bullet has an associated Storm query that precisely demonstrates our supporting evidence (and makes it easier for others to review our work).
We strongly encourage peer review of all key analytical decisions, including proposed merges. Reviewers bring additional questions and perspectives to the table, and can help counter our biases. We can use our existing internal review process, whether that is Synapse's Quorum feature (which allows designated users to review and vote on proposed changes) or a more informal process.
If our evidence is compelling, merging the clusters is an easy choice. In other cases, the evidence may not be as convincing. It is okay to NOT merge the clusters! All of the data for each cluster remains in Synapse and can be readily queried and viewed. It is much easier to revisit the merge in the future than it is to "unmerge" two clusters that turn out to be unrelated.
Merge Mechanics
We have passed all the hurdles, and it's time to merge the clusters! The steps to combine our cluster data are listed below. We recommend automating this process (e.g., with a macro) for consistency and simplicity.
Combine the risk:threat nodes
We "combine" the risk:threat
nodes by updating each node to note that one has been merged into another. We can choose our own criteria for which cluster to merge and which to keep; a common method is to merge the newer cluster into the older one.
For the risk:threat that is being merged, we need to:
set the :merged:time property to the date of the merge;
set the :merged:isnow property to the guid of the risk:threat that will remain and represent the combined cluster; and
optionally add a meta:note to the old (merged) risk:threat node documenting the reasons for the merge (see the sketch below).
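A sketch of these steps in Storm, assuming (as in our earlier examples) that the newer cluster T5376 is being merged into the older cluster T212; the note.add command is one way to record our reasoning:
// mark T5376 as merged into T212
risk:threat:org:name=t5376
[ :merged:time=now :merged:isnow={ risk:threat:org:name=t212 } ]
| note.add "Merged into T212 based on multiple own overlaps and shared tooling."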
For the risk:threat that will remain, we need to combine the :goals from both clusters by copying any goals from the merged cluster to the remaining one. Depending on how we are modeling and tracking threats, we may also need to review other risk:threat properties (such as :active, :sophistication, or :type) to ensure our combined cluster accurately reflects our assessments.
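One way to copy the goals over (a sketch using the same hypothetical cluster names; because :goals is an array property, we append each goal from the merged cluster individually):
// collect the goals from the merged cluster (T5376)...
$goals = $lib.list()
risk:threat:org:name=t5376 +:goals
for $goal in :goals { $goals.append($goal) }
| spin |
// ...and append each one to the remaining cluster (T212)
risk:threat:org:name=t212
for $goal in $goals { [ :goals+=$goal ] }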
Note that we do not delete the old risk:threat node; we simply update both clusters to reflect our merge decision. The merged cluster remains in Synapse, along with the history of when it was merged and why!
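This also makes it easy to revisit our merge history later - for example, a simple lift of all of our clusters that have been merged into another cluster:
risk:threat:reporter:name=vertex +:merged:isnow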
Copy light edges
Threat clusters (risk:threat nodes) may have various light edges to other nodes, such as:
risk:threat -(uses)> (e.g., for tools or techniques);
risk:threat -(targets)> (e.g., for industries or countries); and
risk:threat <(refs)- (e.g., for articles or reports).
We'll need to identify the specific edges present in our environment for our clusters, and then copy the edges from the merged cluster to the remaining, combined cluster. Once the edges have been copied, we can remove them from the old cluster.
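A sketch of copying and then removing the uses edges, again with our hypothetical cluster names (repeat the pattern for whichever edge verbs are present in your environment):
// walk the nodes the merged cluster (T5376) uses, add the same edge from T212, then remove the old edge
risk:threat:org:name=t5376 -(uses)> *
[ <(uses)+ { risk:threat:org:name=t212 } ]
[ <(uses)- { risk:threat:org:name=t5376 } ]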
Conclusion
Merging threat clusters is a common process in threat tracking and cyber threat intelligence. The process, from identification to review to merge, is not difficult, especially with guidance on what evidence to consider. Synapse simplifies the process by representing clusters and their activity in a consistent and readily queryable way - we can compare clusters directly and based on our latest data instead of relying on analyst impressions or historical (and potentially outdated) prose reporting.
That said, merging clusters should still be done carefully and with appropriate consideration. While Synapse makes it easier to review clusters, no Storm query (or set of queries) can give us a conclusive answer as to whether or not two clusters should be merged. And while we have included several sample queries above, there are any number of other queries you could use, based on your available data and the specific clusters in question. You must weigh the evidence available on a case by case basis. We strongly recommend the use of peer review, and that the level of confidence required to merge should be high. It is much easier to wait for more evidence and merge the clusters later than to discover you were wrong and have to untangle two incorrectly merged clusters.
To learn more about Synapse, join our Slack Community, check out our videos on YouTube, and follow us on Twitter.