This article presents my personal approach to threat modelling for software architects. Threat modelling is also known as software security assessment and is related to vulnerability assessment. The following content describes a general process for finding and minimising security problems for software at both the architecture and implementation level, whether it is being developed (new), being modified (existing) or simply being audited (finally, on the principle of better late than never).
The content here is not operating-system-specific, programming-language-specific or software-framework-specific. However, the people who actually apply the suggested process will need to have deep technical knowledge.
The topic of IT security is of course a huge one; many books have been written on various sub-topics. This article cannot go into depth in just a few pages - it is only a starting point (my starting point when discussing such issues with customers).
It isn’t necessary to be an expert to develop a threat model, and it isn’t rocket science. Experience will help but common sense goes a long way and any model is better than none.
Sources and Motivations
I’ve been a developer, team lead, and software architect for a long time now, and worked on many projects. Ensuring the resulting system is reasonably secure has been a part of many of these projects (though none of the projects involved critical, life-endangering features). I have therefore effectively been doing ad-hoc threat modelling for a long time - identifying security weaknesses in a design or implementation, and assessing the options to deal with each weakness.
Recently I had motivation and time to research the current state of threat modelling and formalise my ideas into a basic process that I can apply to projects and recommend to customers. Interestingly, although there are many conferences on computer security, there does not appear to be any standard approach. I did find two quite different books on the topic:
- Threat Modeling: Designing for Security; Shostack 2014 (STRIDE methodology) – referred to later as “the STRIDE book”
- Risk Centric Threat Modeling: Process for Attack Simulation and Threat Analysis; UcedaVelez and Morana (PASTA methodology) – referred to later as “the PASTA book”
The first was very helpful, and the remainder of this article is a combination of the recommendations from that book and my personal experiences. It focuses on improving the software design and implementation without excessive time-wasting. I recommend reading the STRIDE book and drawing your own conclusions.
The second was, in my opinion, a fine example of all that is wrong with the computer security industry - a bureaucratic approach that is likely to generate huge fees for consultants, consume vast amounts of time in workshops, and generate large volumes of management reports - while almost ignoring the step of actually fixing problems.
There are more detailed reviews of these two books towards the end of this article.
The Wikipedia article on Computer Security is quite a good overview of the field in general. SAFECode has a great introduction to threat-modelling which recommends an approach similar to the one presented here.
Note that my primary interests are back-end data processing systems, and sometimes client/server interactions. Not my focus: designing authentication systems (I delegate to existing systems), user interfaces, or deciding what data to gather (requirements). The discussion of threat modelling below may be influenced somewhat by these interests.
This article does not address incident response, ie the business planning needed to ensure that chaos does not break out if (when?) a security breach occurs.
It does not address specific protections such as intrusion detection tools or similar. The point of threat modelling is to find the weaknesses, and that alone requires significant technical knowledge; mitigations for these weaknesses can then be addressed at the appropriate level of detail.
It does not discuss specific tools for probing systems (eg Metasploit, nmap) or specific tools for protecting systems. When actually doing the threat modelling, having someone with knowledge of such tools is helpful (even knowledge that such tools exist will help).
It also does not (directly) address:
- “Social engineering” problems.
- Protection of physical assets, except where relevant to an IT system
- Specific techniques for dealing with specific problems (eg how to prevent buffer-overflow in C)
All of the above are important, just not in scope.
In short, this article looks at “system-wide” problems, and at how systematic issues can be found and recorded for fixes to be applied.
I hope to write a separate article on the subject of IT security at the management level (ie how architects, developers, testers and operators can communicate security concerns to managers, and how managers can inform themselves about dealing with security issues).
What is Threat Modelling (and why do I spell it with two “l”s?)
First: “modeling” is American English; “modelling” is British English. Both are correct. Celebrate diversity!
The term threat modelling probably comes from the military: the concept of figuring out what an enemy might do to you, how much damage that might cause, and what precautions can be taken ahead of time. Obviously the same principle can be useful in software. However in my opinion the analogy between military defense and software security can be overextended - the problems are only superficially similar. A software system is stationary, only slowly changing, and properly-implemented defences are truly bullet-proof. The problem is that a software system can easily have thousands of different parts, and is only as strong as its weakest link. In addition, mounting an attack on a software system is cheap - in most cases, a single competent person and a one-thousand-dollar laptop is all that is needed. Military goals and attackers have quite different properties.
The concept of applying threat modelling to software appears to have been first published in Writing Secure Code, 2nd Edition (Microsoft Press, 2002) by Michael Howard and David LeBlanc. It was later expanded and refined in Threat Modeling (Microsoft Press, 2004) by Frank Swiderski and Window Snyder. The STRIDE book is heavily influenced by Swiderski and Snyder’s work.
Very briefly, the process proposed in this article is to use an architectural data flow diagram (DFD) of a system to study all the different components of a software system and their interactions, and create a risk register of plausible points at which unpleasant things might happen. Each risk can then be categorised as:
- Adequately protected (with suitable documentation, and ideally unit or system tests to verify this and detect regressions)
- To be mitigated (with issue in issue-tracker to be sure it is done)
- To be documented and accepted as not significant enough
For some risks, it might be necessary to evaluate the risk in order to decide between the last two options. This evaluation estimates the likelihood that someone can really use this theoretical vulnerability to misuse the system, and estimates the cost of damages if that does occur. Accepting a risk is a perfectly legitimate outcome - sometimes the cost of avoiding something is higher than dealing with the consequences, particularly if the likelihood is low. However, such decisions should be made deliberately and clearly rather than ad-hoc.
The word “mitigation” here means doing something to block a potential vulnerability, or at least to make it much harder to exploit. Incomplete mitigations should be further analysed and either documented as accepted, or further mitigations should be added (defense in depth) until the remaining risk can be accepted.
Proposed mitigations can be labelled in the risk register entries as:
- A = architectural mitigation
- I = implementation mitigation (ie the “feature” does not appear in an architecture diagram or document)
- O = operations mitigation (applied by sysadmins, eg firewall)
- B = business-process mitigation (eg “background checks required for sysadmins”, “backups to be stored offline in locked room”)
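A risk register is usually just a spreadsheet, but it helps to be clear about what each row needs. The following Python sketch is one possible shape, not a prescribed format: the class names, the fields and the issue id `SEC-12` are hypothetical. It encodes the three outcomes and the A/I/O/B labels described above, plus a simple completeness check (a risk “to be mitigated” needs a tracked issue; a risk marked protected or accepted needs a documented rationale).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    # The three possible outcomes for each identified risk
    PROTECTED = "adequately protected"
    TO_MITIGATE = "to be mitigated"
    ACCEPTED = "documented and accepted"


class Kind(Enum):
    # The A/I/O/B mitigation labels
    ARCHITECTURAL = "A"
    IMPLEMENTATION = "I"
    OPERATIONS = "O"
    BUSINESS_PROCESS = "B"


@dataclass
class Mitigation:
    kind: Kind
    description: str
    issue_id: Optional[str] = None  # hypothetical issue-tracker reference


@dataclass
class Risk:
    element: str  # the DFD element where the risk was found
    description: str
    status: Status
    mitigations: List[Mitigation] = field(default_factory=list)
    rationale: str = ""  # why the risk counts as protected or accepted

    def problems(self) -> List[str]:
        """Return reasons why this register entry is not yet complete."""
        if self.status is Status.TO_MITIGATE:
            if not self.mitigations:
                return ["no mitigation proposed"]
            if any(m.issue_id is None for m in self.mitigations):
                return ["mitigation without a tracked issue"]
            return []
        if not self.rationale:
            return ["no documented rationale"]
        return []


register = [
    Risk("login API", "credential stuffing against the login endpoint",
         Status.TO_MITIGATE,
         [Mitigation(Kind.OPERATIONS, "rate-limit logins at the reverse proxy", "SEC-12")]),
    Risk("backup store", "theft of backup media", Status.ACCEPTED),
]

for risk in register:
    print(risk.element, risk.problems())
```

Here the second entry is flagged because it was accepted without anyone writing down why.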
Note that a threat/risk does not necessarily mean a person with bad intent - natural catastrophes are also threats/risks.
One additional output of threat modelling that is not quite captured in this summary is:
- Ensure that the system has proper auditing and logging to detect cases where architectural or implementation protections have not been sufficient.
I use threat modelling as a synonym for security review - though “modelling” is ideally done by the development team while the word “review” can potentially imply “external check” (which is far less effective).
In general the proposed flow is to detect risks first, then decide how to deal with them. Sometimes an architect will throw a bunch of mitigations into their design “on a hunch”, eg encrypt this and hash that, add a firewall rule there. Often such hunches are correct, but in my opinion it is still worth listing the circumstances that the mitigation actually protects against - features are never free, and it is good practice to ensure there really is a feasible threat. The identified risk can go into the risk register, and the proposed feature into the mitigations - this time with formal justification.
There are also “mitigations” which are simply required by law or company rules, regardless of whether an actual risk can be identified. This is called compliance and is discussed later.
As an architect and developer, I prefer code over reports, and coding over sitting in meetings. However doing a good job of finding security holes is really a job for a team - having different skillsets is important, as is knowledge of every component and every third-party framework. And teamwork does require some kind of organised process. My proposed process here is as light-weight as possible, I think.
And just to be clear: most of the credit for the ideas here belongs to Adam Shostack, his book, and those who inspired the ideas in it. I can recommend it (particularly the first few chapters).
The results of a security review will never be perfect - you can never know when it is complete. However, perfect is the enemy of good; most security failures that land in the news are due to gross negligence rather than truly cunning attackers (though the CEOs of such companies always claim the opposite). Doing a reasonable job of threat modelling is better than not doing it at all. It is also, IMO, better than throwing a bunch of “standard mitigations” into a system without knowing exactly what scenarios they protect against. As a final thought: if things do go bad in production, having a nice risk register document to show a good effort was made could be very helpful; it makes it more plausible to claim the security problem occurred “because the attackers were experts” rather than “because we were incompetent”.
When can Threat Modelling be Applied?
Threat modelling can be applied to a software architecture before it is built, to components as they are being built or modified, or to a system after it has been built. All are useful, though some are more useful than others:
- Analysing early (ie checking the software design) is definitely the most cost-effective if possible. It does require a reasonably complete design - see comments on waterfall vs agile later.
- Analysing component-by-component during build is possible, though care needs to be taken not to forget the “big picture”; security problems often occur between components
- Analysing after build but before “going live” is better than nothing. Unfortunately, fixing problems can be difficult at this point. QA staff (testers) can be a very helpful part of threat modelling - but getting them involved earlier is even better.
- Analysing after “going live” is probably still better than waiting for some external party (whether journalist, “white hat” or “black hat”) to find problems.
Any time the system is updated, a quick check should be made: has something been changed that might be security-relevant? If not, there is no need to set off alarms and organise meetings - I recommend basic common sense be used to limit the paperwork and meetings. But when the modifications look tricky, a short session focusing on just the changes may be needed - with a look at the existing risk register to see what is applicable.
In any system which has been in development for a long time, a final review before “go live” would be advisable. In particular, it is important to ensure that all system components are represented on the Data Flow Diagrams used as input for threat modelling - ie that no components have been forgotten about during analysis.
Who should Threat Modelling be Applied by?
The analysis should be done by those who know the system best - the architects, developers, and testers. Operations staff can also be useful, particularly when looking at interactions with things like authentication servers, networks, firewalls, and fileservers. Having people with security experience is ideal - but everyone in the above roles should know the common vulnerabilities, or will learn them from their peers in the workshop.
The “leader” of the threat modelling process should be someone with significant experience as a software developer and software architect. They (or at least some members of the team) should have good knowledge of programming, operating systems, networks/firewalls, network protocols, databases, hashing, encryption (symmetric and public-key), digital signatures, authentication protocols, and similar topics.
How should Threat Modelling be Applied?
I recommend running a sequence of short workshops (a few hours max, for concentration reasons), until the system has been completely covered. Regular short workshops are also often easier for people to fit around their schedules. As an extra bonus, questions raised during a workshop can be researched by one or more team members in preparation for the next session.
As risks are identified, the cost of various mitigations needs to be compared to the probability and cost of a security leak. Some decisions will be easy, and can be made at the same time the risk is identified (keep it simple). In other cases, whether to mitigate or document-and-accept is something that may require involvement from business-level staff.
The general process involves simple “brainstorming” - open discussions between the assembled team about where problems can occur. Using a DFD and stride-per-element (see later) adds some guided structure so the brainstorming sessions do not get too off-track, and so that good coverage of the overall system is achieved. A whiteboard is usually the most useful technical support device.
Getting Management Support
Ideally, application security should be taken seriously in all projects, be properly funded, and be performed in a properly planned way to ensure good coverage of the whole application. Sadly, that is seldom the case.
One of the (few) positive points from the PASTA book is the emphasis on involving senior management, ensuring they feel part of the process, and giving them at least some information in a form they can understand. That is not one of my strong points, and this article proposes as outputs from the threat modelling process only:
- A risk register (a spreadsheet of some sort)
- Issues in an issue tracker, indicating things to fix or enhance
Suggestions on how to make results more visible to management are welcome.
When a plausible risk (potential vulnerability) has been identified, it needs to be evaluated:
- What is the probability of attack?
  - Ease of attack (skills required)
  - Cost of attack (time)
  - Cost of attack (dollars)
  - Chance of getting caught
- What is the value of success to attacker?
Action then needs to be taken:
- Verify auditing (if it happens, is it logged?)
- Decide how to resolve:
  - Mitigate (fix at implementation, operational, or business-process level)
  - Delegate (make someone else responsible!)
  - Document and accept (particularly as “too unlikely”)
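The evaluation questions above can be reduced to a very rough comparison of expected loss against mitigation cost. The sketch below assumes the team can put (guessed) numbers on likelihood, damage and mitigation cost, and that a three-year planning horizon is sensible; both assumptions, and all names, are mine rather than part of any methodology. Delegation is not modelled, and the output is a suggestion for discussion, not a decision.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    # All values are rough team estimates, not measurements.
    annual_likelihood: float  # chance of a successful attack per year, 0..1
    damage: float             # estimated cost to the owner if it succeeds
    mitigation_cost: float    # estimated cost of the cheapest adequate mitigation
    logged: bool              # "if it happens, is it logged?"


def suggest(ev: Evaluation, years: float = 3.0) -> str:
    """Suggest mitigate vs document-and-accept; people make the final call."""
    if not 0.0 <= ev.annual_likelihood <= 1.0:
        raise ValueError("annual_likelihood must be between 0 and 1")
    # Crude expected loss over an assumed planning horizon.
    expected_loss = ev.annual_likelihood * ev.damage * years
    action = "mitigate" if expected_loss > ev.mitigation_cost else "document and accept"
    if not ev.logged:
        # Whatever the decision, an unlogged attack path needs auditing added.
        action += " (and add auditing)"
    return action


print(suggest(Evaluation(0.1, 500_000, 20_000, logged=True)))
print(suggest(Evaluation(0.01, 50_000, 40_000, logged=False)))
```

The numbers will always be guesses; the value is mostly in forcing the discussion about likelihood and damage to happen explicitly.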
Be pragmatic: if the resolution for a risk is obvious, don’t waste time doing a detailed risk evaluation. For example, if the risk can be removed via encryption, and it is obvious that this is the right solution, determining the cost of attack can be skipped.
Don’t be drawn into too much detail; some threats are simply “out of scope”, eg the risk of attackers manipulating CPU designs can be ignored for most projects, as can attackers with sufficient resources to brute-force modern encryption.
For each required fix, file an implementation issue in the usual issue tracker. New issues in the issue-tracker (and changes in the design) are the real “deliverables” of a threat modelling workshop - ie the things that actually make the delivered system better. All other documents (in particular, the risk-register) are just byproducts - useful for internal purposes, but not actual deliverables.
For mitigations (fixes), don’t forget “secondary threats” - bypasses to the mitigation.
And don’t dismiss risks as “not possible” just because there is one line of defense that prevents access. A good system provides “defense in depth” so that if the unexpected happens and one of the mitigations fails, additional protections exist - and ideally auditing reveals the failure in the first-line defense so it can be repaired. Better still, the audit information is automatically scanned to detect unexpected patterns (intrusion detection).
Security holes can be opened by very small technical errors: a single wrong character in a single line of code, or a typo in a firewall rule, can be enough. However:
- Well-designed “defense in depth” ensures that failures in a single component expose only limited access, eg by:
  - Applying the principle of least privilege
  - Dividing the system into semi-isolated functional groups, eg with DMZ network configurations
- Proper “validation checks” can detect misconfiguration
- Properly chosen libraries and appropriate languages can reduce the number of such flaws
- Properly chosen development processes (eg code-review, testing) can reduce the number of such flaws that get into production
- Well-defined test cases can detect problems (including regressions)
- Properly designed auditing and logging allow the exploitation of a flaw to be detected, enabling:
  - Rapid fix of the problem
  - Minimisation of damage
  - Redress through courts
  - Deterrence through likelihood of detection
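As one example of a “validation check” that catches a single-character mistake, the following Python sketch scans a list of firewall allow-rules for admin ports exposed beyond the internal network. The rule format, the port list and the `10.0.0.0/8` internal range are all assumptions for illustration; a real check would parse whatever configuration format your firewall actually uses. It relies only on the standard `ipaddress` module.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AllowRule:
    # Hypothetical, simplified firewall "allow" rule.
    source: str  # CIDR, eg "10.1.0.0/16"
    port: int


# Assumed site policy: admin ports may only be opened to the internal network.
ADMIN_PORTS = {22, 3389, 5432}
INTERNAL = ipaddress.ip_network("10.0.0.0/8")


def violations(rules: List[AllowRule]) -> List[AllowRule]:
    """Return rules that expose an admin port beyond the internal network."""
    bad = []
    for rule in rules:
        # strict=False masks host bits, so "10.0.0.0/0" is read as 0.0.0.0/0
        net = ipaddress.ip_network(rule.source, strict=False)
        # Version check first: subnet_of raises TypeError for IPv6 vs IPv4.
        inside = net.version == INTERNAL.version and net.subnet_of(INTERNAL)
        if rule.port in ADMIN_PORTS and not inside:
            bad.append(rule)
    return bad


rules = [
    AllowRule("10.1.0.0/16", 22),  # fine: internal only
    AllowRule("0.0.0.0/0", 443),   # fine: public HTTPS
    AllowRule("10.0.0.0/0", 22),   # typo: "/0" instead of "/8" opens SSH to everyone
]
print(violations(rules))
```

Run as part of the deployment pipeline, a check like this turns a one-character typo into a failed build rather than an open port.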
Threat modelling can help the architects and implementers of a system to put the above protections in place:
- Find possible weaknesses
- Assess the maximum amount of effort that is worth applying to make the weak point more robust (cost/benefit)
- Design mitigations for those weaknesses (or at least register a task to design a suitable mitigation).
- Repeat until no potential vulnerabilities are worth fixing
- Document discovered weaknesses, cost/benefit assessments and mitigations
- Regularly review software to see if new vulnerabilities are present, or the cost/benefit ratio of a known risk has changed
A threat is something somebody might try. A vulnerability is a point where the threat might possibly be successful. It isn’t necessary for someone to actually prove a vulnerable point is exploitable (ie a proof-of-concept for the vulnerability is not needed) in order to add an item to the risk register; if the team thinks “yeah, somebody clever might be able to do something nasty there” then that should be sufficient.
Threat modelling is not intended to produce output like “line 35 of file Foo.java fails to check that the caller is authorized”, but rather things like “if a programmer forgets to check authorization in a request-handler, that might get into production without being detected”. The mitigation might be something like:
- A report-generator which runs nightly and produces a list of all request-handlers which do not have an “authorization annotation”, or
- Automated integration tests which call all request-handlers without authentication credentials, and fail if an “auth required” error is not returned, or
- Authorization framework alterations so that all requests are rejected unless code/annotations explicitly enable access for some role, or
- An item on the code-review checklist to ensure auth-checks are appropriate, or
- Ensuring that systems invoked from the request-handler also check credentials (thus providing “defense in depth” against this particular issue), or
- Registering a task to schedule “security training” on the issue for all developers, or
- Registering a task for the team to review all existing code for missing authorization checks, or
- Externalizing checks in a proxy-server driven by configuration, or
- Multiple items from above
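To make a few of the mitigations above concrete, here is a minimal Python sketch combining three of them: a deny-by-default dispatcher (requests are rejected unless an annotation explicitly grants access), a report of handlers missing an authorization annotation, and a test-style probe that calls every non-public handler without credentials. The `route`/`requires_role` registry is hypothetical and deliberately tiny; it is not the API of any real web framework.

```python
from typing import Callable, Dict, List, Optional, Set

# Hypothetical minimal request-handler registry (not a real web framework).
HANDLERS: Dict[str, Callable] = {}


class Forbidden(Exception):
    pass


def route(path: str):
    def register(fn: Callable) -> Callable:
        HANDLERS[path] = fn
        return fn
    return register


def requires_role(role: Optional[str]):
    """Authorization annotation; role=None explicitly marks a handler as public."""
    def mark(fn: Callable) -> Callable:
        fn.required_role = role
        return fn
    return mark


def dispatch(path: str, user_roles: Set[str]):
    handler = HANDLERS[path]
    # Deny by default: a handler without any annotation is never executed.
    if not hasattr(handler, "required_role"):
        raise Forbidden(f"{path}: no authorization annotation")
    role = handler.required_role
    if role is not None and role not in user_roles:
        raise Forbidden(f"{path}: role {role!r} required")
    return handler()


def unannotated_handlers() -> List[str]:
    """Nightly-report check: handlers nobody has made an auth decision for."""
    return sorted(p for p, fn in HANDLERS.items() if not hasattr(fn, "required_role"))


def unauthenticated_probe() -> List[str]:
    """Test-style check: non-public handlers that answer a caller with no roles."""
    leaked = []
    for path, fn in HANDLERS.items():
        if hasattr(fn, "required_role") and fn.required_role is None:
            continue  # explicitly public
        try:
            dispatch(path, set())
            leaked.append(path)
        except Forbidden:
            pass
    return leaked


@route("/health")
@requires_role(None)
def health():
    return "ok"


@route("/admin/users")
@requires_role("admin")
def list_users():
    return ["alice"]


@route("/reports")  # the annotation was forgotten here
def reports():
    return "quarterly figures"


print(unannotated_handlers())   # the forgotten handler is reported...
print(unauthenticated_probe())  # ...but deny-by-default means nothing leaks
```

The point of combining them is defense in depth: the forgotten annotation is both harmless (deny-by-default) and visible (the report), so it gets fixed rather than silently shipped.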
These mitigations are “generic”, but not abstract. They address “classes of vulnerabilities”. Some of the above mitigations are architectural, some are implementation-level, some are business-process-based. Some fall between categories, but that doesn’t matter; labelling them is not as important as getting them fixed.
Waterfall vs Agile
As noted in “when to apply threat modelling” above, the most cost-efficient time to do the first pass is when the architecture is complete but implementation has not yet begun. This sounds a lot like waterfall, and indeed it would fit well in a waterfall model.
In a waterfall project, the design is simply security-reviewed once it is complete. However, it is also possible to perform threat modelling in an agile project. In some ways, threat discovery can be applied like test-driven development - you think about what tests the feature-to-be-built needs to pass before building it. Similarly, you can think about the threats the feature may be exposed to, and the necessary mitigations, before building it. There will need to be some “overall system view” at some time, but that is also useful for the overall architecture even in an agile project.
One of the things I argue about often when discussing “agile” is the common opinion that in an “agile project” coding can start as soon as funding is available. I won’t get into details of my opinions here, but recommend that basic requirements-analysis and architectural design be done on any project. For an agile project these documents don’t need to be as formal, don’t need “signoff” from various heads-of-department, and will change as the project progresses. And not every part in the design has to be built - that is still decided at the sprint level. However every team needs a general idea of where they are going, why, and some shared opinion about how. This initial architectural doc can still form a basis for threat modelling. As features are planned for development, it would seem a good idea to do “mini modelling” around that specific feature, and occasionally a “wider pass” to ensure no problems have been created due to interactions between features.
The STRIDE Methodology
Shostack’s book presents the STRIDE methodology, which was primarily developed within Microsoft for improving the security of their own software products. Various forums and blogs related to the STRIDE methodology developed useful content over time, and Shostack was an active participant/leader in this area. As far as I am aware, the book is strongly influenced by the practical experience and feedback from applying this approach to multiple projects and the related forum discussions.
To (rather brutally) summarize a book in a few lines: Shostack recommends starting with a “data flow diagram” (DFD) of the system being analysed, and a set of high-level threats (vulnerability types). Then either:
- For each threat, look for all matching weaknesses within the system, or
- For each component and dataflow (connection) in the system, look for matching weaknesses (“stride per element”)
In a large project, DFDs for parts of the application appear as a single block on high-level diagrams.
The second approach appeals to me more. Stride-per-element seems easier to apply than per-threat due to the ability to draw just the relevant people into a meeting looking at specific components. The “find vulnerability X in the whole system” requires experts on the whole system to be available. The “whole system” approach can be handled by having “nested components” - ie a DFD whose blocks are expanded in separate DFDs. The “whole system” analysis is then done at the top-level and each nested level - but at the cost of some duplicated work. With per-element, the only synchronization needed between different workshops is to ensure they don’t file identical risks multiple times.
Importantly, the “set of high level vulnerability types” is not a huge catalog of all software bugs known to humankind. Instead, STRIDE includes only about 60 general-purpose vulnerability types, as triggers/hints for the team doing the threat modelling. This does require more knowledge, creativity and improvisation from the team doing the analysis, but in my opinion is far more productive than dryly and boringly going through lists of things which mostly do not apply.
The vulnerability “hints” are divided into 6 categories:
- S: Spoofing (pretending to be something without valid credentials, eg faking servers, services, or users)
- T: Tampering (modifying data at-rest or in-transit without permission)
- R: Repudiation (performing operations while hiding identity)
- I: Information Disclosure (copying data without permission)
- D: Denial of Service (making services unavailable to real users)
- E: Elevation of Privilege (executing commands as another user)
The STRIDE book then gives about 10 examples of each category; some more detail is present later in the article.
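Stride-per-element is easy to turn into a workshop checklist. The Python sketch below takes a list of DFD elements and emits the element/category pairs to brainstorm. The mapping of element types to STRIDE categories is the commonly published STRIDE-per-element chart (external entities: S and R; processes: all six; data flows and data stores: T, I and D, with R added for stores holding audit logs); treat it as an assumption to check against the STRIDE book rather than a rule. The element names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class ElementKind(Enum):
    EXTERNAL_ENTITY = "external entity"
    PROCESS = "process"
    DATA_FLOW = "data flow"
    DATA_STORE = "data store"


STRIDE = {
    "S": "Spoofing",
    "T": "Tampering",
    "R": "Repudiation",
    "I": "Information Disclosure",
    "D": "Denial of Service",
    "E": "Elevation of Privilege",
}

# Commonly published STRIDE-per-element chart; a starting point, not a rule.
APPLICABLE = {
    ElementKind.EXTERNAL_ENTITY: "SR",
    ElementKind.PROCESS: "STRIDE",
    ElementKind.DATA_FLOW: "TID",
    ElementKind.DATA_STORE: "TID",
}


@dataclass
class Element:
    name: str
    kind: ElementKind
    holds_audit_log: bool = False  # stores holding audit logs also get "R"


def checklist(elements: List[Element]) -> List[Tuple[str, str]]:
    """List the (element, threat category) pairs to brainstorm in a workshop."""
    items = []
    for el in elements:
        letters = APPLICABLE[el.kind]
        if el.kind is ElementKind.DATA_STORE and el.holds_audit_log:
            letters += "R"
        items.extend((el.name, STRIDE[letter]) for letter in letters)
    return items


dfd = [
    Element("browser user", ElementKind.EXTERNAL_ENTITY),
    Element("web app", ElementKind.PROCESS),
    Element("web app -> orders db", ElementKind.DATA_FLOW),
    Element("orders db", ElementKind.DATA_STORE),
]
for name, category in checklist(dfd):
    print(f"{name}: {category}?")
```

The output is only a list of prompts; the useful part remains the discussion each prompt triggers.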
What I like most about this approach is the creativity it stimulates in the team, debating and brainstorming what kind of links there might be between the hints and the actual system being analysed. Also nice is the low amount of paperwork needed - no reference books or tables, just a whiteboard and concentrated thinking about the problem. And importantly, applying STRIDE against a diagram of the system means any identified problems can be linked reasonably directly to possible solutions.
Possibly the most important part of STRIDE is the R. Repudiation means that the system performs an action, and cannot prove who initiated that action. This is somewhat linked to authentication, but that is already covered under Spoofing. The “R” hint really means ensuring that any action is linkable back to an authenticated initiator. Mechanisms include signatures, audit-trails, and logging in general. A threat modelling session should include careful thought about how each component records data coming in and out, and how this information could be used later to identify the cause of changes in the system. No system will ever be perfect; you have to assume that a security hole will be found and exploited - at which point audit-trails and logging become critical. When a vulnerability is exploited at some later time, and no information is available about who, when or how the unwanted change occurred, then fixing the system can be extremely difficult; systems may need to be taken offline for long periods of time while the investigation occurs. With sufficient logs, the hole can at least be plugged within a reasonable amount of time and the system restored to availability. Financial losses may also be reclaimable through the courts - something impossible if evidence is not available.
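One common way to make an audit trail harder to quietly alter is to chain entries together with a cryptographic hash, so that each entry commits to the one before it. The following Python sketch (standard library only; class name and record fields are hypothetical) shows the idea. It assumes the `who` value is the identity established by authentication, not something taken from the request.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: Dict[str, str]) -> str:
    # Canonical JSON so the same entry always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only audit trail where each entry includes the hash of the previous one."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, str]] = []

    def record(self, who: str, action: str, when: str) -> None:
        # "who" must come from the authentication layer, never from request data.
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"who": who, "action": action, "when": when, "prev": prev}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """False if an entry was edited, inserted, or removed (except at the very end)."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: entry[k] for k in ("who", "action", "when", "prev")}
            if entry["prev"] != prev or entry["hash"] != _digest(body):
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.record("alice", "approve payment 42", "2024-05-01T10:00:00Z")
log.record("bob", "delete customer 7", "2024-05-01T10:05:00Z")
print(log.verify())               # True
log.entries[0]["who"] = "mallory"
print(log.verify())               # False: the edited entry no longer matches its hash
```

On its own, this only detects tampering by someone who cannot rewrite the whole chain; in practice the latest hash would also be copied to a system the logging host cannot write to, or the entries signed. That part is left out of the sketch.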
As mentioned in the “Who” section, the results of analysis depend on the skills of those involved. The STRIDE items are just hints; team members should have reasonable background knowledge on concepts such as SQL-injection, cross-site-scripting, MITM attacks, and DNS poisoning.
Some might claim that the STRIDE list is simply not detailed enough, and vulnerabilities may be missed. There are various “attack libraries” available with great detail, eg MITRE CAPEC. If you and your team have time to go through an entire “attack library” and evaluate each possibility against your system, good for you. In most cases, it is important not to get too detailed - deal with the obvious first, and ensure logging/auditing is present to detect other attacks if they happen. Perfect is the enemy of good.
Microsoft provide a PowerPoint presentation which is an excellent summary of STRIDE.
Less Useful Approaches
Above, I recommend stride-per-element. Some methodologies emphasise different approaches; while I don’t see these as particularly effective, they are briefly summarized here.
STRIDE uses a system diagram as the basis for analysis, ie focuses on the software itself. Some alternate threat-modelling methodologies instead recommend basing analysis around the assets to be protected, for example the PASTA book which recommends asset lists be compiled with valuations and security-impact-assessments, and each asset be linked to lists of “attacker types” who might be interested in such assets.
At first glance, this seems plausible. A bank might first think about what they want to protect (the money) rather than about their system (the doors, cameras, sensors, alarms, guards). However the STRIDE book provides a number of good arguments why this approach is not particularly productive. The most convincing is that we want to stop unauthorized people getting in. That means reinforcing doors, adding cameras and sensors, etc. Focusing on the asset does not directly lead to finding the vulnerabilities and relevant mitigations.
I would nevertheless recommend spending a short amount of time making a list of the assets to be protected. It is important to have a general feel for how much effort (ie money) should be spent to protect an asset (must the solution be bullet-proof, or simply script-kiddie-proof?). It may also lead to simple and radical solutions:
- Is the asset really worth protecting at all?
- Can the asset be deleted, or moved offline?
Classifying data assets as “public/internal/confidential/restricted” might be useful in some cases. However in general only public vs other is relevant for security at the architecture or implementation level (operations might define additional roles), and that distinction is obvious for anyone competent enough to be part of a threat modelling workshop.
Identifying specific assets (specific databases, filesystems, etc) can be helpful to ensure nothing critical has been left off the DFD. However knowing an asset like “customer address info” exists is not really helpful - where it exists is important. And in fact what info exists is usually more naturally derived from “what datastores do we have, and what is in them” than from the abstract question “what data do we have?”. Generating a list of assets and APIs as an output from the DFD-based STRIDE analysis could be a nice idea; it should be a natural byproduct. Having that list would be a good cross-check on coverage of the system review.
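The cross-check suggested above can be quite mechanical once the DFD lists its datastores and their contents. The Python sketch below inverts “datastore to contents” into “asset to locations” (answering where customer address info actually lives), and compares the DFD against an inventory of deployed datastores. All store names, and the idea that such an inventory can be exported from the hosting environment, are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical DFD data stores and what the team says is in each one.
DFD_STORES: Dict[str, List[str]] = {
    "orders-db": ["customer address info", "order history"],
    "analytics-copy": ["customer address info"],
    "app-logs": ["access logs"],
}


def assets_by_location(stores: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert 'datastore -> contents' into 'asset -> datastores holding it'."""
    located: Dict[str, List[str]] = {}
    for store, contents in stores.items():
        for asset in contents:
            located.setdefault(asset, []).append(store)
    return located


def missing_from_dfd(stores: Dict[str, List[str]], deployed: Set[str]) -> List[str]:
    """Deployed datastores that never made it onto the DFD."""
    return sorted(deployed - set(stores))


# Hypothetical inventory of what is actually deployed.
deployed = {"orders-db", "analytics-copy", "app-logs", "legacy-ftp"}

print(assets_by_location(DFD_STORES)["customer address info"])
print(missing_from_dfd(DFD_STORES, deployed))
```

The second print is the interesting one: a forgotten datastore is exactly the kind of component that escapes a DFD-based review.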
Examples of tangible assets a system might have:
- Machines (CPU time, network bandwidth)
- Value for attacker
- Spam generation
- Malware hosting
- Cost for owner
- Power and network costs
- Loss of resources for intended goals
- Cleanup costs
- Loss of reputation
- Value for attacker
- Company and customer data
- Value for attacker
- Sale of private info (eg credit card numbers)
- Data for phishing attacks
- Reputation damage (competitors)
- Cost for owner
- Reputation damage
- Reparation costs to customer
- Legal fines and fees
- Cleanup costs
- Potentially, termination of business (for severe cases)
- Value for attacker
- System availability
- Value for attacker
- Reputation damage
- Possibly other benefits, depending on system functionality
- Cost for owner
- Reputation damage
- Financial loss due to downtime
- Value for attacker
Examples of intangible assets in a system:
- Reputation (external) as service or product provider (for customers/sales)
  - Value for attacker
    - Competitive advantage (competitors)
  - Cost for owner
    - Lost sales
- Reputation as employer (for recruiting) and company morale (internal)
  - Value for attacker
    - Competitive advantage (competitors)
  - Cost for owner
    - Recruitment difficulties
    - Retention difficulties
    - Loss of productivity/motivation
Attacker motivation is useful as part of “risk estimation” - ie once a vulnerability has been found, and its “cost” is being estimated to determine the priority and effort to assign to its remediation. Finding a plausible attacker type is useful - if none can be found, then the priority of the mitigation can be set low. However starting with attacker type and motivation is not helpful.
An attack tree is a graphical or text representation of a single high-level threat, with child nodes (and their child nodes) becoming more and more concrete/detailed until actual vulnerabilities are revealed. Some more information on attack trees is included in an appendix at the end of this article.
I’m not convinced that attack trees are a productive way to spend the limited amount of time available for security analysis. Building one is probably what a well-organised attacker would do, which initially does seem appealing (“think like an attacker”). However if you have a DFD of the system (something an attacker probably will not have), then it seems more effective to use that instead. Creating an attack tree can be a lot of work, much of which does not actually contribute to improving the security of the current architecture or implementation.
If you cannot build a DFD of the system being analysed then an attack tree might be useful.
Various books recommend developing an attack tree in consultation with the DFD, ie expanding nodes in the tree only where they seem “applicable” to the DFD. While this is better than a full abstract attack tree, I’m still not convinced. To expand a node into its children, you need full knowledge of the entire system, which makes for workshops with lots of people (a per-component review allows more focused workshops).
Child nodes of a parent node in the attack tree may be “or-nodes” or “and-nodes”, ie the vulnerability described in the parent node might be possible if any of the child-node vulnerabilities can be exploited, or only if all of the child-node vulnerabilities can be exploited. There are various proposed syntaxes for drawing this; when drawing a tree in a graphical manner, I find the most elegant solution is to assume OR, and to join “anded” child nodes with an arc between the lines from those nodes to their parent.
A risk-register is basically a list of the leaf nodes of an attack tree.
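The or/and structure and the link to a risk register can be illustrated with a small data structure. The sketch below is a minimal Python model under my own assumptions (the `Node` class, its `is_and` flag and the example tree are hypothetical, not taken from any tool or book): collecting the leaves gives candidate risk-register entries, and evaluating the tree against the set of leaves judged feasible shows whether the root goal is reachable.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """One attack-tree node; children are OR-ed unless is_and is True."""
    label: str
    is_and: bool = False
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Collect leaf labels depth-first: candidate risk-register entries."""
    if not node.children:
        return [node.label]
    result: List[str] = []
    for child in node.children:
        result.extend(leaves(child))
    return result


def achievable(node: Node, feasible: Set[str]) -> bool:
    """True if the node's goal is reachable given the feasible leaves."""
    if not node.children:
        return node.label in feasible
    results = [achievable(child, feasible) for child in node.children]
    return all(results) if node.is_and else any(results)


# Hypothetical tree: read the customer DB either by logging in directly
# (needs BOTH network access AND the password) OR via an unencrypted backup.
tree = Node("read customer DB", children=[
    Node("log in to DB directly", is_and=True, children=[
        Node("reach DB port from outside"),
        Node("obtain DB password"),
    ]),
    Node("read unencrypted backup"),
])

print(leaves(tree))
print(achievable(tree, {"obtain DB password"}))
```

With only the password leaked, the direct-login branch is not reachable because its AND node also needs network access; closing one child of an AND node is therefore enough to cut that whole branch.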
Picking the root nodes for attack trees can be difficult (a tree per identified asset is not very effective or complete). A per-component evaluation has no such problem.
Attack trees and attacker modelling might be useful for finding non-technical vulnerabilities, eg social engineering based attacks.
There are existing “libraries of attack patterns” which can be useful; read the patterns and decide which apply to the system being analysed and how. However, these libraries vary widely in their level of abstraction.
The most detailed libraries are “checklists” - you just tick off whether your system implements that or not.
One kind of “attack library” often forgotten is the list of security problems for similar systems. Did they have flaws that should be avoided?
Some books recommend using the requirements use-cases to do security analysis, ie walking through each use-case and seeing if some unexpected variant of the use-case could trigger unusual system behaviour. I think this is likely to take significant amounts of time. It may be useful where some components on the DFD have very complex APIs (particularly when those APIs are not well documented).
Data Flow Diagrams (DFDs)
The components of a data flow diagram (DFD) are:
- Data stores (databases, filesystems, etc)
- Software components (modules) which process (transform) data
- External systems and users which interact with the system being analysed
- And the connections between all of the above over which data is transferred
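As a rough illustration of how a DFD can drive a per-element review, the sketch below models DFD elements and expands each one into the STRIDE hints that typically apply to it. The mapping is the STRIDE-per-element chart as commonly presented in Microsoft's SDL material (external entities: S, R; processes: all six; data stores and data flows: T, I, D, with R also relevant for stores holding audit logs); treat it as an assumption to check against the original sources rather than a rule. The element names are hypothetical. The same model also yields the list of data stores, ie the asset list as a by-product of the analysis.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Iterator, List, Tuple


class Kind(Enum):
    EXTERNAL_ENTITY = "external entity"
    PROCESS = "process"
    DATA_STORE = "data store"
    DATA_FLOW = "data flow"


# STRIDE-per-element hints (assumed mapping; a starting point, not a rule).
# Repudiation (R) is also worth discussing for data stores holding audit logs.
STRIDE_PER_ELEMENT = {
    Kind.EXTERNAL_ENTITY: "SR",
    Kind.PROCESS: "STRIDE",
    Kind.DATA_STORE: "TID",
    Kind.DATA_FLOW: "TID",
}

HINT_NAMES = {
    "S": "Spoofing", "T": "Tampering", "R": "Repudiation",
    "I": "Information disclosure", "D": "Denial of service",
    "E": "Elevation of privilege",
}


@dataclass(frozen=True)
class Element:
    name: str
    kind: Kind


def review_checklist(elements: Iterable[Element]) -> Iterator[Tuple[str, str]]:
    """Yield (element, hint) pairs to walk through in a workshop."""
    for el in elements:
        for letter in STRIDE_PER_ELEMENT[el.kind]:
            yield el.name, HINT_NAMES[letter]


def asset_list(elements: Iterable[Element]) -> List[str]:
    """Data stores double as a list of assets to protect."""
    return [el.name for el in elements if el.kind is Kind.DATA_STORE]


dfd = [
    Element("browser user", Kind.EXTERNAL_ENTITY),
    Element("validate order", Kind.PROCESS),   # processes: verbs
    Element("orders DB", Kind.DATA_STORE),     # stores and flows: nouns
    Element("order details", Kind.DATA_FLOW),
]

for name, hint in review_checklist(dfd):
    print(f"{name}: {hint}")
print(asset_list(dfd))
```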
A DFD should only be a single easily-readable page. A single box on a DFD can be a reference to a separate more detailed DFD if needed.
The DFD does not need to go into huge detail - only security-relevant details are needed. It is therefore best to start with a single high-level diagram, and create/retrieve more detailed diagrams only when threat modelling discussions require them.
Traditionally the symbols used on a DFD are:
- Rectangles for external entities outside of system control
- Rounded rectangles (recommended) or circles for processes (transformations) within the system, labelled with the transformation they apply (use verbs)
- Two parallel lines (or an open-ended rectangle) for data stores (label using nouns)
- Lines with arrows for flow of information between components, labelled with the data that is flowing (use nouns)
- Dotted lines for trust boundaries
Other diagram types that can be useful are:
- Swim-lane aka interaction diagrams
- State diagrams
Only model things that help to identify vulnerabilities. Too much detail is not helpful. Use sub-diagrams where appropriate.
Like the risk register, any diagrams should be kept for reuse the next time the system is assessed.
DFD Trust Boundaries
Methodologies that recommend DFDs for security analysis also recommend drawing “trust boundaries” on the DFD, and then asking “what can go wrong as data crosses this boundary?”
Personally, I have trouble with this; in most systems, I find it almost impossible to draw reasonable trust boundaries.
Wikipedia defines a trust boundary as a “boundary within which a system trusts all sub-systems”.
In the traditional three-tier architecture (client, business-tier, database-tier), the business tier usually runs all SQL statements against the database as a single user (via a pool of connections). In this case there is clearly a “trust boundary” between the business and database tiers; the database has no concept of the original user whose request triggered the database operation. And there is definitely also a trust boundary between client and business tiers; nothing in the business tier should “trust” the client.
However within the business tier, there are likely to be multiple places where client credentials are verified. Therefore a complex “trust boundary” should also be drawn through the middle of the business-tier application. And in fact, within the database the pooled user-id that the business tier is using is not fully trusted by the database (ie is not usually the DB “admin” account). So shouldn’t there be a complex boundary line through the middle of the database too?
Because of these complexities, I don’t really see “trust boundary lines” as particularly useful; the reality is just too complex for a few simple lines to capture. Therefore, feel free to draw such lines if you wish but I would recommend against spending too much time or arguing too long about exactly where the lines should be. Instead, focus on the components - what authorization-checks are applied to the incoming request?
The Microsoft presentation on STRIDE (referenced earlier) states that they consider a “trust boundary” to be present between any two components that communicate over a network, ie that components are only within the same trust boundary when part of the same process, or local processes communicating on the same host. Microsoft use STRIDE to analyse complex monolithic applications, while this article is talking about applying it to distributed systems. This may explain why I find the concept of a “trust boundary” difficult to apply and generally not helpful.
STRIDE in Detail
The following sections look at the different STRIDE hint categories, and some suggested items to discuss for each category. These lists are not exhaustive; they are ideas to start relevant discussion.
The STRIDE book has even more items in each category - or see the “Elevation of Privilege Game” card deck.
Spoofing
This hint encourages the analysis team to think about whether an attacker can pretend to be:
- an unauthenticated user
- an authenticated user
- a trusted partner system
- another component of the system
- a database or filesystem relied on by the system
Of course, anyone can pretend to be an “unauthenticated user” - the question here is more “are unauthenticated users correctly limited?”.
This topic is tightly linked to the concept of authentication - how does the system know who is interacting with it?
Example questions include:
- Does the component being evaluated read or write files? If so, is the identity of that filesystem verified?
- Does the component read config settings on startup? If so, is the identity of that data provider verified?
- Does the component listen on a network connection? If so:
  - How does the system ensure that no other application opens a connection on that same port first?
  - Can an attacker redirect the DNS name for the system to their own address (spoof that component)?
  - Does the component provide a server-side certificate? If so, is it properly signed (trustable by clients)?
- Does the component open network sockets to external services (including local ones)? If so, how does it validate that the target is trustworthy?
- Does the component accept requests from other components? If so:
  - How does it verify the identity of the sender of each request?
  - Is that verification vulnerable to programmer errors (eg requires code in every REST endpoint)?
- Are multiple kinds of authentication supported? If so, are all equally secure?
- Are there “bypasses” to authentication (eg account-recovery, “authenticated by callcenter”, etc)?
- Are credentials persisted client-side? If so, is that appropriate and are users aware of their responsibilities?
- Are requests accepted on alternate ports with different authentication rules?
- Is there any rate-limit to attempts to present different credentials?
- Does the component support remote administration? If so, are remote administrator credentials properly verified?
Where a risk of implementation error is identified, the mitigation should not be just “fix line 17 of file X” or “be more careful”, but instead a method that prevents or catches the problem. Examples include using a framework where authentication is “verified by default”, so that every new entrypoint added during implementation automatically gets maximum protection (possibly so much that the entrypoint is unusable until deliberately relaxed); where less protection is needed, the developer must actively enable that. Automated report generation and automated security tests are also good solutions.
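The “verified by default” idea can be sketched in a few lines. The `Router` class below is hypothetical (not a real framework's API) and assumes a request is a plain dict whose `user` key is set by an earlier authentication step; every route requires an authenticated user unless the developer explicitly opts out, and the opt-outs can be listed automatically for review.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[dict], str]


class Router:
    """Hypothetical minimal router: every route requires authentication
    unless the developer explicitly opts out with public=True."""

    def __init__(self) -> None:
        self._routes: Dict[str, Tuple[Handler, bool]] = {}

    def route(self, path: str, public: bool = False):
        def register(fn: Handler) -> Handler:
            self._routes[path] = (fn, public)
            return fn
        return register

    def dispatch(self, path: str, request: dict) -> str:
        fn, public = self._routes[path]
        if not public and request.get("user") is None:
            # Default-deny: no per-endpoint authentication code is needed.
            return "401 Unauthorized"
        return fn(request)

    def public_routes(self) -> List[str]:
        """Automated report: the short list of opt-outs reviewers should audit."""
        return sorted(p for p, (_, public) in self._routes.items() if public)


app = Router()


@app.route("/health", public=True)
def health(request: dict) -> str:
    return "ok"


@app.route("/orders")  # no opt-out, so authentication is enforced
def orders(request: dict) -> str:
    return f"orders for {request['user']}"
```

A developer adding a new endpoint who forgets about security gets the strict behaviour; the mistake that remains possible (an unnecessary `public=True`) shows up in the `public_routes()` report.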
Tampering
This hint encourages the analysis team to think about whether an attacker can modify data or bypass authorization:
- Does the component being evaluated read files? If so, is access to that filesystem properly controlled?
- Does the component being evaluated write files? If so, can they be modified by something else after writing?
- Does the component read config settings on startup? If so, is access to that info properly controlled?
- Can data in a database be modified directly, bypassing the system checks?
- Is the system relying on data in the client which can be modified?
- Where encryption is used, are standard algorithms used and is the implementer or reviewer suitably qualified?
- Is the mapping from authentication info (id) to authorization info (rights) secure and reliable?
- Does the implementation consistently check authorization? Is the approach vulnerable to programmer errors (eg requires code in every REST endpoint)?
- Can URL parameters be modified by clients with unexpected results?
- Are “replay attacks” possible, eg can an encrypted or signed message from an authenticated user be intercepted and sent again?
- Can data be modified after validation but before use?
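One common mitigation for the replay question above is to sign a nonce and timestamp together with the message body, and to reject stale or already-seen nonces. The sketch below assumes a shared secret key, a 5-minute window and an in-memory nonce cache, all hypothetical choices for illustration; a real deployment would need proper key management and a nonce store shared across server instances.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional

SECRET = b"shared-secret-for-illustration-only"  # hypothetical key
MAX_AGE_SECONDS = 300
_seen_nonces: Dict[str, float] = {}  # nonce -> time first accepted


def sign(body: bytes, nonce: str, timestamp: float) -> str:
    """HMAC-SHA256 over nonce, timestamp and body, so none can be altered."""
    msg = f"{nonce}|{timestamp}|".encode() + body
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def accept(body: bytes, nonce: str, timestamp: float, signature: str,
           now: Optional[float] = None) -> bool:
    now = time.time() if now is None else now
    # 1. Integrity: compare MACs in constant time.
    if not hmac.compare_digest(sign(body, nonce, timestamp), signature):
        return False
    # 2. Freshness: reject old messages, which also bounds the nonce cache.
    if abs(now - timestamp) > MAX_AGE_SECONDS:
        return False
    # 3. Uniqueness: forget expired nonces, then refuse any nonce seen before.
    for old, seen_at in list(_seen_nonces.items()):
        if now - seen_at > MAX_AGE_SECONDS:
            del _seen_nonces[old]
    if nonce in _seen_nonces:
        return False
    _seen_nonces[nonce] = now
    return True
```

Because the timestamp is covered by the signature, an attacker cannot simply refresh it on an intercepted message; an exact replay within the window is caught by the nonce check.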
If running this system in the cloud, then you might also need to think about which other users might be running software on the same physical host, and whether that raises tampering (or info-disclosure or denial-of-service) risks.
Repudiation
This hint encourages the analysis team to think about whether an attacker can perform an operation untraceably. As noted earlier, this is tightly linked to authentication:
- Are there any important operations which are accessible to anonymous users?
- Are all important operations logged, with the requester’s id?
- Will requests still be processed when logs cannot be written (eg due to out-of-disk-space)?
- Can logging be truncated by forcing too many log messages to be output in a short time?
- Can logging be corrupted by passing weird strings which end up in log messages?
- Is the logged information sufficient to describe exactly what operation the (identified) user performed?
- Can you tell if a logfile has been deleted?
- Can you tell if a logfile has been altered?
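Two of the questions above (corrupting logs with weird strings, detecting altered or deleted entries) can be addressed with simple techniques. The sketch below, under my own assumptions, escapes non-printable characters before logging and hash-chains the entries so that each entry's hash covers the previous one. A plain hash chain only detects changes by someone who does not also recompute all later hashes, and it cannot detect truncation of the tail; in practice the latest hash would be shipped to a separate log server (or a keyed HMAC used instead).

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def sanitize(text: str) -> str:
    """Replace CR, LF and other non-printable characters with visible escape
    sequences, so a caller cannot forge extra log lines (log injection)."""
    return "".join(ch if ch.isprintable() else repr(ch)[1:-1] for ch in text)


def _digest(entry: Dict[str, str]) -> str:
    body = {k: entry[k] for k in ("user", "action", "prev")}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append(log: List[Dict[str, str]], user: str, action: str) -> None:
    """Append an entry whose hash also covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"user": sanitize(user), "action": sanitize(action), "prev": prev}
    entry["hash"] = _digest(entry)
    log.append(entry)


def verify(log: List[Dict[str, str]]) -> bool:
    """Return False if any entry was altered, removed or reordered
    (truncating the tail needs an externally stored last hash to detect)."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(entry):
            return False
        prev = entry["hash"]
    return True
```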
Information Disclosure
This hint encourages the analysis team to think about what information an attacker may be able to obtain which was not intended:
- Is network traffic encrypted where appropriate?
- Are man-in-the-middle attacks blocked?
- Are files written by the component? If so:
  - Is access to that filesystem properly controlled?
  - Are digital signatures used to detect modification?
  - Are backups properly secured?
- Do URLs include sensitive data useful for an observer?
- Do error messages include sensitive data?
- Do file-listings or similar results reveal information about the existence of other data, even when the content is not available?
- Do logfiles (or logservers) reveal sensitive information?
- Where encryption is used, are standard algorithms used and is the implementer or reviewer suitably qualified?
- Do responses to requests leak information in any circumstances?
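For the error-message question above, one common pattern is to log full details server-side and return only an opaque reference to the client. The `handle` wrapper below is a hypothetical sketch of that pattern; the server log itself must then be protected, since it now holds the sensitive details (see the logfile question above).

```python
import logging
import uuid
from typing import Callable, Tuple

logger = logging.getLogger("app")


def handle(request_fn: Callable[[], str]) -> Tuple[int, str]:
    """Run a request handler; on failure, log details server-side and
    return only an opaque incident reference to the client."""
    try:
        return 200, request_fn()
    except Exception:
        incident = uuid.uuid4().hex
        # The stack trace (which may mention SQL, paths or secrets) goes to
        # the server log only, tagged with the same incident id.
        logger.exception("request failed, incident=%s", incident)
        return 500, f"Internal error (reference {incident})"
```

Support staff can use the reference to find the matching log entry, while an attacker probing with malformed requests learns nothing about the internals.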
A somewhat related topic is “reconnaissance protection”, ie ensuring all components exposed to attackers reveal as little info as possible. While security-through-obscurity is not a good approach, it is nevertheless useful to deny attackers easy access to information about the system being protected. Thought should be given to removing product version strings from standard responses, using firewalls to block port-scanning, etc.
Denial of Service
This hint encourages the analysis team to think about the business value of “uptime” for the system, and what an attacker might possibly do to reduce system uptime:
- Can filesystems or databases be filled up via malicious requests?
- Are there specific requests that will take large amounts of time to process? If so, are there limits on the number of such requests?
- Can an attacker register a DNS name to interfere with the system?
- Does the system rely on external servers which are more vulnerable than the system being analysed?
- Can a user be “locked out” by deliberately making requests with incorrect authentication information?
- And the usual “flood the network” attacks of course (if relevant) - including whether logging is sufficient to identify the source
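Limits on expensive requests (and on repeated authentication attempts, as asked in the spoofing section) are often implemented with a token bucket. The sketch below is a minimal in-memory version; keying by a client id, the capacity, refill rate and the higher cost for “expensive” requests are all illustrative assumptions. Throttling in this way slows an attacker down without the hard lock-out that an attacker could otherwise abuse to lock out legitimate users.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Allow bursts of up to `capacity` units, refilled at `rate` per second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


buckets: Dict[str, TokenBucket] = {}


def allow_request(client_id: str, expensive: bool = False) -> bool:
    """Hypothetical per-client limiter; expensive requests cost more tokens."""
    if client_id not in buckets:
        buckets[client_id] = TokenBucket(capacity=10, rate=1.0)
    return buckets[client_id].allow(cost=5.0 if expensive else 1.0)
```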
This hint is also a good point to think about various natural disasters, from floods and earthquakes to cleaners unplugging servers by accident.
Escalation of Privilege
This hint encourages the analysis team to think about whether an attacker might be able to provide their own logic which is executed with rights associated with some other user.
- Does the system include third-party executable content in its responses, leading the client to process third-party content with the trust associated with the system being analysed (eg cross-site scripting (XSS) attacks)?
- If the implementation uses an interpreted language such as Python, Perl or PHP, then what development processes are in place to ensure that developers do not use “eval” or similar methods?
- Do requests include references to data in external stores? If so, can an attacker modify the referenced data and thus execute requests as the original user but with data controlled by them?
- Can the component’s code (including required libraries) be modified or substituted?
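Two of the questions above have well-known standard-library answers in Python. The sketch below is illustrative only (the function names are hypothetical): `html.escape` neutralises user-supplied markup before it is embedded in a page, and `ast.literal_eval` accepts only Python literals, so it cannot be used to run code the way `eval` can. Input size should still be bounded, and for data exchange a format such as JSON is often the simpler choice.

```python
import ast
import html


def render_comment(author: str, text: str) -> str:
    """Escape user-supplied content before embedding it in HTML, so a
    submitted <script> tag is displayed as text rather than executed (XSS)."""
    return f"<p><b>{html.escape(author)}</b>: {html.escape(text)}</p>"


def parse_settings(raw: str) -> dict:
    """Parse a user-supplied literal such as "{'page_size': 20}".
    Unlike eval(), ast.literal_eval rejects function calls, attribute access
    and other expressions, raising ValueError instead of running them."""
    value = ast.literal_eval(raw)
    if not isinstance(value, dict):
        raise ValueError("settings must be a dict literal")
    return value
```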
Rich APIs and attack surfaces
Threat modelling against components with rich APIs is challenging; eg when a component is a webserver, then “info disclosure through insufficient validation of url parameters” is something that needs very detailed code analysis to check. That’s ok - the risk goes into the risk register, and an issue can be raised in the issue tracker to verify that threat. An automated test suite would be the best way to close that issue, ie close it once tests exist which verify that such a vulnerability does not exist. Verifying that the system has an architectural decision in place which addresses it (eg a standard framework for handling such params) would also be sufficient. A code audit would also do, but is of course susceptible to regression. All these lower-level security checks can be done by the dev team rather than in the workshop.
The “data flow diagram” used for STRIDE is related to the “system attack surface” - the set of interfaces through which an attack can occur. A “list of assets” is not associated with a “system attack surface”, which makes it harder to use for identifying attack mechanisms and mitigations.
During threat modelling, it is probably not practical, and not necessary, to evaluate each endpoint and parameter in a component offering a “rich API”. Risks can be identified at a slightly abstract level (authentication not enforced, SQL statements include user-provided data), and then general mitigations can be identified (see the first paragraph in this section).
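As an example of such a general mitigation, the risk “SQL statements include user-provided data” is usually addressed by a project-wide rule to use only parameterised queries, which can then be checked by review or automated tests rather than analysed endpoint by endpoint. The sketch below uses Python's built-in `sqlite3` module with an illustrative in-memory table; the table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, owner TEXT)")
conn.executemany("INSERT INTO customers (name, owner) VALUES (?, ?)",
                 [("Acme", "alice"), ("Globex", "bob")])


def find_customers_unsafe(owner: str):
    # VULNERABLE: the user-provided value becomes part of the SQL text.
    sql = f"SELECT name FROM customers WHERE owner = '{owner}'"
    return [row[0] for row in conn.execute(sql)]


def find_customers(owner: str):
    # Parameterised: the driver passes the value separately from the SQL text.
    sql = "SELECT name FROM customers WHERE owner = ?"
    return [row[0] for row in conn.execute(sql, (owner,))]


attack = "nobody' OR '1'='1"
print(find_customers_unsafe(attack))  # the injected OR clause matches every row
print(find_customers(attack))         # treated as a literal owner name: no match
```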
Involving QA in Threat Modelling
A good QA team will look for security holes as part of their testing, in a manner similar to the STRIDE-per-element threat modelling approach: look at each component and each interface and see if something unexpected can be triggered. However:
- QA is always under time-pressure, as it is the stage that blocks “product release”
- QA is often “feature-focused” rather than “security focused” - particularly managers (same problem as with dev)
- QA is not always properly staffed with experts in the area
- QA working alone (even with good security experience) does not have the product knowledge available to the members of a threat-modelling workshop
- QA is done only at the end of the project (or after each feature is complete), while at least the first pass of threat modelling should be done at the start.
If the project has good QA staff, it can be helpful to include them in the threat modelling workshops. As noted above, this work should be somewhat familiar to them. Where risks are identified, having automated QA tests which verify that the implementation is robust can be very useful - particularly to perform the same tests against large numbers of API endpoints, and to detect regressions at future times.
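Running the same check against large numbers of endpoints is straightforward with a table-driven test. The sketch below uses Python's `unittest` with `subTest`; the `ENDPOINTS` table and its handlers are hypothetical stand-ins for the system under test. In a real project that table would ideally be generated from the framework's routing configuration, so that new endpoints are covered automatically and regressions are caught.

```python
import unittest


def _handler(requires_auth: bool):
    """Hypothetical stand-in for a real endpoint: returns an HTTP status."""
    def handle(user):
        if requires_auth and user is None:
            return 401
        return 200
    return handle


ENDPOINTS = {
    "/orders": _handler(True),
    "/customers": _handler(True),
    "/admin/users": _handler(True),
    "/health": _handler(False),
}
PUBLIC = {"/health"}  # the explicit, reviewed list of anonymous endpoints


class AuthenticationSweep(unittest.TestCase):
    def test_non_public_endpoints_reject_anonymous(self):
        for path, handler in ENDPOINTS.items():
            if path in PUBLIC:
                continue
            with self.subTest(path=path):
                self.assertEqual(handler(None), 401)

    def test_public_list_matches_reality(self):
        # Catches endpoints that silently became anonymous.
        anonymous = {p for p, h in ENDPOINTS.items() if h(None) == 200}
        self.assertEqual(anonymous, PUBLIC)
```

Saved as a module, this runs with `python -m unittest`; each failing endpoint is reported separately thanks to `subTest`.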
Threat modelling can also be an interesting basis for system testing. A QA/test team can use threat modelling to find interesting test scenarios. However such tests alone do not result in a “defense in depth”; such tests will be blocked by the first successful line of defense without indicating whether any further lines of defense exist.
One common approach to security is mitigation-first, ie leaping to the assumption that “we need encryption somewhere”. Often these gut-feelings are correct, but not always - it pays to identify the actual threats for which this mitigation is the fix, and see if they really are valid threats for the system being analysed, and whether the effort to implement the fix is in proportion to the risk/cost associated with the threat. Worse, there is sometimes an assumption that “we have provided the N most common mitigation technologies in at least one place in our app, therefore security is done”. That tech might need to be applied in a few other places, or might need to be applied differently, or other types of mitigations might be needed too - and threat modelling is the best way to figure that out.
Actually, every decently secure system has had “threat modelling” done for it - just sometimes it was informal and the architect/devs/testers doing it didn’t realise that was what they were doing. “This problem just occurred to me ...” is the same thing that someone in a threat-modelling workshop would say, except that in the workshop people are concentrating on looking for such things, are doing it as a team so they can inspire each other, and are using some helper tools (lists of components, lists of threats) to stimulate the imagination.
While the above section was critical of “hunch-based security”, there is one case where it is unavoidable: compliance.
When national laws or company rules require specific security features, despite there being no credible risk, there’s no point in fighting it.
Some of the following documents are also linked to in the article text above.
- Wikipedia: Computer Security - an excellent short overview on IT security issues in general
- Wikipedia: Threat Model - overview of IT Threat Modelling
- SAFECode Whitepaper on Threat Modelling - an excellent introduction to threat modelling
- Microsoft Press: The Security Development Lifecycle: SDL: A Process for Developing Demonstrably More Secure Software - free ebook from Microsoft Press (from 2006, but still interesting); chapter 9 addresses threat modelling
- SANS Threat Risk Analysis - a graduate student paper presenting “An Overview of Threat and Risk Assessment”
- OWASP: Application Threat Modelling
- OWASP: Threat Modelling
- OWASP: Threat Modelling Cheat Sheet
- MITRE CAPEC Attack Library - useful reading for a security professional; lists many different kinds of software attacks
- Websec: Trust Boundaries - good intro to DFDs
- Microsoft: The Elevation of Privilege Game
- Microsoft: Elevation of Privilege Game Whitepaper
- Microsoft: Elevation of Privilege Game - Download
- Microsoft: Threat Modelling Tool for STRIDE
- OVVL: An Open Source Threat Modelling Tool - an application that provides a web-based interface through which a DFD can be drawn and threats registered
Appendix A: Book Reviews
This site has a review of two books on threat modelling.
Appendix B: Attack Trees in Detail
While attack trees were briefly discussed earlier, with my personal conclusion that they are not particularly helpful in most cases, they are moderately popular in the literature - and just cool. Here is my quick summary of attack trees.
Usually, an attack tree is created for each goal an attacker may wish to achieve, such as:
- obtaining a copy of a specific dataset
- obtaining control over a specific machine
That goal is the root node of the attack tree. Its direct child nodes are high-level things that could provide a path to the goal. Each new level of child nodes makes its parent more concrete, until eventually leaf nodes are reached which are technical in nature rather than abstract, and potentially feasible.
The main benefit of such a tree is that achieving a goal is often only possible via a number of intermediate “stepping stones”. It can be hard to see a chain of successful steps leading forward to the goal; an attack tree instead works backwards, starting from the goal.
The child nodes of a parent node are usually “or” options - any one of the child nodes will lead to the parent goal. However child nodes can also be “and” options - the parent is achieved only if all of the child nodes are achievable.
There are various ways to write a tree down; a graphical representation is easiest to read. However a text representation is generally easier to write, and to include in a document. Representing AND and OR qualifiers is tricky in either representation, and not really standardized.
A brief example from the physical world:
- access room (goal)
  - go through a door
    - case door unlocked
    - case door locked
      - pick lock
      - use key
        - find key (standard hiding location?)
        - steal key
        - reproduce key
          - from photo
          - from impression
        - borrow key (social engineering)
      - have someone else open it
        - follow someone in
          - make friends
          - act busy
          - appear official (eg reflective vest)
          - get lucky (just try)
        - spoof credentials
          - maintenance staff
          - emergency services (gas, fire, etc)
          - official inspector
  - go through a window
  - go through a wall
  - go via roof
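A classic way to extract value from such a tree is to attach an estimated cost to each leaf and propagate it upwards: an OR node costs as much as its cheapest child, an AND node the sum of its children. The sketch below applies this to a simplified subset of the tree above; the costs, the “[AND]” text marker and the AND node itself (copying a key needs both access to the key and cutting a copy) are hypothetical additions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    cost: Optional[float] = None   # set on leaves only (hypothetical estimates)
    is_and: bool = False           # children are OR-ed unless this is True
    children: List["Node"] = field(default_factory=list)


def cheapest(node: Node) -> float:
    """OR node: the attacker picks the cheapest child. AND node: pays for all."""
    if not node.children:
        return node.cost
    costs = [cheapest(child) for child in node.children]
    return sum(costs) if node.is_and else min(costs)


def render(node: Node, depth: int = 0) -> str:
    """Indented text form; AND nodes are marked with [AND], OR is the default."""
    tag = " [AND]" if node.is_and else ""
    lines = ["  " * depth + "- " + node.label + tag]
    for child in node.children:
        lines.append(render(child, depth + 1))
    return "\n".join(lines)


room = Node("access room", children=[
    Node("go through a door", children=[
        Node("pick lock", cost=50),
        Node("reproduce key from impression", is_and=True, children=[
            Node("get brief access to key", cost=30),
            Node("cut copy", cost=10),
        ]),
        Node("follow someone in (act busy)", cost=5),
    ]),
    Node("go through a window", cost=100),
])

print(render(room))
print(cheapest(room))
```

With these made-up numbers the cheap social-engineering path dominates, which is typical of what such an evaluation reveals: hardening the lock would not change the overall risk.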
A brief software example:
- Goal: obtain admin rights on server X
  - attack people
    - subvert somebody with access
  - attack business process
  - attack hardware supply chain
  - attack software supply chain
  - attack technology
    - get physical access
    - modify software
One of the most widely-known published attack trees is one created for analysing an electronic voting system.
- Goal: manipulate election via voting equipment
  - gather knowledge
    - from insider
    - from component analysis
  - gain insider access
    - at vendor before shipment
    - at polling site