Together with the development teams, my colleagues and I (as architects) created a list of architectural characteristics that we considered important for business-tier software components (deployable artifacts). We used this list to assess existing components, ie to determine which were most in need of a refresh, and to evaluate proposals for new components.
The reasons behind each item are sometimes complex, and the justification does not fit well into a checklist document. However, most of them are well documented in the general literature on software architecture.
In this document we attempt to specify goals rather than technologies, allowing individual teams to select any technology that achieves the goals we believe are important. This is, however, a difficult task.
I have left out any items which are specific to our architecture, or turned them into general items.
If you find this list useful, please let me know.
Applying the Checklist
Checklists are not a great way to guide architecture; developers generally don’t read them unless pushed to do so, as day-to-day work tends to take priority. This meant that we (as architects) needed to schedule meetings with development teams to review existing components against this checklist, and needed to be aware of all new components so that we could review them before coding started. Ideally the ideas in this checklist would instead be automated, ie violations of the rules by existing components would be detected automatically, and new components that don’t follow the rules would simply not be deployable. However, automating all these checks is non-trivial.
We developed an internal tool that presents the items below as a form/survey/questionnaire. Each item can be answered as fulfilled, not-fulfilled or not-applicable, and notes can be added (eg links to relevant documentation or dashboards). An “overall score” can then be computed. Typically, when performing a review, an appropriate number of tickets are raised for the not-fulfilled items with the greatest benefit/cost ratio. Future reviews then focus only on the items that are still unfulfilled.
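The internal tool is not described in detail, so the scoring rule below is an assumption: the overall score is taken to be the share of applicable items that are fulfilled, and each unfulfilled item carries rough benefit and cost estimates (hypothetical fields) that are used to pick the tickets to raise. A minimal sketch:

```python
from dataclasses import dataclass
from enum import Enum


class Answer(Enum):
    FULFILLED = "fulfilled"
    NOT_FULFILLED = "not-fulfilled"
    NOT_APPLICABLE = "not-applicable"


@dataclass
class ItemResult:
    name: str
    answer: Answer
    notes: str = ""
    # Hypothetical estimates, used only to prioritise follow-up tickets.
    # cost must be positive.
    benefit: float = 0.0
    cost: float = 1.0


def overall_score(results: list[ItemResult]) -> float:
    """Share of applicable items that are fulfilled (assumed scoring rule)."""
    applicable = [r for r in results if r.answer is not Answer.NOT_APPLICABLE]
    if not applicable:
        return 1.0
    fulfilled = sum(1 for r in applicable if r.answer is Answer.FULFILLED)
    return fulfilled / len(applicable)


def items_to_ticket(results: list[ItemResult], max_tickets: int) -> list[str]:
    """Names of unfulfilled items with the best benefit/cost ratio first."""
    open_items = [r for r in results if r.answer is Answer.NOT_FULFILLED]
    open_items.sort(key=lambda r: r.benefit / r.cost, reverse=True)
    return [r.name for r in open_items[:max_tickets]]
```

Excluding not-applicable items from the denominator means a component is not penalised for items that do not concern it (eg Standardized Authentication for a business-tier component that only needs authorization).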
One advantage of this checklist is that it is at least easier to grasp than an “architecture guidelines” document, which developers really don’t read. By going through this checklist with a few projects, and discussing where necessary why particular items are on the list, we shared a lot of knowledge about what we (as architects) would like to see and why.
Developing this list also provided a good focus for discussions with all developers about what is important to us as an organisation and as a software development group.
Note: The term “tribe” refers to a cross-functional software development team responsible for several software domains. The term “domain” means “a set of business functionality belonging to a team/tribe”; others might call this a subdomain, and in many cases it is equivalent to a bounded context.
Topics and Items
- Topic 1: Domain Independence
- Topic 2: Datastores and External Services
- Topic 3: Availability
- Topic 4: Deployability
  - Continuous Integration
  - Automated Deployment of non-main branches
  - Automated Deployment of Main Branch
  - Alert Generation
  - Versioned Build Pipeline
  - Versioned Deployment Pipeline
  - Pipeline Independence
  - Containerized Deployment
  - Zero Downtime Deployment
  - Progressive Rollout
  - Trusted Dependencies
  - Pipeline Ownership
  - Environment Consistency
- Topic 5: Interoperability
- Topic 6: Feature Toggles
- Topic 7: Security
  - Follow Organisation Security Guidelines
  - Documents Security Variance
  - Threat Modelling
  - Initial Security Review
  - Recent Security Review
  - Standardized Authentication
  - Standardized Authorization
  - Document Roles
  - Credentials are Configured
  - Supply-chain Security for Dependencies
  - Modern Dependencies
  - External Security-relevant Dependencies are Registered
  - Environment Isolation
  - Supply-chain Security for Base Images
- Topic 8: Observability
- Topic 9: Traceability
- Topic 10: Alerts and Notifications
- Topic 11: Tech Fitness KPIs
- Topic 12: Communication
- Topic 13: Workflow
Topic 1: Domain Independence
Topic Goal: Each back-end (business-tier) component in a domain (typically there is only one) is decoupled from components in other domains during development, testing, deployment and at runtime. This allows development to scale linearly with the number of components.
Note: Where a single domain is implemented as multiple back-end components, a higher level of coupling between these components can be tolerated, as the same development team is responsible for all of them. Component independence is nevertheless helpful even in this scenario.
Goal: The component owner, and only the component owner, decides what code goes into their components (as long as this is consistent with the organisation’s architectural requirements). There is no “cross coding” from other tribes/domains, and no non-library code is shared with other tribes.
Hint: The list of contributors in the VCS (version control system) can indicate whether contributions to a component come mostly from its owners. This does not mean that people outside the tribe cannot contribute to a component they do not own; however, changes to a component’s codebase must always be approved by the component owner.
Note: Pull-requests from other teams are acceptable, and even encouraged, as long as the component owners have full rights to accept or decline.
Code Quality Standards Independence
Goal: The component owner agrees on and enforces code quality standards (as long as these are consistent with the tech KPI goals set by the organisation).
Hint: This item can be fulfilled by configuring a quality gate in a tool such as SonarQube that checks for bugs and code smells.
Goal: New features can be implemented without waiting for changes in other component codebases.
Prerequisite: No libraries are shared with other projects.
Prerequisite: Minimal coupling to other components via APIs or messaging (general-purpose APIs and messages are better than purpose-specific ones).
Goal: The component can be deployed at any time without the need to coordinate with other tribes.
Prerequisite: All interface changes are backwards-compatible, ie integration-points of a component which are visible to other software have to remain stable. As a result, there are no strict dependencies between application deployments regarding the deployment order. This also enables rollback of an unsuccessful deployment.
Prerequisite: The codebase for the component is small enough to be maintained by a single tribe.
Prerequisite: Private datastores (not readable by any other component); see “Topic 2: Datastores and External Services”.
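Backwards-compatible interface changes are easier to achieve when consumers act as “tolerant readers”: they ignore fields they do not recognise, and new fields are introduced as optional with a default. The following sketch shows the consumer side only and assumes JSON messages; the `OrderPlaced` payload and `parse_order_placed` function are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    amount_cents: int
    # Added in a later version: optional with a default, so messages from
    # producers that predate the field still parse.
    currency: str = "EUR"


def parse_order_placed(raw: str) -> OrderPlaced:
    data = json.loads(raw)
    # Tolerant reader: pick out only the fields we know about and ignore the
    # rest, so producers can add fields without breaking this consumer.
    return OrderPlaced(
        order_id=data["order_id"],
        amount_cents=int(data["amount_cents"]),
        currency=data.get("currency", "EUR"),
    )
```

The producer side of the same contract is that fields existing consumers read are never removed or renamed and never change type; otherwise the deployment order matters again.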
Goal: The deployed component remains largely functional during outages of other components.
Hint: Dependencies on authentication/authorization systems are exempt from this rule.
Prerequisite: Ideally there is no synchronous communication between back-end components that perform business logic; where there is, appropriate fallback behaviour is in place.
Goal: The component guarantees the integrity and privacy of its own persistent data.
Prerequisite: Component uses a private datastore (not writable by any other component); see “Topic 2: Datastores and External Services”.
No Shared Functionality
Goal: The component’s codebase does not share functionality with any other domain (eg via common libraries encoding business logic).
Hint: Code sharing of components within the same domain can be acceptable to a certain degree.
Topic 2: Datastores and External Services
Topic Goal: Data is persisted in a way that allows development to scale linearly with the number of components.
Note: A datastore can be anything that holds data. Examples: databases, caches such as memcached, and file-caching servers.
Goal: Ability to change data schemas without the need to coordinate with any other tribe.
Hint: This is the case if no other component directly accesses the database and any messages emitted from the component are appropriately decoupled from the storage schema.
Goal: Ability to cache db results without danger that the underlying data is modified by another component.
Hint: This is especially risky when using things such as a shared cache-server; ensure keys are appropriately namespaced.
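A minimal sketch of namespaced cache keys, assuming a hypothetical `<component>:v<version>:<entity>:<id>` scheme. Prefixing with the owning component avoids accidental collisions on a shared cache server, and the schema version lets a new deployment change the cached representation without reading stale entries. Namespacing does not stop another component from deliberately writing to those keys; that needs access control on the cache server (Redis ACLs, for example, can restrict users to key patterns) or a separate instance.

```python
def cache_key(component: str, entity: str, entity_id: str, schema_version: int = 1) -> str:
    """Build a cache key namespaced by owning component and schema version."""
    # Component and entity names must not contain the separator, so keys
    # from different namespaces can never collide. The ID comes last, so it
    # may contain anything.
    for part in (component, entity):
        if not part or ":" in part:
            raise ValueError(f"invalid namespace part: {part!r}")
    return f"{component}:v{schema_version}:{entity}:{entity_id}"
```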
Data Access Permission Ownership
Goal: Ability to enforce rules on data access (only the component owners can grant and revoke permissions, no one else).
Isolation from External Datastores
Goal: Stability of this component regardless of changes made to data persistence in other components.
Private Record Keys
Goal: No internal (potentially DB specific) surrogate keys are exposed (and thereby used by other components).
Note: When an external system depends upon an internal key, then that field cannot be changed in future without breaking backwards compatibility.
Prerequisite: Every business entity should get an organisation-wide artificial unique ID that does not change if the leading component or any implementation detail changes (e.g. by re-inserting entries into the database and thus re-generating auto-allocated keys).
Hint: This is only relevant for data that is owned by this component and under your control. Data that you process as part of a read model needs to be fixed by the owner of the data. Also exposing randomly generated UUIDs is fine (as they are neither auto-incremented nor numeric).
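A minimal sketch of keeping the database surrogate key private, assuming the organisation-wide ID is a random UUID generated once when the entity is created and stored with the row; `CustomerRow` and `to_api` are hypothetical names.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class CustomerRow:
    """Internal storage representation; never exposed outside the component."""
    db_id: int  # auto-allocated surrogate key (private)
    name: str
    # Stable public ID, stored alongside the row so that re-inserting or
    # migrating the row preserves it.
    public_id: uuid.UUID = field(default_factory=uuid.uuid4)


def to_api(row: CustomerRow) -> dict:
    """API/event representation: exposes the public ID, not the DB key."""
    return {"id": str(row.public_id), "name": row.name}
```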
External Resource Addresses are Configurable
Goal: Accesses external services (including databases) via a configurable address, i.e. backing services are attached resources. This is fulfilled if endpoints and their connection properties are configurable and not hardcoded.
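A minimal sketch, assuming endpoints are passed as environment variables (the approach described in the 12 Factor App); the variable names `DATABASE_URL`, `BROKER_URL` and `HTTP_TIMEOUT_SECONDS` are hypothetical. Passing the environment as a parameter keeps the function easy to test.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class BackingServices:
    database_url: str
    broker_url: str
    http_timeout_seconds: float


def load_backing_services(env: Mapping[str, str] = os.environ) -> BackingServices:
    """Read backing-service endpoints from configuration, never from code.

    Missing required settings fail fast with a clear error instead of
    silently falling back to a hard-coded host.
    """
    missing = [k for k in ("DATABASE_URL", "BROKER_URL") if not env.get(k)]
    if missing:
        raise RuntimeError(f"missing required configuration: {', '.join(missing)}")
    return BackingServices(
        database_url=env["DATABASE_URL"],
        broker_url=env["BROKER_URL"],
        http_timeout_seconds=float(env.get("HTTP_TIMEOUT_SECONDS", "5")),
    )
```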
External Resource Access Rights are Minimised
Goal: Application users and administrative users (that perform db migrations, for instance) are separated to limit the damage if this component has vulnerabilities.
Hint: For each (accountid, credential) provided as configuration for this component, are the privileges associated with that account truly as low as possible?
Topic 3: Availability
Topic Goal: To provide a service that is available “around the clock” and has no user-visible downtime.
SLOs and SLAs
Goal: The component has defined SLO/SLAs based upon business requirements and they are reflected in alerting.
Goal: Component has a documented backup policy. The policy might be implemented by another tribe (eg one responsible for infrastructure), but the component’s needs should be defined by the owning tribe, based on the relevant SLO.
Goal: Component relies only on datastores that are highly available or does so only opportunistically (eg a cache, where unavailability does not lead to an SLO miss).
Goal: Component is “disposable” - an instance can be terminated and replaced by a new one without significant system impact.
Prerequisite: The component should start providing its service (e.g. serving requests or starting batch processing) within a few seconds of starting. It should also shut down cleanly when it receives a SIGTERM signal.
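Many web frameworks and application servers already handle SIGTERM. For a hand-written worker loop, a minimal sketch of clean shutdown using only the Python standard library could look like this; the `Worker` class and the in-memory queue are hypothetical stand-ins for a real message consumer.

```python
import queue
import signal
import threading
from typing import Callable


class Worker:
    """Processes jobs until asked to stop, then exits after the current job."""

    def __init__(self, jobs: "queue.Queue[str]", handle: Callable[[str], None]) -> None:
        self._jobs = jobs
        self._handle = handle
        self.stop_requested = threading.Event()

    def on_sigterm(self, signum, frame) -> None:
        # Only record the request here; the loop below finishes the job in
        # progress and then returns, instead of dying mid-job.
        self.stop_requested.set()

    def install_signal_handlers(self) -> None:
        # The signal module only allows this from the main thread.
        signal.signal(signal.SIGTERM, self.on_sigterm)

    def run(self) -> int:
        """Process jobs until SIGTERM; returns the number of jobs handled."""
        processed = 0
        while not self.stop_requested.is_set():
            try:
                # A short timeout bounds how long an idle worker takes to
                # notice a stop request.
                job = self._jobs.get(timeout=0.5)
            except queue.Empty:
                continue
            self._handle(job)
            processed += 1
        # Close connections, commit consumer offsets, etc. here before exiting.
        return processed
```

A process would create the worker, call `install_signal_handlers()` from the main thread, then call `run()` and exit when it returns.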
Goal: Component is stateless and share-nothing; any data that needs to persist must be stored in a stateful backing service, typically a database.
Prerequisite: The component does not rely on sticky sessions; any instance can process any request.
Links: 12 Factor App: VI. Processes
Startup Dependency Isolation
Goal: Component starts up even when external services are unavailable (private databases are excluded).
Hint: Requiring an external service on startup can lead to circular dependencies which makes it impossible to bootstrap the platform in case of a complete platform outage.
Goal: Component starts rapidly.
Prerequisite: Avoids designs that need long warmup times on new deployments (e.g. by building up large cache structures).
Hint: This question deals with long startup times on new deployments, and is not related to performance when serving requests.
Goal: Accept requests only when an instance is ready to process them.
Hint: Many runtimes rely on the component responding appropriately to a “readiness check”.
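A minimal sketch of separate liveness and readiness endpoints using Python’s standard `http.server`; the paths `/health/live` and `/health/ready` are hypothetical and must match whatever the runtime is configured to probe. Kubernetes HTTP probes, for example, treat status codes from 200 up to (but not including) 400 as success, so returning 503 until startup completes keeps traffic away from the instance.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Health:
    """Tracks whether this instance is alive and ready to accept requests."""

    def __init__(self) -> None:
        self._ready = threading.Event()

    def mark_ready(self) -> None:
        # Call once startup work is done (config loaded, private DB reachable).
        self._ready.set()

    def mark_not_ready(self) -> None:
        # Call when shutting down so the platform stops routing new requests here.
        self._ready.clear()

    def check(self, path: str) -> tuple[int, dict]:
        """Return (HTTP status, JSON body) for a health-check path."""
        if path == "/health/live":
            return 200, {"status": "UP"}
        if path == "/health/ready":
            if self._ready.is_set():
                return 200, {"status": "READY"}
            return 503, {"status": "NOT_READY"}
        return 404, {"error": "not found"}


def make_handler(health: Health) -> type:
    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            status, body = health.check(self.path)
            payload = json.dumps(body).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

    return HealthHandler


def serve_health(health: Health, port: int) -> ThreadingHTTPServer:
    """Start the health endpoints on a background thread."""
    server = ThreadingHTTPServer(("", port), make_handler(health))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```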
Goal: Provides health, readiness and status data (see Observability).
Hint: It isn’t enough to be available; availability must also be measurable.
Topic 4: Deployability
Topic Goal: To achieve rapid delivery of features, the component and associated infrastructure must be easily and rapidly deployable.
Continuous Integration
Goal: Component is built automatically when changes are pushed to the version control system (Continuous Integration).
Automated Deployment of non-main branches
Goal: Component can be deployed with no manual steps (other than authorizing the deployment) - to both test and production environments (Continuous Delivery).
Automated Deployment of Main Branch
Goal: Component is automatically deployed to production on merge to the main branch.
Alert Generation
Goal: Alerts are generated automatically on deployment failure.
Versioned Build Pipeline
Goal: Build-pipeline configuration is stored together with the code (under version control).
Versioned Deployment Pipeline
Goal: Deployment-pipeline configuration is stored together with the code (under version control).
Pipeline Independence
Goal: Build and deployment pipelines are isolated from the performance or stability of other components. Expressed differently: building and deploying is possible even when other services are not currently available.
Prerequisite: Integration tests mock all external components.
Hint: This is a primary use case for consumer-driven contract testing tools such as Pact.
Containerized Deployment
Goal: Support deployment to modern environments by packaging the component as a container image and supporting the organisation’s container-management system in deployment pipelines.
Goal: Supports rapid rollback of a deployment.
Zero Downtime Deployment
Goal: Supports zero-downtime deployments, ie the component can be deployed during normal working hours.
Progressive Rollout
Goal: Supports gradual rollout of a deployment driven by health metrics (eg blue-green or automated canary deployments).
Note: This is an advanced capability: desirable, but not expected of every component.
Trusted Dependencies
Goal: Builds rely only on trusted image repositories (e.g. no downloads from arbitrary URLs in the build process).
Hint: Downloading from untrusted sources poses two risks: 1) Continuity Risk: The artefact may suddenly become unavailable and 2) Security Risk: It might be possible to perform a supply chain attack by replacing the artefact.
Pipeline Ownership
Goal: The component owner has the ability to modify the build and deployment pipelines.
Environment Consistency
Goal: Dev/Test/Prod environments are structurally as similar as possible.
Note: This item is concerned only with structural similarity, not with the data and config from prod. Non-production environments must never contain production data or configuration.
Topic 5: Interoperability
Topic Goal: Support a scalable, loosely-coupled organisation-wide architecture.
Goal: Publishes events of significance via a message broker (significance is defined by your team or by the organisation’s needs).
Goal: Events are generic and self-descriptive. Topics and fields have descriptive names, and enums are used instead of status codes.
Hint: Where relevant, event schemas should be registered in an appropriate registry (eg Confluent Schema Registry).
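A minimal sketch of a self-descriptive event, assuming plain JSON serialisation; the event type `order.status-changed`, the `OrderStatus` values and the envelope fields (`event_id`, `occurred_at`, `schema_version`) are hypothetical. In practice the schema would usually be expressed in Avro, Protobuf or JSON Schema and registered in a registry, which this sketch does not do.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from enum import Enum


class OrderStatus(str, Enum):
    # Self-descriptive values instead of opaque codes such as 3 or "S".
    PLACED = "PLACED"
    PAID = "PAID"
    SHIPPED = "SHIPPED"
    CANCELLED = "CANCELLED"


@dataclass(frozen=True)
class OrderStatusChanged:
    order_id: str
    new_status: OrderStatus
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    event_type: str = "order.status-changed"
    schema_version: int = 1

    def to_json(self) -> str:
        body = asdict(self)
        body["new_status"] = self.new_status.value  # serialise the enum by name
        return json.dumps(body)
```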
Events are Documented
Goal: The events emitted by the component are documented in the appropriate place.
Goal: Has a documented plan for schema evolution and versioning to avoid communication breaking unexpectedly because of schema changes (backwards-compatibility).
Hint: A schema registry may assist in verifying backwards compatibility (if appropriately configured).
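Schema registries implement compatibility checks for real schema formats, and their compatibility modes (such as backward, forward and full) define which direction is checked. The sketch below only illustrates the idea on a deliberately simplified, hypothetical schema representation (`FieldSpec` in a dict of field names): it checks that consumers still using the old schema can read events produced with the new one, assuming those consumers ignore unknown fields. It is not a replacement for a registry’s checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    type: str
    required: bool = True


Schema = dict[str, FieldSpec]


def forward_incompatibilities(old: Schema, new: Schema) -> list[str]:
    """Changes in `new` that would break consumers still using `old`.

    Adding fields is safe for tolerant readers; removing a field that old
    consumers require, changing a field's type, or making a required field
    optional is not.
    """
    problems = []
    for name, spec in old.items():
        if name not in new:
            if spec.required:
                problems.append(f"removed required field '{name}'")
        elif new[name].type != spec.type:
            problems.append(f"changed type of '{name}': {spec.type} -> {new[name].type}")
        elif spec.required and not new[name].required:
            problems.append(f"field '{name}' is no longer guaranteed to be present")
    return problems
```

Run in the build pipeline, a non-empty result can fail the build before an incompatible schema reaches production.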
Goal: The component is completely self-contained and provides its service via port binding.
Prerequisite: The deployable artifact is directly executable. An artifact which is deployed into some “host” application does not fulfil this.
Goal: Source code is meaningfully documented.
Hint: Documentation should help a reader of the code (including your future self) understand why a particular class/method exists or why particular design or configuration choices have been made. Code comments that simply describe what the code does (“getter/setter javadoc”) are not helpful.
Goal: Documentation for the component’s APIs is generated automatically.
Topic 6: Feature Toggles
Topic Goal: Support trunk based development (which in turn supports rapid software deployment).
Hint: Trunk-based development means avoiding long-lived code branches; most branches should be merged to the main branch (and thus be released to production) within 2 days. Having not-yet-ready code active only when a feature-flag is enabled allows testing of code in non-production environments without having it active in production.
Supports Feature Toggles
Goal: Code-paths within the component can be enabled and disabled via external configuration at runtime.
Toggles are Off By Default
Goal: Uses feature toggles to enable features, not disable them (i.e. a feature is off by default).
Goal: A process exists to ensure feature toggles are removed from the code as soon as the feature is complete.
Hint: Feature toggles are intended only to support merge of not-yet-production-ready code into the main branch.
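A minimal sketch of runtime feature toggles that are off by default, assuming the toggles are provided as a JSON object in a file (for example mounted from a configuration service); many teams use a dedicated feature-flag service instead. A toggle that is absent, or has any value other than `true`, counts as off.

```python
import json
import threading
from pathlib import Path


def parse_toggles(text: str) -> dict[str, bool]:
    """Parse a JSON object mapping toggle names to booleans.

    Only a literal JSON `true` enables a toggle; anything else counts as off.
    """
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("toggle configuration must be a JSON object")
    return {name: value is True for name, value in raw.items()}


class FeatureToggles:
    """Feature toggles read from external configuration at runtime.

    Missing toggles, or a configuration that cannot be read, are treated as
    off, so unfinished code paths stay disabled by default.
    """

    def __init__(self, path: Path) -> None:
        self._path = path
        self._lock = threading.Lock()
        self._flags: dict[str, bool] = {}
        self.reload()

    def reload(self) -> None:
        # Call periodically (or on a change notification) so toggles can be
        # flipped without redeploying the component.
        try:
            flags = parse_toggles(self._path.read_text())
        except (OSError, ValueError):
            flags = {}
        with self._lock:
            self._flags = flags

    def is_enabled(self, name: str) -> bool:
        with self._lock:
            return self._flags.get(name, False)
```

Unfinished code is then guarded with a check such as `if toggles.is_enabled("new-pricing"):` (a hypothetical toggle name); removing the toggle once the feature is complete means deleting that condition and the old code path.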
Topic 7: Security
Topic Goal: Provide a service which protects data and operations.
Follow Organisation Security Guidelines
Goal: Component follows all relevant organisation security guidelines, and this is documented.
Documents Security Variance
Goal: Component documentation explicitly describes any security requirements beyond the minimum requirements.
Threat Modelling
Goal: Project documents the results of at least one threat modelling session.
Initial Security Review
Goal: Project documents the results of at least one independent security review.
Hint: The reviewer can be internal to the organisation or an external party.
Recent Security Review
Goal: Project has had at least one security review in the last 2 years.
Standardized Authentication
Goal: The component authenticates incoming requests with the organisation’s standard, eg OpenID Connect.
Note: Applies only to components which need authentication - typically webservers. Business-tier components usually require authorization only.
Standardized Authorization
Goal: The component verifies that incoming requests have appropriate authorization to perform the requested operation, and does so via the organisation’s standard approach (eg OAuth2). The originating IP address is never used as the sole factor for authorization. Roles used for authorization are appropriately scoped (no “superuser” permissions).
Links: Zero Trust
Document Roles
Goal: The set of roles used for authorization decisions is appropriately documented.
Credentials are Configured
Goal: Uses credentials/secrets only from a designated secret storage with appropriate access restrictions (does NOT embed secrets in code).
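A minimal sketch, assuming the secret store makes each secret available to the container as a file and the file path is passed in an environment variable named `<NAME>_FILE` (a convention used by several official Docker images, but an assumption here); `Secret` and `load_secret` are hypothetical helpers. Wrapping the value also reduces the risk of a secret leaking into logs.

```python
import os
from pathlib import Path
from typing import Mapping


class Secret:
    """Wraps a secret value so it is not accidentally printed or logged."""

    def __init__(self, value: str) -> None:
        self._value = value

    def reveal(self) -> str:
        return self._value

    def __repr__(self) -> str:
        return "Secret(****)"

    __str__ = __repr__


def load_secret(name: str, env: Mapping[str, str] = os.environ) -> Secret:
    """Load a secret from the file named by the <NAME>_FILE variable.

    The file is expected to be provisioned by the designated secret store
    at deploy time; there is no fallback to a value in the source tree.
    """
    var = f"{name.upper()}_FILE"
    path = env.get(var)
    if not path:
        raise RuntimeError(f"{var} is not set")
    return Secret(Path(path).read_text().strip())
```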
Supply-chain Security for Dependencies
Goal: Project has a process for regular review and reporting of dependency security issues, ie ensures that third-party libraries are kept up-to-date with security patches.
Hint: The use of an automated dependency-scanning tool on a regular schedule is a sufficient process.
Modern Dependencies
Goal: All dependencies are regularly updated to new versions, even in the absence of known security vulnerabilities.
Note: Security patches are often only available for newer versions of libraries; these may be hard to apply to components relying on older dependency versions. Keeping dependencies up-to-date is therefore good preparation for applying security fixes.
External Security-relevant Dependencies are Registered
Goal: Any external sites or other resources which the component interacts with are registered with the organisation’s relevant security tools.
Environment Isolation
Goal: Deployment environments (dev/test/prod/etc) are separated and do not allow artifacts or data to move between them except under approved conditions.
Hint: This forbids the use of production data in test environments, and use of production services from non-production environments.
Supply-chain Security for Base Images
Goal: When building container images, use only approved base images.
Hint: The organisation security team may not wish to allow every image to be used in production environments.
Topic 8: Observability
Topic Goal: To ensure component state can be inspected at runtime. This supports detection and analysis of problems.
Export Process Metrics
Goal: Export metrics that are related to technical state (eg JVM metrics for a Java application).
Export Business Metrics
Goal: Export metrics that are related to business processes, eg counts of active users.
Export SLO/SLA metrics
Goal: Export metrics related to SLO/SLAs (counts of failing requests, errors/exceptions emitted, response times, etc).
Has a Monitoring Dashboard
Goal: Project has a live dashboard which shows all important metrics related to the component - particularly whether SLOs/SLAs are satisfied.
Goal: All exported metrics have labels with low cardinality, to avoid performance issues in the monitoring system.
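In monitoring systems such as Prometheus, every distinct combination of label values creates a separate time series, so labels derived from raw request paths (which contain record IDs) are a common source of high cardinality. Where the web framework exposes the matched route template, that is usually the best label value; the regex-based `route_label` below is only a hypothetical fallback sketch.

```python
import re

# Segments that identify individual records would create one time series
# per customer, order, ..., so they are collapsed to a placeholder.
_UUID = re.compile(r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$")
_NUMBER = re.compile(r"^\d+$")


def route_label(path: str) -> str:
    """Map a concrete request path to a low-cardinality label value."""
    path = path.split("?", 1)[0]  # query strings are unbounded too
    parts = []
    for segment in path.strip("/").split("/"):
        if _NUMBER.match(segment) or _UUID.match(segment):
            parts.append(":id")
        else:
            parts.append(segment)
    return "/" + "/".join(parts)
```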
Topic 9: Traceability
Topic Goal: Component records audit/trace data to enable debugging of problems in production environments.
Goal: Treats logs as event streams: Each running process writes its log-messages, unbuffered, to stdout. In staging or production deploys, the stream for each process will be captured by the execution environment, collated together with all other streams from the app, and routed to one or more final destinations for viewing and secure, read-only long-term storage.
Hint: Storing logs as files on the host of each component instance does not fulfil this requirement.
Links: 12 Factor App: XI. Logs
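A minimal sketch using Python’s standard `logging` module: one JSON object per line, written to stdout and nowhere else. The field names are assumptions and should follow whatever format the tribe has agreed. `logging.StreamHandler` flushes after every record, so messages are not held back in the process’s output buffer.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """One JSON object per line, so the platform can parse and route logs."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(level: int = logging.INFO) -> None:
    # Write to stdout only; no log files on the host. The execution
    # environment captures the stream and ships it to central storage.
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)
```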
Goal: Uses log levels and log formats consistently (as agreed by the tribe).
Goal: Supports the organisation’s distributed request tracing tool.
Private Data Access Monitoring
Goal: Logs all interactions with personal data not belonging to the originator of the request. This is mandatory for personal data as defined by the GDPR. Such logs must be kept for at least 90 days.
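A minimal sketch of such an audit event, assuming audit records are written through a dedicated logger (here the hypothetical `audit.personal-data`) so that the log pipeline can route them to storage with the required retention; the 90-day retention itself is configured in that storage, not in the component’s code. Function and field names are hypothetical.

```python
import json
import logging
from datetime import datetime, timezone

audit_log = logging.getLogger("audit.personal-data")
# Audit events must not be filtered out by the application's normal log level.
audit_log.setLevel(logging.INFO)


def record_personal_data_access(requester_id: str, subject_id: str,
                                action: str, fields: list[str]) -> bool:
    """Emit an audit event when someone accesses another person's data.

    Returns True if an event was written. A data subject accessing their
    own data is not logged, matching the rule above.
    """
    if requester_id == subject_id:
        return False
    audit_log.info(json.dumps({
        "type": "personal-data-access",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "requester": requester_id,
        "subject": subject_id,
        "action": action,
        "fields": sorted(fields),
    }))
    return True
```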
Goal: Logs security-relevant events (eg logon, logoff, configuration changes).
Topic 10: Alerts and Notifications
Topic Goal: Generate alerts and notifications for early problem detection and remediation.
Goal: Project has a defined Alerting Policy following organisation guidelines, and the component complies with those guidelines.
Topic 11: Tech Fitness KPIs
Topic Goal: Provide visibility of project status to the owner and management.
Goal: Project has documented goals for metrics Lead Time for Change (LTC), Deployment Frequency (DF), Change Fail Rate (CFR), and Mean Time to Recover/Restore (MTTR).
Goal: Project publishes its LTC, DF, CFR, and MTTR.
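These four metrics are commonly known as the DORA metrics, and teams define them slightly differently. A minimal sketch for computing them from deployment records, assuming lead time is measured per change from commit to production deployment (reported as the median) and time to restore is measured from the failed deployment until service was restored; the `Deployment` record is hypothetical and would in practice be filled from the VCS and the deployment pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    deployed_at: datetime
    commit_times: list[datetime]            # commit time of each included change
    failed: bool = False                    # caused a failure in production
    restored_at: Optional[datetime] = None  # when service was restored, if failed


def dora_metrics(deploys: list[Deployment], period_days: int) -> dict:
    """Compute LTC, DF, CFR and MTTR over a reporting period (assumed definitions)."""
    if not deploys:
        raise ValueError("no deployments in period")
    lead_times = [d.deployed_at - c for d in deploys for c in d.commit_times]
    failures = [d for d in deploys if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "lead_time_for_change": median(lead_times) if lead_times else None,
        "deployment_frequency_per_day": len(deploys) / period_days,
        "change_fail_rate": len(failures) / len(deploys),
        "mean_time_to_restore": (sum(restores, timedelta()) / len(restores)) if restores else None,
    }
```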
Topic 12: Communication
Topic Goal: Ensure other tribes and individuals within the organisation can get in touch with the owners of this component.
Goal: Have a well-documented chat channel or email group through which questions or information can be sent to the component owners.
Topic 13: Workflow
Topic Goal: Use best-practice workflows for development.
Goal: Use trunk-based development, ie branches used to develop features should be very short-lived (in most cases less than 2 days).
Hint: Feature toggles can allow code which is not yet production-ready to be merged into the main branch.
Links: Trunk Based Development
The original list had a somewhat different name. I have changed the name for the purposes of this article.