This article is intended to provide an architect-level view of Google’s cloud-computing services - the Google Cloud Platform (GCP).
Sadly when I was first dumped into the middle of a GCP-based project, I was unable to find any book or other source which helped me put the pieces into context. Google’s online documentation is unfortunately too detailed; I found nothing which gives an overview of the situation - the big picture into which the dozens of Google components fit. The situation is further confused by the fact that Google often has overlapping services, due to various reasons including:
- Some services being phased out while others are phased in
- Companies being purchased by Google and their services merged into the overall solution, even when some components duplicate existing functionality
- Apparently competing departments within Google
The following concepts are covered (in brief overview):
- How Google represents identities (users)
- How GCP accounts are created
- What resources a GCP account manages
- How permissions/policies are inherited
- How billing is managed
- How users can interact with a GCP account (web, REST, gcloud)
I hope to write followup articles discussing authorization, and looking at the various resources and services in more detail - at least those with which I have experience.
Note that I am not an expert in this area, and have no insider information about Google services; everything written below is the result of a moderate amount of experience using GCP, some experimentation, and logical deduction (which could possibly be wrong). Feedback is welcome!
Getting Started with the Google Cloud Platform
Google provide a number of end-user services - file storage, online document editing, etc. These are software as a service - something you can use but not program.
These services provided by Google need to be secure and very scalable; Google have built datacenters around the world and developed software frameworks that run in these datacenters to support their software-as-a-service offerings (eg Google Docs). And fortunately Google also make it possible for a software developer to use this infrastructure to run custom code - for a fee of course. This set of services is called the Google Cloud Platform, aka GCP.
In fact, Google has a very generous “free tier”, charging for use of the GCP infrastructure only when that usage grows beyond specific limits (storage size, transactions-per-day, etc). It is quite possible to implement reasonable-sized applications without paying a cent - but which can scale to larger data volumes when needed. And if usage does increase to the level that payment is needed, then presumably the service is successful enough that it pays for itself.
Getting access to Google’s GCP services as a developer starts by creating a simple end-user Google account, as needed to access the free software-as-a-service tools. It is then possible to create a Google Cloud Platform account linked to that end-user account, and then various interesting resources can be added to the cloud platform account such as virtual machines on which to run code, or database services into which data can be programmatically stored.
The central service which ties everything together is identity-management; this discussion therefore starts there.
The Google Identity Service
Google’s identity service is a distributed database of (id, credentials, profile) information, and various APIs for interacting with this database.
Entries in this database are of four different types:
- A GMail account directly with Google (personal account)
- A member of a Cloud Identity account
- A member of a GSuite account (similar to Cloud Identity)
- An application service account (which represents a program rather than a user)
Each entry is an identity with a unique string-typed id; for the first three types of entries, the id is of form name@domain. It is common for this id to also be a valid email-address for the user associated with this account - but the concepts of account-id and email-address are logically separate.
Every identity also has some associated credentials that can be used to “log in” as that identity. Various types of credentials are supported; the simplest of course being a plain password. More complex options include two-factor authentication, public keys, etc.
As well as implementing a global distributed database for identity information, Google provides an associated REST service for interacting with the Google identity service, in particular to submit credentials and get back an OAuth token that can then be used to authenticate to other Google services. Other REST endpoints allow the profile information for the identity to be updated. The identity service also provides an OpenID Connect page for web-based interactive login and single sign-on support - which again results in an OAuth token being issued that can then be presented to other Google services.
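The interactive OpenID Connect login described above can be sketched with the Python standard library. This is a minimal illustration, not a complete client: it assumes an OAuth client id and redirect URI have already been registered for the application in the GCP console (the values used in the test are hypothetical), and it only builds URLs and request bodies rather than making network calls. The two endpoint URLs are the ones Google publishes in its OpenID Connect discovery document; the helper function names are made up for this example.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoints published in Google's OpenID Connect discovery document.
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"


def build_login_url(client_id: str, redirect_uri: str, state: str) -> str:
    """URL to send a browser to for interactive Google login (authorization-code flow)."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",          # ask for a one-time authorization code
        "scope": "openid email profile",  # identity plus basic profile information
        "state": state,                   # anti-CSRF value that Google echoes back
    }
    return AUTH_ENDPOINT + "?" + urlencode(params)


def parse_callback(callback_url: str) -> tuple[str, str]:
    """Extract (code, state) from the URL Google redirects back to after login."""
    query = parse_qs(urlparse(callback_url).query)
    return query["code"][0], query["state"][0]


def token_request_body(code: str, client_id: str, client_secret: str,
                       redirect_uri: str) -> dict:
    """Form fields to POST to TOKEN_ENDPOINT to exchange the code for tokens."""
    return {
        "code": code,
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
        "grant_type": "authorization_code",
    }
```

After checking that the returned `state` matches the value it sent, the application POSTs the token request body to `TOKEN_ENDPOINT`; the response contains an access token, which is presented to other Google APIs in an `Authorization: Bearer ...` header, and (because of the `openid` scope) an ID token describing the logged-in identity.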
Any person can create a GMail account for free, in just a few minutes. As well as creating an entry in the global Google identity service, user setup is triggered for a number of services including:
- Hangouts (video calls)
- Docs (word processing, spreadsheet, presentation)
- Drive (file storage)
- Google Cloud Platform
In the case of email, “setup” includes:
- Allocating storage space for email (“mailbox”) with a standard storage quota
- Configuring Google permissions to allow the user to access the email REST API, and to access the email web interface
- Configuring the Google email servers to accept incoming emails addressed to firstname.lastname@example.org and forward them into the mailbox
- Configuring the Google email servers to accept outgoing emails from email@example.com (sent via the REST API or SMTP)
Setup for other services will be similar, ie usually allocating a storage location and updating permissions associated with the identity.
In general, each service that Google provides has a REST API. A web-based interface is then also provided which is implemented in terms of that API - ie whatever can be done interactively with a web browser can also be done programmatically.
As noted earlier, each Google identity has a unique id; the ids for “personal accounts” are strings of form “firstname.lastname@example.org” - which happens to be the same as the email address associated with that account.
With a free GMail account, the user can visit the Google Cloud Platform admin page at https://console.cloud.google.com and immediately start using GCP resources (eg creating virtual machines running custom code). See later for more details.
When registering a GMail account, it is also possible to use an existing email-address as the identity. In this case, no Google-hosted mailbox is created. Note that it is not possible to change the email address associated with a GMail account; using an external address is therefore only advisable if that address is very stable, ie will outlast the GMail account lifetime.
A gmail account can be deleted, but the id used remains reserved (at least for a time-period).
GMail “personal” accounts have ids of form “email@example.com”, unless you already had a stable external email address and decided to use that as your gmail account id. Such personal accounts can be administered only by the account owner.
It is also possible to enable “Cloud Identity” for a GCP account and associate it with a custom domain-name you own. It is then possible to:
- Create users with ids of form someuser@yourdomain;
- Make some users “administrators” who can manage other user accounts (including locking, resetting, deleting); and
- Sync Cloud Identity with an existing LDAP server (so that user data does not need to be entered twice)
To enable Cloud Identity a GCP account is required. This discussion therefore needs to refer to some GCP-related topics that are presented fully later - but as the concept of identity is central to GCP it is best to describe Cloud Identity at least briefly first.
Enabling Cloud Identity with a custom domain requires first creating a personal account, then creating a GCP account, linking a custom domain-name to that GCP account and then enabling Cloud Identity for the GCP account. A REST API and web interface are then available to allocate identities with custom ids.
Like Google personal identities, each identity specifies (id, credentials, profile) and is stored in the global distributed database. In short, Cloud Identity is simply an entry-point into the Google identity service which allows ids other than “@gmail.com”.
Users associated with the same domain do have some special interaction, including sharing an address-book.
To enable Cloud Identity you need:
- A GCP account
- A billing account
- Your own domain name
Then, in the GCP console, go to any project, select menu option IAM > Identity and choose “sign up”. Note that although Cloud Identity is configured via a project, it is actually a global setting associated with the Google Cloud Platform (GCP) account. As part of configuring Cloud Identity, an Organization resource is created and associated with the Google Cloud Platform account (see later).
Cloud Identity is currently not very well documented; I suspect it was originally part of GSuite and has only recently been “factored out” as a standalone service.
Note that although a GCP account with an associated billing account is needed (ie you do need a credit card), there is no charge for Cloud Identity.
Accounts created through Cloud Identity have ids of form someuser@yourdomain rather than the “@gmail.com” form used by personal accounts.
As far as I know (ie not confirmed), users created through Cloud Identity do still get a google-hosted mailbox. However Google does not publish DNS MX records for the custom domain, so mail for those users will not be directed to Google’s infrastructure by default. If you wish to provide your own email hosting infrastructure for those email addresses, then the Google mailboxes can just be ignored. If you wish to use Google’s hosting then you just need to publish the appropriate MX records for your custom domain; publishing SPF and DKIM records is optional but recommended.
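For the case where Google should host the mail, the records to publish can be sketched as zone-file lines. This is a minimal illustration under stated assumptions: the MX host names and priorities below are the long-standing set Google has documented for Google-hosted mail, and the SPF value uses Google's documented `_spf.google.com` include, but Google's admin help is the authority on the current values. DKIM is left out because its key is generated per domain in the admin console. The `zone_records` helper is hypothetical.

```python
# Long-standing MX set documented by Google for Google-hosted mail
# (verify against Google's current admin help before publishing).
GOOGLE_MX = [
    (1, "ASPMX.L.GOOGLE.COM."),
    (5, "ALT1.ASPMX.L.GOOGLE.COM."),
    (5, "ALT2.ASPMX.L.GOOGLE.COM."),
    (10, "ALT3.ASPMX.L.GOOGLE.COM."),
    (10, "ALT4.ASPMX.L.GOOGLE.COM."),
]
# SPF policy authorising Google's mail servers to send for the domain.
GOOGLE_SPF = "v=spf1 include:_spf.google.com ~all"


def zone_records(domain: str, ttl: int = 3600) -> list[str]:
    """Zone-file lines routing mail for `domain` to Google (hypothetical helper)."""
    origin = domain.rstrip(".") + "."  # fully qualified name with trailing dot
    lines = [f"{origin} {ttl} IN MX {prio} {host}" for prio, host in GOOGLE_MX]
    lines.append(f'{origin} {ttl} IN TXT "{GOOGLE_SPF}"')
    return lines
```

Lower MX priority values are tried first, so the ALT hosts only receive mail when the primary host is unreachable.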
When a user with an identity created via Cloud Identity visits the GCP admin page at console.cloud.google.com, they see the GCP account through which their identity was defined - or at least those parts of it for which they have been granted rights to see.
GSuite is a combination of a GCP account with Cloud Identity enabled (ie requires a custom domain name), and a license for extended versions of the Google Docs application suite. This allows a company which has a domain-name to manage its own users, but those users are automatically configured with permissions to access all Google services enabled for the associated GCP account. Things like address-books are also shared with other users in the same Cloud Identity domain.
Because GSuite implicitly sets up Cloud Identity, the associated GCP account always has an Organization resource associated with it.
The intended audience for GSuite is companies who need email, word-processing, spreadsheets, and shared storage for these documents, but do not wish to manage their own physical infrastructure or track software licences. With a GSuite account, a Cloud Identity “admin” user can create and manage accounts for company employees - or LDAP synchronization from an external LDAP server can be configured.
Google is also a domain name registrar; it is common for small companies to purchase a suitable domain-name from Google at the same time they sign up for GSuite. When Google is the domain name registrar then it can publish DNS records (MX for mail, A for websites, etc) automatically, eg so that email addressed to user@customdomain is routed to the Google-hosted email infrastructure.
A two-week trial licence for GSuite is available for free. With this licence you can create and administer user accounts (as with non-GSuite Cloud Identity), use the extended features of the Google online services, and generally get a feel for how GSuite works.
Permission and Policy Management (IAM)
The Google IAM service provides authorization throughout the Google services, ie maps users to roles and roles to permissions. Various Google services then (indirectly) test whether the user invoking a service (usually via a REST call) is permitted to perform that operation by checking the IAM permissions for that user.
Examples of the things IAM controls are whether a specific user:
- Is permitted to change the budget associated with a billing account
- Is permitted to create a new project within a GCP account
- Is permitted to create a new virtual machine instance within a GCP account project
- May deploy a new version of an AppEngine application
- Has read/write/create/delete rights on a specific cloud storage bucket (object storage “filesystem”)
Google groups can be used to define groups of multiple users, and IAM permissions can then be granted to the entire group. IAM permissions can also be attached to entire domains, eg all users in a specific Cloud Identity domain.
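The way grants to users, groups and domains combine can be sketched in Python. An IAM policy is essentially a list of bindings, each granting one role to a set of member strings prefixed with the member type (`user:`, `group:`, `domain:`, `serviceAccount:`). The checker below is a simplified, hypothetical illustration rather than Google's evaluation logic: it ignores the role-to-permission mapping, IAM conditions and special members such as `allUsers`, and it assumes the caller has already resolved which groups the identity belongs to. The policy contents are made-up examples.

```python
from typing import Iterable

# Shape of an IAM policy as returned by the getIamPolicy methods; the
# members and roles here are made-up examples.
EXAMPLE_POLICY = {
    "bindings": [
        {"role": "roles/storage.objectViewer",
         "members": ["group:data-readers@example.com", "domain:example.com"]},
        {"role": "roles/storage.objectAdmin",
         "members": ["user:alice@example.com",
                     "serviceAccount:my-app@my-project.iam.gserviceaccount.com"]},
    ]
}


def principals_for(identity: str, groups: Iterable[str] = ()) -> set[str]:
    """Every member string that would match this identity.

    identity is a prefixed id such as "user:bob@example.com"; groups lists the
    Google groups the identity belongs to (resolved elsewhere, by assumption).
    """
    kind, _, email = identity.partition(":")
    principals = {identity}
    if kind == "user":
        # A domain: member matches every user in that Cloud Identity/GSuite domain.
        principals.add("domain:" + email.split("@", 1)[1])
    principals.update("group:" + g for g in groups)
    return principals


def has_role(policy: dict, identity: str, role: str,
             groups: Iterable[str] = ()) -> bool:
    """True if some binding grants `role` to the identity directly or indirectly."""
    principals = principals_for(identity, groups)
    return any(
        binding["role"] == role and not principals.isdisjoint(binding["members"])
        for binding in policy.get("bindings", [])
    )
```

With this policy, any user in the example.com domain can read objects, an outside user can read them only via membership of the data-readers group, and only Alice and the application's service account hold the admin role.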
IAM is not a general-purpose authorization platform; it is hard-wired in many ways to support specifically the set of services that Google offers. Note in particular that if you are writing a custom application that will run on GCP resources such as VMs or containers, then IAM will do very little to help you manage users of that application. If your application users are all registered in the Google identity service (eg if this is a company-internal app) then IAM may allow you to control who can access the application at all (ie who can reach a specific host/port), but will not provide any finer-grained control. Of course your app will need appropriate IAM permissions on its application service account identity in order to access Cloud Storage buckets, databases, etc.
The Google Cloud Platform
The Google Cloud Platform, or GCP for short, is the set of services Google offers for storing and processing data, and running custom applications, within Google’s datacenters.
A GCP account is always owned by a single Google personal account, and the owner can never be changed. It is therefore good practice for a company to create a Google personal account with an id like “firstname.lastname@example.org”, set up email-forwarding for the corresponding email account to the company IT department, and store the login credentials for that GMail identity in the company safe. Further accounts can then be granted access-rights to the GCP account for daily administration.
A GCP account contains:
- Zero or one Organization resource (which describes the company or other entity associated with the GCP account)
- Zero or more billing accounts (each with associated credit-card)
- One or more projects (which hold resources; see below)
- Zero or more folders (which define a logical tree view of the GCP account projects)
- Global permissions (actually associated with the folders)
As noted earlier, an Organization resource is created when Cloud Identity is enabled for a GCP account - or during GSuite setup. A simple GCP account without Cloud Identity enabled will not have an Organization resource. A GCP account can be configured to grant access to any identities registered with Google, not just those allocated with Cloud Identity.
A GCP project is always a direct child of a GCP account - projects are never nested. A project holds multiple resources such as:
- An optional reference to a “billing account”
- Cloud Storage buckets
- Virtual network definitions (with associated firewall rules)
- Virtual machines
- Access permission rules
- Licences for third-party APIs (free or paid)
- and various other things
Registering a GCP Account and a First Project
Just create a gmail account and then visit the standard GCP administration page at console.cloud.google.com. Click on “create an empty project”.
A free GCP account (one without an associated billing account) may have a maximum of 12 projects associated with it.
One of the nice things about GCP is that so much is available without having to create a billing account (ie register a credit card). This implies that there is no way that you can incur any expenses associated with that account, as there is no way for Google to charge you; the worst case is that services stop working when the free limit is reached. This is different from Microsoft Azure where some services are also free, but a credit-card must be registered before even the free services are available; here a false step can result in charges.
Note that (at least currently) there is a button on the admin page labeled “Sign up for a free trial”. However it is not necessary to “sign up” in order to use free services. The offer applies to a billing account, ie when you wish to use services that are normally chargeable, the “free trial” gives a new billing account an initial $300 credit allowing normally chargeable services to be tried out.
There are some features that cannot be used without a billing account, including the Compute Engine (VMs) feature. However even when you have registered a billing account with a credit card, Google still provides quite a generous “free quota” for these “billing account required” services - ie the fact that a credit-card is required does not necessarily mean that you will be charged anything.
The list of things that can be added to a project has already (partly) been described above. Further articles (planned) will look at these in more detail; this article is just looking at the general GCP structure.
In any project, visiting menu option IAM > Identity allows Cloud Identity to be configured. This does require a billing account.
The Organization Resource
A GCP account has zero or one Organization resources. As well as providing admin information about the company or other organization associated with the GCP account, permissions (an IAM policy) can be associated with the Organization. This policy applies to (is inherited by) all other resources in the GCP account. When a GCP account does not have an Organization resource then there is no global policy that applies to all resources; each project and other resource (eg billing accounts) has an independent policy.
The Resource Hierarchy (aka Folders)
Projects themselves are direct children of a GCP account. Each project has its own permissions-settings (IAM policy) which controls who can access what - though the project does inherit policy settings defined on the Organization resource (if one exists).
A GCP account also has an optional tree of “folder” resources; each folder can have an IAM policy attached to it, and can have projects and other folders as children. Policy settings (ie permissions) are inherited through the folder structure. Folders are typically used to model the department or reporting hierarchy of a company or organization, ie the projects being run by a specific department are attached to a folder representing that department, and admin users from that department are granted permission to alter the policy attached to their folder and projects - but not parent folders.
One of the effects of inherited policies is that the Organization administrator (who can set policies on the Organization resource) can override policies set lower down in the hierarchy, eg granting access rights on projects in situations where none of the original admin-users associated with that project are available.
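Inheritance through the resource hierarchy can be sketched as a union: the bindings effective on a resource are those set on it plus those set on every ancestor. The following is a simplified, hypothetical model (it ignores deny policies, conditions and the role-to-permission mapping); the resource names follow the `organizations/N`, `folders/N`, `projects/ID` style used by the Resource Manager, but the hierarchy and policies themselves are made up.

```python
from collections import defaultdict

# Hypothetical hierarchy (child -> parent) using Resource Manager naming style.
PARENT = {
    "projects/payroll-prod": "folders/456",   # folder for the finance department
    "folders/456": "organizations/123",
}

# Hypothetical policies attached at each level: resource -> {role: members}.
POLICIES = {
    "organizations/123": {"roles/owner": {"user:org-admin@example.com"}},
    "folders/456": {"roles/editor": {"group:finance-admins@example.com"}},
    "projects/payroll-prod": {"roles/viewer": {"user:auditor@example.com"}},
}


def ancestry(resource: str) -> list[str]:
    """The resource followed by each of its ancestors up to the root."""
    chain = [resource]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def effective_bindings(resource: str) -> dict[str, set[str]]:
    """Union of the bindings set on the resource and on all of its ancestors."""
    effective: dict[str, set[str]] = defaultdict(set)
    for node in ancestry(resource):
        for role, members in POLICIES.get(node, {}).items():
            effective[role] |= members
    return dict(effective)
```

Because grants only accumulate on the way down, the owner role granted at the Organization level is effective on every project beneath it, which is what allows an Organization administrator to recover access to a project whose own admins are unavailable.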
Folders can also have labels attached to them, allowing specific folders across the hierarchy to be grouped together.
Billing Accounts
A GCP account has a set of billing accounts. A billing account has a credit-card number through which payment is charged, and a bill (cost report) is available per billing account.
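Each billing account can have budgets configured. Note however that a budget by itself does not cap spending: it triggers alerts when spending crosses the configured thresholds, and stopping further charges requires additional action such as disabling billing for a project (after which paid-for services will of course no longer be available).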
Each project is associated with zero or one billing accounts from the parent GCP account; if no billing account is linked then the project cannot use any paid features and transaction/storage volumes are limited to the free quota allowed by Google.
The GCloud Tool
This article has pointed out that all Google services are accessible via a REST API in addition to a web interface (AIUI, the web interface is actually implemented via the REST API).
The Google Cloud SDK is a commandline toolset (implemented in Python) that can be installed on developer/administrator systems in order to administer/configure GCP resources; it simply makes calls to the REST APIs, ie anything that can be done via raw REST calls or via the web interface can also be done via the commandline tool.
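Because gcloud can emit machine-readable output, it is also easy to drive from scripts. The following minimal sketch assumes gcloud is installed, on the PATH and already authenticated (eg via `gcloud auth login`); it appends the standard `--format=json` flag to a command such as `gcloud projects list` and extracts the `projectId` field from each entry. The helper function names are hypothetical.

```python
import json
import shutil
import subprocess
from typing import Any


def gcloud_json(*args: str) -> Any:
    """Run `gcloud <args> --format=json` and return the parsed JSON output."""
    exe = shutil.which("gcloud")
    if exe is None:
        raise RuntimeError("gcloud is not installed or not on the PATH")
    result = subprocess.run(
        [exe, *args, "--format=json"],
        check=True,           # raise CalledProcessError if gcloud fails
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as str
    )
    return json.loads(result.stdout)


def project_ids(projects: list[dict]) -> list[str]:
    """Pick the projectId field out of `gcloud projects list` JSON output."""
    return [p["projectId"] for p in projects]
```

Usage would then be along the lines of `project_ids(gcloud_json("projects", "list"))`. Parsing JSON is generally more robust than scraping the default table output, whose layout is meant for humans.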
The Google Cloud SDK is actually split into modules; the initial install provides the gcloud tool which also acts as a kind of ‘package manager’ through which additional modules can be installed. Useful modules include things such as an emulator for the Google Datastore NoSQL database, so that code interacting with Datastore can be tested on developer laptops, etc.
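As an example of such a module, the Datastore emulator can be installed with `gcloud components install cloud-datastore-emulator` and started with `gcloud beta emulators datastore start`; running `gcloud beta emulators datastore env-init` then prints environment variables (such as `DATASTORE_EMULATOR_HOST`) that Google's client libraries check in order to talk to the emulator instead of the real service. The sketch below assumes the POSIX-shell `export NAME=value` form of that output and applies the variables to the current Python process; the helper names are hypothetical.

```python
import os


def parse_env_init(output: str) -> dict[str, str]:
    """Parse `export NAME=value` lines, as printed by
    `gcloud beta emulators datastore env-init` on Linux/macOS."""
    env: dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("export ") and "=" in line:
            name, _, value = line[len("export "):].partition("=")
            env[name.strip()] = value.strip()
    return env


def point_clients_at_emulator(env_init_output: str) -> None:
    """Set the variables in this process so that client libraries created
    afterwards connect to the emulator rather than the real Datastore."""
    os.environ.update(parse_env_init(env_init_output))
```

Setting the variables before any Datastore client is constructed matters, since a client typically decides which endpoint to use when it is created.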
I can highly recommend the gcloud commandline tools; it is often far easier to discover and use functionality via this tool than via the web interface.
The Google Resource Manager
The Resource Manager is a REST service which provides access to information about a GCP account, eg
- Data associated with the Organization resource can be read and updated
- Projects can be listed, created, deleted
- Billing accounts can be listed and updated
- Folders can be listed, created and deleted
- Liens can be added: temporary restrictions on specific operations, eg making a project temporarily undeletable until some other process is complete
This API does not manage the internal resources of projects, ie the resource manager API can create and delete projects, but does not provide methods to add or remove resources (eg VMs or firewall rules) within a project; that is provided by a separate API.
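To make the REST-level view concrete, here is a minimal standard-library sketch that builds (but does not send) two Resource Manager v1 requests: listing projects, and placing a lien that blocks deletion of a project. It assumes an OAuth access token has been obtained elsewhere (for example from `gcloud auth print-access-token`). The project number, reason and origin values are hypothetical, and the helper names are made up; the body fields follow the v1 API's published Lien fields.

```python
import json
import urllib.request

CRM_V1 = "https://cloudresourcemanager.googleapis.com/v1"


def list_projects_request(token: str) -> urllib.request.Request:
    """GET request listing the projects visible to the token's identity."""
    return urllib.request.Request(
        f"{CRM_V1}/projects",
        headers={"Authorization": f"Bearer {token}"},
    )


def create_lien_request(token: str, project_number: int,
                        reason: str, origin: str) -> urllib.request.Request:
    """POST request placing a lien that blocks deletion of the project."""
    body = {
        "parent": f"projects/{project_number}",
        "restrictions": ["resourcemanager.projects.delete"],  # operation blocked
        "reason": reason,   # human-readable explanation shown to users
        "origin": origin,   # identifies who/what placed the lien
    }
    return urllib.request.Request(
        f"{CRM_V1}/liens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Either request can then be sent with `urllib.request.urlopen(req)`; the JSON response is the same data the web console and gcloud work with.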
After visiting the GCP admin page (console.cloud.google.com) and selecting a project from the dropdown list, a menu of options is displayed on the left. There is a huge amount to learn about all the different services and options available, but it might be useful to get a brief summary of at least the top-level menu items in that list:
Cloud Launcher – uses predefined templates to install complete “packages” of software onto the GCP, eg a virtual network plus a set of VMs each running a specific predefined VM image. Things like a LAMP stack (Linux/Apache/MySQL/PHP) can be installed from Cloud Launcher with just a few clicks.
Billing – described above
APIs & services – configures IAM permissions to allow code within the GCP project to invoke specific APIs (some from Google, some third-party). Some services require payment, in which case a billing account is required. Enabling an API often includes a “setup” phase in which data is entered.
IAM & admin – configuring access permissions for users and applications; configure Cloud Identity; manage encryption keys.
Compute – configure ways to run custom code, from low-level (pure VMs) to high-level (cloud functions).
- Compute Engine manages pure VMs, on which you boot a VM image and then configure everything yourself
- Kubernetes Engine manages containers; you provide the container images and declaratively specify how they should be scaled and wired together
- App Engine manages applications; you provide Java apps packaged as .war files, or equivalent “app packages” using Python, PHP, and several other supported languages/frameworks, and Google deploys and scales them
- Cloud Functions manages code fragments; you provide very fine-grained logical modules in various supported languages and Google deploys and scales them
Storage – persisting data in various ways
- Storage (Cloud Storage) is an object store (somewhere between a filesystem and a key-value store for large amounts of data)
- Bigtable is a very scalable noSQL database
- Datastore is an alternative noSQL database
- SQL provides various not-particularly-scalable SQL-compatible databases
- Spanner is a very scalable SQL-compatible database with transaction support
Networking – connecting resources to each other and to the outside world
- Define virtual networks which VMs/containers can be bound to
- Define firewall rules associated with virtual networks
- Define load-balancers
- Define DNS records for use within the cloud environment (VM-to-VM lookup)
- Access CDN (content distribution) services for serving static content to large numbers of users
Stackdriver – tools for monitoring and debugging code running in the compute environments
Tools – supporting services for developers
- Container Registry – central storage for images deployed to Kubernetes Engine
- Source Repositories – google-hosted version control systems
- Endpoints – provides optional monitoring and security features for REST applications deployed on compute environments (provided those apps are endpoint-enabled)
Big Data – services for storing and transforming large amounts of data
- BigQuery – an SQL-execution engine for business intelligence (OLAP) workloads
- Pub/Sub – a scalable message broker service
- Dataproc – batch and streaming data processing based on Hadoop and Spark
- Dataflow – batch and streaming data processing based on Apache Beam and Google’s proprietary execution engine (similar to Spark)
- ML Engine – machine learning tools
- Dataprep – ETL data-cleansing tools
Of course there are far more online services available via the “APIs & services” menu.
See Google’s official list of GCP products for more details.