People tell their kids that they need to set goals. Well, cloud architects should have goals too.
Depending on your application, customer, and use cases, you may set different priorities or goals. When designing systems, more than likely you will have multiple goals depending on what you are designing for.
There are many things that are important when designing systems for the cloud. Fault tolerance, high availability, scalability, elasticity, and disaster recovery are a few.
Fault tolerance is a system's ability to continue working even when there are hardware failures.
For a system to be fault tolerant, there has to be a backup of everything. The cloud provider handles some things for you. Obviously, you don't have to worry about a battery backup.
Other things are your responsibility though. You need to decide whether to create a failover database when using Cloud SQL, for example. Or, when you are configuring an instance group in Compute Engine, you have to create a health check to ensure the system detects when a server is down and needs to start a new one.
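Conceptually, a health check is just a periodic probe with a failure threshold: after some number of consecutive failed probes, the instance is considered unhealthy and gets replaced. Here is a minimal sketch of that logic in Python; the threshold value is illustrative, since the real check is something you configure on the instance group, not code you write:

```python
# Sketch of health-check logic: mark an instance for replacement after a
# run of consecutive failed probes. The threshold is an illustrative value.

UNHEALTHY_THRESHOLD = 3  # consecutive failures before replacing the instance

def needs_replacement(probe_results, threshold=UNHEALTHY_THRESHOLD):
    """probe_results is a list of booleans, oldest first (True = probe OK)."""
    streak = 0
    for ok in probe_results:
        streak = 0 if ok else streak + 1
        if streak >= threshold:
            return True
    return False

# A server that failed its last three probes should be replaced.
print(needs_replacement([True, True, False, False, False]))  # True
# Intermittent failures below the threshold are tolerated.
print(needs_replacement([True, False, True, False, True]))   # False
```

The threshold matters: too low and a single network blip restarts healthy servers; too high and a dead server keeps receiving traffic.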
Highly available systems are ones that work almost all the time. In the cloud, to achieve high availability, you must always have at least two of every server and those servers need to be in different zones.
Google Cloud Platform ensures resources in different zones share no common points of failure. So, they handle part of the work for you.
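This is why redundancy across zones pays off so quickly: if failures are independent, the chance that every copy is down at once is the product of the individual failure probabilities. A quick sketch, where the 99% per-server availability is just an assumed figure for illustration:

```python
def combined_availability(per_server, copies):
    """Availability of `copies` redundant servers, assuming failures
    are independent (which is why the zones must share no common
    points of failure)."""
    return 1 - (1 - per_server) ** copies

# One 99%-available server vs. two of them in separate zones:
print(combined_availability(0.99, 1))  # 0.99  -> two nines
print(combined_availability(0.99, 2))  # about 0.9999 -> four nines
```

Note the math only works because the zones fail independently; two servers in the same rack would not multiply this way.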
Availability is a KPI. You can't say your application will have 100% availability. That is not achievable.
While 99.999% availability might be achievable, it might be too expensive to be worthwhile for the system you are developing.
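To see what each extra nine actually buys you, translate availability into a downtime budget per year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability):
    """Allowed downtime per year for a given availability target."""
    return (1 - availability) * MINUTES_PER_YEAR

for a in (0.99, 0.999, 0.9999, 0.99999):
    print(f"{a:.5f} -> {downtime_minutes_per_year(a):,.1f} min/year")
```

Two nines allows roughly 5,256 minutes (about 3.7 days) of downtime a year; five nines allows only about 5.3 minutes. Each added nine cuts the budget by a factor of ten, which is why the cost climbs so steeply.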
For example, if you're deploying a service used by mobile phone users who are often on slow and unreliable networks, it's probably overkill to design for five-nines availability. It might be better to design for lower availability and let the programmers code for retries when requests fail.
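"Coding for retries" usually means retrying with exponential backoff, so a flaky network doesn't hammer the service with immediate re-requests. A minimal sketch; the simulated endpoint, attempt count, and delays are all placeholders:

```python
import time

def call_with_retries(request, max_attempts=4, base_delay=0.5):
    """Retry a flaky request with exponential backoff.
    `request` is any zero-argument callable that raises on failure."""
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, give up
            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...

# Simulated flaky endpoint: fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network blip")
    return "ok"

print(call_with_retries(flaky, base_delay=0.01))  # ok, after two retries
```

With client-side retries in place, a brief server hiccup looks like a slow request rather than an error, which is often good enough on an already-unreliable mobile network.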
This diagram depicts how you would deploy a highly available system in Google Cloud Platform. Deploy multiple servers to multiple zones within a region. Since you have multiple servers, put a load balancer in front of them to handle requests and route them to a healthy server.
Scalability is a system's ability to continue working as the number of users and/or the amount of data increases.
In the cloud, you also want elasticity. Elastic scalability not only adds resources when demand increases, but turns off resources when demand decreases.
When deploying systems to internal data centers, you might not worry about turning unneeded resources off because you already purchased the hardware. In the cloud though, you are renting compute resources. You should turn those resources off as soon as possible to reduce the amount that you are spending.
In internal data centers, you might prefer a small number of large servers. In the cloud, you should prefer a larger number of small servers and turn them off when they aren't needed.
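Here is why that preference pays off, sketched with made-up prices and a made-up daily demand curve (the $/hour figures and the 8x capacity ratio are purely illustrative):

```python
import math

# Illustrative only: hypothetical prices and demand, not real GCP pricing.
SMALL_HOURLY = 0.05   # $/hour for a small instance
LARGE_HOURLY = 0.40   # $/hour for a large instance (assume 8x the capacity)

# Demand for each hour of the day, in "small-server units" of capacity.
demand = [2] * 8 + [10] * 8 + [4] * 8   # quiet night, busy day, calmer evening

# Elastic fleet: run exactly as many small servers as each hour needs.
elastic_cost = sum(h * SMALL_HOURLY for h in demand)

# Fixed fleet: enough large servers for the daily peak, running all 24 hours.
fixed_cost = math.ceil(max(demand) / 8) * LARGE_HOURLY * 24

print(f"elastic: ${elastic_cost:.2f}/day, fixed: ${fixed_cost:.2f}/day")
```

With these numbers the elastic fleet costs $6.40 a day against $19.20 for the always-on fleet, because the fixed fleet is sized for the peak and then sits mostly idle. The exact figures are invented, but the shape of the result is the point: you only pay for capacity while demand is actually there.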
Deploying to multiple zones within a region will allow you to achieve high availability. However, if the entire region is down, then so is your application.
It is very unlikely that an entire region would be down. It could happen though. It could be caused by a natural or man-made disaster or some kind of human error. You need to plan for it.
In GCP, you can create a load balancer that will balance traffic to machines in different regions. You can also create multi-region buckets in Cloud Storage. So, it is possible to operate out of multiple regions at the same time. This is known as a hot standby. This might cost more than you want to pay though.
You may decide that, in the unlikely event of a region going down, it is acceptable for your application to be down for some period of time. If so, you can create a cold standby. That is, you keep a copy of everything required to run the application, along with all your data, in another region as a backup. Then, if you ever need to, you can spin up your environment in the backup region and be up and running again.
Needless to say, if you don't regularly practice turning on the backup environment, you don't really have a cold standby.
You may be lucky enough to have customers all over the world! If this is the case, then you want to deliver your content to them as fast as possible, with the lowest latency, and at the lowest cost. To minimize latency, you need to respond to user requests as close to the users as possible.
This is a problem Google has more than any other company in the world. So, it's not surprising that Google Cloud Platform provides the tools that handle this problem easily.
GCP's load balancer can send requests to multiple regions. The load balancer detects where the requests are coming from and automatically routes those requests to the region closest to the user. If a region gets overloaded with too much traffic, the load balancer will choose a different region. If a region becomes unavailable, it will choose a different region.
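The routing decision can be pictured as: among the regions that are healthy and have spare capacity, pick the one closest to the user. This is a conceptual sketch only, not GCP's actual routing algorithm, and the region names and latencies are made up:

```python
def pick_region(regions, user_latency_ms):
    """Conceptual sketch of proximity-based routing, not GCP's algorithm.
    regions: dict of name -> {"healthy": bool, "overloaded": bool}
    user_latency_ms: dict of name -> round-trip latency from the user."""
    candidates = [
        name for name, r in regions.items()
        if r["healthy"] and not r["overloaded"]
    ]
    return min(candidates, key=lambda name: user_latency_ms[name])

regions = {
    "us-central1":  {"healthy": True,  "overloaded": True},   # nearby but full
    "europe-west1": {"healthy": True,  "overloaded": False},
    "asia-east1":   {"healthy": False, "overloaded": False},  # region outage
}
latency = {"us-central1": 20, "europe-west1": 110, "asia-east1": 160}

# The closest region is overloaded, so traffic falls over to the next closest.
print(pick_region(regions, latency))  # europe-west1
```

The useful property is that the user never does any of this: a single global IP address fronts every region, and the fallback from an overloaded or failed region happens automatically.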
In addition, when configuring a load balancer, you can enable the Content Delivery Network (CDN). If you do, your static content will be copied to Google edge caching locations all over the world. This increases performance and lowers latency. It also reduces the cost of network egress, because the cost of network egress from the CDN is less expensive than from the data centers.
Your architectural goals are different if you are doing big data processing. Big data processing requires jobs be split across multiple machines in a cluster.
Dataproc and BigQuery are examples of big data processing services provided by GCP. To get jobs done quickly, many machines are used and the work is divided between them.
The key is to keep the machines as close to one another as possible and to keep those machines as close to the data as possible. This minimizes latency and also minimizes cost. There is no cost for transferring data between machines in the same zone. There is also no cost to transferring data from a Google service running in the same region as a machine.
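The locality rule above can be modeled very simply, using the fact that a zone name is just its region name plus a one-letter suffix (e.g. us-central1-a belongs to region us-central1). This is a simplified model of the rule as stated, not a pricing calculator:

```python
def region_of(zone):
    """A zone name is its region plus a '-x' suffix: 'us-central1-a' -> 'us-central1'."""
    return zone.rsplit("-", 1)[0]

def transfer_is_free(src_zone, dst_zone):
    """Simplified model: traffic between machines in the same zone is free;
    anything crossing a zone or region boundary is treated as billable."""
    return src_zone == dst_zone

print(transfer_is_free("us-central1-a", "us-central1-a"))        # True
print(transfer_is_free("us-central1-a", "us-central1-b"))        # False
print(region_of("us-central1-a") == region_of("us-central1-b"))  # True
print(region_of("us-central1-a") == region_of("europe-west1-b")) # False
```

The second check, same region, is what matters when a cluster reads from a Google service: keep the cluster's zone inside the same region as the data and that transfer is also free.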
This is why, when you create a Dataproc cluster, you choose a zone, not a region, and all your machines are placed in the zone you specify.