You might need a database for hosting a dynamic web application. A good option is Cloud SQL, which hosts managed MySQL or PostgreSQL databases. For high availability, you can create a failover replica in another zone. Cloud SQL is not the only choice: if you prefer a NoSQL database, you could use Firestore instead.
Google’s main data warehouse product is BigQuery. Data from many sources can be imported into BigQuery and then easily analyzed using SQL.
Google Cloud Storage is often used as a staging location: upload your data to Cloud Storage first, and the subsequent import into BigQuery will be faster.
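As a sketch of that staging pattern, the code below builds the `gs://` URI for a staged file and submits a load job. The bucket, dataset, and table names are hypothetical, and the load call assumes the `google-cloud-bigquery` client library is installed and credentials are configured:

```python
# Sketch of the "stage in Cloud Storage, then load into BigQuery" pattern.
# Bucket, dataset, and table names below are hypothetical examples.

def gcs_uri(bucket: str, blob: str) -> str:
    """Build the gs:// URI that a BigQuery load job reads from."""
    return f"gs://{bucket}/{blob}"

def load_staged_file(uri: str) -> None:
    # Assumes the google-cloud-bigquery library and configured credentials;
    # illustrative only, not executed here.
    from google.cloud import bigquery
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        autodetect=True,  # let BigQuery infer the schema from the file
    )
    client.load_table_from_uri(
        uri, "my_dataset.my_table", job_config=job_config
    ).result()  # block until the load job finishes

# Example (not executed here):
#   load_staged_file(gcs_uri("my-staging-bucket", "sales/2024-01.csv"))
```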
If your data needs to be transformed before it is put in BigQuery, you can use Dataflow to build an ETL pipeline.
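The transform step of such an ETL pipeline is ordinary code. A minimal sketch (the field names are made up for illustration) might clean and reshape each record into the form of a BigQuery table row:

```python
# Minimal sketch of an ETL transform step: clean one raw record into the
# shape of a BigQuery table row. All field names here are hypothetical.

def transform_record(raw: dict) -> dict:
    return {
        "user_id": str(raw["id"]).strip(),            # normalize IDs to trimmed strings
        "amount_usd": round(float(raw["amount"]), 2),  # coerce to a 2-decimal float
        "country": raw.get("country", "unknown").upper(),
    }
```

In a Dataflow pipeline, a function like this would be applied to every element, whether the source is a batch file or a stream.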
Cloud Dataflow is Google’s managed service for running data processing pipelines. Pipelines are written with Apache Beam, an open-source data processing framework, and Dataflow executes them.
Dataflow can do batch processing of static data that has been uploaded to Cloud Storage or another data service.
Dataflow can also process streaming data in real time by subscribing to Google’s messaging service, Pub/Sub. Messages can be sent into Pub/Sub from any application or device, and Dataflow can process those messages as they arrive.
At the end of the pipeline, Dataflow can write to many different storage services. BigQuery may be the most common, but depending on how the data is processed, Bigtable might be better, or maybe a relational database like Cloud SQL or Spanner.
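Putting these pieces together, a streaming pipeline of this shape might be written with the Beam Python SDK roughly as follows. The topic and table names are hypothetical, the message format is assumed to be JSON, and the pipeline itself requires the `apache-beam[gcp]` package:

```python
import json

def parse_message(data: bytes) -> dict:
    """Decode one Pub/Sub message payload (assumed JSON) into a table row."""
    return json.loads(data.decode("utf-8"))

def run_pipeline() -> None:
    # Assumes the apache-beam[gcp] package is installed; illustrative only.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (p
         | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
         | "Parse" >> beam.Map(parse_message)
         | "Write" >> beam.io.WriteToBigQuery("my-project:analytics.events"))
```

Swapping the final step for a Bigtable or relational-database sink changes only the last transform; the read and parse stages stay the same.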
Requests made to GCP services are written to the logging system provided by Stackdriver, and you can also program your applications to send messages to the logs. Logs contain valuable information about application usage, security, cost, and much more.
You can have the system export the logs to Cloud Storage or Pub/Sub. If you choose Pub/Sub, the logs are streamed in real time, and Dataflow can process the log data and write the results into BigQuery for analytics.
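For example, a transform consuming an exported log entry might pull out just the fields an analytics table needs. The entry shape shown is a simplified subset of a real Cloud Logging entry:

```python
import json

def summarize_log_entry(payload: bytes) -> dict:
    """Reduce an exported log entry to the fields an analytics table needs.

    The keys read here (timestamp, severity, textPayload) are a simplified
    subset of a Cloud Logging entry; real entries carry many more fields.
    """
    entry = json.loads(payload.decode("utf-8"))
    return {
        "timestamp": entry.get("timestamp"),
        "severity": entry.get("severity", "DEFAULT"),
        "message": entry.get("textPayload", ""),
    }
```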
Using Google’s data storage and processing services, you can aggregate data from many applications and sources and build complex data analysis systems without having to worry about all the plumbing that makes these services work.
Being a competent Google Cloud Professional Data Engineer requires knowledge of these services and where they can fit into your overall solution.
This chapter was a big-picture overview of data engineering problems and how GCP services fit into the solutions you architect.
To finish this module, take the quiz. Also, don’t forget to do any hands-on exercises and explore any links that are provided in this chapter.