Part 3: Schedule Report Logs to Google Cloud Storage Using BigQuery — Audit VM and Cloud PostgreSQL on Google Cloud Platform
After we create an alert and notify by email, in some cases we also need to store the logs. We could store the logs directly in Google Cloud Storage (GCS) by sinking them from Cloud Logging to GCS, but in our case the auditor wants a weekly report, so the stored report should only be produced once a week.
There are several ways to do this, but this time we'll build the weekly report with BigQuery because it supports scheduled queries. Let's try…
1. Sink the Logs to BigQuery
Before we create a sink, we need to prepare the logging query first and make sure it runs properly and returns the results we want. After that, we can create the sink.
Then specify the name of the sink and choose BigQuery dataset as the destination:
We can create a new dataset, use an existing one, or use a dataset in a different project:
Fill in the dataset ID and the location where you would like to store the data:
After that, we can leave the other options as default and click "Create sink".
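If you prefer to script this step rather than click through the console, here is a minimal sketch using the google-cloud-logging Python client. The project ID, sink name, destination dataset, and log filter below are placeholders for illustration, not the exact values from this series.

from google.cloud import logging

# Sketch: create a Cloud Logging sink that routes matching log entries to a BigQuery dataset.
# Project, sink name, dataset, and filter are placeholders -- adjust them to your setup.
client = logging.Client(project="your-project")

# Use the filter you already validated in the Logs Explorer; this one is just an example.
log_filter = 'resource.type="cloudsql_database"'

destination = "bigquery.googleapis.com/projects/your-project/datasets/db_user_created"

sink = client.sink("db-user-created-sink", filter_=log_filter, destination=destination)
if not sink.exists():
    sink.create()
    print("Created sink:", sink.name)

Note that when a sink is created this way, you may also need to grant the sink's writer identity (the service account shown on the sink details page) BigQuery Data Editor access on the destination dataset; the console usually sets this up for you.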
2. Prepare a Bucket in Google Cloud Storage
Create a bucket and a folder in Google Cloud Storage; how to create a bucket is explained in the official documentation. For now, we have already created the bucket:
This bucket will be used to store our logs.
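For completeness, the bucket can also be created from code; a minimal sketch with the google-cloud-storage client is shown below. The bucket name and location are placeholders (bucket names must be globally unique).

from google.cloud import storage

# Sketch: create the GCS bucket that will hold the exported weekly reports.
# Bucket name and location are placeholders.
client = storage.Client(project="your-project")
bucket = client.create_bucket("your-audit-report-bucket", location="asia-southeast2")
print("Created bucket:", bucket.name)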
3. Create a Scheduled Query in BigQuery
Go to the BigQuery dashboard and we can see that the dataset storing our sink logs already exists. Next we need to create a temporary dataset to hold the logs. I'll explain later why we need this temporary dataset; for now, let's create it in our project:
As in the previous step, specify the name and location, then set up the table expiration, because this dataset only stores data temporarily for one week. After that, click "Create dataset".
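As a scripted alternative to the console, the temporary dataset with a 7-day default table expiration can be created with the BigQuery Python client. The project, dataset ID, and location below are placeholders; the dataset ID matches the temporary_db_user_created name used in the export query later on.

from google.cloud import bigquery

# Sketch: create the temporary dataset whose tables expire automatically after 7 days.
# Project, dataset ID, and location are placeholders.
client = bigquery.Client(project="your-project")

dataset = bigquery.Dataset("your-project.temporary_db_user_created")
dataset.location = "US"  # must match the location of your sink dataset
dataset.default_table_expiration_ms = 7 * 24 * 60 * 60 * 1000  # 7 days

dataset = client.create_dataset(dataset, exists_ok=True)
print("Created dataset:", dataset.dataset_id)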
So, for now we have two datasets:
Why do we need two datasets? Because in our scenario the logs are only reported once a week. When we sink the logs from Cloud Logging to BigQuery, every log entry that matches the filter is stored in BigQuery immediately, since the sink works in near real time. The temporary dataset is there to hold one week's worth of logs: once a week the logs are copied into it and sent to GCS, and its tables expire after 7 days so it can take the next week's logs. The flow is shown in the picture below:
So we’ll create two schedules:
- Copy the logs from the sink dataset into the temporary dataset, whose tables expire after 7 days
Run the query and check the logs, then create a scheduled query:
We can specify the schedule as we need:
Then configure the destination table: choose the dataset name, fill in the table ID, and select your write preference. You can also add a notification channel in case the job fails in the future. After that, save the schedule:
After the schedule has run, we can see it in the run history:
Let's check the temporary dataset:
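The same weekly schedule can also be created with the BigQuery Data Transfer Service client instead of the console. This is only a sketch: the source table, query, schedule, and dataset names are placeholders that assume the sink writes into a dataset called db_user_created and that the weekly copy lands in the temporary_db_user_created dataset shown above.

from google.cloud import bigquery_datatransfer

# Sketch: a weekly scheduled query that copies the last 7 days of sink logs into the
# temporary dataset. Dataset/table names, query, and schedule are placeholders.
transfer_client = bigquery_datatransfer.DataTransferServiceClient()

parent = transfer_client.common_project_path("your-project")

query = """
SELECT *
FROM `your-project.db_user_created.your_sink_table`  -- placeholder for the sink table
WHERE DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
"""

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id="temporary_db_user_created",
    display_name="Copy sink logs to temporary dataset",
    data_source_id="scheduled_query",
    params={
        "query": query,
        "destination_table_name_template": "temporary_db_user_created",
        "write_disposition": "WRITE_TRUNCATE",
    },
    schedule="every monday 07:00",  # weekly, before the export schedule runs
)

transfer_config = transfer_client.create_transfer_config(
    parent=parent, transfer_config=transfer_config
)
print("Created scheduled query:", transfer_config.name)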
- Send the temporary dataset to GCS
After that, we can create a new scheduled query to send the logs to GCS [1]:
-- Export the weekly logs from the temporary dataset to GCS as JSON files
EXPORT DATA OPTIONS(
  -- the file name includes the current date; '*' lets BigQuery shard large exports
  uri=CONCAT('gs://your-bucket-path/New_User_Created_', FORMAT_DATETIME("%G%m%d", CURRENT_DATETIME()), '-*.json'),
  format='json',
  overwrite=true) AS
SELECT * FROM `your-project.temporary_db_user_created.temporary_db_user_created`
WHERE REGEXP_CONTAINS(protopayload_auditlog.requestJson, "CREATE ROLE")
After that, specify the name and the schedule time (make sure it runs after the temporary schedule has run). Leave the other options as default and save.
*Remember that CURRENT_DATETIME() is evaluated when the scheduled query runs and uses UTC unless you pass a time zone argument, for example CURRENT_DATETIME("Asia/Jakarta"), if you want the date in the file name to follow a specific time zone. You can also change the query location as shown below:
*The query location must be the same as the dataset location.
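For reference, here is what the export schedule could look like if you create it with the Data Transfer Service client instead of the console. It illustrates the two notes above in code: the parent path pins the query to a specific location (which must match the dataset location), and CURRENT_DATETIME gets an explicit time zone so the date in the file name does not depend on where the query runs. All project, bucket, location, and time zone values are placeholders.

from google.cloud import bigquery_datatransfer

# Sketch: schedule the EXPORT DATA query in a fixed location and with an explicit
# time zone for the file-name date. Names, locations, and the schedule are placeholders.
transfer_client = bigquery_datatransfer.DataTransferServiceClient()

# The scheduled query runs in this location; it must match the dataset location.
parent = transfer_client.common_location_path("your-project", "US")

export_query = """
EXPORT DATA OPTIONS(
  uri=CONCAT('gs://your-bucket-path/New_User_Created_',
             FORMAT_DATETIME("%G%m%d", CURRENT_DATETIME("Asia/Jakarta")),
             '-*.json'),
  format='json',
  overwrite=true) AS
SELECT * FROM `your-project.temporary_db_user_created.temporary_db_user_created`
WHERE REGEXP_CONTAINS(protopayload_auditlog.requestJson, "CREATE ROLE")
"""

transfer_config = bigquery_datatransfer.TransferConfig(
    display_name="Export weekly user-creation report to GCS",
    data_source_id="scheduled_query",
    params={"query": export_query},
    schedule="every monday 08:00",  # after the copy-to-temporary schedule has run
)

transfer_config = transfer_client.create_transfer_config(
    parent=parent, transfer_config=transfer_config
)
print("Created export schedule:", transfer_config.name)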
Now the schedule has run successfully:
And let's look in the bucket:
*Take note that the report is in JSON format, since that is how it comes from Cloud Logging. If you need the report in another format, you'll need another solution, such as a Cloud Function.
Finally, we're done with our audit report. I think there are other ways to build this report, so if you have another solution, let's discuss it in the comments!
Thanks, I hope this helps!
References: