Logging Monitoring on bucket

How do I implement logging monitoring from the Cloud CLI or gsutil for the use case below:

Match the count of log entries in Log Explorer for a particular Project before routing against the count of entries moved into a GCS bucket, and in case of a mismatch send an alert notification mailer.

Any pointers would help


Based on your end goal, you may try the following steps to monitor logs and get alerts. First, make sure you have the Google Cloud SDK installed and that logs are being exported from Log Explorer to a GCS bucket (you can verify the sink as shown after the setup commands below).

From gcloud, set up your environment:

'gcloud config set project YOUR_PROJECT_ID' to set your default project.

'gsutil mb gs://YOUR_LOG_COUNT_BUCKET/' to create the bucket that will store the log counts.
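To confirm that logs are actually being routed from Log Explorer to the bucket, you can also inspect the sink (the sink name below is a placeholder):

# List the project's log sinks and inspect the one that routes to your GCS bucket.
gcloud logging sinks list --project=YOUR_PROJECT_ID
gcloud logging sinks describe YOUR_SINK_NAME --project=YOUR_PROJECT_ID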

Create a script 'compare_logs.sh':

#!/bin/bash

# Count the objects exported to the logs bucket in Cloud Storage
GCS_COUNT=$(gsutil ls gs://YOUR_LOGS_BUCKET/ | wc -l)

# Fetch the count recorded from Log Explorer (this assumes you have a file that
# keeps this count; one way to produce it is sketched below)
LOG_COUNT=$(gsutil cat gs://YOUR_LOG_COUNT_BUCKET/log_count.txt)

# Compare
if [ "$GCS_COUNT" -ne "$LOG_COUNT" ]; then
    echo "Mismatch detected!"
    # Insert email sending command here.
    # If you're using SendGrid or another service, you'd make an API call here.
fi
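The script assumes 'log_count.txt' exists; one way to produce it (a sketch only; the filter, time window, and names are placeholders you would adapt to your sink) is to count matching entries with 'gcloud logging read' and copy the result into the count bucket:

# Count the entries that match your sink's filter for a one-day window
# (placeholder timestamps and project ID).
COUNT=$(gcloud logging read \
    'timestamp>="2023-09-27T00:00:00Z" AND timestamp<"2023-09-28T00:00:00Z"' \
    --project=YOUR_PROJECT_ID --format="value(insertId)" | wc -l)

# Store the count where compare_logs.sh expects it.
echo "$COUNT" | gsutil cp - gs://YOUR_LOG_COUNT_BUCKET/log_count.txt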

 

This is basically the process; the automation part will depend on your setup (a sketch follows below). I attached some good docs for further reading. [1][2][3]
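For instance, if you run the script from a VM or Cloud Shell instance with cron (the script path here is hypothetical), an hourly entry could look like this:

# Run the comparison at the top of every hour.
0 * * * * /path/to/compare_logs.sh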

 

[1] https://cloud.google.com/logging/docs/routing/overview

[2] https://cloud.google.com/storage/docs/gsutil

[3] https://cloud.google.com/scheduler/docs/schedule-run-cron-job

Hi @rayjohnn,

Thank you for your response; it gave me a basic overview of the solution.

But I have a query regarding the count mismatch:

  • Suppose the LOG_COUNT from Log Explorer with a particular search parameter comes to 2000 for a given time range.
  • Similarly, the GCS_COUNT from the Cloud Storage bucket gives the count of JSON files in the bucket.

So the problem is that we have a log router that routes the logs into the GCS bucket, which stores them as timestamped JSON files on an hourly basis. Is there a way to know the exact count of log entries that got routed into the bucket?

Also, is there any alerting mechanism via policy creation to achieve the notification triggering?

To implement logging monitoring from the Cloud CLI or gsutil for the use case you described, you can follow these steps:

  1. Create a Cloud Function

Create a Cloud Function that checks whether the count of log entries in Log Explorer for a particular Project before routing matches the count of entries moved into the GCS bucket. The Cloud Function should:

* Query Log Explorer for the count of log entries for the specified Project and time period.
* Query the GCS bucket for the count of log entries.
* Compare the two counts and send an alert notification mailer if they do not match.
  2. Configure a Cloud Scheduler job

Create a Cloud Scheduler job to trigger the Cloud Function on a regular basis. For example, you could configure the Cloud Scheduler job to trigger the Cloud Function every hour.

  3. Allow the Cloud Function to access the GCS bucket

Grant the Cloud Function the necessary permissions to access the GCS bucket. You can do this by granting the Cloud Function's service account an appropriate role on the bucket (for example, Storage Object Viewer).

  4. Configure the alert notification mailer

Configure the alert notification mailer to send an email to the desired recipients when the Cloud Function detects a mismatch between the count of log entries in Log Explorer and the count of entries moved into the GCS bucket.
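For the mailer itself, one option (a sketch only; SendGrid is just an example service, and the addresses and API key variable are placeholders) is to call an email API when a mismatch is detected. The raw API call looks roughly like this:

curl -s -X POST "https://api.sendgrid.com/v3/mail/send" \
  -H "Authorization: Bearer $SENDGRID_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"personalizations":[{"to":[{"email":"oncall@example.com"}]}],
       "from":{"email":"alerts@example.com"},
       "subject":"Log count mismatch",
       "content":[{"type":"text/plain","value":"Log Explorer and GCS bucket counts do not match."}]}'

Alternatively, the function could simply write the mismatch as an error log, and you could create a log-based alerting policy in Cloud Monitoring with an email notification channel, which avoids a third-party mailer.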

Here is an example of a Cloud Function that you can use:

Python
import base64
import json
import logging
from google.cloud import logging_v2, storage

def log_count_mismatch(event, context):
    """A Cloud Function that sends an alert notification mailer when the count of log entries
    in Log Explorer does not match the count of entries moved into the GCS bucket."""

    # Get the project ID and time period from the triggering message payload
    # (assumes the Cloud Scheduler job publishes a Pub/Sub message with these fields).
    payload = json.loads(base64.b64decode(event['data']).decode('utf-8'))
    project_id = payload['project_id']
    start_time = payload['startTime']
    end_time = payload['endTime']

    # Get the count of log entries in Log Explorer (Cloud Logging).
    logging_client = logging_v2.Client(project=project_id)
    log_filter = f'timestamp >= "{start_time}" AND timestamp <= "{end_time}"'
    log_count_in_explorer = sum(1 for _ in logging_client.list_entries(filter_=log_filter))

    # Get the count of objects in the GCS bucket.
    storage_client = storage.Client()
    bucket = storage_client.get_bucket('my-gcs-bucket')
    log_count_in_gcs = len(list(bucket.list_blobs()))

    # Compare the two counts and send an alert notification mailer if they do not match.
    if log_count_in_explorer != log_count_in_gcs:
        logging.error('Log count mismatch. Expected: %d, Actual: %d.', log_count_in_explorer, log_count_in_gcs)

        # Send an alert notification mailer.
        # ...

You can use gcloud to deploy the Cloud Function and to configure the Cloud Scheduler job. For more information, see the Cloud Functions documentation: https://cloud.google.com/functions.
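A rough sketch of that deployment (the function, topic, and job names, the region, the runtime, and the service account are placeholders, and the message body only illustrates the fields the example function reads):

# Deploy the function with a Pub/Sub trigger.
gcloud functions deploy log_count_mismatch \
    --runtime=python311 --trigger-topic=log-count-check --region=us-central1

# Publish a message to the topic every hour via Cloud Scheduler.
gcloud scheduler jobs create pubsub log-count-check-job \
    --schedule="0 * * * *" --topic=log-count-check --location=us-central1 \
    --message-body='{"project_id":"YOUR_PROJECT_ID","startTime":"...","endTime":"..."}'

# Let the function's service account read the logs bucket.
gsutil iam ch serviceAccount:YOUR_FUNCTION_SA:roles/storage.objectViewer gs://my-gcs-bucket

In practice the function would more likely compute the time window itself (for example, the previous hour) rather than take fixed timestamps from the message.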

Hi @pgstorm148 

Thank you for this automated approach; I just wanted to confirm the count from the GCS bucket.

log_count_in_gcs = len(list(bucket.list_blobs())) ---> So will this give me the exact count of log entries from the hourly JSON structure?

For example, in my bucket the structure is:

abc/2023/09/27/hourly_time_stamp.json 

So will the log count from the bucket give me a daily count, taking the date as a parameter and iterating through all the hourly JSON files from that particular day?

 

The log_count_in_gcs variable will give you the total number of objects in the GCS bucket, including all hourly JSON files, as well as any other objects that may be in the bucket.

To get the daily count of log entries, you can iterate through all hourly JSON files for the specified day and count the number of entries in each file.

Here is an example of how to do this in Python:

Python
import json
from google.cloud import storage

def get_daily_log_count(bucket_name, prefix, date):
    """Gets the daily count of log entries in the specified GCS bucket for the specified date."""

    # Build the object prefix for the day, e.g. 'abc/2023/09/27/' for date '2023-09-27'.
    day_prefix = f"{prefix}/{date.replace('-', '/')}/"

    # Get the list of hourly JSON files for the specified day.
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blobs = bucket.list_blobs(prefix=day_prefix)

    # Count the number of log entries in each hourly JSON file.
    # Log sinks that route to GCS usually write one JSON log entry per line;
    # if your files instead contain a single JSON array, use len(json.load(f)).
    total_log_count = 0
    for blob in blobs:
        with blob.open('r') as f:
            total_log_count += sum(1 for line in f if line.strip())

    return total_log_count

# Get the daily log count for the specified date.
daily_log_count = get_daily_log_count('my-gcs-bucket', 'abc', '2023-09-27')

# Print the daily log count.
print(daily_log_count)

You can modify this example to meet your specific needs, such as filtering the log entries based on certain criteria.

Hi, I want to implement this same solution using MQL or PromQL from the alerting page. Can you help with a joined-metric solution to find the mismatch for count reconciliation?