Airflow DAGs - Connect Cloud SQL Instance from Airflow DAGs

I want to connect an external Cloud SQL instance to my Airflow DAGs, Or How to connect Cloud SQL Private Instance with Cloud SQL Proxy in Airflow

Any insights or resources on the best approach for this scenario would be greatly appreciated.

Solved Solved
1 3 120
1 ACCEPTED SOLUTION

Deploying the Cloud SQL Proxy as a sidecar in your Airflow environment within Google Cloud Composer involves modifying your deployment configuration to include the Cloud SQL Proxy container alongside your existing Airflow services. This setup enables a secure and private connection to your Cloud SQL instance without exposing it to the internet. Here’s a detailed guide on how to configure the Cloud SQL Proxy as a sidecar and integrate it with your Airflow DAGs, including solving the issue of adding the Cloud SQL Auth Proxy to the pod configuration:

A sidecar deployment means running the Cloud SQL Proxy in the same Kubernetes pod as your Airflow scheduler or worker but in a separate container. This co-location ensures that all database traffic from Airflow goes securely through the Cloud SQL Proxy.

Step 1: Define the Cloud SQL Proxy Container

You'll need to update your Kubernetes deployment configuration for Airflow to include the Cloud SQL Proxy as a sidecar. Here’s how you can define the Cloud SQL Proxy container:

  1. Docker Image: Use the official Cloud SQL Proxy Docker image.

  2. Credentials: The Proxy requires credentials to connect to Google Cloud SQL. These are typically provided through a mounted volume that contains the service account key JSON file.

  3. Connection Name: You must specify the instance connection name in the format project-id:region:instance-id.

Sample Kubernetes Deployment Configuration

Below is an example of how you might configure the Kubernetes deployment for Airflow with the Cloud SQL Proxy sidecar:

apiVersion: apps/v1
kind: Deployment
metadata:
name: airflow
spec:
replicas: 1
selector:
matchLabels:
component: webserver
template:
metadata:
labels:
component: webserver
spec:
containers:
- name: airflow-webserver
image: apache/airflow:2.1.0
ports:
- containerPort: 8080
- name: cloud-sql-proxy
image: gcr.io/cloudsql-docker/gce-proxy:1.19.1
command: ["/cloud_sql_proxy",
"-instances=my-project-id:us-central1:my-instance-id=tcp:5432",
"-credential_file=/secrets/cloudsql/credentials.json"]
volumeMounts:
- name: cloudsql-instance-credentials
mountPath: /secrets/cloudsql
readOnly: true
volumes:
- name: cloudsql-instance-credentials
secret:
secretName: cloudsql-instance-credentials

 

Here, localhost:5432 points to the Cloud SQL Proxy sidecar, which securely manages communication with your Cloud SQL instance.

Step 3: Deploy and Test

Once you have updated your deployment configuration:

  • Apply the changes via your Kubernetes management tool (e.g., kubectl apply -f your-deployment-file.yaml).

  • Check the logs of both the Airflow and Cloud SQL Proxy containers to ensure there are no connectivity errors.

  • Test by running a simple Airflow DAG that queries your database.

Troubleshooting

If you encounter issues where the Cloud SQL Auth Proxy cannot be added to the pod configuration:

  • Verify that your Kubernetes deployment file is correctly formatted and that all container specifications are valid.

  • Ensure that the service account associated with the deployment has sufficient permissions to pull the Cloud SQL Proxy image and access the Cloud SQL instance.

  • Check for any typos in your Kubernetes deployment descriptors, especially in the secrets and volume mounts.

View solution in original post

3 REPLIES 3

Private Cloud SQL instances are not directly accessible from the internet. As Cloud Composer environments also operate within Google Cloud, a secure connection method is necessary. The Cloud SQL Proxy provides a robust solution for this purpose by enabling secure access to your Cloud SQL instances without the need for a public IP.

Recommended Approach

  1. Service Account and Permissions:

    • Create a dedicated service account within your Google Cloud project for the Cloud SQL Proxy.
    • Assign the following roles to this service account:
      • Cloud SQL Client: Essential for the proxy to establish a connection.
      • Service Account Token Creator (Optional): Simplifies authentication by allowing the proxy to auto-refresh access tokens.
    • Enable Cloud SQL Admin API: To use the cloud-sql-proxy in your connection string.
  2. Cloud SQL Proxy Configuration:

    • Environment Variables:
      • Set GOOGLE_APPLICATION_CREDENTIALS within your Composer environment to point to the absolute path of the service account key file in Cloud Storage (e.g., gs://your-bucket/key.json).
  3. Airflow Connection:

    • Connection Type: Use a specific database type (e.g., PostgreSQL or MySQL) instead of the generic "Google Cloud Platform" type.
    • Connection String: Configure it to use the Cloud SQL Proxy:
      postgresql+psycopg2://[USER]:[PASSWORD]@127.0.0.1/[DATABASE]?host=/cloudsql/[INSTANCE_CONNECTION_NAME]&unix_sock=/cloudsql/[INSTANCE_CONNECTION_NAME]/.s.PGSQL.5432
      
      • Replace placeholders with your actual database credentials and instance connection name.
      • Ensure the unix_sock path matches the location where the Cloud SQL Proxy creates its socket file (usually /cloudsql/[INSTANCE_CONNECTION_NAME]/).
  4. Airflow DAG:

     
    from airflow.providers.google.cloud.operators.cloud_sql import CloudSQLExecuteQueryOperator
    
    execute_query = CloudSQLExecuteQueryOperator(
        task_id='execute_query',
        sql='SELECT * FROM your_table',
        gcp_conn_id='my_private_cloudsql',  # Ensure this matches your Airflow connection ID
    )
    
     
     
     

Additional Tips:

  • Secrets Management: Store your service account key in Secret Manager for enhanced security. Airflow provides seamless integration for fetching secrets.
  • Composer Configuration: If your Composer environment and Cloud SQL instance are not in the same VPC, use VPC Peering or Shared VPC.
  • Cloud SQL Proxy as a Sidecar (Optional): Consider deploying the Cloud SQL Proxy as a sidecar container within your Composer environment for a more robust and scalable solution.

Troubleshooting:

  • Permissions: Verify the service account has the Cloud SQL Client role on the Cloud SQL instance and, optionally, the Service Account Token Creator role.
  • Network Connectivity: Confirm that your Composer environment can reach the private IP of your Cloud SQL instance.
  • Cloud SQL Admin API: Ensure that the Cloud SQL Admin API is enabled in your project.

Could you please provide additional information on deploying Cloud SQL Proxy as a Sidecar, including how it connects to a Cloud SQL instance and how I can integrate it with my Airflow DAGs? and here is main issue that I'm not able to Add the Cloud SQL Auth Proxy to the pod configuration in deployment files.

Deploying the Cloud SQL Proxy as a sidecar in your Airflow environment within Google Cloud Composer involves modifying your deployment configuration to include the Cloud SQL Proxy container alongside your existing Airflow services. This setup enables a secure and private connection to your Cloud SQL instance without exposing it to the internet. Here’s a detailed guide on how to configure the Cloud SQL Proxy as a sidecar and integrate it with your Airflow DAGs, including solving the issue of adding the Cloud SQL Auth Proxy to the pod configuration:

A sidecar deployment means running the Cloud SQL Proxy in the same Kubernetes pod as your Airflow scheduler or worker but in a separate container. This co-location ensures that all database traffic from Airflow goes securely through the Cloud SQL Proxy.

Step 1: Define the Cloud SQL Proxy Container

You'll need to update your Kubernetes deployment configuration for Airflow to include the Cloud SQL Proxy as a sidecar. Here’s how you can define the Cloud SQL Proxy container:

  1. Docker Image: Use the official Cloud SQL Proxy Docker image.

  2. Credentials: The Proxy requires credentials to connect to Google Cloud SQL. These are typically provided through a mounted volume that contains the service account key JSON file.

  3. Connection Name: You must specify the instance connection name in the format project-id:region:instance-id.

Sample Kubernetes Deployment Configuration

Below is an example of how you might configure the Kubernetes deployment for Airflow with the Cloud SQL Proxy sidecar:

apiVersion: apps/v1
kind: Deployment
metadata:
name: airflow
spec:
replicas: 1
selector:
matchLabels:
component: webserver
template:
metadata:
labels:
component: webserver
spec:
containers:
- name: airflow-webserver
image: apache/airflow:2.1.0
ports:
- containerPort: 8080
- name: cloud-sql-proxy
image: gcr.io/cloudsql-docker/gce-proxy:1.19.1
command: ["/cloud_sql_proxy",
"-instances=my-project-id:us-central1:my-instance-id=tcp:5432",
"-credential_file=/secrets/cloudsql/credentials.json"]
volumeMounts:
- name: cloudsql-instance-credentials
mountPath: /secrets/cloudsql
readOnly: true
volumes:
- name: cloudsql-instance-credentials
secret:
secretName: cloudsql-instance-credentials

 

Here, localhost:5432 points to the Cloud SQL Proxy sidecar, which securely manages communication with your Cloud SQL instance.

Step 3: Deploy and Test

Once you have updated your deployment configuration:

  • Apply the changes via your Kubernetes management tool (e.g., kubectl apply -f your-deployment-file.yaml).

  • Check the logs of both the Airflow and Cloud SQL Proxy containers to ensure there are no connectivity errors.

  • Test by running a simple Airflow DAG that queries your database.

Troubleshooting

If you encounter issues where the Cloud SQL Auth Proxy cannot be added to the pod configuration:

  • Verify that your Kubernetes deployment file is correctly formatted and that all container specifications are valid.

  • Ensure that the service account associated with the deployment has sufficient permissions to pull the Cloud SQL Proxy image and access the Cloud SQL instance.

  • Check for any typos in your Kubernetes deployment descriptors, especially in the secrets and volume mounts.