Soda Agent Extra
When you deploy a self-hosted Soda Agent to a Kubernetes cluster in your cloud service provider environment, you must provide several key parameters and values. These values ensure optimal operation. They also allow the agent to connect to your Soda Cloud account (API keys) and to your data sources (data source login credentials), so that Soda can run data quality scans on the data.
By default, Soda uses Kubernetes Secrets as part of the Soda Agent deployment. The agent automatically converts any sensitive values you add to a values YAML file, or pass directly via the CLI, into Kubernetes Secrets.
Because these values are sensitive, you may wish to use one of the following alternative strategies to keep them secure.
When you deploy a self-hosted Soda Agent from the command line, you provide values for the API key ID and API key secret, which the agent uses to connect to your Soda Cloud account. You can provide these values during agent deployment in one of two ways:
directly in the helm install command that deploys the agent, which stores the values as Kubernetes Secrets in your cluster
OR
in a values.yml file that you store locally and reference in the helm install command.
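The following is a minimal sketch of such a values.yml file. The soda.apikey.id and soda.apikey.secret keys follow the soda.apikey prefix this page uses for existingSecret. The soda.agent.name key is an assumption, and all values are placeholders, so check the key names against your chart's values reference.

```yaml
# values.yml (sketch): key names partly assumed, values are placeholders
soda:
  apikey:
    id: "<your-api-key-id>"          # Soda Cloud API key ID
    secret: "<your-api-key-secret>"  # Soda Cloud API key secret
  agent:
    name: "<your-agent-name>"        # assumed key: a unique name for this agent
```

You would then pass the file to Helm with its --values flag. For example: helm install soda-agent soda-agent/soda-agent --values values.yml --namespace soda-agent. The release name, chart reference, and namespace in that command are hypothetical.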
If you use private key authentication with Snowflake or BigQuery, you can provide the required private key values in a values.yml file when you deploy or redeploy the agent.
You can add data source login credentials directly in Soda Cloud. However, any user can then access these values, so you may wish to store them securely as environment variables in the values YAML file instead.
Create or edit your local values YAML file to include the values for the environment variables you input into the connection configuration.
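Below is a sketch of the relevant part of the values YAML file. It assumes the chart accepts environment variables as a map under soda.env. The variable names are hypothetical and must match the names you reference in the connection configuration.

```yaml
# values.yml (sketch): expose data source credentials to the agent as
# environment variables. The soda.env key is an assumption.
soda:
  env:
    POSTGRES_USERNAME: "<database-username>"
    POSTGRES_PASSWORD: "<database-password>"
```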
After adding the environment variables to the values YAML file, redeploy the Soda Agent with Helm, referencing the updated values YAML file.
Follow the remaining guided steps to add a new data source in Soda Cloud. When you save the data source and test the connection, Soda Cloud uses the values you stored as environment variables in the values YAML file you supplied during redeployment.
Use External Secrets Operator (ESO) to integrate your self-hosted Soda Agent with your secrets manager, such as HashiCorp Vault, AWS Secrets Manager, or Azure Key Vault. ESO securely reconciles the login credentials that the Soda Agent uses to connect to your data sources.
Say you use HashiCorp Vault to store data source login credentials, and your security protocol demands frequent password rotation. In this situation, the challenge is that apps running in your Kubernetes cluster, such as a Soda Agent, need access to the up-to-date passwords.
To address the challenge, you can set up and configure ESO in your Kubernetes cluster to regularly reconcile externally-stored password values so that your apps always have the credentials they need. Doing so obviates the need to manually redeploy a values YAML file with new passwords for apps running in the cluster each time your system refreshes the passwords.
To integrate Soda Agent with a secret manager, you need the following:
External Secrets Operator (ESO), a Kubernetes operator that facilitates a connection between the Soda Agent and your secrets manager
a ClusterSecretStore resource, which provides a central gateway with instructions on how to access your secret backend
an ExternalSecret resource, which instructs the cluster which values to fetch and references the ClusterSecretStore
The following procedure outlines how to use ESO to integrate with a HashiCorp Vault that uses a KV Secrets Engine v2. You can adapt this procedure to integrate with another secrets manager, such as AWS Secrets Manager or Azure Key Vault.
You have set up a Kubernetes cluster in your cloud services environment and deployed a self-hosted Soda Agent in the cluster.
For the purpose of this example procedure, you have set up and are using a HashiCorp Vault that contains key-value pairs for POSTGRES_USERNAME and POSTGRES_PASSWORD at the path local/soda.
Verify the External Secrets Operator installation, for example by checking that its pods are running in your cluster.
Deploy the ClusterSecretStore to your cluster, for example with kubectl apply -f cluster-secret-store.yml.
Create a soda-secret.yml file for the ExternalSecret configuration. The details in this file tell ESO which values to fetch from the external secrets manager vault on behalf of the Soda Agent.
This example identifies:
the namespace of the Soda Agent
two remoteRef configurations, one each for POSTGRES_USERNAME and POSTGRES_PASSWORD, including the path in the vault, to detail what the ExternalSecret must fetch from the HashiCorp Vault
a refreshInterval to indicate how often ESO must reconcile the remoteRef values; set this to match the frequency with which your passwords are reset
the secretStoreRef to indicate the ClusterSecretStore through which to access the vault
a target template that creates a file called soda-agent.conf and adds the username and password values to it in the dotenv format that the Soda Agent expects
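The following is a sketch of soda-secret.yml that covers each item above. It assumes the external-secrets.io/v1beta1 API version, which newer ESO releases may supersede. The namespace soda-agent, the store name vault-app-role, and the target Secret name soda-agent-secrets are hypothetical; replace them with your own.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: soda-agent
  namespace: soda-agent            # hypothetical: the Soda Agent's namespace
spec:
  refreshInterval: 1h              # match your password rotation frequency
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-app-role           # hypothetical ClusterSecretStore name
  target:
    name: soda-agent-secrets       # Kubernetes Secret that ESO creates and keeps in sync
    template:
      engineVersion: v2
      data:
        # Render both fetched values into one dotenv-style key
        soda-agent.conf: |
          POSTGRES_USERNAME={{ .POSTGRES_USERNAME }}
          POSTGRES_PASSWORD={{ .POSTGRES_PASSWORD }}
  data:
    - secretKey: POSTGRES_USERNAME # name referenced in the template above
      remoteRef:
        key: local/soda            # path of the secret in Vault
        property: POSTGRES_USERNAME
    - secretKey: POSTGRES_PASSWORD
      remoteRef:
        key: local/soda
        property: POSTGRES_PASSWORD
```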
Deploy the ExternalSecret to your cluster, for example with kubectl apply -f soda-secret.yml.
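Use kubectl get externalsecret in the Soda Agent's namespace to confirm two things: that the ExternalSecret authenticates to the HashiCorp Vault through the ClusterSecretStore, and that it fetches the secrets. The output reports whether the secret synced successfully.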
Deploy the Soda Agent with helm install, referencing the values.yml file that contains the existingSecrets parameter.
By default, the Soda Agent creates a secret to store the Soda Cloud API key details securely in your cluster. If you want to use a different secret, you can point the Soda Agent to an existing Kubernetes Secret in your cluster using the soda.apikey.existingSecret property.
To use an existing Kubernetes Secret for the Soda Agent’s Soda Cloud API credentials, add the existingSecret and secretKeys values to your agent’s values YAML file.
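Here is a sketch of those values. The existingSecret and secretKeys names come from this page. The idKey and secretKey sub-keys under secretKeys are assumptions, and the angle-bracket values are placeholders.

```yaml
# values.yml (sketch): read the Soda Cloud API key from an existing Secret.
# The sub-key names under secretKeys are assumptions.
soda:
  apikey:
    existingSecret: "<existing-secret-name>"       # Secret already in the agent's namespace
    secretKeys:
      idKey: "<key-holding-api-key-id>"            # key in that Secret for the API key ID
      secretKey: "<key-holding-api-key-secret>"    # key in that Secret for the API key secret
```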
The default Soda Agent settings balance performance and cost-efficiency. You can adjust these settings to better suit your needs, optimizing for larger datasets, faster scans, or improved resource management.
The hard query cursor limit setting controls how many rows Soda Library can store in memory during a scan. By default, this value is 10,000 rows, preventing Out-Of-Memory (OOM) errors by capping the number of rows Soda holds in memory at any given time.
If you need to work with larger sets of sample data or failed rows, you can raise the query_cursor_hard_limit. If you increase or remove the limit, make sure the Soda Agent has enough memory to avoid OOM errors.
To turn off the limit completely, set the value of query_cursor_hard_limit to null.
To clear the limit and increase the agent's memory limit to compensate, adjust the corresponding settings in your values.yml file.
Refer to the exhaustive for more detail on how to deploy an agent using a values YAML file.
When you, or someone in your organization, follow the guided steps to use a self-hosted Soda Agent to add a data source in Soda Cloud, one step involves providing the connection details and credentials that Soda needs to connect to the data source and run scans.
In one of the guided steps to add a data source, you add a data source connection configuration. For a PostgreSQL data source, for example, reference environment variables for the username and password values instead of entering the credentials directly.
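The following is a sketch of a PostgreSQL connection configuration that reads credentials from environment variables. The data source name, host, database, and schema are placeholders. The ${...} references assume that POSTGRES_USERNAME and POSTGRES_PASSWORD are defined in the agent's environment.

```yaml
# Connection configuration (sketch): all values are placeholders
# except the ${...} environment variable references
data_source my_postgres:
  type: postgres
  host: "<db-host>"
  port: "5432"
  username: ${POSTGRES_USERNAME}   # resolved from the agent's environment
  password: ${POSTGRES_PASSWORD}
  database: "<database-name>"
  schema: "<schema-name>"
```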
The current integration of Soda Agent and a secrets manager does not yet support the configuration of the Soda Cloud credentials. For those credentials, use a separate secrets-management tool.
Consider referencing the example project for integrating an external secrets manager with a Soda Agent. It offers step-by-step instructions to set everything up locally so you can see the integration in action.
Use Helm to install the External Secrets Operator from its Helm chart into the same Kubernetes cluster in which you deployed your Soda Agent.
Create a cluster-secret-store.yml file for the ClusterSecretStore configuration. The details in this file tell ESO how to access the external secrets manager vault.
This example uses the Vault AppRole auth method to access the contents of the secret store. It uses the SecretID stored in the Kubernetes secret referenced by secretRef, together with the roleID, to acquire a temporary access token so that it can fetch secrets.
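The following is a sketch of cluster-secret-store.yml for AppRole authentication. It assumes the external-secrets.io/v1beta1 API and a KV v2 engine mounted at local, which matches the local/soda path above. The store name, Vault address, role ID, and the name, namespace, and key of the SecretID Secret are all hypothetical.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-app-role                 # hypothetical store name
spec:
  provider:
    vault:
      server: "http://vault.vault.svc.cluster.local:8200"  # hypothetical Vault address
      path: "local"                    # mount path of the KV engine
      version: "v2"                    # KV Secrets Engine v2
      auth:
        appRole:
          path: "approle"              # mount path of the AppRole auth method
          roleId: "<your-role-id>"
          secretRef:                   # Kubernetes Secret that holds the SecretID
            name: vault-app-role-secret-id   # hypothetical
            namespace: external-secrets      # hypothetical; required for a cluster-scoped store
            key: secret-id                   # hypothetical
```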
Refer to the external-secrets.io documentation for configuration examples for other secrets managers, such as AWS Secrets Manager and Azure Key Vault.
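Prepare a values.yml file to deploy the Soda Agent with the existingSecrets parameter. This parameter instructs the agent to use the Kubernetes Secret that the ExternalSecret creates, so that the agent can fetch data source login credentials. Refer to the complete agent deployment instructions, or to the redeployment instructions if you already have an agent running in a cluster.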
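Below is a sketch of that values.yml file. The existingSecrets parameter name comes from this page, but its nesting under soda.scanlauncher is an assumption. The listed Secret name is hypothetical and should be the name of the Secret your ExternalSecret targets.

```yaml
# values.yml (sketch): mount the ESO-managed Secret into the agent's scans.
# The scanlauncher nesting is an assumption; values are placeholders.
soda:
  apikey:
    id: "<your-api-key-id>"
    secret: "<your-api-key-secret>"
  agent:
    name: "<your-agent-name>"
  scanlauncher:
    existingSecrets:
      - soda-agent-secrets   # hypothetical: the ExternalSecret's target Secret name
```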