Deployment options

Last updated 7 days ago

Soda offers flexible deployment models to suit your team’s infrastructure, scale, and security needs. Whether you want to embed Soda directly into your pipelines, use a centrally managed deployment, or rely on Soda’s fully-hosted solution, there’s an option for you.

This guide provides an overview of the three main deployment models: Soda Core, Soda-hosted Agent, and Self-hosted Agent, to help you choose the right setup for your organization.

Overview of Deployment Models

| Deployment Model | Description | Ideal For | Key Features | Considerations |
| --- | --- | --- | --- | --- |
| Soda Core | Open-source Python library (with commercial extensions) and CLI for running Data Contracts in your pipelines. | Data engineers integrating Soda into custom workflows. | Full control over orchestration; in-memory data support; contract verification. | No observability features. Required for in-memory sources (e.g., Spark, DataFrames). Data source connections managed at the environment level. |
| Soda-hosted Agent | Fully-managed Soda Agent, hosted by Soda. | Teams seeking a simple, managed solution for data quality. | Centralized data source access; no setup required; observability features enabled. Enables users to create, test, execute, and schedule contracts and checks directly from the Soda Cloud UI. | Required for observability features. Cannot scan in-memory sources like Spark or DataFrames. |
| Self-hosted Agent | Same as Soda-hosted Agent, but deployed and managed in your own Kubernetes environment. | Teams needing full control over infrastructure and deployment. | Similar to Soda-hosted Agent, but deployed within the customer’s environment; data stays within your network. Full control over deployment; integration with secrets managers; customization to meet your organization’s specific requirements. | Required for observability features. Cannot scan in-memory sources like Spark or DataFrames. Kubernetes expertise required. |
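The trade-offs in this overview can be sketched as a small decision helper. This is only an illustration of the constraints stated above (in-memory sources require Soda Core, observability requires an agent); the `Requirements` class, its field names, and `recommend_deployment` are hypothetical and are not part of any Soda API. Defaulting to an agent when neither constraint applies is also an assumption made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical requirement flags; not part of any Soda API."""
    in_memory_sources: bool = False            # e.g. Spark or Pandas DataFrames
    observability: bool = False                # profiling, metric monitoring, anomaly detection
    self_managed_infrastructure: bool = False  # agent runs in your own Kubernetes environment


def recommend_deployment(req: Requirements) -> list[str]:
    """Return the deployment model(s) suggested by the overview."""
    # Both agent variants offer the same capabilities; they differ in who hosts them.
    agent = "Self-hosted Agent" if req.self_managed_infrastructure else "Soda-hosted Agent"

    models = []
    if req.in_memory_sources:
        # Agents cannot scan in-memory sources, so Soda Core is required.
        models.append("Soda Core")
    if req.observability or not models:
        # Observability requires an agent. With no hard constraint at all,
        # this sketch assumes an agent as the default choice.
        models.append(agent)
    return models
```

For example, `recommend_deployment(Requirements(in_memory_sources=True, observability=True))` returns both `"Soda Core"` and `"Soda-hosted Agent"`, reflecting that a team with both needs would combine the two models.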

Deployment Models in Detail

Soda Core

Soda Core is an open-source Python library and CLI that allows you to embed Soda directly in your data pipelines. You can orchestrate scans using your preferred orchestration tools or pipelines, and execute them within your own infrastructure.

Key points:

Ideal for teams who want full control over scan orchestration and execution.
Data source connections are configured and managed at the environment level.
Required for working with in-memory data sources like Spark and Pandas DataFrames.
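Because orchestration stays in your hands with Soda Core, the pipeline itself decides what happens when a check fails. The sketch below shows one common pattern, gating a downstream step on a verification result. `verify_contract` is a placeholder for however your pipeline invokes Soda Core; it is not a Soda function, and `run_gated_step` and `DataQualityError` are likewise hypothetical names chosen for illustration.

```python
from typing import Callable


class DataQualityError(RuntimeError):
    """Raised when verification fails and the downstream step must not run."""


def run_gated_step(
    verify_contract: Callable[[], bool],
    downstream_step: Callable[[], str],
) -> str:
    """Run downstream_step only if verify_contract reports success.

    verify_contract stands in for a call into Soda Core (assumption:
    it returns True when all checks pass).
    """
    if not verify_contract():
        raise DataQualityError("contract verification failed; downstream step skipped")
    return downstream_step()
```

In an orchestrator, the two callables would typically be separate tasks; passing them as plain functions keeps the sketch runnable without Soda installed.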

Get started with Soda Core: Install and Configure

Soda-hosted Agent

The Soda-hosted Agent is a fully-managed deployment of the Soda Agent, hosted by Soda in its own infrastructure. It allows you to connect to your data sources and manage data quality directly from the Soda Cloud UI without any infrastructure setup on your end.

Key points:

No setup or management required. Soda handles deployment and scaling.
Data source connections are centralized in Soda Cloud, and users can leverage the Soda Agent to execute scans across those data sources.
Enables observability features in Soda Cloud, such as profiling, metric monitoring, and anomaly detection.
Enables users to create, test, execute, and schedule contracts and checks directly from the Soda Cloud UI.

Onboard your datasets in Soda Cloud with the Soda-hosted Agent: Onboard datasets on Soda Cloud

Self-hosted Agent

The Self-hosted Agent offers the same capabilities as the Soda-hosted Agent, but it is deployed and managed by your team within your own Kubernetes environment (e.g., AWS, GCP, Azure). This model provides full control over deployment, infrastructure, and security, while enabling the same centralized data source access and Soda Cloud integration for scans, contract execution, and observability features.

Learn how to deploy the Self-hosted Soda Agent: Deploy Soda Agent

Onboard your datasets in Soda Cloud with the Self-hosted Agent: Onboard datasets on Soda Cloud
