Managing roles for PostgreSQL with Vault on Kubernetes


Vault has a database secrets engine with a PostgreSQL plugin that creates short-lived roles with random passwords for your database applications, but putting everything into production is not as simple as it seems.

Context

Classic password management (static, rarely rotated credentials) is not suitable for the cloud or for modern software development. It is so common to unwillingly leave secrets and credentials behind during tests and deployments that a whole family of software is dedicated to just scanning disks and other repositories:

  1. to try to find the leaks before the damage is done, or
  2. to shorten the opportunity window an exploit can be attempted with an already leaked secret.

We can surely encrypt every single secret inside repositories and decrypt on deploy (e.g. SealedSecrets), but that doesn't really solve the problem of leaving secrets behind (did I really encrypt all secrets?) or of secret rotation.

A far better approach is to stop using secrets in your codebase altogether. For that, we need to centralize the generation, delivery, and rotation of credentials, and follow a secure, on-demand, and revocable procedure preceded by an authn/authz workflow. This centralized service should of course be redundant so as not to introduce a single point of failure.

Vault was made to tackle those technical challenges, to unify common cryptography tasks under the same umbrella, and to offer them as simple services.

The big picture

Figure: Vault creating PostgreSQL roles on demand

This figure roughly shows what happens when an application requests a secure database role from Vault, and how everything fits together.

  1. When the application is scheduled, a JWT token corresponding to the service account associated with the application is injected into the pod by the Service Account admission controller using a tmpfs volume mount.

  2. The configuration service starts and uses that token to log in under a specified role name using Kubernetes authentication. Vault verifies the JWT token signature and its claims (namespace and service account) against the role definition.

  3. Vault returns a short-lived token that authenticates the role and which should be used to access further Vault resources.

  4. The application uses that token to ask for a database role. Vault checks that the role (associated with the login token) has access to the database role generation path.

  5. Vault creates a new short-lived (1h) database role and password within the PostgreSQL server,

  6. and waits for the confirmation that the role has been successfully created.

  7. It then returns the created credentials to the configuration service with information about their validity.

  8. Configuration files are generated, the dependent services are started/signaled, and the renewal process is scheduled before the expiration time.

  9. The application starts and uses the credentials injected in its configuration file.

  10. The application receives database data.

  11. Vault deletes expired roles.

Steps 2 to 7 are repeated during the renewal process.
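To make step 2 more concrete, here is a minimal sketch of the first thing a verifier does with a JWT: split it into its three dot-separated segments and decode the middle one to read the claims that Vault then checks against the role definition. The payload below is fabricated for illustration; it is not a real service account token.

```shell
#!/bin/sh
# fabricated claims payload, for illustration only
payload='{"kubernetes.io/serviceaccount/namespace":"public","kubernetes.io/serviceaccount/service-account.name":"application"}'
# a JWT has the shape header.payload.signature, each segment base64-encoded
seg=$(printf '%s' "$payload" | base64 | tr -d '\n')
token="header.${seg}.signature"
# extract and decode the claims segment, as Vault does before checking the
# namespace and service account against the bound values of the role
printf '%s' "$token" | cut -d. -f2 | base64 -d
```

Real tokens use base64url encoding without padding, so a robust decoder must first translate `-_` back to `+/` and restore the padding.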

Dynamic roles

Vault's database secrets engine works in two modes:

  • Static roles: where only the password is changing,
  • Dynamic roles: both the username (which follows a scheme) and the password are changing.

Static roles are easier to operate, but dynamic roles offer the possibility to differentiate accesses between replicas. Each pod has its own username/password to access the database, making the service availability resilient to revocation as long as at least one pod keeps its access.

Static roles are well covered in tutorials on the web, but I only found sparse resources on dynamic roles, especially about the challenges they create around privilege sharing and ownership. They are the main focus of this article.

Setup

PostgreSQL setup is focused on:

  1. creating the database and the roles (shared and admin),
  2. and creating the triggers to resolve the ownership problem.

Vault setup is mainly focused on:

  1. activating needed functionalities,
  2. configuring the database secret engine,
  3. and defining roles and policies that authorize the use of Vault functionalities.

Finally, Kubernetes setup is limited to checking that the manifests match the Vault configuration.

Application setup is not covered here because each application is different, although it is generally just a matter of parametrizing a configuration service that can talk to Vault and forward the credentials.

Both Vault and PostgreSQL use the term role, and it can be confusing. Keep in mind the context in which it is used and refer to the workflow graph to build your mental model.

PostgreSQL

There are mainly two difficulties when using short-lived, randomly generated roles:

  • As we can have multiple accesses from multiple temporary accounts, we need a way to share authorizations on the database (a group),

  • We also need to transfer ownership of all database objects created by a temporary role to the group, otherwise access by other temporary accounts (pods) can be denied, and PostgreSQL will refuse to remove the expired accounts.

Let's create a database with an associated role that will be used as a group for the generated roles:

```shell
# postgresql has no group concept, but we can grant a role to another role
sudo -u postgres psql <<EOF
CREATE DATABASE db1;
REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON DATABASE db1 FROM PUBLIC;
CREATE ROLE db1_grp;
GRANT CONNECT ON DATABASE db1 TO db1_grp;
GRANT ALL PRIVILEGES ON SCHEMA public TO db1_grp;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO db1_grp;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO db1_grp;
GRANT ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA public TO db1_grp;
ALTER DATABASE db1 OWNER TO db1_grp;
EOF
```

We also need a superuser account that will be managed (rotated) by Vault to create new database roles.

```shell
# don't be afraid of dummy passwords anymore
sudo -u postgres psql <<EOF
CREATE ROLE admin_db1 WITH LOGIN SUPERUSER PASSWORD 'password';
EOF
```

If a temporary role is used to alter the schema (for instance, during an upgrade or a plugin installation), we must immediately change the owner of the created objects to avoid race conditions. To that effect, we use event triggers on table, function, schema, and sequence creation.

```shell
sudo -u postgres psql -d db1 <<EOF
CREATE OR REPLACE FUNCTION trg_create_set_owner()
  RETURNS event_trigger
  LANGUAGE plpgsql
AS \$\$
DECLARE
  obj record;
BEGIN
  FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
    IF obj.schema_name IN ('public') THEN
      IF obj.command_tag IN ('CREATE TABLE', 'CREATE FUNCTION', 'CREATE SCHEMA') THEN
        EXECUTE format('ALTER %s %s OWNER TO db1_grp',
                       substring(obj.command_tag from 8), obj.object_identity);
      ELSIF obj.command_tag = 'CREATE SEQUENCE' AND NOT EXISTS (
          SELECT s.relname FROM pg_class s
          JOIN pg_depend d ON d.objid = s.oid
          WHERE s.relkind = 'S' AND d.deptype = 'a'
            AND s.relname = split_part(obj.object_identity, '.', 2)) THEN
        EXECUTE format('ALTER SEQUENCE %s OWNER TO db1_grp', obj.object_identity);
      END IF;
    END IF;
  END LOOP;
END;
\$\$;
CREATE EVENT TRIGGER trg_create_set_owner
  ON ddl_command_end
  WHEN tag IN ('CREATE TABLE', 'CREATE FUNCTION', 'CREATE SCHEMA', 'CREATE SEQUENCE')
  EXECUTE PROCEDURE trg_create_set_owner();
EOF
```

To avoid repetition, you can define a template database with the triggers activated, from which you can derive new databases.
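For instance, a hypothetical sketch (db_tmpl stands for a database prepared once with the event triggers above; the name is illustrative):

```shell
# db_tmpl is assumed to already contain the event triggers; the new
# database inherits all its objects, including the event triggers
sudo -u postgres psql <<EOF
CREATE DATABASE db2 TEMPLATE db_tmpl;
EOF
```

Note that the trigger function above hardcodes the db1_grp group, so a per-database variant (or a lookup of the group name) is needed when deriving databases this way.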

Vault

We need to activate the Kubernetes authentication method and the database secrets engine:

```shell
# activate kubernetes authentication under the path /auth/kubernetes/
vault auth enable kubernetes
# activate database secret engine under the path /database/
vault secrets enable database
```

Next we configure a database secret path linked to the db1 database on the postgres.db server, port 5432. We use the admin_db1 role created above; username and password are injected into the connection_url with placeholders.

```shell
# create a configuration for our database db1
vault write database/config/db1 \
    plugin_name=postgresql-database-plugin \
    allowed_roles=db1 \
    connection_url="postgresql://{{username}}:{{password}}@postgres.db:5432/db1" \
    max_open_connections=5 \
    max_connection_lifetime=5s \
    username=admin_db1 \
    password=password
```

Now that we have defined a database configuration, we ask Vault to immediately rotate the admin password and manage it from there.

```shell
# rotate the dummy password
vault write -force database/rotate-root/db1
```

We can now create the Vault database role mentioned above under the allowed_roles key. This role contains SQL statements with placeholders to create and drop the database role, as well as other parameters like the time to live (TTL).

On creation, we give the role the group's privileges. When we drop the role, we reassign all objects it created to the group (although this should normally not be needed because of the triggers).

```shell
# this specifies how the database role is created
vault write database/roles/db1 \
    db_name=db1 \
    creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; \
        ALTER DEFAULT PRIVILEGES FOR ROLE \"{{name}}\" IN SCHEMA public \
            GRANT ALL PRIVILEGES ON TABLES TO db1_grp; \
        ALTER DEFAULT PRIVILEGES FOR ROLE \"{{name}}\" IN SCHEMA public \
            GRANT ALL PRIVILEGES ON SEQUENCES TO db1_grp; \
        ALTER DEFAULT PRIVILEGES FOR ROLE \"{{name}}\" IN SCHEMA public \
            GRANT ALL PRIVILEGES ON FUNCTIONS TO db1_grp; \
        GRANT db1_grp TO \"{{name}}\";" \
    revocation_statements="REASSIGN OWNED BY \"{{name}}\" TO \"db1_grp\"; \
        DROP OWNED BY \"{{name}}\"; \
        DROP ROLE \"{{name}}\";" \
    default_ttl=1h \
    max_ttl=24h
```

Now we can define the Vault role and its related policy:

```shell
# As a convention, name the role and policy after the namespace and service
namespace=public
service=application

# The role is bound to a specific service in a specific namespace. These are checked
# against the JWT token claims that the configuration service is sending to log in.
vault write auth/kubernetes/role/$namespace-$service \
    bound_service_account_names=$service \
    bound_service_account_namespaces=$namespace \
    policies=$namespace-$service \
    ttl=1h

# The policy only allows the role to ask for new db1 credentials
vault policy write $namespace-$service - <<EOF
path "database/creds/db1" {
  capabilities = ["read"]
}
EOF
```

Kubernetes

On the Kubernetes side, we should verify that the service name and the namespace (application, public) used in our application manifests match the ones in the Vault role definition:

```yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: application
  namespace: public
  labels:
    app: application
---
apiVersion: v1
kind: Service
metadata:
  name: application
  namespace: public
  labels:
    app: application
spec:
  ports:
    - name: application
      protocol: TCP
      port: 443
      targetPort: 8443
  selector:
    app: application
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: application
  namespace: public
  labels:
    app: application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: application
  template:
    metadata:
      labels:
        app: application
    spec:
      # this is what ties the pod to a service account
      serviceAccountName: application
      containers:
        - name: application
          image: "reg.domain.my/containers/application:0.6.0"
          imagePullPolicy: IfNotPresent
          env:
            - name: VAULT_URL
              value: https://vault.kube-system.svc:8200/v1
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          ports:
            - name: application
              containerPort: 8443
```

How to use it

You can easily test that everything works as expected by using the HTTP API from inside your application container. You only need to install curl and jq:

```shell
kubectl exec -it -n public application-xxxx -- sh
# if you use alpine
> sudo apk add curl jq
```

First we test that we can successfully log in with the Vault role public-application (the $namespace-$service role defined above) using the JWT service account token:

```shell
# we connect to vault using https. Its certificate is emitted by the same
# authority as the JWT token
VAULT_TOKEN=$(curl \
    --silent \
    --request POST \
    --data "{\"role\": \"public-application\", \"jwt\": \"$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\"}" \
    --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
    https://vault.kube-system.svc:8200/v1/auth/kubernetes/login \
    | jq -r '.auth.client_token')
```

Then we test that this Vault role is allowed to generate database credentials by using the path /database/creds/db1, with db1 being the Vault database role:

```shell
curl \
    --silent \
    --header "X-Vault-Token: $VAULT_TOKEN" \
    --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
    https://vault.kube-system.svc:8200/v1/database/creds/db1 | jq
```

```json
{
  "request_id": "94d452e8-c464-8539-2697-2b492b507a7f",
  "lease_id": "database/creds/db1/3uigqkeKAGDjQ9akJXo5jDnW",
  "renewable": true,
  "lease_duration": 3600,
  "data": {
    "password": "7Mq56fVT3KnvCAmQSF-B",
    "username": "v-kubernet-db1-vX5pY322GqFXRArv7gnL-1685970930"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}
```
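From there, the configuration service only has to turn the data fields into whatever the application expects. A minimal sketch (the response is held in a variable here instead of being captured from curl; field names match the output above) building a libpq-style connection URI:

```shell
#!/bin/sh
# credentials as returned by Vault above (normally captured from curl)
creds='{"data":{"username":"v-kubernet-db1-vX5pY322GqFXRArv7gnL-1685970930","password":"7Mq56fVT3KnvCAmQSF-B"}}'
user=$(printf '%s' "$creds" | jq -r '.data.username')
pass=$(printf '%s' "$creds" | jq -r '.data.password')
# connection URI understood by most PostgreSQL clients
db_url="postgresql://${user}:${pass}@postgres.db:5432/db1"
echo "$db_url"
```

A real configuration service would also escape special characters in the password before embedding it in a URI.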

As you can see, running a hundred-megabyte Vault agent as a sidecar to refresh secrets in a shared volume is a little overkill when all you have to do is call two API endpoints. No surprise that it doesn't scale well, and that HashiCorp is exploring other means.

Although it would be easy to script the renewal of secrets, I preferred to use rconfd to avoid any dependency on external tools. Developed in Rust, it is fast, light on resource consumption, memory- and thread-safe, and embeds a (pretty fast) jsonnet interpreter to generate configuration files with secrets, as well as a Vault client.
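If you do script it yourself, the core of the scheduling logic is small. A sketch (the two-thirds ratio is an illustrative safety margin, not something mandated by Vault):

```shell
#!/bin/sh
# compute when to renew a lease, given its duration in seconds
renew_delay() {
    # renew at two thirds of the ttl to keep a safety margin
    echo $(( $1 * 2 / 3 ))
}
# 3600 stands in for the .lease_duration field of the Vault response
delay=$(renew_delay 3600)
echo "renewing in ${delay}s"   # → renewing in 2400s
```

The renewal script would sleep for that delay, call the creds endpoint again (steps 2 to 7 of the workflow), regenerate the configuration files, and signal the dependent services.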

By using a side process under process supervision, we get something that is more scalable and simpler to operate than the classic sidecar + admission controller solution. An alpine golden image makes the whole solution easily composable and streamlines the integration of these two components into all images that need short-lived secrets.

What we have so far

We have an application accessing a database with random roles and passwords (and possibly restricted privileges) that are rotated every hour and deleted after expiration.

Nothing but Vault has access to the database administrator password.

Only connections coming from a defined Kubernetes service and namespace are authorized.

We can identify which pod is associated with a given database role and react to problems without disrupting the other pods or incurring downtime.

Éric BURGHARD

