Managing roles for PostgreSQL with Vault on Kubernetes
Vault has a database secrets engine with a PostgreSQL driver that helps create short-lived roles with random passwords for your database applications, but putting everything into production is not as simple as it seems.
Context
Classic password management (static and rarely rotated) is not suitable for the cloud or for modern software development. It is so common to unwittingly leave secrets and credentials behind during tests and deployments that a whole family of software is dedicated to just scanning disks and other repositories:
- to try to find the leaks before the damage is done, or
- to shorten the opportunity window an exploit can be attempted with an already leaked secret.
We can surely encrypt every single secret inside repositories and decrypt on deploy (e.g. sealedsecrets), but that doesn't really solve the problem of leaving secrets behind (did I really encrypt all secrets?), or of secret rotation.
A far better approach is to never use secrets in your codebase anymore. For that, we need to centralize the generation, delivery, and rotation of credentials, and follow a secure, on-demand, and revocable procedure preceded by an authn/authz workflow. This centralized service should of course be redundant so as not to introduce a single point of failure.
Vault was made to tackle exactly these technical challenges, to unify common cryptography tasks under the same umbrella, and to offer them as simple services.
The big picture
This figure roughly shows what happens when an application requests a database role from Vault, and how everything fits together:
1. When the application pod is scheduled, a JWT token corresponding to the service account associated with the application is injected into the pod by the Service Account admission controller, using a tmpfs volume mount.
2. The configuration service starts and uses that token to log in under a specified role name using Kubernetes authentication. Vault verifies the JWT token signature and its claims (namespace and serviceaccount) against the role definition.
3. Vault returns a short-lived token that authenticates the role and must be used to access further Vault resources.
4. The application uses that token to ask for a database role. Vault checks that the role (associated with the login token) has access to the database role generation path.
5. Vault creates a new short-lived (1h) database role and password within the PostgreSQL server, and waits for confirmation that the role has been created successfully.
6. Vault returns the created credentials to the configuration service, with information about their validity.
7. Configuration files are generated, the dependent services are started/signaled, and the renewal process is scheduled before the expiration time.
8. The application starts and uses the credentials injected in its configuration file.
9. The application receives database data.
10. Vault deletes expired roles.
Steps 2 to 7 are repeated during the renewal process.
Dynamic roles
Vault's database secrets engine works in two modes:
- Static roles: only the password changes,
- Dynamic roles: both the username (which follows a naming scheme) and the password change.
Static roles are easier to operate, but dynamic roles offer the possibility to differentiate the accesses of individual replicas: each pod has its own username/password to access the database, making service availability resilient to revocation as long as at least one pod keeps its access.
Static roles are well covered in tutorials on the web, but I only found sparse and scarce resources on dynamic roles, especially about the challenges they create around privilege sharing and ownership. They are the main focus of this article.
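For example, if one replica's credentials are compromised, you can revoke just that pod's lease while the others keep working (the lease ID below is illustrative):
# revoke a single pod's lease; the other replicas keep their own credentials
vault lease revoke database/creds/db1/3uigqkeKAGDjQ9akJXo5jDnW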
Setup
PostgreSQL setup is focused on:
- creating the database and the roles (shared and admin),
- and creating the triggers to resolve the ownership problem.
Vault setup is mainly focused on:
- activating needed functionalities,
- configuring the database secret engine,
- and defining roles and policies that authorize the use of Vault functionalities.
Finally, Kubernetes setup is limited to checking that the manifests match the Vault configuration.
Application setup is not covered here because each one is different, although it is generally just a matter of parameterizing a configuration service that can talk to Vault and forward the credentials.
Both Vault and PostgreSQL use the term role, and it can be confusing. Keep in mind the context in which it is used, and refer to the workflow graph to build your mental model.
PostgreSQL
There are mainly two difficulties when using short-lived, randomly generated roles:
- As we can have multiple accesses from multiple temporary accounts, we need a way to share authorization on the database (a group),
- We also need to transfer ownership of all database objects created by a temporary role to the group; otherwise access by other temporary accounts (pods) can be denied, and PostgreSQL will refuse to remove the expired accounts (demonstrated below).
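To make the second point concrete, here is what happens if you try to drop an expired role that still owns a table (the role and table names are illustrative, and the error output indicative):
# v_pod1 is an expired temporary role that still owns a table in db1
sudo -u postgres psql -d db1 -c 'DROP ROLE v_pod1;'
# ERROR:  role "v_pod1" cannot be dropped because some objects depend on it
# DETAIL:  owner of table t1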
Let's create a database with an associated administrator role that will be used as a group for generated roles:
# postgresql has no separate group concept, but a role granted to other roles acts as a group
sudo -u postgres psql <<EOF
CREATE DATABASE db1;
REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON DATABASE db1 FROM PUBLIC;
CREATE ROLE db1_grp;
GRANT CONNECT ON DATABASE db1 TO db1_grp;
GRANT ALL PRIVILEGES ON SCHEMA public TO db1_grp;
GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO db1_grp;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO db1_grp;
GRANT ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA public TO db1_grp;
ALTER DATABASE db1 OWNER TO db1_grp;
EOF
We also need a superuser account that will be managed (rotated) by Vault to create new database roles.
# don't be afraid of the dummy password: Vault will rotate it immediately
sudo -u postgres psql <<EOF
CREATE ROLE admin_db1 WITH LOGIN SUPERUSER PASSWORD 'password';
EOF
If a temporary role is used to alter the schema (for instance during an upgrade or a plugin installation), we must immediately change the owner of the created objects to avoid race conditions. To that end, we use event triggers on table, function, schema, and sequence creation.
sudo -u postgres psql -d db1 <<EOF
CREATE OR REPLACE FUNCTION trg_create_set_owner()
  RETURNS event_trigger
  LANGUAGE plpgsql
AS \$\$
DECLARE
  obj record;
BEGIN
  FOR obj IN SELECT * FROM pg_event_trigger_ddl_commands() LOOP
    IF obj.schema_name IN ('public') THEN
      -- tables, functions and schemas: hand ownership to the group right away
      IF obj.command_tag IN ('CREATE TABLE', 'CREATE FUNCTION', 'CREATE SCHEMA') THEN
        EXECUTE format('ALTER %s %s OWNER TO db1_grp', substring(obj.command_tag from 8), obj.object_identity);
      -- sequences: skip the ones auto-created by serial/identity columns (deptype 'a'),
      -- whose ownership follows their table
      ELSIF obj.command_tag = 'CREATE SEQUENCE' AND NOT EXISTS(SELECT s.relname FROM pg_class s JOIN pg_depend d ON d.objid = s.oid WHERE s.relkind = 'S' AND d.deptype = 'a' AND s.relname = split_part(obj.object_identity, '.', 2)) THEN
        EXECUTE format('ALTER SEQUENCE %s OWNER TO db1_grp', obj.object_identity);
      END IF;
    END IF;
  END LOOP;
END;
\$\$;
CREATE EVENT TRIGGER trg_create_set_owner
  ON ddl_command_end
  WHEN tag IN ('CREATE TABLE', 'CREATE FUNCTION', 'CREATE SCHEMA', 'CREATE SEQUENCE')
  EXECUTE PROCEDURE trg_create_set_owner();
EOF
To avoid repetition, you can define a template database with the triggers activated, from which new databases can be derived.
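A minimal sketch, assuming the function and trigger above have already been installed in a database named db1_tmpl (both names below are illustrative):
sudo -u postgres psql <<EOF
-- mark the prepared database as a template (run once)
ALTER DATABASE db1_tmpl IS_TEMPLATE true;
-- databases derived from the template inherit the event trigger
CREATE DATABASE db2 TEMPLATE db1_tmpl;
EOF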
Vault
We need to activate the Kubernetes authentication method and the database secrets engine:
# activate kubernetes authentication under the path /auth/kubernetes/
vault auth enable kubernetes
# activate database secret engine under the path /database/
vault secrets enable database
Next we configure a database secret path which will be linked to the db1 database of the postgres.db server on port 5432. We use the admin_db1 role created above. username and password are injected into the connection_url with placeholders.
# create a configuration for our database db1
vault write database/config/db1 \
plugin_name=postgresql-database-plugin \
allowed_roles=db1 \
connection_url=postgresql://{{username}}:{{password}}@postgres.db:5432/db1 \
max_open_connections=5 \
max_connection_lifetime=5s \
username=admin_db1 \
password=password
Now that we have defined a database configuration, we ask Vault to immediately rotate the admin password and manage it from then on.
# rotate the dummy password
vault write -force database/rotate-root/db1
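From that point on, the dummy password is gone. Assuming the PostgreSQL server is reachable from your shell, you can check that it no longer works (the error output is indicative):
psql postgresql://admin_db1:password@postgres.db:5432/db1 -c 'SELECT 1;'
# psql: error: ... FATAL:  password authentication failed for user "admin_db1"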
We can now create the Vault database role mentioned above under the allowed_roles key. This role contains SQL directives with placeholders to create and drop the database role, as well as other parameters like the time to live (ttl).
On creation, we grant the group privileges to the role. When we drop the role, we reassign all created objects to the group (although this should normally not happen because of the triggers).
# this specifies how the database role is created
vault write database/roles/db1 \
db_name=db1 \
creation_statements="CREATE ROLE \"{{name}}\" WITH LOGIN PASSWORD '{{password}}' VALID UNTIL '{{expiration}}'; \
ALTER DEFAULT PRIVILEGES FOR ROLE \"{{name}}\" IN SCHEMA public \
GRANT ALL PRIVILEGES ON TABLES TO db1_grp; \
ALTER DEFAULT PRIVILEGES FOR ROLE \"{{name}}\" IN SCHEMA public \
GRANT ALL PRIVILEGES ON SEQUENCES TO db1_grp; \
ALTER DEFAULT PRIVILEGES FOR ROLE \"{{name}}\" IN SCHEMA public \
GRANT ALL PRIVILEGES ON FUNCTIONS TO db1_grp; \
GRANT db1_grp TO \"{{name}}\";" \
revocation_statements="REASSIGN OWNED BY \"{{name}}\" TO \"db1_grp\"; \
DROP OWNED BY \"{{name}}\"; \
DROP ROLE \"{{name}}\";" \
default_ttl=1h \
max_ttl=24h
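At this stage, an operator with a sufficiently privileged token can already sanity-check the credential generation from the Vault CLI (the values below are illustrative):
vault read database/creds/db1
# Key                Value
# ---                -----
# lease_id           database/creds/db1/8rjxLJzUU0ZDPYxfc7axSfZ3
# lease_duration     1h
# lease_renewable    true
# password           A1a-1ZvQnv6dVxkWTLLp
# username           v-token-db1-WdcIIW2hlZYcVGvraXnE-1685970734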
Now we can define the Vault role and its related policy:
# As a convention, name the role and policy after the namespace and service
namespace=public
service=application
# The role is bound to a specific service in a specific namespace. These are checked
# against the JWT token claims that the configuration service sends to log in.
vault write auth/kubernetes/role/$namespace-$service \
bound_service_account_names=$service \
bound_service_account_namespaces=$namespace \
policies=$namespace-$service ttl=1h
# The policy only allows the role to ask for new db1 credentials
vault policy write $namespace-$service - <<EOF
path "database/creds/db1" {
capabilities = ["read"]
}
EOF
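You can read both definitions back to check the result:
# confirm that the role and the policy are in place
vault read auth/kubernetes/role/$namespace-$service
vault policy read $namespace-$service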
Kubernetes
On the Kubernetes side, we should verify that the service name and the namespace (application, public) used in our application manifests match the ones in the Vault role definition:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: application
  namespace: public
  labels:
    app: application
---
apiVersion: v1
kind: Service
metadata:
  name: application
  namespace: public
  labels:
    app: application
spec:
  ports:
    - name: application
      protocol: TCP
      port: 443
      targetPort: 8443
  selector:
    app: application
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: application
  namespace: public
  labels:
    app: application
spec:
  replicas: 1
  selector:
    matchLabels:
      app: application
  template:
    metadata:
      labels:
        app: application
    spec:
      # this is what ties the pod to a service account
      serviceAccountName: application
      containers:
        - name: application
          image: "reg.domain.my/containers/application:0.6.0"
          imagePullPolicy: IfNotPresent
          env:
            - name: VAULT_URL
              value: https://vault.kube-system.svc:8200/v1
            - name: NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          ports:
            - name: application
              containerPort: 8443
How to use it
You can easily test that everything works as expected by using the HTTP API inside your application container. You only need to install curl and jq:
kubectl exec -it -n public application-xxxx -- sh
# if you use alpine
> sudo apk add curl jq
First we test that we can successfully log in with the Vault role public-application using the JWT service account token:
# we connect to vault using https. Its certificate is emitted by the same
# authority as the JWT token
VAULT_TOKEN=$(curl \
--silent \
--request POST \
--data "{\"role\": \"db1\", \"jwt\": \"$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)\"}" \
--cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
https://vault.kube-system.svc:8200/v1/auth/kubernetes/login \
| jq -r '.auth.client_token')
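Optionally, you can confirm which policies are attached to that token with the token self-lookup endpoint:
curl \
--silent \
--header "X-Vault-Token: $VAULT_TOKEN" \
--cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
https://vault.kube-system.svc:8200/v1/auth/token/lookup-self | jq '.data.policies'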
Then we test that this Vault role is allowed to generate database credentials by using the path /database/creds/db1, db1 being the Vault database role:
curl \
--silent \
--header "X-Vault-Token: $VAULT_TOKEN" \
--cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
https://vault.kube-system.svc:8200/v1/database/creds/db1 | jq
{
  "request_id": "94d452e8-c464-8539-2697-2b492b507a7f",
  "lease_id": "database/creds/db1/3uigqkeKAGDjQ9akJXo5jDnW",
  "renewable": true,
  "lease_duration": 3600,
  "data": {
    "password": "7Mq56fVT3KnvCAmQSF-B",
    "username": "v-kubernet-db1-vX5pY322GqFXRArv7gnL-1685970930"
  },
  "wrap_info": null,
  "warnings": null,
  "auth": null
}
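Scheduling the renewal is just one more call: the sys/leases/renew endpoint takes the lease_id returned above and an optional increment in seconds (a hint, capped by max_ttl):
curl \
--silent \
--request PUT \
--header "X-Vault-Token: $VAULT_TOKEN" \
--cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
--data '{"lease_id": "database/creds/db1/3uigqkeKAGDjQ9akJXo5jDnW", "increment": 3600}' \
https://vault.kube-system.svc:8200/v1/sys/leases/renew | jq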
As you can see, running a hundred-megabyte Vault agent as a sidecar to refresh secrets in a shared volume is a little overkill when all you have to do is call two API endpoints. No surprise that it doesn't scale well, and that HashiCorp is exploring other means.
Although it would be easy to script the renewal of secrets, I preferred to use rconfd to avoid any dependency on external tools. Developed in Rust, it is fast, light on resource consumption, memory- and thread-safe, and embeds a (pretty fast) jsonnet interpreter to generate configuration files with secrets, as well as a Vault client.
By using a side process with process supervision, we get something that is more scalable and simpler to operate than the classic sidecar + admission controller solution. An alpine golden image makes the whole solution easily composable, and streamlines the integration of these two components into all images that need short-lived secrets.
What we have so far
We have an application accessing a database with random roles and passwords (and possibly restricted privileges) that are rotated every hour and deleted after expiration.
Nothing but Vault has access to the database administrator password.
Only connections coming from a defined Kubernetes service and namespace are authorized.
We can identify which pod is associated with a given database role, and react to problems without disrupting the other pods or incurring downtime.
Éric BURGHARD