uWSGI was generating warnings about being unable to find the plugin
files we specify. To fix this, I've added uwsgi-python3 to the list of
Alpine packages we install in the container and specified the plugins
directory in the uWSGI config. The updated uWSGI ConfigMap has been
applied to the
staging cluster, which eliminated the warning about the logfile plugin.
The remaining warning about the python3 plugin will be eliminated once
the new container built by this branch is deployed.
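For reference, a minimal sketch of the relevant uwsgi.ini lines,
assuming Alpine's default plugin directory (the exact path may differ
in our image):

```ini
[uwsgi]
; directory where uwsgi-python3 installs its plugin on Alpine (assumed)
plugins-dir = /usr/lib/uwsgi
; load the plugins we were previously getting warnings about
plugins = python3, logfile
```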
I had left the domains for the redirects hard-coded in our NGINX
config, which was breaking authentication for versions of the site that
don't use that domain. This updates the config to use the domains
supplied via environment variable.
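NGINX can't read environment variables in its config directly, so this
sketch assumes the config is rendered from a template (e.g. with
envsubst) when the container starts; the variable and location are
illustrative:

```nginx
# ${MAIN_DOMAIN} is substituted before NGINX loads the config
location / {
    return 301 https://${MAIN_DOMAIN}$request_uri;
}
```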
This got lost somewhere along the way (almost certainly by me), so this
commit tries to make it explicit. The app needs to be able to configure
the session cookie domain name so that it is valid for both the main
site domain and the authentication subdomain. For instance, if the site
is running at uat.atat.code.mil and authentication happens at
auth-uat.atat.code.mil, SESSION_COOKIE_DOMAIN should be set to
atat.code.mil so that it's valid for both.
This adds the setting to the base INI file and a default for our K8s
clusters.
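A sketch of what the cluster-level default could look like, assuming it
is supplied through the environment-variable ConfigMap (names are
illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: atst-envvars
data:
  # shared parent domain so the cookie is valid for both
  # uat.atat.code.mil and auth-uat.atat.code.mil
  SESSION_COOKIE_DOMAIN: atat.code.mil
```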
This does the following:
- Removes references to the atst-override.ini file, now deprecated.
- Adds all non-secret data that was managed in the override file to the
relevant K8s ConfigMaps.
- Adds additional documentation explaining our use of Key Vault for
secrets management.
This adds an additional volume mount for Flask application secrets.
These will be mounted into the ATST container so that their values can
be read in as config.
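A sketch of the shape of the mount; the volume is shown here as a plain
K8s Secret for illustration, and the names and paths are assumptions:

```yaml
spec:
  containers:
    - name: atst
      volumeMounts:
        - name: flask-secret
          mountPath: /config
          readOnly: true
  volumes:
    - name: flask-secret
      secret:
        secretName: atst-flask-secrets
```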
This updates the configuration handling for the Redis connection string.
The motivation is so that the Redis password can be managed separately
via Azure Key Vault and eventually be rotated independently of the rest
of the connection URI.
This also tweaks the method we use to build the DATABASE_URI and removes
some stale config from the CI config file.
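A hypothetical sketch of the assembly logic; the config key names are
illustrative:

```python
def make_redis_uri(config):
    """Build the Redis URI from parts so the password can be sourced
    from Key Vault and rotated independently of the host setting."""
    password = config.get("REDIS_PASSWORD")
    auth = f":{password}@" if password else ""
    return f"redis://{auth}{config['REDIS_HOST']}"

# make_redis_uri({"REDIS_HOST": "redis:6379", "REDIS_PASSWORD": "s3cret"})
# -> "redis://:s3cret@redis:6379"
```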
The K8s CronJob that manages CRL syncing often leaves pods hanging
around for days at a time. This appears to happen when the download of a
particular CRL from DISA hangs for whatever reason. This updates the
configuration so that a running cronjob is automatically replaced by its
successor, rather than the two running concurrently. (The CRL CronJob
runs every hour, and if one has taken that long then it's hanging and
needs to be replaced.) Similarly, this updates the config to only retain
one successful CRL pod, rather than the default of three.
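The change boils down to two fields on the CronJob spec (the name and
schedule here are illustrative):

```yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: crls
spec:
  schedule: "0 * * * *"
  # replace a still-running job instead of running two concurrently
  concurrencyPolicy: Replace
  # keep only one successful job/pod instead of the default three
  successfulJobsHistoryLimit: 1
  # (jobTemplate omitted; unchanged)
```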
FlexVol requires that you specify certificates as secrets in order to get both the certificate and private key in the appropriate format for NGINX to consume. Additionally, FlexVol shouldn't interfere with other secrets mounted in its host directory.
This adds additional SSL/TLS config to specify the acceptable TLS
version, cipher suites, session cache, etc. Values are currently based
on the Mozilla Foundation's recommendations for intermediate
compatibility:
https://wiki.mozilla.org/Security/Server_Side_TLS
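A sketch of the kinds of directives involved; the exact values should
be taken from the intermediate profile at the link above (the cipher
list here is abridged):

```nginx
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384;
ssl_prefer_server_ciphers off;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 1d;
ssl_session_tickets off;
```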
We will manage NGINX configuration snippets as a K8s ConfigMap so that
they can be included in server blocks as-needed.
This configures the NGINX container to log in JSON. It also updates the
K8s config so that we mount all of the key/value pairs available in the
atst-nginx ConfigMap as files in "/etc/nginx/conf.d" inside the
container. This simplifies the config a little.
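A sketch of what the JSON access-log format might look like (field
selection is illustrative; `escape=json` requires NGINX 1.11.8+):

```nginx
log_format json_combined escape=json
  '{"time":"$time_iso8601",'
  '"remote_addr":"$remote_addr",'
  '"request":"$request",'
  '"status":"$status",'
  '"body_bytes_sent":"$body_bytes_sent",'
  '"request_time":"$request_time"}';

access_log /var/log/nginx/access.log json_combined;
```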
Turns out you can't map multiple K8s resources over the same directory.
The K8s secret for the INI file and the ConfigMap for the uWSGI config
both map into /opt/atat/atst in the container. This caused errors when
the container tried to launch. Instead, we need to specify the full file
path for every file we're mapping into that directory to avoid
conflicts.
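One way to express the full-file-path mounts is a subPath mount per
file, roughly as follows (the file and volume names are assumptions):

```yaml
volumeMounts:
  # each mount targets a single file so the two sources can coexist
  # under /opt/atat/atst
  - name: atst-config-ini
    mountPath: /opt/atat/atst/atst.ini
    subPath: atst.ini
  - name: uwsgi-config
    mountPath: /opt/atat/atst/uwsgi.ini
    subPath: uwsgi.ini
```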
Updates the K8s config to enable extended uWSGI JSON logging again. This
commit updates the name of the ConfigMap for the uWSGI config to avoid
confusion.
This commit is the first part of consuming secrets from Azure Key Vault. It sets up the required services to use Azure's RBAC controls in the cluster, an identity that can read the secrets, and the tool (FlexVol) to mount the secrets.
We should not run a redundant testing workflow on merges to master or
staging.
This also includes a quick fix to configure the FLASK_ENV for the main
site.
Our content security policy in non-dev environments didn't allow uploading to Azure blob storage. This adds a configurable blob storage base URL so that regions can specify which storage endpoint they expect the upload request to use.
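A hypothetical sketch of how the configured base URL could feed the
policy (the directive selection and names are illustrative, not our
exact CSP):

```python
def content_security_policy(blob_storage_base_url):
    # connect-src must include the storage endpoint so the browser can
    # send the upload request directly to Azure blob storage
    return {
        "default-src": ["'self'"],
        "connect-src": ["'self'", blob_storage_base_url],
    }
```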
This value is set as the Access-Control-Allow-Origin header value for
the application. When using Azure CDN, the CDN will consume this header
when it populates its cache and use it on subsequent requests.
It would be possible to make this the same as the Flask SERVER_NAME
value. We explicitly set SERVER_NAME for Celery worker processes because
they need that information to construct URLs outside of the request cycle
(Flask can infer the server name within a request cycle). I decided not
to rely on SERVER_NAME though because it has side effects:
- It determines what `url_for` uses as the host domain (which would be
fine).
- It makes it so that the Flask app can only serve requests to that
domain (probably fine, but it felt like too big a side effect).
Additionally, SERVER_NAME does not include the scheme. For all of these
reasons I opted to make CDN_ORIGIN a separate config value.
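A hypothetical sketch of attaching the header from the new config value
(the hook and example value are illustrative):

```python
from flask import Flask

app = Flask(__name__)
app.config["CDN_ORIGIN"] = "https://uat.atat.code.mil"  # illustrative

@app.after_request
def add_cors_header(response):
    # Azure CDN caches this header alongside the response and replays
    # it on cache hits
    response.headers["Access-Control-Allow-Origin"] = app.config["CDN_ORIGIN"]
    return response
```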
Supplying this will prevent queue clashes between various ATAT sites
sharing the same Redis instance.
Note that the Celery documentation is currently wrong about the name for
configuring this:
https://docs.celeryproject.org/en/latest/userguide/configuration.html#std:setting-task_default_queue
It specifies `CELERY_TASK_DEFAULT_QUEUE`, but
`CELERY_DEFAULT_QUEUE` is the value that Celery currently looks for.
This appears to be fixed in an upcoming release:
https://github.com/celery/celery/issues/5575
This is worth keeping an eye on, since the configuration key could
change in the future.
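A sketch of wiring the setting through under the key Celery currently
honors (the broker URL and queue name are illustrative):

```python
from celery import Celery

celery = Celery("atst", broker="redis://localhost:6379/0")
# note: CELERY_DEFAULT_QUEUE, not CELERY_TASK_DEFAULT_QUEUE (see above)
celery.conf.update(CELERY_DEFAULT_QUEUE="atst-uat-queue")
```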
This is not the certificate setup we will use in production. I'd like to
merge this configuration as a reference point because this is the
easiest way to handle manual LetsEncrypt verification within the
cluster.
This allows NGINX to serve static files over HTTP from the
".well-known/acme-challenge" directory, which is necessary for certbot
validation of domain ownership.
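A sketch of the HTTP server block involved (the webroot path is an
assumption):

```nginx
server {
    listen 80;

    # serve certbot's challenge files over plain HTTP
    location /.well-known/acme-challenge/ {
        root /usr/share/nginx/html;
    }

    # everything else still redirects to HTTPS
    location / {
        return 301 https://$host$request_uri;
    }
}
```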
This generalizes the deploy step into a configurable CircleCI command.
The available parameters are:
- `namespace`: the K8s namespace to alter
- `tag`: the docker tag to apply to the image
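A sketch of what the command definition could look like in
`.circleci/config.yml` (the step contents and script path are
illustrative):

```yaml
commands:
  deploy:
    parameters:
      namespace:
        type: string
      tag:
        type: string
    steps:
      - run:
          name: Deploy << parameters.tag >> to << parameters.namespace >>
          command: ./deploy/update.sh << parameters.namespace >> << parameters.tag >>
```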
The script for applying migrations to the K8s environment and the
corresponding K8s Job config have been generalized so that they can be
configured to run in the specified namespace.
The main workflow has been updated so that the appropriate deployment
will happen, depending on whether we are merging to staging or master.
In the future, we could look to add an additional workflow based around
Git tags for production.
Note that this also removes the creation of the `latest` tag from CD.
That tag is no longer hard-coded into our K8s config and so there's no
longer a need to update it in our container registry.
Adds a [kustomize](https://github.com/kubernetes-sigs/kustomize) overlay
for a new staging environment. Additionally, it adds environment
variables in place of certain pieces of information that need to be
templated.
The K8s README ("deploy/README.md") has been updated to reflect the new
method for applying config.
This commit also removes the configuration for the AWS cluster and
references to AWS in the README.
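For orientation, a sketch of the shape of the overlay's
kustomization.yaml (the base path and namespace are assumptions):

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: staging
bases:
  - ../../azure
```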
This will allow us to force SSL connections to the database in
production by setting two values:
- PGSSLMODE should be set to "verify-full". This forces the client to
verify the server against a known CA: https://www.postgresql.org/docs/10/libpq-ssl.html
- PGSSLROOTCERT should be set to the path of the public cert for the
relevant CA.
When the database connection is made, these values are passed to the
adapter. For local development, PGSSLMODE is set to "prefer" and
PGSSLROOTCERT is left unset.
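A hypothetical sketch of how the values reach the adapter (libpq also
reads these environment variables directly, so this is illustrative
rather than our exact wiring):

```python
import os
from sqlalchemy import create_engine

connect_args = {"sslmode": os.environ.get("PGSSLMODE", "prefer")}
if os.environ.get("PGSSLROOTCERT"):
    # path to the CA public cert mounted into the container
    connect_args["sslrootcert"] = os.environ["PGSSLROOTCERT"]

engine = create_engine(os.environ["DATABASE_URI"], connect_args=connect_args)
```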
Kubernetes config has been added to maintain the root CAs for both Azure
and AWS as k8s ConfigMap objects. These are mounted into the containers
and referenced by PGSSLROOTCERT in the container environment.
Because I pushed the environment variable changes to the cluster
already, psycopg2 was automatically trying to connect to the database
using the file specified in PGSSLROOTCERT. That ConfigMap was not
mounted into the migrations container, so I'm doing that here.
The Celery worker cannot render URLs for the app without having a
SERVER_NAME value set. AT-AT's ability to send notifications when an
environment is ready is broken as a result.
This commit sets a null default value for SERVER_NAME in the default
config file. A setting must exist in the INI file in order to be
over-written by an environment variable, which is why we declare it as
null here. There is an additional kwarg, "allow_no_value", that must be
passed to ConfigParser to allow null values.
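A minimal demonstration of the ConfigParser behavior described above:

```python
import configparser

parser = configparser.ConfigParser(allow_no_value=True)
# a bare key parses as None, which an environment variable can then
# override in our config loading
parser.read_string("[default]\nSERVER_NAME\n")
print(parser["default"]["SERVER_NAME"])  # None
```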
This also applies the correct domains as SERVER_NAME environment
variables in the Kubernetes ConfigMaps for the AWS and Azure Celery
workers.
With this configuration, all Kubernetes logs within the ATAT cluster
will be sent to AWS CloudWatch.
Note that this requires applying an additional IAM policy to the worker
nodes' role.