Currently, we use both Python's built-in datetime library and Pendulum
to do datetime operations. For the sake of consistency, we should try to
stick to one library for datetimes. We could have used either, but
Pendulum has a more ergonomic API, so I decided to go with it when
possible.
The places where we didn't or couldn't replace datetime are:
- checking instances of datetimes. Pendulum's objects are subclasses of
Python's native datetime objects, so it's still useful to import
datetime for isinstance() checks
- WTForms date validators expect datetime-style string formats, while
Pendulum has its own format for formatting and parsing strings. As such,
our custom DateRange validator needs to use datetime.strptime() to
account for this format (both cases are sketched below).
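A minimal sketch of both cases, assuming Pendulum 2.x and a plain stdlib format string; the helper below is illustrative rather than the actual DateRange implementation:

```python
from datetime import datetime

import pendulum

# Pendulum datetimes subclass the stdlib datetime, so isinstance() checks
# against datetime.datetime continue to work.
now = pendulum.now()
assert isinstance(now, datetime)

# WTForms-style validators hand us strings in stdlib strptime formats, so we
# parse with datetime.strptime() instead of pendulum.from_format().
def parse_form_date(value, fmt="%m/%d/%Y"):  # illustrative helper, not DateRange itself
    return datetime.strptime(value, fmt)
```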
This change allows the newly made database user to apply migrations.
It also includes a very Azure-specific change. Say we have an Azure
Postgres database user "root", which is the user making the database
connections for this script, and it is creating an "atat" user/role.
That root user will be a member of the azure_pg_admin group. In order
for root to change the ownership of the tables in the database to
atat, it needs to have membership in the atat role. To achieve this we
grant azure_pg_admin the atat role.
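For illustration, the grant and the subsequent ownership change might look roughly like this when issued from the bootstrapping script (SQL via psycopg2; the DSN and table name are placeholders):

```python
import psycopg2

# Connect as the Azure "root" user, which is a member of azure_pg_admin.
conn = psycopg2.connect("postgresql://root@example-host:5432/atat")  # placeholder DSN
conn.autocommit = True

with conn.cursor() as cur:
    # Membership in the atat role is what lets root reassign ownership.
    cur.execute("GRANT atat TO azure_pg_admin")
    cur.execute("ALTER TABLE example_table OWNER TO atat")  # placeholder table name
```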
- Transition to VMSS identity for flexvol
- Update some environment variables for cloudzero dev
- Overlay for applying migrations
- Updates to disable CDN, which will not be available
- Removes the CronJob for resetting the database; we don't need it in
this cluster for now.
Creating the ATAT database requires a separate connection to one of the
default Postgres databases, like `postgres`. This updates the scripts
and secrets-tool command to handle creating the database. It also
removes database creation from Terraform and updates the documentation.
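A minimal sketch of that separate connection, assuming psycopg2; CREATE DATABASE cannot run inside a transaction, so the connection is set to autocommit (the DSN is a placeholder):

```python
import psycopg2

# Connect to the default "postgres" database; the ATAT database does not
# exist yet, so we cannot connect to it directly.
conn = psycopg2.connect("postgresql://root@example-host:5432/postgres")  # placeholder DSN
conn.autocommit = True  # CREATE DATABASE is not allowed inside a transaction block

with conn.cursor() as cur:
    cur.execute("CREATE DATABASE atat")
```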
This additional secrets-tool command can be used to run the database
bootstrapping script (`script/database_setup.py`) inside an ATAT docker
container against the Azure database. It sources the necessary keys from
Key Vault.
This script is for bootstrapping the initial database. It can be run via
a container, but requires that a Postgres superuser's credentials be
provided via our normal config. That way the superuser can provision a
less-privileged user for the application's database connection.
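Provisioning that less-privileged user might look something like the following sketch; the role name, password handling, and granted privileges are assumptions, not the script's exact statements:

```python
import psycopg2

# Connect with the superuser credentials supplied through the normal config.
conn = psycopg2.connect("postgresql://superuser@example-host:5432/atat")  # placeholder DSN
conn.autocommit = True

with conn.cursor() as cur:
    # Create an application role with only the access it needs.
    cur.execute("CREATE ROLE atat_app WITH LOGIN PASSWORD %s", ("example-password",))
    cur.execute("GRANT CONNECT ON DATABASE atat TO atat_app")
    cur.execute("GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO atat_app")
```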
Move cloud.py to a module init and move policy with it. Update related unit tests. Also add a patch to the state machine test to prevent randomness in the mock from failing the test.
We do not have the bandwidth to keep the Minikube deployment up-to-date,
so rather than leave half-baked config in the repo we'll remove it for
now. Complications that would have to be resolved for running Minikube
locally include managing secrets out of Azure Key Vault and managing TLS
termination over localhost.
The Synack audit also identified the Minikube basic auth password as an
issue; it's only for demo purposes, but this will resolve that ticket.
This removes all error-catching from the test scripts. If unit tests
fail, the script will exit immediately. The error-catching functionality
was not working correctly under the sh shell in Alpine inside the
containers, so CI was allowed to continue after test failures.
We use curl in our integration test script to make sure the application container is
available before moving on. We expect many connection errors and don't care
about the output of curl, so this will just swallow all of the output.
Right now, unit test failures in script/cibuild are not being emitted
correctly. Instead, we'll just `set -e` at the top of the CI script so
that failures are fast and obvious.
Eventually, this should replace the CircleCI config for running the
integration tests to avoid duplication. In the interest of time, and so
that I don't have to debug broken builds, I'm only adding it as a utility
script.
This updates the script for resetting the database so that it drops and
recreates all the tables, instead of disabling Postgres triggers and
truncating most of the tables. The latter strategy requires superuser
permissions in Postgres that the db user we manage in Azure does not
have. The script now:
- drops the tables
- reruns the alembic migrations
- reseeds the permission sets
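Roughly, the reset flow looks like the sketch below; the import paths and the seeding helper are assumptions about the app's layout rather than exact names:

```python
from alembic import command
from alembic.config import Config

from atst.database import db                             # assumed engine/session location
from atst.models import Base                             # assumed declarative base
from atst.domain.permission_sets import PermissionSets   # assumed seeding helper

def reset_database():
    # Drop every model table, plus the alembic version table.
    Base.metadata.drop_all(bind=db.engine)
    db.engine.execute("DROP TABLE IF EXISTS alembic_version")

    # Re-run all migrations from scratch.
    command.upgrade(Config("alembic.ini"), "head")

    # Reseed the permission sets.
    PermissionSets.create_all()
```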
Renamed this script because its current name is misleading. It does not
just remove sample data; it truncates every table except the alembic
version table and `permission_sets`.
This generalizes the deploy step into a configurable CircleCI command.
The available parameters are:
- `namespace`: the K8s namespace to alter
- `tag`: the docker tag to apply to the image
The script for applying migrations to the K8s environment and the
corresponding K8s Job config have been generalized so that they can be
configured to run in the specified namespace.
The main workflow has been updated so that the appropriate deployment
will happen, depending on whether we are merging to staging or master.
In the future, we could look to add an additional workflow based around
Git tags for production.
Note that this also removes the creation of the `latest` tag from CD.
That tag is no longer hard-coded into our K8s config and so there's no
longer a need to update it in our container registry.
The CircleCI Orbs were useful for getting started, but now that we only
have to deploy to one provider our pipeline should be tailored to
efficiently push to just that environment. This inlines all the relevant
pieces from the Orbs we were relying on as bash/sh commands instead.
This builds the Docker images upfront. Since we have a multi-stage
Dockerfile, it builds the first stage as a separate image and then
proceeds to build the complete image. This is done so that the first
stage (called "builder") can be used for testing. It retains executables
like pipenv, which we need in order to install the development
dependencies used by the tests.
Other notes:
- CircleCI does not persist Docker images between jobs. As a
work-around, we use the CircleCI caching mechanism to create a named
cache with *.tar copies of the images. Subsequent jobs use the cache
and load the images.
- Both the test and integration-tests jobs need to make minor
modifications to the container to run correctly. The test job needs to
install the development Python dependencies, and the integration-tests
job needs to rebuild the JS bundle so that it uses the mock uploader
(the container is built to use the Azure uploader by default).
- The test and integration-tests jobs run in parallel.
- This adjusts the Dockerfile so that the TZ environment variable is set
for both stages of the build.
Because I pushed the environment variable changes to the cluster
already, psycopg2 was automatically trying to connect to the database
using the file specified in PGSSLROOTCERT. The ConfigMap that provides
that cert was not mounted into the migrations container, so I'm mounting
it here.
The Minikube version of the cluster has some differences from the main
config (noted in the README) but will be useful for future DevOps
development.
Add a command to the test script to output up-to-date Vue component
templates. Most of the Vue component tests rely on HTML templates built
from Jinja.
The beat schedule is set to once per minute for each of the three
environment provisioning tasks.
Adding a beat schedule surfaced two problems that are addressed here
with the following changes:
- Commit the SQLAlchemy session in order to release the environment
lock. Otherwise the change to the `claimed_until` field is not
persisted.
- Set `none_as_null` on the JSONB fields on the `Environment` model
(sketched below). This avoids problems with querying Postgres JSON
fields that are empty.
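A minimal sketch of both fixes, assuming the Celery app object and a trimmed-down Environment model; the task name and column name are illustrative:

```python
from celery import Celery
from sqlalchemy import Column, Integer
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.ext.declarative import declarative_base

celery = Celery("atst")

# Run each environment provisioning task once per minute.
celery.conf.beat_schedule = {
    "dispatch-provision-environments": {
        "task": "atst.jobs.dispatch_provision_environments",  # illustrative task name
        "schedule": 60.0,
    },
}

Base = declarative_base()

class Environment(Base):  # trimmed-down illustration of the real model
    __tablename__ = "environments"

    id = Column(Integer, primary_key=True)
    # none_as_null stores Python None as SQL NULL instead of JSON null, which
    # avoids problems when querying empty JSONB fields.
    baseline_info = Column(JSONB(none_as_null=True))  # illustrative column name

def release_environment_lock(session, environment):
    environment.claimed_until = None
    # Without the commit, the claimed_until change (the environment lock)
    # is never persisted and the lock is not released.
    session.commit()
```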
This also adds a small change to the development command for the Celery
worker. Multiple child processes were executing the beat jobs, which
led to exceptions for environment locks and confusing log output. This
constrains the dev command to a single Celery worker.
The Kubernetes CronJob for syncing CRLs syncs them to a temporary folder
and then copies them to the real location once the sync is complete. If
the temporary folder is empty, the `cp` command throws an error. This
updates the bash script that manages the sync so that it will skip the
copy command if the temporary location is empty.
Celery provides a more robust set of queueing options for both tasks and
worker processes. Updates include:
- infrastructure necessary to run Celery, including a celery entrypoint
- backgrounded functions are now imported directly from atst.jobs
- update tests as needed
- update the Kubernetes worker pod command
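For reference, an entrypoint along these lines is a reasonable sketch; the module name and broker URL are assumptions, not the repo's exact layout:

```python
# celery_worker.py -- illustrative entrypoint
from celery import Celery

celery = Celery(
    "atst",
    broker="redis://localhost:6379/0",  # assumed broker URL
    include=["atst.jobs"],              # background tasks are imported from atst.jobs
)

if __name__ == "__main__":
    # Equivalent to `celery -A celery_worker worker --loglevel=info`.
    celery.worker_main(["worker", "--loglevel=info"])
```

Callers then enqueue background work with `.delay()` on tasks imported from atst.jobs.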