Creating the ATAT database requires a separate connection to one of the
default Postgres databases, like `postgres`. This updates the scripts
and secrets-tool command to handle creating the database. It also
removes database creation from Terraform and updates the documentation.
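A minimal sketch of that step, assuming psycopg2 and placeholder
connection details (the real logic lives in the updated scripts):

```python
import psycopg2

# Connect to the default "postgres" database; CREATE DATABASE cannot run
# against a database that does not exist yet, and it cannot run inside a
# transaction, hence autocommit. Host and credentials are placeholders.
conn = psycopg2.connect(
    dbname="postgres", host="localhost", user="postgres", password="postgres"
)
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("CREATE DATABASE atat")
conn.close()
```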
This additional secrets-tool command can be used to run the database
bootstrapping script (`script/database_setup.py`) inside an ATAT Docker
container against the Azure database. It sources the necessary keys from
Key Vault.
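For illustration, sourcing a credential from Key Vault with the Azure
SDK looks roughly like this; the vault URL and secret name are
placeholders, not the real ones:

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Placeholder vault URL and secret name; the real values come from config.
client = SecretClient(
    vault_url="https://example-vault.vault.azure.net",
    credential=DefaultAzureCredential(),
)
superuser_password = client.get_secret("postgres-root-password").value
```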
This script is for bootstrapping the initial database. It can be run via
a container, but requires that a Postgres superuser's credentials be
provided via our normal config. That way the superuser can provision a
less-privileged user for the application's database connection.
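A rough sketch of that provisioning step, with hypothetical role names
and psycopg2 standing in for the script's actual connection handling:

```python
import psycopg2

# The superuser connection comes from config; values here are placeholders.
conn = psycopg2.connect(
    dbname="atat", host="localhost", user="postgres", password="postgres"
)
conn.autocommit = True
with conn.cursor() as cur:
    # Create a login role with no special privileges for the application.
    cur.execute("CREATE ROLE atat_app WITH LOGIN PASSWORD %s", ("app-password",))
    # Grant only what the application connection needs.
    cur.execute("GRANT CONNECT ON DATABASE atat TO atat_app")
    cur.execute("GRANT USAGE ON SCHEMA public TO atat_app")
    cur.execute(
        "GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO atat_app"
    )
conn.close()
```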
Move cloud.py to a module init and move policy with it. Update the
related unit tests. Also add a patch to the state machine test to
prevent randomness in a mock from failing the test.
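For example, `unittest.mock.patch` can pin a randomized value so the
test behaves identically on every run; the patch target here is generic,
not the actual one used in the state machine test:

```python
import random
from unittest.mock import patch

# Patching random.choice removes the randomness; repeated runs now see
# the same value instead of an arbitrary one.
@patch("random.choice", return_value="fixed-value")
def test_choice_is_deterministic(mock_choice):
    assert random.choice(["a", "b"]) == "fixed-value"
```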
We do not have the bandwidth to keep the Minikube deployment up-to-date,
so rather than leave half-baked config in the repo we'll remove it for
now. Complications that would have to be resolved for running Minikube
locally include managing secrets out of Azure Key Vault and managing TLS
termination over localhost.
The Synack audit also identified the Minikube basic auth password as an
issue; it's only for demo purposes, but this will resolve that ticket.
This removes all error-catching from the test scripts. If unit tests
fail, the script will exit immediately. The error-catching functionality
was not working correctly under the sh shell in Alpine inside the
containers, so CI was allowed to continue after test failures.
We use curl in our integration test script to make sure the application container is
available before moving on. We expect many connection errors and don't care
about the output of curl, so this will just swallow all of the output.
Right now, unit test failures in script/cibuild are not being emitted
correctly. Instead, we'll just `set -e` at the top of the CI script so
that failures are fast and obvious.
Eventually, this should replace the CircleCI config for running the
integration tests to avoid duplication. In the interest of time so that
I don't have to debug broken builds, I'm only adding it as a utility
script.
This updates the script for resetting the database so that it drops and
recreates all the tables, instead of disabling Postgres triggers and
truncating most of the tables. The latter strategy requires superuser
permissions in Postgres that the db user we manage in Azure does not
have. The script now (see the sketch after this list):
- drops the tables
- reruns the alembic migrations
- reseeds the permission sets
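A condensed sketch of that flow, assuming SQLAlchemy and Alembic; the
connection string and config path are placeholders:

```python
import sqlalchemy
from alembic import command
from alembic.config import Config

# Placeholder connection string; the real script reads it from app config.
engine = sqlalchemy.create_engine("postgresql://localhost/atat")

# Reflect whatever tables exist and drop them all; this needs only
# ownership of the tables, not Postgres superuser rights.
meta = sqlalchemy.MetaData()
meta.reflect(bind=engine)
meta.drop_all(bind=engine)

# Rebuild the schema by replaying the migrations from scratch.
command.upgrade(Config("alembic.ini"), "head")

# Reseeding the permission sets would follow here, via the app's seed logic.
```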
Renamed this script because its previous name was misleading. It does
not just remove sample data; it truncates every table except the alembic
version table and `permission_sets`.
This generalizes the deploy step into a configurable CircleCI command.
The available parameters are:
- `namespace`: the K8s namespace to alter
- `tag`: the docker tag to apply to the image
The script for applying migrations to the K8s environment and the
corresponding K8s Job config have been generalized so that they can be
configured to run in the specified namespace.
The main workflow has been updated so that the appropriate deployment
will happen, depending on whether we are merging to staging or master.
In the future, we could look to add an additional workflow based around
Git tags for production.
Note that this also removes the creation of the `latest` tag from CD.
That tag is no longer hard-coded into our K8s config and so there's no
longer a need to update it in our container registry.
The CircleCI Orbs were useful for getting started, but now that we only
have to deploy to one provider our pipeline should be tailored to
efficiently push to just that environment. This inlines all the relevant
pieces from the Orbs we were relying on as bash/sh commands instead.
This builds the Docker images upfront. Since we have a multi-stage
Dockerfile, it builds the first stage as a separate image and then
proceeds to build the complete image. This is done so that the first
stage (called "builder") can be used for testing. It retains executables
like pipenv that we need in order to install the development
dependencies used by the tests.
Other notes:
- CircleCI does not persist Docker images between jobs. As a
work-around, we use the CircleCI caching mechanism to create a named
cache with *.tar copies of the images. Subsequent jobs use the cache
and load the images.
- Both the test and integration-tests jobs need to make minor
modifications to the container to run correctly. The test job needs to
install the development Python dependencies, and the integration-tests
job needs to rebuild the JS bundle so that it uses the mock uploader
(the container is built to use the Azure uploader by default).
- The test and integration-tests jobs run in parallel.
- This adjusts the Dockerfile so that the TZ environment variable is set
for both stages of the build.
Because I pushed the environment variable changes to the cluster
already, psycopg2 was automatically trying to connect to the database
using the certificate file specified in PGSSLROOTCERT. The ConfigMap
providing that file was not mounted into the migrations container, so
I'm doing that here.
The Minikube version of the cluster has some differences from the main
config (noted in the README) but will be useful for future DevOps
development.
Add a command to the test script to output up-to-date Vue component
templates. Most of the Vue component tests rely on HTML templates built
from Jinja.
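A minimal sketch of that generation step with Jinja; the template and
output paths are assumptions:

```python
from pathlib import Path

from jinja2 import Environment, FileSystemLoader

# Hypothetical locations: render each Jinja template the Vue tests rely
# on into a static HTML fixture the JS test runner can load.
env = Environment(loader=FileSystemLoader("templates/components"))
out_dir = Path("js/test_templates")
out_dir.mkdir(parents=True, exist_ok=True)

for name in env.list_templates(extensions=["html"]):
    html = env.get_template(name).render()
    (out_dir / name).write_text(html)
```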
The beat schedule is set to once per minute for each of the three
environment provisioning tasks.
Adding a beat schedule surfaced two problems that are addressed here
with the following changes (sketched after the list):
- Commit the SQLAlchemy session in order to release the environment
lock. Otherwise the change to the `claimed_until` field is not
persisted.
- Set `none_as_null` on the JSONB fields on the `Environment`. This
avoids problems with querying on Postgres JSON fields that are empty.
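Sketches of both pieces; the task name and schedule entry are
illustrative, and the column is shown out of context:

```python
from celery import Celery
from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import JSONB

celery = Celery("atst")

# One beat entry per provisioning task, each running once per minute.
celery.conf.beat_schedule = {
    "dispatch-create-environment": {
        "task": "atst.jobs.dispatch_create_environment",  # name assumed
        "schedule": 60.0,
    },
}

# none_as_null stores a Python None as SQL NULL rather than JSON null,
# which keeps queries against empty JSON fields well-behaved. On the
# Environment model this looks roughly like:
baseline_info = Column(JSONB(none_as_null=True))
```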
This also adds a small change to the development command for the Celery
worker. Multiple child processes were executing the beat jobs, which
led to exceptions for environment locks and confusing log output. This
constrains the dev command to a single Celery worker.
The Kubernetes CronJob for syncing CRLs syncs them to a temporary folder
and then copies them to the real location once the sync is complete. If
the temporary folder is empty, the `cp` command throws an error. This
updates the bash script that manages the sync so that it will skip the
copy command if the temporary location is empty.
Celery provides a more robust set of queueing options for both tasks and
worker processes. Updates include (a minimal sketch follows the list):
- infrastructure necessary to run Celery, including a Celery entrypoint
- backgrounded functions are now imported directly from `atst.jobs`
- tests updated as needed
- the Kubernetes worker pod command updated
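A minimal entrypoint sketch, assuming a Redis broker URL and an
illustrative task:

```python
# atst/jobs.py (sketch): background functions become plain Celery tasks
# that callers import directly.
from celery import Celery

celery = Celery("atst", broker="redis://localhost:6379/0")  # broker assumed

@celery.task(bind=True)
def send_mail(self, recipients, subject, body):
    ...  # the actual work is elided here

# Callers enqueue work with, e.g.:
#   from atst.jobs import send_mail
#   send_mail.delay(["user@example.com"], "Subject", "Body")
```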
`script/seed_sample.py` was creating portfolio users with no names
because it was calling `Users.get_or_create_by_dod_id` with a DOD ID as
its only argument. This updates it to pass the rest of the profile
information for the sample user.
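Roughly, the fix looks like this; the import path and profile fields are
assumptions about the sample data:

```python
from atst.domain.users import Users  # import path assumed

# Before: only a DOD ID, so the created user had no name or email.
user = Users.get_or_create_by_dod_id("1234567890")

# After: pass the rest of the sample user's profile as well.
user = Users.get_or_create_by_dod_id(
    "1234567890",
    first_name="Sam",
    last_name="Seeder",
    email="sam.seeder@example.com",
)
```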
This adds the following:
- A detect-secrets dependency and a related script
(`script/detect_secrets`) to find and alert developers to secrets
added to the code. By default, the script will search staged and new,
unstaged files. It can optionally search only staged files.
- A whitelist, `.secrets.baseline`, that tracks instances of secrets or
false positives already in the repo.
- A modification to `script/test` that runs detect-secrets as part of
the test suite.
- Updates to the README regarding the use of detect-secrets.