Currently, we use both Python's built-in datetime library and Pendulum
to do datetime operations. For the sake of consistency, we should try to
stick to one library for datetimes. We could have used either, but
Pendulum has a more ergonomic API, so I decided to go with it when
possible.
The places where were we didn't / couldn't replace datetime are:
- checking instances of datetimes. Pendulum's objects are subclasses of
python native datetime objects, so it's still useful to import
datetime in those cases of using is_instance()
- WTForms date validators expect datetime style string formats --
Pendulum has its own format for formatting/ parsing strings. As such,
our custom validator DateRange needs to use datetime.stptime() to
account for this format.
To satisfy security requirements, we need to explicitly track:
- when a user attempts to log in, successful or not
- when a user logs out
- whether or not the user associated with a request is logged in
The first two are satisfied by extra log statements and the last is a
new boolean field on the JSON logs.
The previous solution (ad-hoc stream-parsing the CRLs to obtain their
issuers and nextUpdate) was too cute. It began breaking on CRLs that had
an addition hex 0x30 byte somewhere in their header. I thought that 0x30
was a reserved character only to be used for tags in ASN1 encoded with
DER; turns out that's not true. Rather than write a full-fledged ASN1
stream-parser, the simplest solution is to just maintain the list of
issuers as a constant in the codebase. This is fine because the issuer
for a specific CRL URI should not change. If it does, we've probably got
bigger problems.
This also removes the Flask app's functionality for updating the local
CRL cache. This is being handled out-of-band by a Kubernetes CronJob
and is not a concern of the app's. This means that instances of the
CRLCache do not have to explicitly track expirations for CRLs.
Previously, the in-memory dictionary or CRL issuers and locations
included expirations; now it is flattened to not include that
information.
The CRLCache class has been updated to accept a crl_list kwargs so that
unit tests can provide their own alternative CRL lists, since we now
hard-code the expected CRLs and issuers. The nightly CRL check job has
been updated to check that the hard-coded list of issuers matches what
we get when we actually sync the CRLs.
AT-AT needs to maintain a key-value CRL cache where each key is the DER
byte-string of the issuer and the value is a dictionary of the CRL file
path and expiration. This way when it checks a client certificate, it
can load the correct CRL by comparing the issuers. This is preferable to
loading all of the CRLs in-memory. However, it still requires that AT-AT
load and parse each CRL when the application boots. Because of the size
of the CRLs and their parsed, in-memory size, this leads to the
application spiking to use nearly 900MB of memory (resting usage is
around 50MB).
This change introduces a small function to ad-hoc parse the CRL and
obtain the information in the CRL we need: the issuer and the
expiration. It does this by reading the CRL byte-by-byte until it
reaches the ASN1 sequence that corresponds to the issuer, and then looks
ahead to find the nextUpdate field (i.e., the expiration date). The
CRLCache class uses this function to build its cache and JSON-serializes
the cache to disk. If another AT-AT application process finds the
serialized version, it will load that copy instead of rebuilding it. It
also entails a change to the function signature for the init method of
CRLCache: now it expects the CRL directory as its second argument,
instead of a list of locations.
The Python script invoked by `script/sync-crls` will rebuild the
location cache each time it's run. This means that when the Kubernetes
CronJob for CRLs runs, it will refresh the cache each time. When a new
application container boots, it will get the refreshed cache.
This also adds a nightly CircleCI job to sync the CRLs and test that the
ad-hoc parsing function returns the same result as a proper parsing
using the Python cryptography library. This provides extra insurance
that the function is returning correct results on real data.
The Celery worker cannot render URLs for the app without having a
SERVER_NAME value set. AT-AT's ability to send notifications when an
environment is ready is broken as a result.
This commit sets a null default value for SERVER_NAME in the default
config file. A setting must exist in the INI file in order to be
over-written by an environment variable, which is why we declare it as
null here. There is an additional kwarg, "allow_no_value", that must be
passed to ConfigParser to allow null values.
This also applies the correct domains as SERVER_NAME environment
variables in the Kubernetes ConfigMaps for the AWS and Azure Celery
workers.
Adjust the broken tests to use our dynamic fixtures for PKI files. Some
tests still rely on these fixtures, but this is a minimal patch to get
the test suite passing again. Eventually all tests should use the pytest
fixtures.