AT-AT needs to be able to track which user tasks failed and why. To accomplish this we:

- Enabled Celery's results backend, which logs task results to a data store; a Postgres table, in our case. (https://docs.celeryproject.org/en/latest/userguide/tasks.html#result-backends)
- Created tables to track the relationships between the relevant models (Environment, EnvironmentRole) and their task failures.
- Added an `on_failure` hook that tasks can use. The hook adds records to the job failure tables.

Now a resource like an `Environment` has access to its task failures through the corresponding failure table.

Notes:

- It might eventually prove useful to use a real foreign key to the Celery results table. I did not do it here because it would require us to explicitly store the Celery results table schema as a migration and add a model for it. In the current implementation, AT-AT can be agnostic about where the results live.
- We store the task results indefinitely, so it is important to mark tasks whose results we do not care about (like `send_mail`) via the `ignore_result` kwarg.
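To make the `on_failure` hook concrete, here is a minimal sketch of the pattern. It does not import Celery or AT-AT: `Task` is a tiny stand-in for `celery.Task`, and `JobFailure`, `FAILURES`, `RecordEnvironmentFailureSketch`, and `FailHard` are hypothetical names used only for illustration. The hook takes the same arguments as Celery's `on_failure(exc, task_id, args, kwargs, einfo)`. One simplification is that this stand-in's `apply` re-raises immediately, whereas Celery's `apply` returns a result object whose `get()` re-raises.

```python
import uuid
from dataclasses import dataclass
from typing import List


@dataclass
class JobFailure:
    """Hypothetical stand-in for a row in a job failure table."""
    resource_id: str
    task_id: str
    error: str


# Hypothetical in-memory "table"; the actual change stores rows in Postgres.
FAILURES: List[JobFailure] = []


class Task:
    """Tiny stand-in for celery.Task, just enough to show the hook."""

    def run(self, *args, **kwargs):
        raise NotImplementedError

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        """Called when run() raises; the default does nothing."""

    def apply(self, args=(), kwargs=None):
        kwargs = kwargs or {}
        task_id = str(uuid.uuid4())
        try:
            return self.run(*args, **kwargs)
        except Exception as exc:
            # Give the hook a chance to record the failure, then re-raise.
            self.on_failure(exc, task_id, args, kwargs, None)
            raise


class RecordEnvironmentFailureSketch(Task):
    """Base class whose hook links a failed task to an environment."""

    def on_failure(self, exc, task_id, args, kwargs, einfo):
        if "environment_id" in kwargs:
            FAILURES.append(
                JobFailure(kwargs["environment_id"], task_id, repr(exc))
            )


class FailHard(RecordEnvironmentFailureSketch):
    def run(self, environment_id=None):
        raise ValueError("something bad happened")
```

Calling `FailHard().apply(kwargs={"environment_id": "env-1"})` raises `ValueError` and leaves one `JobFailure` in `FAILURES`. Keeping the recording logic in a base class means a task opts in by choosing its base and does not have to repeat the bookkeeping.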
42 lines
1.4 KiB
Python
import pytest

from atst.jobs import RecordEnvironmentFailure, RecordEnvironmentRoleFailure

from tests.factories import EnvironmentFactory, EnvironmentRoleFactory


def test_environment_job_failure(celery_app, celery_worker):
    @celery_app.task(bind=True, base=RecordEnvironmentFailure)
    def _fail_hard(self, environment_id=None):
        raise ValueError("something bad happened")

    environment = EnvironmentFactory.create()
    celery_worker.reload()

    # Use apply instead of delay since we are testing the on_failure hook only
    task = _fail_hard.apply(kwargs={"environment_id": environment.id})
    with pytest.raises(ValueError):
        task.get()

    assert environment.job_failures
    job_failure = environment.job_failures[0]
    assert job_failure.task == task


def test_environment_role_job_failure(celery_app, celery_worker):
    @celery_app.task(bind=True, base=RecordEnvironmentRoleFailure)
    def _fail_hard(self, environment_role_id=None):
        raise ValueError("something bad happened")

    role = EnvironmentRoleFactory.create()
    celery_worker.reload()

    # Use apply instead of delay since we are testing the on_failure hook only
    task = _fail_hard.apply(kwargs={"environment_role_id": role.id})
    with pytest.raises(ValueError):
        task.get()

    assert role.job_failures
    job_failure = role.job_failures[0]
    assert job_failure.task == task