Record job failures with application context.

AT-AT needs to be able to track which user tasks failed and why. To accomplish this we: - Enabled Celery's results backend, which logs task results to a data store; a Postgres table, in our case. (https://docs.celeryproject.org/en/latest/userguide/tasks.html#result-backends) - Created tables to track the relationships between the relevant models (Environment, EnvironmentRole) and their task failures. - Added an `on_failure` hook that tasks can use. The hook will add records to the job failure tables. Now a resource like an `Environment` has access to it task failures through the corresponding failure table. Notes: - It might prove useful to use a real foreign key to the Celery results table eventually. I did not do it here because it requires that we explicitly store the Celery results table schema as a migration and add a model for it. In the current implementation, AT-AT can be agnostic about where the results live. - We store the task results indefinitely, so it is important to specify tasks for which we do not care about the results (like `send_mail`) via the `ignore_result` kwarg.
2019-09-05 11:07:51 -04:00
parent 1cbefb099b
commit 7010bdb09c
11 changed files with 158 additions and 2 deletions
--- a/tests/test_jobs.py
+++ b/tests/test_jobs.py
@@ -0,0 +1,41 @@
+import pytest
+
+from atst.jobs import RecordEnvironmentFailure, RecordEnvironmentRoleFailure
+
+from tests.factories import EnvironmentFactory, EnvironmentRoleFactory
+
+
+def test_environment_job_failure(celery_app, celery_worker):
+    @celery_app.task(bind=True, base=RecordEnvironmentFailure)
+    def _fail_hard(self, environment_id=None):
+        raise ValueError("something bad happened")
+
+    environment = EnvironmentFactory.create()
+    celery_worker.reload()
+
+    # Use apply instead of delay since we are testing the on_failure hook only
+    task = _fail_hard.apply(kwargs={"environment_id": environment.id})
+    with pytest.raises(ValueError):
+        task.get()
+
+    assert environment.job_failures
+    job_failure = environment.job_failures[0]
+    assert job_failure.task == task
+
+
+def test_environment_role_job_failure(celery_app, celery_worker):
+    @celery_app.task(bind=True, base=RecordEnvironmentRoleFailure)
+    def _fail_hard(self, environment_role_id=None):
+        raise ValueError("something bad happened")
+
+    role = EnvironmentRoleFactory.create()
+    celery_worker.reload()
+
+    # Use apply instead of delay since we are testing the on_failure hook only
+    task = _fail_hard.apply(kwargs={"environment_role_id": role.id})
+    with pytest.raises(ValueError):
+        task.get()
+
+    assert role.job_failures
+    job_failure = role.job_failures[0]
+    assert job_failure.task == task