Run Celery Tasks in the Background with a Django App in AWS Elastic Beanstalk

Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of Web development, so you can focus on writing your app without needing to reinvent the wheel. It’s free and open source.

Celery is a task queue implementation for Python web applications, used to execute work asynchronously outside the HTTP request-response cycle. It’s a simple, flexible, and reliable distributed system that can process vast amounts of messages while providing operations with the tools required to maintain such a system. It’s a task queue with a focus on real-time processing that also supports task scheduling.
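To make this concrete, here is a minimal sketch of wiring Celery into a Django project; the project name myappname is a placeholder chosen to match the -A flag in the deployment script below, and the task itself is purely illustrative.

# myappname/celery.py -- standard Celery bootstrap for a Django project (sketch)
import os

from celery import Celery

# Point Celery at the Django settings module before creating the app.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myappname.settings")

app = Celery("myappname")

# Read any CELERY_-prefixed settings from Django settings.
app.config_from_object("django.conf:settings", namespace="CELERY")

# Find tasks.py modules in all installed Django apps.
app.autodiscover_tasks()

# myapp/tasks.py -- an illustrative task
from celery import shared_task

@shared_task
def send_welcome_email(user_id):
    # Work that runs on the worker, outside the request-response cycle.
    ...

A view then queues work with send_welcome_email.delay(user.id) and returns immediately; the worker picks the task up from the broker.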

If you’re running a Django app in Beanstalk and need a Celery background task with it, run Celery as a daemon in the background. Beanstalk’s Python platform already uses supervisord to run some daemon processes, and you can leverage that to run celeryd as well.

Using an .ebextensions config, create a celeryd config file in Beanstalk after the main app deployment completes. The script also sets environment variables and other configuration for the Celery task; in particular, if the background task interacts with other AWS services, it’ll need AWS access keys and the like.
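The Django side then just reads those values from the environment. A minimal sketch of the relevant settings, assuming a Redis broker; the variable names here are illustrative, not something the script below requires:

# settings.py (sketch) -- Celery picks up CELERY_-prefixed values via its Django integration
import os

# Broker the worker connects to (Redis here; RabbitMQ or SQS work as well).
CELERY_BROKER_URL = os.environ.get("CELERY_BROKER_URL", "redis://localhost:6379/0")

# Credentials for tasks that talk to other AWS services. boto3 also reads
# AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment on its own.
AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID", "")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY", "")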

Here’s an example script from Stack Overflow. Put this in a .config file in .ebextensions.

files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash

      # Get the Django environment variables and rewrite them into supervisord's
      # comma-separated KEY=value notation: join lines with commas, drop "export",
      # map $PATH to supervisord's %(ENV_PATH)s expansion, and strip
      # $PYTHONPATH / $LD_LIBRARY_PATH self-references
      celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
      # Strip the trailing comma
      celeryenv=${celeryenv%?}

      # Create the celery supervisord configuration
      celeryconf="[program:celeryd]
      ; Set full path to celery program if using virtualenv
      command=/opt/python/run/venv/bin/celery worker -A myappname --loglevel=INFO

      directory=/opt/python/current/app
      user=nobody
      numprocs=1
      stdout_logfile=/var/log/celery-worker.log
      stderr_logfile=/var/log/celery-worker.log
      autostart=true
      autorestart=true
      startsecs=10

      ; Need to wait for currently executing tasks to finish at shutdown.
      ; Increase this if you have very long running tasks.
      stopwaitsecs = 600

      ; When resorting to send SIGKILL to the program to terminate it
      ; send SIGKILL to its whole process group instead,
      ; taking care of its children as well.
      killasgroup=true

      ; if rabbitmq is supervised, set its priority higher
      ; so it starts first
      priority=998

      environment=$celeryenv"

      # Write the celery supervisord conf file
      echo "$celeryconf" | tee /opt/python/etc/celery.conf

      # Add configuration script to supervisord conf (if not there already)
      if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
          then
          echo "[include]" | tee -a /opt/python/etc/supervisord.conf
          echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
      fi

      # Reread the supervisord config
      supervisorctl -c /opt/python/etc/supervisord.conf reread

      # Update supervisord in cache without restarting all services
      supervisorctl -c /opt/python/etc/supervisord.conf update

      # Start/Restart celeryd through supervisord
      supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd

Elastic Beanstalk uses a standardized directory structure for hooks on custom platforms. These are scripts that are run during lifecycle events and in response to management operations: when instances in your environment are launched, or when a user initiates a deployment or uses the restart application server feature.

Place scripts that you want hooks to trigger in one of the subfolders of the /opt/elasticbeanstalk/hooks/ folder.

Custom platform hooks — Beanstalk documentation

As described in the quote above, our script executes after the app deployment completes. It first captures the environment variables in supervisord notation, then creates a supervisord config file for the Celery daemon and updates supervisord.conf to include it.

Note that this approach uses supervisord on the same instances where the app is deployed, so if either the app or the Celery tasks need more resources, the environment scales out and every new instance starts its own worker. Ordinary queued tasks are still consumed once each off the broker, but anything scheduled on the instance itself (a celery beat schedule, for example) would run once per instance.
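If you do schedule work on the instances themselves, a common mitigation is a short-lived lock in a shared cache so that only one instance executes a given run. A minimal sketch using Django’s cache API (the task and key names are made up for illustration):

# tasks.py (sketch) -- guard a task so only one instance executes each run
from celery import shared_task
from django.core.cache import cache

@shared_task
def nightly_cleanup():
    # cache.add only succeeds if the key is absent, and is atomic on shared
    # backends such as Memcached or Redis: the first instance wins.
    if not cache.add("lock:nightly_cleanup", "1", timeout=60 * 60):
        return  # another instance already claimed this run
    try:
        ...  # the actual cleanup work
    finally:
        cache.delete("lock:nightly_cleanup")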

Also see django-eb-sqs-worker, a package providing Django background tasks for Amazon Elastic Beanstalk. It lets you handle background jobs in an Elastic Beanstalk worker environment via SQS and provides methods for sending tasks to the worker. You can use the same Django codebase for both your web-tier and worker-tier environments and send tasks from the web environment to the worker environment; Amazon fully manages autoscaling for you. Tasks are sent via Amazon Simple Queue Service and are delivered to your worker by Elastic Beanstalk’s SQS daemon. Periodic tasks are also supported.
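Under the hood, the worker environment’s SQS daemon POSTs each queue message to an HTTP endpoint in your application and deletes the message once it receives a 200 response. Ignoring the routing that django-eb-sqs-worker layers on top, a bare-bones handler might look like this (the URL wiring and names are assumptions, not the library’s API):

# views.py (sketch) -- endpoint the Elastic Beanstalk SQS daemon POSTs messages to
import json

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

@csrf_exempt  # the daemon POSTs from localhost without a CSRF token
def sqs_message_handler(request):
    payload = json.loads(request.body)
    # ... look up and run the handler the payload names ...
    return HttpResponse(status=200)  # 200 tells the daemon to delete the message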