Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of Web development, so you can focus on writing your app without needing to reinvent the wheel. It’s free and open source.
Celery is a task queue implementation for Python web applications used to asynchronously execute work outside the HTTP request-response cycle. It’s a simple, flexible, and reliable distributed system to process vast amounts of messages, while providing operations with the tools required to maintain such a system. It’s a task queue with a focus on real-time processing, while also supporting task scheduling.
If you’re running a Django app in Beanstalk & need a Celery background task with it, run Celery as a daemon in the background. Beanstalk already uses supervisord to run some daemon processes. You can leverage that to run celeryd.
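You can see this on a running instance; for example, SSHing in and listing the processes supervisord manages (using the same config path the script below relies on):

sudo supervisorctl -c /opt/python/etc/supervisord.conf status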
Using an .ebextensions config, create a celeryd config file in Beanstalk after the main app deployment completes. The script will also set environment variables & other configuration for the Celery task. Especially if the background task interacts with other AWS services, it’ll need AWS access keys, etc.
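Those environment variables can be defined in the same .ebextensions config through the standard option_settings mechanism. A minimal sketch, with placeholder values (for real AWS credentials, an IAM instance profile on the instances is the better option than hard-coded keys):

option_settings:
  aws:elasticbeanstalk:application:environment:
    AWS_DEFAULT_REGION: us-east-1
    CELERY_BROKER_URL: redis://my-redis-host:6379/0

Everything under aws:elasticbeanstalk:application:environment becomes an environment variable of the app, and the script below passes all of them through to the Celery daemon.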
Here’s an example script from Stack Overflow. Put this in a .config file in .ebextensions.
files:
  "/opt/elasticbeanstalk/hooks/appdeploy/post/run_supervised_celeryd.sh":
    mode: "000755"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash

      # Get django environment variables
      celeryenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
      celeryenv=${celeryenv%?}

      # Create celery configuration script
      celeryconf="[program:celeryd]
      ; Set full path to celery program if using virtualenv
      command=/opt/python/run/venv/bin/celery worker -A myappname --loglevel=INFO

      directory=/opt/python/current/app
      user=nobody
      numprocs=1
      stdout_logfile=/var/log/celery-worker.log
      stderr_logfile=/var/log/celery-worker.log
      autostart=true
      autorestart=true
      startsecs=10

      ; Need to wait for currently executing tasks to finish at shutdown.
      ; Increase this if you have very long running tasks.
      stopwaitsecs = 600

      ; When resorting to send SIGKILL to the program to terminate it
      ; send SIGKILL to its whole process group instead,
      ; taking care of its children as well.
      killasgroup=true

      ; if rabbitmq is supervised, set its priority higher
      ; so it starts first
      priority=998

      environment=$celeryenv"

      # Create the celery supervisord conf script
      echo "$celeryconf" | tee /opt/python/etc/celery.conf

      # Add configuration script to supervisord conf (if not there already)
      if ! grep -Fxq "[include]" /opt/python/etc/supervisord.conf
      then
        echo "[include]" | tee -a /opt/python/etc/supervisord.conf
        echo "files: celery.conf" | tee -a /opt/python/etc/supervisord.conf
      fi

      # Reread the supervisord config
      supervisorctl -c /opt/python/etc/supervisord.conf reread

      # Update supervisord in cache without restarting all services
      supervisorctl -c /opt/python/etc/supervisord.conf update

      # Start/Restart celeryd through supervisord
      supervisorctl -c /opt/python/etc/supervisord.conf restart celeryd
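The command passed to supervisord above assumes the project has a standard Celery application module, with myappname as a placeholder for your project name. A minimal sketch of the myappname/celery.py that the -A flag would load, following the Celery docs’ Django guide:

import os

from celery import Celery

# "myappname" is a placeholder; it must match the -A flag
# in the supervisord command above.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'myappname.settings')

app = Celery('myappname')
# Read CELERY_-prefixed settings from the Django settings module.
app.config_from_object('django.conf:settings', namespace='CELERY')
# Find tasks.py modules in all installed Django apps.
app.autodiscover_tasks()

The same guide also imports this app in myappname/__init__.py so it’s loaded whenever Django starts.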
Elastic Beanstalk uses a standardized directory structure for hooks on custom platforms. These are scripts that are run during lifecycle events and in response to management operations: when instances in your environment are launched, or when a user initiates a deployment or uses the restart application server feature.
Place scripts that you want hooks to trigger in one of the subfolders of the /opt/elasticbeanstalk/hooks/ folder.
— Custom platform hooks — Beanstalk documentation
As described in the quote above, our script will be executed after the app deployment completes. It first captures the environment variables in supervisord notation, then creates a supervisord config file for the Celery daemon & updates supervisord.conf to point to it.
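If supervisord.conf doesn’t yet have an [include] section, the script appends one, so the file ends up with:

[include]
files: celery.conf

On later deployments, the grep guard keeps this stanza from being appended a second time.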
Note that this approach uses supervisord on the same instances where the app is deployed, so if either the app or the Celery task needs more resources, it’ll trigger a scale-out of the instances. In this case, it’s possible that multiple instances end up running the same tasks.
Also see django-eb-sqs-worker — Django Background Tasks for Amazon Elastic Beanstalk. django-eb-sqs-worker lets you handle background jobs on an Elastic Beanstalk Worker Environment sent via SQS and provides methods to send tasks to the worker. You can use the same Django codebase for both your Web Tier and Worker Tier environments and send tasks from the Web environment to the Worker environment. Amazon fully manages autoscaling for you. Tasks are sent via Amazon Simple Queue Service and are delivered to your worker with Elastic Beanstalk’s SQS daemon. Periodic tasks are also supported.
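Under the hood, a Worker environment’s SQS daemon (sqsd) pulls messages from the queue and POSTs each message body to an HTTP endpoint in your app, so a worker is ultimately just a Django view. A rough sketch of that pattern — not django-eb-sqs-worker’s actual internals; the task registry and names here are made up:

import json

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

# Hypothetical registry of task functions, purely to illustrate
# the worker-tier pattern.
TASKS = {
    'send_email': lambda kwargs: print('sending email', kwargs),
}

@csrf_exempt
def sqs_handler(request):
    # Beanstalk's SQS daemon (sqsd) POSTs each queue message body here.
    message = json.loads(request.body)
    TASKS[message['task']](message.get('kwargs', {}))
    # A 200 response tells sqsd to delete the message from the queue;
    # any other status makes it visible again for a retry.
    return HttpResponse(status=200)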