WRITELOOP

CELERY LEARNINGS

2020 August 28

In the Python world, it is common to reach for celery when you need to deal with queues. For simple and quick tasks it is fairly trivial to use, but things become more complicated when you need to deal with long-running tasks, workflows and broadcasts.

Broadcasts

By default, the celery main process running on a node will deliver a message that arrives on a queue to only one of the workers running on it. But when you are coordinating a more complicated task, the need may arise to send a message to all workers. To do that, you can use a Broadcast queue:

from kombu.common import Broadcast

CELERY_QUEUES = (
    ...
    Broadcast('my_broadcast'),
)
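
Once the broadcast queue is declared, a message published to it is delivered to every consumer of that queue. A minimal sketch of how dispatching to it might look (the my_task name and its module are assumptions made up for the example):

# Hypothetical task import; the name and module are assumptions.
from myapp.tasks import my_task

# Route this call to the broadcast queue declared in CELERY_QUEUES,
# so every consumer of 'my_broadcast' receives a copy of the message.
my_task.apply_async(queue='my_broadcast')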

Notice that this works on a single celery node. If you need to do something similar across all nodes running celery, you may have to come up with a more elaborate custom solution. To design one, you must keep in mind how the celery main process and its workers do their job: only the celery main process consumes messages, one at a time, delivering each to one of its workers (subprocesses).

Internally, it works like this. The celery main process runs celery.Task.__init__(). After that, it forks itself into subprocesses [^1]; those are in fact the workers (whose number you define with the --concurrency/-c option). After the fork, each subprocess gets a copy of the already initialized Task object, but each has its own resources, such as memory. Only celery.Task.run() is called in each of the subprocesses.

So, if you need a more elaborate solution, you must take into account that each celery node has a celery main process that executes celery.Task.__init__() only once, and that the workers will only execute celery.Task.run(), as sketched below.

[^1] A real process (with a pid). NOT a thread, so no GIL to stop concurrency.
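
As an illustration of that split, here is a minimal sketch of a class-based task. The broker URL, the WarmupTask class and the task name are assumptions made up for the example; the point is only where __init__() and run() execute:

from celery import Celery, Task

# Broker URL is an assumption for the example.
app = Celery('tasks', broker='redis://localhost:6379/0')

class WarmupTask(Task):
    name = 'tasks.warmup'  # hypothetical task name

    def __init__(self):
        # Runs once, in the celery main process, before it forks
        # the worker subprocesses; children inherit this state via fork.
        self.config = {'initialized_in_parent': True}

    def run(self, value):
        # Runs inside one of the forked worker subprocesses. Per-process
        # resources (DB connections, sockets, caches) belong here,
        # since each subprocess has its own memory after the fork.
        return {'config': self.config, 'value': value}

warmup = app.register_task(WarmupTask())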

References: