Summary:
Using ORDER BY and OFFSET makes the partitioning an O(n^2) operation in the number
of entries in the table, whereas using min/max makes it an (almost) instant
operation.
This assumes the values of the indexable column are more or less uniformly
distributed, which is not exactly true but not the worst approximation either.
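A minimal sketch of the min/max approach, assuming an integer index column and using an in-memory SQLite table so it runs with only the standard library. The function `partition_indices` and the `repo`/`uid` table and column are hypothetical, not the lister's actual code. Instead of one ORDER BY ... OFFSET query per partition (each of which has to walk past every preceding row), it reads the count, minimum and maximum once and spreads the bounds evenly between them.

```python
import sqlite3


def partition_indices(conn, table, column, partition_size):
    """Split the values of `column` into [lower, upper) ranges of about
    `partition_size` rows each, assuming the values are roughly uniform.

    `table` and `column` are trusted identifiers (interpolated into the SQL).
    """
    # One aggregate query instead of one ORDER BY ... OFFSET query per partition.
    n, lo, hi = conn.execute(
        f"SELECT count(*), min({column}), max({column}) FROM {table}"
    ).fetchone()
    if n == 0:
        return []  # empty table: nothing to partition
    n_partitions = max(1, n // partition_size)
    # Equal-width ranges between min and max (the uniformity assumption).
    width = (hi - lo + 1) / n_partitions
    bounds = [lo + round(i * width) for i in range(n_partitions)] + [hi + 1]
    return list(zip(bounds[:-1], bounds[1:]))


# Usage: 100 rows with uid 1..100, about 25 rows per partition.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repo (uid INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO repo VALUES (?)", [(i,) for i in range(1, 101)])
parts = partition_indices(conn, "repo", "uid", 25)
print(parts)  # [(1, 26), (26, 51), (51, 76), (76, 101)]
```

With skewed values (e.g. a large gap in the uids), some ranges end up holding many more rows than others; that is the approximation accepted above.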
Test Plan: tox
Reviewers: #reviewers, douardda
Reviewed By: #reviewers, douardda
Subscribers: douardda, swh-public-ci
Differential Revision: https://forge.softwareheritage.org/D1267
When this lister is called for the first time, db_num_entries() may return
zero, so the min() will also be zero, making the range() call crash (its step
argument must not be zero).
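A minimal sketch of the failure and of one way to guard against it. `partition_starts` is a hypothetical helper, not the lister's code, and it assumes the range() step is computed as min(partition_size, num_entries), as the message above describes.

```python
def partition_starts(num_entries, partition_size):
    """Hypothetical helper: offsets at which each partition starts."""
    step = min(partition_size, num_entries)
    if step == 0:
        # First run on an empty table: range(0, 0, 0) would raise
        # "ValueError: range() arg 3 must not be zero".
        return []
    return list(range(0, num_entries, step))


print(partition_starts(0, 1000))     # []
print(partition_starts(2500, 1000))  # [0, 1000, 2000]
print(partition_starts(10, 1000))    # [0]
```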
This makes its configuration easier to manage (especially in the docker
environment), since this lister can then be referred to as only 'bitbucket'
everywhere (i.e. Python package name and config file names).
The default instance name is now derived from the host of the given URL.
This makes it possible to create a lister task by simply specifying the API
base URL, and prevents 'inconsistent by default' behavior, e.g. with:
swh-scheduler task add swh-lister-gitlab-full \
api_baseurl=https://0xacab.org/api/v4
the created task uses '0xacab.org' (rather than 'gitlab') as its instance
name.
It is still possible to explicitly specify the instance name if needed.
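A minimal sketch of that defaulting rule, using only urllib.parse from the standard library. `default_instance_name` and the 'my-gitlab' value are hypothetical, not the lister's actual function or configuration.

```python
from urllib.parse import urlparse


def default_instance_name(api_baseurl, instance=None):
    """Hypothetical helper: an explicit instance name wins; otherwise fall
    back to the host part of the API base URL."""
    if instance:
        return instance
    return urlparse(api_baseurl).hostname


print(default_instance_name("https://0xacab.org/api/v4"))  # 0xacab.org
print(default_instance_name("https://0xacab.org/api/v4", instance="my-gitlab"))  # my-gitlab
```

`.hostname` drops any port and lowercases the host; using `.netloc` instead would keep a port (e.g. `:8443`) in the name.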
In order to run unit tests using the celery pytest fixtures, we use a
dedicated swh_app fixture that ensures the "main" celery app is the test app
(otherwise subtasks won't work).
Get rid of the class-based task definitions in favor of decorator-based
task declarations.
Doing so, we can get rid of core/tasks.py.
Task names are explicitly set to keep compatibility with the task
definitions in the schedulers' database.
This also adds debug statements at the beginning and end of each lister
task.
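A minimal, Celery-free sketch of the pattern: a decorator registers each function under an explicit name and logs debug messages when the task starts and ends. In the actual code the registration would go through Celery's `app.task(name=...)`; the `TASKS` dict, the `task` decorator and the task name below are hypothetical stand-ins so the example runs with only the standard library.

```python
import functools
import logging

logger = logging.getLogger(__name__)

# Hypothetical stand-in for a Celery app's task registry.
TASKS = {}


def task(name):
    """Register the decorated function under an explicit, fixed task name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            logger.debug("%s: start", name)
            result = func(*args, **kwargs)
            logger.debug("%s: end", name)
            return result
        TASKS[name] = wrapper
        return wrapper
    return decorator


@task(name="example.tasks.ExampleListerTask")
def list_example(api_baseurl, page=1):
    # Hypothetical lister body; a real task would instantiate and run a lister.
    return {"api_baseurl": api_baseurl, "page": page}


logging.basicConfig(level=logging.DEBUG)
print(TASKS["example.tasks.ExampleListerTask"]("https://example.org/api"))
```

Fixing the name explicitly keeps it independent of the module path and function name, so task definitions already stored in the scheduler's database keep resolving after the refactoring.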