Previously, the GNU lister used the same code as the tarball loader
to download, unzip and read the tree.json file.
To make the code concise, the download step now uses the
requests library.
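Below is a minimal sketch of a requests-based download, assuming tree.json is published gzipped at https://ftp.gnu.org/tree.json.gz. The URL, `fetch_tree()` and `parse_tree()` are assumptions for illustration, not taken from this log. The payload is decompressed and parsed in memory.

```python
import gzip
import json
from typing import Any

# Assumed location of the GNU tree index; check the lister's configuration.
GNU_TREE_URL = "https://ftp.gnu.org/tree.json.gz"


def parse_tree(raw: bytes) -> Any:
    """Decompress a gzipped tree.json payload and decode the JSON it contains."""
    return json.loads(gzip.decompress(raw).decode("utf-8"))


def fetch_tree(url: str = GNU_TREE_URL, timeout: float = 60.0) -> Any:
    """Download tree.json.gz with requests and parse it (hypothetical helper)."""
    # Imported here so parse_tree() stays usable without requests installed.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    # Assumes the server sends the .gz file as-is (no Content-Encoding: gzip);
    # otherwise requests would already have decompressed response.content.
    return parse_tree(response.content)
```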
Some of the new listers, such as GNU and CRAN, do not
follow the conventional way of making HTTP
requests, so they do not need some of the methods
that conventional HTTP-based listers
require.
However, those methods are marked as abstractmethod in the
core, which forces every lister to implement them. It is
therefore best to remove abstractmethod to improve the
readability of those listers.
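The effect of dropping `@abstractmethod` can be seen in a small sketch. The class and method names (`StrictHttpListerBase`, `RelaxedHttpListerBase`, `request_headers`) are hypothetical stand-ins, not the core's actual API. With the decorator, a lister that has no use for the hook cannot even be instantiated. Without it, the lister simply ignores the hook.

```python
from abc import ABC, abstractmethod


class StrictHttpListerBase(ABC):
    """Every transport hook is abstract, so every subclass must define it."""

    @abstractmethod
    def request_headers(self) -> dict:
        ...


class RelaxedHttpListerBase(ABC):
    """The hook has a default body, so non-HTTP listers can leave it alone."""

    def request_headers(self) -> dict:
        # Only listers that issue conventional HTTP requests override this.
        raise NotImplementedError("this lister does not issue HTTP requests")


class StrictFileLister(StrictHttpListerBase):
    """A GNU/CRAN-style lister with no use for request_headers()."""


class RelaxedFileLister(RelaxedHttpListerBase):
    """The same lister, built on the relaxed base class."""
```

The trade-off is that a missing override in an HTTP-based lister now surfaces when the method is called, rather than when the lister is instantiated.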
Also add a CLI group named 'lister' for consistency with
other swh packages, and rename the command to 'db-init', e.g.:
swh lister db-init LISTER [...]
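The command shape can be sketched without dependencies. The swh command-line tools are presumably click groups. To stay self-contained, this sketch uses the standard-library argparse to reproduce only the nesting `swh` → `lister` → `db-init`. `build_parser()` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mimicking `swh lister db-init LISTER [...]` (sketch)."""
    parser = argparse.ArgumentParser(prog="swh")
    groups = parser.add_subparsers(dest="group", required=True)
    # One sub-group per swh package; here, the 'lister' group.
    lister = groups.add_parser("lister", help="lister-related commands")
    commands = lister.add_subparsers(dest="command", required=True)
    db_init = commands.add_parser("db-init", help="initialize lister database(s)")
    # One or more lister names, matching the LISTER [...] usage above.
    db_init.add_argument("listers", nargs="+", metavar="LISTER")
    return parser
```

For example, `build_parser().parse_args(["lister", "db-init", "gitlab"])` yields `group="lister"`, `command="db-init"` and `listers=["gitlab"]`.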
Summary:
Using ORDER BY and OFFSET makes the partitioning an O(n^2) operation on the
number of entries in the table, rather than a near-instant operation when using
min/max.
This assumes the indexable column is more or less uniformly distributed, which
is not exactly true but not the worst approximation either.
Test Plan: tox
Reviewers: #reviewers, douardda
Reviewed By: #reviewers, douardda
Subscribers: douardda, swh-public-ci
Differential Revision: https://forge.softwareheritage.org/D1267
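Here is a sketch of min/max-based partitioning, assuming the minimum and maximum of the indexable column come from a single `SELECT min(col), max(col)` query. `partition_ranges()` and its parameters are hypothetical. The range is split into equal widths, which only approximates equal row counts when the column is roughly uniform.

```python
import math
from typing import List, Optional, Tuple


def partition_ranges(
    min_index: Optional[int],
    max_index: Optional[int],
    num_entries: int,
    partition_size: int,
) -> List[Tuple[int, int]]:
    """Split [min_index, max_index] into equal-width index ranges (sketch).

    min_index/max_index are assumed to come from one cheap query such as
    SELECT min(col), max(col) FROM table, instead of paging through the
    table with ORDER BY ... OFFSET, which rescans every skipped row.
    """
    if min_index is None or max_index is None or num_entries <= 0:
        return []  # empty table: nothing to partition
    # Enough partitions that each holds about partition_size rows,
    # assuming the column values are roughly uniformly distributed.
    n_partitions = max(1, math.ceil(num_entries / partition_size))
    width = (max_index - min_index) / n_partitions
    bounds = [min_index + round(i * width) for i in range(n_partitions)]
    bounds.append(max_index)
    # Consecutive bounds form the ranges; the last one ends at max_index.
    return list(zip(bounds[:-1], bounds[1:]))
```

Callers would treat each range as half-open, except the last one, which must include `max_index`.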
When this lister is called for the first time, db_num_entries() may return
a null value, so the min() also evaluates to zero, which makes the range() call crash.
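The crash can be reproduced under one assumption: the partition size is computed as `min(partition_size, num_entries)` and then used as the step of `range()`. A zero step makes `range()` raise `ValueError`. `partition_starts()` is a hypothetical helper showing one possible guard.

```python
from typing import List, Optional


def partition_starts(num_entries: Optional[int], partition_size: int) -> List[int]:
    """Start offsets of each partition, safe on an empty table (sketch)."""
    n = num_entries or 0  # a null count on the first run means no entries yet
    step = min(partition_size, n)
    if step <= 0:
        # range(0, n, 0) would raise ValueError, so bail out before calling it.
        return []
    return list(range(0, n, step))
```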
so that we can easily manage its configuration (especially in the Docker
environment) by referring to this lister simply as 'bitbucket' everywhere
(i.e. in the Python package name and config file names).
using the host of the given URL.
This allows creating a lister task by simply specifying the API base URL,
and prevents an 'inconsistent by default' behavior, e.g. with:
swh-scheduler task add swh-lister-gitlab-full \
api_baseurl=https://0xacab.org/api/v4
the created task does not use 'gitlab' as the instance name (it uses
'0xacab.org' here).
It is still possible to explicitly specify the instance name if needed.
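This sketch shows how the default instance name can be derived from the API base URL while an explicit name still takes precedence. `instance_name()` and its `instance` parameter are hypothetical. It uses the standard `urllib.parse.urlparse`, whose `hostname` drops any port and lowercases the host.

```python
from typing import Optional
from urllib.parse import urlparse


def instance_name(api_baseurl: str, instance: Optional[str] = None) -> str:
    """Default the lister instance name to the API base URL's host (sketch)."""
    if instance:
        return instance  # an explicitly given name still wins
    host = urlparse(api_baseurl).hostname
    if not host:
        raise ValueError(f"cannot derive an instance name from {api_baseurl!r}")
    return host
```

With `api_baseurl=https://0xacab.org/api/v4` and no explicit name, this returns `'0xacab.org'`.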
In order to run unit tests using the Celery pytest fixtures, we
use a dedicated swh_app fixture that ensures the "main" Celery app
is the test app (otherwise subtasks won't work).
Get rid of the class-based task definitions in favor of decorator-based
task declarations.
Doing so, we can get rid of core/tasks.py.
Task names are explicitly set to keep compatibility with the task
definitions in the scheduler's database.
This also adds debug statements at the beginning and end of each lister
task.
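The debug statements can be added with a small stdlib decorator. In Celery, a function-based task would normally be declared with the app's `task` decorator and an explicit `name=` that matches the old class path, so that existing scheduler rows still resolve. That part is shown only as a comment, since it needs a configured Celery app. `log_task_boundaries`, `list_example_forge` and the task name are hypothetical.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def log_task_boundaries(func):
    """Log a debug record when a lister task starts and when it ends."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.debug("%s: start", func.__name__)
        try:
            return func(*args, **kwargs)
        finally:
            # Emitted even if the lister raises.
            logger.debug("%s: end", func.__name__)

    return wrapper


# With Celery, the task decorator would sit on top, pinning the task name
# (hypothetical name) so existing scheduler definitions keep matching:
#
#   @app.task(name="swh.lister.example.tasks.ExampleListerTask")
#   @log_task_boundaries
#   def list_example_forge(api_baseurl): ...


@log_task_boundaries
def list_example_forge(api_baseurl: str) -> str:
    """Hypothetical lister body."""
    return f"listed {api_baseurl}"
```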