The new lister supports both incremental and full listing.
It can query the Bitbucket API anonymously or with HTTP basic authentication.
Bitbucket's rate limiting is not aggressive, and the lister handles it.
Fix the error raised when a configuration value loaded from a config file is
also given as a keyword parameter to the from_configfile method.
Override the configuration loaded from the config file only if the provided
value is not None.
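
A minimal sketch of that override rule, assuming the configuration is a plain
dict. The helper name `merge_config` and the keys are hypothetical and are not
taken from swh.lister. Merging into a single dict before instantiation also
avoids passing the same key twice as a keyword argument.

```python
from typing import Any, Dict


def merge_config(file_config: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    """Return a copy of file_config overridden by the kwargs that are not None.

    Hypothetical helper, for illustration only.
    """
    merged = dict(file_config)  # copy, so the loaded config is left untouched
    merged.update({key: value for key, value in kwargs.items() if value is not None})
    return merged


# Values as they might come from a config file (made-up keys).
file_config = {"url": "https://example.org/api", "per_page": 100}

# url=None keeps the file value, per_page=50 overrides it.
merged = merge_config(file_config, url=None, per_page=50)
```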
Add the swh.lister.utils.throttling_retry decorator, which retries a
function performing an HTTP request that may return a 429 status code.
The implementation is based on the tenacity module and assumes
that the requests library is used when querying a URL.
The default wait strategy is based on exponential backoff.
The default maximum number of attempts is 5; after that, the HTTPError
exception is re-raised.
All tenacity.retry parameters can also be overridden in client code.
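
The retry behaviour described above can be sketched with the standard library
only. This is not the tenacity-based implementation: `retry_on_429`, the
stand-in `HTTPError` class, and the `sleep` parameter are all hypothetical
names introduced for illustration. The doubling delay is one possible
exponential backoff, not necessarily the one used by the real decorator.

```python
import functools
import time
from typing import Callable


class HTTPError(Exception):
    """Hypothetical stand-in for an HTTP error that carries a status code."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def retry_on_429(
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Retry the decorated function on HTTP 429, doubling the delay each time."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except HTTPError as exc:
                    # Re-raise other errors, and the 429 itself on the last attempt.
                    if exc.status_code != 429 or attempt == max_attempts:
                        raise
                    sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator


# Usage: record the delays instead of actually sleeping.
delays = []
calls = {"n": 0}


@retry_on_429(max_attempts=5, base_delay=1.0, sleep=delays.append)
def fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise HTTPError(429)  # throttled on the first two calls
    return "ok"


result = fetch()
```

Injecting `sleep` keeps the sketch testable without waiting for real delays.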
Previously, the following error was raised when the task finished
its execution: "Object of type ListerStats is not JSON serializable".
Ensure the ListerStats object is converted to a dict before it is returned.
Also add the missing test for the task function.
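
A minimal sketch of the serialization issue and its fix, assuming ListerStats
is a dataclass. The field names `pages` and `origins` and the `list_task`
function are assumptions made for illustration, not the actual task code.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ListerStats:
    # Assumed fields, for illustration only.
    pages: int = 0
    origins: int = 0


def list_task() -> dict:
    """Hypothetical task body: return the stats as a plain dict."""
    stats = ListerStats(pages=2, origins=10)
    return asdict(stats)  # a dict is JSON serializable, the dataclass is not


# The dataclass instance itself cannot be serialized by the json module.
try:
    json.dumps(ListerStats())
    dataclass_serializable = True
except TypeError:
    dataclass_serializable = False

# The converted dict can.
payload = json.dumps(list_task())
```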
This replaces the test data with some manually generated answers, which allows
us to test a few more cases for instantiating the lister.
This also expands test coverage to test behavior on rate-limited requests.
We stop depending on the ListerBase implementation. The main hoop we're jumping
through is the config override mechanism in swh.lister.get_lister, as it's
really specific to the ListerBase `override_config` argument, which is dropped in
pattern.Lister (in favor of explicit arguments at lister instantiation).
We implement a small shim in swh.lister.pattern.Lister so that get_lister
remains backwards-compatible with the new pattern.
This generic configuration override mechanism will probably be completely
removed when the configuration mechanism is reworked. We'll see.
This new pattern uses the lister support features introduced in swh.scheduler to
replace the database management done in previous iterations of the listers.
The newly introduced scheduler fixture truncates tables between tests. The
debian tests unfortunately share state and broke when that changed. This fixes
the tests by avoiding the truncation of the scheduler db table "task".
Ideally, those tests should be reworked to avoid sharing state.
[1] https://jenkins.softwareheritage.org/job/DLS/job/tests/1043
The gitea lister can run against multiple instances, which may use the same ids.
When listing another gitea instance, the current code would fail to insert data
for ids already used by a previous instance.
This commit fixes that behavior by prefixing the uid with the instance name.
Related to T2577
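
The uid collision and the prefixing fix can be sketched as follows. The
`make_uid` helper, the `/` separator, and the instance names are hypothetical;
the actual uid format used by the lister is not stated here.

```python
def make_uid(instance: str, repo_id: int) -> str:
    """Build a uid that is unique across instances (assumed format)."""
    return f"{instance}/{repo_id}"


# Two made-up gitea instances that both expose a repository with id 1.
listed = [("codeberg", 1), ("try-gitea", 1)]

# Keyed by the raw id, the second entry overwrites the first.
by_raw_id = {repo_id: instance for instance, repo_id in listed}

# Keyed by the prefixed uid, both entries are kept.
by_uid = {make_uid(instance, repo_id): instance for instance, repo_id in listed}
```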
Prior to this commit, all listers were instantiated at the same time even if
only one was needed. This commit separates those instantiations.
The only drawback to this is the db model initialization which now happens at
each lister instantiation. This can be dealt with if needed at another time
though.
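
A minimal sketch of on-demand instantiation, assuming a registry that maps
lister names to factories. `FakeLister`, `FACTORIES`, and this `get_lister`
are hypothetical and only illustrate the idea; they are not the swh.lister
code.

```python
from typing import Callable, Dict, List

# Records which listers were actually built.
created: List[str] = []


class FakeLister:
    """Hypothetical lister whose construction has a visible side effect."""

    def __init__(self, name: str):
        created.append(name)
        self.name = name


# Factories are stored instead of instances, so nothing is built up front.
FACTORIES: Dict[str, Callable[[], FakeLister]] = {
    "bitbucket": lambda: FakeLister("bitbucket"),
    "gitea": lambda: FakeLister("gitea"),
}


def get_lister(name: str) -> FakeLister:
    """Instantiate only the requested lister."""
    if name not in FACTORIES:
        raise ValueError(f"unknown lister: {name}")
    return FACTORIES[name]()


lister = get_lister("gitea")
```

Any per-lister setup, such as the db model initialization mentioned above,
would then run on each call to the factory.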