Drop the launchpad lister from the listers to check: its test setup is more involved
than that of the other listers. As that setup is not done in this test, the lister
actually connects anonymously to the launchpad server. So remove it from the test.
This should also fix the debian build, which refuses such access [1]
[1] https://jenkins.softwareheritage.org/job/debian/job/packages/job/DLS/job/gbp-buildpackage/97/console
Port debian lister to `swh.lister.pattern.Lister` API.
The new implementation will produce one instance of ListedOrigin model
per package, notably containing the set of parameters expected by the
debian loader.
The lister is also stateful, meaning only new packages and those with
newly found versions since the last listing will be returned.
Closes T2979
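A minimal sketch of the stateful filtering described above, assuming the state
maps each package name to the set of versions already seen. The function name
and the state layout are illustrative assumptions, not the lister's actual code.

```python
from typing import Dict, Iterable, List, Set, Tuple


def select_updated_packages(
    state: Dict[str, Set[str]], listing: Iterable[Tuple[str, str]]
) -> List[str]:
    """Return names of packages that are new or gained a version.

    `state` is updated in place so that the next listing only reports
    what changed since this one.
    """
    updated: List[str] = []
    for name, version in listing:
        known = state.setdefault(name, set())
        if version not in known:
            known.add(version)
            if name not in updated:
                updated.append(name)
    return updated
```

Running the same listing twice against the same state returns an empty list the
second time, which is the behaviour expected from an incremental listing.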
The previous pagination implementation runs into a hard-coded server-side limit [1]
[1]
```
{"error":"Offset pagination has a maximum allowed offset of 50000 for requests that return objects of type Project. Remaining records can be retrieved using keyset pagination."}
```
Related to T2994
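A minimal sketch of the keyset pagination that the error message recommends,
assuming pages are requested with `pagination=keyset` and the next page is
taken from the `Link` response header. The helper names and the example host
are hypothetical, and this is not necessarily how the lister implements it.

```python
import re
from typing import Optional
from urllib.parse import urlencode

# Matches the target of the rel="next" entry in a Link header.
_NEXT_LINK = re.compile(r'<([^>]+)>\s*;\s*rel="next"')


def first_page_url(api_base: str, per_page: int = 100) -> str:
    """Build the URL of the first page of a keyset-paginated project listing."""
    params = {
        "pagination": "keyset",
        "order_by": "id",
        "sort": "asc",
        "per_page": per_page,
    }
    return f"{api_base}/projects?{urlencode(params)}"


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Extract the next page URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    match = _NEXT_LINK.search(link_header)
    return match.group(1) if match else None
```

Following the server-provided link avoids computing offsets client side, so the
50000 offset limit no longer applies.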
R package last update date can be found in the "Packaged" field of
package info returned by tools::CRAN_package_db().
So retrieve it and parse it as a datetime to provide as last_update
parameter value in ListedOrigin model.
Closes T2989
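A minimal sketch of the parsing step, assuming the "Packaged" value looks like
`2019-08-02 10:07:57 UTC; hornik` (a timestamp, then `;` and the packager
name). The function name is hypothetical, and values without a `UTC` suffix
are treated as UTC here, which is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_packaged_date(packaged: Optional[str]) -> Optional[datetime]:
    """Parse a CRAN "Packaged" field into a timezone-aware datetime.

    Returns None when the field is missing or not in the assumed format.
    """
    if not packaged:
        return None
    # Keep only the timestamp, dropping the "; packager" suffix if present.
    date_part = packaged.split(";", 1)[0].strip()
    if date_part.endswith(" UTC"):
        date_part = date_part[: -len(" UTC")]
    try:
        parsed = datetime.strptime(date_part, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
    return parsed.replace(tzinfo=timezone.utc)
```

Returning None on unparsable input lets the caller leave last_update unset
instead of failing the whole listing.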
The PaginatedListedOriginList model has been updated in
rDSCHb93aa5be2c2d5dc2130e1027698f3e1255052d8d and the origins
field has been renamed to results.
The lister is stateless and has full listing capability.
It can request the Gitea API using HTTP token authentication.
Rate-limiting was not encountered but is handled generically.
Added support for getting repo last update date through API.
Note that the current implementation resumes a new visit from the last next_page link
seen (that is what is stored in the lister state, to avoid recomputing the URL). This
means that this page will be seen at least twice, on the first visit and on the next.
This should not pose any problems as the listing is idempotent.
Related to T2987
Port npm lister to `swh.lister.pattern.Lister` API.
As before, the lister can be run in full or incremental mode.
When using incremental mode, only packages added or modified since
the last incremental listing will be returned.
Otherwise, all packages will be listed in lexicographical order.
One major improvement: the latest package update date is now
retrieved when available and sent to the scheduler database.
Closes T2972
The new lister has incremental and full listing capability.
It can request the Bitbucket API in anonymous and HTTP basic authentication
modes. Rate-limiting is not aggressive and is handled.
Fix error when a configuration value loaded from a config file is also
given as keyword parameter to the from_configfile method.
Override configuration loaded from config file only if the provided
value is not None.
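A minimal sketch of the override rule described above. `merge_config` is a
hypothetical helper used for illustration, not the actual from_configfile code.

```python
from typing import Any, Dict


def merge_config(file_config: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Merge keyword overrides into a config loaded from a file.

    An override replaces the file value only when it is not None.
    """
    merged = dict(file_config)
    merged.update({key: value for key, value in overrides.items() if value is not None})
    return merged
```

Merging into a single dict before instantiation means each parameter is passed
exactly once, whether it comes from the file, the keyword arguments, or both.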
Add the swh.lister.utils.throttling_retry decorator, which retries a
function performing an HTTP request that may return a 429 status code.
The implementation is based on the tenacity module and it is assumed
that the requests library is used when querying a URL.
The default wait strategy is based on exponential backoff.
The default maximum number of attempts is 5, after which the HTTPError
exception is re-raised.
All tenacity.retry parameters can also be overridden in client code.
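To illustrate the retry behaviour, here is a standard-library sketch of a
decorator with exponential backoff. It deliberately does not use tenacity or
requests: `retry_on_429` and `TooManyRequests` are hypothetical stand-ins, and
the delays are an assumption rather than the decorator's real defaults.

```python
import functools
import time
from typing import Any, Callable


class TooManyRequests(Exception):
    """Hypothetical stand-in for an HTTP error carrying a 429 status code."""


def retry_on_429(
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], Any] = time.sleep,
) -> Callable:
    """Retry the decorated function on TooManyRequests.

    The wait doubles after each failed attempt; once max_attempts is
    reached, the last exception is re-raised.
    """

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except TooManyRequests:
                    if attempt == max_attempts:
                        raise
                    sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator
```

Making `sleep` injectable keeps the sketch testable without actually waiting.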
Previously, the following error was raised once the task had finished
its execution: "Object of type ListerStats is not JSON serializable".
So ensure ListerStats object gets converted to dict before returning it.
Also add missing test for task function.
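A minimal sketch of the fix, using a simplified stand-in for ListerStats. The
dataclass below and the `run_listing_task` function are illustrative
assumptions, not the actual task code.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class ListerStats:
    """Simplified stand-in for the statistics returned by a lister run."""

    pages: int = 0
    origins: int = 0


def run_listing_task() -> Dict[str, int]:
    """Hypothetical task body: return the stats as a plain dict."""
    stats = ListerStats(pages=2, origins=30)
    # A dataclass instance is not JSON serializable, but its dict form is.
    return asdict(stats)


serialized = json.dumps(run_listing_task())
```

Returning a plain dict lets the task result be serialized as JSON without a
custom encoder.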
This replaces the test data with some manually generated answers, which allows
us to test a few more cases for instantiating the lister.
This also expands test coverage to test behavior on rate-limited requests.
We stop depending on the ListerBase implementation. The main hoop we're jumping
through is the config override mechanism in swh.lister.get_lister, as it's
really specific to the ListerBase `override_config` argument, which is dropped in
pattern.Lister (in favor of explicit arguments at lister instantiation).
We implement a small shim in swh.lister.pattern.Lister so that get_lister
keeps working with the new pattern.
This generic configuration override mechanism will probably be completely
removed when the configuration mechanism is reworked. We'll see.
This new pattern uses the lister support features introduced in swh.scheduler to
replace the database management done in previous iterations of the listers.