Gitea API return next pagination link with all query parameters provided
to an API request.
As we were also passing a dict of fixed query parameters to the page_request
method, some query parameters ended up having multiple instances in the URL
for fetching a new page of repositories data. So each time a new page was
requested, new instances of these parameters were appended to the URL which
could result in a really long URL if the number of pages to retrieve is high
and make the request fail.
Also remove a debug log already present in http_request method.
The implementation of `HTTPError` in `requests` does not guarantee that
the `response` property will always be set. So we need to ensure it is
not `None` before looking for the return code, for example.
This also makes mypy checks pass again, as `types-request` was updated
in 2.31.0.9 to better match this particular aspect. See:
https://github.com/python/typeshed/pull/10875
In contrary of gitea listing which does not require to provide the q query
parameter, it is required for the gogs case.
After reading the gogs source code, the /repos/search endpoint generates
a sql request of the form: "SELECT * FROM repos WHERE name LIKE '%{q}%'".
By setting the q parameter value to "_", the LIKE clause acts as a
wildcard and all repositories are ensured to be returned.
Fixes#4698.
This pushes the rather elementary logic within the lister's scope. This will simplify
and unify cli call between lister and scheduler clis. This will also allow to reduce
erroneous operations which can happen for example in the add-forge-now.
With the following, we will only have to provide the type and the instance, then
everything will be scheduled properly.
Refs. swh/devel/swh-lister#4693
Prior to this commit, the lister assumed authentication was required. It exists public
gogs instances which do not require it.
This also updates documentation to mention the usual api location. This is useful when
people wants to actually trigger a listing as a pre-check flight.
This drops repetitive instruction in the gitea lister as well.
Co-authored with Antoine Lambert (@anlambert) <anlambert@softwareheritage.org>.
Related to infra/sysadm-environment#4644
Numerous listers were using the same page_request method or equivalent
in their implementation so prefer to deduplicate that code by adding
an http_request method in base lister class: swh.lister.pattern.Lister.
That method simply wraps a call to requests.Session.request and logs
some useful info for debugging and error reporting, also an HTTPError
will be raised if a request ends up with an error.
All listers using that new method now benefit of requests retry when
an HTTP error occurs thanks to the use of the http_retry decorator.
Instead of retrying HTTP requests only for 429 status code by default,
prefer to use the generic retry policy enabling to also retry for status
codes >= 500 but also on ConnectionError exceptions.
Rename throttling_retry decorator to http_retry to reflect this change.
By using a single equality instead of checking len() then zip()
to check one by one, pytest can find the common/missing elements
and print them nicely when the two lists are unequal.
This removes code and adds support for incremental pagination.
While both are essentially the same lister now, it still makes sense to
keep the Gitea lister separate, in order to:
1. display them in different categories on https://archive.softwareheritage.org/
2. support possible divergence of APIs in the future