Prior to this commit, the lister assumed authentication was required. There are public
gogs instances which do not require it.
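As a minimal sketch of what optional authentication can look like (not the lister's actual code), the hypothetical helper `build_headers` below only adds an `Authorization` header when a token is supplied. The `token <value>` header form is an assumption and should be checked against the target instance's API documentation.

```python
from typing import Dict, Optional


def build_headers(api_token: Optional[str] = None) -> Dict[str, str]:
    """Return request headers, adding authentication only when a token is given.

    Hypothetical helper for illustration; the header form is an assumption.
    """
    headers = {"Accept": "application/json"}
    if api_token:
        # Only authenticated requests carry the Authorization header.
        headers["Authorization"] = f"token {api_token}"
    return headers


# Anonymous request against a public instance: no Authorization header.
anonymous = build_headers()
# Authenticated request: the token is sent along.
authenticated = build_headers("s3cr3t")
```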
This also updates the documentation to mention the usual api location. This is useful when
people want to actually trigger a listing as a pre-flight check.
This drops a repetitive instruction in the gitea lister as well.
Co-authored with Antoine Lambert (@anlambert) <anlambert@softwareheritage.org>.
Related to infra/sysadm-environment#4644
Instead of retrying HTTP requests only for the 429 status code by default,
use the generic retry policy, which also retries on status
codes >= 500 and on ConnectionError exceptions.
Rename the throttling_retry decorator to http_retry to reflect this change.
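The retry conditions described above can be sketched with the standard library only. This is an illustration, not the project's `http_retry` implementation: `HTTPStatusError`, `is_retryable` and `http_retry_sketch` are hypothetical names, and the built-in `ConnectionError` stands in for whatever exception the HTTP client actually raises.

```python
import functools
import time


class HTTPStatusError(Exception):
    """Hypothetical exception carrying an HTTP status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_retryable(exc):
    """Retry on connection errors, on 429, and on any status code >= 500."""
    if isinstance(exc, ConnectionError):
        return True
    if isinstance(exc, HTTPStatusError):
        return exc.status_code == 429 or exc.status_code >= 500
    return False


def http_retry_sketch(max_attempts=3, base_delay=0.0):
    """Decorator retrying the wrapped call with exponential backoff."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Give up on the last attempt or on non-retryable errors.
                    if attempt == max_attempts or not is_retryable(exc):
                        raise
                    time.sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator
```

With this policy a 503 or a dropped connection is retried like a 429, while a 404 is raised immediately.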
By using a single equality instead of checking len() and then comparing
elements one by one with zip(), pytest can find the common/missing elements
and print them nicely when the two lists are unequal.
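A small sketch of the two assertion styles, using hypothetical helper names. Both accept and reject the same inputs; the difference is only in what pytest can report on failure.

```python
def check_pairwise(actual, expected):
    """Old style: a failure only reports the first mismatching pair."""
    assert len(actual) == len(expected)
    for a, e in zip(actual, expected):
        assert a == e


def check_whole(actual, expected):
    """New style: a single comparison lets pytest diff the two lists."""
    assert actual == expected
```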
This removes code and adds support for incremental pagination.
While both are essentially the same lister now, it still makes sense to
keep the Gitea lister separate, in order to:
1. display them in different categories on https://archive.softwareheritage.org/
2. support possible divergence of APIs in the future
Make the instance parameter of the base pattern lister optional and set
the lister name to the URL network location when it is not provided.
This simplifies lister creation when the associated forge type has a lot of
instances in the wild (e.g. gitlab or cgit) while giving more details
about the listed forge instance.
Also process listers for forges with multiple instances (cgit, gitea,
gitlab, phabricator and tuleap) to ensure the URL network location will be
used when the instance parameter is not provided.
Related to T3403
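The fallback to the URL network location can be sketched with `urllib.parse`. The function name `lister_instance_name` and the example URL are hypothetical; only the "use the network location when no instance is given" rule comes from the message above.

```python
from typing import Optional
from urllib.parse import urlparse


def lister_instance_name(url: str, instance: Optional[str] = None) -> str:
    """Return the explicit instance name, or the URL's network location."""
    if instance is not None:
        return instance
    return urlparse(url).netloc


# No instance given: the network location of the URL is used.
default_name = lister_instance_name("https://git.example.org/api/v1/")
# An explicit instance name always takes precedence.
explicit_name = lister_instance_name("https://git.example.org/api/v1/", "example")
```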
The PaginatedListedOriginList model has been updated in
rDSCHb93aa5be2c2d5dc2130e1027698f3e1255052d8d and the origins
field has been renamed to results.
The lister is stateless and has full listing capability.
It can request the Gitea API using HTTP token authentication.
Rate-limiting was not encountered but is handled generically.
Add support for getting a repository's last update date through the API.
The gitea lister can be run on multiple instances, which could use the same ids.
When listing another gitea instance, the current code would therefore fail to
insert data in such a case.
This commit fixes that behavior by prefixing the uid with the instance name.
Related to T2577
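To illustrate why the prefix avoids collisions, here is a minimal sketch. The helper `make_uid` and the `/` separator are assumptions for the example, not the format the lister actually uses.

```python
def make_uid(instance: str, repo_id: int) -> str:
    """Build a uid that is unique across instances (hypothetical format)."""
    return f"{instance}/{repo_id}"


# The same numeric id on two instances no longer yields the same uid.
uid_a = make_uid("git.example.org", 1)
uid_b = make_uid("forge.example.com", 1)
```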
Prior to this commit, all listers were instantiated at the same time even if
only one was needed. This commit separates those instantiations.
The only drawback to this is the db model initialization which now happens at
each lister instantiation. This can be dealt with if needed at another time
though.
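One way to picture the change is a registry of factories that builds a lister only when it is requested. All names below (`GiteaListerStub`, `GitlabListerStub`, `LISTER_FACTORIES`, `get_lister`) are hypothetical stand-ins, not the project's classes.

```python
created = []  # records which stub listers were actually instantiated


class GiteaListerStub:
    def __init__(self):
        # Any per-lister setup (e.g. db model initialization) would run here.
        created.append("gitea")


class GitlabListerStub:
    def __init__(self):
        created.append("gitlab")


# Map names to classes, not to instances: nothing is built at import time.
LISTER_FACTORIES = {"gitea": GiteaListerStub, "gitlab": GitlabListerStub}


def get_lister(name):
    """Instantiate only the requested lister."""
    try:
        factory = LISTER_FACTORIES[name]
    except KeyError:
        raise ValueError(f"unknown lister: {name}") from None
    return factory()


lister = get_lister("gitea")
```

The trade-off mentioned above shows up in the stub constructors: whatever setup they perform now runs once per instantiation instead of once overall.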