Instead of having a single crate and its version info per page,
prefer to have up to 1000 crates per page to significantly speed up
the listing process.
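A minimal sketch of grouping listed crates into pages of at most 1000
entries, assuming the crates are available as any iterable. `paginate`
is a hypothetical helper, not the lister's actual code.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def paginate(items: Iterable[T], page_size: int = 1000) -> Iterator[List[T]]:
    """Yield successive pages holding at most page_size items each."""
    iterator = iter(items)
    # islice consumes up to page_size items; an empty page means we are done.
    while page := list(islice(iterator, page_size)):
        yield page
```

For example, 2500 crates give pages of 1000, 1000 and 500 crates.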
Previously, the lister state was recorded regardless of whether errors
occurred when listing crates, as the finalize method is called even if
an exception is raised during listing.
As a consequence, some crates could be missed, as the incremental listing
restarts from the dump date of the last processed crates database.
So ensure all crates have been processed by the lister before recording
its state.
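A toy model of that fix, assuming each listed item is reduced to the
dump date it contributes. `HypotheticalCratesLister` and its attributes
are illustrative only: finalize still runs in a `finally` clause, but it
only records the state once listing completed without error.

```python
from typing import List, Optional


class HypotheticalCratesLister:
    """Toy incremental lister; each input item is a crates database dump date."""

    def __init__(self, dump_dates: List[Optional[str]]):
        self.dump_dates = dump_dates
        self.last_seen_dump_date: Optional[str] = None
        self.all_crates_processed = False
        # State that would be persisted and used as the next run's start point.
        self.recorded_state: Optional[str] = None

    def list_crates(self) -> None:
        for dump_date in self.dump_dates:
            if dump_date is None:
                raise RuntimeError("simulated failure while listing a crate")
            self.last_seen_dump_date = dump_date
        # Only reached when no exception was raised for any crate.
        self.all_crates_processed = True

    def finalize(self) -> None:
        # Recording a partial state would make the next incremental run
        # skip the crates that were never listed, so require completion.
        if self.all_crates_processed:
            self.recorded_state = self.last_seen_dump_date

    def run(self) -> None:
        try:
            self.list_crates()
        finally:
            # finalize is called even if listing failed.
            self.finalize()
```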
packaging.version.parse is dedicated to parsing Python package version
numbers, but crate versions do not necessarily respect Python version
number conventions, so some crate versions cannot be parsed.
Prefer to use looseversion.LooseVersion2 instead, which is a drop-in
replacement for the deprecated distutils.version.LooseVersion and can
parse all kinds of version numbers.
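The looseversion package itself is not reproduced here. To illustrate
why a loose comparison accepts any crate version string, below is a
simplified stdlib-only stand-in: `loose_version_key` is hypothetical,
reuses the component pattern of distutils' LooseVersion, and tags
numeric and other components so that mixed comparisons never raise
TypeError.

```python
import re
from typing import Tuple

# Same component pattern as distutils' LooseVersion: numbers, letters or dots.
_COMPONENT_RE = re.compile(r"(\d+|[a-z]+|\.)")


def loose_version_key(version: str) -> Tuple[Tuple[int, int, str], ...]:
    """Return a sort key for any version string, never raising on odd formats."""
    key = []
    for part in _COMPONENT_RE.split(version):
        if not part or part == ".":
            continue  # drop empty chunks and dot separators
        if part.isdecimal():
            key.append((0, int(part), ""))  # numeric components sort first
        else:
            key.append((1, 0, part))  # any other chunk, e.g. "alpha" or "-"
    return tuple(key)
```

Note that this loose ordering sorts "1.0.0-alpha.1" after "1.0.0",
unlike Semantic Versioning precedence: it guarantees every string can
be compared, not semver semantics.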
Ensure that all lister classes have the same set of mandatory parameters
in their constructors, notably: scheduler, url, instance and credentials.
Add a new test checking that lister classes declare the mandatory
parameters in their constructors. The purpose is to avoid deployment
issues in staging or production environments, as celery tasks can fail
to execute if mandatory parameters are not handled by listers.
Related to swh/infra/sysadm-environment#5030.
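Such a check can be sketched with inspect.signature. The two lister
classes below are hypothetical; how the real test discovers the
registered lister classes is not shown here.

```python
import inspect
from typing import Set, Type

MANDATORY_PARAMETERS = {"scheduler", "url", "instance", "credentials"}


class HypotheticalGoodLister:
    def __init__(
        self, scheduler, url=None, instance=None, credentials=None, page_size=1000
    ):
        self.scheduler = scheduler


class HypotheticalBadLister:
    def __init__(self, scheduler, url=None):
        self.scheduler = scheduler


def missing_mandatory_parameters(lister_cls: Type) -> Set[str]:
    """Return the mandatory parameter names absent from the constructor."""
    declared = inspect.signature(lister_cls.__init__).parameters
    return MANDATORY_PARAMETERS - set(declared)
```

A test would then assert that missing_mandatory_parameters returns an
empty set for every lister class.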
In order to reduce the number of HTTP API calls made by the loader,
download a crates.io database dump and parse its CSV files to get a
last_update value for each version of a crate.
Those values are sent to the loader through the 'crates_metadata' entry
of extra_loader_arguments.
'artifacts' and 'crates_metadata' now use "version" as key.
Related T4104, D8171
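A minimal sketch of the CSV parsing step, skipping the dump download and
extraction. It assumes a crates.csv file with id and name columns and a
versions.csv file with crate_id, num and updated_at columns; the sample
rows are made up, and the nested result shape is hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, TextIO

# Made-up excerpts; real dump files have more columns and rows.
CRATES_CSV = """id,name
1,serde
2,rand
"""

VERSIONS_CSV = """id,crate_id,num,updated_at
10,1,1.0.0,2017-04-20 10:00:00
11,1,1.0.1,2017-04-25 12:00:00
20,2,0.8.5,2022-02-14 08:30:00
"""


def build_crates_metadata(
    crates_file: TextIO, versions_file: TextIO
) -> Dict[str, Dict[str, Dict[str, str]]]:
    """Map crate name -> version -> {"version": ..., "last_update": ...}."""
    names = {row["id"]: row["name"] for row in csv.DictReader(crates_file)}
    metadata: Dict[str, Dict[str, Dict[str, str]]] = defaultdict(dict)
    for row in csv.DictReader(versions_file):
        name = names[row["crate_id"]]
        metadata[name][row["num"]] = {
            "version": row["num"],
            "last_update": row["updated_at"],
        }
    return dict(metadata)


metadata = build_crates_metadata(io.StringIO(CRATES_CSV), io.StringIO(VERSIONS_CSV))
```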
Prefer executing the lister through a celery task, as it also helps
catch possible issues with the task implementation.
Also use docker compose v2 commands.
By using a single equality check instead of checking len() and then
using zip() to compare elements one by one, pytest can find the common
and missing elements and print them nicely when the two lists differ.
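A minimal illustration with made-up data: the test below keeps a single
list comparison, which pytest's assertion rewriting can explain item by
item on failure.

```python
def test_listed_versions():
    expected = ["0.8.3", "0.8.4", "0.8.5"]
    listed = ["0.8.3", "0.8.4", "0.8.5"]  # hypothetical lister output

    # Before: assert len(listed) == len(expected)
    #         for got, want in zip(listed, expected): assert got == want
    # After: one comparison, so pytest can report the differing and
    # missing items when the lists are not equal.
    assert listed == expected
```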
Previously, we had as many origins as versions for a crate package, and
the origin url was a link to a specific crate version package.
Refactor to have one origin per package name and add an 'artifacts'
entry to extra_loader_arguments that lists all versions, with their
package url and checksum.
The origin url is now a link to the related HTTP API endpoint for a
package name.
Related to T4104
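A sketch of one origin per package name, assuming the versions come from
crates.io-index entries (which carry "vers" and "cksum" fields). The
download URL pattern and the exact artifact fields are assumptions, and
`build_origin` is hypothetical.

```python
from typing import Any, Dict, List

# Assumed URL patterns for the API endpoint and the .crate download.
CRATES_API_URL = "https://crates.io/api/v1/crates/{name}"
CRATE_FILE_URL = "https://static.crates.io/crates/{name}/{name}-{version}.crate"


def build_origin(name: str, versions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build a single origin for a crate, with one artifact per version."""
    artifacts = [
        {
            "version": v["vers"],
            "url": CRATE_FILE_URL.format(name=name, version=v["vers"]),
            # The index cksum field holds the SHA-256 of the .crate file.
            "checksum": {"sha256": v["cksum"]},
        }
        for v in versions
    ]
    return {
        "url": CRATES_API_URL.format(name=name),
        "extra_loader_arguments": {"artifacts": artifacts},
    }
```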
The Crates lister retrieves crate packages for the Rust language.
It basically fetches https://github.com/rust-lang/crates.io-index.git
into a temporary directory and then walks through each file to get the
crate's info.
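A minimal sketch of the walking step, assuming the index is already
checked out: each crate file holds one JSON document per line, one per
version, while hidden directories such as .git and the root config.json
are skipped. `iter_index_crates` and the tiny fake index are
hypothetical.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterator


def iter_index_crates(index_dir: str) -> Iterator[Dict[str, Any]]:
    """Yield {"name": ..., "versions": [...]} for each crate file found."""
    for root, dirs, files in os.walk(index_dir):
        dirs[:] = [d for d in dirs if not d.startswith(".")]  # skip .git etc.
        for filename in sorted(files):
            if filename == "config.json" or filename.startswith("."):
                continue
            with open(os.path.join(root, filename), encoding="utf-8") as f:
                versions = [json.loads(line) for line in f if line.strip()]
            yield {"name": filename, "versions": versions}


# Build a tiny fake index: crate files live under nested prefix directories.
with tempfile.TemporaryDirectory() as index_dir:
    os.makedirs(os.path.join(index_dir, "ra", "nd"))
    with open(os.path.join(index_dir, "ra", "nd", "rand"), "w", encoding="utf-8") as f:
        f.write(json.dumps({"name": "rand", "vers": "0.8.4", "cksum": "abc"}) + "\n")
        f.write(json.dumps({"name": "rand", "vers": "0.8.5", "cksum": "def"}) + "\n")
    with open(os.path.join(index_dir, "config.json"), "w", encoding="utf-8") as f:
        f.write("{}")
    crates = list(iter_index_crates(index_dir))
```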