swh-lister/swh/lister
Antoine Lambert 108816f232 rubygems: Use gems database dump to improve listing output
Instead of using an undocumented rubygems HTTP endpoint that only
gives us the names of the gems, use the daily PostgreSQL dump of the
rubygems.org database.

This makes it possible to list not only all gems but also every version
of a gem and its release artifacts. For each release artifact, the
following information is extracted: version, download URL, sha256
checksum, release date, plus a couple of extra metadata fields.
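
As a rough illustration of the per-artifact information listed above, the record for one release could be assembled as follows. This is a minimal sketch, not the lister's actual code: the `build_artifact` helper, the dictionary keys, and the download URL layout are assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def build_artifact(
    gem_name: str, version: str, sha256: str, released_at: datetime
) -> Dict[str, Any]:
    """Assemble the info for one release artifact (hypothetical helper)."""
    filename = f"{gem_name}-{version}.gem"
    return {
        "version": version,
        "filename": filename,
        # Assumed URL layout, not taken from the commit message.
        "url": f"https://rubygems.org/downloads/{filename}",
        "checksums": {"sha256": sha256},
        "date": released_at.isoformat(),
    }


# Example values are made up for illustration.
artifact = build_artifact(
    "example-gem",
    "1.0.0",
    "0" * 64,
    datetime(2022, 10, 7, tzinfo=timezone.utc),
)
```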

The lister now sets the list of artifacts and the list of metadata as
extra loader arguments when sending a listed origin to the scheduler
database. A last_update date is also computed, which should ensure that
loading tasks for rubygems are scheduled only when new releases have
become available since the last loading.
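
The idea of deriving last_update from the release dates and attaching the two lists as extra loader arguments can be sketched as below. The function names, the payload layout, and the key names are hypothetical and chosen for the example only; they are not confirmed by the commit message.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


def compute_last_update(artifacts: List[Dict[str, Any]]) -> Optional[datetime]:
    """Return the most recent release date among the artifacts, if any."""
    dates = [artifact["date"] for artifact in artifacts]
    return max(dates) if dates else None


def build_origin_payload(
    url: str, artifacts: List[Dict[str, Any]], metadata: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Bundle what would be sent for one listed origin (assumed layout)."""
    return {
        "url": url,
        "last_update": compute_last_update(artifacts),
        "extra_loader_arguments": {
            "artifacts": artifacts,
            "metadata": metadata,
        },
    }


# Example values are made up for illustration.
artifacts = [
    {"version": "1.0.0", "date": datetime(2022, 9, 1, tzinfo=timezone.utc)},
    {"version": "1.1.0", "date": datetime(2022, 10, 7, tzinfo=timezone.utc)},
]
metadata = [{"version": "1.0.0"}, {"version": "1.1.0"}]
payload = build_origin_payload(
    "https://rubygems.org/gems/example-gem", artifacts, metadata
)
```

With such a payload, a scheduler could compare last_update with the date of the previous visit and skip origins that have no newer release.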

Note that the lister spawns a temporary PostgreSQL instance, so the
initdb executable from a PostgreSQL server installation must be
available in the execution environment.
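
A deployment could check for the initdb requirement up front with something like the following. This is a minimal sketch using only the Python standard library; the `find_initdb` and `require_initdb` helpers are hypothetical and not part of the lister.

```python
import shutil
from typing import Optional


def find_initdb(path: Optional[str] = None) -> Optional[str]:
    """Return the full path to initdb, or None if it cannot be found.

    `path` overrides the search path; by default the PATH environment
    variable is used.
    """
    return shutil.which("initdb", path=path)


def require_initdb(path: Optional[str] = None) -> str:
    """Return the path to initdb or fail early with a clear error."""
    initdb = find_initdb(path)
    if initdb is None:
        raise RuntimeError(
            "initdb executable not found; "
            "a PostgreSQL server installation is required"
        )
    return initdb
```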

Related to T1777
2022-10-07 16:54:48 +02:00
arch Send package artifact checksums to loaders when info is available 2022-09-30 18:44:11 +02:00
aur Update instructions for running a lister in docker 2022-09-29 11:26:40 +02:00
bitbucket Refactor and deduplicate HTTP requests code in listers 2022-09-26 10:48:40 +02:00
bower Update instructions for running a lister in docker 2022-09-29 11:26:40 +02:00
cgit cgit/tests: Rename readme.md to readme 2022-09-26 13:22:10 +02:00
conda Conda: switch artifacts from dict to list 2022-09-30 15:55:53 +02:00
cpan Send package artifact checksums to loaders when info is available 2022-09-30 18:44:11 +02:00
cran Send package artifact checksums to loaders when info is available 2022-09-30 18:44:11 +02:00
crates Crates.io: Add last_update for each version of a crate 2022-10-05 17:10:28 +02:00
debian debian: Remove no longer needed code to get accurate origins count 2022-09-29 11:14:42 +02:00
gitea Use generic HTTP retry policy by default and rename dedicated decorator 2022-09-26 10:48:40 +02:00
github Update value of User-Agent HTTP request header used by listers 2022-09-26 10:48:40 +02:00
gitlab Update value of User-Agent HTTP request header used by listers 2022-09-26 10:48:40 +02:00
gnu nixguix: Add lister 2022-10-03 18:26:36 +02:00
gogs Refactor and deduplicate HTTP requests code in listers 2022-09-26 10:48:40 +02:00
golang Refactor and deduplicate HTTP requests code in listers 2022-09-26 10:48:40 +02:00
hackage Hackage: List origins from hackage.haskell.org, The Haskell Package Repository 2022-09-27 14:22:03 +02:00
launchpad Use generic HTTP retry policy by default and rename dedicated decorator 2022-09-26 10:48:40 +02:00
maven Update value of User-Agent HTTP request header used by listers 2022-09-26 10:48:40 +02:00
nixguix nixguix: Exclude faulty "recursive" file origins from listing 2022-10-07 14:33:38 +02:00
npm Update value of User-Agent HTTP request header used by listers 2022-09-26 10:48:40 +02:00
nuget Nuget: Lister for NuGet the package manager for .NET 2022-09-27 14:56:36 +02:00
opam python: Reformat code with black 22.3.0 2022-04-08 15:15:09 +02:00
packagist Refactor and deduplicate HTTP requests code in listers 2022-09-26 10:48:40 +02:00
phabricator Update value of User-Agent HTTP request header used by listers 2022-09-26 10:48:40 +02:00
pubdev Update instructions for running a lister in docker 2022-09-29 11:26:40 +02:00
puppet Send package artifact checksums to loaders when info is available 2022-09-30 18:44:11 +02:00
pypi Use generic HTTP retry policy by default and rename dedicated decorator 2022-09-26 10:48:40 +02:00
rubygems rubygems: Use gems database dump to improve listing output 2022-10-07 16:54:48 +02:00
sourceforge Update value of User-Agent HTTP request header used by listers 2022-09-26 10:48:40 +02:00
tests nixguix: Register task 2022-10-03 18:26:36 +02:00
tuleap Refactor and deduplicate HTTP requests code in listers 2022-09-26 10:48:40 +02:00
__init__.py nixguix: Add lister 2022-10-03 18:26:36 +02:00
cli.py python: Reformat code with black 22.3.0 2022-04-08 15:15:09 +02:00
pattern.py pattern: Ensure accurate origin counts returned by run method 2022-09-29 11:14:08 +02:00
py.typed typing: minimal changes to make a no-op mypy run pass 2019-10-28 15:35:21 +01:00
utils.py Use generic HTTP retry policy by default and rename dedicated decorator 2022-09-26 10:48:40 +02:00