Add save-bulk lister to check origins prior to their insertion in database

This new, special-purpose lister verifies a list of origins to archive
provided by users (for instance through the Web API).

Its purpose is to avoid polluting the scheduler database with origins that
cannot be loaded into the archive.

Each origin is identified by a URL and a visit type. For a given visit type,
the lister checks whether the origin URL can be found and whether the visit
type is valid.

The supported visit types are those for VCS (bzr, cvs, hg, git and svn) plus
the one for loading tarball content into the archive.

Accepted origins are inserted or upserted in the scheduler database.

Rejected origins are stored in the lister state.
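
The accept/reject split described above can be illustrated with a minimal
sketch. It is not the code added by this commit: the names `check_origins`,
`CheckResult` and `default_url_check` are hypothetical, the "tarball" label is
a placeholder for the real tarball visit type name, and the URL check is
reduced to a syntactic test so the example runs without network access (the
actual lister checks that the origin URL can be found).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple
from urllib.parse import urlparse

# Visit types named in the commit message. "tarball" is a placeholder
# label (assumption); the real visit type name may differ.
VCS_VISIT_TYPES = {"bzr", "cvs", "hg", "git", "svn"}
SUPPORTED_VISIT_TYPES = VCS_VISIT_TYPES | {"tarball"}

Origin = Tuple[str, str]  # (origin URL, visit type)


def default_url_check(url: str, visit_type: str) -> bool:
    """Stand-in check (assumption): only tests that the URL is well formed."""
    parsed = urlparse(url)
    return bool(parsed.scheme and parsed.netloc)


@dataclass
class CheckResult:
    # Origins that would be sent to the scheduler database.
    accepted: List[Origin] = field(default_factory=list)
    # Rejected origin URL -> rejection reason, as kept in the lister state.
    rejected: Dict[str, str] = field(default_factory=dict)


def check_origins(
    origins: Iterable[Origin],
    url_check: Callable[[str, str], bool] = default_url_check,
) -> CheckResult:
    """Split user-provided origins into accepted and rejected ones."""
    result = CheckResult()
    for url, visit_type in origins:
        if visit_type not in SUPPORTED_VISIT_TYPES:
            result.rejected[url] = f"unsupported visit type: {visit_type}"
        elif not url_check(url, visit_type):
            result.rejected[url] = "origin URL could not be verified"
        else:
            result.accepted.append((url, visit_type))
    return result
```

Passing `url_check` as a parameter keeps the sketch testable: a real check
that contacts the remote server can be swapped in without changing the
partitioning logic.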

Related to #4709
Antoine Lambert 2024-05-22 17:42:56 +02:00
parent 6618cf341c
commit af24960bc2
10 changed files with 763 additions and 0 deletions

@@ -34,6 +34,7 @@ testing = {file = ["requirements-test.txt"]}
"lister.bioconductor" = "swh.lister.bioconductor:register"
"lister.bitbucket" = "swh.lister.bitbucket:register"
"lister.bower" = "swh.lister.bower:register"
"lister.save-bulk" = "swh.lister.save_bulk:register"
"lister.cgit" = "swh.lister.cgit:register"
"lister.conda" = "swh.lister.conda:register"
"lister.cpan" = "swh.lister.cpan:register"