swh-lister/swh/lister
Antoine Lambert af24960bc2 Add save-bulk lister to check origins prior their insertion in database
This new, special-purpose lister verifies a list of origins to archive
provided by users (for instance through the Web API).

Its purpose is to avoid polluting the scheduler database with origins that
cannot be loaded into the archive.

Each origin is identified by a URL and a visit type. For a given visit type,
the lister checks whether the origin URL can be found and whether the visit
type is valid.

The supported visit types are those for VCS (bzr, cvs, hg, git and svn) plus
the one for loading tarball content into the archive.

Accepted origins are inserted or upserted in the scheduler database.

Rejected origins are stored in the lister state.
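
The accept/reject flow described above can be sketched roughly as follows.
This is an illustration only, not the code added by this commit: the names
`check_origins`, `CheckResult` and `probe` are hypothetical, the `"tarball"`
identifier is a placeholder for the tarball visit type, and the real lister
talks to the scheduler and performs actual network checks, which are replaced
here by an injected callable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

# VCS visit types listed in the commit message.
VCS_VISIT_TYPES = {"bzr", "cvs", "hg", "git", "svn"}
# Placeholder name for the tarball visit type (assumption, not from the source).
TARBALL_VISIT_TYPE = "tarball"
SUPPORTED_VISIT_TYPES = VCS_VISIT_TYPES | {TARBALL_VISIT_TYPE}

Origin = Tuple[str, str]  # (origin URL, visit type)


@dataclass
class CheckResult:
    # Origins that would be inserted or upserted in the scheduler database.
    accepted: List[Origin] = field(default_factory=list)
    # Origins that would be kept in the lister state, with a rejection reason.
    rejected: Dict[str, str] = field(default_factory=dict)


def check_origins(
    origins: Iterable[Origin],
    probe: Callable[[str, str], bool],
) -> CheckResult:
    """Split origins into accepted and rejected ones.

    `probe(url, visit_type)` is a hypothetical hook that returns True when
    the URL can be found and matches the given visit type.
    """
    result = CheckResult()
    for url, visit_type in origins:
        if visit_type not in SUPPORTED_VISIT_TYPES:
            result.rejected[url] = f"unsupported visit type: {visit_type}"
        elif not probe(url, visit_type):
            result.rejected[url] = "origin URL not found or visit type invalid"
        else:
            result.accepted.append((url, visit_type))
    return result
```

Passing the check as a callable keeps the sketch testable without network
access; a caller would supply its own probe per visit type.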

Related to #4709
2024-09-04 10:42:23 +02:00
..
arch Use beautifulsoup4 CSS selectors to simplify code and type checking 2024-04-16 11:22:51 +02:00
aur python: Fix black formatting after bump to 23.1.0 in pre-commit 2023-12-05 10:33:07 +01:00
bioconductor Use beautifulsoup4 CSS selectors to simplify code and type checking 2024-04-16 11:22:51 +02:00
bitbucket tests: Fix mocking of sleep calls with tenacity 8.4.2 2024-06-28 18:15:36 +02:00
bower Harmonize listers parameters and add test to check mandatory ones 2023-09-06 11:55:34 +02:00
cgit tests: Fix mocking of sleep calls with tenacity 8.4.2 2024-06-28 18:15:36 +02:00
conda Harmonize listers parameters and add test to check mandatory ones 2023-09-06 11:55:34 +02:00
cpan python: Fix black formatting after bump to 23.1.0 in pre-commit 2023-12-05 10:33:07 +01:00
cran cran: Use pyreadr instead of rpy2 to read a RDS file from Python 2023-11-14 17:09:42 +01:00
crates crates: Remove crates metadata as loader argument 2024-08-27 12:28:05 +02:00
debian python: Fix black formatting after bump to 23.1.0 in pre-commit 2023-12-05 10:33:07 +01:00
dlang Remove spurious space 2023-09-21 09:18:54 +02:00
elm Elm stateful lister 2024-01-09 14:05:56 +01:00
gitea gogs, gitea: Fix task execution to pass along extra kwargs 2022-12-14 16:09:56 +01:00
github GitHub: record whether the origin is a fork 2024-07-18 10:45:06 +02:00
gitiles tests: Fix mocking of sleep calls with tenacity 8.4.2 2024-06-28 18:15:36 +02:00
gitlab tests: Fix mocking of sleep calls with tenacity 8.4.2 2024-06-28 18:15:36 +02:00
gitweb tests: Fix mocking of sleep calls with tenacity 8.4.2 2024-06-28 18:15:36 +02:00
gnu python: Fix black formatting after bump to 23.1.0 in pre-commit 2023-12-05 10:33:07 +01:00
gogs gitea, gogs: Ensure query parameters are not duplicated in API URLs 2024-06-05 15:27:58 +02:00
golang Make qa tools happy again 2024-08-27 17:40:30 +02:00
hackage Harmonize listers parameters and add test to check mandatory ones 2023-09-06 11:55:34 +02:00
hex python: Fix black formatting after bump to 23.1.0 in pre-commit 2023-12-05 10:33:07 +01:00
julia Stateful Julia lister 2023-12-18 16:02:22 +01:00
launchpad tests: Fix mocking of sleep calls with tenacity 8.4.2 2024-06-28 18:15:36 +02:00
maven tests: Fix mocking of sleep calls with tenacity 8.4.2 2024-06-28 18:15:36 +02:00
nixguix Move tarball validation functions from nixguix to utils 2024-09-02 11:29:47 +02:00
npm tests: Fix mocking of sleep calls with tenacity 8.4.2 2024-06-28 18:15:36 +02:00
nuget Use beautifulsoup4 CSS selectors to simplify code and type checking 2024-04-16 11:22:51 +02:00
opam opam: Fix 'opam init' error when relisting an opam instance 2023-06-29 17:49:21 +02:00
packagist Make qa tools happy again 2024-08-27 17:40:30 +02:00
pagure pagure/tasks: Add missing docstring for list_pagure task function 2023-06-23 14:29:17 +02:00
phabricator tests: Fix mocking of sleep calls with tenacity 8.4.2 2024-06-28 18:15:36 +02:00
pubdev python: Fix black formatting after bump to 23.1.0 in pre-commit 2023-12-05 10:33:07 +01:00
puppet python: Fix black formatting after bump to 23.1.0 in pre-commit 2023-12-05 10:33:07 +01:00
pypi Harmonize listers parameters and add test to check mandatory ones 2023-09-06 11:55:34 +02:00
rpm python: Fix black formatting after bump to 23.1.0 in pre-commit 2023-12-05 10:33:07 +01:00
rubygems Use beautifulsoup4 CSS selectors to simplify code and type checking 2024-04-16 11:22:51 +02:00
save_bulk Add save-bulk lister to check origins prior their insertion in database 2024-09-04 10:42:23 +02:00
sourceforge tests: Fix mocking of sleep calls with tenacity 8.4.2 2024-06-28 18:15:36 +02:00
stagit tests: Fix mocking of sleep calls with tenacity 8.4.2 2024-06-28 18:15:36 +02:00
tests Add save-bulk lister to check origins prior their insertion in database 2024-09-04 10:42:23 +02:00
tuleap tests: Fix mocking of sleep calls with tenacity 8.4.2 2024-06-28 18:15:36 +02:00
__init__.py Add support for more tarball recognition based on extensions 2022-10-25 09:50:31 +02:00
cli.py cli: Print lister stats at the end of the run command 2023-11-07 19:00:53 +01:00
pattern.py packagist: Yield pages of origins to regularly record origins 2023-08-04 11:09:58 +02:00
py.typed typing: minimal changes to make a no-op mypy run pass 2019-10-28 15:35:21 +01:00
utils.py Move tarball validation functions from nixguix to utils 2024-09-02 11:29:47 +02:00