Add Gitweb lister

Gitweb instances differ, so the lister applies some instance-specific heuristics. Some instances:
- have summary pages which do not list metadata_url (so some computation
  happens to list the cloneable git:// origins)
- have summary pages which reference metadata_url as multiple comma-separated urls
- list relative urls for the repositories, so we need to join them with the main instance
  url to get complete cloneable origins (or summary pages)
- list "down" http origins (cloning those won't work), so those are listed as cloneable
  https ones (when the main url is behind https).
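
The last three heuristics above can be sketched as follows. This is only an illustrative sketch, not the code added by this commit: the helper names `split_metadata_urls` and `normalize_origin` and the example.org urls are hypothetical, and the git:// computation of the first heuristic is not covered.

```python
from typing import List, Optional
from urllib.parse import urljoin, urlparse


def split_metadata_urls(raw: Optional[str]) -> List[str]:
    """Split a comma-separated metadata_url value into individual urls.

    Hypothetical helper: returns an empty list when the summary page
    does not provide any value.
    """
    if not raw:
        return []
    return [part.strip() for part in raw.split(",") if part.strip()]


def normalize_origin(instance_url: str, url: str) -> str:
    """Turn a listed url into a complete origin url.

    Hypothetical helper:
    - relative urls are joined with the main instance url
    - http urls are rewritten to https when the instance itself is
      served over https
    """
    # urljoin leaves absolute urls untouched and resolves relative ones
    absolute = urljoin(instance_url, url)
    parsed = urlparse(absolute)
    if parsed.scheme == "http" and urlparse(instance_url).scheme == "https":
        absolute = parsed._replace(scheme="https").geturl()
    return absolute
```

For instance, under these assumptions `normalize_origin("https://git.example.org/", "project.git")` resolves the relative url against the instance url, while an http url listed by an https instance gets its scheme upgraded.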

Refs. swh/devel/swh-lister#1800
Antoine R. Dumont (@ardumont) 2023-07-08 14:57:24 +02:00
parent 87d2344c60
commit 573958ce64
20 changed files with 1814 additions and 1 deletions

@@ -6,6 +6,7 @@ beautifulsoup4
launchpadlib
tenacity >= 6.2
lxml
dateparser
dulwich
testing.postgresql
psycopg2