Commit graph

13 commits

Author SHA1 Message Date
David Douard
714fccc3c7 python: Fix black formatting after bump to 23.1.0 in pre-commit 2023-12-05 10:33:07 +01:00
Antoine Lambert
6e7bc49ec7 Harmonize listers parameters and add test to check mandatory ones
Ensure that all lister classes have the same set of mandatory parameters
in their constructors, notably: scheduler, url, instance and credentials.

Add a new test checking that lister classes declare the mandatory parameters
in their constructors. The purpose is to avoid deployment issues in staging
or production environments, as celery tasks can fail to execute if mandatory
parameters are not handled by listers.

Related to swh/infra/sysadm-environment#5030.
2023-09-06 11:55:34 +02:00
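The mandatory-parameter check described in commit 6e7bc49ec7 could be sketched roughly as follows. This is a minimal illustration, not the test added by that commit: the helper `missing_mandatory_params` and the two example classes are hypothetical, and only the four parameter names come from the commit message.

```python
import inspect

# Parameter names taken from the commit message.
MANDATORY_PARAMS = {"scheduler", "url", "instance", "credentials"}


def missing_mandatory_params(lister_class):
    """Return the mandatory parameter names absent from the constructor."""
    params = inspect.signature(lister_class.__init__).parameters
    return MANDATORY_PARAMS - set(params)


class ExampleLister:
    """Hypothetical lister declaring every mandatory parameter."""

    def __init__(self, scheduler, url=None, instance=None, credentials=None):
        self.scheduler = scheduler
        self.url = url
        self.instance = instance
        self.credentials = credentials


class IncompleteLister:
    """Hypothetical lister that forgets two mandatory parameters."""

    def __init__(self, scheduler, url=None):
        self.scheduler = scheduler
        self.url = url
```

A test would then assert that `missing_mandatory_params` returns an empty set for every lister class it discovers.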
Nicolas Dandrimont
e785e67315 Hook up recently introduced options to all listers
Hopefully one day we'll be able to replace all of this mess with PEP692
TypedDict kwargs, but that's only on track for Python 3.12.
2022-12-05 16:33:45 +01:00
Antoine Lambert
60707a45dd pubdev: Update outdated lister documentation 2022-10-28 11:22:15 +02:00
Antoine Lambert
dabb1a2ae5 Update instructions for running a lister in docker
Prefer executing a lister through a celery task, as this also helps
catch possible issues with the task implementation.

Also use docker compose v2 commands.
2022-09-29 11:26:40 +02:00
Antoine Lambert
d5c30a3ce3 Update value of User-Agent HTTP request header used by listers
That HTTP header value will now contain the lister name but also a link
to our contact form in order for sysadmins to easily reach us if needed.

The following template is used to generate it:

"Software Heritage <lister_name> lister v<swh-lister version>
 (+https://www.softwareheritage.org/contact)"
2022-09-26 10:48:40 +02:00
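The User-Agent template quoted in commit d5c30a3ce3 can be illustrated with a short sketch. The function name `build_user_agent` and the constant name are assumptions made for this example; only the template text itself comes from the commit message.

```python
# Template text from the commit message; the placeholder names are illustrative.
USER_AGENT_TEMPLATE = (
    "Software Heritage {lister_name} lister v{version} "
    "(+https://www.softwareheritage.org/contact)"
)


def build_user_agent(lister_name: str, version: str) -> str:
    """Fill the template with a lister name and a swh-lister version string."""
    return USER_AGENT_TEMPLATE.format(lister_name=lister_name, version=version)
```

For instance, `build_user_agent("pubdev", "1.0.0")` yields a header value that names the lister and points to the contact form.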
Antoine Lambert
db6ce12e9e Refactor and deduplicate HTTP requests code in listers
Numerous listers used the same page_request method, or an equivalent,
in their implementation, so deduplicate that code by adding an
http_request method to the base lister class: swh.lister.pattern.Lister.

That method simply wraps a call to requests.Session.request and logs
some useful info for debugging and error reporting. An HTTPError is
raised if a request ends with an error.

All listers using that new method now benefit from request retries when
an HTTP error occurs, thanks to the use of the http_retry decorator.
2022-09-26 10:48:40 +02:00
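A wrapper of the kind described in commit db6ce12e9e might look like the sketch below. It is written as a free function taking a session object rather than as a method of swh.lister.pattern.Lister, and the log messages and the set of "expected" status codes are assumptions; it relies only on `requests.Session.request`, `Response.status_code` and `Response.raise_for_status`.

```python
import logging

logger = logging.getLogger(__name__)


def http_request(session, url, method="GET", **kwargs):
    """Send a request through `session`, log it, and raise on HTTP errors.

    `session` is expected to behave like requests.Session.
    """
    logger.debug("Fetching URL %s with params %s", url, kwargs.get("params"))
    response = session.request(method, url, **kwargs)
    if response.status_code not in (200, 304):
        # Assumed logging policy: flag anything other than a plain success.
        logger.warning(
            "Unexpected HTTP status code %s for URL %s",
            response.status_code,
            url,
        )
    # With requests, this raises requests.HTTPError on 4xx and 5xx responses.
    response.raise_for_status()
    return response
```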
Antoine Lambert
9c55acd286 Use generic HTTP retry policy by default and rename dedicated decorator
Instead of retrying HTTP requests only for the 429 status code by default,
use the generic retry policy, which also retries on status codes >= 500
and on ConnectionError exceptions.

Rename throttling_retry decorator to http_retry to reflect this change.
2022-09-26 10:48:40 +02:00
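The generic retry policy from commit 9c55acd286 (retry on 429, on status codes >= 500, and on connection errors) can be sketched with the standard library alone. This is not the project's http_retry decorator: `HTTPStatusError`, `is_retryable` and `http_retry_sketch` are hypothetical names, the built-in `ConnectionError` stands in for the exception raised by the HTTP client, and the fixed delay is a simplification.

```python
import functools
import time


class HTTPStatusError(Exception):
    """Hypothetical stand-in for an HTTP error carrying a status code."""

    def __init__(self, status_code):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_retryable(exc):
    """Return True for connection errors, 429, and status codes >= 500."""
    if isinstance(exc, ConnectionError):
        return True
    if isinstance(exc, HTTPStatusError):
        return exc.status_code == 429 or exc.status_code >= 500
    return False


def http_retry_sketch(max_attempts=3, delay=0.0):
    """Retry the decorated callable while it fails with a retryable error."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    # Give up on the last attempt or on non-retryable errors.
                    if attempt == max_attempts or not is_retryable(exc):
                        raise
                    time.sleep(delay)

        return wrapper

    return decorator
```

A production policy would typically add exponential backoff between attempts; the fixed delay here only keeps the example short.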
Antoine R. Dumont (@ardumont)
67211adb60 pubdev.lister: Decrease verbosity
This matches other lister verbosity.

Related to T4517
2022-09-09 12:31:43 +02:00
Antoine Lambert
c819cc237d pubdev: Update User-Agent request header value
Use a value that matches the good practice recommended by the pub.dev REST API doc.

https://github.com/dart-lang/pub/blob/master/doc/repository-spec-v2.md
2022-09-07 12:15:34 +02:00
Antoine Lambert
44560c2383 pubdev: Retrieve last publication date for each listed package
In order to get a last_update for each ListedOrigin sent to scheduler
database, send an extra HTTP request for each listed package to the
/api/packages/<package_name> endpoint of pub.dev API.

A pub.dev developer informed us that this endpoint is heavily used and
cached, so there is no particular issue with querying it for each package
in a row periodically.
2022-09-02 16:50:12 +02:00
Antoine Lambert
49b79b0759 pubdev: Modify origin URL for listed packages
Use https://pub.dev/packages/<package_name> instead of
https://pub.dev/api/packages/<package_name>
2022-09-02 16:48:29 +02:00
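The two URL forms mentioned in commits 44560c2383 and 49b79b0759 can be contrasted in a small sketch. The function names are hypothetical; the URL patterns are the ones quoted in the commit messages.

```python
PUBDEV_BASE_URL = "https://pub.dev"


def origin_url(package_name: str) -> str:
    """URL used as the origin for a listed package."""
    return f"{PUBDEV_BASE_URL}/packages/{package_name}"


def package_api_url(package_name: str) -> str:
    """API endpoint queried to get a package's last publication date."""
    return f"{PUBDEV_BASE_URL}/api/packages/{package_name}"
```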
Franck Bret
5410b6e3f3 Pub.dev lister for Dart and Flutter packages
Stateless lister for https://pub.dev, based on its HTTP API, to list package names
2022-08-26 10:24:08 +02:00