From 60707a45ddfd86b27aacf95b6de62df3efb694f6 Mon Sep 17 00:00:00 2001
From: Antoine Lambert
Date: Thu, 27 Oct 2022 16:55:34 +0200
Subject: [PATCH] pubdev: Update outdated lister documentation

---
 swh/lister/pubdev/__init__.py | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/swh/lister/pubdev/__init__.py b/swh/lister/pubdev/__init__.py
index 310595f..06ca022 100644
--- a/swh/lister/pubdev/__init__.py
+++ b/swh/lister/pubdev/__init__.py
@@ -17,21 +17,21 @@ As of August 2022 `pub.dev`_ list 33535 package names.
 Origins retrieving strategy
 ---------------------------
 
-To get a list of all package names we call `https://pub.dev/api/packages` endpoint.
+To get a list of all package names we call the `https://pub.dev/api/package-names` endpoint.
 There is no other way for discovery (no archive index, no database dump, no dvcs repository).
 
-Page listing
-------------
-
-There is only one page that list all origins url based
-on `https://pub.dev/api/packages/{pkgname}`.
-The origin url corresponds to the http api endpoint that returns complete information
-about the package versions (name, version, author, description, release date).
-
 Origins from page
 -----------------
 
-The lister yields all origins url from one page.
+The lister yields all origin urls from a single page.
+
+Getting last update date for each package
+-----------------------------------------
+
+Before sending a listed pubdev origin to the scheduler, we query the
+`https://pub.dev/api/packages/{pkgname}` endpoint to get the last update date
+for a package (the date of its latest release). It enables Software Heritage to create
+a new loading task for a package only if it has new releases since its last visit.
 
 Running tests
 -------------
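The updated docstring describes a two-step strategy: fetch the single list of package names, then query each package's API endpoint for the date of its latest release. The following minimal Python sketch illustrates that flow. It is not the swh-lister implementation. It assumes JSON response shapes that the patch does not state: that `package-names` returns `{"packages": [...]}` and that the per-package endpoint exposes the latest release date as an ISO 8601 string under `["latest"]["published"]`. The helper names (`package_names`, `last_update`, `needs_loading`, `list_packages`) are hypothetical. `needs_loading` only illustrates the "new releases since the last visit" rule and does not show where Software Heritage actually makes that decision.

```python
"""Hypothetical sketch of the pub.dev listing strategy (not swh-lister code).

Assumed response shapes (not confirmed by the patch):
  GET https://pub.dev/api/package-names     -> {"packages": ["name", ...]}
  GET https://pub.dev/api/packages/{name}   -> {"latest": {"published": "<ISO 8601>"}, ...}
"""
import json
from datetime import datetime
from typing import Callable, Iterator, List, Optional, Tuple
from urllib.request import urlopen

PACKAGE_NAMES_URL = "https://pub.dev/api/package-names"
PACKAGE_INFO_URL = "https://pub.dev/api/packages/{pkgname}"


def fetch_json(url: str) -> dict:
    """Download and decode a JSON document (requires network access)."""
    with urlopen(url) as response:
        return json.load(response)


def package_names(listing: dict) -> List[str]:
    """Extract all package names from the single listing page (assumed shape)."""
    return list(listing["packages"])


def last_update(package_info: dict) -> datetime:
    """Return the latest release date, parsed from an ISO 8601 timestamp."""
    published = package_info["latest"]["published"]
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(published.replace("Z", "+00:00"))


def needs_loading(last_update_date: datetime, last_visit: Optional[datetime]) -> bool:
    """Hypothetical rule: load a package if never visited or released since."""
    return last_visit is None or last_update_date > last_visit


def list_packages(
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[Tuple[str, datetime]]:
    """Yield (package name, last update date) pairs, one API call per package."""
    for name in package_names(fetch(PACKAGE_NAMES_URL)):
        info = fetch(PACKAGE_INFO_URL.format(pkgname=name))
        yield name, last_update(info)
```

Passing `fetch` as a parameter allows you to exercise the flow offline with canned responses, for example `list_packages(fetch=canned.__getitem__)` where `canned` maps each URL to a dict. The design also makes the cost of the strategy visible: one request for the name list plus one request per package.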