Add the description in the task_dict method because
the only metadata that can be found for a
package at CRAN is its description. That can
only be retrieved through the built-in API in R,
which the lister is already using. Hence, instead of
fetching the metadata in the loader, it is passed
by the lister.
instead of converting that column as a string
As a side effect, for bitbucket, we improperly provided the `after` query
parameter as a date that was not URL-encoded. This resulted in improper API
responses from bitbucket (from time to time we received the same next index as
the current one).
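
A minimal sketch of the encoding issue described above, using only the
standard library. The helper name `build_query` is hypothetical and is not
taken from the lister's code; it only illustrates that an ISO date contains
characters (`:` and `+`) that must be percent-encoded in a query string.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_query(after: datetime) -> str:
    """Return a URL-encoded query string carrying the `after` date.

    Hypothetical helper: urlencode() percent-encodes ':' and '+', which
    would otherwise be misread by the server ('+' decodes to a space).
    """
    return urlencode({"after": after.isoformat()})


query = build_query(datetime(2019, 6, 1, 12, 30, tzinfo=timezone.utc))
print(query)
```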
Related T1826
If nothing has been done prior to a full relisting, there is actually nothing
to list. So the relister in question does nothing.
In that context, the IndexingLister class's `db_partition_indices` method now
returns an empty list instead of raising a ValueError when there is nothing to
list.
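
The behaviour can be sketched as follows. This is an illustration under
assumptions, not the actual IndexingLister code: `partition_ranges` and
`full_relist` are hypothetical stand-ins, and the indices are passed in as a
plain list instead of being read from the database.

```python
from typing import List, Tuple


def partition_ranges(indices: List[int], partition_size: int) -> List[Tuple[int, int]]:
    """Split the known indices into (start, end) ranges.

    With nothing listed yet, return an empty list rather than raising,
    so callers can simply iterate over the result.
    """
    if not indices:
        return []
    ordered = sorted(indices)
    chunks = [
        ordered[i:i + partition_size]
        for i in range(0, len(ordered), partition_size)
    ]
    return [(chunk[0], chunk[-1]) for chunk in chunks]


def full_relist(indices: List[int], partition_size: int) -> int:
    """Count the ranges a relister would process (zero when nothing is known)."""
    processed = 0
    for _start, _end in partition_ranges(indices, partition_size):
        processed += 1  # a real relister would list the range here
    return processed
```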
Related T1826
Related e129e48
As phabricator is an "instance" lister (there exist multiple instances of
phabricator in the wild), we need to reference that information.
In effect, this aligns the phabricator lister with, for example, the gitlab one.
Related T1801
Related P434
Prior to this commit, the lister expected the api.token to be provided at task
initialization. That behavior has been kept for cli purposes. It's no good for
production purposes though (as this leaks the credentials in the scheduler db).
So now, the credentials are fetched from the lister's configuration file as the
other listers do.
Another change is the authentication mechanism, which is slightly different. It's
not using a basic `auth` mechanism. It's expecting an `api.token` query
parameter, so `request_params` is overridden to provide that.
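
A rough sketch of such an override, under stated assumptions: the class name
`TokenAuthLister` and the `api_token` configuration key are hypothetical, and
the returned dict is assumed to be passed as keyword arguments to an HTTP
client such as requests. Only the `api.token` parameter name and the
`request_params` method name come from the text above.

```python
from typing import Any, Dict


class TokenAuthLister:
    """Hypothetical lister reading its token from configuration."""

    def __init__(self, config: Dict[str, Any]) -> None:
        # In production the config would come from the lister's
        # configuration file, not from the task arguments.
        self.config = config

    def request_params(self, identifier: Any) -> Dict[str, Any]:
        """Build request keyword arguments without a basic `auth` entry.

        The token travels as the `api.token` query parameter instead.
        """
        return {
            "headers": {"Accept": "application/json"},
            "params": {"api.token": self.config["api_token"]},
        }


lister = TokenAuthLister({"api_token": "example-token"})
params = lister.request_params(identifier=0)
```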
Related T1809
This should have been removed along with the code in b816212.
The request authentication has been reworked so that all listers use the same
credentials dict.
Related b816212
Related T1772
As discussed in T1389, to ingest all packages using the base loader, it needs
a list of all the tarballs for a package.
Hence the lister is modified to recursively list all the tarballs for a
package with their last updated time.
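
A sketch of the recursive listing, under assumptions. The entry layout used
here (dicts with `type`, `name`, `time` and nested `contents`) and the set of
archive extensions are assumptions made for illustration, and `find_tarballs`
is a hypothetical name, not the lister's actual function.

```python
from typing import Any, Dict, List

# Assumed set of archive extensions, for illustration only.
ARCHIVE_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar.xz", ".zip")


def find_tarballs(entries: List[Dict[str, Any]], base_url: str) -> List[Dict[str, Any]]:
    """Walk a directory tree and collect every tarball with its timestamp."""
    tarballs = []
    for entry in entries:
        url = f"{base_url}/{entry['name']}"
        if entry["type"] == "directory":
            # Recurse into sub-directories, extending the URL as we go.
            tarballs.extend(find_tarballs(entry.get("contents", []), url))
        elif entry["name"].endswith(ARCHIVE_EXTENSIONS):
            tarballs.append({"archive": url, "date": int(entry["time"])})
    return tarballs


# Made-up sample tree, not real data.
sample_tree = [
    {"type": "file", "name": "README", "time": "1000"},
    {"type": "file", "name": "pkg-1.0.tar.gz", "time": "1100"},
    {"type": "directory", "name": "old", "contents": [
        {"type": "file", "name": "pkg-0.9.tar.gz", "time": "900"},
    ]},
]
found = find_tarballs(sample_tree, "https://example.org/pkg")
```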
Previously, the gnu lister used the same code as the tarball loader
to download, unzip and read the tree.json file.
To make the code concise, the download is now done with the
requests library.
Some of the new listers, like GNU and CRAN, do not
follow the conventional way of making an HTTP
request, hence they do not need some of the methods
which are usually needed for a conventional HTTP
request.
But those methods are marked abstractmethod in the
core, making their presence mandatory. So it is
best to remove abstractmethod to increase the
readability of those listers.
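
The effect can be illustrated with a small sketch. All class and method names
below are hypothetical and simplified, not the core lister API: with
`abstractmethod`, a lister must define every hook even if it never uses it;
without it, the unused hook can simply be left out.

```python
import abc
from typing import List


class StrictBase(abc.ABC):
    """Base class where every hook is mandatory."""

    @abc.abstractmethod
    def transport_request(self, identifier): ...

    @abc.abstractmethod
    def list_packages(self) -> List[str]: ...


class RelaxedBase:
    """Base class where the HTTP hook has a default no-op implementation."""

    def transport_request(self, identifier):
        return None

    def list_packages(self) -> List[str]:
        raise NotImplementedError


class StrictLister(StrictBase):
    # Does not define transport_request, so it cannot be instantiated.
    def list_packages(self) -> List[str]:
        return ["pkg-a", "pkg-b"]


class RelaxedLister(RelaxedBase):
    # Only the method this lister actually needs.
    def list_packages(self) -> List[str]:
        return ["pkg-a", "pkg-b"]


try:
    StrictLister()
    strict_ok = True
except TypeError:
    strict_ok = False

packages = RelaxedLister().list_packages()
```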
also add a cli group named 'lister' for the sake of consistency with
other swh packages and rename the command as 'db-init', like:
swh lister db-init LISTER [...]
Summary:
Using order by and offset makes the partitioning an n^2 operation in the number
of entries in the table, rather than an instant operation when using
min/max.
This assumes the indexable column is more or less uniform, which is not exactly
true but not the worst approximation either.
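
A sketch of min/max-based partitioning under the uniformity assumption above.
The function name and signature are hypothetical; the minimum, maximum and row
count are assumed to come from a single aggregate query on the indexable
column, and the partition starts are then computed arithmetically instead of
scanning the table with order by and offset.

```python
import math
from typing import List, Optional


def partition_indices(
    min_index: Optional[int],
    max_index: Optional[int],
    n_rows: int,
    partition_size: int,
) -> List[int]:
    """Return evenly spaced partition start indices between min and max.

    Assumes the indexed values are roughly uniformly distributed, so that
    equal-width ranges hold about `partition_size` rows each.
    """
    if n_rows == 0 or min_index is None or max_index is None:
        return []
    n_partitions = math.ceil(n_rows / partition_size)
    width = (max_index - min_index) / n_partitions
    return [min_index + int(i * width) for i in range(n_partitions)]


starts = partition_indices(min_index=0, max_index=1000, n_rows=100, partition_size=25)
```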
Test Plan: tox
Reviewers: #reviewers, douardda
Reviewed By: #reviewers, douardda
Subscribers: douardda, swh-public-ci
Differential Revision: https://forge.softwareheritage.org/D1267