This is required to make lister class instantiation easier and more
reliable, especially in the context of CLI tools like 'swh lister run', for
which we want to be able to specify any lister init argument as an extra
parameter of the command.
Simplify the code:
- inherit only from ListerBase
- implement HTTP queries directly using requests
- get rid of convoluted code
Gather the origin_url from the git repo's "project" page instead of
building it with the 'url_prefix' hack. The lister now makes substantially
more requests, since it makes one request per listed git repo, but
the resulting origin_url should be much more reliable.
When several clonable URLs are provided, choose the first http/https one;
otherwise, choose the first one in the list.
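The URL selection rule above can be sketched as follows. This is only an illustration: the function name `pick_clone_url` is hypothetical and is not taken from the cgit lister code.

```python
from urllib.parse import urlparse


def pick_clone_url(urls):
    """Return the preferred clone URL, or None if the list is empty.

    Hypothetical helper: prefer the first http/https URL, otherwise
    fall back to the first URL of the list.
    """
    for url in urls:
        if urlparse(url).scheme in ("http", "https"):
            return url
    return urls[0] if urls else None
```

Returning None for an empty list is an assumption of this sketch; the commit does not say how that case is handled.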
Add proper tests for the cgit lister.
Also, get rid of the 'time_updated' column in the model.
Example use case:
    swh lister run --lister gitlab \
        --priority high \
        --policy oneshot \
        --db-url postgresql://postgres@localhost:5432/swh-listers \
        api_baseurl=https://gitlab.ow2.org/api/v4/
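A trailing `key=value` argument such as `api_baseurl=...` in the command above could be turned into lister init keyword arguments roughly as below. This is a minimal sketch, not the actual 'swh lister run' implementation; `parse_extra_args` is a hypothetical name.

```python
def parse_extra_args(args):
    """Convert ['key=value', ...] into a dict of keyword arguments.

    Hypothetical helper. Only the first '=' splits key from value, so
    values may themselves contain '=' characters.
    """
    kwargs = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {arg!r}")
        kwargs[key] = value
    return kwargs


# The resulting dict could then be passed on as SomeLister(**kwargs).
kwargs = parse_extra_args(["api_baseurl=https://gitlab.ow2.org/api/v4/"])
```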
Related T1919
Prior to this commit, the policy and priority were hard-coded.
The default values are now the old hard-coded values.
This will allow developing a CLI to trigger forge listings with a oneshot
policy and a given task priority, so those forges get ingested faster and
without the manual intervention currently required.
There were previously no tests for the listers
that use the SimpleLister class (like pypi).
Refactored test_lister.py in the lister core to
accommodate tests for SimpleLister while leaving the
tests for other listers undisturbed.
Remove the need to visit every page to extract the
origin url by introducing a url_prefix parameter.
The origin url has the format <prefix>/<repo_name>, where
the prefix is the same for all the repos of a particular
cgit instance.
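As an illustration of the <prefix>/<repo_name> format, a minimal sketch could look like this. The helper name and the example.org URL are made up for the example.

```python
def build_origin_url(url_prefix, repo_name):
    """Join a per-instance prefix and a repo name into an origin url.

    Hypothetical helper: slashes at the junction are stripped so that
    exactly one '/' separates the two parts.
    """
    return f"{url_prefix.rstrip('/')}/{repo_name.lstrip('/')}"


origin_url = build_origin_url("https://git.example.org/", "tools/demo.git")
```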
Add the description in the task_dict method because
the only metadata that can be found for a
package at CRAN is its description. It can
only be retrieved through R's built-in API,
which the lister is already using. Hence, instead of
fetching the metadata in the loader, it is passed
along by the lister.
instead of converting that column as a string
As a side effect, for bitbucket, we improperly provided the 'after' query
parameter as a date that was not url-encoded. This resulted in improper API
responses from bitbucket (from time to time, we received the same next index
as the current one).
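To illustrate why the encoding matters, here is a small sketch using only the Python standard library. It is not the lister's actual code, and `build_query` is a hypothetical name. An ISO 8601 date with a timezone offset contains '+' and ':' characters; left unencoded, a '+' in a query string is commonly decoded as a space by the server.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def build_query(after):
    """Return a url-encoded query string carrying the 'after' date.

    Hypothetical helper: the date is serialized with isoformat() and
    then percent-encoded by urlencode().
    """
    return urlencode({"after": after.isoformat()})


query = build_query(datetime(2019, 6, 1, 12, 30, tzinfo=timezone.utc))
# query == "after=2019-06-01T12%3A30%3A00%2B00%3A00"
```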
Related T1826
If nothing has been done prior to a full relisting, there is actually nothing
to list. So the relister in question does nothing.
In that context, the IndexingLister class's `db_partition_indices` method now
returns an empty list instead of raising a ValueError when there is nothing to
list.
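The behaviour change can be sketched as follows. The source does not show the real `db_partition_indices` signature or partitioning scheme, so this is a simplified, hypothetical stand-in that assumes integer indices.

```python
def partition_indices(min_index, max_index, partition_size):
    """Split [min_index, max_index] into inclusive (start, end) ranges.

    Hypothetical stand-in for the method described above: when nothing
    has been listed yet (no min/max index), return an empty list
    instead of raising a ValueError.
    """
    if min_index is None or max_index is None:
        # Nothing was listed before, so there is nothing to relist.
        return []
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    starts = range(min_index, max_index + 1, partition_size)
    return [(s, min(s + partition_size - 1, max_index)) for s in starts]
```

With this shape, a caller can simply iterate over the result, and an empty database leads to no work at all.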
Related T1826
Related e129e48