Commit graph

711 commits

Author SHA1 Message Date
tenma
b6a69b2ed9 gitea.lister: improve handling of credentials 2021-01-25 15:54:06 +01:00
tenma
c220d7d299 gitea.tests: split and make them more thorough 2021-01-25 15:54:06 +01:00
tenma
c780ad4b44 Reimplement Gitea lister using new Lister API
The lister is stateless and has full listing capability.
It can request the Gitea API using HTTP token authentication.
Rate-limiting was not encountered but is handled generically.
It also retrieves each repository's last update date through the API.
2021-01-25 15:54:06 +01:00
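A minimal sketch of the HTTP token authentication mentioned above, using only the standard library. It assumes the `Authorization: token <value>` header form accepted by the Gitea API. The helper `build_gitea_request` and the example URL are illustrative, not the lister's actual code.

```python
import urllib.request
from typing import Optional


def build_gitea_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for a Gitea API URL (hypothetical helper)."""
    headers = {"Accept": "application/json"}
    if token is not None:
        # Token authentication; without a token the request stays anonymous.
        headers["Authorization"] = f"token {token}"
    return urllib.request.Request(url, headers=headers)


req = build_gitea_request("https://try.gitea.io/api/v1/repos/search?limit=50", token="s3cr3t")
print(req.get_header("Authorization"))  # token s3cr3t
```

No request is sent here. The returned object can be passed to `urllib.request.urlopen` once a real instance URL is known.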
tenma
7892077a89 tests.cli: add Gitea lister mandatory params 2021-01-25 15:54:06 +01:00
Antoine R. Dumont (@ardumont)
02871f16c9 gitlab: Adapt celery task implementations to the new lister api
Related to T2987
2021-01-25 15:08:31 +01:00
Vincent SELLIER
e4a590fc7f Port cgit lister to the new lister api
Related to T2984
2021-01-25 14:57:45 +01:00
Antoine Lambert
59c9abb916 bitbucket: Pick random credentials in configuration and improve logging
Use random credentials from the list in configuration and improve related
logging messages.
2021-01-25 14:34:22 +01:00
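A sketch of picking random credentials from the configuration. It assumes credentials are stored as a list of username/password mappings keyed by lister name and then instance name. That layout and the helper name `pick_credentials` are assumptions for illustration.

```python
import random
from typing import Dict, List, Optional

# Assumed layout: {lister_name: {instance_name: [{"username": ..., "password": ...}]}}
Credentials = Dict[str, Dict[str, List[Dict[str, str]]]]


def pick_credentials(
    credentials: Credentials,
    lister_name: str,
    instance: str,
    rng: Optional[random.Random] = None,
) -> Optional[Dict[str, str]]:
    """Return one randomly chosen credential entry, or None for anonymous access."""
    candidates = credentials.get(lister_name, {}).get(instance, [])
    if not candidates:
        return None
    # Choosing at random spreads requests over the configured accounts.
    return (rng or random).choice(candidates)
```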
Antoine R. Dumont (@ardumont)
ce87a8f7b2 gitlab: Let the lister compute the internal project listing page
Related to T2987
2021-01-25 14:05:34 +01:00
Antoine R. Dumont (@ardumont)
7f1609265f test: Rename internal method to something public
It's used in multiple module tests now.
2021-01-25 13:39:07 +01:00
Antoine R. Dumont (@ardumont)
d3fe3d5747 gitlab: Fix mypy issue 2021-01-25 13:39:07 +01:00
Antoine R. Dumont (@ardumont)
2246d28606 gitlab: Document the lister constructor parameters
Related to T2987
2021-01-25 13:30:45 +01:00
Antoine R. Dumont (@ardumont)
b352b8e11e gitlab: Add test on rate-limit support
Related to T2987
2021-01-25 09:23:22 +01:00
Antoine R. Dumont (@ardumont)
1f911401a1 gitlab: Add test on incremental implementation
Note that the current implementation restarts each new visit from the last
next_page link seen (that link is stored in the lister state to avoid recomputing
the URL). This means that page is listed at least twice: once on the first visit
and again on the next. This should not pose any problem since the listing is idempotent.

Related to T2987
2021-01-25 08:51:23 +01:00
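The restart behaviour described above can be sketched as follows. The in-memory `api` mapping stands in for the GitLab API, and `ListerState.last_seen_next_link` is a hypothetical name for the stored next_page link.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple

# Fake paginated API: page URL -> (projects on the page, next page URL or None).
Pages = Dict[str, Tuple[List[str], Optional[str]]]


@dataclass
class ListerState:
    last_seen_next_link: Optional[str] = None


def list_incrementally(api: Pages, first_url: str, state: ListerState) -> Iterator[str]:
    """Yield projects, resuming from the last next_page link stored in state."""
    url: Optional[str] = state.last_seen_next_link or first_url
    while url is not None:
        projects, next_url = api[url]
        yield from projects
        if next_url is not None:
            # Store the link instead of recomputing the URL later; the
            # next run restarts from the last such link seen.
            state.last_seen_next_link = next_url
        url = next_url


api = {"p1": (["a", "b"], "p2"), "p2": (["c"], None)}
state = ListerState()
print(list(list_incrementally(api, "p1", state)))  # ['a', 'b', 'c']
api["p2"] = (["c", "d"], None)  # a new project appears on the last page
print(list(list_incrementally(api, "p1", state)))  # ['c', 'd']
```

Project `c` is listed on both runs, which is harmless as long as recording origins is idempotent.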
Antoine R. Dumont (@ardumont)
84dd616ab6 gitlab: Add test on pagination
Related to T2987
2021-01-25 08:51:23 +01:00
Antoine R. Dumont (@ardumont)
1390a513f2 gitlab: Port to the new lister api
Related to T2987
2021-01-25 08:51:16 +01:00
Antoine Lambert
ff232f0d91 npm: Reimplement lister using new Lister API
Port npm lister to `swh.lister.pattern.Lister` API.

As before, the lister can be run in full or incremental mode.
When using incremental mode, only new and modified packages will
be returned since the last incremental listing process.
Otherwise, all packages will be listed in lexicographical order.

One notable improvement: the latest package update date is now
retrieved when available and sent to the scheduler database.

Closes T2972
2021-01-22 10:58:52 +01:00
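A sketch of the two modes, assuming the registry exposes an ordered change log of (sequence number, package name) pairs and that the incremental state is the last sequence number processed. Both the change-log shape and the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical change log: (sequence number, package name), one entry per
# creation or modification, in increasing sequence order.
ChangeLog = List[Tuple[int, str]]


def full_listing(changes: ChangeLog) -> List[str]:
    """Full mode: every known package, in lexicographical order."""
    return sorted({name for _, name in changes})


def incremental_listing(changes: ChangeLog, since: int = 0) -> Tuple[List[str], int]:
    """Incremental mode: packages created or modified after `since`.

    Returns the package names (deduplicated, in change order) and the new
    sequence number to store as lister state for the next run.
    """
    recent = [(seq, name) for seq, name in changes if seq > since]
    names = list(dict.fromkeys(name for _, name in recent))
    last_seq = max((seq for seq, _ in recent), default=since)
    return names, last_seq


changes = [(1, "lodash"), (2, "express"), (3, "lodash"), (4, "react")]
print(full_listing(changes))  # ['express', 'lodash', 'react']
print(incremental_listing(changes, since=2))  # (['lodash', 'react'], 4)
```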
tenma
5411141e3a gitlab.tests: fix erroneous import from gitea module 2021-01-21 16:17:36 +01:00
tenma
6d22007946 pypi.tests: simplify requests_mock invocation 2021-01-21 16:17:29 +01:00
tenma
62c825b8cb Reimplement PyPI lister using new Lister API
The new lister has only full listing capability.
It scrapes the list of packages on pypi.org.
Rate-limiting was not encountered but is handled generically.
2021-01-20 15:45:16 +01:00
tenma
565e7423e3 Reimplement Bitbucket lister using new Lister API
The new lister has incremental and full listing capability.
It can request the Bitbucket API in anonymous and HTTP basic authentication
modes. Rate-limiting is not aggressive and is handled.
2021-01-20 15:28:34 +01:00
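The two request modes can be sketched with standard HTTP basic authentication (RFC 7617). The helper name is hypothetical. With the requests library, passing `auth=(username, password)` produces the same header.

```python
import base64
from typing import Dict, Optional


def bitbucket_auth_headers(username: Optional[str] = None, password: Optional[str] = None) -> Dict[str, str]:
    """Return request headers for anonymous or HTTP basic authentication mode."""
    if username is None or password is None:
        return {}  # anonymous mode: no Authorization header
    # Basic auth: base64("username:password") in the Authorization header.
    encoded = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {encoded}"}
```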
Antoine Lambert
9fd91f007d pattern: Fix and improve config overriding in from_configfile method
Fix error when a configuration value loaded from a config file is also
given as keyword parameter to the from_configfile method.

Override configuration loaded from config file only if the provided
value is not None.
2021-01-18 17:55:53 +01:00
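The fix can be illustrated with a hypothetical `merge_config` helper. Calling a constructor as `cls(**config, **kwargs)` with the same key in both dicts raises `TypeError` ("got multiple values for keyword argument"). Merging first, and letting only non-None keyword values win, avoids both problems. `DemoLister` is a stand-in class, not a real lister.

```python
from typing import Any, Dict, Optional


def merge_config(file_config: Dict[str, Any], **overrides: Any) -> Dict[str, Any]:
    """Overlay keyword values on a loaded config, skipping those that are None."""
    config = dict(file_config)  # leave the loaded config untouched
    config.update({key: value for key, value in overrides.items() if value is not None})
    return config


class DemoLister:  # hypothetical stand-in for a lister class
    def __init__(self, url: str, instance: str, api_token: Optional[str] = None) -> None:
        self.url, self.instance, self.api_token = url, instance, api_token


loaded = {"url": "https://forge.example.org/", "instance": "example", "api_token": "t0k3n"}
lister = DemoLister(**merge_config(loaded, url="https://mirror.example.org/", api_token=None))
print(lister.url, lister.api_token)  # https://mirror.example.org/ t0k3n
```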
Antoine Lambert
a41c03e4c8 phabricator: Ensure request errors are raised as exceptions
This ensures that a celery task will be marked as failed if a request error
happens when listing origins.
2021-01-18 12:11:26 +01:00
Antoine Lambert
b743c36496 phabricator: Add test for new lister implementation
Also remove no longer used JSON files.
2021-01-18 12:11:26 +01:00
Antoine Lambert
d691c04eb8 phabricator: Allow to pass forge base URL as lister parameter 2021-01-18 12:11:26 +01:00
Antoine Lambert
d1fbccd988 lister: Add utility decorator to ease HTTP requests rate limit handling
Add swh.lister.utils.throttling_retry, a decorator for retrying a
function that performs an HTTP request which may return a 429 status code.

The implementation is based on the tenacity module and assumes that
the requests library is used to query URLs.

The default wait strategy is exponential backoff.

The default maximum number of attempts is 5; after that, the HTTPError
exception is re-raised.

All tenacity.retry parameters can also be overridden in client code.
2021-01-18 11:28:51 +01:00
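The retry behaviour described above can be sketched without tenacity to show what such a decorator does. This is a stand-in, not `swh.lister.utils.throttling_retry` itself: `HTTPStatusError` replaces `requests.HTTPError`, and the injectable `sleep` parameter exists only so the waits can be observed. With tenacity, the matching building blocks are `retry_if_exception`, `wait_exponential`, `stop_after_attempt` and `reraise=True`.

```python
import functools
import time
from typing import Any, Callable


class HTTPStatusError(Exception):
    """Stand-in for requests.HTTPError, carrying the response status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def throttling_retry(
    max_attempts: int = 5,
    base_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Retry on HTTP 429 with exponential backoff, then re-raise."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except HTTPStatusError as exc:
                    # Other errors, and a 429 on the last attempt, propagate.
                    if exc.status_code != 429 or attempt == max_attempts:
                        raise
                    # Exponential backoff: base_wait, 2*base_wait, 4*base_wait, ...
                    sleep(base_wait * 2 ** (attempt - 1))

        return wrapper

    return decorator
```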
Antoine Lambert
c782275296 phabricator/tasks: Fix task function return type
Previously, the following error was raised when the task has finished
its execution: "Object of type ListerStats is not JSON serializable".

So ensure ListerStats object gets converted to dict before returning it.

Also add missing test for task function.
2021-01-11 17:59:24 +01:00
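The failure and the fix can be reproduced with a simplified stand-in for `ListerStats`. The field names `pages` and `origins` and the task body are assumptions for illustration; `json.dumps` rejects the dataclass instance the same way the task result serialization did.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass
class ListerStats:
    pages: int = 0
    origins: int = 0


def list_task() -> Dict[str, int]:
    stats = ListerStats(pages=3, origins=120)  # would come from lister.run()
    # Return a plain dict: the dataclass instance itself is not JSON serializable.
    return asdict(stats)


try:
    json.dumps(ListerStats())
except TypeError as exc:
    print(exc)  # Object of type ListerStats is not JSON serializable
print(json.dumps(list_task()))  # {"pages": 3, "origins": 120}
```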
Antoine Lambert
b48f71ff93 phabricator/tasks: Allow to pass api_token as optional parameter
This is useful for testing the lister in a Docker environment.
2021-01-11 17:53:11 +01:00
Nicolas Dandrimont
9944295729 Implement phabricator lister using the new pattern class 2021-01-11 11:00:29 +01:00
Nicolas Dandrimont
734901747b Implement a base pattern for listers with no state storage 2021-01-11 11:00:29 +01:00
Nicolas Dandrimont
b63aa83b41 Reimplement the GitHub lister using the new pattern class
This replaces the test data with some manually generated answers, which allows
us to test a few more cases for instantiating the lister.

This also expands test coverage to test behavior on rate-limited requests.
2021-01-11 11:00:29 +01:00
Nicolas Dandrimont
f1eabc5283 Add a helper to instantiate a new-style lister from a config file
This helper will be used in the task entry points.
2021-01-11 11:00:29 +01:00
Nicolas Dandrimont
525fc0102d Hook up listers implemented with the new pattern to the CLI
We stop depending on the ListerBase implementation. The main hoop we're jumping
through is the config override mechanism in swh.lister.get_lister, as it's
really specific to the ListerBase `override_config` argument, which is dropped in
pattern.Lister (in favor of explicit arguments at lister instantiation).

We implement a small shim in swh.lister.pattern.Lister to give
backwards-compatibility for the new pattern to get_lister.

This generic configuration override mechanism will probably be completely
removed when the configuration mechanism is reworked. We'll see.
2021-01-11 11:00:29 +01:00
Nicolas Dandrimont
9e083c1eea Introduce a simpler base pattern for lister implementations.
This new pattern uses the lister support features introduced in swh.scheduler to
replace the database management done in previous iterations of the listers.
2021-01-11 11:00:29 +01:00
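A much simplified sketch of the page-based structure such a pattern can take: subclasses only say how to fetch pages and how to turn a page into origins, while the base class drives the loop and counts statistics. The method names `get_pages` and `get_origins_from_page` are assumed to mirror swh.lister.pattern; the in-memory `recorded` list stands in for the swh.scheduler listed-origins storage, and `DemoLister` is hypothetical.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, Iterator, List, TypeVar

PageType = TypeVar("PageType")


@dataclass
class ListedOrigin:
    url: str
    visit_type: str


@dataclass
class ListerStats:
    pages: int = 0
    origins: int = 0


class Lister(ABC, Generic[PageType]):
    """Simplified base class: subclasses fetch pages and extract origins."""

    def __init__(self) -> None:
        # Stand-in for the scheduler database that receives listed origins.
        self.recorded: List[ListedOrigin] = []

    @abstractmethod
    def get_pages(self) -> Iterator[PageType]:
        """Yield raw pages from the forge API, one at a time."""

    @abstractmethod
    def get_origins_from_page(self, page: PageType) -> Iterator[ListedOrigin]:
        """Convert one raw page into origins to record."""

    def run(self) -> ListerStats:
        stats = ListerStats()
        for page in self.get_pages():
            origins = list(self.get_origins_from_page(page))
            self.recorded.extend(origins)  # a scheduler upsert in a real lister
            stats.pages += 1
            stats.origins += len(origins)
        return stats


class DemoLister(Lister[List[str]]):  # hypothetical forge with two pages
    def get_pages(self) -> Iterator[List[str]]:
        yield ["https://forge.example.org/a.git", "https://forge.example.org/b.git"]
        yield ["https://forge.example.org/c.git"]

    def get_origins_from_page(self, page: List[str]) -> Iterator[ListedOrigin]:
        for url in page:
            yield ListedOrigin(url=url, visit_type="git")


print(DemoLister().run())  # ListerStats(pages=2, origins=3)
```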
Antoine R. Dumont (@ardumont)
884585034b lister.phabricator.tests: Add task-type related to lister 2020-10-30 13:30:15 +01:00
Antoine R. Dumont (@ardumont)
243f3573bc lister.npm.tests: Clarify lister configuration 2020-10-30 13:30:15 +01:00
Antoine R. Dumont (@ardumont)
36bc51cec5 lister.gnu.tests: Clarify lister configuration 2020-10-30 13:30:15 +01:00
Antoine R. Dumont (@ardumont)
20a91482ca lister.gitlab.tests: Clarify lister configuration 2020-10-30 13:30:15 +01:00
Antoine R. Dumont (@ardumont)
978fbbe029 lister.github.tests: Clarify lister configuration 2020-10-30 13:30:15 +01:00
Antoine R. Dumont (@ardumont)
def0eb5060 lister.gitea.tests: Clarify lister configuration 2020-10-30 13:30:15 +01:00
Antoine R. Dumont (@ardumont)
47ccce3817 lister.cran.tests: Clarify lister configuration 2020-10-30 13:30:14 +01:00
Antoine R. Dumont (@ardumont)
5d4b38999d lister.cgit.tests: Clarify lister configuration 2020-10-30 13:30:14 +01:00
Antoine R. Dumont (@ardumont)
c5af0efbf4 lister.bitbucket.tests: Clarify lister configuration 2020-10-30 13:24:38 +01:00
Antoine R. Dumont (@ardumont)
f09b9f8cfa lister.launchpad.tests: Clarify lister configuration 2020-10-30 13:24:38 +01:00
Antoine R. Dumont (@ardumont)
b90ffa4bdd tests: Reduce db initialization fixtures to a minimum 2020-10-30 13:24:38 +01:00
Antoine R. Dumont (@ardumont)
b35dff1266 Create listing task with a default of 3 if unspecified
This allows tasks to be retried even when they do not specify a retry count.
2020-10-30 09:10:27 +01:00
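A sketch of the defaulting logic, assuming the unspecified value is the task's retry budget. The `retries_left` key, the task dict layout, the task type names and `create_listing_task` are all hypothetical names for illustration.

```python
from typing import Any, Dict, Optional

DEFAULT_RETRIES = 3


def create_listing_task(
    task_type: str, retries_left: Optional[int] = None, **kwargs: Any
) -> Dict[str, Any]:
    """Build a listing task dict, defaulting the retry budget to 3."""
    return {
        "type": task_type,
        "arguments": {"args": [], "kwargs": kwargs},
        # Test against None (not falsiness) so an explicit 0 is preserved.
        "retries_left": DEFAULT_RETRIES if retries_left is None else retries_left,
    }
```

Using `retries_left or DEFAULT_RETRIES` instead would silently turn an explicit `0` into `3`.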
Antoine R. Dumont (@ardumont)
e2a861c801 debian.tests: Fix test
The newly introduced scheduler fixture truncates tables between tests. The debian
tests unfortunately share state, so they broke when that changed. This fixes the
tests by avoiding the truncation of the scheduler db table "task".

Ideally, those tests should be reworked to avoid sharing state between tests.

[1] https://jenkins.softwareheritage.org/job/DLS/job/tests/1043
2020-10-30 09:09:56 +01:00
Antoine R. Dumont (@ardumont)
a19cb5fb51 lister.pytest_plugin: Simplify fixture setup 2020-10-21 17:46:52 +02:00
Antoine R. Dumont (@ardumont)
d7d38090f5 lister.config: Adapt scheduler configuration structure 2020-10-19 09:42:16 +02:00
Antoine R. Dumont (@ardumont)
30ad6200a2 Drop mock_get_scheduler which creates indirection for no good reason
It is no longer useful: the tests still pass once it is removed.
2020-10-16 14:32:46 +02:00
Antoine R. Dumont (@ardumont)
2ff92ef406 lister_base: Drop leftover mixin SWHConfig which is no longer used
Related to D4178
2020-10-07 14:00:57 +02:00