Commit graph

7 commits

Author SHA1 Message Date
Raphaël Gomès
9ca5295a40 sourceforge: retry for all retryable exceptions
Since this lister is doing a lot more requests than most other, it makes
sense that issues would arise more often. We want the lister to continue
even if the website is having issues and not break on the first 500 or
closed connection it encounters.

This change introduces a mechanism to retry all exceptions worth
retrying and uses it for the SourceForge lister. Other listers might
benefit from this, but this is out of scope here.

Tests had to be adjusted to stub the sleep function since retries happened
way more often.
2021-05-26 12:05:39 +02:00
Antoine Lambert
d1fbccd988 lister: Add utility decorator to ease HTTP requests rate limit handling
Add swh.lister.utils.throttling_retry decorator enabling to retry a
function that performs an HTTP request who can return a 429 status code.

The implementation is based on the tenacity module and it is assumed
that the requests library is used when querying an URL.

The default wait strategy is based on exponential backoff.

The default max number of attempts is set to 5, HTTPError exception
will then be reraised.

All tenacity.retry parameters can also be overridden in client code.
2021-01-18 11:28:51 +01:00
Antoine R. Dumont (@ardumont)
d1ce9b0912
tox/pytest: Activate doctest-module
Related to D3899#96289
2020-09-10 11:47:47 +02:00
Antoine R. Dumont (@ardumont)
e3c856b5ee
utils.split_range: Split into not overlapping ranges
Existing listers use the `is_within_bound` [1] method from the base lister.
This method uses inclusive boundaries in all cases.

As some "range" task listers [2] [3] are using `split_range` function to create
"overlapping" ranges, this can cause concurrent insert issues down the line [4].

This commit adapts the function `split_range` to make the generated ranges no
longer overlap.

[1]
https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/core/lister_base.py$194-199

[2]
https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitlab/tasks.py$37-41

[3]
https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitea/tasks.py$36-41

Related to T2577
2020-09-10 11:01:44 +02:00
Antoine R. Dumont (@ardumont)
e24cf1d4f1
gitlab.lister: Simplify retrieving headers information
As response headers' keys are case-insensitive and requests does the
aggregation magic.

[1] http://docs.python-requests.org/en/master/user/quickstart/#response-headers
2018-07-16 13:57:34 +02:00
Antoine R. Dumont (@ardumont)
81fd5f9c5d
swh.lister.gitlab.tasks: Fix range computations 2018-07-12 14:23:14 +02:00
Antoine R. Dumont (@ardumont)
13bb7aca58
swh.lister.gitlab: Improve headers extraction 2018-07-11 18:09:18 +02:00