lister: Add utility decorator to ease HTTP requests rate limit handling

Add swh.lister.utils.throttling_retry decorator enabling to retry a
function that performs an HTTP request who can return a 429 status code.

The implementation is based on the tenacity module and it is assumed
that the requests library is used when querying an URL.

The default wait strategy is based on exponential backoff.

The default max number of attempts is set to 5, HTTPError exception
will then be reraised.

All tenacity.retry parameters can also be overridden in client code.
This commit is contained in:
Antoine Lambert 2021-01-15 12:25:45 +01:00
parent c782275296
commit d1fbccd988
3 changed files with 174 additions and 5 deletions

View file

@ -1,9 +1,15 @@
# Copyright (C) 2018-2020 the Software Heritage developers
# Copyright (C) 2018-2021 the Software Heritage developers
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information
from typing import Iterator, Tuple
from requests.exceptions import HTTPError
from requests.status_codes import codes
from tenacity import retry as tenacity_retry
from tenacity.stop import stop_after_attempt
from tenacity.wait import wait_exponential
def split_range(total_pages: int, nb_pages: int) -> Iterator[Tuple[int, int]]:
"""Split `total_pages` into mostly `nb_pages` ranges. In some cases, the last range can
@ -27,3 +33,73 @@ def split_range(total_pages: int, nb_pages: int) -> Iterator[Tuple[int, int]]:
if index != total_pages:
yield index, total_pages
def is_throttling_exception(e: Exception) -> bool:
"""
Checks if an exception is a requests.exception.HTTPError for
a response with status code 429 (too many requests).
"""
return (
isinstance(e, HTTPError) and e.response.status_code == codes.too_many_requests
)
def retry_attempt(retry_state):
"""
Utility function to get last retry attempt info based on the
tenacity version (as debian buster packages version 4.12).
"""
try:
attempt = retry_state.outcome
except AttributeError:
# tenacity < 5.0
attempt = retry_state
return attempt
def retry_if_throttling(retry_state) -> bool:
"""
Custom tenacity retry predicate for handling HTTP responses with
status code 429 (too many requests).
"""
attempt = retry_attempt(retry_state)
if attempt.failed:
exception = attempt.exception()
return is_throttling_exception(exception)
return False
WAIT_EXP_BASE = 10
MAX_NUMBER_ATTEMPTS = 5
def throttling_retry(
retry=retry_if_throttling,
wait=wait_exponential(exp_base=WAIT_EXP_BASE),
stop=stop_after_attempt(max_attempt_number=MAX_NUMBER_ATTEMPTS),
**retry_args,
):
"""
Decorator based on `tenacity` for retrying a function possibly raising
requests.exception.HTTPError for status code 429 (too many requests).
It provides a default configuration that should work properly in most
cases but all `tenacity.retry` parameters can also be overridden in client
code.
When the mmaximum of attempts is reached, the HTTPError exception will then
be reraised.
Args:
retry: function defining request retry condition (default to 429 status code)
https://tenacity.readthedocs.io/en/latest/#whether-to-retry
wait: function defining wait strategy before retrying (default to exponential
backoff) https://tenacity.readthedocs.io/en/latest/#waiting-before-retrying
stop: function defining when to stop retrying (default after 5 attempts)
https://tenacity.readthedocs.io/en/latest/#stopping
"""
return tenacity_retry(retry=retry, wait=wait, stop=stop, reraise=True, **retry_args)