pattern: Improve handling of max_origins_per_page parameter
Instead of fully consuming the get_origins_from_page generator into a list and then truncating it, consume the generator one origin at a time and stop as soon as the maximum number of origins per page is reached. Some non-trivial listers, such as the cgit one, perform costly processing (an HTTP request, for instance) for each origin in a page, so it is better not to exhaust the generator up front and trigger that work for origins that will be discarded anyway.
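The difference between the two approaches can be illustrated with a small standalone sketch. It is not the lister code itself: the module-level `get_origins_from_page` function, the `take_origins` helper, and the `calls` list are hypothetical stand-ins, with `calls` recording each simulated costly operation so the amount of work done can be compared.

```python
from typing import Iterator, List, Optional

calls: List[int] = []  # records each simulated costly operation


def get_origins_from_page(page: List[int]) -> Iterator[int]:
    """Hypothetical stand-in for a lister method doing costly work per origin."""
    for item in page:
        calls.append(item)  # stands in for e.g. an HTTP request
        yield item


def take_origins(page: List[int], max_origins_per_page: Optional[int]) -> List[int]:
    """Consume the generator one origin at a time, stopping at the limit."""
    origins: List[int] = []
    for origin in get_origins_from_page(page):
        origins.append(origin)
        if max_origins_per_page and len(origins) == max_origins_per_page:
            break  # the generator is never resumed, so no further work happens
    return origins


page = list(range(10))

# Eager approach: the whole generator is consumed, then the list is truncated.
calls.clear()
eager = list(get_origins_from_page(page))[:3]
eager_calls = len(calls)

# Lazy approach: iteration stops once the limit is reached.
calls.clear()
lazy = take_origins(page, 3)
lazy_calls = len(calls)

print(eager_calls, lazy_calls)  # prints "10 3"
```

Both approaches return the same three origins, but the eager one performs the simulated costly step for every item in the page. `itertools.islice` would give similar laziness in less code, although an explicit loop makes it easier to log when the limit is hit.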
parent 45bbc29a52
commit 35871896b2
1 changed file with 14 additions and 11 deletions
@@ -182,17 +182,20 @@ class Lister(Generic[StateType, PageType]):
         try:
             for page in self.get_pages():
                 full_stats.pages += 1
-                origins = list(self.get_origins_from_page(page))
-                if (
-                    self.max_origins_per_page
-                    and len(origins) > self.max_origins_per_page
-                ):
-                    logger.info(
-                        "Max origins per page set, truncated %s page results down to %s",
-                        len(origins),
-                        self.max_origins_per_page,
-                    )
-                    origins = origins[: self.max_origins_per_page]
+                origins = []
+                for origin in self.get_origins_from_page(page):
+                    origins.append(origin)
+                    if (
+                        self.max_origins_per_page
+                        and len(origins) == self.max_origins_per_page
+                    ):
+                        logger.info(
+                            "Max origins per page set to %s and reached, "
+                            "aborting page processing",
+                            self.max_origins_per_page,
+                        )
+                        break
+
                 if not self.enable_origins:
                     logger.info(
                         "Disabling origins before sending them to the scheduler"