Commit graph

4 commits

Author SHA1 Message Date
Antoine Lambert
4f6b3f3f09 conda: Yield listed origins after all artifacts in a page are processed
swh-scheduler will deduplicate listed origins according to their URL
and visit type but not according to their extra loader arguments.

Previously, listed origins were yielded after each processed artifact
in a page so we could lose some package version info due to the
deduplication process.

So ensure listed origins are yielded only once all artifacts in a page
have been processed.
2022-10-25 10:49:52 +02:00
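
The fix described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the lister's actual code: the `Artifact` tuple layout, the `origins_from_page` function, and the plain dict used in place of a ListedOrigin object are all hypothetical. It shows artifacts being accumulated per origin URL across a whole page, with origins yielded only afterwards, so that deduplication by URL and visit type cannot drop version info.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical artifact record: (origin_url, version, artifact_url)
Artifact = Tuple[str, str, str]


def origins_from_page(page: List[Artifact]) -> Iterator[dict]:
    """Group every artifact of a page by origin URL, then yield one origin per URL."""
    artifacts_by_url: Dict[str, List[dict]] = defaultdict(list)

    # First pass: collect all artifacts of the page, keyed by origin URL.
    for origin_url, version, artifact_url in page:
        artifacts_by_url[origin_url].append(
            {"version": version, "url": artifact_url}
        )

    # Second pass: yield only after the whole page has been processed, so each
    # origin carries the complete list of artifacts found in that page.
    for origin_url, artifacts in artifacts_by_url.items():
        yield {
            "url": origin_url,
            "visit_type": "conda",
            # 'artifacts' is a list, as required by commit 6f40d2c1a5
            "extra_loader_arguments": {"artifacts": artifacts},
        }


# Example page (made-up URLs) with two versions of the same package
page = [
    ("https://example.org/pkg-a", "1.0", "https://example.org/pkg-a-1.0.tar.bz2"),
    ("https://example.org/pkg-a", "1.1", "https://example.org/pkg-a-1.1.tar.bz2"),
    ("https://example.org/pkg-b", "2.0", "https://example.org/pkg-b-2.0.tar.bz2"),
]
origins = list(origins_from_page(page))
```

Yielding inside the first loop instead would emit `pkg-a` twice with one artifact each, which is the situation the commit message describes as losing package version info.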
Franck Bret
6f40d2c1a5 Conda: switch artifacts from dict to list
'artifacts' extra_loader_arguments should be a list
2022-09-30 15:55:53 +02:00
Antoine Lambert
8d85b2e4e8 pattern: Ensure accurate origin counts returned by run method
Previously, the run method was returning the total count of ListedOrigin
objects sent to scheduler database.

However, some listers can send multiple ListedOrigin objects for a given
origin URL during the listing process, for instance when an origin is
contained in multiple pages (e.g. gogs listing) or when the listing
is gathering multiple versions of an origin spread across multiple
pages (e.g. maven listing).

This change ensures an accurate count of listed origins by maintaining
a set of origin URLs associated with the sent ListedOrigin objects.
2022-09-29 11:14:08 +02:00
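
The counting approach described in the commit above can be illustrated with a short sketch. It assumes, for simplicity, that each page is just a list of origin URLs; the function name `count_unique_origins` is hypothetical and does not come from the lister code base.

```python
from typing import Iterable, List, Set


def count_unique_origins(pages: Iterable[List[str]]) -> int:
    """Count distinct origin URLs across pages, not the number of objects sent."""
    seen_urls: Set[str] = set()
    for page in pages:
        for url in page:
            # An origin seen in several pages is only counted once.
            seen_urls.add(url)
    return len(seen_urls)


# The same (made-up) origin appears in two pages, as can happen in a paginated listing
pages = [
    ["https://example.org/a", "https://example.org/b"],
    ["https://example.org/b", "https://example.org/c"],
]
total = count_unique_origins(pages)
```

Counting sent objects here would give 4, while the set-based count gives the 3 distinct origins.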
Franck Bret
8ff418fbc2 Conda: List origins for Anaconda, the package manager that provides tooling for datascience
Related T4547
2022-09-27 14:17:26 +02:00