The Attic folder that can sometimes be found in a CVS repository
is a special one used by CVS to store RCS files and should not be
considered a valid module name when listing CVS projects.
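A minimal sketch of the filtering, assuming the lister already has the raw list of top-level directory names of the CVS repository. The function name `filter_cvs_modules` is hypothetical and not the lister's actual helper.

```python
from typing import Iterable, List

# Special directory in which CVS keeps the RCS files of removed files.
CVS_ATTIC = "Attic"


def filter_cvs_modules(entries: Iterable[str]) -> List[str]:
    """Return the directory names to list as modules, skipping the Attic folder.

    `entries` is assumed to be the top-level directory names of a CVS
    repository (hypothetical input format).
    """
    return [name for name in entries if name.strip("/") != CVS_ATTIC]
```

For example, `filter_cvs_modules(["Attic", "core", "docs/"])` keeps only `"core"` and `"docs/"`.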
This aligns the behavior with other listers (e.g. sourceforge, ...), which continue listing
when some information cannot be retrieved at all.
Related to T3874
The Crates lister retrieves crate packages for the Rust language.
It basically clones https://github.com/rust-lang/crates.io-index.git
into a temporary directory and then walks through each file to get the
crate's info.
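The walk over the index can be sketched as follows, assuming the git clone has already been done into `index_dir`. In the crates.io index, each crate file holds newline-delimited JSON with one object per published version (fields such as `name` and `vers`), and the root `config.json` describes the registry rather than a crate. The function name `iter_crate_versions` is hypothetical.

```python
import json
import os
from typing import Dict, Iterator


def iter_crate_versions(index_dir: str) -> Iterator[Dict]:
    """Yield one dict per crate version found in a local crates.io-index clone."""
    for root, dirs, files in os.walk(index_dir):
        # Skip hidden directories such as .git: they hold no crate information.
        dirs[:] = sorted(d for d in dirs if not d.startswith("."))
        for filename in sorted(files):
            if filename.startswith("."):
                continue
            # The root config.json describes the registry, not a crate.
            if root == index_dir and filename == "config.json":
                continue
            path = os.path.join(root, filename)
            with open(path, encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if line:
                        yield json.loads(line)
```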
Commit 6a7479553e modified the origin URLs for CVS projects
hosted on SourceForge, but it also broke incremental listing
due to an assertion that is no longer valid, so fix that issue.
CVS projects differ from other VCS ones: they use the rsync
protocol, a list of modules needs to be fetched from an info page,
and multiple origin URLs can be produced for the same project.
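To illustrate why one project can yield several origins, here is a minimal sketch that builds one rsync origin URL per module, assuming the module list has already been fetched from the info page. The URL template uses a placeholder host and path layout; it is an assumption, not the actual SourceForge layout, and `cvs_origin_urls` is a hypothetical name.

```python
from typing import Iterable, List

# Hypothetical template: host and path layout are placeholders for illustration.
CVS_RSYNC_URL_TEMPLATE = "rsync://cvs.example.org/cvsroot/{project}/{module}"


def cvs_origin_urls(project: str, modules: Iterable[str]) -> List[str]:
    """Build one origin URL per CVS module of a single project."""
    return [
        CVS_RSYNC_URL_TEMPLATE.format(project=project, module=module)
        for module in modules
    ]
```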
Related to T3789
Prior to this commit, the listing could fail when reading either a page or the page of
results (the Launchpad API raises RestfulError). The lister now retries when these kinds of
exceptions happen. If the error persists (after multiple retries with exponential
backoff), the listing continues nonetheless (with warning logs).
Note that if the page ends up being empty, it's no longer accounted for.
This actually allows the listing to finish in case of issues.
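The retry-then-continue behavior can be sketched with a hand-rolled loop (a library such as tenacity could be used instead). `RestfulError` below is a local stand-in for the error raised by the Launchpad client, and `fetch_with_retry`, `max_attempts` and `base_delay` are hypothetical names. The `sleep` parameter is injectable so the backoff can be checked without waiting.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

logger = logging.getLogger(__name__)

T = TypeVar("T")


class RestfulError(Exception):
    """Local stand-in for the Launchpad client's error (assumption)."""


def fetch_with_retry(
    fetch: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `fetch`, retrying on RestfulError with exponential backoff.

    Returns None after logging a warning if every attempt fails, so that the
    caller can skip this page and keep listing.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RestfulError as exc:
            if attempt == max_attempts - 1:
                logger.warning("Giving up after %d attempts: %s", max_attempts, exc)
                return None
            delay = base_delay * 2 ** attempt  # 1s, 2s, 4s, ...
            logger.warning("Attempt %d failed (%s), retrying in %.1fs", attempt + 1, exc, delay)
            sleep(delay)
    return None
```

A caller would simply skip a page for which `fetch_with_retry` returns None, which is what lets the listing finish.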
Related to T3945
Bazaar support was removed a long time ago and predates a lot of the new
mechanisms in place in the API. Unfortunately, it looks like a lot of
the URLs are offline now, but there are still a few projects that can be
listed, and adding this is pretty low-effort.
I would like to use it as the metadata authority URI in the loader,
instead of '{p_url.scheme}://{p_url.netloc}/', which I do not think
is accurate, as it is possible to have multiple Maven instances at
the same netloc.
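The collision can be shown with a small sketch. The source does not say what "it" refers to; here it is assumed to be the Maven instance base URL, and both helper names are hypothetical.

```python
from urllib.parse import urlparse


def netloc_authority(url: str) -> str:
    """Previous approach: authority URI built from scheme and netloc only."""
    p_url = urlparse(url)
    return f"{p_url.scheme}://{p_url.netloc}/"


def instance_authority(base_url: str) -> str:
    """Assumed alternative: the instance base URL, normalized with a trailing slash."""
    return base_url.rstrip("/") + "/"
```

With two instances served from the same host, e.g. `https://repo.example.org/maven2/` and `https://repo.example.org/releases`, `netloc_authority` returns the same value for both, while `instance_authority` keeps them distinct.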
A Debian package can have sources coming from multiple suites,
so we need to make sure to update the last_update field of the
ListedOrigin model if the currently processed suite has a more recent
modification time for its Sources index.
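The update rule amounts to keeping the maximum modification time seen across suites. A minimal sketch, where a plain dict keyed by package name stands in for the `last_update` field of `ListedOrigin`; `update_last_update` is a hypothetical name.

```python
from datetime import datetime
from typing import Dict, Optional


def update_last_update(
    last_updates: Dict[str, datetime],
    package: str,
    suite_modified: Optional[datetime],
) -> None:
    """Keep, per package, the most recent Sources index modification time seen."""
    if suite_modified is None:
        return  # nothing known for this suite, keep the current value
    current = last_updates.get(package)
    if current is None or suite_modified > current:
        last_updates[package] = suite_modified
```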
Related to T2400
Use the value of the "Last-Modified" header from the HTTP response
to the Debian Sources index HTTP request.
This prevents creating loading tasks for Debian packages with no
changes since the last listing.
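The header can be parsed with the standard library. A minimal sketch, assuming `headers` is a mapping with the exact key "Last-Modified" (a plain dict is case-sensitive, unlike the header objects of most HTTP clients). `last_modified` and `has_changed` are hypothetical names.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def last_modified(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the Last-Modified header as a datetime, or None if absent."""
    value = headers.get("Last-Modified")
    if value is None:
        return None
    return parsedate_to_datetime(value)


def has_changed(previous: Optional[datetime], current: Optional[datetime]) -> bool:
    """Decide whether a new loading task is needed."""
    # Without both timestamps we cannot prove nothing changed, so assume it did.
    if previous is None or current is None:
        return True
    return current > previous
```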
Related to T2400
Not all Debian suites have the same set of components,
so log that a component is missing for a suite instead of
raising an exception that would stop the listing.
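A minimal sketch of the log-and-continue behavior, assuming a hypothetical `fetch_index(suite, component)` callable that returns the index content, or None when the component does not exist for that suite (e.g. an HTTP 404).

```python
import logging
from typing import Callable, Dict, Iterable, Optional

logger = logging.getLogger(__name__)


def fetch_suite_indexes(
    suite: str,
    components: Iterable[str],
    fetch_index: Callable[[str, str], Optional[bytes]],
) -> Dict[str, bytes]:
    """Fetch the Sources index of each component, skipping missing ones."""
    indexes: Dict[str, bytes] = {}
    for component in components:
        content = fetch_index(suite, component)
        if content is None:
            # Not every suite ships every component: log it and keep listing.
            logger.warning("Component %s not found for suite %s", component, suite)
            continue
        indexes[component] = content
    return indexes
```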
For a given package, the Debian lister generates a dictionary mapping
distribution and version to a list of files to be processed by the
Debian loader.
For each file to process, the Debian loader expects to find a URI
in order to download it and then use its content to ingest package
source code into the archive.
However, it turns out these URIs were not computed by the lister
in its current implementation, making any Debian loading task fail
due to this missing info.
So add the computation of these URIs and ensure they will be provided
in the Debian loader input parameters.
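In a Debian archive, each stanza of a Sources index carries a `Directory` field (e.g. `pool/main/h/hello`) relative to the mirror root, so a file URI can be built by joining mirror URL, directory and file name. A minimal sketch; the `name` and `uri` keys and the function name are hypothetical, not the loader's actual input schema.

```python
from typing import Dict, List


def compute_file_uris(
    mirror_url: str, directory: str, files: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Return copies of the file entries with a download URI attached."""
    base = mirror_url.rstrip("/")
    return [
        {**f, "uri": f"{base}/{directory.strip('/')}/{f['name']}"}
        for f in files
    ]
```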
Related to T2400
In some circumstances, GitHub will return two separate repos with the
same html_url in the same page. This makes the lister fail with a
cardinality error.
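One way to avoid the error is to deduplicate each page on `html_url` before recording origins. A minimal sketch; keeping the last entry seen for a URL is an arbitrary choice made here, and `dedupe_by_html_url` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, List


def dedupe_by_html_url(repos: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep a single repository per html_url within one page of results."""
    by_url: Dict[str, Dict[str, Any]] = {}
    for repo in repos:
        # Later entries overwrite earlier ones; dict order keeps first-seen keys.
        by_url[repo["html_url"]] = repo
    return list(by_url.values())
```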
The Maven lister retrieves the Maven Central indexes, exports them in a
convenient text format, and parses them to identify all src archives and
pom files in the Maven repository. Then the pom files are downloaded and
analysed to find and yield any scm reference.
Note: This is a new version of the maven lister diff D6133 which takes
into account the initial round of reviews.
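Extracting the scm reference from a downloaded pom file can be sketched with ElementTree. The `{*}` wildcard (Python 3.8+) matches a tag in any or no namespace, which covers poms with or without the `http://maven.apache.org/POM/4.0.0` namespace. `extract_scm` is a hypothetical name; the lister's real parsing may differ.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def extract_scm(pom_xml: str) -> Dict[str, str]:
    """Return the <scm> children of a pom (connection, developerConnection, url)."""
    root = ET.fromstring(pom_xml)
    scm = root.find("{*}scm")
    if scm is None:
        return {}
    result: Dict[str, str] = {}
    for tag in ("connection", "developerConnection", "url"):
        elem = scm.find(f"{{*}}{tag}")
        if elem is not None and elem.text:
            result[tag] = elem.text.strip()
    return result
```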
Related to T1724
That avoids having multiple distinct opam root directories per opam lister instance. The
opam commands currently used by the lister actually list packages specifically per instance.
Related to P1171
We should avoid side effects in the constructor as much as possible. That avoids
surprising behavior at object instantiation time. The state, if needed, must be
initialized in the `swh.lister.pattern.Lister.get_pages` method, as recommended in the
class docstring.
This also fixes the current test, which actually bootstraps a real opam local "clone" in
/tmp.
Related to T3590
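The pattern can be sketched with a hypothetical class that is not the real `swh.lister.pattern.Lister`: the constructor only records configuration, and the on-disk root directory is created lazily on the first `get_pages` call and then reused, so each instance has a single root. The placeholder page stands in for running the actual opam commands.

```python
import os
import shutil
import tempfile
from typing import Iterator, List, Optional


class LazyStateLister:
    """Hypothetical lister skeleton with no side effects in __init__."""

    def __init__(self, instance: str) -> None:
        self.instance = instance
        self.root_dir: Optional[str] = None  # nothing is created on disk yet

    def _ensure_root(self) -> str:
        # Create the root directory once, then reuse it for this instance.
        if self.root_dir is None:
            self.root_dir = tempfile.mkdtemp(prefix=f"opam-{self.instance}-")
        return self.root_dir

    def get_pages(self) -> Iterator[List[str]]:
        root = self._ensure_root()
        # A real lister would run its tooling against `root` here;
        # this sketch yields a single placeholder page.
        yield [os.path.join(root, "placeholder")]

    def cleanup(self) -> None:
        if self.root_dir is not None:
            shutil.rmtree(self.root_dir, ignore_errors=True)
            self.root_dir = None
```

Instantiating the class in a test therefore touches nothing on disk; only iterating over `get_pages` does.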