Commit graph

13 commits

Author SHA1 Message Date
Antoine Lambert
22bcd9deb2 maven: Create one origin per package instead of one per package version
Previously the maven lister was creating an origin for each source
archive (jar, zip) it discovered during the listing process.

This is not the way Software Heritage decided to archive sources
coming from package managers. Instead one origin should be created
per package and all its versions should be found as releases in the
snapshot produced by the package loader.

So modify the maven lister in order to create one origin per package
grouping all its versions.

This change also modifies the way incremental listing is handled:
ListedOrigin instances are yielded only if new versions of a package
were discovered since the last listing.

Tests have been updated to reflect these changes.

Related to T3874
2022-04-29 10:57:04 +02:00
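The grouping and incremental behavior described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the lister's actual code: the function names, the `(package_url, version)` input shape, and the `already_seen` mapping are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

# Hypothetical input shape: one (package_url, version) pair per source archive.
Artifact = Tuple[str, str]


def group_versions_by_package(artifacts: Iterable[Artifact]) -> Dict[str, List[str]]:
    """Collect all discovered versions under a single key per package."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for package_url, version in artifacts:
        if version not in grouped[package_url]:
            grouped[package_url].append(version)
    return dict(grouped)


def packages_with_new_versions(
    grouped: Dict[str, List[str]],
    already_seen: Dict[str, Set[str]],
) -> Iterator[str]:
    """Yield only packages that gained a version since the previous listing.

    `already_seen` stands in for whatever state the previous listing saved
    (an assumption made for this sketch).
    """
    for package_url, versions in grouped.items():
        if set(versions) - already_seen.get(package_url, set()):
            yield package_url
```

With an empty `already_seen` mapping (a first, full listing) every package is yielded; on later runs a package is skipped when none of its versions are new.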
Antoine Lambert
334c54091e maven: Remove duplicated code related to setting instance from netloc
That processing is already handled in the base Lister class constructor.
2022-04-25 17:31:02 +02:00
Antoine R. Dumont (@ardumont)
10bb8db345
maven: Fix argument of type 'NoneType' is not iterable
Related to T3874
2022-04-14 15:33:24 +02:00
Antoine R. Dumont (@ardumont)
7c8428d01c
maven: Continue listing if unable to retrieve pom information
This aligns the behavior with other listers (e.g. sourceforge, ...) to continue listing
if some information is not retrievable at all.

Related to T3874
2022-04-13 17:59:20 +02:00
Antoine R. Dumont (@ardumont)
e4b27a1e98
maven: log error message when not able to retrieve the index to read
Without the index, the lister legitimately cannot list anything.
2022-04-13 17:41:44 +02:00
Antoine Lambert
d38e05cff7 python: Reformat code with black 22.3.0
Related to T3922
2022-04-08 15:15:09 +02:00
Antoine R. Dumont (@ardumont)
7ff1390378
maven: Fix last update datetime
We need to avoid using naive datetime as this fails during conversion.

Related to T3746
Related to P1280
2022-02-09 16:59:40 +01:00
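For illustration, a naive datetime can be made timezone-aware before it is converted or stored. The sketch below assumes that naive values are expressed in UTC; that assumption, and the helper name `to_aware_utc`, are not taken from the commit.

```python
from datetime import datetime, timezone


def to_aware_utc(dt: datetime) -> datetime:
    """Return a timezone-aware datetime in UTC.

    Naive inputs are ASSUMED to already be in UTC (an assumption of this
    sketch); aware inputs are converted to UTC.
    """
    if dt.tzinfo is None or dt.utcoffset() is None:
        return dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)
```

Calling `to_aware_utc(datetime(2022, 2, 9, 15, 59, 40))` attaches UTC without shifting the clock time, whereas an aware value in another offset is shifted to the equivalent UTC instant.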
Boris Baldassari
d4e1e8212a maven: Fix undef last_update in ListedOrigins.
2022-02-08 07:51:01 +01:00
Boris Baldassari
24eeabfade maven: dismiss origins if they are malformed - e.g. wrong pom scm format, add test.
2022-02-08 07:51:01 +01:00
Antoine R. Dumont (@ardumont)
a599493b48
maven: Let logging instruction do the formatting
2022-01-25 11:32:23 +01:00
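As a general illustration of letting the logging call do the formatting: Python's standard `logging` module only interpolates the arguments when a record is actually emitted, so passing them separately avoids formatting work for discarded messages. The logger name and the `CountingRepr` helper below are hypothetical and exist only to make the difference observable.

```python
import logging

logger = logging.getLogger("maven_lister_example")  # hypothetical logger name
logger.setLevel(logging.WARNING)  # DEBUG records are discarded at this level


class CountingRepr:
    """Helper that counts how many times it is converted to a string."""

    def __init__(self) -> None:
        self.calls = 0

    def __str__(self) -> str:
        self.calls += 1
        return "expensive representation"


def log_eager(value: object) -> None:
    # The string is built before logging decides whether to emit the record.
    logger.debug("metadata: %s" % value)


def log_lazy(value: object) -> None:
    # Formatting is deferred to the logging module and skipped when the
    # DEBUG level is disabled.
    logger.debug("metadata: %s", value)
```

With DEBUG disabled, `log_eager` still calls `__str__` on its argument, while `log_lazy` does not.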
Antoine R. Dumont (@ardumont)
8667b04abc
maven: Add more debug logging instruction
And log the metadata dictionary.
2022-01-25 11:32:23 +01:00
Valentin Lorentz
fa7ecc8fbd maven: Pass the base URL of the Maven instance to the loader
I would like to use it as the metadata authority URI in the loader,
instead of '{p_url.scheme}://{p_url.netloc}/', which I do not think
is accurate, as it is possible to have multiple Maven instances at
the same netloc.
2021-12-07 13:51:00 +01:00
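The concern raised in this commit message can be illustrated with `urllib.parse`: two distinct instances hosted under the same netloc collapse to the same scheme-and-netloc string. The URLs below are made-up examples, and `netloc_authority` is a hypothetical helper reproducing the expression quoted above.

```python
from urllib.parse import urlparse


def netloc_authority(url: str) -> str:
    """Build the '{scheme}://{netloc}/' string quoted in the commit message."""
    p_url = urlparse(url)
    return f"{p_url.scheme}://{p_url.netloc}/"


# Two hypothetical Maven instances served from the same host.
instance_a = "https://repo.example.org/maven-a/"
instance_b = "https://repo.example.org/maven-b/"

# Both collapse to the same value, so the two instances become
# indistinguishable; the full base URLs remain distinct.
same_authority = netloc_authority(instance_a) == netloc_authority(instance_b)
distinct_base_urls = instance_a != instance_b
```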
Boris Baldassari
8991c625ea lister: Add new maven lister
The Maven lister retrieves the maven central indexes, exports them in a
convenient text format, and parses them to identify all src archives and
pom files in the maven repository. Then the pom files are downloaded and
analysed to find and yield any scm reference.

Note: This is a new version of the maven lister diff D6133 which takes
into account the initial round of reviews.

Related to T1724
2021-11-29 17:33:13 +01:00
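The last step described in this commit message, reading a pom file to find an scm reference, might look roughly like the sketch below. It is not the lister's implementation: `extract_scm_url` is a hypothetical helper, it only looks at the `scm/url` element, and it assumes the pom declares the standard Maven POM 4.0.0 namespace.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Standard Maven POM namespace; poms without it are not handled by this sketch.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def extract_scm_url(pom_xml: str) -> Optional[str]:
    """Return the text of <scm><url> from a pom document, or None.

    None is returned when the XML is malformed or no scm url is present,
    so a caller can skip that entry and continue.
    """
    try:
        root = ET.fromstring(pom_xml)
    except ET.ParseError:
        return None
    node = root.find("m:scm/m:url", POM_NS)
    if node is None or not node.text:
        return None
    return node.text.strip()
```

Returning `None` instead of raising lets a caller keep going when a single pom is unreadable or has no scm section.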