Commit graph

9 commits

Author SHA1 Message Date
Antoine Lambert
cee6bcb514 maven: Use BeautifulSoup instead of xmltodict for parsing pom files
xmltodict cannot parse POM files with multi-byte encoding so prefer to
use the XML parser of BeautifulSoup based on lxml instead.

Also drop xmltodict requirement as it is no longer used in swh-lister
codebase.
2022-08-09 11:11:45 +02:00
Franck Bret
a6f796b268 crates.lister: Implement incremental mode:
Add incremental mode support based on a 'last_commit' state, used to get
new package versions from git diff range of commits.
2022-08-05 13:41:57 +02:00
Antoine Lambert
2fa9f0abd2 sourceforge: Fix listing of bzr projects
Fix sourceforge origin URL for bzr projects,
http://project.bzr.sourceforge.net/bzrroot/project
redirects to http://project.bzr.sourceforge.net/bzr/project.

Handle bzr projects with multiple branches, one listed origin
must be created per branch.

Discard bzr projects that no longer exist from listing.
2022-04-21 18:19:07 +02:00
Boris Baldassari
8991c625ea lister: Add new maven lister
The Maven lister retrieves the maven central indexes, exports them in a
convenient text format, and parse them to identify all src archives and
pom files in the maven repository. Then the pom files are downloaded and
analysed to find and yield any scm reference.

Note: This is a new version of the maven lister diff D6133 which takes
into account the initial round of reviews.

Related to T1724
2021-11-29 17:33:13 +01:00
Antoine Lambert
2461c97bbb pypi: Use BeautifulSoup for parsing HTML instead of xmltodict
xmltodict now raises an error while trying to parse the HTML content
of https://pypi.org/simple/ page.

So use BeautifulSoup HTML parser instead as it is aleady a requirement
of swh-lister and it does not fail parsing the PyPI HTML page.

Also drop no longer used xmltodict in requirements.
2021-02-05 14:23:11 +01:00
Antoine Lambert
8933544521 Remove no longer used legacy Lister API and update CLI options
Legacy Lister classes from the swh.lister.core mdule are no longer
used in swh-lister codebase so it is time to remove them.

Also remove lister CLI options related to legacy Lister API.

As a consequence, the following requirements are no longer needed:
arrow, SQLAlchemy, sqlalchemy-stubs and testing.postgresql.

Closes T2442
2021-02-02 15:54:55 +01:00
Antoine Lambert
f862004700 launchpad: Reimplement lister using new Lister API
Port launchpad lister to the swh.lister.pattern.Lister API.

Last update date of each listed git repositories is now sent to the scheduler.

The lister can work in incremental mode, only modified repositories since
the last listing operation will be returned in that case.

Closes T2992
2021-01-28 15:22:40 +01:00
Antoine R. Dumont (@ardumont)
d251201251
debian.models: Migrate tests from storage to debian lister model
Related bb5d405
2019-11-14 10:28:15 +01:00
Stefano Zacchiroli
974f80f966 typing: minimal changes to make a no-op mypy run pass 2019-10-28 15:35:21 +01:00