58 commits

Author SHA1 Message Date
David Douard
553884fa56 docs: include the README file in the main index page
Convert README from markdown to ReST to make it embeddable in
docs/index.rst
2023-11-16 16:25:56 +01:00
Franck Bret
f8cfa05f3f Add Julia Lister for listing Julia Packages
This module introduces the Julia lister.
It retrieves Julia package origins from the Julia General Registry, a Git
repository made of per-package directories with TOML definition files.
2023-10-09 15:05:25 +02:00
Kumar Shivendu
88611642fc Introduce bioconductor lister 2023-09-28 12:54:37 +00:00
Franck Bret
5b4d15090f Remove fedora entry as it has been replaced with rpm 2023-09-20 16:48:06 +02:00
Franck Bret
2793ef9aad D lang lister
Add a dlang module that retrieves origins from an HTTP API endpoint.
Each origin is a Git-based project URL on github.com, gitlab.com or
bitbucket.com.
2023-09-19 16:08:59 +02:00
Antoine Lambert
95714f6f37 rpm: Turn fedora lister into a generic Red Hat based distribution one
As Red Hat based linux distributions share the same type of package repository,
rework the fedora lister into a generic one to list RPM source packages and
their versions from numerous distributions.

For a given distribution, the RPM lister fetches package metadata from a
list of release identifiers and a list of software components. Source packages
are then processed and the relevant information is extracted to be sent to the RPM loader.
Once all releases and components have been processed, the lister has collected all versions
for each package name and sends that information to the scheduler, which then creates RPM
loading tasks.

However, as there is no generic way to list all releases and components for
a given distribution, nor to guess the right URL to retrieve package metadata
from, that information needs to be manually provided to the lister as input parameters.
Some examples of those parameters for various distributions can be found in the
config directory of the lister.

Regarding the produced origin URLs, as there is no way to find valid HTTP ones
for all distributions, the same behavior as with the debian lister is used and
they have the following form: rpm://{instance}/packages/{package_name} where
the instance variable corresponds to the name of the listed distribution such
as Fedora, CentOS, or openSUSE.

Related to swh/meta#5011.
2023-08-16 13:25:23 +00:00
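The origin URL form and the per-package grouping described in 95714f6f37 can be sketched as follows. This is only an illustration: the function names, the dictionary keys and the sample data are assumptions, not the lister's actual code.

```python
from collections import defaultdict


def rpm_origin_url(instance: str, package_name: str) -> str:
    """Build an origin URL of the form rpm://{instance}/packages/{package_name}."""
    return f"rpm://{instance}/packages/{package_name}"


def group_versions(source_packages):
    """Collect every (release, component, version) seen for each package name."""
    versions = defaultdict(list)
    for pkg in source_packages:
        versions[pkg["name"]].append(
            (pkg["release"], pkg["component"], pkg["version"])
        )
    return dict(versions)


# Made-up sample metadata for two releases of one distribution.
packages = [
    {"name": "bash", "release": "38", "component": "Everything", "version": "5.2.15"},
    {"name": "bash", "release": "39", "component": "Everything", "version": "5.2.21"},
]

for name, seen in group_versions(packages).items():
    print(rpm_origin_url("Fedora", name), seen)
```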
Antoine R. Dumont (@ardumont)
56b4fcc760 Add stagit lister
This lister is very close to the cgit and gitweb implementations, but the DOM data is
structured differently, so this implementation stands on its own.

Refs. swh/meta#5048
2023-07-13 11:50:51 +02:00
Antoine R. Dumont (@ardumont)
3ab856288c Add gitiles lister
Gitiles instances deliberately return malformed JSON output (JSON prefixed with
``)]}'\n``) [2]. The lister handles this by dropping the prefix and then
parsing the JSON.

If at some point they drop this prefix and return JSON directly, the lister will be able
to deal with that too. There are 2 tests, one with the 'standard' gitiles format and another
with plain JSON, to account for both cases.

Refs. swh/meta#5045

[2] https://github.com/google/gitiles/issues/263
2023-07-13 10:30:51 +02:00
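The prefix handling described in 3ab856288c can be sketched in a few lines. The function name is hypothetical; only the prefix value comes from the commit message.

```python
import json

# Prefix that Gitiles puts in front of its JSON responses.
GITILES_PREFIX = ")]}'\n"


def parse_gitiles_json(text: str):
    """Parse a Gitiles response, with or without the leading prefix."""
    if text.startswith(GITILES_PREFIX):
        text = text[len(GITILES_PREFIX):]
    return json.loads(text)


# Both the prefixed and the plain form parse to the same value.
print(parse_gitiles_json(GITILES_PREFIX + '{"repo": {"name": "repo"}}'))
print(parse_gitiles_json('{"repo": {"name": "repo"}}'))
```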
Antoine R. Dumont (@ardumont)
573958ce64 Add Gitweb lister
Some instances need specific heuristics. Some instances:
- have summary pages which do not list metadata_url (so some
  computation happens to list git:// origins which are cloneable)
- have summary pages which reference metadata_url as multiple comma-separated URLs
- list relative URLs of the repository, so we need to join them with the main instance URL
  to get complete cloneable origins (or summary pages)
- list "down" http origins (cloning those won't work), so we list those as cloneable https
  ones (when the main URL is behind https).

Refs. swh/devel/swh-lister#1800
2023-07-10 16:50:41 +02:00
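Three of the heuristics listed in 573958ce64 (comma-separated URLs, relative URLs, http origins on an https instance) can be sketched as below. The function name and the example host are assumptions, and the real lister may combine these rules differently.

```python
from urllib.parse import urljoin, urlparse


def normalize_clone_urls(instance_url: str, raw: str) -> list[str]:
    """Turn a raw, possibly comma-separated URL field into cloneable URLs."""
    urls = []
    for candidate in raw.split(","):
        candidate = candidate.strip()
        if not candidate:
            continue
        # Relative URLs are resolved against the main instance URL.
        url = urljoin(instance_url, candidate)
        # Rewrite http to https when the instance itself is behind https.
        if url.startswith("http://") and urlparse(instance_url).scheme == "https":
            url = "https://" + url[len("http://"):]
        urls.append(url)
    return urls


print(normalize_clone_urls(
    "https://git.example.org/",
    "repo.git, http://git.example.org/other.git",
))
```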
Antoine Lambert
c81c473a83 pagure: Implement lister for pagure forges
Pagure is a git-centered forge, written in Python and using pygit2.

Its REST API makes it easy to list all projects hosted in an
instance, so the lister implementation is quite simple.

Related to swh/meta#5043.
2023-06-23 09:02:49 +00:00
KShivendu
cfd9a693aa feat(hex): Use updated_after search query 2023-03-14 17:59:46 +00:00
KShivendu
a452995d95 feat: Add Hex.pm lister 2023-03-14 17:59:46 +00:00
KShivendu
6ad61aec23 feat(fedora): Introduce fedora lister
Summary: Lister to ingest fedora mirrors (.rpm)

Reviewers: #reviewers, vlorentz

Subscribers: vlorentz, olasd

Maniphest Tasks: T4448

Differential Revision: https://forge.softwareheritage.org/D8386
2022-11-15 15:53:52 +05:30
Antoine R. Dumont (@ardumont)
6d2e7aa178 nixguix: Register task
Related to T3781
2022-10-03 18:26:36 +02:00
Franck Bret
52ccf49e11 RubyGems: List origins from https://rubygems.org
Related T1777
2022-09-29 14:19:06 +02:00
Franck Bret
3928fc9ee9 Nuget: Lister for NuGet the package manager for .NET
Related T1718
2022-09-27 14:56:36 +02:00
Franck Bret
cd596eb2b4 Puppet: Lister for Puppet modules
The puppet lister retrieves origins from https://forge.puppet.com/modules

Related T4519
2022-09-27 14:44:13 +02:00
Franck Bret
a4aec3894e Cpan: List Perl module origins from cpan.org
Related T2833
2022-09-27 14:29:33 +02:00
Franck Bret
6696a8424a Hackage: List origins from hackage.haskell.org, The Haskell Package Repository
Use an HTTP API endpoint to get package names and build origin URLs.
2022-09-27 14:22:03 +02:00
Franck Bret
8ff418fbc2 Conda: List origins for Anaconda, the package manager that provides tooling for data science
Related T4547
2022-09-27 14:17:26 +02:00
Raphaël Gomès
60405e78ae Add non-incremental Golang modules lister
This uses https://index.golang.org. An associated loader will be sent in
the near future, as well as an incremental version of this lister.

[1] https://go.dev/ref/mod#goproxy-protocol
2022-08-30 14:32:02 +02:00
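As an illustration of 60405e78ae, the index served by https://index.golang.org is a feed of newline-delimited JSON records carrying `Path`, `Version` and `Timestamp` fields. A minimal parsing sketch, with a hypothetical function name and an inline sample instead of a network call:

```python
import json


def parse_index_lines(body: str):
    """Parse newline-delimited JSON records into (path, version, timestamp)."""
    entries = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        entries.append((record["Path"], record["Version"], record["Timestamp"]))
    return entries


# Inline sample shaped like the index feed.
sample_body = (
    '{"Path":"golang.org/x/text","Version":"v0.3.0",'
    '"Timestamp":"2019-04-10T19:08:52.997264Z"}\n'
)
print(parse_index_lines(sample_body))
```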
Franck Bret
ceae8c42b5 Bower: List origins from registry.bower.io 2022-08-29 15:55:00 +02:00
Franck Bret
5410b6e3f3 Pub.dev lister for Dart and Flutter packages
Stateless lister for https://pub.dev, based on its HTTP API for listing package names
2022-08-26 10:24:08 +02:00
Franck Bret
97b353bf0b Arch User Repository (AUR) lister
Add 'aur' module to swh-lister with data fixtures and tests.
For now, origin URLs are package VCS (Git) URLs.
2022-08-19 12:43:15 +02:00
KShivendu
d34a6232a6 gogs: Introduce Gogs lister 2022-08-03 16:22:06 +05:30
Franck Bret
1bf11aa26d Add arch lister module (origins from archives).
After a first attempt with D7812, this one uses a different strategy to
retrieve origins.

Fetch and extract "core.files.tar.gz", "extra.files.tar.gz" and "community.files.tar.gz" from archives.archlinux.org. That step ensures that we have a list of "official" packages.
Parse metadata from the 'desc' file to build origin URLs.
Scrape the origin URL to get artifact metadata that lists all versions of a package.

It also fetches and extracts unofficial 'arm' packages from archlinuxarm.org, but in this case we cannot get all versions of an arm package.

Related T4233
2022-06-15 09:11:57 +02:00
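The 'desc' parsing step mentioned in 1bf11aa26d can be sketched as follows, assuming the usual pacman database layout in which a `%FIELD%` header line is followed by its values and a blank line. The function name and the sample content are illustrative, and how the lister derives origin URLs from these fields is not shown.

```python
def parse_desc(text: str) -> dict:
    """Parse a pacman 'desc' file into a mapping of field name to value lines."""
    fields = {}
    key = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current field.
            key = None
        elif line.startswith("%") and line.endswith("%") and len(line) > 2:
            key = line.strip("%")
            fields[key] = []
        elif key is not None:
            fields[key].append(line)
    return fields


# Illustrative content only.
sample_desc = """%NAME%
zlib

%VERSION%
1:1.2.13-2

%URL%
https://www.zlib.net/
"""
print(parse_desc(sample_desc))
```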
Franck Bret
fea6fc04aa lister: Add new rust crates lister
The Crates lister retrieves crate packages for the Rust language.

It basically fetches https://github.com/rust-lang/crates.io-index.git
to a temp directory and then walks through each file to get the
crate's info.
2022-03-28 08:42:31 +02:00
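The walk described in fea6fc04aa can be sketched as below. It assumes the index has already been cloned locally and that each index file holds one JSON record per line with `name` and `vers` keys. The function name is hypothetical, and a throwaway directory stands in for the cloned repository.

```python
import json
import tempfile
from pathlib import Path


def iter_crate_versions(index_dir):
    """Yield (name, version) pairs from every index file under index_dir."""
    root = Path(index_dir)
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        # Skip directories, Git internals and the top-level config file.
        if (
            not path.is_file()
            or ".git" in relative.parts
            or relative.parts == ("config.json",)
        ):
            continue
        for line in path.read_text().splitlines():
            if line.strip():
                record = json.loads(line)
                yield record["name"], record["vers"]


# Build a tiny fake index in a temp directory to exercise the walk.
with tempfile.TemporaryDirectory() as tmp:
    crate_dir = Path(tmp) / "se" / "rd"
    crate_dir.mkdir(parents=True)
    (crate_dir / "serde").write_text(
        '{"name": "serde", "vers": "1.0.0"}\n{"name": "serde", "vers": "1.0.1"}\n'
    )
    (Path(tmp) / "config.json").write_text("{}")
    found = list(iter_crate_versions(tmp))

print(found)
```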
Boris Baldassari
8991c625ea lister: Add new maven lister
The Maven lister retrieves the maven central indexes, exports them in a
convenient text format, and parses them to identify all src archives and
pom files in the maven repository. The pom files are then downloaded and
analysed to find and yield any scm reference.

Note: This is a new version of the maven lister diff D6133 which takes
into account the initial round of reviews.

Related to T1724
2021-11-29 17:33:13 +01:00
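The last step of 8991c625ea, finding an scm reference in a pom file, can be sketched with the standard library. The function name, the order in which the scm child elements are tried and the sample pom are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Maven POM 4.0.0 documents.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def scm_url_from_pom(pom_xml: str):
    """Return the first scm reference found in a pom document, or None."""
    root = ET.fromstring(pom_xml)
    # Assumed preference order; the lister may choose differently.
    for tag in ("connection", "developerConnection", "url"):
        node = root.find(f"m:scm/m:{tag}", POM_NS)
        if node is not None and node.text:
            return node.text.strip()
    return None


sample_pom = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <scm>
    <connection>scm:git:https://example.org/demo.git</connection>
  </scm>
</project>"""
print(scm_url_from_pom(sample_pom))
```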
zapashcanon
fe01d08cd9 add opam lister 2021-07-06 15:19:00 +02:00
Boris Baldassari
04c0a50706 tuleap: initialise lister.
tuleap-lister: fix args in test_task.

tuleap-lister: Add rate-limiting test + fix debug and typo.

tuleap-lister: code review: fix mocker + tests/setup_cli.

tuleap-lister: code review: fix relister > lister.

tuleap-lister: code review: fix test_task kwargs.

tuleap-lister: code review: Remove authentication useless lines + fix typos.

tuleap-lister: code review: improve results_simplified for svn repos.

tuleap-lister: code review: add name to CONTRIBUTORS file.

tuleap-lister: code review: Update tutorial for misc files to edit.

tuleap-lister: code review: Update copyright to 2021 exactly.

tuleap-lister: code review: Update py files perms -X.

tuleap-lister: code review: minimise json files.

tuleap-lister: code review: fix chmod on json files.

tuleap-lister: code review: fix var names + add tests.

tuleap-lister: code review: fix useless indirection.

tuleap-lister: code review: Add empty repo test, minor typo fixes.
2021-05-26 11:09:12 +02:00
Raphaël Gomès
f7b27c6930 Add a non-incremental sourceforge lister
Following zack's work on T735, this change introduces an actual SWH lister for
SourceForge.

SourceForge provides a main sitemap that lists sharded sitemaps, which
themselves list pages. Each page belongs to a project (or sub-project,
though those are rare), information about which can be found by querying
a REST API, which gives us the list of any and all VCS used for said
project. Both sitemaps and pages have a "last modified" timestamp that
will be used in a future patch to implement incremental listing.

More precise information can be found as inline comments or docstrings.
2021-03-23 18:40:21 +01:00
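The sitemap structure described in f7b27c6930 (a main sitemap listing sharded sitemaps, which list pages, each with a "last modified" timestamp) can be read with the standard sitemap XML schema. This sketch uses a hypothetical function name and inline sample documents instead of fetching anything from SourceForge.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org XML schema.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_entries(xml_text: str, entry_tag: str):
    """Return (location, last modified) pairs for each entry_tag element.

    Use entry_tag="sitemap" for the main sitemap and "url" for a sub-sitemap.
    """
    root = ET.fromstring(xml_text)
    results = []
    for entry in root.findall(f"sm:{entry_tag}", SITEMAP_NS):
        loc = entry.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = entry.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        results.append((loc, lastmod))
    return results


main_sitemap = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.org/sitemap-0.xml</loc>
    <lastmod>2021-03-18</lastmod>
  </sitemap>
</sitemapindex>"""

sub_sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.org/projects/demo/files/</loc>
    <lastmod>2021-03-17</lastmod>
  </url>
</urlset>"""

print(parse_entries(main_sitemap, "sitemap"))
print(parse_entries(sub_sitemap, "url"))
```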
David Douard
d10f78d80c Adapt cli declaration entrypoint to swh.core 0.3 2020-09-23 17:42:00 +02:00
Antoine Lambert
22f7181294 python: Reorder imports with isort
Related to T2610
2020-09-17 17:48:27 +02:00
David Douard
7ef7425d55 Keep plugins sorted in setup.py 2020-08-25 18:29:36 +02:00
Nicolas Dandrimont
014c446d05 Switch over to setuptools-scm 2020-06-25 12:12:47 +02:00
Léni Gauffier
1408517c08 Added GiteaLister
Summary: Lister implementation for Gitea, works for (T2313). For now because of https://github.com/go-gitea/gitea/issues/9165 it would require setting its param limit to 50.

Reviewers: #reviewers, ardumont

Reviewed By: #reviewers, ardumont

Subscribers: ardumont

Differential Revision: https://forge.softwareheritage.org/D3107
2020-06-10 17:04:28 +02:00
Stefano Zacchiroli
566294749e setup.py: add documentation link 2020-04-29 18:31:45 +02:00
Antoine R. Dumont (@ardumont)
d2daed02af setup: Update the minimum required runtime python3 version
Related to T2367
2020-04-20 17:29:15 +02:00
Léni Gauffier
58ef08b083 Added LaunchpadLister
Summary:
Related to T1734

From abandoned D2799

Reviewers: ardumont

Reviewed By: ardumont

Differential Revision: https://forge.softwareheritage.org/D2974
2020-04-12 01:00:12 +02:00
David Douard
93a4d8b784 Enable black
- blackify all the python files,
- enable black in pre-commit,
- add a black tox environment.
2020-04-08 16:31:22 +02:00
Nicolas Dandrimont
62dc4dc257 Use pkg_resources to get the package version instead of vcversioner 2019-11-22 15:49:23 +01:00
Antoine R. Dumont (@ardumont)
85d001067a setup.py: Kill deprecated swh-lister command
Prior to this commit, the pip activation environment failed because the old cli
name no longer exists; it's named 'lister' now.
2019-09-20 11:11:27 +02:00
David Douard
e3c0ea9d90 implement listers as plugins
Listers are declared as plugins via the `swh.workers` entry_point.

As such, the registry function is expected to return a dict with the
`task_modules` field (as for generic worker plugins), plus:

- `lister`: the lister class,
- `models`: list of SQLAlchemy models used by this lister,
- `init` (optional): hook (callable) used to initialize the lister's state
  (typically, create/initialize the database for this lister).
  If not set, the default implementation creates database tables (after
  optionally having deleted existing ones) according to models declared in
  the `models` register field.

There is no need to explicitly add lister task modules in the main
`conftest` module, but any new/extra lister to be tested must be registered
(the tested lister module must be properly installed in the test environment).

Also refactor the cli tools a bit:
- add support for the standard --config-file option at the 'lister' group
  level,
- move the --db-url to the 'lister' group,
- drop the --lister option for the `swh lister db-init` cli tool:
  initializing (especially with --drop-tables) the database for a single
  lister is unreliable, since all tables are created using a single MetaData
  (in the same namespace).
2019-09-03 15:02:24 +02:00
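The registry contract described in e3c0ea9d90 can be sketched as a plain function returning the expected fields. The class, module path and `init` signature below are hypothetical placeholders; only the field names (`task_modules`, `lister`, `models`, `init`) come from the commit message.

```python
class ExampleLister:
    """Hypothetical lister class, standing in for a real implementation."""


def example_init(registry_entry):
    """Hypothetical optional hook; a real one would set up the lister's state."""
    return f"initialized {registry_entry['lister'].__name__}"


def register():
    """Registry function that a `swh.workers` entry point would reference."""
    return {
        "task_modules": ["example_lister.tasks"],  # hypothetical module path
        "lister": ExampleLister,
        "models": [],  # SQLAlchemy models would be listed here
        "init": example_init,  # optional: omit to get the default behaviour
    }


entry = register()
# Call the hook only when the plugin declares one.
if "init" in entry:
    print(entry["init"](entry))
```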
Antoine R. Dumont (@ardumont)
c507948da8 bin: Drop dead code 2019-06-28 18:17:15 +02:00
David Douard
d6169c7141 cli: register the 'lister' cli subcommand
also add a cli group named 'lister' for the sake of consistency with
other swh packages and rename the command as 'db-init', like:

  swh lister db-init LISTER [...]
2019-05-22 13:36:57 +02:00
David Douard
b8381d8a1b Add an entry point for the cli command 2019-02-06 10:21:50 +01:00
David Douard
9da0bd26eb setup: kill remaining nose whims to be around 2018-10-30 12:17:25 +01:00
David Douard
788efd31aa Fix a typo in setup.py 2018-10-23 19:09:24 +02:00
David Douard
9a205b1938 setup: prepare for pypi upload
related to T1242
2018-10-08 11:27:54 +02:00
Antoine Pietri
f65a3672bb Add requirements-test.txt 2018-10-05 12:23:15 +02:00