When relisting an opam instance while the opam root directory is already
populated, the '--set-default' parameter must be provided; otherwise the
following error is reported:
No switch is currently set. Please use 'opam switch' to set or install a switch
Related to swh/infra/sysadm-environment#4971.
Use subprocess.run instead of subprocess.call and subprocess.Popen to
call opam commands, and set the check parameter to True so that a
CalledProcessError exception is raised when an opam command fails.
This should help spot issues with the opam lister.
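A minimal sketch of the pattern described above. The run_command helper
is hypothetical, and a child Python process stands in for opam so that
the example runs without opam installed:

```python
import subprocess
import sys


def run_command(args):
    """Run a command, raising CalledProcessError on a non-zero exit code."""
    return subprocess.run(args, check=True, capture_output=True, text=True)


# A successful command returns a CompletedProcess with the captured output.
ok = run_command([sys.executable, "-c", "print('ok')"])

# A failing command raises instead of silently returning an error code.
try:
    run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
    failed_returncode = None
except subprocess.CalledProcessError as exc:
    failed_returncode = exc.returncode
```

With subprocess.call, the exit code would only be returned and could
easily be ignored by the caller.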
Related to swh/infra/sysadm-environment#4971.
Unlike Gitea listing, which does not require the q query parameter,
Gogs listing requires it.
According to the Gogs source code, the /repos/search endpoint generates
a SQL request of the form: "SELECT * FROM repos WHERE name LIKE '%{q}%'".
Setting the q parameter value to "_", the single-character wildcard of
the LIKE operator, makes the clause match any non-empty name, so all
repositories are returned.
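The effect of the "_" value can be checked with a small sketch. It uses
an in-memory SQLite table as a stand-in for the Gogs database, and the
search_repos helper is hypothetical, not the Gogs implementation:

```python
import sqlite3


def search_repos(conn, q):
    """Mimic a search of the form: name LIKE '%{q}%'."""
    pattern = f"%{q}%"
    rows = conn.execute(
        "SELECT name FROM repos WHERE name LIKE ? ORDER BY name", (pattern,)
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repos (name TEXT)")
conn.executemany(
    "INSERT INTO repos VALUES (?)", [("alpha",), ("beta-tools",), ("x",)]
)

# "_" matches any single character, so '%_%' matches every non-empty name.
all_names = search_repos(conn, "_")
# A regular search term only matches names containing it.
some_names = search_repos(conn, "alp")
```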
Fixes #4698.
Pagure is a git-centered forge written in Python and based on pygit2.
Its REST API makes it easy to list all projects hosted on an
instance, so the lister implementation is quite simple.
Related to swh/meta#5043.
The default behavior of subprocess is to pull executables from a
hardcoded list, which doesn't work when opam is installed manually in
the user's home directory.
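One possible way to handle this, shown only as a sketch and not
necessarily what this change does, is to resolve the executable
explicitly against the caller's PATH plus extra directories before
running it. The resolve_executable helper is hypothetical, and the
Python interpreter stands in for opam:

```python
import os
import shutil
import sys


def resolve_executable(name, extra_dirs=()):
    """Return the full path of `name`, searching extra_dirs before PATH."""
    search_path = os.pathsep.join(
        [*extra_dirs, os.environ.get("PATH", os.defpath)]
    )
    return shutil.which(name, path=search_path)


# The running interpreter plays the role of a manually installed binary.
tool_name = os.path.basename(sys.executable)
tool_dir = os.path.dirname(sys.executable)

found = resolve_executable(tool_name, extra_dirs=[tool_dir])
missing = resolve_executable("surely-not-an-installed-command-xyz")
```

The resolved path can then be passed as the first element of the
argument list given to subprocess.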
mypy doesn't catch that multiple uses of
`self.listed_origins[origin_url]` in the same statement should be identical.
Using a temporary local variable for it seems to help.
Prior to this, only 'directory' types were sent for all vcs trees. Multiple
directory loaders now exist whose visit types are currently diverging, so the scheduling
would not happen correctly without this change. This commit is the required adaptation
for the scheduling to work appropriately.
Refs. swh/meta#4979
This moves the rather elementary logic into the lister's scope. This will simplify
and unify cli calls between the lister and scheduler clis. It will also reduce
erroneous operations, which can happen for example in the add-forge-now.
With the following, we will only have to provide the type and the instance, then
everything will be scheduled properly.
Refs. swh/devel/swh-lister#4693
Detection starts with the first url. As soon as one detection succeeds, it stops and
yields the result. Otherwise, it continues with the detection on the next mirror url.
This should fix the current misbehavior [1] when multiple mirror urls are not ok but the
first one is.
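The detection order described above can be sketched as follows. The
detect_first helper and the fake detector are hypothetical and only
illustrate the control flow, not the lister's actual code:

```python
def detect_first(urls, detect):
    """Return (url, result) for the first url whose detection succeeds."""
    for url in urls:
        try:
            result = detect(url)
        except Exception:
            # A failing mirror must not prevent trying the next one.
            continue
        if result is not None:
            return url, result
    return None


tried = []


def fake_detect(url):
    """Stand-in detector: only urls ending with '.git' are recognized."""
    tried.append(url)
    return "git" if url.endswith(".git") else None


mirrors = [
    "https://example.org/repo.git",
    "https://mirror.example.org/broken",
]
first_hit = detect_first(mirrors, fake_detect)
no_hit = detect_first(["https://mirror.example.org/broken"], fake_detect)
```

Since the first url is detected successfully, the broken mirror url is
never probed in the first call.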
[1] https://gitlab.softwareheritage.org/swh/infra/sysadm-environment/-/issues/4868#note_137483
Refs. swh/infra/sysadm-environment#4868
Instead of fully consuming the get_origins_from_page generator into
a list and truncating it, consume the generator origin by origin and
abort the process when the max number of origins per page is reached.
Indeed, some non-trivial listers like the cgit one can perform costly
processing, an HTTP request for instance, for each origin in a page.
So it is better not to consume the full generator at once, to avoid
such side effects.
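A minimal sketch of the difference, using itertools.islice as one way
to stop consuming a generator early. The limit_origins helper and the
fake page generator are hypothetical:

```python
from itertools import islice


def limit_origins(origins, max_origins_per_page=None):
    """Lazily yield origins, stopping once the limit is reached."""
    if max_origins_per_page is None:
        yield from origins
    else:
        yield from islice(origins, max_origins_per_page)


processed = []


def fake_get_origins_from_page(page):
    """Stand-in generator recording the costly work done per origin."""
    for item in page:
        processed.append(item)  # e.g. one HTTP request per origin
        yield f"https://example.org/{item}"


page = ["a", "b", "c", "d"]
limited = list(limit_origins(fake_get_origins_from_page(page), 2))
```

Only the first two origins trigger the costly processing, whereas
list(generator)[:2] would have processed all four.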
This unifies it with other lister task modules. It also allows the cgit task to
be scheduled by the add-forge-now scheduler cli.
Refs. swh/infra/sysadm-environment#4813
Some URLs of the repositories endpoint from the BitBucket REST API 2.0
can return an error 500. In that case, skip the buggy repositories
page and get the next one to continue listing and avoid ending it
prematurely.
Related to #4239
The requests_ratelimited fixture from swh-core was renamed to
github_requests_ratelimited.
A remaining_requests parameter was added to the github_response_callback
function from swh-core, making it no longer compatible with the
requests_mock callback for json responses.
In order to remove warnings about /apidoc/*.rst files being included
multiple times in toc when building full swh documentation, prefer to
include module indices only when building standalone package documentation.
Also include them the proper sphinx way.
Related to T4496
Some GitLab instances use specific namespaces for transient repositories
that do not make sense to archive (for example, gitlab.org has a set
of QA namespaces used for integration testing of their production
deployments; drupal has an `issues/` namespace with forks of repos that
are only used for collaboration on merge requests, and aren't that
useful to archive).
This cuts down one more manual step in the add forge now validation
process: we can add the relevant origins to the staging scheduler
without enabling them at all.
This will allow more automation of the staging add forge now process:
for known-good listers, we can limit the number of origins being
processed and reduce the amount of manual steps taken for each instance.
The SQL dump contains ownership instructions that can't be run if you
don't have the right users in your database clusters. When someone has a
psqlrc with ON_ERROR_STOP, this fails the load of the dump.
Use the opportunity to trigger an exception when psql returns a non-zero
exit code, rather than continue with an empty/inconsistent database.
In a similar way to the debian lister, use the following versions in the
packages dictionary provided to the generic rpm loader:
- dict keys are package versions prefixed by the fedora release and
  edition in which they were found (fedora{release}/{edition}/{version});
  they will be used as branch names targeting releases in the snapshot
  created by the rpm loader
- version fields in dict values are the package intrinsic versions parsed
  from package repository metadata, excluding any ".fcXY" suffixes to
  prevent the loader from creating multiple releases targeting the same
  directory; they will be used as release names in the snapshot created
  by the rpm loader
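The two kinds of versions above can be illustrated with a short sketch.
The helper names, the sample values and the regular expression (which
assumes "XY" is a run of digits) are assumptions made for illustration,
not the lister's actual code:

```python
import re

# Assumption: the suffix to drop looks like ".fc38" (".fc" plus digits).
FC_SUFFIX = re.compile(r"\.fc\d+")


def branch_key(release, edition, version):
    """Build a dict key of the form fedora{release}/{edition}/{version}."""
    return f"fedora{release}/{edition}/{version}"


def intrinsic_version(version):
    """Drop any ".fcXY" suffix from a package version."""
    return FC_SUFFIX.sub("", version)


key_38 = branch_key(38, "Everything", "1.2.3-4.fc38")
key_39 = branch_key(39, "Everything", "1.2.3-4.fc39")

# Both builds share one intrinsic version, hence one release name.
name_38 = intrinsic_version("1.2.3-4.fc38")
name_39 = intrinsic_version("1.2.3-4.fc39")
```

The keys stay distinct per fedora release, so each gets its own branch
in the snapshot, while both branches can target the same release.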
Related to T4448