Ensure that all lister classes have the same set of mandatory parameters
in their constructors, notably: scheduler, url, instance and credentials.
Add a new test checking listers classes have mandatory parameters declared
in their constructors. The purpose is to avoid deployment issues on staging
or production environment as celery tasks can fail to be executed if mandatory
parameters are not handled by listers.
Reated to swh/infra/sysadm-environment#5030.
Prior to this commit, all listers were instantiated at the same time even if
only one was needed. This commit separates those instantiations.
The only drawback to this is the db model initialization which now happens at
each lister instantiation. This can be dealt with if needed at another time
though.
The following commit adapts the return statements from both lister and their
associated tasks. This standardizes on what other modules (e.g. both dvcs and
package loaders) do.
This also adds an swh-listers fixture which allows to retrieve a test ready
lister from its name (e.g gnu). Those listers have access to a scheduler
fixture so we can check the listing output from the scheduler instance.
Loader Task signature for the loader gnu is now:
- args:
- package
- package urls
- kwargs:
tarballs: List of Dict with keys archive (unchanged), 'time' (was 'date'),
length (new)
Add a new register-task-types cli that will create missing task-type entries in the
scheduler according to:
- only create missing task-types (do not update them), but check that the
backend_name field is consistent,
- each SWHTask-based task declared in a module listed in the 'task_modules'
plugin registry field will be checked and added if needed; tasks which name
start wit an underscore will not be added,
- added task-type will have:
- the 'type' field is derived from the task's function name (with underscores
replaced with dashes),
- the description field is the first line of that function's docstring,
- default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with
a simple pattern matching to have decent default values for full/incremental
tasks),
- these default values can be overloaded via the 'task_type' plugin registry
entry.
For this, we had to rename all tasks names (eg. `cran_lister` -> `list_cran`).
Comes with some tests.
Listers are declared as plugins via the `swh.workers` entry_point.
As such, the registry function is expected to return a dict with the
`task_modules` field (as for generic worker plugins), plus:
- `lister`: the lister class,
- `models`: list of SQLAlchemy models used by this lister,
- `init` (optionnal): hook (callable) used to initialize the lister's state
(typically, create/initialize the database for this lister).
If not set, the default implementation creates database tables (after
optionally having deleted exisintg ones) according to models declared in
the `models` register field.
There is no need for explicitely add lister task modules in the main
`conftest` module, but any new/extra lister to be tested must be registered
(the tested lister module must be properly installed in the test environment).
Also refactor a bit the cli tools:
- add support for the standard --config-file option at the 'lister' group
level,
- move the --db-url to the 'lister' group,
- drop the --lister option for the `swh lister db-init` cli tool:
initializing (especially with --drop-tables) the database for a single
lister is unreliable, since all tables are created using a sibgle MetaData
(in the same namespace).
Prior to this commit, the policy and priority were hard-coded.
The default values are now the old hard-coded values.
This will allow to develop a cli to trigger forges listing with oneshot policy
and some priority tasks. Thus ingesting those faster and without manual
interventation as we currently do.
As discussed in T1389 to ingest all packages using base loader, it need
a list of all the tarballs for a pakage.
Hence modifified lister to recursively list all the tarballs for a
package with their last updated time.
Previously gnu lister was using same code as that of tarball loader
to download, unzip and read tree.json file.
To make the code consise the downloading method is changed to
requests library.