Commit graph

9 commits

Author SHA1 Message Date
David Douard
8d9deeb8f8 plugins: add support for scheduler's task-type declaration
Add a new register-task-types cli that will create missing task-type entries in the
scheduler according to:

- only create missing task-types (do not update them), but check that the
  backend_name field is consistent,
- each SWHTask-based task declared in a module listed in the 'task_modules'
  plugin registry field will be checked and added if needed; tasks which name
  start wit an underscore will not be added,
- added task-type will have:
  - the 'type' field is derived from the task's function name (with underscores
    replaced with dashes),
  - the description field is the first line of that function's docstring,
  - default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with
    a simple pattern matching to have decent default values for full/incremental
    tasks),
  - these default values can be overloaded via the 'task_type' plugin registry
    entry.

For this, we had to rename all tasks names (eg. `cran_lister` -> `list_cran`).

Comes with some tests.
2019-09-04 15:36:08 +02:00
David Douard
e3c0ea9d90 implement listers as plugins
Listers are declared as plugins via the `swh.workers` entry_point.

As such, the registry function is expected to return a dict with the
`task_modules` field (as for generic worker plugins), plus:

- `lister`: the lister class,
- `models`: list of SQLAlchemy models used by this lister,
- `init` (optionnal): hook (callable) used to initialize the lister's state
  (typically, create/initialize the database for this lister).
  If not set, the default implementation creates database tables (after
  optionally having deleted exisintg ones) according to models declared in
  the `models` register field.

There is no need for explicitely add lister task modules in the main
`conftest` module, but any new/extra lister to be tested must be registered
(the tested lister module must be properly installed in the test environment).

Also refactor a bit the cli tools:
- add support for the standard --config-file option at the 'lister' group
  level,
- move the --db-url to the 'lister' group,
- drop the --lister option for the `swh lister db-init` cli tool:
  initializing (especially with --drop-tables) the database for a single
  lister is unreliable, since all tables are created using a sibgle MetaData
  (in the same namespace).
2019-09-03 15:02:24 +02:00
Antoine R. Dumont (@ardumont)
87d2a16df0
listers: Allow to override policy and priority for scheduled tasks
Prior to this commit, the policy and priority were hard-coded.
The default values are now the old hard-coded values.

This will allow to develop a cli to trigger forges listing with oneshot policy
and some priority tasks. Thus ingesting those faster and without manual
interventation as we currently do.
2019-08-28 11:57:10 +02:00
Archit Agrawal
f76b96b825 swh.lister.gnu: Change origin type to tar
Change origin type from 'gnu' to 'tar'
2019-06-19 17:21:02 +05:30
Valentin Lorentz
aef7d5952e Remove columns 'description' and 'origin_id'.
They are useless.
2019-06-19 10:29:15 +02:00
Archit Agrawal
7c6245e663 swh.lister.gnu: Add function to check for file extension.
Added a function which will derive the extension from filename
and check if the fie extension match the type of file that is to be
archived.
2019-06-11 15:12:53 +05:30
Archit Agrawal
709ba8a6e5 swh.lister.gnu: Add functionality to list all the tarballs for a package.
As discussed in T1389 to ingest all packages using base loader, it need
a list of all the tarballs for a pakage.
Hence modifified lister to recursively list all the tarballs for a
package with their last updated time.
2019-06-08 21:56:00 +05:30
Archit Agrawal
ebdb959823 swh.lister.gnu : Change download method of tree.json file to request
Previously gnu lister was using same code as that of tarball loader
to download, unzip and read tree.json file.
To make the code consise the downloading method is changed to
requests library.
2019-06-08 21:56:00 +05:30
Archit Agrawal
151f6cd223 swh.lister.gnu
Implement first pass of gnu lister to list all the
packages present in https://ftp.gnu.org/
Add GNU lister in README and cli.py

Closes T1722
2019-06-08 21:56:00 +05:30