Commit graph

11 commits

Author SHA1 Message Date
David Douard
b810876ef8 tasks: normalize the url argument name of most listers
Since all the listing tasks accept a URL as first argument (whatever the
argument name is), it makes sense to use a simple common argument name for
this. I've chosen 'url' instead of api_baseurl/forge_url/url.

Also kill the now-useless `new_lister()` functions.
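
As a rough illustration of the renaming, here is a minimal sketch assuming two hypothetical task functions (their names, return values and the example URL are placeholders, not taken from the repository). The point is only that every listing task receives the instance address under the same `url` keyword:

```python
# Hypothetical sketch: function names and return values are illustrative only.

def list_gitlab_full(url, **lister_args):
    """Full listing of a GitLab instance reachable at `url`."""
    return {"lister": "gitlab", "url": url, **lister_args}


def list_phabricator_full(url, **lister_args):
    """Full listing of a Phabricator instance reachable at `url`."""
    return {"lister": "phabricator", "url": url, **lister_args}


# A caller can now build the arguments of any listing task the same way,
# without knowing whether it used to be api_baseurl, forge_url or url.
for task in (list_gitlab_full, list_phabricator_full):
    print(task(url="https://forge.example.org/api/"))
```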
2019-09-04 15:38:01 +02:00
David Douard
8d9deeb8f8 plugins: add support for scheduler's task-type declaration
Add a new register-task-types cli that will create missing task-type entries in the
scheduler according to:

- only create missing task-types (do not update them), but check that the
  backend_name field is consistent,
- each SWHTask-based task declared in a module listed in the 'task_modules'
  plugin registry field will be checked and added if needed; tasks whose name
  starts with an underscore will not be added,
- an added task-type will have:
  - a 'type' field derived from the task's function name (with underscores
    replaced with dashes),
  - a description field set to the first line of that function's docstring,
  - default values as provided by swh.lister.cli.DEFAULT_TASK_TYPE (with
    simple pattern matching to have decent default values for full/incremental
    tasks),
  - these default values can be overridden via the 'task_type' plugin registry
    entry.

For this, we had to rename all task names (e.g. `cran_lister` -> `list_cran`).

Comes with some tests.
2019-09-04 15:36:08 +02:00
David Douard
e3c0ea9d90 implement listers as plugins
Listers are declared as plugins via the `swh.workers` entry_point.

As such, the registry function is expected to return a dict with the
`task_modules` field (as for generic worker plugins), plus:

- `lister`: the lister class,
- `models`: list of SQLAlchemy models used by this lister,
- `init` (optional): hook (callable) used to initialize the lister's state
  (typically, create/initialize the database for this lister).
  If not set, the default implementation creates database tables (after
  optionally having deleted existing ones) according to models declared in
  the `models` register field.

There is no need to explicitly add lister task modules in the main
`conftest` module, but any new/extra lister to be tested must be registered
(the tested lister module must be properly installed in the test environment).

Also refactor the cli tools a bit:
- add support for the standard --config-file option at the 'lister' group
  level,
- move the --db-url to the 'lister' group,
- drop the --lister option for the `swh lister db-init` cli tool:
  initializing (especially with --drop-tables) the database for a single
  lister is unreliable, since all tables are created using a single MetaData
  (in the same namespace).
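
A registry function of the shape described above might look like the sketch below. The class names, the module path `example_lister.tasks` and the `check_registry_entry` helper are placeholders invented for this illustration; only the dict keys come from the commit message.

```python
# Hypothetical plugin module: class names and the module path are placeholders.

class ExampleModel:
    """Stands in for a SQLAlchemy model used by the lister."""
    __tablename__ = "example_repo"


class ExampleLister:
    """Stands in for the lister class."""


def register():
    """Registry function referenced by the `swh.workers` entry point."""
    return {
        "task_modules": ["example_lister.tasks"],
        "lister": ExampleLister,
        "models": [ExampleModel],
        # "init" is optional; when it is absent, the default implementation
        # creates the tables for the declared models.
    }


EXPECTED_FIELDS = ("task_modules", "lister", "models")


def check_registry_entry(entry):
    """Return the names of expected fields missing from a registry dict."""
    return [field for field in EXPECTED_FIELDS if field not in entry]


print(check_registry_entry(register()))
```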
2019-09-03 15:02:24 +02:00
David Douard
87cec2f5c3 phabricator: refactor PhabricatorLister's constructor
- use the 'standard' api_baseurl as init argument,
- make it optional, with default to forge.softwareheritage.org,
- use origin_url as id.
2019-09-02 12:29:38 +02:00
David Douard
3816b4d3bf cgit: rewrite the CGit lister
Simplify the code:
- only inherit from ListerBase
- implement HTTP queries directly using requests
- get rid of convoluted code

Get the origin_url from the git repo's "project" page instead of
building it from the 'url_prefix' hack. The lister WILL now make substantially
more requests, since it makes one request per listed git repo, but
the provided origin_url should be pretty reliable.

When several clonable URLs are provided, choose the http/https one first;
otherwise, choose the first one in the list.
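
The selection rule can be sketched as below. The function name `pick_origin_url` and the example URLs are hypothetical, and this is not necessarily how the lister implements it:

```python
from urllib.parse import urlparse


def pick_origin_url(clone_urls):
    """Prefer the first http/https URL, else fall back to the first URL."""
    for url in clone_urls:
        if urlparse(url).scheme in ("http", "https"):
            return url
    # No http/https URL: take the first one, or None for an empty list.
    return clone_urls[0] if clone_urls else None


print(pick_origin_url([
    "git://git.example.org/repo.git",
    "https://git.example.org/repo.git",
]))
```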

Add proper tests for the cgit lister.

Also, get rid of the 'time_updated' column in the model.
2019-09-02 12:29:31 +02:00
Antoine R. Dumont (@ardumont)
4b2ab0488a cli: Unify new_lister method name to get_lister
2019-08-28 16:29:26 +02:00
Antoine R. Dumont (@ardumont)
dee9fe93bf cli: Bootstrap tests on cli
2019-08-28 16:29:26 +02:00
Nicolas Dandrimont
b3e815bef8 Rename test methods to test_ to allow collection by pytest
Part of T1261
2018-10-15 10:11:07 +02:00
Antoine R. Dumont (@ardumont)
e24cf1d4f1 gitlab.lister: Simplify retrieving headers information
As response headers' keys are case-insensitive and requests does the
aggregation magic [1].

[1] http://docs.python-requests.org/en/master/user/quickstart/#response-headers
2018-07-16 13:57:34 +02:00
Antoine R. Dumont (@ardumont)
81fd5f9c5d swh.lister.gitlab.tasks: Fix range computations
2018-07-12 14:23:14 +02:00
Antoine R. Dumont (@ardumont)
13bb7aca58 swh.lister.gitlab: Improve headers extraction
2018-07-11 18:09:18 +02:00