Commit graph

35 commits

Author SHA1 Message Date
Antoine R. Dumont (@ardumont)
9437a643ad
pytest: Define plugin and declare it in the root conftest
Then drop all unneeded and indirect imports
2020-09-02 12:25:15 +02:00
Nicolas Dandrimont
c9963d4302 Use the new names for the swh.scheduler test fixtures 2020-07-09 17:06:50 +02:00
David Douard
93a4d8b784 Enable black
- blackify all the python files,
- enable black in pre-commit,
- add a black tox environment.
2020-04-08 16:31:22 +02:00
Gautier Pugnonblanc Yann
60adc424be add annotation type in some lister file 2020-02-17 15:58:34 +01:00
Antoine Lambert
99fcd2b3f5 docs: Fix sphinx warnings
Related to T2188
2020-01-17 16:15:11 +01:00
Antoine R. Dumont (@ardumont)
4a9608f31c
lister/tasks: Standardize return statements
The following commit adapts the return statements from both lister and their
associated tasks. This standardizes on what other modules (e.g. both dvcs and
package loaders) do.
2019-12-02 15:49:38 +01:00
Antoine R. Dumont (@ardumont)
cb853f4898
lister.tests: Avoid duplication setup step
And remove unnecessary fixture redefinition which causes indirection.
2019-11-21 14:24:01 +01:00
David Douard
3ddfd00e90 Fix typos (and trailing ws) reported by codespell 2019-11-21 14:11:18 +01:00
Antoine R. Dumont (@ardumont)
1757b0112b
cran/gnu: Rename task_type to load-archive-files 2019-11-21 13:44:20 +01:00
Antoine R. Dumont (@ardumont)
1cf7c8e86b
lister.tests: Add missing task_type for package listers
The scheduler module no longer initializes those task types itself.
2019-11-21 13:44:20 +01:00
Antoine R. Dumont (@ardumont)
bf030c0f00
gnu.lister: Use url as primary key
Otherwise, we fail the uniqueness constraint.

Related to T2070
2019-11-15 11:24:21 +01:00
Antoine R. Dumont (@ardumont)
daa9a270fb
gnu.lister.tests: Add missing assertion 2019-11-15 10:49:13 +01:00
Antoine R. Dumont (@ardumont)
191043fff9
gnu.lister: Add missing retries_left parameter 2019-11-15 10:47:23 +01:00
Antoine R. Dumont (@ardumont)
89b409d30f
gnu.lister: Update docstrings and fix type annotations 2019-11-04 11:06:16 +01:00
Antoine R. Dumont (@ardumont)
c6372eea7e
gnu.lister: Unify timestamp formats to isoformat date in model
Related T2023
2019-11-04 10:08:01 +01:00
Antoine R. Dumont (@ardumont)
93e7afda3b
gnu.lister: Format timestamp to isoformat string for the tar loader
Related T2023
2019-11-04 10:08:00 +01:00
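The timestamp conversion named in commits c6372eea7e and 93e7afda3b can be illustrated with a minimal sketch. It assumes the listed timestamps are epoch seconds (as an int or a numeric string) and that UTC is the intended timezone; neither is stated in the commit messages, and `to_isoformat` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_isoformat(timestamp) -> str:
    """Convert epoch seconds to an ISO 8601 string.

    Assumption: the input is epoch seconds (int or numeric string)
    and the result should be expressed in UTC.
    """
    return datetime.fromtimestamp(int(timestamp), tz=timezone.utc).isoformat()


if __name__ == "__main__":
    print(to_isoformat("0"))
```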
Antoine R. Dumont (@ardumont)
9fd648987e
gnu.tree: Move helper utilities function to the bottom of the file 2019-11-04 10:08:00 +01:00
Antoine R. Dumont (@ardumont)
821f3f1cc2
lister.gnu: Move version parsing logic to the lister
Related D2145 proposition
2019-11-04 10:08:00 +01:00
Nicolas Dandrimont
78105940ff Stop binding tasks to a specific instance of the celery app
The celery.shared_task decorator allows late-binding of tasks to any celery app,
which is well suited for our "task plugin" architecture.
2019-10-18 18:02:25 +02:00
Antoine R. Dumont (@ardumont)
a8cde12d72
tests: Update pytest_plugin according to latest version change 2019-10-14 18:20:15 +02:00
Antoine R. Dumont (@ardumont)
ef2c1847e4
gnu.lister: Move tests datadir into its dedicated folder
Related D2076#inline-13551
2019-10-10 11:50:11 +02:00
Antoine R. Dumont (@ardumont)
0f0b840178
gnu.tests: Checks lister output from scheduler
This also adds an swh-listers fixture which retrieves a test-ready
lister by name (e.g. gnu). Those listers have access to a scheduler
fixture so we can check the listing output from the scheduler instance.
2019-10-09 18:23:51 +02:00
Antoine R. Dumont (@ardumont)
04ca318680
simple_lister: Extract common behavior in base class 2019-10-09 17:35:12 +02:00
Antoine R. Dumont (@ardumont)
3ce6c5c6ef
lister/gnu: Modify gnu lister's loading task creation
Loader Task signature for the loader gnu is now:
- args:
  - package
  - package urls

- kwargs:
  tarballs: List of Dict with keys archive (unchanged), 'time' (was 'date'),
      length (new)
2019-10-04 11:05:58 +02:00
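The loader task signature described in commit 3ce6c5c6ef could be assembled roughly as follows. This is a sketch only: the function name `make_loader_task_arguments`, the `args`/`kwargs` wrapper dict, and the example values are assumptions, while the positional arguments and the tarball keys (`archive`, `time`, `length`) come from the commit message.

```python
from typing import Any, Dict, List

# Keys each tarball entry carries, per the commit message.
TARBALL_KEYS = {"archive", "time", "length"}


def make_loader_task_arguments(
    package: str, package_url: str, tarballs: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Build the args/kwargs pair for a gnu loading task (hypothetical helper)."""
    for tarball in tarballs:
        missing = TARBALL_KEYS - tarball.keys()
        if missing:
            raise ValueError(f"tarball entry is missing keys: {sorted(missing)}")
    return {
        "args": [package, package_url],
        "kwargs": {"tarballs": tarballs},
    }


if __name__ == "__main__":
    # Made-up values, for illustration only.
    task = make_loader_task_arguments(
        "example",
        "https://ftp.gnu.org/gnu/example/",
        [{"archive": "https://ftp.gnu.org/gnu/example/example-1.0.tar.gz",
          "time": "2003-12-20T00:00:00+00:00",
          "length": 1024}],
    )
    print(task["args"])
```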
Antoine R. Dumont (@ardumont)
00bb6c7bbf
lister/gnu: Remove unneeded get_file method 2019-10-04 11:02:17 +02:00
Antoine R. Dumont (@ardumont)
322c9dc7e2
lister/gnu: Modify default policy to oneshot 2019-10-04 11:01:49 +02:00
David Douard
8d9deeb8f8 plugins: add support for scheduler's task-type declaration
Add a new register-task-types cli that will create missing task-type entries in the
scheduler according to:

- only create missing task-types (do not update them), but check that the
  backend_name field is consistent,
- each SWHTask-based task declared in a module listed in the 'task_modules'
  plugin registry field will be checked and added if needed; tasks whose names
  start with an underscore will not be added,
- added task-type will have:
  - the 'type' field is derived from the task's function name (with underscores
    replaced with dashes),
  - the description field is the first line of that function's docstring,
  - default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with
    a simple pattern matching to have decent default values for full/incremental
    tasks),
  - these default values can be overloaded via the 'task_type' plugin registry
    entry.

For this, we had to rename all task names (e.g. `cran_lister` -> `list_cran`).

Comes with some tests.
2019-09-04 15:36:08 +02:00
David Douard
e3c0ea9d90 implement listers as plugins
Listers are declared as plugins via the `swh.workers` entry_point.

As such, the registry function is expected to return a dict with the
`task_modules` field (as for generic worker plugins), plus:

- `lister`: the lister class,
- `models`: list of SQLAlchemy models used by this lister,
- `init` (optional): hook (callable) used to initialize the lister's state
  (typically, create/initialize the database for this lister).
  If not set, the default implementation creates database tables (after
  optionally having deleted existing ones) according to models declared in
  the `models` register field.

There is no need to explicitly add lister task modules in the main
`conftest` module, but any new/extra lister to be tested must be registered
(the tested lister module must be properly installed in the test environment).

Also refactor the cli tools a bit:
- add support for the standard --config-file option at the 'lister' group
  level,
- move the --db-url to the 'lister' group,
- drop the --lister option for the `swh lister db-init` cli tool:
  initializing (especially with --drop-tables) the database for a single
  lister is unreliable, since all tables are created using a single MetaData
  (in the same namespace).
2019-09-03 15:02:24 +02:00
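The registry dict described in commit e3c0ea9d90 might look like the following sketch. `ExampleLister`, `ExampleModel`, the module path `example_lister.tasks`, and the `check_registry_entry` validator are all hypothetical stand-ins; only the field names (`task_modules`, `lister`, `models`, `init`) come from the commit message.

```python
from typing import Any, Dict


class ExampleModel:
    """Stand-in for a SQLAlchemy model (hypothetical)."""


class ExampleLister:
    """Stand-in for a lister class (hypothetical)."""


def register() -> Dict[str, Any]:
    """Registry function that a `swh.workers` entry point would reference."""
    return {
        "task_modules": ["example_lister.tasks"],  # hypothetical module path
        "lister": ExampleLister,
        "models": [ExampleModel],
        # "init" is optional; omitting it leaves the default initialization in place.
    }


REQUIRED_FIELDS = ("task_modules", "lister", "models")


def check_registry_entry(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Validate the shape of a registry entry (hypothetical validator)."""
    missing = [field for field in REQUIRED_FIELDS if field not in entry]
    if missing:
        raise ValueError(f"registry entry is missing fields: {missing}")
    init = entry.get("init")
    if init is not None and not callable(init):
        raise TypeError("'init' must be callable when provided")
    return entry


if __name__ == "__main__":
    print(sorted(check_registry_entry(register())))
```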
Antoine R. Dumont (@ardumont)
87d2a16df0
listers: Allow to override policy and priority for scheduled tasks
Prior to this commit, the policy and priority were hard-coded.
The default values are now the old hard-coded values.

This will allow developing a cli to trigger forge listings with a oneshot policy
and some priority tasks, thus ingesting those faster and without the manual
intervention currently required.
2019-08-28 11:57:10 +02:00
Archit Agrawal
f76b96b825 swh.lister.gnu: Change origin type to tar
Change origin type from 'gnu' to 'tar'
2019-06-19 17:21:02 +05:30
Valentin Lorentz
aef7d5952e Remove columns 'description' and 'origin_id'.
They are useless.
2019-06-19 10:29:15 +02:00
Archit Agrawal
7c6245e663 swh.lister.gnu: Add function to check for file extension.
Added a function which derives the extension from the filename
and checks whether the file extension matches the type of file that is to be
archived.
2019-06-11 15:12:53 +05:30
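An extension check of the kind commit 7c6245e663 describes could be sketched as follows. The accepted extension list and the name `has_archive_extension` are assumptions made for illustration; the commit message does not say which file types the lister accepts.

```python
# Assumption: these are the archive types of interest; the real list may differ.
ARCHIVE_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar.xz", ".tgz", ".zip")


def has_archive_extension(filename: str) -> bool:
    """Return True if the filename ends with one of the accepted extensions."""
    return filename.lower().endswith(ARCHIVE_EXTENSIONS)


if __name__ == "__main__":
    for name in ("example-1.0.tar.gz", "example-1.0.tar.gz.sig", "README"):
        print(name, has_archive_extension(name))
```

Matching on the full suffix rather than on `os.path.splitext` keeps compound extensions such as `.tar.gz` intact, and rejects signature files such as `.tar.gz.sig`.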
Archit Agrawal
709ba8a6e5 swh.lister.gnu: Add functionality to list all the tarballs for a package.
As discussed in T1389, to ingest all packages using the base loader, it needs
a list of all the tarballs for a package.
Hence modified the lister to recursively list all the tarballs for a
package with their last updated time.
2019-06-08 21:56:00 +05:30
Archit Agrawal
ebdb959823 swh.lister.gnu : Change download method of tree.json file to request
Previously the gnu lister used the same code as the tarball loader
to download, unzip and read the tree.json file.
To make the code concise, the download now uses the
requests library.
2019-06-08 21:56:00 +05:30
Archit Agrawal
151f6cd223 swh.lister.gnu
Implement first pass of gnu lister to list all the
packages present in https://ftp.gnu.org/
Add GNU lister in README and cli.py

Closes T1722
2019-06-08 21:56:00 +05:30