49 commits

Author SHA1 Message Date
Antoine Lambert
ceb1b6450e gnu: Fix KeyError exception due to missing field in JSON data
The latest GNU JSON listing is missing the contents field for a directory,
so a KeyError exception was raised by the lister.
2025-04-04 12:03:10 +00:00
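A minimal sketch of the kind of guard this fix describes, assuming a directory entry is a dict whose `contents` key may be absent. The function name `directory_entries` and the sample data are hypothetical and are not taken from the lister's code.

```python
from typing import Any, Dict, List


def directory_entries(directory: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the children of a directory entry, or [] when the field is missing."""
    # dict.get avoids the KeyError that directory["contents"] would raise.
    return directory.get("contents", [])


# Hypothetical listing: the second directory has no "contents" field.
listing = [
    {"type": "directory", "name": "a", "contents": [{"type": "file", "name": "a-1.0.tar.gz"}]},
    {"type": "directory", "name": "b"},
]
names = [[entry["name"] for entry in directory_entries(d)] for d in listing]
```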
David Douard
cccb8c21ff Replace all remaining occurrences of the 'local' cls by 'postgresql'
The former has been deprecated for ages...
2024-10-28 14:35:29 +01:00
David Douard
714fccc3c7 python: Fix black formatting after bump to 23.1.0 in pre-commit 2023-12-05 10:33:07 +01:00
Antoine Lambert
6e7bc49ec7 Harmonize listers parameters and add test to check mandatory ones
Ensure that all lister classes have the same set of mandatory parameters
in their constructors, notably: scheduler, url, instance and credentials.

Add a new test checking that lister classes declare the mandatory parameters
in their constructors. The purpose is to avoid deployment issues in the staging
or production environments, as celery tasks can fail to execute if mandatory
parameters are not handled by listers.

Related to swh/infra/sysadm-environment#5030.
2023-09-06 11:55:34 +02:00
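One way such a check could be written, as a sketch only: it inspects each constructor signature and reports which of the four parameters named above are absent. The classes `GoodLister` and `BadLister` and the helper `missing_parameters` are hypothetical, not the project's actual test.

```python
import inspect
from typing import Set

# Parameter names taken from the commit message above.
MANDATORY_PARAMETERS = {"scheduler", "url", "instance", "credentials"}


class GoodLister:
    """Hypothetical lister declaring every mandatory parameter."""

    def __init__(self, scheduler, url=None, instance=None, credentials=None):
        self.scheduler = scheduler


class BadLister:
    """Hypothetical lister that forgot two parameters."""

    def __init__(self, scheduler, url=None):
        self.scheduler = scheduler


def missing_parameters(lister_class: type) -> Set[str]:
    """Return the mandatory parameter names absent from the constructor."""
    declared = inspect.signature(lister_class.__init__).parameters
    return MANDATORY_PARAMETERS - set(declared)
```

A test could then assert that `missing_parameters` returns an empty set for every registered lister class.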
Nicolas Dandrimont
e785e67315 Hook up recently introduced options to all listers
Hopefully one day we'll be able to replace all of this mess with PEP692
TypedDict kwargs, but that's only on track for Python 3.12.
2022-12-05 16:33:45 +01:00
Antoine R. Dumont (@ardumont)
fbfdf88ea4
nixguix: Add lister
Related to T3781
2022-10-03 18:26:36 +02:00
Antoine Lambert
d38e05cff7 python: Reformat code with black 22.3.0
Related to T3922
2022-04-08 15:15:09 +02:00
Antoine R. Dumont (@ardumont)
5ab6b00408
gnu: Respect the pattern docstring about state initialization
Any extra state initialization (outside the scheduler scope) is to happen in the
get_pages method.
2021-09-21 11:17:16 +02:00
Antoine Lambert
82ab96ad06 gnu: Remove dependency on pytz
UTC timezone settings can be obtained from the datetime.timezone
module from Python standard library so remove dependency on external
pytz module.
2021-02-02 13:19:04 +01:00
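For illustration, a timezone-aware UTC datetime built with the standard library alone. The date used is arbitrary and the variable name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# datetime.timezone.utc is a fixed-offset tzinfo for UTC, so no external
# dependency is needed to attach UTC to a datetime.
stamp = datetime(2021, 2, 2, 12, 0, 0, tzinfo=timezone.utc)
offset = stamp.utcoffset()
```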
Antoine Lambert
4cf0c7f765 gnu: Reimplement lister using new Lister API
Port the stateless GNU lister to the new swh.lister.pattern.Lister API
with unchanged functionality.

Closes T2990
2021-01-29 14:39:36 +01:00
Antoine R. Dumont (@ardumont)
36bc51cec5
lister.gnu.tests: Clarify lister configuration 2020-10-30 13:30:15 +01:00
Antoine Lambert
22f7181294 python: Reorder imports with isort
Related to T2610
2020-09-17 17:48:27 +02:00
Antoine R. Dumont (@ardumont)
d1ce9b0912
tox/pytest: Activate doctest-module
Related to D3899#96289
2020-09-10 11:47:47 +02:00
Antoine R. Dumont (@ardumont)
5a5b7ef70b
tests: Separate lister instantiations
Prior to this commit, all listers were instantiated at the same time even if
only one was needed. This commit separates those instantiations.

The only drawback to this is the db model initialization which now happens at
each lister instantiation. This can be dealt with if needed at another time
though.
2020-09-02 12:49:00 +02:00
Antoine R. Dumont (@ardumont)
9437a643ad
pytest: Define plugin and declare it in the root conftest
Then drop all unneeded and indirect imports
2020-09-02 12:25:15 +02:00
Nicolas Dandrimont
c9963d4302 Use the new names for the swh.scheduler test fixtures 2020-07-09 17:06:50 +02:00
David Douard
93a4d8b784 Enable black
- blackify all the python files,
- enable black in pre-commit,
- add a black tox environment.
2020-04-08 16:31:22 +02:00
Gautier Pugnonblanc Yann
60adc424be add type annotations in some lister files 2020-02-17 15:58:34 +01:00
Antoine Lambert
99fcd2b3f5 docs: Fix sphinx warnings
Related to T2188
2020-01-17 16:15:11 +01:00
Antoine R. Dumont (@ardumont)
4a9608f31c
lister/tasks: Standardize return statements
This commit adapts the return statements of both listers and their
associated tasks. This standardizes on what other modules (e.g. both dvcs and
package loaders) do.
2019-12-02 15:49:38 +01:00
Antoine R. Dumont (@ardumont)
cb853f4898
lister.tests: Avoid duplication setup step
And remove unnecessary fixture redefinition which causes indirection.
2019-11-21 14:24:01 +01:00
David Douard
3ddfd00e90 Fix typos (and trailing ws) reported by codespell 2019-11-21 14:11:18 +01:00
Antoine R. Dumont (@ardumont)
1757b0112b
cran/gnu: Rename task_type to load-archive-files 2019-11-21 13:44:20 +01:00
Antoine R. Dumont (@ardumont)
1cf7c8e86b
lister.tests: Add missing task_type for package listers
The scheduler module no longer initializes those task types itself.
2019-11-21 13:44:20 +01:00
Antoine R. Dumont (@ardumont)
bf030c0f00
gnu.lister: Use url as primary key
Otherwise, the uniqueness constraint is violated.

Related to T2070
2019-11-15 11:24:21 +01:00
Antoine R. Dumont (@ardumont)
daa9a270fb
gnu.lister.tests: Add missing assertion 2019-11-15 10:49:13 +01:00
Antoine R. Dumont (@ardumont)
191043fff9
gnu.lister: Add missing retries_left parameter 2019-11-15 10:47:23 +01:00
Antoine R. Dumont (@ardumont)
89b409d30f
gnu.lister: Update docstrings and fix type annotations 2019-11-04 11:06:16 +01:00
Antoine R. Dumont (@ardumont)
c6372eea7e
gnu.lister: Unify timestamp formats to isoformat date in model
Related to T2023
2019-11-04 10:08:01 +01:00
Antoine R. Dumont (@ardumont)
93e7afda3b
gnu.lister: Format timestamp to isoformat string for the tar loader
Related to T2023
2019-11-04 10:08:00 +01:00
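A sketch of such a conversion, assuming the source timestamp is a count of seconds since the Unix epoch (possibly given as a string). That input format and the helper name `to_isoformat` are assumptions, not details stated in the commit.

```python
from datetime import datetime, timezone
from typing import Union


def to_isoformat(timestamp: Union[int, str]) -> str:
    """Convert epoch seconds to an ISO 8601 string in UTC."""
    moment = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    return moment.isoformat()


one_day = to_isoformat("86400")
```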
Antoine R. Dumont (@ardumont)
9fd648987e
gnu.tree: Move helper utilities function to the bottom of the file 2019-11-04 10:08:00 +01:00
Antoine R. Dumont (@ardumont)
821f3f1cc2
lister.gnu: Move version parsing logic to the lister
Related to the D2145 proposition
2019-11-04 10:08:00 +01:00
Nicolas Dandrimont
78105940ff Stop binding tasks to a specific instance of the celery app
The celery.shared_task decorator allows late-binding of tasks to any celery app,
which is well suited for our "task plugin" architecture.
2019-10-18 18:02:25 +02:00
Antoine R. Dumont (@ardumont)
a8cde12d72
tests: Update pytest_plugin according to latest version change 2019-10-14 18:20:15 +02:00
Antoine R. Dumont (@ardumont)
ef2c1847e4
gnu.lister: Move tests datadir into its dedicated folder
Related to D2076#inline-13551
2019-10-10 11:50:11 +02:00
Antoine R. Dumont (@ardumont)
0f0b840178
gnu.tests: Checks lister output from scheduler
This also adds an swh-listers fixture which retrieves a test-ready
lister from its name (e.g. gnu). Those listers have access to a scheduler
fixture so we can check the listing output from the scheduler instance.
2019-10-09 18:23:51 +02:00
Antoine R. Dumont (@ardumont)
04ca318680
simple_lister: Extract common behavior in base class 2019-10-09 17:35:12 +02:00
Antoine R. Dumont (@ardumont)
3ce6c5c6ef
lister/gnu: Modify gnu lister's loading task creation
The task signature for the gnu loader is now:
- args:
  - package
  - package urls

- kwargs:
  tarballs: List of Dict with keys archive (unchanged), 'time' (was 'date'),
      length (new)
2019-10-04 11:05:58 +02:00
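The signature above could be pictured as follows. This is a sketch only: the helper name `loader_task_arguments`, the single `package_url` parameter, the `args`/`kwargs` wrapper dict and all sample values are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Keys named in the commit message for each tarball entry.
TARBALL_KEYS = {"archive", "time", "length"}


def loader_task_arguments(
    package: str, package_url: str, tarballs: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Build positional and keyword arguments for a loading task."""
    for tarball in tarballs:
        missing = TARBALL_KEYS - set(tarball)
        if missing:
            raise ValueError(f"tarball entry is missing keys: {sorted(missing)}")
    return {"args": [package, package_url], "kwargs": {"tarballs": tarballs}}


# Hypothetical sample values.
arguments = loader_task_arguments(
    "hello",
    "https://ftp.gnu.org/gnu/hello/",
    [{"archive": "hello-2.10.tar.gz", "time": "1970-01-02T00:00:00+00:00", "length": 1234}],
)
```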
Antoine R. Dumont (@ardumont)
00bb6c7bbf
lister/gnu: Remove unneeded get_file method 2019-10-04 11:02:17 +02:00
Antoine R. Dumont (@ardumont)
322c9dc7e2
lister/gnu: Modify default policy to oneshot 2019-10-04 11:01:49 +02:00
David Douard
8d9deeb8f8 plugins: add support for scheduler's task-type declaration
Add a new register-task-types cli that will create missing task-type entries in the
scheduler according to:

- only create missing task-types (do not update them), but check that the
  backend_name field is consistent,
- each SWHTask-based task declared in a module listed in the 'task_modules'
  plugin registry field will be checked and added if needed; tasks whose name
  starts with an underscore will not be added,
- each added task-type will have:
  - a 'type' field derived from the task's function name (with underscores
    replaced with dashes),
  - a description field set to the first line of that function's docstring,
  - default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with
    a simple pattern matching to have decent default values for full/incremental
    tasks),
  - these default values can be overridden via the 'task_type' plugin registry
    entry.

For this, we had to rename all task names (e.g. `cran_lister` -> `list_cran`).

Comes with some tests.
2019-09-04 15:36:08 +02:00
David Douard
e3c0ea9d90 implement listers as plugins
Listers are declared as plugins via the `swh.workers` entry_point.

As such, the registry function is expected to return a dict with the
`task_modules` field (as for generic worker plugins), plus:

- `lister`: the lister class,
- `models`: list of SQLAlchemy models used by this lister,
- `init` (optional): hook (callable) used to initialize the lister's state
  (typically, create/initialize the database for this lister).
  If not set, the default implementation creates database tables (after
  optionally having deleted existing ones) according to models declared in
  the `models` register field.

There is no need to explicitly add lister task modules in the main
`conftest` module, but any new/extra lister to be tested must be registered
(the tested lister module must be properly installed in the test environment).

Also refactor the cli tools a bit:
- add support for the standard --config-file option at the 'lister' group
  level,
- move the --db-url to the 'lister' group,
- drop the --lister option for the `swh lister db-init` cli tool:
  initializing (especially with --drop-tables) the database for a single
  lister is unreliable, since all tables are created using a single MetaData
  (in the same namespace).
2019-09-03 15:02:24 +02:00
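A sketch of what a registry function of this shape might look like, with a fallback when `init` is not provided. The classes, the module path, the hook signature and the placeholder `default_init` are all hypothetical; the real default creates database tables rather than returning strings.

```python
from typing import Any, Callable, Dict, List


class ExampleModel:
    """Hypothetical stand-in for a SQLAlchemy model class."""


class ExampleLister:
    """Hypothetical stand-in for a lister class."""


def register() -> Dict[str, Any]:
    """Registry function a plugin would expose via the `swh.workers` entry point."""
    return {
        "task_modules": ["example.tasks"],  # hypothetical module path
        "lister": ExampleLister,
        "models": [ExampleModel],
        # The optional "init" hook is deliberately omitted here.
    }


def default_init(models: List[type]) -> List[str]:
    """Placeholder for the default behaviour (creating tables for the models)."""
    return [f"create table for {model.__name__}" for model in models]


def initialize(entry: Dict[str, Any]) -> List[str]:
    """Use the plugin's own `init` hook when present, else the default."""
    init: Callable[[List[type]], List[str]] = entry.get("init", default_init)
    return init(entry["models"])


actions = initialize(register())
```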
Antoine R. Dumont (@ardumont)
87d2a16df0
listers: Allow to override policy and priority for scheduled tasks
Prior to this commit, the policy and priority were hard-coded.
The default values are now the old hard-coded values.

This will allow developing a cli to trigger forge listings with a oneshot policy
and prioritized tasks, thus ingesting those faster and without the manual
intervention we currently need.
2019-08-28 11:57:10 +02:00
Archit Agrawal
f76b96b825 swh.lister.gnu: Change origin type to tar
Change origin type from 'gnu' to 'tar'
2019-06-19 17:21:02 +05:30
Valentin Lorentz
aef7d5952e Remove columns 'description' and 'origin_id'.
They are useless.
2019-06-19 10:29:15 +02:00
Archit Agrawal
7c6245e663 swh.lister.gnu: Add function to check for file extension.
Added a function which derives the extension from the filename
and checks whether the file extension matches the type of file that is to be
archived.
2019-06-11 15:12:53 +05:30
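A minimal sketch of an extension check of this kind. The list of accepted extensions and the function name `has_archive_extension` are illustrative assumptions, not the values used by the lister.

```python
# Illustrative set of archive extensions (an assumption).
ARCHIVE_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar.xz", ".zip")


def has_archive_extension(filename: str) -> bool:
    """Return True when the filename ends with one of the archive extensions."""
    # str.endswith accepts a tuple and matches any of its suffixes.
    return filename.lower().endswith(ARCHIVE_EXTENSIONS)
```

Matching on the full suffix rather than on the text after the last dot keeps compound extensions such as `.tar.gz` intact and rejects signature files such as `hello-2.10.tar.gz.sig`.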
Archit Agrawal
709ba8a6e5 swh.lister.gnu: Add functionality to list all the tarballs for a package.
As discussed in T1389, to ingest all packages using the base loader, it needs
a list of all the tarballs for a package.
Hence modified the lister to recursively list all the tarballs for a
package with their last updated time.
2019-06-08 21:56:00 +05:30
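The recursive listing can be sketched as a generator over a nested tree. The tree layout (dicts with `type`, `name`, `contents` and `time` keys), the accepted extensions and the function name `list_tarballs` are assumptions made for this example.

```python
from typing import Any, Dict, Iterator

TARBALL_EXTENSIONS = (".tar.gz", ".tar.bz2", ".tar.xz")  # illustrative


def list_tarballs(entry: Dict[str, Any], prefix: str = "") -> Iterator[Dict[str, Any]]:
    """Yield every tarball below an entry with its last updated time."""
    path = f"{prefix}/{entry['name']}" if prefix else entry["name"]
    if entry.get("type") == "directory":
        # Recurse into sub-directories; a missing "contents" means no children.
        for child in entry.get("contents", []):
            yield from list_tarballs(child, path)
    elif entry["name"].endswith(TARBALL_EXTENSIONS):
        yield {"archive": path, "time": entry.get("time")}


# Hypothetical package tree.
tree = {
    "type": "directory",
    "name": "hello",
    "contents": [
        {"type": "file", "name": "hello-1.0.tar.gz", "time": "100"},
        {"type": "file", "name": "README", "time": "50"},
        {
            "type": "directory",
            "name": "old",
            "contents": [{"type": "file", "name": "hello-0.9.tar.gz", "time": "10"}],
        },
    ],
}
tarballs = list(list_tarballs(tree))
```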
Archit Agrawal
ebdb959823 swh.lister.gnu: Change download method of tree.json file to requests
Previously, the gnu lister used the same code as the tarball loader
to download, unzip and read the tree.json file.
To make the code more concise, the download now uses the
requests library.
2019-06-08 21:56:00 +05:30
Archit Agrawal
151f6cd223 swh.lister.gnu
Implement first pass of gnu lister to list all the
packages present in https://ftp.gnu.org/
Add GNU lister in README and cli.py

Closes T1722
2019-06-08 21:56:00 +05:30