48 commits

Author SHA1 Message Date
David Douard
cccb8c21ff Replace all remaining occurrences of the 'local' cls by 'postgresql'
The former has been deprecated for ages...
2024-10-28 14:35:29 +01:00
Antoine Lambert
4aee4da784 cran: Use pyreadr instead of rpy2 to read a RDS file from Python
The CRAN lister improvements introduced in 91e4e33 originally used pyreadr
to read a RDS file from Python instead of rpy2.

As swh-lister was still packaged for debian at the time, the choice of using
rpy2 instead was made as a debian package is available for it while it is not
for pyreadr.

Now that Debian packaging has been dropped for swh-lister, we can reinstate the
pyreadr-based implementation, which has the advantages of being faster and not
depending on the R language runtime.

Related to swh/meta#1709.
2023-11-14 17:09:42 +01:00
Antoine Lambert
6e7bc49ec7 Harmonize listers parameters and add test to check mandatory ones
Ensure that all lister classes have the same set of mandatory parameters
in their constructors, notably: scheduler, url, instance and credentials.

Add a new test checking listers classes have mandatory parameters declared
in their constructors. The purpose is to avoid deployment issues on staging
or production environment as celery tasks can fail to be executed if mandatory
parameters are not handled by listers.

Related to swh/infra/sysadm-environment#5030.
2023-09-06 11:55:34 +02:00
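A check of this kind can be sketched with `inspect.signature`. This is only an illustration: the two lister classes below are hypothetical stand-ins, not actual swh-lister classes, and the set of mandatory names is simply the one listed in the commit message above.

```python
import inspect

# Parameter names taken from the commit message; treated as mandatory here.
MANDATORY_PARAMS = {"scheduler", "url", "instance", "credentials"}


def missing_mandatory_params(lister_cls):
    """Return the mandatory parameter names absent from the constructor."""
    params = inspect.signature(lister_cls.__init__).parameters
    return MANDATORY_PARAMS - set(params)


# Hypothetical lister classes, for illustration only.
class GoodLister:
    def __init__(self, scheduler, url=None, instance=None, credentials=None):
        self.scheduler = scheduler


class BadLister:
    def __init__(self, scheduler, url=None):
        self.scheduler = scheduler
```

A test could then loop over all registered lister classes and assert that `missing_mandatory_params` returns an empty set for each of them.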
Antoine Lambert
91e4e33dd5 cran: Improve listing of R packages
Previously, the lister relied on the CRANtools R module, which has the
drawback of only listing the latest version of each package registered
in the CRAN registry.

In order to get all possible versions for each CRAN package, prefer to exploit
the content of the weekly dump of the CRAN database in RDS format.

To read the content of the RDS file from Python, the rpy2 package is used as
it has the advantage of being packaged in debian.

Related to swh/meta#1709.
2023-08-21 16:38:08 +02:00
Nicolas Dandrimont
e785e67315 Hook up recently introduced options to all listers
Hopefully one day we'll be able to replace all of this mess with PEP692
TypedDict kwargs, but that's only on track for Python 3.12.
2022-12-05 16:33:45 +01:00
Antoine Lambert
fa1205c4df Send package artifact checksums to loaders when info is available
In listers collecting artifacts for each package to load, add artifacts
checksums, when that info is available, in parameters sent to loaders
in order to check downloaded artifact integrity.
2022-09-30 18:44:11 +02:00
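On the receiving side, an integrity check based on such checksums might look like the sketch below. The shape of the `checksums` mapping (hashlib algorithm name to hex digest) is an assumption made for this example and is not taken from the loader code.

```python
import hashlib


def artifact_matches_checksums(data: bytes, checksums: dict) -> bool:
    """Check downloaded bytes against every provided checksum.

    `checksums` maps a hashlib algorithm name (e.g. "sha256") to the
    expected hex digest; this shape is an assumption for the sketch.
    An empty mapping means there is nothing to verify, so it passes.
    """
    for algo, expected in checksums.items():
        if hashlib.new(algo, data).hexdigest() != expected.lower():
            return False
    return True
```

Comparing against every provided digest, rather than only the first, keeps the check strict when a lister supplies several algorithms for the same artifact.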
Antoine Lambert
d38e05cff7 python: Reformat code with black 22.3.0
Related to T3922
2022-04-08 15:15:09 +02:00
Valentin Lorentz
6243f800b4 cran: Pass the package name to the loader
It will be used to create a synthetic release message that contains
the package's name, like the Debian loader does.
2021-11-09 15:04:01 +01:00
Antoine Lambert
20232cc36e cran: Fix ListedOrigin visit type
CRAN origins must be loaded with the cran visit type and not the tar one.

Related to T3675
2021-10-22 14:42:32 +02:00
Antoine Lambert
1803b707e4 cran: Prevent multiple listing of an origin
A CRAN package can appear twice in the JSON list returned by the
list_all_packages.R script, most recent version of the package
appearing first.

So handle that edge case to avoid error when sending origins to
the scheduler.
2021-02-05 14:34:37 +01:00
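Since the most recent version appears first, keeping only the first entry seen for each package name is enough. A minimal sketch, assuming each entry is a dict with a "Package" key (the key names are hypothetical):

```python
def dedupe_packages(packages):
    """Keep only the first entry seen for each package name."""
    seen = set()
    unique = []
    for package in packages:
        name = package["Package"]  # hypothetical key name
        if name in seen:
            continue  # a later duplicate is an older version, skip it
        seen.add(name)
        unique.append(package)
    return unique
```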
Antoine Lambert
b4c4c20bb9 cran: Add support for parsing date with milliseconds 2021-02-05 14:32:49 +01:00
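One way to accept dates both with and without a fractional-seconds part is to try two `strptime` formats in turn. The exact input format and the choice of UTC are assumptions made for this sketch; the function name is hypothetical.

```python
from datetime import datetime, timezone


def parse_packaged_date(date_str: str) -> datetime:
    """Parse a timestamp with or without fractional seconds, as UTC."""
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"):
        try:
            return datetime.strptime(date_str, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # try the next format
    raise ValueError(f"unsupported date format: {date_str!r}")
```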
Antoine Lambert
4245c5046f Remove no longer used models field in dict returned by register 2021-02-02 16:33:52 +01:00
Antoine R. Dumont (@ardumont)
ae17b6b9a0
Make stateless lister constructors compatible with credentials
In effect, it just allows adding credentials to the cgit, cran and pypi listers.

This fixes instances of error [1]

[1] https://sentry.softwareheritage.org/share/issue/2c35a9f129cf4982a2dd003a232d507a/

Related to T2998
2021-01-28 14:42:56 +01:00
tenma
6cd31769c1 tests: Remove no longer used conftest files
All the fixtures declared in them are not used anymore in the
tests of the listers ported to the new Lister API.
2021-01-26 17:09:04 +01:00
Antoine Lambert
22eeb0956e cran: Retrieve last update date for each listed package
R package last update date can be found in the "Packaged" field of
package info returned by tools::CRAN_package_db().

So retrieve it and parse it as a datetime to provide as last_update
parameter value in ListedOrigin model.

Closes T2989
2021-01-26 15:14:32 +01:00
Antoine Lambert
6f40ab4c57 cran: Reimplement lister using new Lister API
Related to T2989
2021-01-26 15:14:32 +01:00
Antoine R. Dumont (@ardumont)
47ccce3817
lister.cran.tests: Clarify lister configuration 2020-10-30 13:30:14 +01:00
Antoine Lambert
22f7181294 python: Reorder imports with isort
Related to T2610
2020-09-17 17:48:27 +02:00
Antoine R. Dumont (@ardumont)
5a5b7ef70b
tests: Separate lister instantiations
Prior to this commit, all listers were instantiated at the same time even if
only one was needed. This commit separates those instantiations.

The only drawback to this is the db model initialization which now happens at
each lister instantiation. This can be dealt with if needed at another time
though.
2020-09-02 12:49:00 +02:00
Antoine R. Dumont (@ardumont)
9437a643ad
pytest: Define plugin and declare it in the root conftest
Then drop all unneeded and indirect imports
2020-09-02 12:25:15 +02:00
Nicolas Dandrimont
c9963d4302 Use the new names for the swh.scheduler test fixtures 2020-07-09 17:06:50 +02:00
David Douard
93a4d8b784 Enable black
- blackify all the python files,
- enable black in pre-commit,
- add a black tox environment.
2020-04-08 16:31:22 +02:00
Antoine Lambert
99fcd2b3f5 docs: Fix sphinx warnings
Related to T2188
2020-01-17 16:15:11 +01:00
Antoine R. Dumont (@ardumont)
4761773631
cran.lister: Use cran's canonical url for origin url
Prior to this commit, we sent the origin url as a versioned artifact.
Now we send the origin url as a CRAN's canonical one, and the associated list
of artifacts found there (only 1 today).
2020-01-16 13:48:56 +01:00
Antoine R. Dumont (@ardumont)
767c4c6dc7
cran.lister: Version uid so we can list new package versions 2020-01-15 18:22:01 +01:00
Antoine R. Dumont (@ardumont)
0560b813b2
cran.lister: Adapt docstring sample accordingly 2020-01-09 10:54:49 +01:00
Antoine R. Dumont (@ardumont)
e1069f0c59
cran.lister: Align loading tasks' with loader's expectation 2020-01-09 10:18:51 +01:00
Antoine R. Dumont (@ardumont)
3f3f714c62
cran.lister: Move helper function to the bottom of the file 2020-01-06 16:55:23 +01:00
Antoine R. Dumont (@ardumont)
4a9608f31c
lister/tasks: Standardize return statements
The following commit adapts the return statements from both lister and their
associated tasks. This standardizes on what other modules (e.g. both dvcs and
package loaders) do.
2019-12-02 15:49:38 +01:00
Antoine R. Dumont (@ardumont)
cb853f4898
lister.tests: Avoid duplication setup step
And remove unnecessary fixture redefinition which causes indirection.
2019-11-21 14:24:01 +01:00
David Douard
3ddfd00e90 Fix typos (and trailing ws) reported by codespell 2019-11-21 14:11:18 +01:00
Antoine R. Dumont (@ardumont)
1757b0112b
cran/gnu: Rename task_type to load-archive-files 2019-11-21 13:44:20 +01:00
Antoine R. Dumont (@ardumont)
1cf7c8e86b
lister.tests: Add missing task_type for package listers
The scheduler module no longer initializes those task types itself.
2019-11-21 13:44:20 +01:00
Stefano Zacchiroli
44bc4462e7 CRAN lister: fix compute_package_url interpolation 2019-10-28 15:50:23 +01:00
Stefano Zacchiroli
6159faa2f5 mypy: add typing annotations for novel lister abstractions 2019-10-28 15:35:21 +01:00
Stefano Zacchiroli
7dfd811e16 CRAN lister: make shelling out decoding compatible with Python 3.5 2019-10-28 15:35:21 +01:00
Stefano Zacchiroli
974f80f966 typing: minimal changes to make a no-op mypy run pass 2019-10-28 15:35:21 +01:00
Nicolas Dandrimont
78105940ff Stop binding tasks to a specific instance of the celery app
The celery.shared_task decorator allows late-binding of tasks to any celery app,
which is well suited for our "task plugin" architecture.
2019-10-18 18:02:25 +02:00
Antoine R. Dumont (@ardumont)
8d50e0d941
cran.lister: Fix cran lister and add proper integration test
Which checks the cran lister tasks written in the scheduler.

Related d30d574dbe
Related 5ea9d5ed39

Related T2032
2019-10-11 13:19:22 +02:00
Antoine R. Dumont (@ardumont)
04ca318680
simple_lister: Extract common behavior in base class 2019-10-09 17:35:12 +02:00
Antoine R. Dumont (@ardumont)
d30d574dbe
cran.lister: Refactor and fix cran lister
Prior to this commit, the code was actually duplicated with an old version
which would not work.

Related D1492#41287
2019-10-02 11:06:59 +02:00
David Douard
8d9deeb8f8 plugins: add support for scheduler's task-type declaration
Add a new register-task-types cli that will create missing task-type entries in the
scheduler according to:

- only create missing task-types (do not update them), but check that the
  backend_name field is consistent,
- each SWHTask-based task declared in a module listed in the 'task_modules'
  plugin registry field will be checked and added if needed; tasks whose name
  starts with an underscore will not be added,
- added task-type will have:
  - the 'type' field is derived from the task's function name (with underscores
    replaced with dashes),
  - the description field is the first line of that function's docstring,
  - default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with
    a simple pattern matching to have decent default values for full/incremental
    tasks),
  - these default values can be overloaded via the 'task_type' plugin registry
    entry.

For this, we had to rename all tasks names (eg. `cran_lister` -> `list_cran`).

Comes with some tests.
2019-09-04 15:36:08 +02:00
David Douard
e3c0ea9d90 implement listers as plugins
Listers are declared as plugins via the `swh.workers` entry_point.

As such, the registry function is expected to return a dict with the
`task_modules` field (as for generic worker plugins), plus:

- `lister`: the lister class,
- `models`: list of SQLAlchemy models used by this lister,
- `init` (optional): hook (callable) used to initialize the lister's state
  (typically, create/initialize the database for this lister).
  If not set, the default implementation creates database tables (after
  optionally having deleted existing ones) according to models declared in
  the `models` register field.

There is no need to explicitly add lister task modules in the main
`conftest` module, but any new/extra lister to be tested must be registered
(the tested lister module must be properly installed in the test environment).

Also refactor a bit the cli tools:
- add support for the standard --config-file option at the 'lister' group
  level,
- move the --db-url to the 'lister' group,
- drop the --lister option for the `swh lister db-init` cli tool:
  initializing (especially with --drop-tables) the database for a single
  lister is unreliable, since all tables are created using a single MetaData
  (in the same namespace).
2019-09-03 15:02:24 +02:00
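A registry function following the description above might look like the sketch below. The class names and the module path are hypothetical placeholders, and `validate_registry` is an illustrative helper, not part of swh-lister.

```python
class CRANLister:
    """Hypothetical stand-in for a lister class."""


class CRANModel:
    """Hypothetical stand-in for a SQLAlchemy model."""


def register():
    """Registry function a lister plugin would expose via its entry point."""
    return {
        "task_modules": ["example.lister.cran.tasks"],  # hypothetical module path
        "lister": CRANLister,
        "models": [CRANModel],
        # "init" is optional; when absent the default initialization applies.
    }


def validate_registry(entry):
    """Return the required fields missing from a registry dict, sorted."""
    required = {"task_modules", "lister", "models"}
    return sorted(required - set(entry))
```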
Antoine R. Dumont (@ardumont)
87d2a16df0
listers: Allow to override policy and priority for scheduled tasks
Prior to this commit, the policy and priority were hard-coded.
The default values are now the old hard-coded values.

This will allow developing a cli to trigger forge listings with a oneshot
policy and prioritized tasks, thus ingesting those faster and without the
manual intervention we currently need.
2019-08-28 11:57:10 +02:00
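The pattern of overridable parameters falling back to former hard-coded values can be sketched as below. The default values and the shape of the task dict are placeholders chosen for this example; the commit does not state the actual ones.

```python
# Placeholder defaults: the commit does not state the old hard-coded values.
DEFAULT_POLICY = "recurring"
DEFAULT_PRIORITY = None


def make_task(task_type, url, policy=None, priority=None):
    """Build a task description, falling back to defaults when not overridden."""
    return {
        "type": task_type,
        "arguments": {"url": url},  # hypothetical task layout
        "policy": policy or DEFAULT_POLICY,
        "priority": priority or DEFAULT_PRIORITY,
    }
```

Callers that pass nothing keep the previous behaviour, while a cli can pass `policy="oneshot"` and a priority to schedule a one-off, prioritized listing.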
Archit Agrawal
5ea9d5ed39 swh.lister.cran: Add description in task_dict
Add description in task_dict method because
the only metadata that can be found for a
package at CRAN is its description. That can
only be retrieved from the built-in API in R,
which the lister is already using. Hence, instead
of getting metadata in the loader, it is passed
by the lister.
2019-06-27 14:57:51 +05:30
Valentin Lorentz
52b1de87c5 Finish dropping the 'description' column.
I missed some in aef7d5952e.
2019-06-26 14:46:27 +02:00
Valentin Lorentz
aef7d5952e Remove columns 'description' and 'origin_id'.
They are useless.
2019-06-19 10:29:15 +02:00
Archit Agrawal
a9a37a85bf swh.lister.cran
Add a lister to list all the CRAN packages.
It uses the built-in API of the R language to list the packages
and get their metadata.

Closes T1709
2019-06-11 21:26:31 +05:30