swh-lister

Author	SHA1	Message	Date
David Douard	cccb8c21ff	Replace all remaining occurrences of the 'local' cls by 'postgresql' The former has been deprecated for ages...	2024-10-28 14:35:29 +01:00
Antoine Lambert	4aee4da784	cran: Use pyreadr instead of rpy2 to read an RDS file from Python The CRAN lister improvements introduced in `91e4e33` originally used pyreadr to read an RDS file from Python instead of rpy2. As swh-lister was still packaged for Debian at the time, the choice of using rpy2 instead was made as a Debian package is available for it while it is not for pyreadr. Now that Debian packaging has been dropped for swh-lister we can reinstate the pyreadr based implementation which has the advantages of being faster and not depending on the R language runtime. Related to swh/meta#1709.	2023-11-14 17:09:42 +01:00
Antoine Lambert	6e7bc49ec7	Harmonize listers parameters and add test to check mandatory ones Ensure that all lister classes have the same set of mandatory parameters in their constructors, notably: scheduler, url, instance and credentials. Add a new test checking listers classes have mandatory parameters declared in their constructors. The purpose is to avoid deployment issues on staging or production environment as celery tasks can fail to be executed if mandatory parameters are not handled by listers. Related to swh/infra/sysadm-environment#5030.	2023-09-06 11:55:34 +02:00
Antoine Lambert	91e4e33dd5	cran: Improve listing of R packages Previously, the lister was relying on the use of the CRANtools R module but it has the drawback to only list the latest version of each registered package in the CRAN registry. In order to get all possible versions for each CRAN package, prefer to exploit the content of the weekly dump of the CRAN database in RDS format. To read the content of the RDS file from Python, the rpy2 package is used as it has the advantage to be packaged in debian. Related to swh/meta#1709.	2023-08-21 16:38:08 +02:00
Nicolas Dandrimont	e785e67315	Hook up recently introduced options to all listers Hopefully one day we'll be able to replace all of this mess with PEP692 TypedDict kwargs, but that's only on track for Python 3.12.	2022-12-05 16:33:45 +01:00
Antoine Lambert	fa1205c4df	Send package artifact checksums to loaders when info is available In listers collecting artifacts for each package to load, add artifacts checksums, when that info is available, in parameters sent to loaders in order to check downloaded artifact integrity.	2022-09-30 18:44:11 +02:00
Antoine Lambert	d38e05cff7	python: Reformat code with black 22.3.0 Related to T3922	2022-04-08 15:15:09 +02:00
Valentin Lorentz	6243f800b4	cran: Pass the package name to the loader It will be used to create a synthetic release message that contains the package's name, like the Debian loader does.	2021-11-09 15:04:01 +01:00
Antoine Lambert	20232cc36e	cran: Fix ListedOrigin visit type CRAN origins must be loaded with the cran visit type and not the tar one. Related to T3675	2021-10-22 14:42:32 +02:00
Antoine Lambert	1803b707e4	cran: Prevent multiple listing of an origin A CRAN package can appear twice in the JSON list returned by the list_all_packages.R script, most recent version of the package appearing first. So handle that edge case to avoid error when sending origins to the scheduler.	2021-02-05 14:34:37 +01:00
Antoine Lambert	b4c4c20bb9	cran: Add support for parsing date with milliseconds	2021-02-05 14:32:49 +01:00
Antoine Lambert	4245c5046f	Remove no longer used models field in dict returned by register	2021-02-02 16:33:52 +01:00
Antoine R. Dumont (@ardumont)	ae17b6b9a0	Make stateless lister constructors compatible with credentials In effect, it just allows to add credentials to cgit, cran and pypi listers. This fixes instances of error [1] [1] https://sentry.softwareheritage.org/share/issue/2c35a9f129cf4982a2dd003a232d507a/ Related to T2998	2021-01-28 14:42:56 +01:00
tenma	6cd31769c1	tests: Remove no longer used conftest files All the fixtures declared in them are not used anymore in the tests of the listers ported to the new Lister API.	2021-01-26 17:09:04 +01:00
Antoine Lambert	22eeb0956e	cran: Retrieve last update date for each listed package R package last update date can be found in the "Packaged" field of package info returned by tools::CRAN_package_db(). So retrieve it and parse it as a datetime to provide as last_update parameter value in ListedOrigin model. Closes T2989	2021-01-26 15:14:32 +01:00
Antoine Lambert	6f40ab4c57	cran: Reimplement lister using new Lister API Related to T2989	2021-01-26 15:14:32 +01:00
Antoine R. Dumont (@ardumont)	47ccce3817	lister.cran.tests: Clarify lister configuration	2020-10-30 13:30:14 +01:00
Antoine Lambert	22f7181294	python: Reorder imports with isort Related to T2610	2020-09-17 17:48:27 +02:00
Antoine R. Dumont (@ardumont)	5a5b7ef70b	tests: Separate lister instantiations Prior to this commit, all listers were instantiated at the same time even if only one was needed. This commit separates those instantiations. The only drawback to this is the db model initialization which now happens at each lister instantiation. This can be dealt with if needed at another time though.	2020-09-02 12:49:00 +02:00
Antoine R. Dumont (@ardumont)	9437a643ad	pytest: Define plugin and declare it in the root conftest Then drop all unneeded and indirect imports	2020-09-02 12:25:15 +02:00
Nicolas Dandrimont	c9963d4302	Use the new names for the swh.scheduler test fixtures	2020-07-09 17:06:50 +02:00
David Douard	93a4d8b784	Enable black - blackify all the python files, - enable black in pre-commit, - add a black tox environment.	2020-04-08 16:31:22 +02:00
Antoine Lambert	99fcd2b3f5	docs: Fix sphinx warnings Related to T2188	2020-01-17 16:15:11 +01:00
Antoine R. Dumont (@ardumont)	4761773631	cran.lister: Use cran's canonical url for origin url Prior to this commit, we sent the origin url as a versioned artifact. Now we send the origin url as a CRAN's canonical one, and the associated list of artifacts found there (only 1 today).	2020-01-16 13:48:56 +01:00
Antoine R. Dumont (@ardumont)	767c4c6dc7	cran.lister: Version uid so we can list new package versions	2020-01-15 18:22:01 +01:00
Antoine R. Dumont (@ardumont)	0560b813b2	cran.lister: Adapt docstring sample accordingly	2020-01-09 10:54:49 +01:00
Antoine R. Dumont (@ardumont)	e1069f0c59	cran.lister: Align loading tasks' with loader's expectation	2020-01-09 10:18:51 +01:00
Antoine R. Dumont (@ardumont)	3f3f714c62	cran.lister: Move helper function to the bottom of the file	2020-01-06 16:55:23 +01:00
Antoine R. Dumont (@ardumont)	4a9608f31c	lister/tasks: Standardize return statements The following commit adapts the return statements from both lister and their associated tasks. This standardizes on what other modules (e.g. both dvcs and package loaders) do.	2019-12-02 15:49:38 +01:00
Antoine R. Dumont (@ardumont)	cb853f4898	lister.tests: Avoid duplication setup step And remove unnecessary fixture redefinition which causes indirection.	2019-11-21 14:24:01 +01:00
David Douard	3ddfd00e90	Fix typos (and trailing ws) reported by codespell	2019-11-21 14:11:18 +01:00
Antoine R. Dumont (@ardumont)	1757b0112b	cran/gnu: Rename task_type to load-archive-files	2019-11-21 13:44:20 +01:00
Antoine R. Dumont (@ardumont)	1cf7c8e86b	lister.tests: Add missing task_type for package listers The scheduler module no longer initializes itself those task_type.	2019-11-21 13:44:20 +01:00
Stefano Zacchiroli	44bc4462e7	CRAN lister: fix compute_package_url interpolation	2019-10-28 15:50:23 +01:00
Stefano Zacchiroli	6159faa2f5	mypy: add typing annotations for novel lister abstractions	2019-10-28 15:35:21 +01:00
Stefano Zacchiroli	7dfd811e16	CRAN lister: make shelling out decoding compatible with Python 3.5	2019-10-28 15:35:21 +01:00
Stefano Zacchiroli	974f80f966	typing: minimal changes to make a no-op mypy run pass	2019-10-28 15:35:21 +01:00
Nicolas Dandrimont	78105940ff	Stop binding tasks to a specific instance of the celery app The celery.shared_task decorator allows late-binding of tasks to any celery app, which is well suited for our "task plugin" architecture.	2019-10-18 18:02:25 +02:00
Antoine R. Dumont (@ardumont)	8d50e0d941	cran.lister: Fix cran lister and add proper integration test Which checks the cran lister tasks written in the scheduler. Related `d30d574dbe` Related `5ea9d5ed39` Related T2032	2019-10-11 13:19:22 +02:00
Antoine R. Dumont (@ardumont)	04ca318680	simple_lister: Extract common behavior in base class	2019-10-09 17:35:12 +02:00
Antoine R. Dumont (@ardumont)	d30d574dbe	cran.lister: Refactor and fix cran lister Prior to this commit, the code was actually duplicated with an old version which would not work. Related D1492#41287	2019-10-02 11:06:59 +02:00
David Douard	8d9deeb8f8	plugins: add support for scheduler's task-type declaration Add a new register-task-types cli that will create missing task-type entries in the scheduler according to: - only create missing task-types (do not update them), but check that the backend_name field is consistent, - each SWHTask-based task declared in a module listed in the 'task_modules' plugin registry field will be checked and added if needed; tasks whose name starts with an underscore will not be added, - added task-type will have: - the 'type' field is derived from the task's function name (with underscores replaced with dashes), - the description field is the first line of that function's docstring, - default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with a simple pattern matching to have decent default values for full/incremental tasks), - these default values can be overloaded via the 'task_type' plugin registry entry. For this, we had to rename all task names (eg. `cran_lister` -> `list_cran`). Comes with some tests.	2019-09-04 15:36:08 +02:00
David Douard	e3c0ea9d90	implement listers as plugins Listers are declared as plugins via the `swh.workers` entry_point. As such, the registry function is expected to return a dict with the `task_modules` field (as for generic worker plugins), plus: - `lister`: the lister class, - `models`: list of SQLAlchemy models used by this lister, - `init` (optional): hook (callable) used to initialize the lister's state (typically, create/initialize the database for this lister). If not set, the default implementation creates database tables (after optionally having deleted existing ones) according to models declared in the `models` register field. There is no need to explicitly add lister task modules in the main `conftest` module, but any new/extra lister to be tested must be registered (the tested lister module must be properly installed in the test environment). Also refactor a bit the cli tools: - add support for the standard --config-file option at the 'lister' group level, - move the --db-url to the 'lister' group, - drop the --lister option for the `swh lister db-init` cli tool: initializing (especially with --drop-tables) the database for a single lister is unreliable, since all tables are created using a single MetaData (in the same namespace).	2019-09-03 15:02:24 +02:00
Antoine R. Dumont (@ardumont)	87d2a16df0	listers: Allow to override policy and priority for scheduled tasks Prior to this commit, the policy and priority were hard-coded. The default values are now the old hard-coded values. This will allow to develop a cli to trigger forges listing with oneshot policy and some priority tasks. Thus ingesting those faster and without manual intervention as we currently do.	2019-08-28 11:57:10 +02:00
Archit Agrawal	5ea9d5ed39	swh.lister.cran: Add description in task_dict Add description in task_dict method because the only metadata that can be found for a package at CRAN is its description. That can only be achieved from the built-in API in R, which the lister is already using. Hence instead of getting metadata in loader, it is passed by lister.	2019-06-27 14:57:51 +05:30
Valentin Lorentz	52b1de87c5	Finish dropping the 'description' column. I missed some in `aef7d5952e`.	2019-06-26 14:46:27 +02:00
Valentin Lorentz	aef7d5952e	Remove columns 'description' and 'origin_id'. They are useless.	2019-06-19 10:29:15 +02:00
Archit Agrawal	a9a37a85bf	swh.lister.cran Add a lister to list all the CRAN packages. It uses the built-in API of the R language to list the packages and get their metadata. Closes T1709	2019-06-11 21:26:31 +05:30

48 commits