swh-lister

Author	SHA1	Message	Date
Valentin Lorentz	a681f2f405	packagist: Canonicalize github origins In particular, there seems to be a negligible number of origins using SSH instead of HTTPS, which the git loader cannot deal with.	2022-10-13 17:14:58 +02:00
Valentin Lorentz	f5c5599f2e	packagist: Actually test listed origins Tests implemented roughly the same algorithm as the lister, and compared both values...	2022-10-13 11:53:28 +02:00
Antoine Lambert	db6ce12e9e	Refactor and deduplicate HTTP requests code in listers Numerous listers were using the same page_request method or equivalent in their implementation, so deduplicate that code by adding an http_request method in the base lister class: swh.lister.pattern.Lister. That method simply wraps a call to requests.Session.request and logs some useful info for debugging and error reporting; an HTTPError is also raised if a request ends in an error. All listers using that new method now benefit from request retries when an HTTP error occurs, thanks to the use of the http_retry decorator.	2022-09-26 10:48:40 +02:00
Antoine Lambert	d38e05cff7	python: Reformat code with black 22.3.0 Related to T3922	2022-04-08 15:15:09 +02:00
Antoine Lambert	ff05191b7d	packagist: Reimplement lister using new Lister API The previous implementation was generating tasks for an unimplemented Packagist loader. The new implementation extracts the source repository URL, VCS type and last update date for each package referenced by Packagist and sends that information to the scheduler. Package metadata is retrieved using Packagist API endpoints whose responses are served from static files, which are guaranteed to be efficient on the Packagist side (no dynamic queries). Furthermore, subsequent listings will send the "If-Modified-Since" HTTP header to retrieve only package metadata updated since the previous listing operation, in order to save bandwidth and return only origins which might have newly released versions. Closes T2991	2021-02-02 14:48:47 +01:00
Antoine Lambert	22f7181294	python: Reorder imports with isort Related to T2610	2020-09-17 17:48:27 +02:00
Antoine R. Dumont (@ardumont)	5a5b7ef70b	tests: Separate lister instantiations Prior to this commit, all listers were instantiated at the same time even if only one was needed. This commit separates those instantiations. The only drawback to this is the db model initialization which now happens at each lister instantiation. This can be dealt with if needed at another time though.	2020-09-02 12:49:00 +02:00
Antoine R. Dumont (@ardumont)	9437a643ad	pytest: Define plugin and declare it in the root conftest Then drop all unneeded and indirect imports	2020-09-02 12:25:15 +02:00
Nicolas Dandrimont	c9963d4302	Use the new names for the swh.scheduler test fixtures	2020-07-09 17:06:50 +02:00
David Douard	93a4d8b784	Enable black - blackify all the python files, - enable black in pre-commit, - add a black tox environment.	2020-04-08 16:31:22 +02:00
Antoine Lambert	99fcd2b3f5	docs: Fix sphinx warnings Related to T2188	2020-01-17 16:15:11 +01:00
Antoine R. Dumont (@ardumont)	5ab9d67d67	core: Align listers' task output (hg/git tasks) with expected format Related to T2134 Related to D2409 Related to D2410	2019-12-09 15:12:17 +01:00
Nicolas Dandrimont	78105940ff	Stop binding tasks to a specific instance of the celery app The celery.shared_task decorator allows late-binding of tasks to any celery app, which is well suited for our "task plugin" architecture.	2019-10-18 18:02:25 +02:00
Antoine R. Dumont (@ardumont)	a8cde12d72	tests: Update pytest_plugin according to latest version change	2019-10-14 18:20:15 +02:00
Antoine R. Dumont (@ardumont)	f3bf9ae50f	packagist.lister: Add integration test which checks scheduled tasks Related T2032	2019-10-11 14:52:56 +02:00
Antoine R. Dumont (@ardumont)	04ca318680	simple_lister: Extract common behavior in base class	2019-10-09 17:35:12 +02:00
Antoine Lambert	7c8f4dc9a8	packagist/lister: Fix typos in docstring	2019-09-12 20:46:42 +02:00
David Douard	8d9deeb8f8	plugins: add support for scheduler's task-type declaration Add a new register-task-types cli that will create missing task-type entries in the scheduler according to: - only create missing task-types (do not update them), but check that the backend_name field is consistent, - each SWHTask-based task declared in a module listed in the 'task_modules' plugin registry field will be checked and added if needed; tasks whose names start with an underscore will not be added, - an added task-type will have: - the 'type' field derived from the task's function name (with underscores replaced with dashes), - the description field set to the first line of that function's docstring, - default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with a simple pattern matching to have decent default values for full/incremental tasks), - these default values can be overridden via the 'task_type' plugin registry entry. For this, we had to rename all task names (eg. `cran_lister` -> `list_cran`). Comes with some tests.	2019-09-04 15:36:08 +02:00
David Douard	e3c0ea9d90	implement listers as plugins Listers are declared as plugins via the `swh.workers` entry_point. As such, the registry function is expected to return a dict with the `task_modules` field (as for generic worker plugins), plus: - `lister`: the lister class, - `models`: list of SQLAlchemy models used by this lister, - `init` (optional): hook (callable) used to initialize the lister's state (typically, create/initialize the database for this lister). If not set, the default implementation creates database tables (after optionally having deleted existing ones) according to models declared in the `models` register field. There is no need to explicitly add lister task modules in the main `conftest` module, but any new/extra lister to be tested must be registered (the tested lister module must be properly installed in the test environment). Also refactor a bit the cli tools: - add support for the standard --config-file option at the 'lister' group level, - move the --db-url to the 'lister' group, - drop the --lister option for the `swh lister db-init` cli tool: initializing (especially with --drop-tables) the database for a single lister is unreliable, since all tables are created using a single MetaData (in the same namespace).	2019-09-03 15:02:24 +02:00
Antoine R. Dumont (@ardumont)	87d2a16df0	listers: Allow overriding policy and priority for scheduled tasks Prior to this commit, the policy and priority were hard-coded. The default values are now the old hard-coded values. This will make it possible to develop a cli to trigger forge listings with a oneshot policy and some priority tasks, thus ingesting those faster and without the manual intervention we currently need.	2019-08-28 11:57:10 +02:00
Archit Agrawal	5727f15cf3	swh.lister.packagist Implement a packagist lister to list the names and metadata url of all the packages. Closes 1776	2019-07-19 19:59:30 +05:30

21 commits