swh-lister

Author	SHA1	Message	Date
Antoine Lambert	e904f4760e	gitlab: Handle HTTP status code 500 when listing projects GitLab API can return errors 500 when listing projects (see https://gitlab.com/gitlab-org/gitlab/-/issues/262629). To avoid ending the listing prematurely, skip buggy URLs and move to next pages. Related to T3442	2021-07-23 15:07:16 +02:00
Antoine Lambert	52c3150155	gitlab: Update requests query parameters Increase number of origins per page to the maximum value allowed by GitLab API (100) to send fewer requests. Ask for simple responses to reduce size of JSON data.	2021-07-23 14:05:38 +02:00
Antoine Lambert	73f85c0b8a	gitlab: Adapt requests retry policy to consider HTTP 50x status codes Temporary server failures can happen when listing a GitLab instance; HTTP status codes 502, 503 or 520 are returned in that case. So adapt lister requests retry policy to execute requests again when such errors are encountered. Related to T3442	2021-07-23 13:51:17 +02:00
Antoine Lambert	6c12350863	pattern: Use URL network location as instance name when not provided Make the instance parameter of the base pattern lister optional and set lister name to URL network location when not provided. It simplifies lister creation when the associated forge type has a lot of instances in the wild (e.g. gitlab or cgit) while giving more details about the listed forge instance. Also process listers for forges with multiple instances (cgit, gitea, gitlab, phabricator and tuleap) to ensure URL network location will be used when instance parameter is not provided. Related to T3403	2021-07-13 12:33:49 +02:00
Antoine R. Dumont (@ardumont)	72be074a79	gitlab: Deal with missing or trailing / in url input	2021-01-28 10:46:58 +01:00
Antoine R. Dumont (@ardumont)	e09ad272d7	tests: Drop unneeded reset instruction Plus that instruction is not correct in most recent requests_mock version (failing the debian build)	2021-01-27 15:42:57 +01:00
Antoine R. Dumont (@ardumont)	97254a19f2	gitlab: Implement keyset-based pagination listing The previous pagination implementation has a hard-coded limit server side [1] [1] ``` {"error":"Offset pagination has a maximum allowed offset of 50000 for requests that return objects of type Project. Remaining records can be retrieved using keyset pagination."} ``` Related to T2994	2021-01-26 16:54:14 +01:00
Antoine R. Dumont (@ardumont)	aefb260f76	gitlab: Add support for last_update information during listing	2021-01-26 14:03:20 +01:00
Antoine R. Dumont (@ardumont)	1a19b2c747	gitlab: Support authentication Related to T2987	2021-01-26 14:03:20 +01:00
Antoine R. Dumont (@ardumont)	bea9d6d147	gitlab: make url mandatory and add type	2021-01-25 19:00:01 +01:00
Antoine Lambert	ea8ecee541	tests: Fix errors after swh-scheduler API update The PaginatedListedOriginList model has been updated in rDSCHb93aa5be2c2d5dc2130e1027698f3e1255052d8d and the origins field has been renamed to results.	2021-01-25 17:11:54 +01:00
Antoine R. Dumont (@ardumont)	02871f16c9	gitlab: Adapt celery task implementations to the new lister api Related to T2987	2021-01-25 15:08:31 +01:00
Antoine R. Dumont (@ardumont)	ce87a8f7b2	gitlab: Let the lister compute the internal project listing page Related to T2987	2021-01-25 14:05:34 +01:00
Antoine R. Dumont (@ardumont)	7f1609265f	test: Rename internal method to something public It's used in multiple module tests now.	2021-01-25 13:39:07 +01:00
Antoine R. Dumont (@ardumont)	d3fe3d5747	gitlab: Fix mypy issue	2021-01-25 13:39:07 +01:00
Antoine R. Dumont (@ardumont)	2246d28606	gitlab: Document the lister constructor parameters Related to T2987	2021-01-25 13:30:45 +01:00
Antoine R. Dumont (@ardumont)	b352b8e11e	gitlab: Add test on rate-limit support Related to T2987	2021-01-25 09:23:22 +01:00
Antoine R. Dumont (@ardumont)	1f911401a1	gitlab: Add test on incremental implementation Note that the current implementation will start back the new visit from the last next_page link seen (that's what is stored in the lister state to avoid computing back the url). This means that this page will be seen at least 2 times, on the first visit and on the next. This should not pose any problems as the listing is idempotent. Related to T2987	2021-01-25 08:51:23 +01:00
Antoine R. Dumont (@ardumont)	84dd616ab6	gitlab: Add test on pagination Related to T2987	2021-01-25 08:51:23 +01:00
Antoine R. Dumont (@ardumont)	1390a513f2	gitlab: Port to the new lister api Related to T2987	2021-01-25 08:51:16 +01:00
tenma	5411141e3a	gitlab.tests: fix erroneous import from gitea module	2021-01-21 16:17:36 +01:00
Antoine R. Dumont (@ardumont)	20a91482ca	lister.gitlab.tests: Clarify lister configuration	2020-10-30 13:30:15 +01:00
Antoine Lambert	22f7181294	python: Reorder imports with isort Related to T2610	2020-09-17 17:48:27 +02:00
Antoine R. Dumont (@ardumont)	e3c856b5ee	utils.split_range: Split into not overlapping ranges Existing listers use the `is_within_bound` [1] method from the base lister. This method uses inclusive boundaries in all cases. As some "range" task listers [2] [3] are using `split_range` function to create "overlapping" ranges, this can cause concurrent insert issues down the line [4]. This commit adapts the function `split_range` to make the generated ranges no longer overlap. [1] https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/core/lister_base.py$194-199 [2] https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitlab/tasks.py$37-41 [3] https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitea/tasks.py$36-41 Related to T2577	2020-09-10 11:01:44 +02:00
Antoine R. Dumont (@ardumont)	5a5b7ef70b	tests: Separate lister instantiations Prior to this commit, all listers were instantiated at the same time even if only one was needed. This commit separates those instantiations. The only drawback to this is the db model initialization which now happens at each lister instantiation. This can be dealt with if needed at another time though.	2020-09-02 12:49:00 +02:00
Antoine R. Dumont (@ardumont)	9437a643ad	pytest: Define plugin and declare it in the root conftest Then drop all unneeded and indirect imports	2020-09-02 12:25:15 +02:00
Nicolas Dandrimont	c9963d4302	Use the new names for the swh.scheduler test fixtures	2020-07-09 17:06:50 +02:00
David Douard	93a4d8b784	Enable black - blackify all the python files, - enable black in pre-commit, - add a black tox environment.	2020-04-08 16:31:22 +02:00
Gautier Pugnonblanc Yann	e5fea84c55	review corrections	2020-02-20 09:13:49 +01:00
Gautier Pugnonblanc Yann	60adc424be	add type annotations in some lister files	2020-02-17 15:58:34 +01:00
Antoine R. Dumont (@ardumont)	5ab9d67d67	core: Align listers' task output (hg/git tasks) with expected format Related to T2134 Related to D2409 Related to D2410	2019-12-09 15:12:17 +01:00
Antoine R. Dumont (@ardumont)	4a9608f31c	lister/tasks: Standardize return statements The following commit adapts the return statements from both lister and their associated tasks. This standardizes on what other modules (e.g. both dvcs and package loaders) do.	2019-12-02 15:49:38 +01:00
Nicolas Dandrimont	78105940ff	Stop binding tasks to a specific instance of the celery app The celery.shared_task decorator allows late-binding of tasks to any celery app, which is well suited for our "task plugin" architecture.	2019-10-18 18:02:25 +02:00
Antoine R. Dumont (@ardumont)	a8cde12d72	tests: Update pytest_plugin according to latest version change	2019-10-14 18:20:15 +02:00
Antoine R. Dumont (@ardumont)	1889875f67	gitlab.lister: Add integration test which checks scheduled tasks Related T2032	2019-10-12 03:11:31 +02:00
David Douard	b810876ef8	tasks: normalize the url argument name of most listers Since all the listing tasks accept a url as first argument (whatever the argument name is), it makes sense to use a simple common argument name for this. I've chosen 'url' instead of api_baseurl/forge_url/url. Also kill now useless `new_lister()` functions.	2019-09-04 15:38:01 +02:00
David Douard	8d9deeb8f8	plugins: add support for scheduler's task-type declaration Add a new register-task-types cli that will create missing task-type entries in the scheduler according to: - only create missing task-types (do not update them), but check that the backend_name field is consistent, - each SWHTask-based task declared in a module listed in the 'task_modules' plugin registry field will be checked and added if needed; tasks whose name starts with an underscore will not be added, - added task-type will have: - the 'type' field is derived from the task's function name (with underscores replaced with dashes), - the description field is the first line of that function's docstring, - default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with a simple pattern matching to have decent default values for full/incremental tasks), - these default values can be overloaded via the 'task_type' plugin registry entry. For this, we had to rename all task names (e.g. `cran_lister` -> `list_cran`). Comes with some tests.	2019-09-04 15:36:08 +02:00
David Douard	e3c0ea9d90	implement listers as plugins Listers are declared as plugins via the `swh.workers` entry_point. As such, the registry function is expected to return a dict with the `task_modules` field (as for generic worker plugins), plus: - `lister`: the lister class, - `models`: list of SQLAlchemy models used by this lister, - `init` (optional): hook (callable) used to initialize the lister's state (typically, create/initialize the database for this lister). If not set, the default implementation creates database tables (after optionally having deleted existing ones) according to models declared in the `models` register field. There is no need to explicitly add lister task modules in the main `conftest` module, but any new/extra lister to be tested must be registered (the tested lister module must be properly installed in the test environment). Also refactor a bit the cli tools: - add support for the standard --config-file option at the 'lister' group level, - move the --db-url to the 'lister' group, - drop the --lister option for the `swh lister db-init` cli tool: initializing (especially with --drop-tables) the database for a single lister is unreliable, since all tables are created using a single MetaData (in the same namespace).	2019-09-03 15:02:24 +02:00
David Douard	befe9a6d57	gitlab: make GitLabLister's api_baseurl init argument optional and simplify a bit the code of the constructor.	2019-09-02 12:29:38 +02:00
Valentin Lorentz	52b1de87c5	Finish dropping the 'description' column. I missed some in `aef7d5952e`.	2019-06-26 14:46:27 +02:00
Antoine R. Dumont (@ardumont)	3d473c307c	lister: Type correctly the 'indexable' column instead of converting that column as a string As a side effect, for bitbucket, we improperly provided the `after` query parameter as a date that was not URL-encoded. This resulted in improper API responses from bitbucket (we received from time to time the same next index as the current one). Related T1826	2019-06-26 10:58:54 +02:00
Antoine R. Dumont (@ardumont)	b99617f976	relister: Fix consistently the behavior for the first time relisting If nothing has been done prior to a full relisting, there is actually nothing to list. So the relister in question does nothing. In that context, the IndexingLister class's `db_partition_indices` method now returns an empty list instead of raising a ValueError when there is nothing to list. Related T1826 Related `e129e48`	2019-06-25 14:48:17 +02:00
Antoine R. Dumont (@ardumont)	ecdce4b0cc	gitlab.lister: Remove request_params method override This should have been removed along with the code in `b816212`. The request authentication has been reworked so that all listers use the same credentials dict. Related `b816212` Related T1772	2019-06-14 18:26:36 +02:00
Antoine R. Dumont (@ardumont)	b81621274b	lister: Unify credentials structure between listers This becomes a dictionary of key <lister-name>, value a dict of key <instance-name>, value list of dict username/password. Related T1772	2019-05-29 14:00:11 +02:00
David Douard	e5c3559033	tasks: fix handling of unsupported promise.save() calls the exception can also be an AttributeError. Also do not reraise this exception (in github/tasks.py). This promise saving feature is used for tests.	2019-04-11 11:03:48 +02:00
David Douard	f670de298f	Remove debug logging from tasks' code since this is now handled by the SWHTask itself.	2019-01-17 13:58:29 +01:00
David Douard	e31b61bee1	Do not crash range tasks if celery result backend does not support saving the group's state	2019-01-15 15:32:07 +01:00
David Douard	f46f3e2015	Remove explicit setting of the task base class since it's now the default base class in swh-scheduler (>= 0.0.39)	2019-01-10 09:55:17 +01:00
David Douard	fb9265bb03	Generate the gitlab's instance name from the api_baseurl by default using the host of the given url. This allows creating a lister task by simply specifying the API base url and prevents 'inconsistent by default' behavior, e.g. with: swh-scheduler task add swh-lister-gitlab-full \ api_baseurl=https://0xacab.org/api/v4 the created task does not use 'gitlab' as instance name (but '0xacab.org' here). It's still possible to explicitly specify the instance name if needed.	2019-01-10 09:49:26 +01:00
David Douard	7db281aa38	Fix gitlab task: pass per_page lister arg parameter to the lister constructor	2019-01-08 10:35:33 +01:00


96 commits
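
Commits 6c12350863 and fb9265bb03 describe deriving the lister instance name from the URL's network location when no instance is given, and 72be074a79 mentions handling a missing or trailing / in the URL. The following is a minimal sketch of that idea using only the Python standard library. It is not the swh-lister implementation: the helper names `normalize_url` and `instance_name` are hypothetical and chosen for illustration.

```python
from typing import Optional
from urllib.parse import urlparse


def normalize_url(url: str) -> str:
    """Strip trailing slashes so 'https://host/api/' and 'https://host/api' match.

    Hypothetical helper, not part of swh-lister.
    """
    return url.rstrip("/")


def instance_name(url: str, instance: Optional[str] = None) -> str:
    """Return the explicit instance name if given, else the URL network location.

    Hypothetical helper, not part of swh-lister.
    """
    if instance:
        return instance
    return urlparse(normalize_url(url)).netloc


# No instance given: fall back to the host part of the URL.
print(instance_name("https://0xacab.org/api/v4"))
# An explicit instance name always takes precedence.
print(instance_name("https://gitlab.com/api/v4/", instance="gitlab"))
```

With the URL from the fb9265bb03 commit message, the first call yields '0xacab.org' rather than a generic 'gitlab' name, which matches the behavior that message describes.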