swh-lister

Author	SHA1	Message	Date
Nicolas Dandrimont	9e083c1eea	Introduce a simpler base pattern for lister implementations. This new pattern uses the lister support features introduced in swh.scheduler to replace the database management done in previous iterations of the listers.	2021-01-11 11:00:29 +01:00
Antoine R. Dumont (@ardumont)	30ad6200a2	Drop mock_get_scheduler which creates indirection for no good reason It is no longer useful: the tests still pass with it removed.	2020-10-16 14:32:46 +02:00
Antoine Lambert	22f7181294	python: Reorder imports with isort Related to T2610	2020-09-17 17:48:27 +02:00
Antoine R. Dumont (@ardumont)	e3c856b5ee	utils.split_range: Split into not overlapping ranges Existing listers use the `is_within_bound` [1] method from the base lister. This method uses inclusive boundaries in all cases. As some "range" task listers [2] [3] use the `split_range` function to create "overlapping" ranges, this can cause concurrent insert issues down the line [4]. This commit adapts the `split_range` function so that the generated ranges no longer overlap. [1] https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/core/lister_base.py$194-199 [2] https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitlab/tasks.py$37-41 [3] https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitea/tasks.py$36-41 Related to T2577	2020-09-10 11:01:44 +02:00
Antoine R. Dumont (@ardumont)	725c1fe4ad	test_utils: Migrate to pytest	2020-09-09 18:48:07 +02:00
Antoine R. Dumont (@ardumont)	5a5b7ef70b	tests: Separate lister instantiations Prior to this commit, all listers were instantiated at the same time even if only one was needed. This commit separates those instantiations. The only drawback to this is the db model initialization which now happens at each lister instantiation. This can be dealt with if needed at another time though.	2020-09-02 12:49:00 +02:00
Antoine R. Dumont (@ardumont)	92422dcf75	pytest_plugin: Instantiate only lister with no particular setup This should fix the remaining blocking problems in the jenkins build failure report [1] [1] https://jenkins.softwareheritage.org/view/Debian%20packages/job/debian/job/packages/job/DLS/job/gbp-buildpackage/78/consoleFull	2020-09-02 12:25:15 +02:00
Antoine R. Dumont (@ardumont)	e99d3464e4	test_cli: Exclude launchpad lister from the check This should fix the build [1] [1] https://jenkins.softwareheritage.org/view/Debian%20packages/job/debian/job/packages/job/DLS/job/gbp-buildpackage/77/console	2020-09-01 15:55:24 +02:00
Nicolas Dandrimont	211f4610df	Move get_scheduler monkeypatching into an explicit pytest fixture This allows us to actually run the lister instantiation code instead of relying on the underlying structure of the lister object. In turn, this allows future listers to use the scheduler right in their __init__.	2020-07-16 12:14:04 +02:00
Nicolas Dandrimont	c9963d4302	Use the new names for the swh.scheduler test fixtures	2020-07-09 17:06:50 +02:00
Léni Gauffier	58ef08b083	Added LaunchpadLister Summary: Related to T1734 From abandoned D2799 Reviewers: ardumont Reviewed By: ardumont Differential Revision: https://forge.softwareheritage.org/D2974	2020-04-12 01:00:12 +02:00
David Douard	93a4d8b784	Enable black - blackify all the python files, - enable black in pre-commit, - add a black tox environment.	2020-04-08 16:31:22 +02:00
Antoine R. Dumont (@ardumont)	484377cc13	lister.cli: Remove task type register cli It's now defined in swh.scheduler	2019-11-18 10:41:46 +01:00
Antoine R. Dumont (@ardumont)	eebbc859fc	lister.cli: Clarify configuration loading step	2019-11-08 10:50:51 +01:00
David Douard	b810876ef8	tasks: normalize the url argument name of most listers Since all the listing tasks accept a url as their first argument (whatever the argument name is), it makes sense to use a simple common argument name for it. I've chosen 'url' instead of api_baseurl/forge_url/url. Also kill the now useless `new_lister()` functions.	2019-09-04 15:38:01 +02:00
David Douard	8d9deeb8f8	plugins: add support for scheduler's task-type declaration Add a new register-task-types cli that will create missing task-type entries in the scheduler according to: - only create missing task-types (do not update them), but check that the backend_name field is consistent, - each SWHTask-based task declared in a module listed in the 'task_modules' plugin registry field will be checked and added if needed; tasks whose name starts with an underscore will not be added, - added task-types will have: - the 'type' field derived from the task's function name (with underscores replaced with dashes), - the description field set to the first line of that function's docstring, - default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with a simple pattern matching to have decent default values for full/incremental tasks), - these default values can be overridden via the 'task_type' plugin registry entry. For this, we had to rename all task names (e.g. `cran_lister` -> `list_cran`). Comes with some tests.	2019-09-04 15:36:08 +02:00
David Douard	e3c0ea9d90	implement listers as plugins Listers are declared as plugins via the `swh.workers` entry_point. As such, the registry function is expected to return a dict with the `task_modules` field (as for generic worker plugins), plus: - `lister`: the lister class, - `models`: list of SQLAlchemy models used by this lister, - `init` (optional): hook (callable) used to initialize the lister's state (typically, create/initialize the database for this lister). If not set, the default implementation creates database tables (after optionally deleting existing ones) according to the models declared in the `models` register field. There is no need to explicitly add lister task modules in the main `conftest` module, but any new/extra lister to be tested must be registered (the tested lister module must be properly installed in the test environment). Also refactor the cli tools a bit: - add support for the standard --config-file option at the 'lister' group level, - move the --db-url to the 'lister' group, - drop the --lister option for the `swh lister db-init` cli tool: initializing (especially with --drop-tables) the database for a single lister is unreliable, since all tables are created using a single MetaData (in the same namespace).	2019-09-03 15:02:24 +02:00
David Douard	87cec2f5c3	phabricator: refactor PhabricatorLister's constructor - use the 'standard' api_baseurl as init argument, - make it optional, with default to forge.softwareheritage.org, - use origin_url as id.	2019-09-02 12:29:38 +02:00
David Douard	3816b4d3bf	cgit: rewrite the CGit lister Simplify the code: - only inherit from ListerBase - implement HTTP queries directly using requests - get rid of convoluted code Gather the origin_url from the git repo's "project" page instead of building it from the 'url_prefix' hack. Now, the lister WILL make substantially more requests, since it will make one request per listed git repo, but the provided origin_url should be pretty reliable now. When several clonable URLs are provided, choose the http/https one first; otherwise, choose the first one of the list. Add proper tests for the cgit lister. Also, get rid of the 'time_updated' column in the model.	2019-09-02 12:29:31 +02:00
Antoine R. Dumont (@ardumont)	4b2ab0488a	cli: Unify new_lister method name to get_lister	2019-08-28 16:29:26 +02:00
Antoine R. Dumont (@ardumont)	dee9fe93bf	cli: Bootstrap tests on cli	2019-08-28 16:29:26 +02:00
Nicolas Dandrimont	b3e815bef8	Rename test methods to test_ to allow collection by pytest Part of T1261	2018-10-15 10:11:07 +02:00
Antoine R. Dumont (@ardumont)	e24cf1d4f1	gitlab.lister: Simplify retrieving headers information As response headers' keys are case-insensitive and requests does the aggregation magic. [1] http://docs.python-requests.org/en/master/user/quickstart/#response-headers	2018-07-16 13:57:34 +02:00
Antoine R. Dumont (@ardumont)	81fd5f9c5d	swh.lister.gitlab.tasks: Fix range computations	2018-07-12 14:23:14 +02:00
Antoine R. Dumont (@ardumont)	13bb7aca58	swh.lister.gitlab: Improve headers extraction	2018-07-11 18:09:18 +02:00

25 commits
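The `split_range` change (e3c856b5ee) hinges on inclusive bounds: with a check like `start <= x <= end`, ranges such as (0, 10) and (10, 20) both claim index 10. Below is a minimal Python sketch of non-overlapping splitting. The signature, the use of 0 as the first index, and the `owners` helper are assumptions for illustration, not the actual `swh.lister.utils` code.

```python
from typing import Iterator, List, Tuple


def split_range(total: int, step: int) -> Iterator[Tuple[int, int]]:
    """Split the inclusive interval [0, total] into non-overlapping ranges.

    Hypothetical sketch: each (start, end) pair is meant to be tested with
    inclusive bounds (start <= x <= end), so two consecutive ranges must
    never share an endpoint.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    start = 0
    while start <= total:
        # Clip the range so it never exceeds total (only affects the last one).
        end = min(start + step - 1, total)
        yield start, end
        # The next range starts right after the previous end: no overlap, no gap.
        start = end + 1


def owners(index: int, ranges: List[Tuple[int, int]]) -> int:
    """Count how many ranges claim `index` under inclusive bounds (hypothetical helper)."""
    return sum(start <= index <= end for start, end in ranges)


ranges = list(split_range(25, 10))
print(ranges)  # [(0, 9), (10, 19), (20, 25)]
# Every index in [0, 25] is claimed by exactly one range.
assert all(owners(i, ranges) == 1 for i in range(26))
```

Starting each range at the previous `end + 1` is what keeps the inclusive check unambiguous, so two range tasks never both try to insert the same item.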