swh-lister

Author	SHA1	Message	Date
Antoine R. Dumont (@ardumont)	394658e53b	cgit.tests: Check the tasks from the scheduler	2019-10-09 17:57:57 +02:00
David Douard	bd11830328	cgit: reduce the batch size to 10 and add a bit of logging Since the CGit lister now performs an HTTP query for each git repo listed in the main index, it is significantly slower, so reducing the time between database commits makes sense and won't overload the database. With a bit of logging, it is easier to follow/debug the progress of a listing.	2019-09-04 15:37:40 +02:00
David Douard	8d9deeb8f8	plugins: add support for scheduler's task-type declaration Add a new register-task-types cli that will create missing task-type entries in the scheduler according to: - only create missing task-types (do not update them), but check that the backend_name field is consistent, - each SWHTask-based task declared in a module listed in the 'task_modules' plugin registry field will be checked and added if needed; tasks whose names start with an underscore will not be added, - each added task-type will have: - a 'type' field derived from the task's function name (with underscores replaced with dashes), - a description field set to the first line of that function's docstring, - default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with a simple pattern matching to have decent default values for full/incremental tasks), - these default values can be overridden via the 'task_type' plugin registry entry. For this, we had to rename all task names (eg. `cran_lister` -> `list_cran`). Comes with some tests.	2019-09-04 15:36:08 +02:00
David Douard	e3c0ea9d90	implement listers as plugins Listers are declared as plugins via the `swh.workers` entry_point. As such, the registry function is expected to return a dict with the `task_modules` field (as for generic worker plugins), plus: - `lister`: the lister class, - `models`: list of SQLAlchemy models used by this lister, - `init` (optional): hook (callable) used to initialize the lister's state (typically, create/initialize the database for this lister). If not set, the default implementation creates database tables (after optionally having deleted existing ones) according to models declared in the `models` register field. There is no need to explicitly add lister task modules in the main `conftest` module, but any new/extra lister to be tested must be registered (the tested lister module must be properly installed in the test environment). Also refactor the cli tools a bit: - add support for the standard --config-file option at the 'lister' group level, - move the --db-url to the 'lister' group, - drop the --lister option for the `swh lister db-init` cli tool: initializing (especially with --drop-tables) the database for a single lister is unreliable, since all tables are created using a single MetaData (in the same namespace).	2019-09-03 15:02:24 +02:00
David Douard	8785fc1a4e	cgit: fix cgit's task module and tests forgot some `url_prefix` there.	2019-09-03 12:01:55 +02:00
David Douard	3816b4d3bf	cgit: rewrite the CGit lister Simplify the code: - do only inherit from ListerBase - implement HTTP queries directly using requests - get rid of convoluted code Gather the origin_url from the git repo's "project" page instead of building it from the 'url_prefix' hack. Now, the lister WILL make substantially more requests, since it will make one request per listed git repo, but the provided origin_url should be pretty reliable now. When several clonable URLs are provided, choose the http/https one first, otherwise, choose the first one of the list. Add proper tests for the cgit lister. Also, get rid of the 'time_updated' column in the model.	2019-09-02 12:29:31 +02:00
Archit Agrawal	0bf24469b7	swh.lister.cgit: Remove repo page visit step Remove the need to visit every page and extract the origin url by introducing a parameter url_prefix. The origin url is in the format <prefix>/<repo_name>, where the prefix is the same for all the repos of a particular cgit instance.	2019-06-28 20:02:07 +05:30
Archit Agrawal	7e3c79bb1d	swh.lister.cgit: Add pagination support Some cgit instances have pagination. Modify the lister to find all the pages and list all the repos from all the pages.	2019-06-28 19:27:25 +05:30
Archit Agrawal	b972a2a88d	swh.lister.cgit Implemented a lister to list the repos for a given CGit instance. Closes T1659	2019-06-28 19:27:25 +05:30
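
The URL selection rule described in commit 3816b4d3bf (prefer an http/https clonable URL, otherwise take the first one listed) can be sketched as follows. This is only an illustration of the rule as worded in the commit message, not the lister's actual code: the function name `pick_origin_url` and the choice to return `None` for an empty list are assumptions.

```python
from urllib.parse import urlparse


def pick_origin_url(clone_urls):
    """Hypothetical helper: pick one origin URL among a repo's clonable URLs.

    Prefers the first http/https URL; otherwise falls back to the first
    URL of the list. Returns None for an empty list (an assumption, the
    commit message does not say what happens in that case).
    """
    for url in clone_urls:
        if urlparse(url).scheme in ("http", "https"):
            return url
    return clone_urls[0] if clone_urls else None


if __name__ == "__main__":
    print(pick_origin_url(["git://example.org/repo", "https://example.org/repo"]))
    print(pick_origin_url(["git://example.org/repo", "ssh://example.org/repo"]))
```

The example.org URLs are placeholders, not taken from any real cgit instance.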

9 commits