swh-lister

Author	SHA1	Message	Date
Antoine R. Dumont (@ardumont)	2ffe9c2aea	Use swh.core.github.pytest_plugin in github tests Related to T4232	2022-05-20 16:06:11 +02:00
Valentin Lorentz	d715aaf903	Make user_agent a parameter of GitHubSession So it can be set when used by other packages	2022-04-26 11:08:53 +02:00
Valentin Lorentz	2d04244cc9	Move GitHubSession from github/lister.py to github/utils.py So it can be reused by other packages without importing lister.py itself	2022-04-26 11:08:49 +02:00
Valentin Lorentz	9ee4a99f15	github: Refactor rate-limiting out of the GitHubLister class This will allow the GitHub Metadata Fetcher to reuse the logic by importing the GitHubSession class.	2022-04-26 11:08:45 +02:00
Valentin Lorentz	d0924f39d0	github: Remove dead code Authentication is handled directly in the session	2022-04-21 20:32:45 +02:00
Antoine Lambert	d38e05cff7	python: Reformat code with black 22.3.0 Related to T3922	2022-04-08 15:15:09 +02:00
Nicolas Dandrimont	5f567b3c34	Deduplicate origins in the GitHub lister In some circumstances, GitHub will return two separate repos with the same html_url in the same page. This makes the lister fail with a cardinality error.	2021-12-01 16:00:14 +01:00
Nicolas Dandrimont	879170a57d	GitHub: handle edge cases with empty responses	2021-03-19 16:53:52 +01:00
Nicolas Dandrimont	c375a61b16	GitHub: handle Server Errors These errors happen, sometimes, when requesting large pages of results.	2021-03-19 16:53:52 +01:00
Nicolas Dandrimont	4a215e68e0	GitHub: Move rate-limit reset logic to RateLimited exception This makes the logic easier to test.	2021-03-19 16:52:46 +01:00
Nicolas Dandrimont	cfd4169bd8	Retry GitHub requests on ChunkEncodingErrors These happen, sometimes, when the connection to the GitHub server resets, e.g. because of congestion on a slow link.	2021-03-19 16:52:46 +01:00
Nicolas Dandrimont	61c1d444c5	GitHub: Move rate limit handling to the request function	2021-03-19 15:58:01 +01:00
Nicolas Dandrimont	03b10e5c83	GitHub: Start moving the request logic to a separate function	2021-03-19 15:58:01 +01:00
Nicolas Dandrimont	8f7dbb7488	GitHub: Use function for requests.Session initialization This will help us to break the retry logic for the listing requests themselves to a separate function too.	2021-03-19 15:58:01 +01:00
Antoine Lambert	4245c5046f	Remove no longer used models field in dict returned by register	2021-02-02 16:33:52 +01:00
tenma	6cd31769c1	tests: Remove no longer used conftest files All the fixtures declared in them are not used anymore in the tests of the listers ported to the new Lister API.	2021-01-26 17:09:04 +01:00
Antoine Lambert	ea8ecee541	tests: Fix errors after swh-scheduler API update The PaginatedListedOriginList model has been updated in rDSCHb93aa5be2c2d5dc2130e1027698f3e1255052d8d and the origins field has been renamed to results.	2021-01-25 17:11:54 +01:00
Nicolas Dandrimont	b63aa83b41	Reimplement the GitHub lister using the new pattern class This replaces the test data with some manually generated answers, which allows us to test a few more cases for instantiating the lister. This also expands test coverage to test behavior on rate-limited requests.	2021-01-11 11:00:29 +01:00
Antoine R. Dumont (@ardumont)	978fbbe029	lister.github.tests: Clarify lister configuration	2020-10-30 13:30:15 +01:00
Antoine Lambert	22f7181294	python: Reorder imports with isort Related to T2610	2020-09-17 17:48:27 +02:00
Antoine R. Dumont (@ardumont)	5a5b7ef70b	tests: Separate lister instantiations Prior to this commit, all listers were instantiated at the same time even if only one was needed. This commit separates those instantiations. The only drawback to this is the db model initialization which now happens at each lister instantiation. This can be dealt with if needed at another time though.	2020-09-02 12:49:00 +02:00
Antoine R. Dumont (@ardumont)	9437a643ad	pytest: Define plugin and declare it in the root conftest Then drop all unneeded and indirect imports	2020-09-02 12:25:15 +02:00
Nicolas Dandrimont	c9963d4302	Use the new names for the swh.scheduler test fixtures	2020-07-09 17:06:50 +02:00
David Douard	93a4d8b784	Enable black - blackify all the python files, - enable black in pre-commit, - add a black tox environment.	2020-04-08 16:31:22 +02:00
Gautier Pugnonblanc Yann	60adc424be	add type annotations in some lister files	2020-02-17 15:58:34 +01:00
Antoine R. Dumont (@ardumont)	ed73cea771	github.lister: Filter out partial repositories which break listing This commit fixes the repository mapping to model. It broke when the listed repository was either None or missing the id field [1] [1] https://sentry.softwareheritage.org/share/issue/532d682182fc43d6a7a99400e3928811/	2020-01-20 10:25:57 +01:00
Antoine R. Dumont (@ardumont)	4b383abc56	github.lister: Use Retry-After header when rate limit reached Following the github's documentation [1] [1] https://developer.github.com/v3/guides/best-practices-for-integrators/#dealing-with-abuse-rate-limits Related to T2170	2020-01-17 10:37:53 +01:00
Antoine R. Dumont (@ardumont)	5ab9d67d67	core: Align listers' task output (hg/git tasks) with expected format Related to T2134 Related to D2409 Related to D2410	2019-12-09 15:12:17 +01:00
Antoine R. Dumont (@ardumont)	4a9608f31c	lister/tasks: Standardize return statements The following commit adapts the return statements from both lister and their associated tasks. This standardizes on what other modules (e.g. both dvcs and package loaders) do.	2019-12-02 15:49:38 +01:00
Nicolas Dandrimont	ff7fdf24db	Use a uniform User-Agent on all listers This also adds tests to make sure that we properly send our version number to upstreams.	2019-11-22 15:49:23 +01:00
Stefano Zacchiroli	974f80f966	typing: minimal changes to make a no-op mypy run pass	2019-10-28 15:35:21 +01:00
Nicolas Dandrimont	78105940ff	Stop binding tasks to a specific instance of the celery app The celery.shared_task decorator allows late-binding of tasks to any celery app, which is well suited for our "task plugin" architecture.	2019-10-18 18:02:25 +02:00
Antoine R. Dumont (@ardumont)	a8cde12d72	tests: Update pytest_plugin according to latest version change	2019-10-14 18:20:15 +02:00
Antoine R. Dumont (@ardumont)	0b8b1419e1	github.lister: Add integration test which checks scheduled tasks Related T2032	2019-10-12 03:28:39 +02:00
Antoine Lambert	04d8fdf8df	github/lister: Prevent erroneous scheduler tasks disabling Closes T2014	2019-09-19 14:30:30 +02:00
Antoine Lambert	7572228f7c	listers: Ensure run can be called without bounds arguments Closes T2001	2019-09-17 15:09:04 +02:00
David Douard	b810876ef8	tasks: normalize the url argument name of most listers Since all the listing tasks accept a url as first argument (whatever the argument name is), it makes sense to use a simple common argument name for this. I've chosen 'url' instead of api_baseurl/forge_url/url. Also kill now useless `new_lister()` functions.	2019-09-04 15:38:01 +02:00
David Douard	8d9deeb8f8	plugins: add support for scheduler's task-type declaration Add a new register-task-types cli that will create missing task-type entries in the scheduler according to: - only create missing task-types (do not update them), but check that the backend_name field is consistent, - each SWHTask-based task declared in a module listed in the 'task_modules' plugin registry field will be checked and added if needed; tasks whose names start with an underscore will not be added, - added task-type will have: - the 'type' field is derived from the task's function name (with underscores replaced with dashes), - the description field is the first line of that function's docstring, - default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with a simple pattern matching to have decent default values for full/incremental tasks), - these default values can be overloaded via the 'task_type' plugin registry entry. For this, we had to rename all task names (e.g. `cran_lister` -> `list_cran`). Comes with some tests.	2019-09-04 15:36:08 +02:00
David Douard	e3c0ea9d90	implement listers as plugins Listers are declared as plugins via the `swh.workers` entry_point. As such, the registry function is expected to return a dict with the `task_modules` field (as for generic worker plugins), plus: - `lister`: the lister class, - `models`: list of SQLAlchemy models used by this lister, - `init` (optional): hook (callable) used to initialize the lister's state (typically, create/initialize the database for this lister). If not set, the default implementation creates database tables (after optionally having deleted existing ones) according to models declared in the `models` register field. There is no need to explicitly add lister task modules in the main `conftest` module, but any new/extra lister to be tested must be registered (the tested lister module must be properly installed in the test environment). Also refactor a bit the cli tools: - add support for the standard --config-file option at the 'lister' group level, - move the --db-url to the 'lister' group, - drop the --lister option for the `swh lister db-init` cli tool: initializing (especially with --drop-tables) the database for a single lister is unreliable, since all tables are created using a single MetaData (in the same namespace).	2019-09-03 15:02:24 +02:00
David Douard	b87cd5d309	github: make GitHubLister's api_baseurl init argument optional	2019-09-02 12:29:38 +02:00
Antoine R. Dumont (@ardumont)	3d473c307c	lister: Type correctly the 'indexable' column instead of converting that column as a string As a side effect, bitbucket wise, we provided improperly the after query parameter as a date not url encoded. This resulted in improper api response from bitbucket's (we received from time to time the same next index as the current one). Related T1826	2019-06-26 10:58:54 +02:00
Antoine R. Dumont (@ardumont)	b99617f976	relister: Fix consistently the behavior for the first time relisting If nothing has been done prior to a full relisting, there is actually nothing to list. So the relister in question does nothing. In that context, the IndexingLister class's `db_partition_indices` method now returns an empty list instead of raising a ValueError when there is nothing to list. Related T1826 Related `e129e48`	2019-06-25 14:48:17 +02:00
Antoine R. Dumont (@ardumont)	5ec3067b0d	Clean up code - Remove unneeded return instructions - Clarify tests code regarding request_index computations	2019-06-25 14:48:13 +02:00
Antoine R. Dumont (@ardumont)	b3463ecddc	Drop SWH prefix in classes everywhere It's redundant with the swh modules in itself.	2019-06-20 19:08:46 +02:00
Valentin Lorentz	aef7d5952e	Remove columns 'description' and 'origin_id'. They are useless.	2019-06-19 10:29:15 +02:00
Antoine R. Dumont (@ardumont)	fc92c79b7e	models: Unify tablenames using singular as main archive's convention Related P434	2019-06-18 07:18:34 +02:00
Antoine R. Dumont (@ardumont)	b81621274b	lister: Unify credentials structure between listers This becomes a dictionary of key <lister-name>, value a dict of key <instance-name>, value list of dict username/password. Related T1772	2019-05-29 14:00:11 +02:00
David Douard	e5c3559033	tasks: fix handling of unsupported promise.save() calls the exception can also be an AttributeError. Also do not reraise this exception (in github/tasks.py). This promise saving feature is used for tests.	2019-04-11 11:03:48 +02:00
David Douard	f670de298f	Remove debug logging from tasks' code since this is now handled by the SWHTask itself.	2019-01-17 13:58:29 +01:00
David Douard	e31b61bee1	Do not crash range tasks if celery result backend does not support saving the group's state	2019-01-15 15:32:07 +01:00

107 commits