Commit graph

52 commits

Author SHA1 Message Date
David Douard
cccb8c21ff Replace all remaining occurrences of the 'local' cls by 'postgresql'
The former has been deprecated for ages...
2024-10-28 14:35:29 +01:00
Antoine Lambert
6e7bc49ec7 Harmonize listers parameters and add test to check mandatory ones
Ensure that all lister classes have the same set of mandatory parameters
in their constructors, notably: scheduler, url, instance and credentials.

Add a new test checking listers classes have mandatory parameters declared
in their constructors. The purpose is to avoid deployment issues on staging
or production environment as celery tasks can fail to be executed if mandatory
parameters are not handled by listers.

Reated to swh/infra/sysadm-environment#5030.
2023-09-06 11:55:34 +02:00
Antoine R. Dumont (@ardumont)
fcfb7004db
pypi: Allow passing configuration arguments to task
The constructor allows it but not the celery task.

This also aligns the behavior with other lister tasks.
2023-08-04 15:34:02 +02:00
Antoine Lambert
4f57e84450 Use http_retry decorator from swh.core.retry module
The http_retry decorator has been moved to swh-core package in order
to ease its reuse across swh packages.
2023-04-13 14:19:57 +02:00
Nicolas Dandrimont
e785e67315 Hook up recently introduced options to all listers
Hopefully one day we'll be able to replace all of this mess with PEP692
TypedDict kwargs, but that's only on track for Python 3.12.
2022-12-05 16:33:45 +01:00
Antoine Lambert
9c55acd286 Use generic HTTP retry policy by default and rename dedicated decorator
Instead of retrying HTTP requests only for 429 status code by default,
prefer to use the generic retry policy enabling to also retry for status
codes >= 500 but also on ConnectionError exceptions.

Rename throttling_retry decorator to http_retry to reflect this change.
2022-09-26 10:48:40 +02:00
Valentin Lorentz
b7ec6cb120 tests: Simplify origin comparison and improve pytest diff on failure
By using a single equality instead of checking len() then zip()
to check one by one, pytest can find the common/missing elements
and print them nicely when the two lists are unequal.
2022-08-24 17:21:24 +02:00
Antoine Lambert
d38e05cff7 python: Reformat code with black 22.3.0
Related to T3922
2022-04-08 15:15:09 +02:00
Antoine Lambert
445d539b3f Remove no longer needed tenacity workarounds
Now that we have packaged tenacity 6.2 for debian buster and use it
in production, we can remove the workarounds to support tenacity < 5.
2021-12-08 13:28:11 +01:00
Antoine R. Dumont (@ardumont)
df46b22098
Make PyPI lister incremental and complete in regards to last_update
This rewrote the current implementation to actually use pypi's xml-rpc api which allows
to be incremental. It also allows to fetch the last release date per package. This last
part actually make it possible to update the "last_update" entry in the ListedOrigin
model.

Related to T3399
2021-07-09 12:51:58 +02:00
Antoine R. Dumont (@ardumont)
698be475e9
test_tasks: Align test consistently with other using mocker 2021-07-09 10:13:21 +02:00
Antoine Lambert
2461c97bbb pypi: Use BeautifulSoup for parsing HTML instead of xmltodict
xmltodict now raises an error while trying to parse the HTML content
of https://pypi.org/simple/ page.

So use BeautifulSoup HTML parser instead as it is aleady a requirement
of swh-lister and it does not fail parsing the PyPI HTML page.

Also drop no longer used xmltodict in requirements.
2021-02-05 14:23:11 +01:00
Antoine Lambert
4245c5046f Remove no longer used models field in dict returned by register 2021-02-02 16:33:52 +01:00
Antoine R. Dumont (@ardumont)
ae17b6b9a0
Make stateless lister constructors compatible with credentials
In effect, it just allows to add credentials to cgit, cran and pypi listers.

This fixes instances of error [1]

[1] https://sentry.softwareheritage.org/share/issue/2c35a9f129cf4982a2dd003a232d507a/

Related to T2998
2021-01-28 14:42:56 +01:00
tenma
6cd31769c1 tests: Remove no longer used conftest files
All the fixtures declared in them are not used anymore in the
tests of the listers ported to the new Lister API.
2021-01-26 17:09:04 +01:00
Antoine Lambert
ea8ecee541 tests: Fix errors after swh-scheduler API update
The PaginatedListedOriginList model has been updated in
rDSCHb93aa5be2c2d5dc2130e1027698f3e1255052d8d and the origins
field has been renamed to results.
2021-01-25 17:11:54 +01:00
tenma
6d22007946 pypi.tests: simplify requests_mock invocation 2021-01-21 16:17:29 +01:00
tenma
62c825b8cb Reimplement PyPI lister using new Lister API
The new lister has only full listing capability.
It scrapes pypi.org list of packages.
Rate-limiting was not encountered but is handled generically.
2021-01-20 15:45:16 +01:00
Antoine Lambert
22f7181294 python: Reorder imports with isort
Related to T2610
2020-09-17 17:48:27 +02:00
Antoine R. Dumont (@ardumont)
5a5b7ef70b
tests: Separate lister instantiations
Prior to this commit, all listers were instantiated at the same time even if
only one was needed. This commit separates those instantiations.

The only drawback to this is the db model initialization which now happens at
each lister instantiation. This can be dealt with if needed at another time
though.
2020-09-02 12:49:00 +02:00
Antoine R. Dumont (@ardumont)
9437a643ad
pytest: Define plugin and declare it in the root conftest
Then drop all unneeded and indirect imports
2020-09-02 12:25:15 +02:00
Nicolas Dandrimont
c9963d4302 Use the new names for the swh.scheduler test fixtures 2020-07-09 17:06:50 +02:00
David Douard
93a4d8b784 Enable black
- blackify all the python files,
- enable black in pre-commit,
- add a black tox environment.
2020-04-08 16:31:22 +02:00
Gautier Pugnonblanc Yann
60adc424be add anotation type in some lister file 2020-02-17 15:58:34 +01:00
Antoine R. Dumont (@ardumont)
4a9608f31c
lister/tasks: Standardize return statements
The following commit adapts the return statements from both lister and their
associated tasks. This standardizes on what other modules (e.g. both dvcs and
package loaders) do.
2019-12-02 15:49:38 +01:00
Antoine R. Dumont (@ardumont)
af04129d79
lister.pypi: Align lister with pypi package loader 2019-11-21 18:43:45 +01:00
Antoine R. Dumont (@ardumont)
1cf7c8e86b
lister.tests: Add missing task_type for package listers
The scheduler module no longer initializes itself those task_type.
2019-11-21 13:44:20 +01:00
Nicolas Dandrimont
78105940ff Stop binding tasks to a specific instance of the celery app
The celery.shared_task decorator allows late-binding of tasks to any celery app,
which is well suited for our "task plugin" architecture.
2019-10-18 18:02:25 +02:00
Antoine R. Dumont (@ardumont)
960868badb
pypi.tests: Remove trailing _ in test method name 2019-10-15 12:19:10 +02:00
Antoine R. Dumont (@ardumont)
a8cde12d72
tests: Update pytest_plugin according to latest version change 2019-10-14 18:20:15 +02:00
Antoine R. Dumont (@ardumont)
599af25ad6
pypi.lister: Add integration test which checks the scheduled tasks
Related T2032
2019-10-11 13:24:55 +02:00
Antoine R. Dumont (@ardumont)
04ca318680
simple_lister: Extract common behavior in base class 2019-10-09 17:35:12 +02:00
David Douard
8d9deeb8f8 plugins: add support for scheduler's task-type declaration
Add a new register-task-types cli that will create missing task-type entries in the
scheduler according to:

- only create missing task-types (do not update them), but check that the
  backend_name field is consistent,
- each SWHTask-based task declared in a module listed in the 'task_modules'
  plugin registry field will be checked and added if needed; tasks which name
  start wit an underscore will not be added,
- added task-type will have:
  - the 'type' field is derived from the task's function name (with underscores
    replaced with dashes),
  - the description field is the first line of that function's docstring,
  - default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with
    a simple pattern matching to have decent default values for full/incremental
    tasks),
  - these default values can be overloaded via the 'task_type' plugin registry
    entry.

For this, we had to rename all tasks names (eg. `cran_lister` -> `list_cran`).

Comes with some tests.
2019-09-04 15:36:08 +02:00
David Douard
e3c0ea9d90 implement listers as plugins
Listers are declared as plugins via the `swh.workers` entry_point.

As such, the registry function is expected to return a dict with the
`task_modules` field (as for generic worker plugins), plus:

- `lister`: the lister class,
- `models`: list of SQLAlchemy models used by this lister,
- `init` (optionnal): hook (callable) used to initialize the lister's state
  (typically, create/initialize the database for this lister).
  If not set, the default implementation creates database tables (after
  optionally having deleted exisintg ones) according to models declared in
  the `models` register field.

There is no need for explicitely add lister task modules in the main
`conftest` module, but any new/extra lister to be tested must be registered
(the tested lister module must be properly installed in the test environment).

Also refactor a bit the cli tools:
- add support for the standard --config-file option at the 'lister' group
  level,
- move the --db-url to the 'lister' group,
- drop the --lister option for the `swh lister db-init` cli tool:
  initializing (especially with --drop-tables) the database for a single
  lister is unreliable, since all tables are created using a sibgle MetaData
  (in the same namespace).
2019-09-03 15:02:24 +02:00
Antoine R. Dumont (@ardumont)
87d2a16df0
listers: Allow to override policy and priority for scheduled tasks
Prior to this commit, the policy and priority were hard-coded.
The default values are now the old hard-coded values.

This will allow to develop a cli to trigger forges listing with oneshot policy
and some priority tasks. Thus ingesting those faster and without manual
interventation as we currently do.
2019-08-28 11:57:10 +02:00
Archit Agrawal
08ade29e6d swh.lister.pypi: Add tests
Add tests for pypi lister
Closes T1890
2019-07-18 17:13:13 +05:30
Valentin Lorentz
aef7d5952e Remove columns 'description' and 'origin_id'.
They are useless.
2019-06-19 10:29:15 +02:00
Antoine R. Dumont (@ardumont)
b81621274b
lister: Unify credentials structure between listers
This becomes a dictionary of key <lister-name>, value a dict of key
<instance-name>, value list of dict username/password.

Related T1772
2019-05-29 14:00:11 +02:00
Antoine Lambert
701d833cdf Update scheduler task names to new ones
Related T1508
2019-05-21 13:27:29 +02:00
David Douard
f670de298f Remove debug logging from tasks' code
since this is now handled by the SWHTask itself.
2019-01-17 13:58:29 +01:00
David Douard
f46f3e2015 Remove explicit setting of the task base class
since it's now the default base class in swh-scheduler (>= 0.0.39)
2019-01-10 09:55:17 +01:00
David Douard
65f3b9edc8 Add tests for pypi tasks 2019-01-08 10:35:33 +01:00
David Douard
0583b0e685 Add a 'ping' task for every lister. 2019-01-08 10:35:33 +01:00
David Douard
2d1f0643ff Heavy refactor of the task system
Get rid of the class based task definition in favor of decorator-based
task declarations.

Doing so, we can get rid of core/tasks.py

Task names are explicitely set to keep compatibility with task
definitions in schedulers' database.

This also add debug statements at the beginning and end of each lister
task.
2019-01-08 10:33:32 +01:00
Antoine R. Dumont (@ardumont)
5b20eff7d3
pypi.lister: Use https://pypi.org/project/<name>/ uri as project_url
The previous url is correct and redirects to this new one.

Related T422
2018-09-17 16:01:17 +02:00
Antoine R. Dumont (@ardumont)
ed64d24634
pypi.lister: Normalize pypi name to PyPI
Related T422
2018-09-14 13:24:48 +02:00
Antoine R. Dumont (@ardumont)
cba22b7d19
doc: Fix typos according to review 2018-09-06 14:58:03 +02:00
Antoine R. Dumont (@ardumont)
3bb12649ec
pypi.lister: Shuffle the package list 2018-09-06 12:18:37 +02:00
Antoine R. Dumont (@ardumont)
86cfac9277
pypi.lister: Improve parameter names 2018-09-06 09:56:48 +02:00
Antoine R. Dumont (@ardumont)
267ce9b463
swh.lister.pypi: Adapt task creation dict according to latest dev 2018-08-01 14:52:35 +02:00