Commit graph

29 commits

Author SHA1 Message Date
Antoine R. Dumont (@ardumont)
ae17b6b9a0
Make stateless lister constructors compatible with credentials
In effect, it just allows to add credentials to cgit, cran and pypi listers.

This fixes instances of error [1]

[1] https://sentry.softwareheritage.org/share/issue/2c35a9f129cf4982a2dd003a232d507a/

Related to T2998
2021-01-28 14:42:56 +01:00
Vincent SELLIER
f6f9f1ca28
cgit: Don't stop the listing when a repository page is not available
Related to T2988
2021-01-27 14:52:04 +01:00
Vincent SELLIER
91fcde8341
cgit: Add support for last_update information during listing
Related to T2988
2021-01-27 14:17:17 +01:00
Vincent SELLIER
d62e77c1b4
cgit lister: Add missing types on the init method
Related to T2984
2021-01-25 18:52:59 +01:00
Antoine Lambert
ea8ecee541 tests: Fix errors after swh-scheduler API update
The PaginatedListedOriginList model has been updated in
rDSCHb93aa5be2c2d5dc2130e1027698f3e1255052d8d and the origins
field has been renamed to results.
2021-01-25 17:11:54 +01:00
Vincent SELLIER
e4a590fc7f
Port cgit lister to the new lister api
Related to T2984
2021-01-25 14:57:45 +01:00
Antoine R. Dumont (@ardumont)
5d4b38999d
lister.cgit.tests: Clarify lister configuration 2020-10-30 13:30:14 +01:00
Antoine Lambert
22f7181294 python: Reorder imports with isort
Related to T2610
2020-09-17 17:48:27 +02:00
Antoine R. Dumont (@ardumont)
5a5b7ef70b
tests: Separate lister instantiations
Prior to this commit, all listers were instantiated at the same time even if
only one was needed. This commit separates those instantiations.

The only drawback to this is the db model initialization which now happens at
each lister instantiation. This can be dealt with if needed at another time
though.
2020-09-02 12:49:00 +02:00
Antoine R. Dumont (@ardumont)
9437a643ad
pytest: Define plugin and declare it in the root conftest
Then drop all unneeded and indirect imports
2020-09-02 12:25:15 +02:00
Nicolas Dandrimont
c9963d4302 Use the new names for the swh.scheduler test fixtures 2020-07-09 17:06:50 +02:00
David Douard
93a4d8b784 Enable black
- blackify all the python files,
- enable black in pre-commit,
- add a black tox environment.
2020-04-08 16:31:22 +02:00
Gautier Pugnonblanc Yann
e5fea84c55 review corrections 2020-02-20 09:13:49 +01:00
Gautier Pugnonblanc Yann
60adc424be add anotation type in some lister file 2020-02-17 15:58:34 +01:00
Antoine Lambert
99fcd2b3f5 docs: Fix sphinx warnings
Related to T2188
2020-01-17 16:15:11 +01:00
Antoine R. Dumont (@ardumont)
5ab9d67d67
core: Align listers' task output (hg/git tasks) with expected format
Related to T2134
Related to D2409
Related to D2410
2019-12-09 15:12:17 +01:00
Antoine R. Dumont (@ardumont)
4a9608f31c
lister/tasks: Standardize return statements
The following commit adapts the return statements from both lister and their
associated tasks. This standardizes on what other modules (e.g. both dvcs and
package loaders) do.
2019-12-02 15:49:38 +01:00
Nicolas Dandrimont
ff7fdf24db Use a uniform User-Agent on all listers
This also adds tests to make sure that we properly send our version number to
upstreams.
2019-11-22 15:49:23 +01:00
Nicolas Dandrimont
78105940ff Stop binding tasks to a specific instance of the celery app
The celery.shared_task decorator allows late-binding of tasks to any celery app,
which is well suited for our "task plugin" architecture.
2019-10-18 18:02:25 +02:00
Antoine R. Dumont (@ardumont)
a8cde12d72
tests: Update pytest_plugin according to latest version change 2019-10-14 18:20:15 +02:00
Antoine R. Dumont (@ardumont)
394658e53b
cgit.tests: Check the tasks from the scheduler 2019-10-09 17:57:57 +02:00
David Douard
bd11830328 cgit: reduce the batch size to 10 and add a bit of logging
Since the CGit lister now perform an HTTP query for each git repos listed in
the main index, it is significantly slower, so reducing the time between
database commits make sense, and won't overload the database.

With a bit of logging, it makes it easier to follow/debug the progress of
a listing.
2019-09-04 15:37:40 +02:00
David Douard
8d9deeb8f8 plugins: add support for scheduler's task-type declaration
Add a new register-task-types cli that will create missing task-type entries in the
scheduler according to:

- only create missing task-types (do not update them), but check that the
  backend_name field is consistent,
- each SWHTask-based task declared in a module listed in the 'task_modules'
  plugin registry field will be checked and added if needed; tasks which name
  start wit an underscore will not be added,
- added task-type will have:
  - the 'type' field is derived from the task's function name (with underscores
    replaced with dashes),
  - the description field is the first line of that function's docstring,
  - default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with
    a simple pattern matching to have decent default values for full/incremental
    tasks),
  - these default values can be overloaded via the 'task_type' plugin registry
    entry.

For this, we had to rename all tasks names (eg. `cran_lister` -> `list_cran`).

Comes with some tests.
2019-09-04 15:36:08 +02:00
David Douard
e3c0ea9d90 implement listers as plugins
Listers are declared as plugins via the `swh.workers` entry_point.

As such, the registry function is expected to return a dict with the
`task_modules` field (as for generic worker plugins), plus:

- `lister`: the lister class,
- `models`: list of SQLAlchemy models used by this lister,
- `init` (optionnal): hook (callable) used to initialize the lister's state
  (typically, create/initialize the database for this lister).
  If not set, the default implementation creates database tables (after
  optionally having deleted exisintg ones) according to models declared in
  the `models` register field.

There is no need for explicitely add lister task modules in the main
`conftest` module, but any new/extra lister to be tested must be registered
(the tested lister module must be properly installed in the test environment).

Also refactor a bit the cli tools:
- add support for the standard --config-file option at the 'lister' group
  level,
- move the --db-url to the 'lister' group,
- drop the --lister option for the `swh lister db-init` cli tool:
  initializing (especially with --drop-tables) the database for a single
  lister is unreliable, since all tables are created using a sibgle MetaData
  (in the same namespace).
2019-09-03 15:02:24 +02:00
David Douard
8785fc1a4e cgit: fix cgit's task module and tests
forgot some `url_prefix` there.
2019-09-03 12:01:55 +02:00
David Douard
3816b4d3bf cgit: rewrite the CGit lister
Simplify the code:
- do only inherit from ListerBase
- implement HTTP queries directly using requests
- get rid of convoluted code

Make the origin_url gathered from the git repo's "project" page instead of
building it from the 'url_prefix' hack. Now, the lister WILL make substancially
more requests, since it will make one request per listed git repo, but
the provided origin_url should be pretty reliable now.

When several url are provided as clonable URLs, choose the http/https one first,
otherwise, choose the first one of the list.

Add proper tests for the cgit lister.

Also, get rid of the 'time_updated' column in the model.
2019-09-02 12:29:31 +02:00
Archit Agrawal
0bf24469b7 swh.lister.cgit: Remove repo page visit step
Remove the need to visit every page and extract the
origin url by introducing a parameter url_prefix.
The origin url is in format <prefix>/<repo_name> where
The prefix is same for all the repos for a particular
cgit instance.
2019-06-28 20:02:07 +05:30
Archit Agrawal
7e3c79bb1d swh.lister.cgit: Add pagination support
Some cgit instance have a pagination. Modifiy
lister to find all the pages and list all the repos
from all the pages.
2019-06-28 19:27:25 +05:30
Archit Agrawal
b972a2a88d swh.lister.cgit
Implemented a lister to list the repos for a given CGit instance.

Closes T1659
2019-06-28 19:27:25 +05:30