Commit graph

98 commits

Author SHA1 Message Date
Antoine R. Dumont (@ardumont)
fdb420238c
gitlab: Allow ingestion of hg_git origins as hg ones
Related to T3581#70593
2021-09-17 12:17:11 +02:00
Antoine R. Dumont (@ardumont)
4e4edee478
gitlab: Allow listing of instances providing multiple vcs_type
This allows listing the foss.heptapod.net instance, for example.

Related to T3581
2021-09-16 18:36:25 +02:00
Antoine Lambert
e904f4760e gitlab: Handle HTTP status code 500 when listing projects
GitLab API can return errors 500 when listing projects
(see https://gitlab.com/gitlab-org/gitlab/-/issues/262629).

To avoid ending the listing prematurely, skip buggy URLs and move
to next pages.

Related to T3442
2021-07-23 15:07:16 +02:00
Antoine Lambert
52c3150155 gitlab: Update requests query parameters
Increase the number of origins per page to the maximum value allowed
by the GitLab API (100) to send fewer requests.

Ask for simple responses to reduce size of JSON data.
2021-07-23 14:05:38 +02:00
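A minimal sketch of the query parameters described in 52c3150155, assuming the GitLab projects endpoint takes `per_page` and `simple`. The helper name `projects_url` and the example host are hypothetical and are not taken from the lister code.

```python
from urllib.parse import urlencode


def projects_url(api_base: str, per_page: int = 100) -> str:
    """Build a project listing URL (hypothetical helper, not the lister's code)."""
    # 100 is the maximum page size mentioned in the commit message;
    # simple=true asks for the reduced JSON representation of each project.
    params = {"per_page": per_page, "simple": "true"}
    return f"{api_base.rstrip('/')}/projects?{urlencode(params)}"


print(projects_url("https://gitlab.example.org/api/v4/"))
```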
Antoine Lambert
73f85c0b8a gitlab: Adapt requests retry policy to consider HTTP 50x status codes
Temporary server failures can happen when listing a GitLab instance;
HTTP status codes 502, 503 or 520 are returned in that case.

So adapt the lister's requests retry policy to execute requests again when
such errors are encountered.

Related to T3442
2021-07-23 13:51:17 +02:00
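The retry idea in 73f85c0b8a can be illustrated with a small sketch. The names `TRANSIENT_STATUS_CODES`, `is_transient_failure` and `fetch_with_retry` are assumptions made for this example; the actual lister relies on its own retry machinery, which is not shown in the commit message.

```python
from typing import Callable

# Status codes the commit message lists as temporary server failures.
TRANSIENT_STATUS_CODES = frozenset({502, 503, 520})


def is_transient_failure(status_code: int) -> bool:
    """Return True when the request is worth executing again."""
    return status_code in TRANSIENT_STATUS_CODES


def fetch_with_retry(fetch: Callable[[], int], max_attempts: int = 3) -> int:
    """Call `fetch` (which returns a status code) until it stops failing
    transiently or `max_attempts` calls have been made."""
    status = fetch()
    attempts = 1
    while is_transient_failure(status) and attempts < max_attempts:
        status = fetch()
        attempts += 1
    return status
```

A production policy would normally also wait between attempts; the sketch leaves that out to stay short.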
Antoine Lambert
6c12350863 pattern: Use URL network location as instance name when not provided
Make the instance parameter of the base pattern lister optional and set
lister name to URL network location when not provided.

It simplifies lister creation when the associated forge type has many
instances in the wild (e.g. gitlab or cgit) while giving more details
about the listed forge instance.

Also process listers for forges with multiple instances (cgit, gitea,
gitlab, phabricator and tuleap) to ensure the URL network location is
used when the instance parameter is not provided.

Related to T3403
2021-07-13 12:33:49 +02:00
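The fallback described in 6c12350863 can be sketched with the standard library. The function name `instance_name` is hypothetical; only the behaviour (use the URL network location when no instance is given) comes from the commit message.

```python
from typing import Optional
from urllib.parse import urlparse


def instance_name(url: str, instance: Optional[str] = None) -> str:
    """Return the explicit instance name, or the URL network location."""
    if instance is not None:
        return instance
    return urlparse(url).netloc


print(instance_name("https://gitlab.com/api/v4/"))
```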
Antoine R. Dumont (@ardumont)
72be074a79
gitlab: Deal with missing or trailing / in url input 2021-01-28 10:46:58 +01:00
Antoine R. Dumont (@ardumont)
e09ad272d7
tests: Drop unneeded reset instruction
That instruction is also incorrect in the most recent requests_mock version (it fails the
Debian build).
2021-01-27 15:42:57 +01:00
Antoine R. Dumont (@ardumont)
97254a19f2
gitlab: Implement keyset-based pagination listing
The previous offset-based pagination implementation hits a hard-coded server-side limit [1]

[1]

```
{"error":"Offset pagination has a maximum allowed offset of 50000 for requests that return objects of type Project. Remaining records can be retrieved using keyset pagination."}
```

Related to T2994
2021-01-26 16:54:14 +01:00
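With keyset pagination, the client follows the "next" link returned by the server instead of computing page offsets. A minimal sketch, assuming the next page is advertised in an HTTP `Link` header with `rel="next"`; the helper `next_page_url` and the example URLs are hypothetical and do not reproduce the lister's code.

```python
import re
from typing import Optional

# Matches one `<url>; rel="next"` entry inside a Link header value.
_NEXT_LINK = re.compile(r'<([^>]+)>\s*;\s*rel="next"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Extract the URL of the next page, or None when listing is finished."""
    if not link_header:
        return None
    match = _NEXT_LINK.search(link_header)
    return match.group(1) if match else None
```

The listing loop then requests the returned URL until `next_page_url` yields `None`.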
Antoine R. Dumont (@ardumont)
aefb260f76
gitlab: Add support for last_update information during listing 2021-01-26 14:03:20 +01:00
Antoine R. Dumont (@ardumont)
1a19b2c747
gitlab: Support authentication
Related to T2987
2021-01-26 14:03:20 +01:00
Antoine R. Dumont (@ardumont)
bea9d6d147
gitlab: make url mandatory and add type 2021-01-25 19:00:01 +01:00
Antoine Lambert
ea8ecee541 tests: Fix errors after swh-scheduler API update
The PaginatedListedOriginList model has been updated in
rDSCHb93aa5be2c2d5dc2130e1027698f3e1255052d8d and the origins
field has been renamed to results.
2021-01-25 17:11:54 +01:00
Antoine R. Dumont (@ardumont)
02871f16c9
gitlab: Adapt celery task implementations to the new lister api
Related to T2987
2021-01-25 15:08:31 +01:00
Antoine R. Dumont (@ardumont)
ce87a8f7b2
gitlab: Let the lister compute the internal project listing page
Related to T2987
2021-01-25 14:05:34 +01:00
Antoine R. Dumont (@ardumont)
7f1609265f
test: Rename internal method to something public
It's used in multiple module tests now.
2021-01-25 13:39:07 +01:00
Antoine R. Dumont (@ardumont)
d3fe3d5747
gitlab: Fix mypy issue 2021-01-25 13:39:07 +01:00
Antoine R. Dumont (@ardumont)
2246d28606
gitlab: Document the lister constructor parameters
Related to T2987
2021-01-25 13:30:45 +01:00
Antoine R. Dumont (@ardumont)
b352b8e11e
gitlab: Add test on rate-limit support
Related to T2987
2021-01-25 09:23:22 +01:00
Antoine R. Dumont (@ardumont)
1f911401a1
gitlab: Add test on incremental implementation
Note that the current implementation restarts a new visit from the last
next_page link seen (that is what is stored in the lister state, to avoid recomputing
the url). This means that this page will be seen at least twice: on the first visit
and on the next one. This should not pose any problem as the listing is idempotent.

Related to T2987
2021-01-25 08:51:23 +01:00
Antoine R. Dumont (@ardumont)
84dd616ab6
gitlab: Add test on pagination
Related to T2987
2021-01-25 08:51:23 +01:00
Antoine R. Dumont (@ardumont)
1390a513f2
gitlab: Port to the new lister api
Related to T2987
2021-01-25 08:51:16 +01:00
tenma
5411141e3a gitlab.tests: fix erroneous import from gitea module 2021-01-21 16:17:36 +01:00
Antoine R. Dumont (@ardumont)
20a91482ca
lister.gitlab.tests: Clarify lister configuration 2020-10-30 13:30:15 +01:00
Antoine Lambert
22f7181294 python: Reorder imports with isort
Related to T2610
2020-09-17 17:48:27 +02:00
Antoine R. Dumont (@ardumont)
e3c856b5ee
utils.split_range: Split into not overlapping ranges
Existing listers use the `is_within_bound` [1] method from the base lister.
This method uses inclusive boundaries in all cases.

As some "range" task listers [2] [3] use the `split_range` function to create
"overlapping" ranges, this can cause concurrent insert issues down the line [4].

This commit adapts the function `split_range` to make the generated ranges no
longer overlap.

[1]
https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/core/lister_base.py$194-199

[2]
https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitlab/tasks.py$37-41

[3]
https://forge.softwareheritage.org/source/swh-lister/browse/master/swh/lister/gitea/tasks.py$36-41

Related to T2577
2020-09-10 11:01:44 +02:00
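To illustrate why non-overlapping ranges matter with inclusive boundaries, here is a sketch of one way to split an index space. `split_range_sketch` and its parameters are assumptions made for this example and do not reproduce the signature or exact output of `swh.lister.utils.split_range`.

```python
from typing import Iterator, Tuple


def split_range_sketch(total: int, size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) ranges covering 0..total without overlap."""
    if total < 0 or size <= 0:
        raise ValueError("total must be >= 0 and size must be > 0")
    start = 0
    while start <= total:
        # Both bounds are inclusive, so the next range starts at end + 1.
        end = min(start + size - 1, total)
        yield (start, end)
        start = end + 1


print(list(split_range_sketch(19, 10)))  # [(0, 9), (10, 19)]
```

Because each range begins one past the previous end, no index is handed to two range tasks at once.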
Antoine R. Dumont (@ardumont)
5a5b7ef70b
tests: Separate lister instantiations
Prior to this commit, all listers were instantiated at the same time even if
only one was needed. This commit separates those instantiations.

The only drawback to this is the db model initialization which now happens at
each lister instantiation. This can be dealt with if needed at another time
though.
2020-09-02 12:49:00 +02:00
Antoine R. Dumont (@ardumont)
9437a643ad
pytest: Define plugin and declare it in the root conftest
Then drop all unneeded and indirect imports
2020-09-02 12:25:15 +02:00
Nicolas Dandrimont
c9963d4302 Use the new names for the swh.scheduler test fixtures 2020-07-09 17:06:50 +02:00
David Douard
93a4d8b784 Enable black
- blackify all the python files,
- enable black in pre-commit,
- add a black tox environment.
2020-04-08 16:31:22 +02:00
Gautier Pugnonblanc Yann
e5fea84c55 review corrections 2020-02-20 09:13:49 +01:00
Gautier Pugnonblanc Yann
60adc424be add type annotations in some lister files 2020-02-17 15:58:34 +01:00
Antoine R. Dumont (@ardumont)
5ab9d67d67
core: Align listers' task output (hg/git tasks) with expected format
Related to T2134
Related to D2409
Related to D2410
2019-12-09 15:12:17 +01:00
Antoine R. Dumont (@ardumont)
4a9608f31c
lister/tasks: Standardize return statements
This commit adapts the return statements of both the listers and their
associated tasks. This standardizes on what other modules (e.g. both dvcs and
package loaders) do.
2019-12-02 15:49:38 +01:00
Nicolas Dandrimont
78105940ff Stop binding tasks to a specific instance of the celery app
The celery.shared_task decorator allows late-binding of tasks to any celery app,
which is well suited for our "task plugin" architecture.
2019-10-18 18:02:25 +02:00
Antoine R. Dumont (@ardumont)
a8cde12d72
tests: Update pytest_plugin according to latest version change 2019-10-14 18:20:15 +02:00
Antoine R. Dumont (@ardumont)
1889875f67
gitlab.lister: Add integration test which checks scheduled tasks
Related T2032
2019-10-12 03:11:31 +02:00
David Douard
b810876ef8 tasks: normalize the url argument name of most lister
Since all the listing tasks accept a url as first argument (whatever the
argument name is), it makes sense to use a simple common argument name for
this. I've chosen 'url' instead of api_baseurl/forge_url/url.

Also kill now useless `new_lister()` functions.
2019-09-04 15:38:01 +02:00
David Douard
8d9deeb8f8 plugins: add support for scheduler's task-type declaration
Add a new register-task-types cli that will create missing task-type entries in the
scheduler according to:

- only create missing task-types (do not update them), but check that the
  backend_name field is consistent,
- each SWHTask-based task declared in a module listed in the 'task_modules'
  plugin registry field will be checked and added if needed; tasks whose name
  starts with an underscore will not be added,
- added task-type will have:
  - the 'type' field is derived from the task's function name (with underscores
    replaced with dashes),
  - the description field is the first line of that function's docstring,
  - default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with
    a simple pattern matching to have decent default values for full/incremental
    tasks),
  - these default values can be overloaded via the 'task_type' plugin registry
    entry.

For this, we had to rename all task names (e.g. `cran_lister` -> `list_cran`).

Comes with some tests.
2019-09-04 15:36:08 +02:00
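The derivation rules listed in 8d9deeb8f8 can be sketched in plain Python. The helper `task_type_from_function` is hypothetical, the defaults from `swh.lister.cli.DEFAULT_TASK_TYPE` are deliberately left out, and the docstring used in the example is invented.

```python
from typing import Callable, Dict, Optional


def task_type_from_function(func: Callable) -> Optional[Dict[str, str]]:
    """Derive a task-type entry from a task function, or None if it is skipped."""
    name = func.__name__
    if name.startswith("_"):
        # Tasks whose name starts with an underscore are not registered.
        return None
    doc = (func.__doc__ or "").strip()
    # The description is the first line of the function's docstring.
    description = doc.splitlines()[0] if doc else ""
    return {"type": name.replace("_", "-"), "description": description}


def list_cran():
    """Full listing of a CRAN instance.

    Longer explanation that is not part of the description.
    """


print(task_type_from_function(list_cran))
```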
David Douard
e3c0ea9d90 implement listers as plugins
Listers are declared as plugins via the `swh.workers` entry_point.

As such, the registry function is expected to return a dict with the
`task_modules` field (as for generic worker plugins), plus:

- `lister`: the lister class,
- `models`: list of SQLAlchemy models used by this lister,
- `init` (optional): hook (callable) used to initialize the lister's state
  (typically, create/initialize the database for this lister).
  If not set, the default implementation creates database tables (after
  optionally having deleted existing ones) according to models declared in
  the `models` register field.

There is no need to explicitly add lister task modules in the main
`conftest` module, but any new/extra lister to be tested must be registered
(the tested lister module must be properly installed in the test environment).

Also refactor the cli tools a bit:
- add support for the standard --config-file option at the 'lister' group
  level,
- move the --db-url to the 'lister' group,
- drop the --lister option for the `swh lister db-init` cli tool:
  initializing (especially with --drop-tables) the database for a single
  lister is unreliable, since all tables are created using a single MetaData
  (in the same namespace).
2019-09-03 15:02:24 +02:00
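The registry dict described in e3c0ea9d90 might look like the following minimal sketch. `ExampleLister`, `ExampleModel`, the `example.tasks` module path and the `missing_fields` helper are all hypothetical names used for illustration; only the field names `lister`, `models`, `task_modules` and `init` come from the commit message.

```python
from typing import Any, Dict, List


class ExampleModel:
    """Stand-in for an SQLAlchemy model (hypothetical)."""


class ExampleLister:
    """Stand-in for a lister class (hypothetical)."""


def register() -> Dict[str, Any]:
    """Registry function a lister plugin would expose via its entry point."""
    return {
        "lister": ExampleLister,
        "models": [ExampleModel],
        "task_modules": ["example.tasks"],
        # "init" is optional and omitted here, so the default
        # table-creation behaviour would apply.
    }


REQUIRED_FIELDS = ("lister", "models", "task_modules")


def missing_fields(registry: Dict[str, Any]) -> List[str]:
    """List the required registry fields that are absent."""
    return [field for field in REQUIRED_FIELDS if field not in registry]
```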
David Douard
befe9a6d57 gitlab: make GitLabLister's api_baseurl init argument optional
and simplify the constructor code a bit.
2019-09-02 12:29:38 +02:00
Valentin Lorentz
52b1de87c5 Finish dropping the 'description' column.
I missed some in aef7d5952e.
2019-06-26 14:46:27 +02:00
Antoine R. Dumont (@ardumont)
3d473c307c
lister: Type correctly the 'indexable' column
instead of converting that column to a string

As a side effect, for bitbucket, we were improperly providing the after query
parameter as a date that was not url encoded. This resulted in improper api responses
from bitbucket (from time to time we received the same next index as the current
one).

Related T1826
2019-06-26 10:58:54 +02:00
Antoine R. Dumont (@ardumont)
b99617f976
relister: Fix consistently the behavior for the first time relisting
If nothing has been done prior to a full relisting, there is actually nothing
to list. So the relister in question does nothing.

In that context, the IndexingLister class's `db_partition_indices` method now
returns an empty list instead of raising a ValueError when there is nothing to
list.

Related T1826
Related e129e48
2019-06-25 14:48:17 +02:00
Antoine R. Dumont (@ardumont)
ecdce4b0cc
gitlab.lister: Remove request_params method override
This should have been removed along with the code in b816212.

The request authentication has been reworked so that all listers use the same
credentials dict.

Related b816212
Related T1772
2019-06-14 18:26:36 +02:00
Antoine R. Dumont (@ardumont)
b81621274b
lister: Unify credentials structure between listers
Credentials become a dictionary keyed by <lister-name>, whose values are
dictionaries keyed by <instance-name>, whose values are lists of username/password dicts.

Related T1772
2019-05-29 14:00:11 +02:00
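The unified credentials structure from b81621274b could be represented as below. The lister name, instance name, account values and the `credentials_for` helper are invented for illustration; only the nesting (lister name, then instance name, then a list of username/password dicts) follows the commit message.

```python
from typing import Dict, List

Credentials = Dict[str, Dict[str, List[Dict[str, str]]]]

# Hypothetical values, shown only to make the nesting concrete.
credentials: Credentials = {
    "gitlab": {
        "gitlab.example.org": [
            {"username": "alice", "password": "not-a-real-secret"},
        ],
    },
}


def credentials_for(
    creds: Credentials, lister_name: str, instance_name: str
) -> List[Dict[str, str]]:
    """Return the accounts configured for one lister instance (possibly none)."""
    return creds.get(lister_name, {}).get(instance_name, [])


print(credentials_for(credentials, "gitlab", "gitlab.example.org"))
```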
David Douard
e5c3559033 tasks: fix handling of unsupported promise.save() calls
the exception can also be an AttributeError.

Also do not reraise this exception (in github/tasks.py). This promise
saving feature is used for tests.
2019-04-11 11:03:48 +02:00
David Douard
f670de298f Remove debug logging from tasks' code
since this is now handled by the SWHTask itself.
2019-01-17 13:58:29 +01:00
David Douard
e31b61bee1 Do not crash range tasks if celery result backend does not support saving the group's state 2019-01-15 15:32:07 +01:00
David Douard
f46f3e2015 Remove explicit setting of the task base class
since it's now the default base class in swh-scheduler (>= 0.0.39)
2019-01-10 09:55:17 +01:00