Commit graph

483 commits

Author SHA1 Message Date
Antoine R. Dumont (@ardumont)
85d001067a
setup.py: Kill deprecated swh-lister command
Prior to this commit, the pip activation environment failed because the old cli
name no longer exists; it is named 'lister' now.
2019-09-20 11:11:27 +02:00
Antoine Lambert
04d8fdf8df github/lister: Prevent erroneous scheduler tasks disabling
Closes T2014
2019-09-19 14:30:30 +02:00
Antoine Lambert
7572228f7c listers: Ensure run can be called without bounds arguments
Closes T2001
2019-09-17 15:09:04 +02:00
Antoine Lambert
4c8d7baf75 phabricator/lister: Prevent erroneous scheduler tasks disabling
Previously, the Phabricator lister was disabling some loading tasks while it was not
supposed to. More precisely, due to an invalid index provided to a database query,
the latest created scheduler task was disabled each time a new page of results was
provided to the lister by the Phabricator API. Moreover, database queries were not
filtered according to the Phabricator instance resulting in possible disabling of
scheduler tasks from other instances.

Closes T2000
2019-09-16 20:05:48 +02:00
Antoine Lambert
e83902c2a3 phabricator/lister: Fix get_next_target_from_response return type
Without that fix, errors are raised when one wants to list Phabricator repositories
in a specific index range. The issue is due to a comparison between a string and
an integer. So convert next extracted repository index to integer to match the
corresponding model type.

Closes T1997
2019-09-16 13:36:46 +02:00
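The type fix described in e83902c2a3 can be illustrated with a minimal sketch. The response shape (`result` -> `cursor` -> `after`) is an assumption made for this example and is not taken from the commit; only the function name comes from the commit subject.

```python
from typing import Optional


def get_next_target_from_response(body: dict) -> Optional[int]:
    """Return the next repository index as an int, or None if there is none.

    Assumption: the decoded API response stores the next index as a string
    under body["result"]["cursor"]["after"].
    """
    after = body.get("result", {}).get("cursor", {}).get("after")
    if after is None:
        return None
    # Convert to int so later comparisons against integer bounds do not mix
    # str and int, which raises TypeError in Python 3.
    return int(after)
```

With this conversion, a range check such as `next_index <= max_bound` compares two integers rather than a string and an integer.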
Antoine Lambert
1ebe762ea6 phabricator/lister: Do not override max_index when bootstrapping
Turns out all newly listed repositories were filtered out because of that.
Consequently, no entries in the listers database and no scheduler loading
tasks were created when listing a Phabricator instance.

Closes T1999
2019-09-16 13:34:22 +02:00
Antoine Lambert
7c8f4dc9a8 packagist/lister: Fix typos in docstring 2019-09-12 20:46:42 +02:00
Antoine R. Dumont (@ardumont)
7377439c5e
MANIFEST.in: Include cgit tests data folder 2019-09-09 12:13:45 +02:00
Antoine R. Dumont (@ardumont)
481b30c540
docs: Fix toc
Related T1984
2019-09-06 12:31:06 +02:00
David Douard
780c0ef999 lister/base: remove the reference to the storage from ListerBase
it is not used anymore.
2019-09-05 10:39:50 +02:00
David Douard
b810876ef8 tasks: normalize the url argument name of most listers
Since all the listing tasks accept a URL as first argument (whatever the
argument name is), it makes sense to use a simple common argument name for
this. I've chosen 'url' instead of api_baseurl/forge_url/url.

Also kill now useless `new_lister()` functions.
2019-09-04 15:38:01 +02:00
David Douard
631b8e7668 models: use the same declarative base class for all models
This is needed to fix the db-init implementation so the debian loader (which
does use the SQLBase from swh.storage) has its models declared in the
MetaData used by the initialize() function.
2019-09-04 15:37:40 +02:00
David Douard
bd11830328 cgit: reduce the batch size to 10 and add a bit of logging
Since the CGit lister now performs an HTTP query for each git repo listed in
the main index, it is significantly slower, so reducing the time between
database commits makes sense, and won't overload the database.

A bit of logging makes it easier to follow/debug the progress of
a listing.
2019-09-04 15:37:40 +02:00
David Douard
8d9deeb8f8 plugins: add support for scheduler's task-type declaration
Add a new register-task-types cli that will create missing task-type entries in the
scheduler according to:

- only create missing task-types (do not update them), but check that the
  backend_name field is consistent,
- each SWHTask-based task declared in a module listed in the 'task_modules'
  plugin registry field will be checked and added if needed; tasks whose name
  starts with an underscore will not be added,
- each added task-type will have:
  - a 'type' field derived from the task's function name (with underscores
    replaced with dashes),
  - a description field set to the first line of that function's docstring,
  - default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with
    a simple pattern matching to have decent default values for full/incremental
    tasks),
  - these default values can be overloaded via the 'task_type' plugin registry
    entry.

For this, we had to rename all task names (e.g. `cran_lister` -> `list_cran`).

Comes with some tests.
2019-09-04 15:36:08 +02:00
David Douard
e3c0ea9d90 implement listers as plugins
Listers are declared as plugins via the `swh.workers` entry_point.

As such, the registry function is expected to return a dict with the
`task_modules` field (as for generic worker plugins), plus:

- `lister`: the lister class,
- `models`: list of SQLAlchemy models used by this lister,
- `init` (optional): hook (callable) used to initialize the lister's state
  (typically, create/initialize the database for this lister).
  If not set, the default implementation creates database tables (after
  optionally having deleted existing ones) according to models declared in
  the `models` register field.

There is no need to explicitly add lister task modules in the main
`conftest` module, but any new/extra lister to be tested must be registered
(the tested lister module must be properly installed in the test environment).

Also refactor a bit the cli tools:
- add support for the standard --config-file option at the 'lister' group
  level,
- move the --db-url to the 'lister' group,
- drop the --lister option for the `swh lister db-init` cli tool:
  initializing (especially with --drop-tables) the database for a single
  lister is unreliable, since all tables are created using a single MetaData
  (in the same namespace).
2019-09-03 15:02:24 +02:00
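A registry function following the contract described in e3c0ea9d90 might look like the sketch below. The class names, the module path `example_lister.tasks` and the entry-point line in the comment are hypothetical placeholders; only the dict keys (`lister`, `models`, `task_modules`, optional `init`) come from the commit message.

```python
class ExampleLister:
    """Placeholder standing in for a real lister class (hypothetical)."""


class ExampleModel:
    """Placeholder standing in for an SQLAlchemy model (hypothetical)."""


def register():
    """Registry function exposed through the `swh.workers` entry point.

    A package would typically reference it from its setup.py, e.g.
    (hypothetical names):
        entry_points={"swh.workers": ["lister.example = example_lister:register"]}
    """
    return {
        "lister": ExampleLister,
        "models": [ExampleModel],
        "task_modules": ["example_lister.tasks"],
        # "init" is optional; when absent, the default implementation
        # creates the tables declared in "models".
    }
```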
David Douard
c67a926f26 npm: make NpmVisitModel use the main declarative base class from core.models
This is needed by the (refactored) db init mechanism, since the latter uses
the main declarative base class (thus the main MetaData instance) to gather
tables to be created/dropped.
2019-09-03 15:02:24 +02:00
David Douard
342964eda7 phabricator: fix the FullPhabricatorLister task
forgot the forge_url -> api_baseurl renaming in there.
2019-09-03 12:01:55 +02:00
David Douard
8785fc1a4e cgit: fix cgit's task module and tests
forgot some `url_prefix` there.
2019-09-03 12:01:55 +02:00
David Douard
87cec2f5c3 phabricator: refactor PhabricatorLister's constructor
- use the 'standard' api_baseurl as init argument,
- make it optional, with default to forge.softwareheritage.org,
- use origin_url as id.
2019-09-02 12:29:38 +02:00
David Douard
befe9a6d57 gitlab: make GitLabLister's api_baseurl init argument optional
and simplify a bit the code of the constructor.
2019-09-02 12:29:38 +02:00
David Douard
b87cd5d309 github: make GitHubLister's api_baseurl init argument optional 2019-09-02 12:29:38 +02:00
David Douard
8950b0b32d bitbucket: make BitBucketLister's api_baseurl init argument optional 2019-09-02 12:29:38 +02:00
David Douard
22f2f2c43c core: make it possible to specify the api_baseurl init argument in override_config
This is required to make lister class instantiation easier and more
reliable, especially in the context of cli tools like 'swh lister run', for which
we want to be able to specify any lister init argument as extra parameter of the
command.
2019-09-02 12:29:38 +02:00
David Douard
3816b4d3bf cgit: rewrite the CGit lister
Simplify the code:
- do only inherit from ListerBase
- implement HTTP queries directly using requests
- get rid of convoluted code

Make the origin_url gathered from the git repo's "project" page instead of
building it from the 'url_prefix' hack. Now, the lister WILL make substantially
more requests, since it will make one request per listed git repo, but
the provided origin_url should be pretty reliable now.

When several URLs are provided as clonable URLs, choose the http/https one first,
otherwise, choose the first one of the list.

Add proper tests for the cgit lister.

Also, get rid of the 'time_updated' column in the model.
2019-09-02 12:29:31 +02:00
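The URL selection rule from 3816b4d3bf (prefer an http/https clone URL, otherwise take the first one) can be sketched like this. The function name `pick_origin_url` is hypothetical and is not the lister's actual method.

```python
from typing import List, Optional


def pick_origin_url(clone_urls: List[str]) -> Optional[str]:
    """Pick the origin URL among the clone URLs listed for a repository.

    Hypothetical helper: prefers the first http/https URL and falls back
    to the first URL of the list. Returns None for an empty list.
    """
    for url in clone_urls:
        if url.startswith(("http://", "https://")):
            return url
    return clone_urls[0] if clone_urls else None
```

For instance, given `["git://example.org/repo.git", "https://example.org/repo.git"]`, the https URL is selected even though it is listed second.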
David Douard
e0ce68377d bitbucket: simplify a bit BitBucketLister's constructor
get rid of the "smart" flush_packet_db computation.
2019-08-30 17:56:19 +02:00
David Douard
d807d15f65 phabricator: randomly select the API token in the provided list
instead of picking the first one, so this behavior is consistent with
ListerHttpTransport's one.
2019-08-30 17:56:19 +02:00
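As a rough illustration of the change in d807d15f65, picking a token at random instead of always taking the first one could look like the sketch below. The function name and the flat list-of-strings shape of the credentials are assumptions for this example.

```python
import random
from typing import List, Optional


def pick_api_token(tokens: List[str]) -> Optional[str]:
    """Return one token chosen at random, or None if no token is configured.

    Hypothetical helper; the real credentials structure may differ.
    """
    if not tokens:
        return None
    return random.choice(tokens)
```

Choosing at random spreads requests across the configured tokens rather than sending every request with the first one.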
David Douard
814779404c phabricator: small refactoring/simplification of the request_params method
and get rid of the unneeded _build_query_params method.
2019-08-30 17:56:19 +02:00
David Douard
83d138759c phabricator: kill PhabricatorLister's api_token argument
stick to the existing credentials mechanism provided by ListerHttpTransport.
2019-08-30 17:56:19 +02:00
David Douard
6f56d2c8d7 core: move credentials' docstring from request_params to request_instance_credentials
and fix empty values returned by the latter (empty list instead of empty dict).
2019-08-30 17:56:19 +02:00
Antoine R. Dumont (@ardumont)
09f3605a7e
docs: Remove spurious blank spaces 2019-08-29 09:57:59 +02:00
Antoine R. Dumont (@ardumont)
4b2ab0488a
cli: Unify new_lister method name to get_lister 2019-08-28 16:29:26 +02:00
Antoine R. Dumont (@ardumont)
dee9fe93bf
cli: Bootstrap tests on cli 2019-08-28 16:29:26 +02:00
Antoine R. Dumont (@ardumont)
e0664c10cd
lister.cli: Allow to list forges with policy and priority
Example use case:

swh lister run --lister gitlab \
               --priority high \
               --policy oneshot \
               --db-url postgresql://postgres@localhost:5432/swh-listers \
               api_baseurl=https://gitlab.ow2.org/api/v4/

Related T1919
2019-08-28 16:29:26 +02:00
Antoine R. Dumont (@ardumont)
87d2a16df0
listers: Allow to override policy and priority for scheduled tasks
Prior to this commit, the policy and priority were hard-coded.
The default values are now the old hard-coded values.

This will allow developing a cli to trigger forge listing with oneshot policy
and some priority tasks, thus ingesting those faster and without manual
intervention as we currently do.
2019-08-28 11:57:10 +02:00
Archit Agrawal
5727f15cf3 swh.lister.packagist
Implement a packagist lister to list the
names and metadata url of all the
packages.

Closes 1776
2019-07-19 19:59:30 +05:30
Archit Agrawal
08ade29e6d swh.lister.pypi: Add tests
Add tests for pypi lister
Closes T1890
2019-07-18 17:13:13 +05:30
Archit Agrawal
f424f07c7e swh.lister.core: Add test for simple lister
There were previously no tests for the listers
which use the SimpleLister class (like pypi).
Refactored test_lister.py of lister core to
accommodate tests for SimpleLister, keeping the tests
for other listers undisturbed.
2019-07-18 17:13:13 +05:30
Stefano Zacchiroli
9c97291abd add code of conduct document 2019-07-11 16:29:36 +02:00
Stefano Zacchiroli
60a6f12bfe CONTRIBUTORS: add Sushant Sushant 2019-07-04 14:41:24 +02:00
Stefano Zacchiroli
bb2dc77788 bitbucket lister: fix typo in docstring 2019-07-04 14:40:02 +02:00
Stefano Zacchiroli
226dfe945f CONTRIBUTORS: add Avi Kelman 2019-07-04 14:39:37 +02:00
Antoine R. Dumont (@ardumont)
6bd5cca151
MANIFEST.in: Include *.txt samples for tests to run during packaging 2019-06-28 18:21:30 +02:00
Antoine R. Dumont (@ardumont)
897a19ad84
MANIFEST.in: Include *.html samples for tests to run 2019-06-28 18:19:21 +02:00
Antoine R. Dumont (@ardumont)
c507948da8
bin: Drop dead code 2019-06-28 18:17:15 +02:00
Antoine R. Dumont (@ardumont)
32c5cf22c2
Add Archit Agrawal as contributors 2019-06-28 17:44:02 +02:00
Archit Agrawal
0bf24469b7 swh.lister.cgit: Remove repo page visit step
Remove the need to visit every page and extract the
origin url by introducing a parameter url_prefix.
The origin url is in the format <prefix>/<repo_name>, where
the prefix is the same for all the repos of a particular
cgit instance.
2019-06-28 20:02:07 +05:30
Archit Agrawal
7e3c79bb1d swh.lister.cgit: Add pagination support
Some cgit instances have pagination. Modify the
lister to find all the pages and list all the repos
from all the pages.
2019-06-28 19:27:25 +05:30
Archit Agrawal
b972a2a88d swh.lister.cgit
Implemented a lister to list the repos for a given CGit instance.

Closes T1659
2019-06-28 19:27:25 +05:30
Antoine Lambert
d85bcdac5b simple_lister: Split models into smaller chunks to avoid oversized db transactions
Related T1659
2019-06-28 15:44:47 +02:00
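The chunking idea behind d85bcdac5b can be sketched as follows. The helper name and the chunk size used in the example are illustrative assumptions, not values taken from the commit.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(items: Sequence[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield successive chunks of at most chunk_size items (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield list(items[start:start + chunk_size])
```

Each chunk can then be written and committed in its own database transaction, which keeps any single transaction bounded in size.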
Archit Agrawal
5ea9d5ed39 swh.lister.cran: Add description in task_dict
Add description in task_dict method because
the only metadata that can be found for a
package at CRAN is its description. That can
only be retrieved from the built-in API in R,
which the lister is already using. Hence, instead of
getting metadata in the loader, it is passed
by the lister.
2019-06-27 14:57:51 +05:30