35 commits

Author SHA1 Message Date
Antoine R. Dumont (@ardumont)
394658e53b
cgit.tests: Check the tasks from the scheduler 2019-10-09 17:57:57 +02:00
David Douard
b810876ef8 tasks: normalize the url argument name of most listers
Since all the listing tasks accept a URL as first argument (whatever the
argument name is), it makes sense to use a single common argument name for
it. I've chosen 'url' instead of api_baseurl/forge_url/url.

Also kill the now useless `new_lister()` functions.
2019-09-04 15:38:01 +02:00
David Douard
8d9deeb8f8 plugins: add support for scheduler's task-type declaration
Add a new register-task-types cli that will create missing task-type entries in the
scheduler according to the following rules:

- only create missing task-types (do not update them), but check that the
  backend_name field is consistent,
- each SWHTask-based task declared in a module listed in the 'task_modules'
  plugin registry field will be checked and added if needed; tasks whose names
  start with an underscore will not be added,
- an added task-type will have:
  - a 'type' field derived from the task's function name (with underscores
    replaced with dashes),
  - a description field set to the first line of that function's docstring,
  - default values as provided by swh.lister.cli.DEFAULT_TASK_TYPE (with
    simple pattern matching to get decent default values for full/incremental
    tasks),
  - these default values can be overridden via the 'task_type' plugin registry
    entry.

For this, we had to rename all task names (e.g. `cran_lister` -> `list_cran`).

Comes with some tests.
2019-09-04 15:36:08 +02:00
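The derivation rules listed in the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions: `PLACEHOLDER_DEFAULTS`, its values, `task_type_from_function` and `list_demo_incremental` are hypothetical names invented for this sketch, not the actual content of swh.lister.cli.DEFAULT_TASK_TYPE or the real implementation.

```python
from typing import Callable, Dict, Optional

# Hypothetical placeholder values; the real defaults live in
# swh.lister.cli.DEFAULT_TASK_TYPE and are not reproduced here.
PLACEHOLDER_DEFAULTS: Dict[str, Dict[str, str]] = {
    "*": {"default_interval": "7 days"},
    "full": {"default_interval": "90 days"},
    "incremental": {"default_interval": "1 day"},
}


def task_type_from_function(
    func: Callable, overrides: Optional[Dict[str, str]] = None
) -> Optional[Dict[str, str]]:
    """Build a task-type entry from a task function (illustrative only)."""
    name = func.__name__
    # Tasks whose name starts with an underscore are not registered.
    if name.startswith("_"):
        return None
    entry = dict(PLACEHOLDER_DEFAULTS["*"])
    # Simple pattern matching on the function name for full/incremental tasks.
    for pattern in ("full", "incremental"):
        if pattern in name:
            entry.update(PLACEHOLDER_DEFAULTS[pattern])
    # 'type' is the function name with underscores replaced with dashes.
    entry["type"] = name.replace("_", "-")
    # The description is the first line of the docstring (empty if none).
    doc = (func.__doc__ or "").strip()
    entry["description"] = doc.splitlines()[0] if doc else ""
    # Values from the plugin's 'task_type' registry entry take precedence.
    entry.update(overrides or {})
    return entry


def list_demo_incremental():
    """Incremental listing of a demo forge.

    Longer explanation that is not part of the description.
    """


print(task_type_from_function(list_demo_incremental))
```

Applying the overrides last mirrors the rule that the 'task_type' plugin registry entry wins over the pattern-matched defaults.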
David Douard
e3c0ea9d90 implement listers as plugins
Listers are declared as plugins via the `swh.workers` entry_point.

As such, the registry function is expected to return a dict with the
`task_modules` field (as for generic worker plugins), plus:

- `lister`: the lister class,
- `models`: list of SQLAlchemy models used by this lister,
- `init` (optional): hook (callable) used to initialize the lister's state
  (typically, create/initialize the database for this lister).
  If not set, the default implementation creates database tables (after
  optionally having deleted existing ones) according to the models declared in
  the `models` register field.

There is no need to explicitly add lister task modules in the main
`conftest` module, but any new/extra lister to be tested must be registered
(the tested lister module must be properly installed in the test environment).

Also refactor a bit the cli tools:
- add support for the standard --config-file option at the 'lister' group
  level,
- move the --db-url to the 'lister' group,
- drop the --lister option for the `swh lister db-init` cli tool:
  initializing (especially with --drop-tables) the database for a single
  lister is unreliable, since all tables are created using a single MetaData
  (in the same namespace).
2019-09-03 15:02:24 +02:00
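As a rough illustration of the registry dict described in the commit above, a plugin's registry function might look like the sketch below. Everything named here is an assumption made for the example: `ExampleLister`, `ExampleModel`, the `example_lister.tasks` module path, the `init(models, drop_tables)` hook signature and the `default_init` stand-in (which only reports what it would do instead of touching a database). In practice the models would be SQLAlchemy models.

```python
from typing import Any, Callable, Dict, List


class ExampleModel:
    """Stand-in for a SQLAlchemy model (hypothetical)."""

    __tablename__ = "example_repo"


class ExampleLister:
    """Stand-in for a lister class (hypothetical)."""

    LISTER_NAME = "example"


def register() -> Dict[str, Any]:
    """Registry function a plugin would expose via the `swh.workers` entry point."""
    return {
        "lister": ExampleLister,
        "models": [ExampleModel],
        "task_modules": ["example_lister.tasks"],  # hypothetical module path
        # "init" is optional and deliberately left out here
    }


def default_init(models: List[type], drop_tables: bool = False) -> List[str]:
    """Placeholder default: report which tables would be dropped/created."""
    actions = []
    for model in models:
        if drop_tables:
            actions.append(f"drop {model.__tablename__}")
        actions.append(f"create {model.__tablename__}")
    return actions


def init_lister(entry: Dict[str, Any], drop_tables: bool = False) -> List[str]:
    """Use the plugin's `init` hook if present, else the default behaviour."""
    init: Callable = entry.get("init", default_init)
    return init(entry["models"], drop_tables)


print(init_lister(register(), drop_tables=True))
```

Falling back to a default when `init` is absent keeps simple listers free of boilerplate while still letting a lister with special needs take over its own initialization.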
David Douard
c67a926f26 npm: make NpmVisitModel use the main declarative base class from core.models
This is needed by the (refactored) db init mechanism, since the latter uses
the main declarative base class (thus the main MetaData instance) to gather
tables to be created/dropped.
2019-09-03 15:02:24 +02:00
David Douard
3816b4d3bf cgit: rewrite the CGit lister
Simplify the code:
- do only inherit from ListerBase
- implement HTTP queries directly using requests
- get rid of convoluted code

Gather the origin_url from the git repo's "project" page instead of
building it from the 'url_prefix' hack. The lister WILL now make substantially
more requests, since it makes one request per listed git repo, but
the provided origin_url should be pretty reliable.

When several URLs are provided as clonable URLs, choose the http/https one first;
otherwise, choose the first one in the list.

Add proper tests for the cgit lister.

Also, get rid of the 'time_updated' column in the model.
2019-09-02 12:29:31 +02:00
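The URL selection rule stated in the commit above (prefer an http/https clone URL, otherwise take the first one) can be sketched as below. The function name `pick_origin_url` and the sample URLs are hypothetical; this is not the lister's actual code.

```python
from typing import List, Optional


def pick_origin_url(clone_urls: List[str]) -> Optional[str]:
    """Prefer the first http/https URL; otherwise fall back to the first URL."""
    for url in clone_urls:
        if url.startswith(("http://", "https://")):
            return url
    # No http(s) URL found: use the first entry, or None for an empty list.
    return clone_urls[0] if clone_urls else None


urls = ["git://git.example.org/repo.git", "https://git.example.org/repo.git"]
print(pick_origin_url(urls))
```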
Antoine R. Dumont (@ardumont)
4b2ab0488a
cli: Unify new_lister method name to get_lister 2019-08-28 16:29:26 +02:00
Antoine R. Dumont (@ardumont)
e0664c10cd
lister.cli: Allow to list forges with policy and priority
Example use case:

swh lister run --lister gitlab \
               --priority high \
               --policy oneshot \
               --db-url postgresql://postgres@localhost:5432/swh-listers \
               api_baseurl=https://gitlab.ow2.org/api/v4/

Related T1919
2019-08-28 16:29:26 +02:00
Archit Agrawal
5727f15cf3 swh.lister.packagist
Implement a packagist lister to list the
names and metadata url of all the
packages.

Closes 1776
2019-07-19 19:59:30 +05:30
Archit Agrawal
0bf24469b7 swh.lister.cgit: Remove repo page visit step
Remove the need to visit every page and extract the
origin url by introducing a parameter url_prefix.
The origin url has the format <prefix>/<repo_name>, where
the prefix is the same for all the repos of a particular
cgit instance.
2019-06-28 20:02:07 +05:30
Archit Agrawal
b972a2a88d swh.lister.cgit
Implemented a lister to list the repos for a given CGit instance.

Closes T1659
2019-06-28 19:27:25 +05:30
Archit Agrawal
a9a37a85bf swh.lister.cran
Add a lister to list all the CRAN packages.
It uses the built-in API of the R language to list the packages
and get their metadata.

Closes T1709
2019-06-11 21:26:31 +05:30
Archit Agrawal
151f6cd223 swh.lister.gnu
Implement first pass of gnu lister to list all the
packages present in https://ftp.gnu.org/
Add GNU lister in README and cli.py

Closes T1722
2019-06-08 21:56:00 +05:30
David Douard
d6169c7141 cli: register the 'lister' cli subcommand
also add a cli group named 'lister' for the sake of consistency with
other swh packages and rename the command as 'db-init', like:

  swh lister db-init LISTER [...]
2019-05-22 13:36:57 +02:00
archit
fedfd73c8e swh.lister.phabricator
Add a lister of all hosted repositories on a Phabricator instance
Closes T808
2019-05-15 19:54:33 +05:30
Antoine Lambert
977d2459c3 Remove references to the no longer used lister_db_url conf entry 2019-05-13 15:18:56 +02:00
Antoine Lambert
8ffd8dadef cli: Fix initialization for all listers
Prior to this commit, initializing all listers was failing after the
debian lister processing because of global insert_minimum_data init

Related T1629
2019-04-10 11:33:55 +02:00
Antoine R. Dumont (@ardumont)
262c297a5e
lister.cli: Fix spelling typo 2019-02-14 10:47:28 +01:00
David Douard
0720c8e12e Kill (useless) --create-tables and --with-data cli command options 2019-02-06 15:38:11 +01:00
David Douard
4f2580a97c Make the --lister option of the cli tool a variadic argument
and add an 'all' possible value for it, so that one can initialize the
databases for all listers at once.
2019-02-06 10:19:26 +01:00
David Douard
72658ff720 cli: fix debian lister so it also uses config overrides 2018-12-20 15:01:18 +01:00
Antoine Lambert
ffe4ac9a3c swh.lister.npm: Add an incremental npm lister
This new lister makes it possible to fetch only the npm packages added or
updated since the last listing operation.

Related T1378
Closes T1398
2018-12-03 17:58:27 +01:00
Antoine Lambert
605a67f51d swh.lister.npm: Add a lister of all available packages in the npm registry
Related T1378
Closes T1380
2018-11-26 11:04:13 +01:00
Antoine R. Dumont (@ardumont)
ed64d24634
pypi.lister: Normalize pypi name to PyPI
Related T422
2018-09-14 13:24:48 +02:00
Antoine R. Dumont (@ardumont)
6ff3b90859
swh.lister.pypi: Add a pypi lister implementation using xmlrpc api
Based solely on pypi's deprecated xmlrpc api [1].  No other way of listing
pypi.org is referenced (except for parsing an html page through a
legacy api [2])

[1] https://warehouse.readthedocs.io/api-reference/xml-rpc/#pypi-s-xml-rpc-methods

[2] https://pypi.python.org/simple/

Related T422
2018-08-01 10:25:21 +02:00
Antoine R. Dumont (@ardumont)
b6b588dbbb
lister.cli: Insert optional flag to permit post insert data
That's needed for example for having the minimum necessary to make the
debian lister run.
2018-07-27 11:13:13 +02:00
Antoine R. Dumont (@ardumont)
2c69b586bc
Revert "lister.cli: Insert optional flag to permit post insert data"
This reverts commit e61512afc7.

This commit contains one section not supposed to be there yet (pypi is
not ready)
2018-07-27 11:12:01 +02:00
Antoine R. Dumont (@ardumont)
e61512afc7
lister.cli: Insert optional flag to permit post insert data
That's needed for example for having the minimum necessary to make the
debian lister run.
2018-07-27 11:06:22 +02:00
Antoine R. Dumont (@ardumont)
726d45b182
swh.lister.cli: Factorize supported listers 2018-07-27 10:21:38 +02:00
Antoine R. Dumont (@ardumont)
6c54b64a8f
swh.lister.cli: Add debian lister to the list of supported listers 2018-07-27 10:19:48 +02:00
Antoine R. Dumont (@ardumont)
63cff5b337
lister.cli: Fix broken imports 2018-07-17 15:48:48 +02:00
Antoine R. Dumont (@ardumont)
4c4aa0ead2
swh.lister: Make LISTER_NAME a class attribute
swh.lister.gitlab: make the 'instance' a constructor parameter
2018-07-11 17:43:41 +02:00
Antoine R. Dumont (@ardumont)
581028cfc5
swh.lister.cli: Fix cli docstring 2018-07-11 15:56:33 +02:00
Antoine R. Dumont (@ardumont)
3e62bc867e
swh.lister.cli: Simplify cli 2018-07-11 09:45:51 +02:00
Antoine R. Dumont (@ardumont)
afcd6997c4
swh.lister.cli: Add a basic cli to deal with create/drop db actions 2018-07-03 15:49:52 +02:00