David Douard 8d9deeb8f8 plugins: add support for scheduler's task-type declaration
Add a new register-task-types cli that will create missing task-type entries in the
scheduler according to:

- only create missing task-types (do not update them), but check that the
  backend_name field is consistent,
- each SWHTask-based task declared in a module listed in the 'task_modules'
  plugin registry field will be checked and added if needed; tasks whose names
  start with an underscore will not be added,
- added task-type will have:
  - the 'type' field is derived from the task's function name (with underscores
    replaced with dashes),
  - the description field is the first line of that function's docstring,
  - default values as provided by the swh.lister.cli.DEFAULT_TASK_TYPE (with
    a simple pattern matching to have decent default values for full/incremental
    tasks),
  - these default values can be overloaded via the 'task_type' plugin registry
    entry.
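
The derivation rules listed above can be illustrated with a minimal sketch. It is not the code added by this commit: the function name `task_type_from_function`, the `defaults` and `overrides` parameters, and the `example_default` key are hypothetical, and the sketch assumes overrides are applied to the default values before the 'type' and 'description' fields are filled in. The backend_name consistency check and the full/incremental pattern matching are left out.

```python
def task_type_from_function(func, defaults, overrides=None):
    """Build a task-type entry for func, or return None if it is skipped."""
    name = func.__name__
    # Tasks whose name starts with an underscore are not registered.
    if name.startswith("_"):
        return None
    # The description is the first line of the function's docstring.
    docstring = (func.__doc__ or "").strip()
    description = docstring.splitlines()[0] if docstring else ""
    # Start from the defaults, then apply the plugin-provided overrides
    # (assumed order, see the paragraph above).
    entry = dict(defaults)
    entry.update(overrides or {})
    # The type is the function name with underscores replaced by dashes.
    entry["type"] = name.replace("_", "-")
    entry["description"] = description
    return entry


def list_cran():
    """List the packages published on CRAN.

    Longer explanations are ignored by the sketch.
    """


def _internal_helper():
    """Never registered because of the leading underscore."""


# "example_default" is a placeholder, not a key taken from
# swh.lister.cli.DEFAULT_TASK_TYPE.
entry = task_type_from_function(list_cran, {"example_default": 1})
skipped = task_type_from_function(_internal_helper, {"example_default": 1})
```

With these inputs, `entry["type"]` is `list-cran` and `skipped` is `None`.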

For this, we had to rename all task names (e.g. `cran_lister` -> `list_cran`).

Comes with some tests.
2019-09-04 15:36:08 +02:00

swh-lister

This component of the Software Heritage stack aims to produce listings of software origins and their URLs hosted on various public developer platforms or package managers. As these operations are quite similar, it provides a set of Python modules abstracting common software origin listing behaviors.

It also provides several lister implementations, contained in the following Python modules:

  • swh.lister.bitbucket
  • swh.lister.debian
  • swh.lister.github
  • swh.lister.gitlab
  • swh.lister.gnu
  • swh.lister.pypi
  • swh.lister.npm
  • swh.lister.phabricator
  • swh.lister.cran
  • swh.lister.cgit
  • swh.lister.packagist

Dependencies

All required dependencies can be found in the requirements*.txt files located at the root of the repository.

Local deployment

lister configuration

Each lister implemented so far by Software Heritage must be configured by following the instructions below (replace <lister_name> with one of the lister names introduced above, e.g. github, gitlab, debian, pypi or npm).

Preparation steps

  1. mkdir -p ~/.config/swh/ ~/.cache/swh/lister/<lister_name>/
  2. Create the configuration file ~/.config/swh/lister_<lister_name>.yml
  3. Bootstrap the db instance schema:
     $ createdb lister-<lister_name>
     $ python3 -m swh.lister.cli --db-url postgres:///lister-<lister_name> <lister_name>

Note: This bootstraps a minimum data set needed for the lister to run.
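
For readers who prefer to script the preparation, here is a minimal sketch of the same steps in Python. The helper names `prepare_lister_dirs` and `bootstrap_commands` are hypothetical and not part of swh.lister. The sketch only creates the directories and an empty configuration file, and builds (without running) the two bootstrap commands shown above; it is demonstrated against a temporary directory rather than the real home directory.

```python
import tempfile
from pathlib import Path


def prepare_lister_dirs(lister_name, home):
    """Create the config and cache directories plus an empty config file."""
    home = Path(home)
    config_dir = home / ".config" / "swh"
    cache_dir = home / ".cache" / "swh" / "lister" / lister_name
    # parents=True mirrors `mkdir -p`: missing parent directories are created.
    config_dir.mkdir(parents=True, exist_ok=True)
    cache_dir.mkdir(parents=True, exist_ok=True)
    config_file = config_dir / f"lister_{lister_name}.yml"
    config_file.touch(exist_ok=True)
    return config_file, cache_dir


def bootstrap_commands(lister_name):
    """Return the database bootstrap commands as argument lists (not executed)."""
    db_name = f"lister-{lister_name}"
    return [
        ["createdb", db_name],
        ["python3", "-m", "swh.lister.cli",
         "--db-url", f"postgres:///{db_name}", lister_name],
    ]


# Demonstrate against a throwaway directory instead of the real home.
with tempfile.TemporaryDirectory() as fake_home:
    config_file, cache_dir = prepare_lister_dirs("github", fake_home)
    created = (config_file.is_file(), cache_dir.is_dir())
    relative_config = config_file.relative_to(fake_home).as_posix()

commands = bootstrap_commands("github")
```

To apply it for real, pass `Path.home()` as `home` and run each returned command, for instance with `subprocess.run(cmd, check=True)`.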

Configuration file sample

Minimal configuration shared by all listers, to add to the file ~/.config/swh/lister_<lister_name>.yml:

storage:
  cls: 'remote'
  args:
    url: 'http://localhost:5002/'

scheduler:
  cls: 'remote'
  args:
    url: 'http://localhost:5008/'

lister:
  cls: 'local'
  args:
    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
    db: 'postgresql:///lister-<lister_name>'

credentials: []
cache_responses: True
cache_dir: /home/user/.cache/swh/lister/<lister_name>/

Note: This expects the storage (5002) and scheduler (5008) services to run locally.
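
Before running a lister, it can help to confirm that both services accept connections. The following is a minimal sketch using only the Python standard library. The helper names `service_is_reachable` and `check_services` are hypothetical, and the check only tests that a TCP connection can be opened; it does not verify that the service behind the port is actually the storage or the scheduler.

```python
import socket
from urllib.parse import urlparse

# URLs taken from the configuration sample above.
SERVICES = {
    "storage": "http://localhost:5002/",
    "scheduler": "http://localhost:5008/",
}


def service_is_reachable(url, timeout=1.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    try:
        with socket.create_connection((parsed.hostname, parsed.port),
                                      timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_services(services=SERVICES):
    """Map each service name to whether its port accepts connections."""
    return {name: service_is_reachable(url) for name, url in services.items()}


status = check_services()
```

`status` maps `storage` and `scheduler` to `True` or `False` depending on what is running on the machine.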

lister-github

Once configured, you can execute a GitHub lister using the following instructions in a python3 script:

import logging
from swh.lister.github.tasks import range_github_lister

logging.basicConfig(level=logging.DEBUG)
range_github_lister(364, 365)
...

lister-gitlab

Once configured, you can execute a GitLab lister using the instructions detailed in the python3 scripts below:

import logging
from swh.lister.gitlab.tasks import range_gitlab_lister

logging.basicConfig(level=logging.DEBUG)
range_gitlab_lister(1, 2, {
    'instance': 'debian',
    'api_baseurl': 'https://salsa.debian.org/api/v4',
    'sort': 'asc',
    'per_page': 20
})

import logging
from swh.lister.gitlab.tasks import full_gitlab_relister

logging.basicConfig(level=logging.DEBUG)
full_gitlab_relister({
    'instance': '0xacab',
    'api_baseurl': 'https://0xacab.org/api/v4',
    'sort': 'asc',
    'per_page': 20
})

import logging
from swh.lister.gitlab.tasks import incremental_gitlab_lister

logging.basicConfig(level=logging.DEBUG)
incremental_gitlab_lister({
    'instance': 'freedesktop.org',
    'api_baseurl': 'https://gitlab.freedesktop.org/api/v4',
    'sort': 'asc',
    'per_page': 20
})
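
The three scripts pass a settings dictionary with the same four keys. As a convenience, the dictionary can be built by a small helper. This is only a sketch: `gitlab_lister_settings` is a hypothetical name, not a function provided by swh.lister, and it assumes the API lives under `/api/v4` on the given host, as in the examples above.

```python
def gitlab_lister_settings(instance, host, per_page=20, sort="asc"):
    """Build the settings dictionary used by the GitLab lister examples."""
    return {
        "instance": instance,
        # Assumes the GitLab v4 API is served from the host's /api/v4 path.
        "api_baseurl": f"https://{host}/api/v4",
        "sort": sort,
        "per_page": per_page,
    }


settings = gitlab_lister_settings("debian", "salsa.debian.org")
```

The result can then be passed as the last argument of the task functions shown in the scripts, e.g. `range_gitlab_lister(1, 2, settings)`.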

lister-debian

Once configured, you can execute a Debian lister using the following instructions in a python3 script:

import logging
from swh.lister.debian.tasks import debian_lister

logging.basicConfig(level=logging.DEBUG)
debian_lister('Debian')

lister-pypi

Once configured, you can execute a PyPI lister using the following instructions in a python3 script:

import logging
from swh.lister.pypi.tasks import pypi_lister

logging.basicConfig(level=logging.DEBUG)
pypi_lister()

lister-npm

Once configured, you can execute an npm lister using the following instructions in a python3 REPL:

import logging
from swh.lister.npm.tasks import npm_lister

logging.basicConfig(level=logging.DEBUG)
npm_lister()

lister-phabricator

Once configured, you can execute a Phabricator lister using the following instructions in a python3 script:

import logging
from swh.lister.phabricator.tasks import incremental_phabricator_lister

logging.basicConfig(level=logging.DEBUG)
incremental_phabricator_lister(forge_url='https://forge.softwareheritage.org', api_token='XXXX')

lister-gnu

Once configured, you can execute a GNU lister using the following instructions in a python3 script:

import logging
from swh.lister.gnu.tasks import gnu_lister

logging.basicConfig(level=logging.DEBUG)
gnu_lister()

lister-cran

Once configured, you can execute a CRAN lister using the following instructions in a python3 script:

import logging
from swh.lister.cran.tasks import cran_lister

logging.basicConfig(level=logging.DEBUG)
cran_lister()

lister-cgit

Once configured, you can execute a cgit lister using the following instructions in a python3 script:

import logging
from swh.lister.cgit.tasks import cgit_lister

logging.basicConfig(level=logging.DEBUG)
# simple cgit instance
cgit_lister(url='https://git.kernel.org/')
# cgit instance whose listed repositories differ from the base url
cgit_lister(url='https://cgit.kde.org/',
            url_prefix='https://anongit.kde.org/')

lister-packagist

Once configured, you can execute a Packagist lister using the following instructions in a python3 script:

import logging
from swh.lister.packagist.tasks import packagist_lister

logging.basicConfig(level=logging.DEBUG)
packagist_lister()

Licensing

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

See the top-level LICENSE file for the full text of the GNU General Public License distributed along with this program.