docs: Update listers execution instructions

Remove outdated part about listers database and use swh CLI in
README for executing a lister instead of raw Python code.
Antoine Lambert 2021-02-05 12:35:37 +01:00
parent 1803b707e4
commit e72c15e97a
2 changed files with 23 additions and 176 deletions

191
README.md

@@ -34,207 +34,54 @@ Local deployment
## lister configuration
Each lister implemented so far by Software Heritage (`github`, `gitlab`, `debian`, `pypi`, `npm`)
Each lister implemented so far by Software Heritage (`bitbucket`, `cgit`, `cran`, `debian`,
`gitea`, `github`, `gitlab`, `gnu`, `launchpad`, `npm`, `packagist`, `phabricator`, `pypi`)
must be configured by following the instructions below (please note that you have to replace
`<lister_name>` by one of the lister name introduced above).
### Preparation steps
1. `mkdir ~/.config/swh/ ~/.cache/swh/lister/<lister_name>/`
2. create configuration file `~/.config/swh/lister_<lister_name>.yml`
3. Bootstrap the db instance schema
```lang=bash
$ createdb lister-<lister_name>
$ python3 -m swh.lister.cli --db-url postgres:///lister-<lister_name> <lister_name>
```
Note: This bootstraps a minimum data set needed for the lister to run.
1. `mkdir ~/.config/swh/`
2. create configuration file `~/.config/swh/listers.yml`
### Configuration file sample
Minimalistic configuration shared by all listers to add in file `~/.config/swh/lister_<lister_name>.yml`:
Minimalistic configuration shared by all listers to add in file `~/.config/swh/listers.yml`:
```lang=yml
storage:
  cls: 'remote'
  args:
    url: 'http://localhost:5002/'
scheduler:
  cls: 'remote'
  args:
    url: 'http://localhost:5008/'
lister:
  cls: 'local'
  args:
    # see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
    db: 'postgresql:///lister-<lister_name>'
credentials: []
cache_responses: True
cache_dir: /home/user/.cache/swh/lister/<lister_name>/
credentials: {}
```
Note: This expects storage (5002) and scheduler (5008) services to run locally
Note: This expects scheduler (5008) service to run locally
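The two preparation steps and the minimal configuration could be scripted roughly as follows. This is only a sketch using the Python standard library: `write_minimal_config` and `MINIMAL_CONFIG` are hypothetical names, not part of swh, and the YAML nesting (`cls` and `args` under `scheduler`, `credentials` at top level) is an assumption about the intended layout.

```python
from pathlib import Path

# Assumed layout of the minimal configuration after this change:
# only the scheduler service and an empty credentials mapping remain.
MINIMAL_CONFIG = """\
scheduler:
  cls: 'remote'
  args:
    url: '{scheduler_url}'

credentials: {{}}
"""


def write_minimal_config(
    config_path: Path,
    scheduler_url: str = "http://localhost:5008/",
) -> Path:
    """Create the parent directory if needed and write the configuration.

    Hypothetical helper, equivalent to `mkdir` followed by creating
    `listers.yml` by hand.
    """
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(MINIMAL_CONFIG.format(scheduler_url=scheduler_url))
    return config_path


if __name__ == "__main__":
    target = Path.home() / ".config" / "swh" / "listers.yml"
    print(f"Wrote {write_minimal_config(target)}")
```

Running the script writes `~/.config/swh/listers.yml`; pass another `scheduler_url` if the scheduler does not listen on port 5008.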
## lister-github
## Executing a lister
Once configured, you can execute a GitHub lister using the following instructions in a `python3` script:
Once configured, a lister can be executed by using the `swh` CLI tool with the
following options and commands:
```lang=python
import logging
from swh.lister.github.tasks import range_github_lister
logging.basicConfig(level=logging.DEBUG)
range_github_lister(364, 365)
...
```
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister <lister_name> [lister_parameters]
```
## lister-gitlab
Examples:
Once configured, you can execute a GitLab lister using the instructions detailed in the `python3` scripts below:
```lang=python
import logging
from swh.lister.gitlab.tasks import range_gitlab_lister
logging.basicConfig(level=logging.DEBUG)
range_gitlab_lister(1, 2, {
'instance': 'debian',
'api_baseurl': 'https://salsa.debian.org/api/v4',
'sort': 'asc',
'per_page': 20
})
```
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister bitbucket
```lang=python
import logging
from swh.lister.gitlab.tasks import full_gitlab_relister
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister cran
logging.basicConfig(level=logging.DEBUG)
full_gitlab_relister({
'instance': '0xacab',
'api_baseurl': 'https://0xacab.org/api/v4',
'sort': 'asc',
'per_page': 20
})
```
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitea url=https://codeberg.org/api/v1/
```lang=python
import logging
from swh.lister.gitlab.tasks import incremental_gitlab_lister
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister gitlab url=https://salsa.debian.org/api/v4/
logging.basicConfig(level=logging.DEBUG)
incremental_gitlab_lister({
'instance': 'freedesktop.org',
'api_baseurl': 'https://gitlab.freedesktop.org/api/v4',
'sort': 'asc',
'per_page': 20
})
```
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister npm
## lister-debian
Once configured, you can execute a Debian lister using the following instructions in a `python3` script:
```lang=python
import logging
from swh.lister.debian.tasks import debian_lister
logging.basicConfig(level=logging.DEBUG)
debian_lister('Debian')
```
## lister-pypi
Once configured, you can execute a PyPI lister using the following instructions in a `python3` script:
```lang=python
import logging
from swh.lister.pypi.tasks import pypi_lister
logging.basicConfig(level=logging.DEBUG)
pypi_lister()
```
## lister-npm
Once configured, you can execute a npm lister using the following instructions in a `python3` REPL:
```lang=python
import logging
from swh.lister.npm.tasks import npm_lister
logging.basicConfig(level=logging.DEBUG)
npm_lister()
```
## lister-phabricator
Once configured, you can execute a Phabricator lister using the following instructions in a `python3` script:
```lang=python
import logging
from swh.lister.phabricator.tasks import incremental_phabricator_lister
logging.basicConfig(level=logging.DEBUG)
incremental_phabricator_lister(forge_url='https://forge.softwareheritage.org', api_token='XXXX')
```
## lister-gnu
Once configured, you can execute a PyPI lister using the following instructions in a `python3` script:
```lang=python
import logging
from swh.lister.gnu.tasks import gnu_lister
logging.basicConfig(level=logging.DEBUG)
gnu_lister()
```
## lister-cran
Once configured, you can execute a CRAN lister using the following instructions in a `python3` script:
```lang=python
import logging
from swh.lister.cran.tasks import cran_lister
logging.basicConfig(level=logging.DEBUG)
cran_lister()
```
## lister-cgit
Once configured, you can execute a cgit lister using the following instructions
in a `python3` script:
```lang=python
import logging
from swh.lister.cgit.tasks import cgit_lister
logging.basicConfig(level=logging.DEBUG)
# simple cgit instance
cgit_lister(url='https://git.kernel.org/')
# cgit instance whose listed repositories differ from the base url
cgit_lister(url='https://cgit.kde.org/',
url_prefix='https://anongit.kde.org/')
```
## lister-packagist
Once configured, you can execute a Packagist lister using the following instructions
in a `python3` script:
```lang=python
import logging
from swh.lister.packagist.tasks import packagist_lister
logging.basicConfig(level=logging.DEBUG)
packagist_lister()
$ swh --log-level DEBUG lister -C ~/.config/swh/listers.yml run --lister pypi
```
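When a lister has to be launched from a script, the command line above can be assembled programmatically. The sketch below is a minimal illustration under stated assumptions: `build_lister_command` is a hypothetical helper that only mirrors the documented `swh ... lister ... run` invocation and does not call any swh Python API.

```python
import os
import shlex
from typing import Dict, List, Optional


def build_lister_command(
    lister_name: str,
    config_path: str = "~/.config/swh/listers.yml",
    log_level: str = "DEBUG",
    parameters: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Return the argument list for running one lister with the swh CLI."""
    command = [
        "swh", "--log-level", log_level,
        # Expand "~" here because no shell will do it when the list is
        # passed directly to subprocess.run().
        "lister", "-C", os.path.expanduser(config_path),
        "run", "--lister", lister_name,
    ]
    # Lister parameters are appended as trailing key=value arguments,
    # as in the gitea and gitlab examples.
    for key, value in (parameters or {}).items():
        command.append(f"{key}={value}")
    return command


if __name__ == "__main__":
    cmd = build_lister_command(
        "gitlab", parameters={"url": "https://salsa.debian.org/api/v4/"}
    )
    # Print the command instead of running it; pass `cmd` to
    # subprocess.run() to actually execute the lister.
    print(shlex.join(cmd))
```

Building a list rather than a single string avoids shell quoting issues when a parameter value contains special characters.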
Licensing


@@ -80,8 +80,8 @@ After the execution of lister is complete, you can see the loading task created:
~/swh-environment/swh-lister$ swh scheduler task list
You can also check the repositories listed by the lister from the database in
which the lister output is stored. To connect to the database::
You can also check the repositories listed by the lister from the scheduler database
in which the lister output is stored. To connect to the database::
~/swh-environment/swh-docker-dev$ docker-compose exec swh-lister bash -c \
'psql swh-listers'
~/swh-environment/docker$ docker-compose exec swh-scheduler bash -c \
'psql swh-scheduler -c "select url from listed_origins"'