
SWH-lister

The Software Heritage Lister is both a library that centralizes behavior shared by all listers and a set of concrete lister implementations.

Current lister implementations are:

  • swh-lister-bitbucket
  • swh-lister-debian
  • swh-lister-github
  • swh-lister-gitlab
  • swh-lister-pypi
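The split between shared library code and per-forge implementations can be pictured as a template method: a base class owns the listing loop, and each lister only supplies the forge-specific parts. This is an illustrative sketch only. `HypotheticalLister`, `FakeForgeLister`, `fetch_pages` and `to_origin_url` are made-up names, not swh.lister's actual classes or API, and the fake lister reads in-memory pages instead of calling a real forge.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List


class HypotheticalLister(ABC):
    """Shared driver logic; subclasses only say how to fetch and normalize."""

    @abstractmethod
    def fetch_pages(self) -> Iterator[List[dict]]:
        """Yield raw pages of results from the forge (forge-specific)."""

    @abstractmethod
    def to_origin_url(self, record: dict) -> str:
        """Turn one raw record into an origin URL (forge-specific)."""

    def run(self) -> List[str]:
        # Common behavior lives here once: iterate pages, normalize, deduplicate.
        seen = set()
        origins = []
        for page in self.fetch_pages():
            for record in page:
                url = self.to_origin_url(record)
                if url not in seen:
                    seen.add(url)
                    origins.append(url)
        return origins


class FakeForgeLister(HypotheticalLister):
    """Toy implementation backed by in-memory pages instead of HTTP."""

    def __init__(self, pages: Iterable[List[dict]]):
        self._pages = list(pages)

    def fetch_pages(self) -> Iterator[List[dict]]:
        yield from self._pages

    def to_origin_url(self, record: dict) -> str:
        return "https://forge.example/" + record["full_name"]
```

With this shape, supporting a new forge means writing one small subclass, while the loop and deduplication stay in a single place.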

Licensing

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

See the top-level LICENSE file for the full text of the GNU General Public License distributed with this program.

Dependencies

  • python3
  • python3-requests
  • python3-sqlalchemy

See requirements*.txt for the complete list of dependencies.
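Before running a lister, it can help to confirm that the runtime dependencies listed above are importable. This minimal sketch only checks the `requests` and `sqlalchemy` modules (the mapping from Debian package names to import names is our assumption) and does not replace installing from the requirements files.

```python
import importlib.util
from typing import Iterable, List

# Import names for python3-requests and python3-sqlalchemy.
REQUIRED_MODULES = ["requests", "sqlalchemy"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("missing:", ", ".join(missing) if missing else "none")
```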

Local deployment

lister-github

Preparation steps

  1. Clone the repository into $SWH_ENVIRONMENT_HOME/swh-lister (any location of your choosing)

  2. Create the configuration and cache directories: mkdir -p ~/.config/swh/ ~/.cache/swh/lister/github.com/

  3. Create the configuration file ~/.config/swh/lister-github.com.yml

  4. Bootstrap the db instance schema

    $ createdb lister-github
    $ python3 -m swh.lister.cli --db-url postgres:///lister-github \
        --lister github \
        --create-tables
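Steps 2 and 3 above can also be scripted. This is a minimal sketch using only pathlib. The function name `prepare_github_lister` is hypothetical, and the file contents simply mirror the sample configuration in this README. An existing configuration file is never overwritten.

```python
from pathlib import Path

# Same content as the sample configuration in this README; {cache_dir} is filled in.
GITHUB_CONFIG = """\
# see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
lister_db_url: postgres:///lister-github
credentials: []
cache_responses: True
cache_dir: {cache_dir}
"""


def prepare_github_lister(home: Path) -> Path:
    """Create the config and cache directories, then write a minimal config file."""
    config_dir = home / ".config" / "swh"
    cache_dir = home / ".cache" / "swh" / "lister" / "github.com"
    # parents=True behaves like `mkdir -p`; exist_ok=True makes reruns harmless.
    config_dir.mkdir(parents=True, exist_ok=True)
    cache_dir.mkdir(parents=True, exist_ok=True)

    config_file = config_dir / "lister-github.com.yml"
    if not config_file.exists():
        config_file.write_text(GITHUB_CONFIG.format(cache_dir=cache_dir))
    return config_file
```

For example, `prepare_github_lister(Path.home())` performs both steps for the current user. The other listers follow the same layout with their own directory and file names.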

Configuration file sample

Minimal configuration:

$ cat ~/.config/swh/lister-github.com.yml
# see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
lister_db_url: postgres:///lister-github
credentials: []
cache_responses: True
cache_dir: /home/user/.cache/swh/lister/github.com
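A configuration like the one above can be sanity-checked before starting a lister. This sketch works on the already-parsed dictionary (loading the file itself could be done with PyYAML's `yaml.safe_load`, omitted here to stay within the standard library). `check_lister_config` is a hypothetical helper, and treating every sample key as required is an assumption, not a documented rule.

```python
from urllib.parse import urlsplit

# Keys present in the sample configuration; requiring all of them is an assumption.
EXPECTED_KEYS = {"lister_db_url", "credentials", "cache_responses", "cache_dir"}


def check_lister_config(cfg: dict) -> str:
    """Sanity-check a parsed lister config and return its PostgreSQL database name."""
    missing = EXPECTED_KEYS - set(cfg)
    if missing:
        raise ValueError("missing keys: " + ", ".join(sorted(missing)))

    db_url = cfg["lister_db_url"]
    parts = urlsplit(db_url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError("expected a postgres:// URL, got %r" % db_url)
    # In postgres:///lister-github the host part is empty (libpq then uses a
    # local socket by default) and the database name is the URL path.
    dbname = parts.path.lstrip("/")
    if not dbname:
        raise ValueError("no database name in %r" % db_url)

    if not isinstance(cfg["credentials"], list):
        raise ValueError("credentials must be a list")
    return dbname


# The sample file above, as yaml.safe_load would return it.
SAMPLE = {
    "lister_db_url": "postgres:///lister-github",
    "credentials": [],
    "cache_responses": True,
    "cache_dir": "/home/user/.cache/swh/lister/github.com",
}
```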

Note: this expects the storage (port 5002) and scheduler (port 5008) services to be running locally.
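A quick way to verify the note above is to try a TCP connection to each port. This sketch assumes the services listen on localhost; a successful connection only shows that something is listening, not that it is the right Software Heritage service. `unreachable_services` is a hypothetical helper name.

```python
import socket
from typing import Dict, List

# Ports from the note above; localhost is assumed.
SERVICES = {"storage": 5002, "scheduler": 5008}


def unreachable_services(host: str = "localhost",
                         services: Dict[str, int] = SERVICES,
                         timeout: float = 1.0) -> List[str]:
    """Return the names of services whose TCP port refuses or times out."""
    down = []
    for name, port in services.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # connected: something is listening on that port
        except OSError:  # covers connection refused and timeouts
            down.append(name)
    return down


if __name__ == "__main__":
    print(unreachable_services() or "storage and scheduler ports are open")
```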

Run

$ python3
>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> from swh.lister.github.tasks import RangeGitHubLister; RangeGitHubLister().run(364, 365)
INFO:root:listing repos starting at 364
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.github.com
DEBUG:urllib3.connectionpool:https://api.github.com:443 "GET /repositories?since=364 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost
DEBUG:urllib3.connectionpool:http://localhost:5002 "POST /origin/add HTTP/1.1" 200 1
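The log above shows the GitHub lister paging with `GET /repositories?since=364`. GitHub's public-repositories endpoint only returns repositories whose id is greater than `since`, so a range can be walked by restarting from the last id of each page. This is a sketch of that idea, not swh.lister's code: `fetch_page` is a hypothetical hook standing in for the HTTP call, and treating `start` as exclusive simply follows the `since=364` request in the log, since this README does not state the exact bounds.

```python
from typing import Callable, Iterator, List

# Hypothetical hook: fetch_page(since) returns the decoded JSON list from
# GET https://api.github.com/repositories?since=<since>.
FetchPage = Callable[[int], List[dict]]


def list_repo_range(fetch_page: FetchPage, start: int, end: int) -> Iterator[dict]:
    """Yield repositories whose id satisfies start < id <= end."""
    since = start  # exclusive lower bound, as in ?since=364
    while True:
        page = fetch_page(since)
        if not page:
            return  # no more repositories
        for repo in page:
            if repo["id"] > end:
                return  # past the requested range
            yield repo
        since = page[-1]["id"]  # next page starts after the last id seen
```

Keeping the HTTP call behind `fetch_page` lets the paging logic be tested with an in-memory list of repositories.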

lister-gitlab

Preparation steps

  1. Clone the repository into $SWH_ENVIRONMENT_HOME/swh-lister (any location of your choosing)

  2. Create the configuration and cache directories: mkdir -p ~/.config/swh/ ~/.cache/swh/lister/gitlab/

  3. Create the configuration file ~/.config/swh/lister-gitlab.yml

  4. Bootstrap the db instance schema

    $ createdb lister-gitlab
    $ python3 -m swh.lister.cli --db-url postgres:///lister-gitlab \
        --lister gitlab \
        --create-tables

Configuration file sample

$ cat ~/.config/swh/lister-gitlab.yml
# see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
lister_db_url: postgres:///lister-gitlab
credentials: []
cache_responses: True
cache_dir: /home/user/.cache/swh/lister/gitlab

Note: this expects the storage (port 5002) and scheduler (port 5008) services to be running locally.

Run

$ python3
Python 3.6.6 (default, Jun 27 2018, 14:44:17)
[GCC 8.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from swh.lister.gitlab.tasks import RangeGitLabLister; RangeGitLabLister().run_task(1, 2,
...   {'instance': 'debian', 'api_baseurl': 'https://salsa.debian.org/api/v4', 'sort': 'asc', 'per_page': 20})
>>> from swh.lister.gitlab.tasks import FullGitLabRelister; FullGitLabRelister().run_task(
...   {'instance': '0xacab', 'api_baseurl': 'https://0xacab.org/api/v4', 'sort': 'asc', 'per_page': 20})
>>> from swh.lister.gitlab.tasks import IncrementalGitLabLister; IncrementalGitLabLister().run_task(
...   {'instance': 'freedesktop.org', 'api_baseurl': 'https://gitlab.freedesktop.org/api/v4',
...    'sort': 'asc', 'per_page': 20})
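The GitLab tasks above are parameterized by an `api_baseurl` plus `sort` and `per_page`, which GitLab's API v4 `/projects` endpoint accepts alongside a `page` number. Assuming the two integers passed to `RangeGitLabLister().run_task(1, 2, ...)` are page numbers (this README does not say so explicitly), the requested pages correspond to URLs like the ones built below. `gitlab_project_page_urls` is a hypothetical helper, not part of swh.lister.

```python
from typing import List
from urllib.parse import urlencode


def gitlab_project_page_urls(api_baseurl: str, first_page: int, last_page: int,
                             sort: str = "asc", per_page: int = 20) -> List[str]:
    """Build /projects URLs for pages first_page..last_page (inclusive)."""
    base = api_baseurl.rstrip("/") + "/projects"
    return [
        base + "?" + urlencode({"page": page, "per_page": per_page, "sort": sort})
        for page in range(first_page, last_page + 1)
    ]
```

For the Debian instance, `gitlab_project_page_urls('https://salsa.debian.org/api/v4', 1, 2)` yields the `/projects` URLs for pages 1 and 2 with `per_page=20&sort=asc`.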

lister-debian

Preparation steps

  1. Clone the repository into $SWH_ENVIRONMENT_HOME/swh-lister (any location of your choosing)

  2. Create the configuration and cache directories: mkdir -p ~/.config/swh/ ~/.cache/swh/lister/debian/

  3. Create the configuration file ~/.config/swh/lister-debian.yml

  4. Bootstrap the db instance schema

    $ createdb lister-debian
    $ python3 -m swh.lister.cli --db-url postgres:///lister-debian \
        --lister debian \
        --create-tables \
        --with-data

    Note: --with-data bootstraps the minimal data set the Debian lister needs in order to run (for development).

Configuration file sample

$ cat ~/.config/swh/lister-debian.yml
# see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
lister_db_url: postgres:///lister-debian
credentials: []
cache_responses: True
cache_dir: /home/user/.cache/swh/lister/debian

Note: this expects the storage (port 5002) and scheduler (port 5008) services to be running locally.

Run

$ python3
Python 3.6.6 (default, Jun 27 2018, 14:44:17)
[GCC 8.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging; logging.basicConfig(level=logging.DEBUG)
>>> from swh.lister.debian.tasks import DebianListerTask; DebianListerTask().run_task('Debian')
DEBUG:root:Creating snapshot for distribution Distribution(Debian (deb) on http://deb.debian.org/debian/) on date 2018-07-27 09:22:50.461165+00:00
DEBUG:root:Processing area Area(stretch/main of Debian)
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): deb.debian.org
DEBUG:urllib3.connectionpool:http://deb.debian.org:80 "GET /debian//dists/stretch/main/source/Sources.xz HTTP/1.1" 302 325
...
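The log above shows the Debian lister downloading one `dists/<suite>/<component>/source/Sources.xz` index per area. This sketch illustrates that fetch-and-parse step with the standard library's `lzma` module. It is not swh.lister's parser: it keeps only the `Package` and `Version` fields, ignores continuation lines, and a production implementation would more likely rely on a dedicated deb822 parser. `sources_url` and `parse_sources` are hypothetical names.

```python
import lzma
from typing import Dict, List


def sources_url(mirror: str, suite: str, component: str) -> str:
    """URL of the source package index for one suite/component (area)."""
    # Strip the mirror's trailing slash so the path has no empty segment.
    return f"{mirror.rstrip('/')}/dists/{suite}/{component}/source/Sources.xz"


def parse_sources(xz_bytes: bytes) -> List[Dict[str, str]]:
    """Decompress a Sources.xz payload and return Package/Version per stanza."""
    text = lzma.decompress(xz_bytes).decode("utf-8")
    packages = []
    for stanza in text.split("\n\n"):  # stanzas are separated by blank lines
        fields = {}
        for line in stanza.splitlines():
            # Skip blank lines and continuation lines (they start with whitespace).
            if not line or line[0].isspace() or ":" not in line:
                continue
            key, _, value = line.partition(":")
            if key in ("Package", "Version"):
                fields[key] = value.strip()
        if "Package" in fields:
            packages.append(fields)
    return packages
```

Splitting on the first colon only matters for versions with an epoch, such as `1:1.2.8.dfsg-5`, which must stay intact.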

lister-pypi

Preparation steps

  1. Clone the repository into $SWH_ENVIRONMENT_HOME/swh-lister (any location of your choosing)

  2. Create the configuration and cache directories: mkdir -p ~/.config/swh/ ~/.cache/swh/lister/pypi/

  3. Create the configuration file ~/.config/swh/lister-pypi.yml

  4. Bootstrap the db instance schema

    $ createdb lister-pypi
    $ python3 -m swh.lister.cli --db-url postgres:///lister-pypi \
        --lister pypi \
        --create-tables \
        --with-data

    Note: --with-data bootstraps the minimal data set the PyPI lister needs in order to run (for development).

Configuration file sample

$ cat ~/.config/swh/lister-pypi.yml
# see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
lister_db_url: postgres:///lister-pypi
credentials: []
cache_responses: True
cache_dir: /home/user/.cache/swh/lister/pypi

Note: this expects the storage (port 5002) and scheduler (port 5008) services to be running locally.

Run

$ python3
Python 3.6.6 (default, Jun 27 2018, 14:44:17)
[GCC 8.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from swh.lister.pypi.tasks import PyPIListerTask; PyPIListerTask().run_task()
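This README does not say where the PyPI lister gets its project list. Assuming it is PyPI's PEP 503 "simple" index (https://pypi.org/simple/), which lists every project as an HTML anchor, the names could be extracted as in this sketch. `SimpleIndexParser` and `list_projects` are hypothetical names, and the HTML is passed in as a string rather than downloaded.

```python
from html.parser import HTMLParser
from typing import List


class SimpleIndexParser(HTMLParser):
    """Collect project names (anchor texts) from a PEP 503 simple index page."""

    def __init__(self):
        super().__init__()
        self.projects: List[str] = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Only text inside <a>...</a> is a project name; whitespace is ignored.
        if self._in_anchor and data.strip():
            self.projects.append(data.strip())


def list_projects(index_html: str) -> List[str]:
    """Return project names in the order they appear in the index page."""
    parser = SimpleIndexParser()
    parser.feed(index_html)
    parser.close()
    return parser.projects
```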