swh.lister.pypi: Add a pypi lister implementation using xmlrpc api

Based solely on pypi's deprecated xmlrpc api [1].  The only other
referenced way of listing pypi.org is parsing the html page served by
a legacy api [2].

[1] https://warehouse.readthedocs.io/api-reference/xml-rpc/#pypi-s-xml-rpc-methods

[2] https://pypi.python.org/simple/

Related T422
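
To illustrate the approach, here is a minimal sketch of listing package names through the XML-RPC API [1] with Python's standard `xmlrpc.client`. It is not the code added by this commit: `list_package_names` and `project_url` are hypothetical helpers, and the endpoint URL and project page layout are assumptions. Since the API is deprecated, the call may be rate-limited or unavailable.

```python
import xmlrpc.client

# Assumed endpoint for PyPI's (deprecated) XML-RPC API.
PYPI_XMLRPC_URL = "https://pypi.org/pypi"


def list_package_names(client=None):
    """Return the sorted names of all packages known to the index."""
    if client is None:
        # Only contacts the network when a method is actually called.
        client = xmlrpc.client.ServerProxy(PYPI_XMLRPC_URL)
    # list_packages() is one of the XML-RPC methods documented in [1].
    return sorted(client.list_packages())


def project_url(name):
    """Build a project page URL for a package name (assumed URL layout)."""
    return "https://pypi.org/project/%s/" % name
```

Passing any object with a `list_packages()` method as `client` makes it possible to exercise the helper without network access.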
Antoine R. Dumont (@ardumont) 2018-07-27 13:54:08 +02:00
parent 94913e2bf3
commit 6ff3b90859
GPG key ID: 52E2E9840D10C3B8
8 changed files with 284 additions and 2 deletions


@@ -160,3 +160,42 @@ Note: This expects storage (5002) and scheduler (5008) services to run locally
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): deb.debian.org
DEBUG:urllib3.connectionpool:http://deb.debian.org:80 "GET /debian//dists/stretch/main/source/Sources.xz HTTP/1.1" 302 325
...
## lister-pypi
### preparation steps
1. git clone under $SWH_ENVIRONMENT_HOME/swh-lister (of your choosing)
2. mkdir -p ~/.config/swh/ ~/.cache/swh/lister/pypi/
3. Create configuration file ~/.config/swh/lister-pypi.yml
4. Bootstrap the db instance schema
$ createdb lister-pypi
$ python3 -m swh.lister.cli --db-url postgres:///lister-pypi \
--lister pypi \
--create-tables \
--with-data
Note: This bootstraps a minimum data set needed for the pypi
lister to run (for development)
### Configuration file sample
$ cat ~/.config/swh/lister-pypi.yml
# see http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls
lister_db_url: postgres:///lister-pypi
credentials: []
cache_responses: True
cache_dir: /home/zack/.cache/swh/lister/pypi
Note: This expects storage (5002) and scheduler (5008) services to run locally
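
The sample above hardcodes one user's home directory in `cache_dir`. As a minimal sketch, the same file body could be generated for any home directory. `render_lister_config` is a hypothetical helper, not part of swh.lister; the keys and values are copied from the sample above.

```python
import os
import posixpath


def render_lister_config(home, db_name="lister-pypi"):
    """Render a lister-pypi.yml body for the given home directory."""
    cache_dir = posixpath.join(home, ".cache", "swh", "lister", "pypi")
    lines = [
        "lister_db_url: postgres:///%s" % db_name,
        "credentials: []",
        "cache_responses: True",
        "cache_dir: %s" % cache_dir,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the configuration for the current user; redirect the output
    # to ~/.config/swh/lister-pypi.yml to use it.
    print(render_lister_config(os.path.expanduser("~")), end="")
```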
### Run
$ python3
Python 3.6.6 (default, Jun 27 2018, 14:44:17)
[GCC 8.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from swh.lister.pypi.tasks import PyPiListerTask; PyPiListerTask().run_task()
>>>