cgit: rewrite the CGit lister

Simplify the code:
- inherit only from ListerBase
- implement HTTP queries directly using requests
- get rid of convoluted code

Gather the origin_url from the git repo's "project" page instead of
building it from the 'url_prefix' hack. The lister now makes substantially
more requests, since it makes one request per listed git repo, but
the resulting origin_url should be much more reliable.
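
A minimal sketch of the per-repository step described above. It assumes the
"project" page exposes its clone URLs as anchors carrying rel="vcs-git"; that
markup is an assumption and is not taken from this commit. The names
CloneLinkParser, extract_clone_urls and SAMPLE_PAGE are hypothetical, and the
page is an inline sample here rather than one fetched with requests.

```python
from html.parser import HTMLParser


class CloneLinkParser(HTMLParser):
    """Collect href values of <a rel="vcs-git"> anchors (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.clone_urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Keep only anchors flagged as git clone links.
        if tag == "a" and attrs.get("rel") == "vcs-git" and attrs.get("href"):
            self.clone_urls.append(attrs["href"])


def extract_clone_urls(html):
    """Return the clone URLs found in a project page, in document order."""
    parser = CloneLinkParser()
    parser.feed(html)
    return parser.clone_urls


# Hypothetical excerpt of a project page, for illustration only.
SAMPLE_PAGE = """
<table>
<tr><th>Clone</th></tr>
<tr><td><a rel='vcs-git' href='git://example.org/project.git'>git</a></td></tr>
<tr><td><a rel='vcs-git' href='https://example.org/git/project.git'>https</a></td></tr>
<tr><td><a href='/cgit/project.git/log/'>log</a></td></tr>
</table>
"""

print(extract_clone_urls(SAMPLE_PAGE))
```

Because one such page is parsed per repository, the request count grows
linearly with the number of listed repos, which is the cost noted above.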

When several clonable URLs are provided, choose the http/https one first;
otherwise, choose the first one in the list.
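
The selection rule above could be sketched as follows. The function name
pick_origin_url is hypothetical and this is not necessarily how the lister
implements it; it only illustrates the stated preference order.

```python
from urllib.parse import urlparse


def pick_origin_url(clone_urls):
    """Prefer the first http/https URL, else fall back to the first URL."""
    if not clone_urls:
        return None  # nothing to choose from
    for url in clone_urls:
        if urlparse(url).scheme in ("http", "https"):
            return url
    return clone_urls[0]


urls = [
    "git://example.org/project.git",
    "https://example.org/git/project.git",
    "ssh://git@example.org/project.git",
]
print(pick_origin_url(urls))
```

With the sample list, the https URL is selected even though it is not first;
a list with no http/https entry yields its first element.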

Add proper tests for the cgit lister.

Also, get rid of the 'time_updated' column in the model.
David Douard 2019-08-30 16:48:44 +02:00
parent e0ce68377d
commit 3816b4d3bf
29 changed files with 2386 additions and 299 deletions

@@ -19,10 +19,6 @@ SUPPORTED_LISTERS = ['github', 'gitlab', 'bitbucket', 'debian', 'pypi',
 DEFAULT_BASEURLS = {
     'gitlab': 'https://gitlab.com/api/v4/',
     'phabricator': 'https://forge.softwareheritage.org',
-    'cgit': (
-        'http://git.savannah.gnu.org/cgit/',
-        'http://git.savannah.gnu.org/git/'
-    ),
 }
@@ -132,13 +128,8 @@ def get_lister(lister_name, db_url, drop_tables=False, **conf):
     elif lister_name == 'cgit':
         from .cgit.models import ModelBase
         from .cgit.lister import CGitLister
-        if isinstance(api_baseurl, str):
-            _lister = CGitLister(url=api_baseurl,
-                                 override_config=override_conf)
-        else:  # tuple
-            _lister = CGitLister(url=api_baseurl[0],
-                                 url_prefix=api_baseurl[1],
-                                 override_config=override_conf)
+        _lister = CGitLister(url=api_baseurl,
+                             override_config=override_conf)
     elif lister_name == 'packagist':
         from .packagist.models import ModelBase  # noqa