Antoine R. Dumont (@ardumont)
72e208aeed
lister_base: Clarify docstrings
2019-06-13 15:42:07 +02:00
Antoine R. Dumont (@ardumont)
64a9bc691d
lister.core: Stop creating origins when scheduling tasks
...
Prior to this commit, the lister also created origins in the archive. Now, we
only schedule new origins for ingestion.
2019-06-13 15:42:07 +02:00
Archit Agrawal
a9a37a85bf
swh.lister.cran
...
Add a lister to list all the CRAN packages.
It uses the built-in API of the R language to list the packages
and get their metadata.
Closes T1709
2019-06-11 21:26:31 +05:30
Archit Agrawal
7c6245e663
swh.lister.gnu: Add function to check for file extension.
...
Added a function which derives the extension from the filename
and checks whether the file extension matches the type of file that is to be
archived.
2019-06-11 15:12:53 +05:30
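The extension check described in commit 7c6245e663 could look roughly like the sketch below. The function name `has_archive_extension` and the `ARCHIVE_EXTENSIONS` set are hypothetical; the commit message does not say which extensions the lister actually accepts.

```python
import os

# Hypothetical set of accepted extensions; the real lister may use another list.
ARCHIVE_EXTENSIONS = {".zip", ".tar", ".gz", ".tgz", ".bz2", ".xz", ".lz"}


def has_archive_extension(filename, allowed=ARCHIVE_EXTENSIONS):
    """Return True if the last extension of ``filename`` is in ``allowed``."""
    # os.path.splitext only splits off the final suffix, e.g. ".gz" for "x.tar.gz".
    _, extension = os.path.splitext(filename)
    return extension.lower() in allowed
```

With this sketch, a signature file such as `bash-5.0.tar.gz.sig` is rejected because only its final suffix (`.sig`) is inspected.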
Archit Agrawal
709ba8a6e5
swh.lister.gnu: Add functionality to list all the tarballs for a package.
...
As discussed in T1389, to ingest all packages using the base loader, it needs
a list of all the tarballs for a package.
Hence modified the lister to recursively list all the tarballs for a
package with their last updated time.
2019-06-08 21:56:00 +05:30
Archit Agrawal
ebdb959823
swh.lister.gnu: Change download method of tree.json file to request
...
Previously, the gnu lister used the same code as the tarball loader
to download, unzip and read the tree.json file.
To make the code concise, the download now uses the
requests library.
2019-06-08 21:56:00 +05:30
Archit Agrawal
151f6cd223
swh.lister.gnu
...
Implement first pass of gnu lister to list all the
packages present in https://ftp.gnu.org/
Add GNU lister in README and cli.py
Closes T1722
2019-06-08 21:56:00 +05:30
Archit Agrawal
f8a2ae866b
swh.lister.core: Remove abstractmethod
...
Some of the new listers, like GNU and CRAN, do not
follow the conventional way of making an HTTP
request, hence they do not need some of the methods
that a conventional HTTP request usually
requires.
But those methods are marked abstractmethod in the
core, which forces every lister to define them. So it is
best to remove abstractmethod to improve the
readability of those listers.
2019-06-08 16:51:57 +05:30
Antoine R. Dumont (@ardumont)
b81621274b
lister: Unify credentials structure between listers
...
This becomes a dictionary of key <lister-name>, value a dict of key
<instance-name>, value list of dict username/password.
Related T1772
2019-05-29 14:00:11 +02:00
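The nested credentials structure described in commit b81621274b can be pictured with the sketch below. The lister names, instance names, account values and the `credentials_for` helper are all made up for illustration; only the nesting (lister name, then instance name, then a list of username/password dicts) comes from the commit message.

```python
# Illustrative values only: lister name -> instance name -> list of accounts.
credentials = {
    "github": {
        "github": [
            {"username": "alice", "password": "secret-1"},
            {"username": "bob", "password": "secret-2"},
        ],
    },
    "gitlab": {
        "example-instance": [
            {"username": "carol", "password": "secret-3"},
        ],
    },
}


def credentials_for(creds, lister_name, instance_name):
    """Hypothetical helper: return the account list, or [] if none is configured."""
    return creds.get(lister_name, {}).get(instance_name, [])
```

Returning an empty list for unknown listers or instances lets callers fall back to anonymous access without special-casing missing keys.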
David Douard
d6169c7141
cli: register the 'lister' cli subcommand
...
also add a cli group named 'lister' for the sake of consistency with
other swh packages and rename the command as 'db-init', like:
swh lister db-init LISTER [...]
2019-05-22 13:36:57 +02:00
Antoine Lambert
701d833cdf
Update scheduler task names to new ones
...
Related T1508
2019-05-21 13:27:29 +02:00
David Douard
6bf226d85d
tox: workaround to pip's inability to properly solve dependency resolution
...
This is (hopefully) a temporary fix that can be removed as soon as
https://github.com/pypa/pip/issues/6239
is fixed, probably thanks to
https://github.com/pypa/pip/issues/988
2019-05-20 17:12:31 +02:00
archit
fedfd73c8e
swh.lister.phabricator
...
Add a lister of all hosted repositories on a Phabricator instance
Closes T808
2019-05-15 19:54:33 +05:30
Antoine Lambert
4efb2ce62b
core.lister_base: Ensure deterministic _task_key return value
2019-05-15 15:37:19 +02:00
Antoine Lambert
7d192a2f1b
README.md: Fix outdated instructions and improve formatting
2019-05-14 10:57:17 +02:00
Antoine Lambert
977d2459c3
Remove references to no longer used lister_db_url conf entry
2019-05-13 15:18:56 +02:00
Nicolas Dandrimont
6269975d55
Update coverage gitignore
2019-04-12 12:03:08 +02:00
David Douard
e5c3559033
tasks: fix handling of unsupported promise.save() calls
...
the exception can also be an AttributeError.
Also do not reraise this exception (in github/tasks.py). This promise
saving feature is used for tests.
2019-04-11 11:03:48 +02:00
Antoine Lambert
0b8d1d464d
npm.lister: Update loading task name
...
Related T1508
2019-04-10 18:24:35 +02:00
Antoine Lambert
dac7777cd6
listers: Align config filename with production
2019-04-10 18:22:45 +02:00
Antoine Lambert
8ffd8dadef
cli: Fix initialization for all listers
...
Prior to this commit, initializing all listers was failing after the
debian lister processing because of global insert_minimum_data init
Related T1629
2019-04-10 11:33:55 +02:00
Archit Agrawal
4b27f9d9c4
Updated toplevel function names in README
2019-03-24 10:54:12 +05:30
Archit Agrawal
26232db926
Removed Extra blank space
2019-03-20 00:54:42 +05:30
Archit Agrawal
be804be0fc
Removed unnecessary files
2019-03-20 00:45:30 +05:30
Archit Agrawal
fa91132364
Removed extra space from the README
2019-03-20 00:42:18 +05:30
Archit Agrawal
d7ae2f1305
Updated README for listers
2019-03-19 23:11:44 +05:30
Nicolas Dandrimont
dd148fc64d
Guesstimate partition boundaries from extrema rather than using expensive offsets
...
Summary:
Using order by and offset makes the partitioning an n^2 operation on the number
of entries in the table, rather than an instant operation when using
min/max.
This assumes the indexable column is more or less uniform, which is not exactly
true but not the worst approximation either.
Test Plan: tox
Reviewers: #reviewers, douardda
Reviewed By: #reviewers, douardda
Subscribers: douardda, swh-public-ci
Differential Revision: https://forge.softwareheritage.org/D1267
2019-03-19 13:51:42 +01:00
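The idea in commit dd148fc64d, deriving partition boundaries from the column's minimum and maximum instead of walking the table with offsets, can be sketched as follows. This is not the lister's actual code: the function name is hypothetical and the sketch assumes an integer indexable column, whereas some listers index on dates.

```python
def guess_partition_bounds(min_value, max_value, n_partitions):
    """Split [min_value, max_value] into at most n_partitions contiguous ranges.

    Only the two extrema are needed (one cheap min/max query), and the ranges
    are balanced only if the indexed values are roughly uniformly distributed.
    """
    if n_partitions < 1:
        raise ValueError("n_partitions must be at least 1")
    if min_value is None or max_value is None:
        # Empty table: there is nothing to partition yet.
        return []
    count = max_value - min_value + 1
    # Ceiling division, clamped so the partition width is never zero.
    width = max(1, -(-count // n_partitions))
    return [
        (start, min(start + width - 1, max_value))
        for start in range(min_value, max_value + 1, width)
    ]
```

For example, `guess_partition_bounds(1, 100, 4)` yields four ranges of 25 values each, while an empty table (both extrema `None`) yields no ranges at all.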
Nicolas Dandrimont
c574897e2a
Guesstimate partition boundaries from extrema rather than using expensive offsets
...
Summary:
Using order by and offset makes the partitioning an n^2 operation on the number
of entries in the table, rather than an instant operation when using
min/max.
This assumes the indexable column is more or less uniform, which is not exactly
true but not the worst approximation either.
Test Plan: tox
Reviewers: #reviewers, douardda
Reviewed By: #reviewers, douardda
Subscribers: douardda, swh-public-ci
Differential Revision: https://forge.softwareheritage.org/D1267
2019-03-19 13:51:38 +01:00
Archit Agrawal
5acb1fefc1
Updated README.md for listers
2019-03-19 15:57:29 +05:30
Antoine R. Dumont (@ardumont)
2a588d2d5a
d/*: debian packaging files migrated to separate branches
2019-02-14 10:47:40 +01:00
Antoine R. Dumont (@ardumont)
262c297a5e
lister.cli: Fix spelling typo
2019-02-14 10:47:28 +01:00
David Douard
1756e2efbf
Fix tests: config change lister_db_url -> lister had not been applied in tests
...
making 2 of them fail (in test_bb_lister.py and test_gh_lister.py).
2019-02-06 16:45:28 +01:00
David Douard
c1a3c10c5c
Bump dependencies
2019-02-06 15:38:11 +01:00
David Douard
0720c8e12e
Kill (useless) --create-tables and --with-data cli command options
2019-02-06 15:38:11 +01:00
David Douard
662df8aea1
Change the lister_db_url config option into a 'lister' one
...
with conventional structure
{ 'cls': cls, 'args': {}}
2019-02-06 10:22:11 +01:00
David Douard
b8381d8a1b
Add an entry point for the cli command
2019-02-06 10:21:50 +01:00
David Douard
4f2580a97c
Make the --lister option of the cli tool a variadic argument
...
and add an 'all' possible value for it, so that one can initialize the
databases for all listers at once.
2019-02-06 10:19:26 +01:00
David Douard
c2c26d7e46
Fix the bitbucket lister; handle properly the date-like bounds
2019-02-01 15:38:11 +01:00
David Douard
94a35f12aa
Fix the SWHIndexingLister.db_partition_indices
...
the db query actually returns a table-like (2d) structure.
2019-02-01 15:38:11 +01:00
David Douard
1b2d6895a9
Use the named logger instead of the root logger in lister_base.py
2019-02-01 15:38:11 +01:00
David Douard
1d7d9b6128
Log errors when fetching an url in SWHListerHttpTransport
2019-02-01 15:38:11 +01:00
Antoine Lambert
68eb727d6a
npm.tasks: Fix NpmVisitModel parameters format
2019-01-30 16:32:26 +01:00
David Douard
f670de298f
Remove debug logging from tasks' code
...
since this is now handled by the SWHTask itself.
2019-01-17 13:58:29 +01:00
David Douard
6957f3c435
update dep on swh-scheduler>0.0.39 and pytest<4 (tests)
...
pytest<4 because of https://github.com/pytest-dev/pytest/issues/4641
2019-01-16 16:39:03 +01:00
David Douard
e6a4ae7619
flake8: remove unneeded imports
2019-01-15 18:17:20 +01:00
David Douard
e31b61bee1
Do not crash range tasks if celery result backend does not support saving the group's state
2019-01-15 15:32:07 +01:00
David Douard
a1ec4437e6
Add a few debug statements
2019-01-15 15:32:07 +01:00
David Douard
065d3a64fc
Fix SWHIndexingLister.db_partition_indices(); ensure partition size is not zero
...
when this lister is called for the first time, db_num_entries() may return
a null value, so the min() will also be zero, making the range() call crash.
2019-01-15 15:32:06 +01:00
David Douard
028ceca90d
Fix bitbucket lister: request_uri expects 'identifier' to be a date
...
which it won't be when the lister is called for the first time (since
the run() method will be called with min_bound=None in this case).
2019-01-15 15:31:59 +01:00
David Douard
4fc1968f1f
Rename the bitbucket and github listers to remove the 'tld' part
...
so that we can easily manage its configuration (especially in the docker
environment) by referring to this lister as only 'bitbucket' everywhere
(ie. python package name and config file names).
2019-01-14 12:07:57 +01:00