arch: Extra_loader_arguments consistency + documentation

Split extraloader_arguments artifacts to artifacts and arch_metadata
Add lister documentation at module level

Related T4233
This commit is contained in:
Franck Bret 2022-08-19 06:29:51 +02:00
parent 3ab90cc0cd
commit 7dd412e553
3 changed files with 841 additions and 294 deletions

View file

@ -3,6 +3,220 @@
# See top-level LICENSE file for more information
"""
Arch Linux lister
=================
The Arch lister list origins from `archlinux.org`_, the official Arch Linux packages,
and from `archlinuxarm.org`_, the Arch Linux ARM packages, an unofficial port for arm.
Packages are put in three different repositories, `core`, `extra` and `community`.
To manage listing those origins, this lister must be instantiated with a `flavours` dict.
`flavours` default values::
"official": {
"archs": ["x86_64"],
"repos": ["core", "extra", "community"],
"base_info_url": "https://archlinux.org",
"base_archive_url": "https://archive.archlinux.org",
"base_mirror_url": "",
"base_api_url": "https://archlinux.org",
},
"arm": {
"archs": ["armv7h", "aarch64"],
"repos": ["core", "extra", "community"],
"base_info_url": "https://archlinuxarm.org",
"base_archive_url": "",
"base_mirror_url": "https://uk.mirror.archlinuxarm.org",
"base_api_url": "",
}
From official Arch Linux repositories we can list all packages and all released versions.
They provide an api and archives.
From Arch Linux ARM repositories we can list all packages at their latest versions, they
do not provide api or archives.
As of August 2022 `archlinux.org`_ list 12592 packages and `archlinuxarm.org` 24044 packages.
Please note that those amounts are the total of `regular`_ and `split`_ packages.
Origins retrieving strategy
---------------------------
Download repositories archives as tar.gz files from https://archive.archlinux.org/repos/last/,
extract to a temp directory and then walks through each 'desc' files.
Repository archive index url example for Arch Linux `core repository`_ and Arch
Linux ARM `extra repository`_.
Each 'desc' file describe the latest released version of a package and helps
to build an origin url and `package versions url`_ from where scrapping artifacts metadata
and get a list of versions.
For Arch Linux ARM it follow the same discovery process parsing 'desc' files.
The main difference is that we can't get existing versions of an arm package
because https://archlinuxarm.org does not have an 'archive' website or api.
Page listing
------------
Each page is a list of package belonging to a flavour ('official', 'arm'), and a
repo ('core', 'extra', 'community').
Each line of a page represents an origin url for a package name with related metadata and versions.
Origin url examples:
* **Arch Linux**: https://archlinux.org/packages/extra/x86_64/mercurial
* **Arch Linux ARM**: https://archlinuxarm.org/packages/armv7h/mercurial
The data schema for each line is:
* **name**: Package name
* **version**: Last released package version
* **last_modified**: Iso8601 last modified date from timestamp
* **url**: Origin url
* **data**: Package metadata dict
* **versions**: A list of dict with artifacts metadata for each versions
The data schema for `versions` within a line:
* **name**: Package name
* **version**: Package version
* **repo**: One of core, extra, community
* **arch**: Processor architecture targeted
* **filename**: Filename of the archive to download
* **url**: Package download url
* **last_modified**: Iso8601 last modified date from timestamp, used as publication date
for this version
* **length**: Length of the archive to download
Origins from page
-----------------
The origin url corresponds to:
* **Arch Linux**: https://archlinux.org/packages/{repo}/{arch}/{name}
* **Arch Linux ARM**: https://archlinuxarm.org/packages/{arch}/{name}
Additionally we add some data set to "extra_loader_arguments":
* **artifacts**: Represent data about the Arch Linux package archive to download,
following :ref:`original-artifacts-json specification <original-artifacts-json>`
* **arch_metadata**: To store all other interesting attributes that do not belongs to artifacts.
Origin data example Arch Linux official::
{
"url": "https://archlinux.org/packages/extra/x86_64/mercurial",
"visit_type": "arch",
"extra_loader_arguments": {
"artifacts": [
{
"url": "https://archive.archlinux.org/packages/m/mercurial/mercurial-4.8.2-1-x86_64.pkg.tar.xz", # noqa: B950
"version": "4.8.2-1",
"length": 4000000,
"filename": "mercurial-4.8.2-1-x86_64.pkg.tar.xz",
},
{
"url": "https://archive.archlinux.org/packages/m/mercurial/mercurial-4.9-1-x86_64.pkg.tar.xz", # noqa: B950
"version": "4.9-1",
"length": 4000000,
"filename": "mercurial-4.9-1-x86_64.pkg.tar.xz",
},
{
"url": "https://archive.archlinux.org/packages/m/mercurial/mercurial-4.9.1-1-x86_64.pkg.tar.xz", # noqa: B950
"version": "4.9.1-1",
"length": 4000000,
"filename": "mercurial-4.9.1-1-x86_64.pkg.tar.xz",
},
...
],
"arch_metadata": [
{
"arch": "x86_64",
"repo": "extra",
"name": "mercurial",
"version": "4.8.2-1",
"last_modified": "2019-01-15T20:31:00",
},
{
"arch": "x86_64",
"repo": "extra",
"name": "mercurial",
"version": "4.9-1",
"last_modified": "2019-02-12T06:15:00",
},
{
"arch": "x86_64",
"repo": "extra",
"name": "mercurial",
"version": "4.9.1-1",
"last_modified": "2019-03-30T17:40:00",
},
],
},
},
Origin data example Arch Linux ARM::
{
"url": "https://archlinuxarm.org/packages/armv7h/mercurial",
"visit_type": "arch",
"extra_loader_arguments": {
"artifacts": [
{
"url": "https://uk.mirror.archlinuxarm.org/armv7h/extra/mercurial-6.1.3-1-armv7h.pkg.tar.xz", # noqa: B950
"length": 4897816,
"version": "6.1.3-1",
"filename": "mercurial-6.1.3-1-armv7h.pkg.tar.xz",
}
],
"arch_metadata": [
{
"arch": "armv7h",
"name": "mercurial",
"repo": "extra",
"version": "6.1.3-1",
"last_modified": "2022-06-02T22:13:08",
}
],
},
},
Running tests
-------------
Activate the virtualenv and run from within swh-lister directory::
pytest -s -vv --log-cli-level=DEBUG swh/lister/arch/tests
Testing with Docker
-------------------
Change directory to swh/docker then launch the docker environment::
docker-compose up -d
Then connect to the lister::
docker exec -it docker_swh-lister_1 bash
And run the lister (The output of this listing results in oneshot tasks in the scheduler)::
swh lister run -l arch
.. _archlinux.org: https://archlinux.org/packages/
.. _archlinuxarm.org: https://archlinuxarm.org/packages/
.. _core repository: https://archive.archlinux.org/repos/last/core/os/x86_64/core.files.tar.gz
.. _extra repository: https://uk.mirror.archlinuxarm.org/armv7h/extra/extra.files.tar.gz
.. _package versions url: https://archive.archlinux.org/packages/m/mercurial/
.. _regular: https://wiki.archlinux.org/title/PKGBUILD#Package_name
.. _split: https://man.archlinux.org/man/PKGBUILD.5#PACKAGE_SPLITTING
"""
def register():
from .lister import ArchLister

View file

@ -437,12 +437,33 @@ class ArchLister(StatelessLister[ArchListerPage]):
"""Iterate on all arch pages and yield ListedOrigin instances."""
assert self.lister_obj.id is not None
for origin in page:
artifacts = []
arch_metadata = []
for version in origin["versions"]:
artifacts.append(
{
"version": version["version"],
"filename": version["filename"],
"url": version["url"],
"length": version["length"],
}
)
arch_metadata.append(
{
"version": version["version"],
"name": version["name"],
"arch": version["arch"],
"repo": version["repo"],
"last_modified": version["last_modified"],
}
)
yield ListedOrigin(
lister_id=self.lister_obj.id,
visit_type=self.VISIT_TYPE,
url=origin["url"],
last_update=origin["last_modified"],
extra_loader_arguments={
"artifacts": origin["versions"],
"artifacts": artifacts,
"arch_metadata": arch_metadata,
},
)

File diff suppressed because it is too large Load diff