Prior to this, some urls were detected as file because their version name were wrongly
detected as extension, hence not matching tarball extensions.
Related to T3781
For now, those can be faulty as the manifest is missing 'critical' information about how
to recompute the hash (e.g. fs layout, executable bit, ...).
Related to T4608
Related to T3781
In that case, this fallbacks to use the "outputHash" which is an equivalent field of the
integrity one except it's for "recursive" outputHashMode. This adds the necessary
assertions around this case so correct data is sent to loaders as well.
Related to T3781
This actually includes all query param values as paths to check. When paths have
extensions, it then pattern matches against tarballs if any. When no extension is
detected, it's doing as before, fallbacks to head query the url to have more information
on the file.
Prior to this commit, this only looked over a hard-coded list of values (for hard-coded
keys: file, f, name, url) detected through docker runs. This way of doing it should
decrease future misdetections (when new unknown "keys" show up in the wild).
Related to T3781
Without this, some git repositories are detected as file (due to upstream
misqualification too). This does some extra effort to detect those to avoid sending
noise to loaders.
This also refactors some common code to build vcs artifacts to avoid duplication.
Related to T3781
Without this, some tarballs hidden within query parameters are not detected. This does
some extra effort to detect those to avoid sending noise to loaders.
Related to T3781
Without this distinction the current directory or content loader will fail the download
as they currently expect the checksums to be about the tarball. When a recursive
"integrity" is provided, it's actually about the uncompressed tarball as per the
nix-store computation.
It's detailed within the code.
Related to T3294
Related to T3781
Some origins are listed as urls while they are not. They are possibly vcs. So this
commit tries to detect and and deal with those if possible. If not possible, they are
skipped.
Related to T3781
Related to P1470
The end goal is to ingest sparsely the origins, that would avoid hitting the various
servers around the same time for colocated origins in the upstream manifest (especially
file or tarball).
Related to T3781