Use beautifulsoup4 CSS selectors to simplify code and type checking

The types-beautifulsoup4 package is installed in the swh virtualenv
because it is a swh-scanner test dependency, so mypy reported some
errors related to beautifulsoup4 typing.

The find method of bs4 returns the following union:
Tag | NavigableString | None, so isinstance calls must be used to narrow
the result to Tag before using it, which clutters the code.
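
For illustration only, a minimal sketch of the narrowing that find
requires under the type stubs. The function name and the sample HTML are
hypothetical and are not taken from the changed files.

```python
from typing import Optional

from bs4 import BeautifulSoup, Tag


def first_link_href_with_find(html: str) -> Optional[str]:
    """Hypothetical helper: href of the first <a> element, located with find()."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("a")
    # The stubs annotate find() as Tag | NavigableString | None, so the
    # result has to be narrowed to Tag before tag-only methods are used.
    if not isinstance(link, Tag):
        return None
    href = link.get("href")
    # An attribute value can be a str or a list of str, so narrow it as well.
    return href if isinstance(href, str) else None
```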

So prefer the select_one method instead: it returns Optional[Tag], so a
simple None check is enough to ensure typing is correct.
In a similar manner, replace uses of the find_all method with the select method.

This also simplifies the code.
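
As a rough sketch of the pattern described above, assuming the stubs
annotate select_one as Optional[Tag] and select as a list of Tag, as the
message states. The function names, selectors and sample HTML are
hypothetical.

```python
from typing import List, Optional

from bs4 import BeautifulSoup


def first_link_href(html: str) -> Optional[str]:
    """Hypothetical helper: href of the first <a> element that has one."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one("a[href]")
    # select_one() yields a Tag or None, so a None check is all that is needed.
    if link is None:
        return None
    href = link["href"]
    # An attribute value can be a str or a list of str, so narrow it.
    return href if isinstance(href, str) else None


def item_link_texts(html: str) -> List[str]:
    """Hypothetical helper: text of every link inside <ul class="items">."""
    soup = BeautifulSoup(html, "html.parser")
    # select() only returns Tag objects, so no isinstance filtering is needed.
    return [a.text for a in soup.select("ul.items a")]
```

A CSS selector can also encode conditions that would otherwise need
nested find calls, for example restricting the search to links below a
given list.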
Antoine Lambert 2024-04-15 16:58:46 +02:00
parent e6a35c55b0
commit 41407e0eff
10 changed files with 100 additions and 100 deletions

@@ -1,4 +1,4 @@
-# Copyright (C) 2022-2023 The Software Heritage developers
+# Copyright (C) 2022-2024 The Software Heritage developers
 # See the AUTHORS file at the top-level directory of this distribution
 # License: GNU General Public License version 3, or any later version
 # See top-level LICENSE file for more information
@@ -82,8 +82,8 @@ class RubyGemsLister(StatelessLister[RubyGemsListerPage]):
     def get_latest_dump_file(self) -> str:
         response = self.http_request(self.RUBY_GEMS_POSTGRES_DUMP_LIST_URL)
         xml = BeautifulSoup(response.content, "xml")
-        contents = xml.find_all("Contents")
-        return contents[-1].find("Key").text
+        contents = xml.select("Contents")
+        return contents[-1].select("Key")[0].text
 
     def create_rubygems_db(
         self, postgresql: Postgresql