Compare commits

...

36 Commits

SHA1 Message Date
86947e4874 Bump version to 3.3.1 2022-01-15 15:11:22 +01:00
4f022e2d19 Reword changelog 2022-01-15 15:06:02 +01:00
f47e7374d2 Use fixed windows path for video cache 2022-01-15 12:00:30 +01:00
57ec51e95a Fix login after shib url parser change 2022-01-14 20:17:27 +01:00
0045124a4e Bump version to 3.3.0 2022-01-09 21:09:09 +01:00
9618aae83b Add content pages to changelog 2022-01-09 18:32:58 +01:00
33453ede2d Update dependency versions in setup.py 2022-01-09 18:31:42 +01:00
e467b38d73 Only reject 1970 timestamps on windows 2022-01-09 18:23:00 +01:00
e9d2d05030 Update changelog 2022-01-09 11:48:26 +01:00
4bf0c972e6 Update types for rich 11 2022-01-09 11:48:26 +01:00
4ee919625d Add rudimentary support for content pages 2022-01-08 20:47:35 +01:00
d30f25ee97 Detect shib login page as login page
And do not assume we are logged in...
2022-01-08 20:28:45 +01:00
10d9d74528 Bail out when crawling recursive courses 2022-01-08 20:28:30 +01:00
43c5453e10 Correctly crawl files on desktop
The files on the desktop do not include a download link, so we need to
rewrite it.
2022-01-08 20:00:53 +01:00
eb4de8ae0c Ignore 1970 dates as windows crashes when calling .timestamp() 2022-01-08 18:14:43 +01:00
e32c1f000f Fix mtime for single streams 2022-01-08 18:05:48 +01:00
5f527bc697 Remove Python 3.9 Pattern typehints 2022-01-08 17:14:40 +01:00
ced8b9a2d0 Fix some accordions 2022-01-08 16:58:30 +01:00
6f3cfd4396 Fix personal desktop crawling 2022-01-08 16:58:15 +01:00
462d993fbc Fix local video path cache (hopefully) 2022-01-08 00:27:48 +01:00
a99356f2a2 Fix video stream extraction 2022-01-08 00:27:34 +01:00
eac2e34161 Fix is_logged_in for ILIAS 7 2022-01-07 23:32:31 +01:00
a82a0b19c2 Collect crawler warnings/errors and include them in the report 2021-11-07 21:48:55 +01:00
90cb6e989b Do not download single videos if cache does not exist 2021-11-06 23:21:15 +01:00
6289938d7c Do not stop crawling files when encountering a CrawlWarning 2021-11-06 12:09:51 +01:00
13b8c3d9c6 Add regex option to config and CLI parser 2021-11-02 09:30:46 +01:00
88afe64a92 Refactor IPD crawler a bit 2021-11-02 01:25:01 +00:00
6b2a657573 Fix IPD crawler for different subpages (#42)
This patch reworks the IPD crawler to support subpages which do not use
"/intern" for links and fetches the folder names from table headings.
2021-11-02 01:25:01 +00:00
d6f38a61e1 Fixed minor spelling mistakes 2021-11-02 01:54:00 +01:00
ad3f4955f7 Update changelog 2021-10-30 18:14:39 +02:00
e42ab83d32 Add support for ILIAS cards 2021-10-30 18:13:44 +02:00
f9a3f9b9f2 Handle multi-stream videos 2021-10-30 18:12:29 +02:00
ef7d5ea2d3 Allow storing crawler-specific data in reports 2021-10-30 18:09:05 +02:00
55ea304ff3 Disable interpolation of ConfigParser 2021-10-25 23:37:42 +02:00
fee12b3d9e Fix changelog 2021-10-25 17:44:12 +00:00
6673077397 Add kit-ipd crawler 2021-10-21 13:20:21 +02:00
18 changed files with 714 additions and 54 deletions

View File

@ -22,6 +22,29 @@ ambiguous situations.
## Unreleased ## Unreleased
## 3.3.1 - 2022-01-15
### Fixed
- ILIAS login
- Local video cache if `windows_paths` is enabled
## 3.3.0 - 2022-01-09
### Added
- A KIT IPD crawler
- Support for ILIAS cards
- (Rudimentary) support for content pages
- Support for multi-stream videos
- Support for ILIAS 7
### Removed
- [Interpolation](https://docs.python.org/3/library/configparser.html#interpolation-of-values) in config file
### Fixed
- Crawling of recursive courses
- Crawling files directly placed on the personal desktop
- Ignore timestamps at the unix epoch as they crash on Windows
## 3.2.0 - 2021-08-04 ## 3.2.0 - 2021-08-04
### Added ### Added

View File

@ -4,11 +4,11 @@ A config file consists of sections. A section begins with a `[section]` header,
which is followed by a list of `key = value` pairs. Comments must be on their which is followed by a list of `key = value` pairs. Comments must be on their
own line and start with `#`. Multiline values must be indented beyond their key. own line and start with `#`. Multiline values must be indented beyond their key.
Boolean values can be `yes` or `no`. For more details and some examples on the Boolean values can be `yes` or `no`. For more details and some examples on the
format, see the [configparser documentation][1] ([basic interpolation][2] is format, see the [configparser documentation][1] ([interpolation][2] is
enabled). disabled).
[1]: <https://docs.python.org/3/library/configparser.html#supported-ini-file-structure> "Supported INI File Structure" [1]: <https://docs.python.org/3/library/configparser.html#supported-ini-file-structure> "Supported INI File Structure"
[2]: <https://docs.python.org/3/library/configparser.html#configparser.BasicInterpolation> "BasicInterpolation" [2]: <https://docs.python.org/3/library/configparser.html#interpolation-of-values> "Interpolation of values"
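For readers unfamiliar with the change: with interpolation disabled, a literal `%` in a value (for example in a percent-encoded URL) is no longer treated as interpolation syntax. Below is a minimal sketch of parsing a PFERD-style config this way; the section names, options and values are illustrative only, not taken from this diff:

```python
import configparser
import textwrap

# Interpolation is disabled, matching the ConfigParser(interpolation=None)
# change in the CLI parser further down in this diff.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(textwrap.dedent("""\
    [DEFAULT]
    working_dir = .

    [crawl:kit-ipd]
    type = kit-ipd
    # Illustrative values only
    target = https://example.example/page?cache=100%25
    link_regex = ^.*/[^/]*\\.(?:pdf|zip|c|java)$
    """))

# Returned verbatim. With the previous BasicInterpolation, this lookup would
# raise InterpolationSyntaxError because "%2" is not valid interpolation syntax.
print(parser["crawl:kit-ipd"]["target"])
```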
## The `DEFAULT` section ## The `DEFAULT` section
@ -53,7 +53,7 @@ common to all crawlers:
crawler can still be executed manually using the `--crawler` or `-C` flags. crawler can still be executed manually using the `--crawler` or `-C` flags.
(Default: `no`) (Default: `no`)
- `output_dir`: The directory the crawler synchronizes files to. A crawler will - `output_dir`: The directory the crawler synchronizes files to. A crawler will
never place any files outside of this directory. (Default: the crawler's name) never place any files outside this directory. (Default: the crawler's name)
- `redownload`: When to download a file that is already present locally. - `redownload`: When to download a file that is already present locally.
(Default: `never-smart`) (Default: `never-smart`)
- `never`: If a file is present locally, it is not downloaded again. - `never`: If a file is present locally, it is not downloaded again.
@ -136,6 +136,18 @@ crawler simulate a slower, network-based crawler.
requests. (Default: `0.0`) requests. (Default: `0.0`)
- `download_speed`: Download speed (in bytes per second) to simulate. (Optional) - `download_speed`: Download speed (in bytes per second) to simulate. (Optional)
### The `kit-ipd` crawler
This crawler crawls a KIT-IPD page by URL. The root page can be crawled from
outside the KIT network, so you will be informed about any new or deleted files,
but downloading files requires you to be inside the KIT network or VPN. Adding a
short delay between requests is likely a good idea.
- `target`: URL to a KIT-IPD page
- `link_regex`: A regex that is matched against the `href` part of links. If it
matches, the given link is downloaded as a file. This is used to extract
files from KIT-IPD pages. (Default: `^.*/[^/]*\.(?:pdf|zip|c|java)$`)
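As a quick illustration of how the default `link_regex` selects downloadable files, here it is applied to a few made-up `href` values:

```python
import re

# Default pattern from the kit-ipd crawler section above
link_regex = re.compile(r"^.*/[^/]*\.(?:pdf|zip|c|java)$")

hrefs = [
    "intern/material/sheet01.pdf",      # matches -> downloaded as a file
    "intern/material/solution01.zip",   # matches
    "lehre/some-lecture/",              # no file extension -> ignored
]
for href in hrefs:
    print(href, bool(link_regex.match(href)))
```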
### The `kit-ilias-web` crawler ### The `kit-ilias-web` crawler
This crawler crawls the KIT ILIAS instance. This crawler crawls the KIT ILIAS instance.

View File

@ -1,4 +1,5 @@
Copyright 2019-2020 Garmelon, I-Al-Istannen, danstooamerican, pavelzw, TheChristophe, Scriptim Copyright 2019-2021 Garmelon, I-Al-Istannen, danstooamerican, pavelzw,
TheChristophe, Scriptim, thelukasprobst, Toorero
Permission is hereby granted, free of charge, to any person obtaining a copy of Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in this software and associated documentation files (the "Software"), to deal in

View File

@ -15,7 +15,7 @@ from .transformer import RuleParseError
def load_config_parser(args: argparse.Namespace) -> configparser.ConfigParser: def load_config_parser(args: argparse.Namespace) -> configparser.ConfigParser:
log.explain_topic("Loading config") log.explain_topic("Loading config")
parser = configparser.ConfigParser() parser = configparser.ConfigParser(interpolation=None)
if args.command is None: if args.command is None:
log.explain("No CLI command specified, loading config from file") log.explain("No CLI command specified, loading config from file")

View File

@ -9,4 +9,5 @@
from . import command_local # noqa: F401 imported but unused from . import command_local # noqa: F401 imported but unused
from . import command_kit_ilias_web # noqa: F401 imported but unused from . import command_kit_ilias_web # noqa: F401 imported but unused
from . import command_kit_ipd # noqa: F401 imported but unused
from .parser import PARSER, ParserLoadError, load_default_section # noqa: F401 imported but unused from .parser import PARSER, ParserLoadError, load_default_section # noqa: F401 imported but unused

View File

@ -0,0 +1,54 @@
import argparse
import configparser
from pathlib import Path
from ..logging import log
from .parser import CRAWLER_PARSER, SUBPARSERS, load_crawler
SUBPARSER = SUBPARSERS.add_parser(
"kit-ipd",
parents=[CRAWLER_PARSER],
)
GROUP = SUBPARSER.add_argument_group(
title="kit ipd crawler arguments",
description="arguments for the 'kit-ipd' crawler",
)
GROUP.add_argument(
"--link-regex",
type=str,
metavar="REGEX",
help="href-matching regex to identify downloadable files"
)
GROUP.add_argument(
"target",
type=str,
metavar="TARGET",
help="url to crawl"
)
GROUP.add_argument(
"output",
type=Path,
metavar="OUTPUT",
help="output directory"
)
def load(
args: argparse.Namespace,
parser: configparser.ConfigParser,
) -> None:
log.explain("Creating config for command 'kit-ipd'")
parser["crawl:kit-ipd"] = {}
section = parser["crawl:kit-ipd"]
load_crawler(args, section)
section["type"] = "kit-ipd"
section["target"] = str(args.target)
section["output_dir"] = str(args.output)
if args.link_regex:
section["link_regex"] = str(args.link_regex)
SUBPARSER.set_defaults(command=load)

View File

@ -5,6 +5,7 @@ from ..auth import Authenticator
from ..config import Config from ..config import Config
from .crawler import Crawler, CrawlError, CrawlerSection # noqa: F401 from .crawler import Crawler, CrawlError, CrawlerSection # noqa: F401
from .ilias import KitIliasWebCrawler, KitIliasWebCrawlerSection from .ilias import KitIliasWebCrawler, KitIliasWebCrawlerSection
from .kit_ipd_crawler import KitIpdCrawler, KitIpdCrawlerSection
from .local_crawler import LocalCrawler, LocalCrawlerSection from .local_crawler import LocalCrawler, LocalCrawlerSection
CrawlerConstructor = Callable[[ CrawlerConstructor = Callable[[
@ -19,4 +20,6 @@ CRAWLERS: Dict[str, CrawlerConstructor] = {
LocalCrawler(n, LocalCrawlerSection(s), c), LocalCrawler(n, LocalCrawlerSection(s), c),
"kit-ilias-web": lambda n, s, c, a: "kit-ilias-web": lambda n, s, c, a:
KitIliasWebCrawler(n, KitIliasWebCrawlerSection(s), c, a), KitIliasWebCrawler(n, KitIliasWebCrawlerSection(s), c, a),
"kit-ipd": lambda n, s, c, a:
KitIpdCrawler(n, KitIpdCrawlerSection(s), c),
} }

View File

@ -47,10 +47,12 @@ def noncritical(f: Wrapped) -> Wrapped:
try: try:
f(*args, **kwargs) f(*args, **kwargs)
except (CrawlWarning, OutputDirError, MarkDuplicateError, MarkConflictError) as e: except (CrawlWarning, OutputDirError, MarkDuplicateError, MarkConflictError) as e:
crawler.report.add_warning(str(e))
log.warn(str(e)) log.warn(str(e))
crawler.error_free = False crawler.error_free = False
except: # noqa: E722 do not use bare 'except' except Exception as e:
crawler.error_free = False crawler.error_free = False
crawler.report.add_error(str(e))
raise raise
return wrapper # type: ignore return wrapper # type: ignore
@ -83,8 +85,10 @@ def anoncritical(f: AWrapped) -> AWrapped:
except (CrawlWarning, OutputDirError, MarkDuplicateError, MarkConflictError) as e: except (CrawlWarning, OutputDirError, MarkDuplicateError, MarkConflictError) as e:
log.warn(str(e)) log.warn(str(e))
crawler.error_free = False crawler.error_free = False
except: # noqa: E722 do not use bare 'except' crawler.report.add_warning(str(e))
except Exception as e:
crawler.error_free = False crawler.error_free = False
crawler.report.add_error(str(e))
raise raise
return None return None

View File

@ -39,7 +39,12 @@ class IliasPageElement:
description: Optional[str] = None description: Optional[str] = None
def id(self) -> str: def id(self) -> str:
regexes = [r"eid=(?P<id>[0-9a-z\-]+)", r"file_(?P<id>\d+)", r"ref_id=(?P<id>\d+)"] regexes = [
r"eid=(?P<id>[0-9a-z\-]+)",
r"file_(?P<id>\d+)",
r"ref_id=(?P<id>\d+)",
r"target=[a-z]+_(?P<id>\d+)"
]
for regex in regexes: for regex in regexes:
if match := re.search(regex, self.url): if match := re.search(regex, self.url):
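The regex list above derives a stable ID from an element URL, and the new `target=[a-z]+_(?P<id>\d+)` pattern covers permalink-style URLs. A standalone sketch of the same lookup against a made-up URL:

```python
import re

# Same patterns as in IliasPageElement.id(); the URL is fabricated for illustration
regexes = [
    r"eid=(?P<id>[0-9a-z\-]+)",
    r"file_(?P<id>\d+)",
    r"ref_id=(?P<id>\d+)",
    r"target=[a-z]+_(?P<id>\d+)",
]

url = "https://ilias.example.edu/goto.php?target=file_1234_download&client_id=example"
for regex in regexes:
    if match := re.search(regex, url):
        print(regex, "->", match.group("id"))  # prints the matching pattern and "1234"
        break
```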
@ -71,6 +76,12 @@ class IliasPage:
if self._is_exercise_file(): if self._is_exercise_file():
log.explain("Page is an exercise, searching for elements") log.explain("Page is an exercise, searching for elements")
return self._find_exercise_entries() return self._find_exercise_entries()
if self._is_personal_desktop():
log.explain("Page is the personal desktop, searching for elements")
return self._find_personal_desktop_entries()
if self._is_content_page():
log.explain("Page is a content page, searching for elements")
return self._find_copa_entries()
log.explain("Page is a normal folder, searching for elements") log.explain("Page is a normal folder, searching for elements")
return self._find_normal_entries() return self._find_normal_entries()
@ -115,13 +126,22 @@ class IliasPage:
return False return False
def _is_personal_desktop(self) -> bool:
return self._soup.find("a", attrs={"href": lambda x: x and "block_type=pditems" in x})
def _is_content_page(self) -> bool:
link = self._soup.find(id="current_perma_link")
if not link:
return False
return "target=copa_" in link.get("value")
def _player_to_video(self) -> List[IliasPageElement]: def _player_to_video(self) -> List[IliasPageElement]:
# Fetch the actual video page. This is a small wrapper page initializing a javascript # player. Sadly we can not execute that JS. The actual video stream url is nowhere
# player. Sadly we can not execute that JS. The actual video stream url is nowhere # player. Sadly we can not execute that JS. The actual video stream url is nowhere
# on the page, but defined in a JS object inside a script tag, passed to the player # on the page, but defined in a JS object inside a script tag, passed to the player
# library. # library.
# We do the impossible and RegEx the stream JSON object out of the page's HTML source # We do the impossible and RegEx the stream JSON object out of the page's HTML source
regex: re.Pattern[str] = re.compile( regex = re.compile(
r"({\"streams\"[\s\S]+?),\s*{\"paella_config_file", re.IGNORECASE r"({\"streams\"[\s\S]+?),\s*{\"paella_config_file", re.IGNORECASE
) )
json_match = regex.search(str(self._soup)) json_match = regex.search(str(self._soup))
@ -133,10 +153,64 @@ class IliasPage:
# parse it # parse it
json_object = json.loads(json_str) json_object = json.loads(json_str)
# and fetch the video url! streams = [stream for stream in json_object["streams"]]
video_url = json_object["streams"][0]["sources"]["mp4"][0]["src"]
# and just fetch the lone video url!
if len(streams) == 1:
video_url = streams[0]["sources"]["mp4"][0]["src"]
return [IliasPageElement(IliasElementType.VIDEO, video_url, self._source_name)] return [IliasPageElement(IliasElementType.VIDEO, video_url, self._source_name)]
log.explain(f"Found multiple videos for stream at {self._source_name}")
items = []
for stream in sorted(streams, key=lambda stream: stream["content"]):
full_name = f"{self._source_name.replace('.mp4', '')} ({stream['content']}).mp4"
video_url = stream["sources"]["mp4"][0]["src"]
items.append(IliasPageElement(IliasElementType.VIDEO, video_url, full_name))
return items
def _find_personal_desktop_entries(self) -> List[IliasPageElement]:
items: List[IliasPageElement] = []
titles: List[Tag] = self._soup.select(".il-item-title")
for title in titles:
link = title.find("a")
name = _sanitize_path_name(link.text.strip())
url = self._abs_url_from_link(link)
type = self._find_type_from_link(name, link, url)
if not type:
_unexpected_html_warning()
log.warn_contd(f"Could not extract type for {link}")
continue
log.explain(f"Found {name!r}")
if type == IliasElementType.FILE and "_download" not in url:
url = re.sub(r"(target=file_\d+)", r"\1_download", url)
log.explain("Rewired file URL to include download part")
items.append(IliasPageElement(type, url, name))
return items
def _find_copa_entries(self) -> List[IliasPageElement]:
items: List[IliasPageElement] = []
links: List[Tag] = self._soup.findAll(class_="ilc_flist_a_FileListItemLink")
for link in links:
url = self._abs_url_from_link(link)
name = _sanitize_path_name(link.getText().strip().replace("\t", ""))
if "file_id" not in url:
_unexpected_html_warning()
log.warn_contd(f"Found unknown content page item {name!r} with url {url!r}")
continue
items.append(IliasPageElement(IliasElementType.FILE, url, name))
return items
def _find_video_entries(self) -> List[IliasPageElement]: def _find_video_entries(self) -> List[IliasPageElement]:
# ILIAS has three stages for video pages # ILIAS has three stages for video pages
# 1. The initial dummy page without any videos. This page contains the link to the listing # 1. The initial dummy page without any videos. This page contains the link to the listing
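Back in `_player_to_video` above, each stream of a multi-stream video becomes its own element whose name embeds the stream's `content` label. A small sketch with a fabricated `streams` object in the shape the paella player config appears to use:

```python
# Fabricated example data; real stream objects contain more fields
streams = [
    {"content": "screen", "sources": {"mp4": [{"src": "https://example.invalid/screen.mp4"}]}},
    {"content": "camera", "sources": {"mp4": [{"src": "https://example.invalid/camera.mp4"}]}},
]

source_name = "Lecture 01.mp4"  # name reported for the video element
for stream in sorted(streams, key=lambda s: s["content"]):
    full_name = f"{source_name.replace('.mp4', '')} ({stream['content']}).mp4"
    print(full_name, "<-", stream["sources"]["mp4"][0]["src"])
# Lecture 01 (camera).mp4 <- https://example.invalid/camera.mp4
# Lecture 01 (screen).mp4 <- https://example.invalid/screen.mp4
```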
@ -356,6 +430,8 @@ class IliasPage:
log.explain(f"Found {element_name!r}") log.explain(f"Found {element_name!r}")
result.append(IliasPageElement(element_type, abs_url, element_name, description=description)) result.append(IliasPageElement(element_type, abs_url, element_name, description=description))
result += self._find_cards()
return result return result
def _find_upwards_folder_hierarchy(self, tag: Tag) -> List[str]: def _find_upwards_folder_hierarchy(self, tag: Tag) -> List[str]:
@ -383,7 +459,10 @@ class IliasPage:
continue continue
prev: Tag = parent.findPreviousSibling("div") prev: Tag = parent.findPreviousSibling("div")
if "ilContainerBlockHeader" in prev.get("class"): if "ilContainerBlockHeader" in prev.get("class"):
if prev.find("h3"):
found_titles.append(prev.find("h3").getText().strip()) found_titles.append(prev.find("h3").getText().strip())
else:
found_titles.append(prev.find("h2").getText().strip())
# And this for real accordions # And this for real accordions
if "il_VAccordionContentDef" in parent.get("class"): if "il_VAccordionContentDef" in parent.get("class"):
@ -438,6 +517,90 @@ class IliasPage:
log.explain(f"Found file {full_path!r}") log.explain(f"Found file {full_path!r}")
return IliasPageElement(IliasElementType.FILE, url, full_path, modification_date) return IliasPageElement(IliasElementType.FILE, url, full_path, modification_date)
def _find_cards(self) -> List[IliasPageElement]:
result: List[IliasPageElement] = []
card_titles: List[Tag] = self._soup.select(".card-title a")
for title in card_titles:
url = self._abs_url_from_link(title)
name = _sanitize_path_name(title.getText().strip())
type = self._find_type_from_card(title)
if not type:
_unexpected_html_warning()
log.warn_contd(f"Could not extract type for {title}")
continue
result.append(IliasPageElement(type, url, name))
card_button_tiles: List[Tag] = self._soup.select(".card-title button")
for button in card_button_tiles:
regex = re.compile(button["id"] + r".*window.open\(['\"](.+?)['\"]")
res = regex.search(str(self._soup))
if not res:
_unexpected_html_warning()
log.warn_contd(f"Could not find click handler target for {button}")
continue
url = self._abs_url_from_relative(res.group(1))
name = _sanitize_path_name(button.getText().strip())
type = self._find_type_from_card(button)
caption_parent = button.findParent(
"div",
attrs={"class": lambda x: x and "caption" in x},
)
description = caption_parent.find_next_sibling("div").getText().strip()
if not type:
_unexpected_html_warning()
log.warn_contd(f"Could not extract type for {button}")
continue
result.append(IliasPageElement(type, url, name, description=description))
return result
def _find_type_from_card(self, card_title: Tag) -> Optional[IliasElementType]:
def is_card_root(element: Tag) -> bool:
return "il-card" in element["class"] and "thumbnail" in element["class"]
card_root: Optional[Tag] = None
# We look for the card root
for parent in card_title.parents:
if is_card_root(parent):
card_root = parent
break
if card_root is None:
_unexpected_html_warning()
log.warn_contd(f"Tried to figure out element type, but did not find an icon for {card_title}")
return None
icon: Tag = card_root.select_one(".il-card-repository-head .icon")
if "opencast" in icon["class"]:
return IliasElementType.VIDEO_FOLDER_MAYBE_PAGINATED
if "exc" in icon["class"]:
return IliasElementType.EXERCISE
if "webr" in icon["class"]:
return IliasElementType.LINK
if "book" in icon["class"]:
return IliasElementType.BOOKING
if "frm" in icon["class"]:
return IliasElementType.FORUM
if "sess" in icon["class"]:
return IliasElementType.MEETING
if "tst" in icon["class"]:
return IliasElementType.TEST
if "fold" in icon["class"]:
return IliasElementType.FOLDER
_unexpected_html_warning()
log.warn_contd(f"Could not extract type from {icon} for card title {card_title}")
return None
@staticmethod @staticmethod
def _find_type_from_link( def _find_type_from_link(
element_name: str, element_name: str,
@ -453,9 +616,30 @@ class IliasPage:
if "target=file_" in parsed_url.query: if "target=file_" in parsed_url.query:
return IliasElementType.FILE return IliasElementType.FILE
if "target=grp_" in parsed_url.query:
return IliasElementType.FOLDER
if "target=crs_" in parsed_url.query:
return IliasElementType.FOLDER
if "baseClass=ilExerciseHandlerGUI" in parsed_url.query:
return IliasElementType.EXERCISE
if "baseClass=ilLinkResourceHandlerGUI" in parsed_url.query and "calldirectlink" in parsed_url.query:
return IliasElementType.LINK
if "cmd=showThreads" in parsed_url.query or "target=frm_" in parsed_url.query:
return IliasElementType.FORUM
if "cmdClass=ilobjtestgui" in parsed_url.query:
return IliasElementType.TEST
# Booking and Meeting can not be detected based on the link. They do have a ref_id though, so
# try to guess it from the image.
# Everything with a ref_id can *probably* be opened to reveal nested things # Everything with a ref_id can *probably* be opened to reveal nested things
# video groups, directories, exercises, etc # video groups, directories, exercises, etc
if "ref_id=" in parsed_url.query: if "ref_id=" in parsed_url.query or "goto.php" in parsed_url.path:
return IliasPage._find_type_from_folder_like(link_element, url) return IliasPage._find_type_from_folder_like(link_element, url)
_unexpected_html_warning() _unexpected_html_warning()
@ -476,7 +660,7 @@ class IliasPage:
# We look for the outer div of our inner link, to find information around it # We look for the outer div of our inner link, to find information around it
# (mostly the icon) # (mostly the icon)
for parent in link_element.parents: for parent in link_element.parents:
if "ilContainerListItemOuter" in parent["class"]: if "ilContainerListItemOuter" in parent["class"] or "il-std-item" in parent["class"]:
found_parent = parent found_parent = parent
break break
@ -488,6 +672,9 @@ class IliasPage:
# Find the small descriptive icon to figure out the type # Find the small descriptive icon to figure out the type
img_tag: Optional[Tag] = found_parent.select_one("img.ilListItemIcon") img_tag: Optional[Tag] = found_parent.select_one("img.ilListItemIcon")
if img_tag is None:
img_tag = found_parent.select_one("img.icon")
if img_tag is None: if img_tag is None:
_unexpected_html_warning() _unexpected_html_warning()
log.warn_contd(f"Tried to figure out element type, but did not find an image for {url}") log.warn_contd(f"Tried to figure out element type, but did not find an image for {url}")
@ -538,7 +725,13 @@ class IliasPage:
""" """
Create an absolute url from an <a> tag. Create an absolute url from an <a> tag.
""" """
return urljoin(self._page_url, link_tag.get("href")) return self._abs_url_from_relative(link_tag.get("href"))
def _abs_url_from_relative(self, relative_url: str) -> str:
"""
Create an absolute url from a relative URL.
"""
return urljoin(self._page_url, relative_url)
def _unexpected_html_warning() -> None: def _unexpected_html_warning() -> None:

View File

@ -1,9 +1,10 @@
import asyncio import asyncio
import re import re
from pathlib import PurePath from pathlib import PurePath
from typing import Any, Awaitable, Callable, Dict, List, Optional, Set, TypeVar, Union from typing import Any, Awaitable, Callable, Dict, List, Optional, Set, TypeVar, Union, cast
import aiohttp import aiohttp
import yarl
from aiohttp import hdrs from aiohttp import hdrs
from bs4 import BeautifulSoup, Tag from bs4 import BeautifulSoup, Tag
@ -84,7 +85,7 @@ _VIDEO_ELEMENTS: Set[IliasElementType] = set([
AWrapped = TypeVar("AWrapped", bound=Callable[..., Awaitable[Optional[Any]]]) AWrapped = TypeVar("AWrapped", bound=Callable[..., Awaitable[Optional[Any]]])
def _iorepeat(attempts: int, name: str) -> Callable[[AWrapped], AWrapped]: def _iorepeat(attempts: int, name: str, failure_is_error: bool = False) -> Callable[[AWrapped], AWrapped]:
def decorator(f: AWrapped) -> AWrapped: def decorator(f: AWrapped) -> AWrapped:
async def wrapper(*args: Any, **kwargs: Any) -> Optional[Any]: async def wrapper(*args: Any, **kwargs: Any) -> Optional[Any]:
last_exception: Optional[BaseException] = None last_exception: Optional[BaseException] = None
@ -105,6 +106,9 @@ def _iorepeat(attempts: int, name: str) -> Callable[[AWrapped], AWrapped]:
if last_exception: if last_exception:
message = f"Error in I/O Operation: {last_exception}" message = f"Error in I/O Operation: {last_exception}"
if failure_is_error:
raise CrawlError(message) from last_exception
else:
raise CrawlWarning(message) from last_exception raise CrawlWarning(message) from last_exception
raise CrawlError("Impossible return in ilias _iorepeat") raise CrawlError("Impossible return in ilias _iorepeat")
@ -179,6 +183,7 @@ instance's greatest bottleneck.
self._link_file_redirect_delay = section.link_redirect_delay() self._link_file_redirect_delay = section.link_redirect_delay()
self._links = section.links() self._links = section.links()
self._videos = section.videos() self._videos = section.videos()
self._visited_urls: Set[str] = set()
async def _run(self) -> None: async def _run(self) -> None:
if isinstance(self._target, int): if isinstance(self._target, int):
@ -200,7 +205,9 @@ instance's greatest bottleneck.
await self._crawl_url(root_url, expected_id=course_id) await self._crawl_url(root_url, expected_id=course_id)
async def _crawl_desktop(self) -> None: async def _crawl_desktop(self) -> None:
await self._crawl_url(self._base_url) appendix = r"ILIAS\PersonalDesktop\PDMainBarProvider|mm_pd_sel_items"
appendix = appendix.encode("ASCII").hex()
await self._crawl_url(self._base_url + "/gs_content.php?item=" + appendix)
async def _crawl_url(self, url: str, expected_id: Optional[int] = None) -> None: async def _crawl_url(self, url: str, expected_id: Optional[int] = None) -> None:
maybe_cl = await self.crawl(PurePath(".")) maybe_cl = await self.crawl(PurePath("."))
@ -251,6 +258,7 @@ instance's greatest bottleneck.
return None return None
return self._crawl_ilias_page(url, parent, maybe_cl) return self._crawl_ilias_page(url, parent, maybe_cl)
@anoncritical
async def _crawl_ilias_page( async def _crawl_ilias_page(
self, self,
url: str, url: str,
@ -292,15 +300,23 @@ instance's greatest bottleneck.
# And execute them # And execute them
await self.gather(tasks) await self.gather(tasks)
# These decorators only apply *to this method* and *NOT* to the returned
# awaitables!
# This method does not await the handlers but returns them instead.
# This ensures one level is handled at a time and name deduplication
# works correctly.
@anoncritical @anoncritical
# Shouldn't happen but we also really don't want to let I/O errors bubble up to anoncritical.
# If that happens we will be terminated as anoncritical doesn't treat them as non-critical.
@_wrap_io_in_warning("handling ilias element")
async def _handle_ilias_element( async def _handle_ilias_element(
self, self,
parent_path: PurePath, parent_path: PurePath,
element: IliasPageElement, element: IliasPageElement,
) -> Optional[Awaitable[None]]: ) -> Optional[Awaitable[None]]:
if element.url in self._visited_urls:
raise CrawlWarning(
f"Found second path to element {element.name!r} at {element.url!r}. Aborting subpath"
)
self._visited_urls.add(element.url)
element_path = PurePath(parent_path, element.name) element_path = PurePath(parent_path, element.name)
if element.type in _VIDEO_ELEMENTS: if element.type in _VIDEO_ELEMENTS:
@ -363,6 +379,7 @@ instance's greatest bottleneck.
return self._download_link(element, link_template_maybe, maybe_dl) return self._download_link(element, link_template_maybe, maybe_dl)
@anoncritical
@_iorepeat(3, "resolving link") @_iorepeat(3, "resolving link")
async def _download_link(self, element: IliasPageElement, link_template: str, dl: DownloadToken) -> None: async def _download_link(self, element: IliasPageElement, link_template: str, dl: DownloadToken) -> None:
async with dl as (bar, sink): async with dl as (bar, sink):
@ -409,6 +426,7 @@ instance's greatest bottleneck.
return self._download_booking(element, link_template_maybe, maybe_dl) return self._download_booking(element, link_template_maybe, maybe_dl)
@anoncritical
@_iorepeat(3, "resolving booking") @_iorepeat(3, "resolving booking")
async def _download_booking( async def _download_booking(
self, self,
@ -439,22 +457,108 @@ instance's greatest bottleneck.
element: IliasPageElement, element: IliasPageElement,
element_path: PurePath, element_path: PurePath,
) -> Optional[Awaitable[None]]: ) -> Optional[Awaitable[None]]:
# Videos will NOT be redownloaded - their content doesn't really change and they are chunky # Copy old mapping as it is likely still relevant
maybe_dl = await self.download(element_path, mtime=element.mtime, redownload=Redownload.NEVER) if self.prev_report:
if not maybe_dl: self.report.add_custom_value(
str(element_path),
self.prev_report.get_custom_value(str(element_path))
)
# A video might contain other videos, so let's "crawl" the video first
# to ensure rate limits apply. This must be a download as *this token*
# is re-used if the video consists of a single stream. In that case the
# file name is used and *not* the stream name the ilias html parser reported
# to ensure backwards compatibility.
maybe_dl = await self.download(element_path, mtime=element.mtime, redownload=Redownload.ALWAYS)
# If we do not want to crawl it (user filter) or we have every file
# from the cached mapping already, we can ignore this and bail
if not maybe_dl or self._all_videos_locally_present(element_path):
# Mark all existing videos as known so they do not get deleted
# during cleanup. We "downloaded" them, just without actually making
# a network request as we assumed they did not change.
for video in self._previous_contained_videos(element_path):
await self.download(video)
return None return None
return self._download_video(element, maybe_dl) return self._download_video(element_path, element, maybe_dl)
def _previous_contained_videos(self, video_path: PurePath) -> List[PurePath]:
if not self.prev_report:
return []
custom_value = self.prev_report.get_custom_value(str(video_path))
if not custom_value:
return []
names = cast(List[str], custom_value)
folder = video_path.parent
return [PurePath(folder, name) for name in names]
def _all_videos_locally_present(self, video_path: PurePath) -> bool:
if contained_videos := self._previous_contained_videos(video_path):
log.explain_topic(f"Checking local cache for video {video_path.name}")
all_found_locally = True
for video in contained_videos:
transformed_path = self._to_local_video_path(video)
if transformed_path:
exists_locally = self._output_dir.resolve(transformed_path).exists()
all_found_locally = all_found_locally and exists_locally
if all_found_locally:
log.explain("Found all videos locally, skipping enumeration request")
return True
log.explain("Missing at least one video, continuing with requests!")
return False
def _to_local_video_path(self, path: PurePath) -> Optional[PurePath]:
if transformed := self._transformer.transform(path):
return self._deduplicator.fixup_path(transformed)
return None
@anoncritical
@_iorepeat(3, "downloading video") @_iorepeat(3, "downloading video")
async def _download_video(self, element: IliasPageElement, dl: DownloadToken) -> None: async def _download_video(
self,
original_path: PurePath,
element: IliasPageElement,
dl: DownloadToken
) -> None:
stream_elements: List[IliasPageElement] = []
async with dl as (bar, sink): async with dl as (bar, sink):
page = IliasPage(await self._get_page(element.url), element.url, element) page = IliasPage(await self._get_page(element.url), element.url, element)
real_element = page.get_child_elements()[0] stream_elements = page.get_child_elements()
log.explain(f"Streaming video from real url {real_element.url}") if len(stream_elements) > 1:
log.explain(f"Found multiple video streams for {element.name}")
else:
log.explain(f"Using single video mode for {element.name}")
stream_element = stream_elements[0]
await self._stream_from_url(real_element.url, sink, bar, is_video=True) transformed_path = self._to_local_video_path(original_path)
if not transformed_path:
raise CrawlError(f"Download returned a path but transform did not for {original_path}")
# We do not have a local cache yet
if self._output_dir.resolve(transformed_path).exists():
log.explain(f"Video for {element.name} existed locally")
else:
await self._stream_from_url(stream_element.url, sink, bar, is_video=True)
self.report.add_custom_value(str(original_path), [original_path.name])
return
contained_video_paths: List[str] = []
for stream_element in stream_elements:
video_path = original_path.parent / stream_element.name
contained_video_paths.append(str(video_path))
maybe_dl = await self.download(video_path, mtime=element.mtime, redownload=Redownload.NEVER)
if not maybe_dl:
continue
async with maybe_dl as (bar, sink):
log.explain(f"Streaming video from real url {stream_element.url}")
await self._stream_from_url(stream_element.url, sink, bar, is_video=True)
self.report.add_custom_value(str(original_path), contained_video_paths)
async def _handle_file( async def _handle_file(
self, self,
@ -466,6 +570,7 @@ instance's greatest bottleneck.
return None return None
return self._download_file(element, maybe_dl) return self._download_file(element, maybe_dl)
@anoncritical
@_iorepeat(3, "downloading file") @_iorepeat(3, "downloading file")
async def _download_file(self, element: IliasPageElement, dl: DownloadToken) -> None: async def _download_file(self, element: IliasPageElement, dl: DownloadToken) -> None:
assert dl # The function is only reached when dl is not None assert dl # The function is only reached when dl is not None
@ -521,16 +626,23 @@ instance's greatest bottleneck.
# We repeat this as the login method in shibboleth doesn't handle I/O errors. # We repeat this as the login method in shibboleth doesn't handle I/O errors.
# Shibboleth is quite reliable as well, the repeat is likely not critical here. # Shibboleth is quite reliable as well, the repeat is likely not critical here.
@_iorepeat(3, "Login") @_iorepeat(3, "Login", failure_is_error=True)
async def _authenticate(self) -> None: async def _authenticate(self) -> None:
await self._shibboleth_login.login(self.session) await self._shibboleth_login.login(self.session)
@staticmethod @staticmethod
def _is_logged_in(soup: BeautifulSoup) -> bool: def _is_logged_in(soup: BeautifulSoup) -> bool:
# Normal ILIAS pages # Normal ILIAS pages
userlog = soup.find("li", {"id": "userlog"}) mainbar: Optional[Tag] = soup.find(class_="il-maincontrols-metabar")
if userlog is not None: if mainbar is not None:
login_button = mainbar.find("button", attrs={"data-action": lambda x: x and "login.php" in x})
shib_login = soup.find(id="button_shib_login")
return not login_button and not shib_login
# Personal Desktop
if soup.find("a", attrs={"href": lambda x: x and "block_type=pditems" in x}):
return True return True
# Video listing embeds do not have complete ILIAS html. Try to match them by # Video listing embeds do not have complete ILIAS html. Try to match them by
# their video listing table # their video listing table
video_table = soup.find( video_table = soup.find(
@ -568,14 +680,14 @@ class KitShibbolethLogin:
# Equivalent: Click on "Mit KIT-Account anmelden" button in # Equivalent: Click on "Mit KIT-Account anmelden" button in
# https://ilias.studium.kit.edu/login.php # https://ilias.studium.kit.edu/login.php
url = "https://ilias.studium.kit.edu/Shibboleth.sso/Login" url = "https://ilias.studium.kit.edu/shib_login.php"
data = { data = {
"sendLogin": "1", "sendLogin": "1",
"idp_selection": "https://idp.scc.kit.edu/idp/shibboleth", "idp_selection": "https://idp.scc.kit.edu/idp/shibboleth",
"target": "/shib_login.php", "il_target": "",
"home_organization_selection": "Mit KIT-Account anmelden", "home_organization_selection": "Weiter",
} }
soup: BeautifulSoup = await _post(sess, url, data) soup: BeautifulSoup = await _shib_post(sess, url, data)
# Attempt to login using credentials, if necessary # Attempt to login using credentials, if necessary
while not self._login_successful(soup): while not self._login_successful(soup):
@ -655,3 +767,33 @@ class KitShibbolethLogin:
async def _post(session: aiohttp.ClientSession, url: str, data: Any) -> BeautifulSoup: async def _post(session: aiohttp.ClientSession, url: str, data: Any) -> BeautifulSoup:
async with session.post(url, data=data) as response: async with session.post(url, data=data) as response:
return soupify(await response.read()) return soupify(await response.read())
async def _shib_post(session: aiohttp.ClientSession, url: str, data: Any) -> BeautifulSoup:
"""
aiohttp unescapes '/' and ':' in URL query parameters, which is not RFC compliant and is
rejected by Shibboleth. Thanks a lot. So now we unroll the requests manually, parse location
headers and build encoded URL objects ourselves... Who thought mangling the location header
was a good idea??
"""
async with session.post(url, data=data, allow_redirects=False) as response:
location = response.headers.get("location")
if not location:
raise CrawlWarning(f"Login failed, no location header present at {url}")
correct_url = yarl.URL(location, encoded=True)
async with session.get(correct_url, allow_redirects=False) as response:
as_yarl = yarl.URL(response.url)
location = response.headers.get("location")
if not location or not as_yarl.host:
raise CrawlWarning(f"Login failed, no location header present at {correct_url}")
correct_url = yarl.URL.build(
scheme=as_yarl.scheme,
host=as_yarl.host,
path=location,
encoded=True
)
async with session.get(correct_url, allow_redirects=False) as response:
return soupify(await response.read())
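The core trick in `_shib_post` is building `yarl.URL` objects with `encoded=True` so the already percent-encoded location header is not re-quoted on the follow-up request. A tiny illustration with a made-up location value (quoting details vary between yarl versions, so treat this as a sketch):

```python
import yarl

# Hypothetical location header with an embedded, percent-encoded return URL
location = (
    "https://idp.scc.kit.edu/idp/profile/SAML2/Redirect/SSO"
    "?RelayState=https%3A%2F%2Filias.studium.kit.edu%2Fshib_login.php"
)

# encoded=True marks the string as already percent-encoded, so yarl keeps it
# byte-for-byte instead of re-quoting it (re-quoting can turn %3A/%2F back
# into ':' and '/', which Shibboleth then rejects).
preserved = yarl.URL(location, encoded=True)
print(str(preserved) == location)  # True: the query string was left untouched
```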

View File

@ -0,0 +1,164 @@
import os
import re
from dataclasses import dataclass
from pathlib import PurePath
from typing import Awaitable, List, Optional, Pattern, Set, Union
from urllib.parse import urljoin
from bs4 import BeautifulSoup, Tag
from ..config import Config
from ..logging import ProgressBar, log
from ..output_dir import FileSink
from ..utils import soupify
from .crawler import CrawlError
from .http_crawler import HttpCrawler, HttpCrawlerSection
class KitIpdCrawlerSection(HttpCrawlerSection):
def target(self) -> str:
target = self.s.get("target")
if not target:
self.missing_value("target")
if not target.startswith("https://"):
self.invalid_value("target", target, "Should be a URL")
return target
def link_regex(self) -> Pattern[str]:
regex = self.s.get("link_regex", r"^.*/[^/]*\.(?:pdf|zip|c|java)$")
return re.compile(regex)
@dataclass(unsafe_hash=True)
class KitIpdFile:
name: str
url: str
@dataclass
class KitIpdFolder:
name: str
files: List[KitIpdFile]
def explain(self) -> None:
log.explain_topic(f"Folder {self.name!r}")
for file in self.files:
log.explain(f"File {file.name!r}")
def __hash__(self) -> int:
return self.name.__hash__()
class KitIpdCrawler(HttpCrawler):
def __init__(
self,
name: str,
section: KitIpdCrawlerSection,
config: Config,
):
super().__init__(name, section, config)
self._url = section.target()
self._file_regex = section.link_regex()
async def _run(self) -> None:
maybe_cl = await self.crawl(PurePath("."))
if not maybe_cl:
return
tasks: List[Awaitable[None]] = []
async with maybe_cl:
for item in await self._fetch_items():
if isinstance(item, KitIpdFolder):
tasks.append(self._crawl_folder(item))
else:
# Orphan files are placed in the root folder
tasks.append(self._download_file(PurePath("."), item))
await self.gather(tasks)
async def _crawl_folder(self, folder: KitIpdFolder) -> None:
path = PurePath(folder.name)
if not await self.crawl(path):
return
tasks = [self._download_file(path, file) for file in folder.files]
await self.gather(tasks)
async def _download_file(self, parent: PurePath, file: KitIpdFile) -> None:
element_path = parent / file.name
maybe_dl = await self.download(element_path)
if not maybe_dl:
return
async with maybe_dl as (bar, sink):
await self._stream_from_url(file.url, sink, bar)
async def _fetch_items(self) -> Set[Union[KitIpdFile, KitIpdFolder]]:
page = await self.get_page()
elements: List[Tag] = self._find_file_links(page)
items: Set[Union[KitIpdFile, KitIpdFolder]] = set()
for element in elements:
folder_label = self._find_folder_label(element)
if folder_label:
folder = self._extract_folder(folder_label)
if folder not in items:
items.add(folder)
folder.explain()
else:
file = self._extract_file(element)
items.add(file)
log.explain_topic(f"Orphan file {file.name!r}")
log.explain("Attributing it to root folder")
return items
def _extract_folder(self, folder_tag: Tag) -> KitIpdFolder:
files: List[KitIpdFile] = []
name = folder_tag.getText().strip()
container: Tag = folder_tag.findNextSibling(name="table")
for link in self._find_file_links(container):
files.append(self._extract_file(link))
return KitIpdFolder(name, files)
@staticmethod
def _find_folder_label(file_link: Tag) -> Optional[Tag]:
enclosing_table: Tag = file_link.findParent(name="table")
if enclosing_table is None:
return None
return enclosing_table.findPreviousSibling(name=re.compile("^h[1-6]$"))
def _extract_file(self, link: Tag) -> KitIpdFile:
url = self._abs_url_from_link(link)
name = os.path.basename(url)
return KitIpdFile(name, url)
def _find_file_links(self, tag: Union[Tag, BeautifulSoup]) -> List[Tag]:
return tag.findAll(name="a", attrs={"href": self._file_regex})
def _abs_url_from_link(self, link_tag: Tag) -> str:
return urljoin(self._url, link_tag.get("href"))
async def _stream_from_url(self, url: str, sink: FileSink, bar: ProgressBar) -> None:
async with self.session.get(url, allow_redirects=False) as resp:
if resp.status == 403:
raise CrawlError("Received a 403. Are you within the KIT network/VPN?")
if resp.content_length:
bar.set_total(resp.content_length)
async for data in resp.content.iter_chunked(1024):
sink.file.write(data)
bar.advance(len(data))
sink.done()
async def get_page(self) -> BeautifulSoup:
async with self.session.get(self._url) as request:
return soupify(await request.read())

View File

@ -56,6 +56,12 @@ class Deduplicator:
log.explain(f"Changed path to {fmt_path(new_path)} for windows compatibility") log.explain(f"Changed path to {fmt_path(new_path)} for windows compatibility")
return new_path return new_path
def fixup_path(self, path: PurePath) -> PurePath:
"""Fixes up the path for windows, if enabled. Returns the path unchanged otherwise."""
if self._windows_paths:
return self._fixup_for_windows(path)
return path
def mark(self, path: PurePath) -> PurePath: def mark(self, path: PurePath) -> PurePath:
if self._windows_paths: if self._windows_paths:
path = self._fixup_for_windows(path) path = self._fixup_for_windows(path)

View File

@ -5,7 +5,7 @@ from contextlib import asynccontextmanager, contextmanager
# TODO In Python 3.9 and above, ContextManager is deprecated # TODO In Python 3.9 and above, ContextManager is deprecated
from typing import AsyncIterator, ContextManager, Iterator, List, Optional from typing import AsyncIterator, ContextManager, Iterator, List, Optional
from rich.console import Console, RenderGroup from rich.console import Console, Group
from rich.live import Live from rich.live import Live
from rich.markup import escape from rich.markup import escape
from rich.panel import Panel from rich.panel import Panel
@ -68,7 +68,7 @@ class Log:
if self._download_progress.task_ids: if self._download_progress.task_ids:
elements.append(self._download_progress) elements.append(self._download_progress)
group = RenderGroup(*elements) # type: ignore group = Group(*elements) # type: ignore
self._live.update(group) self._live.update(group)
@contextmanager @contextmanager

View File

@ -231,7 +231,10 @@ class OutputDirectory:
stat = local_path.stat() stat = local_path.stat()
remote_newer = None remote_newer = None
if mtime := heuristics.mtime:
# Python on Windows crashes when faced with timestamps around the unix epoch
if heuristics.mtime and (os.name != "nt" or heuristics.mtime.year > 1970):
mtime = heuristics.mtime
remote_newer = mtime.timestamp() > stat.st_mtime remote_newer = mtime.timestamp() > stat.st_mtime
if remote_newer: if remote_newer:
log.explain("Remote file seems to be newer") log.explain("Remote file seems to be newer")

View File

@ -182,5 +182,13 @@ class Pferd:
something_changed = True something_changed = True
log.report(f" [bold bright_magenta]Not deleted[/] {fmt_path(path)}") log.report(f" [bold bright_magenta]Not deleted[/] {fmt_path(path)}")
for warning in crawler.report.encountered_warnings:
something_changed = True
log.report(f" [bold bright_red]Warning[/] {warning}")
for error in crawler.report.encountered_errors:
something_changed = True
log.report(f" [bold bright_red]Error[/] {error}")
if not something_changed: if not something_changed:
log.report(" Nothing changed") log.report(" Nothing changed")

View File

@ -1,6 +1,6 @@
import json import json
from pathlib import Path, PurePath from pathlib import Path, PurePath
from typing import Any, Dict, List, Set from typing import Any, Dict, List, Optional, Set
class ReportLoadError(Exception): class ReportLoadError(Exception):
@ -68,6 +68,13 @@ class Report:
# Files that should have been deleted by the cleanup but weren't # Files that should have been deleted by the cleanup but weren't
self.not_deleted_files: Set[PurePath] = set() self.not_deleted_files: Set[PurePath] = set()
# Custom crawler-specific data
self.custom: Dict[str, Any] = dict()
# Encountered errors and warnings
self.encountered_warnings: List[str] = []
self.encountered_errors: List[str] = []
@staticmethod @staticmethod
def _get_list_of_strs(data: Dict[str, Any], key: str) -> List[str]: def _get_list_of_strs(data: Dict[str, Any], key: str) -> List[str]:
result: Any = data.get(key, []) result: Any = data.get(key, [])
@ -81,6 +88,15 @@ class Report:
return result return result
@staticmethod
def _get_str_dictionary(data: Dict[str, Any], key: str) -> Dict[str, Any]:
result: Dict[str, Any] = data.get(key, {})
if not isinstance(result, dict):
raise ReportLoadError(f"Incorrect format: {key!r} is not a dictionary")
return result
@classmethod @classmethod
def load(cls, path: Path) -> "Report": def load(cls, path: Path) -> "Report":
""" """
@ -108,6 +124,9 @@ class Report:
self.delete_file(PurePath(elem)) self.delete_file(PurePath(elem))
for elem in self._get_list_of_strs(data, "not_deleted"): for elem in self._get_list_of_strs(data, "not_deleted"):
self.not_delete_file(PurePath(elem)) self.not_delete_file(PurePath(elem))
self.custom = self._get_str_dictionary(data, "custom")
self.encountered_errors = self._get_list_of_strs(data, "encountered_errors")
self.encountered_warnings = self._get_list_of_strs(data, "encountered_warnings")
return self return self
@ -124,6 +143,9 @@ class Report:
"changed": [str(path) for path in sorted(self.changed_files)], "changed": [str(path) for path in sorted(self.changed_files)],
"deleted": [str(path) for path in sorted(self.deleted_files)], "deleted": [str(path) for path in sorted(self.deleted_files)],
"not_deleted": [str(path) for path in sorted(self.not_deleted_files)], "not_deleted": [str(path) for path in sorted(self.not_deleted_files)],
"custom": self.custom,
"encountered_warnings": self.encountered_warnings,
"encountered_errors": self.encountered_errors,
} }
with open(path, "w") as f: with open(path, "w") as f:
@ -190,3 +212,27 @@ class Report:
""" """
self.not_deleted_files.add(path) self.not_deleted_files.add(path)
def add_custom_value(self, key: str, value: Any) -> None:
"""
Adds a custom value under the passed key, overwriting any existing
"""
self.custom[key] = value
def get_custom_value(self, key: str) -> Optional[Any]:
"""
Retrieves a custom value for the given key.
"""
return self.custom.get(key)
def add_error(self, error: str) -> None:
"""
Adds an error to this report's error list.
"""
self.encountered_errors.append(error)
def add_warning(self, warning: str) -> None:
"""
Adds a warning to this report's warning list.
"""
self.encountered_warnings.append(warning)
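For illustration, this is roughly how the ILIAS video crawler uses the custom-value store as a cache of which stream files belong to a video listing; the paths and stream names are made up, and `Report` refers to the class defined in this file:

```python
from pathlib import PurePath

report = Report()

# Store, per video listing path, the stream file names that were produced,
# so a later run can skip the listing request if all of them exist locally.
listing = PurePath("Lecture/Videos/Lecture 01.mp4")
report.add_custom_value(str(listing), ["Lecture 01 (camera).mp4", "Lecture 01 (screen).mp4"])

# On the next run (via the crawler's prev_report) the cached names tell the
# crawler which local files must be present for the cache to count as complete.
names = report.get_custom_value(str(listing)) or []
expected_paths = [listing.parent / name for name in names]
print(expected_paths)
```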

View File

@ -1,2 +1,2 @@
NAME = "PFERD" NAME = "PFERD"
VERSION = "3.2.0" VERSION = "3.3.1"

View File

@ -6,11 +6,11 @@ version = attr: PFERD.version.VERSION
packages = find: packages = find:
python_requires = >=3.8 python_requires = >=3.8
install_requires = install_requires =
aiohttp>=3.7.4.post0 aiohttp>=3.8.1
beautifulsoup4>=4.9.3 beautifulsoup4>=4.10.0
rich>=10.1.0 rich>=11.0.0
keyring>=23.0.1 keyring>=23.5.0
certifi>=2020.12.5 certifi>=2021.10.8
[options.entry_points] [options.entry_points]
console_scripts = console_scripts =