Overhaul config and CLI option names

Joscha 2021-05-25 14:12:19 +02:00
parent eb8b915813
commit 61430c8739
5 changed files with 154 additions and 129 deletions

CONFIG.md

@@ -1,10 +1,11 @@
 # Config file format

 A config file consists of sections. A section begins with a `[section]` header,
-which is followed by a list of `key = value` or `key: value` pairs. Comments
-must be on their own line and start with `#` or `;`. Multiline values must be
-indented beyond their key. For more details and some examples on the format, see
-the [configparser documentation][1] ([basic interpolation][2] is enabled).
+which is followed by a list of `key = value` pairs. Comments must be on their
+own line and start with `#`. Multiline values must be indented beyond their key.
+Boolean values can be `yes` or `no`. For more details and some examples on the
+format, see the [configparser documentation][1] ([basic interpolation][2] is
+enabled).

 [1]: <https://docs.python.org/3/library/configparser.html#supported-ini-file-structure> "Supported INI File Structure"
 [2]: <https://docs.python.org/3/library/configparser.html#configparser.BasicInterpolation> "BasicInterpolation"
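For context, here is a minimal sketch of parsing a file in this format with Python's `configparser`; the section and key names are invented for the example and this is not PFERD's actual loader:

```python
import configparser

EXAMPLE = """
[DEFAULT]
working_dir = ~/pferd

[crawl:example]
# Comments are on their own line and start with '#'.
type = local
target = %(working_dir)s/input
transform =
    some rule
    another rule
explain = yes
"""

# BasicInterpolation (the default) expands %(working_dir)s from the DEFAULT section.
parser = configparser.ConfigParser(interpolation=configparser.BasicInterpolation())
parser.read_string(EXAMPLE)

section = parser["crawl:example"]
print(section.get("target"))          # ~/pferd/input
print(section.getboolean("explain"))  # True  ("yes"/"no" are accepted as booleans)
print(section.get("transform"))       # multiline value, indented beyond its key
```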
@@ -15,21 +16,19 @@ This section contains global configuration values. It can also be used to set
 default values for the other sections.

 - `working_dir`: The directory PFERD operates in. Set to an absolute path to
-  make PFERD operate the same regardless of where it is executed. All other
+  make PFERD operate the same regardless of where it is executed from. All other
   paths in the config file are interpreted relative to this path. If this path
   is relative, it is interpreted relative to the script's working dir. `~` is
   expanded to the current user's home directory. (Default: `.`)
 - `explain`: Whether PFERD should log and explain its actions and decisions in
   detail. (Default: `no`)
-- `status`: Whether PFERD should print status updates while crawling. (Default:
-  `yes`)
+- `status`: Whether PFERD should print status updates (like `Crawled ...`,
+  `Added ...`) while running a crawler. (Default: `yes`)
 - `report`: Whether PFERD should print a report of added, changed and deleted
   local files for all crawlers before exiting. (Default: `yes`)
-- `share_cookies`: Whether crawlers should share cookies where applicable. By
-  default, crawlers are isolated and don't interact with each other. This
-  includes their cookies. However, in situations where multiple crawlers crawl
-  the same website using the same account, sharing cookies between crawlers can
-  make sense. (Default: `yes`)
+- `share_cookies`: Whether crawlers should share cookies where applicable. For
+  example, some crawlers share cookies if they crawl the same website using the
+  same account. (Default: `yes`)

 ## The `crawl:*` sections
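The `working_dir` semantics described above (tilde expansion, other paths interpreted relative to it) can be sketched like this; the function name and paths are illustrative, not PFERD's actual code:

```python
from pathlib import Path

def resolve(working_dir: str, other: str) -> Path:
    # "~" expands to the current user's home directory; a relative working_dir
    # stays relative to wherever the script was started.
    base = Path(working_dir).expanduser()
    # All other paths from the config file are interpreted relative to the
    # working directory (absolute paths are left untouched by "/").
    return base / Path(other).expanduser()

print(resolve("~/pferd", "Foo/output"))  # e.g. /home/user/pferd/Foo/output
print(resolve(".", "output"))            # output, relative to the current dir
```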
@@ -42,17 +41,17 @@ courses or lecture websites.

 Each crawl section represents an instance of a specific type of crawler. The
 `type` option is used to specify the crawler type. The crawler's name is usually
-used as the name for the output directory. New crawlers can be created simply by
-adding a new crawl section to the config file.
+used as the output directory. New crawlers can be created simply by adding a new
+crawl section to the config file.

 Depending on a crawler's type, it may have different options. For more details,
-see the type's documentation below. The following options are common to all
-crawlers:
+see the type's [documentation](#crawler-types) below. The following options are
+common to all crawlers:

-- `type`: The types are specified in [this section](#crawler-types).
+- `type`: The available types are specified in [this section](#crawler-types).
 - `output_dir`: The directory the crawler synchronizes files to. A crawler will
   never place any files outside of this directory. (Default: the crawler's name)
-- `redownload`: When to download again a file that is already present locally.
+- `redownload`: When to download a file that is already present locally.
   (Default: `never-smart`)
   - `never`: If a file is present locally, it is not downloaded again.
   - `never-smart`: Like `never`, but PFERD tries to detect if an already
@@ -62,8 +61,8 @@ crawlers:
   - `always-smart`: Like `always`, but PFERD tries to avoid unnecessary
     downloads via some (unreliable) heuristics.
 - `on_conflict`: What to do when the local and remote versions of a file or
-  directory differ. Includes the cases where a file is replaced by a directory
-  or a directory by a file. (Default: `prompt`)
+  directory differ, including when a file is replaced by a directory or a
+  directory by a file. (Default: `prompt`)
   - `prompt`: Always ask the user before overwriting or deleting local files
     and directories.
   - `local-first`: Always keep the local file or directory. Equivalent to
@@ -75,14 +74,13 @@ crawlers:
     remote file is different.
 - `transform`: Rules for renaming and excluding certain files and directories.
   For more details, see [this section](#transformation-rules). (Default: empty)
-- `max_concurrent_tasks`: The maximum number of concurrent tasks (such as
-  crawling or downloading). (Default: 1)
-- `max_concurrent_downloads`: How many of those tasks can be download tasks at
-  the same time. Must not be greater than `max_concurrent_tasks`. When not set,
-  this is the same as `max_concurrent_tasks`. (Optional)
-- `delay_between_tasks`: Time (in seconds) that the crawler should wait between
+- `tasks`: The maximum number of concurrent tasks (such as crawling or
+  downloading). (Default: `1`)
+- `downloads`: How many of those tasks can be download tasks at the same time.
+  Must not be greater than `tasks`. (Default: Same as `tasks`)
+- `task_delay`: Time (in seconds) that the crawler should wait between
   subsequent tasks. Can be used as a sort of rate limit to avoid unnecessary
-  load for the crawl target. (Default: 0.0)
+  load for the crawl target. (Default: `0.0`)
 - `windows_paths`: Whether PFERD should find alternative names for paths that
   are invalid on Windows. (Default: `yes` on Windows, `no` otherwise)
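A sketch of how `tasks`, `downloads`, and `task_delay` interact, built from plain asyncio primitives; this is illustrative only and not PFERD's actual `Limiter`:

```python
import asyncio

class SimpleLimiter:
    """Caps concurrent tasks/downloads and spaces out task starts."""

    def __init__(self, tasks: int = 1, downloads: int = 1, task_delay: float = 0.0):
        self._tasks = asyncio.Semaphore(tasks)          # "tasks" option
        self._downloads = asyncio.Semaphore(downloads)  # "downloads" option
        self._task_delay = task_delay                   # "task_delay" option

    async def run_task(self, coro):
        async with self._tasks:
            await asyncio.sleep(self._task_delay)  # crude rate limit between tasks
            return await coro

    async def run_download(self, coro):
        # A download is also a task, so it counts against both limits.
        async with self._tasks, self._downloads:
            await asyncio.sleep(self._task_delay)
            return await coro

async def main():
    limiter = SimpleLimiter(tasks=3, downloads=1, task_delay=0.5)
    await asyncio.gather(*(limiter.run_download(asyncio.sleep(1)) for _ in range(5)))

asyncio.run(main())
```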
@@ -101,6 +99,8 @@ password = bar
 [crawl:something]
 type = some-complex-crawler
 auth = auth:example
+on_conflict = no-delete
+tasks = 3
 ```

 ## The `auth:*` sections
@@ -109,12 +109,12 @@ Sections whose names start with `auth:` are used to configure authenticators. An
 authenticator provides a username and a password to one or more crawlers.

 Authenticators work similar to crawlers: A section represents an authenticator
-instance, whose name is the rest of the section name. The type is specified by
+instance whose name is the rest of the section name. The type is specified by
 the `type` option.

 Depending on an authenticator's type, it may have different options. For more
-details, see the type's documentation below. The only option common to all
-authenticators is `type`:
+details, see the type's [documentation](#authenticator-types) below. The only
+option common to all authenticators is `type`:

 - `type`: The types are specified in [this section](#authenticator-types).
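The wiring between `crawl:*` and `auth:*` sections can be sketched as follows; the section contents are illustrative and the dictionaries stand in for real authenticator objects:

```python
import configparser

CONFIG = """
[auth:example]
type = simple
username = foo

[crawl:something]
type = some-complex-crawler
auth = auth:example
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG)

# Collect all authenticator sections by their full section name.
authenticators = {
    name: dict(parser[name]) for name in parser.sections() if name.startswith("auth:")
}

# A crawler references its authenticator by that full name, e.g. "auth:example".
crawler = parser["crawl:something"]
auth = authenticators.get(crawler["auth"])
if auth is None:
    raise ValueError(f"No such auth section exists: {crawler['auth']}")
print(auth["username"])  # foo
```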
@@ -127,28 +127,47 @@ testing different setups. The various delay options are meant to make the
 crawler simulate a slower, network-based crawler.

 - `target`: Path to the local directory to crawl. (Required)
-- `crawl_delay`: Maximum artificial delay (in seconds) to simulate for crawl
-  requests. (Default: 0.0)
-- `download_delay`: Maximum artificial delay (in seconds) to simulate for
-  download requests. (Default: 0.0)
+- `crawl_delay`: Artificial delay (in seconds) to simulate for crawl requests.
+  (Default: `0.0`)
+- `download_delay`: Artificial delay (in seconds) to simulate for download
+  requests. (Default: `0.0`)
 - `download_speed`: Download speed (in bytes per second) to simulate. (Optional)

-### The `kit-ilias` crawler
+### The `kit-ilias-web` crawler

-This crawler crawls the KIT ILIAS instance. It performs remote calls to a poor SCC-Server, so you should be nice and use reasonable delays and concurrent requests.
+This crawler crawls the KIT ILIAS instance.
+
+ILIAS is not great at handling too many concurrent requests. To avoid
+unnecessary load, please limit `tasks` to `1`.
+
+There is a spike in ILIAS usage at the beginning of lectures, so please don't
+run PFERD during those times.
+
+If you're automatically running PFERD periodically (e. g. via cron or a systemd
+timer), please randomize the start time or at least don't use the full hour. For
+systemd timers, this can be accomplished using the `RandomizedDelaySec` option.
+Also, please schedule the script to run in periods of low activity. Running the
+script once per day should be fine.

-- `target`: The ILIAS element to crawl. Can be:
-  - `desktop` if you want to crawl your personal desktop
-  - `<course id>` if you want to crawl the course with the given id
-  - `<url>` if you want to crawl a given element by URL (preferably the permanent URL linked at the bottom of an ILIAS page)
-- `tfa_auth`: Like `auth` but only used for two-factor authentication
-- `link_file_redirect_delay`: PFERD will create local HTML for external links.
-  If this property is set to a non-negative value it configures the amount of seconds after which the local HTML
-  file will redirect you to the link target.
-- `link_file_plain_text`: If this is set to true, PFERD will generate plain-text files containing only the link
-  target for external links. If this is false or not specified, PFERD will generate a neat, pretty and functional
-  HTML page instead.
-- `videos`: If this is set to false, PFERD will not crawl or download any videos.
-- `http_timeout`: The timeout for http requests
+- `target`: The ILIAS element to crawl. (Required)
+  - `desktop`: Crawl your personal desktop
+  - `<course id>`: Crawl the course with the given id
+  - `<url>`: Crawl a given element by URL (preferably the permanent URL linked
+    at the bottom of its ILIAS page)
+- `auth`: Name of auth section to use for login. (Required)
+- `tfa_auth`: Name of auth section to use for two-factor authentication. Only
+  uses the auth section's password. (Default: Anonymous `tfa` authenticator)
+- `links`: How to represent external links. (Default: `fancy`)
+  - `ignore`: Don't download links.
+  - `plaintext`: A text file containing only the URL.
+  - `fancy`: A HTML file looking like the ILIAS link element.
+  - `internet-shortcut`: An internet shortcut file (`.url` file).
+- `link_redirect_delay`: Time (in seconds) until `fancy` link files will
+  redirect to the actual URL. Set to a negative value to disable the automatic
+  redirect. (Default: `-1`)
+- `videos`: Whether to download videos. (Default: `no`)
+- `http_timeout`: The timeout (in seconds) for all HTTP requests. (Default:
+  `20.0`)

 ## Authenticator types
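For context, the three non-`ignore` link representations can be sketched roughly as below. The exact markup PFERD writes is not shown here; only the general idea is, using the standard `.url` shortcut format and an HTML meta refresh for the redirect delay:

```python
from pathlib import Path

def write_link_file(path: Path, url: str, style: str, redirect_delay: int = -1) -> None:
    if style == "plaintext":
        # A text file containing only the URL.
        path.write_text(url + "\n", encoding="utf-8")
    elif style == "internet-shortcut":
        # A standard internet shortcut (.url) file.
        path.write_text(f"[InternetShortcut]\nURL={url}\n", encoding="utf-8")
    elif style == "fancy":
        # An HTML page; with a non-negative delay it redirects via meta refresh.
        refresh = (
            f'<meta http-equiv="refresh" content="{redirect_delay}; url={url}">'
            if redirect_delay >= 0 else ""
        )
        path.write_text(
            f"<!DOCTYPE html><html><head>{refresh}</head>"
            f'<body><a href="{url}">{url}</a></body></html>',
            encoding="utf-8",
        )
    else:
        raise ValueError(f"unknown link style: {style}")

write_link_file(Path("example.url"), "https://example.com", "internet-shortcut")
```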
@@ -161,21 +180,24 @@ via the terminal.
 - `username`: The username. (Optional)
 - `password`: The password. (Optional)

+### The `keyring` authenticator
+
+This authenticator uses the system keyring to store passwords. The username can
+be set directly in the config file. If the username is not specified, the user
+is prompted via the terminal. If the keyring contains no entry or the entry is
+incorrect, the user is prompted for a password via the terminal and the password
+is stored in the keyring.
+
+- `username`: The username. (Optional)
+- `keyring_name`: The service name PFERD uses for storing credentials. (Default:
+  `PFERD`)
+
 ### The `tfa` authenticator

 This authenticator prompts the user on the console for a two-factor
 authentication token. The token is provided as password and it is not cached.
 This authenticator does not support usernames.

-### The `keyring` authenticator
-
-This authenticator uses the system keyring to store passwords. It expects a
-username in the config and will prompt *once* for the password. After that it
-receives the password from the system keyring.
-
-- `username`: The username. (Required)
-- `keyring_name`: The service name PFERD uses for storing credentials. (Optional)
-
 ## Transformation rules

 Transformation rules are rules for renaming and excluding files and directories.
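The keyring flow described in the diff above maps onto the third-party `keyring` package roughly like this; a sketch of the general mechanism, not PFERD's implementation:

```python
import getpass

import keyring  # third-party "keyring" package

def get_password(username: str, keyring_name: str = "PFERD") -> str:
    # Look the password up under the configured service name.
    password = keyring.get_password(keyring_name, username)
    if password is None:
        # No entry yet: ask via the terminal and store it for next time.
        password = getpass.getpass(f"Password for {username}: ")
        keyring.set_password(keyring_name, username, password)
    return password
```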

CLI: kit-ilias-web subcommand

@@ -11,14 +11,14 @@ SUBPARSER = SUBPARSERS.add_parser(
 )

 GROUP = SUBPARSER.add_argument_group(
-    title="KIT ILIAS web-crawler arguments",
+    title="kit-ilias-web crawler arguments",
     description="arguments for the 'kit-ilias-web' crawler",
 )

 GROUP.add_argument(
     "target",
     type=str,
     metavar="TARGET",
-    help="course id, 'desktop', or ILIAS https-URL to crawl"
+    help="course id, 'desktop', or ILIAS URL to crawl"
 )
 GROUP.add_argument(
     "output",
@@ -27,14 +27,9 @@ GROUP.add_argument(
     help="output directory"
 )
 GROUP.add_argument(
-    "--videos",
-    action=BooleanOptionalAction,
-    help="crawl and download videos"
-)
-GROUP.add_argument(
-    "--username",
+    "--username", "-u",
     type=str,
-    metavar="USER_NAME",
+    metavar="USERNAME",
     help="user name for authentication"
 )
 GROUP.add_argument(
@@ -46,19 +41,24 @@ GROUP.add_argument(
     "--links",
     type=show_value_error(Links.from_string),
     metavar="OPTION",
-    help="how to treat external links"
+    help="how to represent external links"
 )
 GROUP.add_argument(
-    "--link-file-redirect-delay",
+    "--link-redirect-delay",
     type=int,
     metavar="SECONDS",
-    help="delay before external link files redirect you to their target (-1 to disable)"
+    help="time before 'fancy' links redirect to their target (-1 to disable)"
 )
 GROUP.add_argument(
-    "--http-timeout",
+    "--videos",
+    action=BooleanOptionalAction,
+    help="crawl and download videos"
+)
+GROUP.add_argument(
+    "--http-timeout", "-t",
     type=float,
     metavar="SECONDS",
-    help="the timeout to use for HTTP requests"
+    help="timeout for all HTTP requests"
 )
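The `--videos` flag above uses a `BooleanOptionalAction`. The standard-library version (`argparse.BooleanOptionalAction`, Python 3.9+) automatically adds a `--no-videos` counterpart and leaves the value at `None` when neither flag is given, so unset flags don't overwrite config values; a minimal standalone example:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--videos", action=argparse.BooleanOptionalAction,
                    help="crawl and download videos")

print(parser.parse_args([]).videos)               # None  -> option stays unset
print(parser.parse_args(["--videos"]).videos)     # True
print(parser.parse_args(["--no-videos"]).videos)  # False
```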
@@ -66,33 +66,30 @@ def load(
     args: argparse.Namespace,
     parser: configparser.ConfigParser,
 ) -> None:
-    parser["crawl:kit-ilias-web"] = {}
-    section = parser["crawl:kit-ilias-web"]
+    parser["crawl:ilias"] = {}
+    section = parser["crawl:ilias"]
     load_crawler(args, section)

     section["type"] = "kit-ilias-web"
     section["target"] = str(args.target)
     section["output_dir"] = str(args.output)
-    section["auth"] = "auth:kit-ilias-web"
-    if args.link_file_redirect_delay is not None:
-        section["link_file_redirect_delay"] = str(args.link_file_redirect_delay)
+    section["auth"] = "auth:ilias"
     if args.links is not None:
         section["links"] = str(args.links.value)
+    if args.link_redirect_delay is not None:
+        section["link_redirect_delay"] = str(args.link_redirect_delay)
     if args.videos is not None:
-        section["videos"] = str(False)
+        section["videos"] = "yes" if args.videos else "no"
     if args.http_timeout is not None:
         section["http_timeout"] = str(args.http_timeout)

-    parser["auth:kit-ilias-web"] = {}
-    auth_section = parser["auth:kit-ilias-web"]
-    auth_section["type"] = "simple"
-    if args.username is not None:
-        auth_section["username"] = args.username
+    parser["auth:ilias"] = {}
+    auth_section = parser["auth:ilias"]
     if args.keyring:
         auth_section["type"] = "keyring"
+    else:
+        auth_section["type"] = "simple"
+    if args.username is not None:
+        auth_section["username"] = str(args.username)


 SUBPARSER.set_defaults(command=load)
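To make the effect of `load` above concrete, here is a sketch that assembles equivalent sections by hand and prints the config they correspond to; the target, output directory, and username values are invented for the example:

```python
import configparser
import io

parser = configparser.ConfigParser()
parser["crawl:ilias"] = {
    "type": "kit-ilias-web",
    "target": "1234567",  # hypothetical course id from the TARGET argument
    "output_dir": "Foo",  # hypothetical output directory
    "auth": "auth:ilias",
    "videos": "no",       # e.g. from --no-videos
}
parser["auth:ilias"] = {
    "type": "keyring",    # e.g. from --keyring
    "username": "uxxxx",  # hypothetical username
}

buf = io.StringIO()
parser.write(buf)
print(buf.getvalue())     # the equivalent config file contents
```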

CLI: common crawler arguments and default section

@@ -77,10 +77,10 @@ CRAWLER_PARSER_GROUP = CRAWLER_PARSER.add_argument_group(
     description="arguments common to all crawlers",
 )
 CRAWLER_PARSER_GROUP.add_argument(
-    "--redownload",
+    "--redownload", "-r",
     type=show_value_error(Redownload.from_string),
     metavar="OPTION",
-    help="when to redownload a file that's already present locally"
+    help="when to download a file that's already present locally"
 )
 CRAWLER_PARSER_GROUP.add_argument(
     "--on-conflict",
@@ -89,30 +89,35 @@ CRAWLER_PARSER_GROUP.add_argument(
     help="what to do when local and remote files or directories differ"
 )
 CRAWLER_PARSER_GROUP.add_argument(
-    "--transform", "-t",
+    "--transform", "-T",
     action="append",
     type=str,
     metavar="RULE",
     help="add a single transformation rule. Can be specified multiple times"
 )
 CRAWLER_PARSER_GROUP.add_argument(
-    "--max-concurrent-tasks",
+    "--tasks", "-n",
     type=int,
     metavar="N",
     help="maximum number of concurrent tasks (crawling, downloading)"
 )
 CRAWLER_PARSER_GROUP.add_argument(
-    "--max-concurrent-downloads",
+    "--downloads", "-N",
     type=int,
     metavar="N",
     help="maximum number of tasks that may download data at the same time"
 )
 CRAWLER_PARSER_GROUP.add_argument(
-    "--delay-between-tasks",
+    "--task-delay", "-d",
     type=float,
     metavar="SECONDS",
     help="time the crawler should wait between subsequent tasks"
 )
+CRAWLER_PARSER_GROUP.add_argument(
+    "--windows-paths",
+    action=BooleanOptionalAction,
+    help="whether to repair invalid paths on windows"
+)


 def load_crawler(
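These shared options live in their own parser and argument group so each subcommand can reuse them. With plain argparse that is typically done via the `parents` mechanism, roughly like this generic sketch (not PFERD's exact wiring):

```python
import argparse

# A parent parser holding the options shared by all crawler subcommands.
crawler_parser = argparse.ArgumentParser(add_help=False)
group = crawler_parser.add_argument_group(title="crawler arguments")
group.add_argument("--tasks", "-n", type=int, metavar="N")
group.add_argument("--task-delay", "-d", type=float, metavar="SECONDS")

parser = argparse.ArgumentParser()
subparsers = parser.add_subparsers()
# Every subcommand that lists the parent parser inherits its options.
sub = subparsers.add_parser("kit-ilias-web", parents=[crawler_parser])

args = parser.parse_args(["kit-ilias-web", "--tasks", "1", "-d", "2.5"])
print(args.tasks, args.task_delay)  # 1 2.5
```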
@@ -125,12 +130,14 @@ def load_crawler(
         section["on_conflict"] = args.on_conflict.value
     if args.transform is not None:
         section["transform"] = "\n" + "\n".join(args.transform)
-    if args.max_concurrent_tasks is not None:
-        section["max_concurrent_tasks"] = str(args.max_concurrent_tasks)
-    if args.max_concurrent_downloads is not None:
-        section["max_concurrent_downloads"] = str(args.max_concurrent_downloads)
-    if args.delay_between_tasks is not None:
-        section["delay_between_tasks"] = str(args.delay_between_tasks)
+    if args.tasks is not None:
+        section["tasks"] = str(args.tasks)
+    if args.downloads is not None:
+        section["downloads"] = str(args.downloads)
+    if args.task_delay is not None:
+        section["task_delay"] = str(args.task_delay)
+    if args.windows_paths is not None:
+        section["windows_paths"] = "yes" if args.windows_paths else "no"


 PARSER = argparse.ArgumentParser()
@@ -200,6 +207,10 @@ def load_default_section(
         section["working_dir"] = str(args.working_dir)
     if args.explain is not None:
         section["explain"] = "yes" if args.explain else "no"
+    if args.status is not None:
+        section["status"] = "yes" if args.status else "no"
+    if args.report is not None:
+        section["report"] = "yes" if args.report else "no"
     if args.share_cookies is not None:
         section["share_cookies"] = "yes" if args.share_cookies else "no"

Crawler config section and crawler base class

@@ -169,33 +169,33 @@ class CrawlerSection(Section):
     def transform(self) -> str:
         return self.s.get("transform", "")

-    def max_concurrent_tasks(self) -> int:
-        value = self.s.getint("max_concurrent_tasks", fallback=1)
+    def tasks(self) -> int:
+        value = self.s.getint("tasks", fallback=1)
         if value <= 0:
-            self.invalid_value("max_concurrent_tasks", value,
-                               "Must be greater than 0")
+            self.invalid_value("tasks", value, "Must be greater than 0")
         return value

-    def max_concurrent_downloads(self) -> int:
-        tasks = self.max_concurrent_tasks()
-        value = self.s.getint("max_concurrent_downloads", fallback=None)
+    def downloads(self) -> int:
+        tasks = self.tasks()
+        value = self.s.getint("downloads", fallback=None)
         if value is None:
             return tasks
         if value <= 0:
-            self.invalid_value("max_concurrent_downloads", value,
-                               "Must be greater than 0")
+            self.invalid_value("downloads", value, "Must be greater than 0")
         if value > tasks:
-            self.invalid_value("max_concurrent_downloads", value,
-                               "Must not be greater than max_concurrent_tasks")
+            self.invalid_value("downloads", value, "Must not be greater than tasks")
         return value

-    def delay_between_tasks(self) -> float:
-        value = self.s.getfloat("delay_between_tasks", fallback=0.0)
+    def task_delay(self) -> float:
+        value = self.s.getfloat("task_delay", fallback=0.0)
         if value < 0:
-            self.invalid_value("delay_between_tasks", value,
-                               "Must not be negative")
+            self.invalid_value("task_delay", value, "Must not be negative")
         return value

+    def windows_paths(self) -> bool:
+        on_windows = os.name == "nt"
+        return self.s.getboolean("windows_paths", fallback=on_windows)
+
     def auth(self, authenticators: Dict[str, Authenticator]) -> Authenticator:
         value = self.s.get("auth")
         if value is None:
@@ -205,10 +205,6 @@ class CrawlerSection(Section):
             self.invalid_value("auth", value, "No such auth section exists")
         return auth

-    def windows_paths(self) -> bool:
-        on_windows = os.name == "nt"
-        return self.s.getboolean("windows_paths", fallback=on_windows)
-

 class Crawler(ABC):
     def __init__(
@@ -230,9 +226,9 @@ class Crawler(ABC):
         self.error_free = True

         self._limiter = Limiter(
-            task_limit=section.max_concurrent_tasks(),
-            download_limit=section.max_concurrent_downloads(),
-            task_delay=section.delay_between_tasks(),
+            task_limit=section.tasks(),
+            download_limit=section.downloads(),
+            task_delay=section.task_delay(),
         )

         self._deduplicator = Deduplicator(section.windows_paths())

KIT ILIAS web crawler

@@ -40,18 +40,14 @@ class KitIliasWebCrawlerSection(HttpCrawlerSection):
         self.invalid_value("target", target, "Should be <course id | desktop | kit ilias URL>")

     def tfa_auth(self, authenticators: Dict[str, Authenticator]) -> Optional[Authenticator]:
-        value = self.s.get("tfa_auth")
-        if not value:
+        value: Optional[str] = self.s.get("tfa_auth")
+        if value is None:
             return None
-
-        auth = authenticators.get(f"auth:{value}")
+        auth = authenticators.get(value)
         if auth is None:
-            self.invalid_value("auth", value, "No such auth section exists")
+            self.invalid_value("tfa_auth", value, "No such auth section exists")
         return auth

-    def link_file_redirect_delay(self) -> int:
-        return self.s.getint("link_file_redirect_delay", fallback=-1)
-
     def links(self) -> Links:
         type_str: Optional[str] = self.s.get("links")
@@ -63,6 +59,9 @@ class KitIliasWebCrawlerSection(HttpCrawlerSection):
         except ValueError as e:
             self.invalid_value("links", type_str, str(e).capitalize())

+    def link_redirect_delay(self) -> int:
+        return self.s.getint("link_redirect_delay", fallback=-1)
+
     def videos(self) -> bool:
         return self.s.getboolean("videos", fallback=False)
@@ -173,7 +172,7 @@ class KitIliasWebCrawler(HttpCrawler):
         self._base_url = "https://ilias.studium.kit.edu"

         self._target = section.target()
-        self._link_file_redirect_delay = section.link_file_redirect_delay()
+        self._link_file_redirect_delay = section.link_redirect_delay()
         self._links = section.links()
         self._videos = section.videos()