mirror of
https://github.com/Garmelon/PFERD.git
synced 2023-12-21 10:23:01 +01:00
Document config file format
parent
f776186480
commit
9ec19be113
138
CONFIG.md
Normal file
@@ -0,0 +1,138 @@

# Config file format

A config file consists of sections. A section begins with a `[section]` header,
which is followed by a list of `key = value` or `key: value` pairs. Comments
must be on their own line and start with `#` or `;`. Multiline values must be
indented beyond their key. For more details and some examples on the format, see
the [configparser documentation][1] ([basic interpolation][2] is enabled).

[1]: <https://docs.python.org/3/library/configparser.html#supported-ini-file-structure> "Supported INI File Structure"
[2]: <https://docs.python.org/3/library/configparser.html#configparser.BasicInterpolation> "BasicInterpolation"

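
The format described above can be tried out directly with Python's
`configparser`. The following is a minimal sketch, not PFERD code: the section
name `example` and the keys `base`, `target` and `notes` are made up for
illustration and are not options PFERD defines.

```python
from configparser import ConfigParser

# Hypothetical config text. Section and key names are illustrative only.
EXAMPLE = """\
[DEFAULT]
base = /tmp/sync

[example]
# Comments sit on their own line and start with '#' or ';'
target: %(base)s/files
notes =
    first line
    second line
"""

# ConfigParser uses BasicInterpolation unless told otherwise.
parser = ConfigParser()
parser.read_string(EXAMPLE)

section = parser["example"]
target = section["target"]  # %(base)s is filled in from DEFAULT
notes = [line for line in section["notes"].splitlines() if line]

print(target)
print(notes)
```

Both separators (`=` and `:`) are accepted, and the indented lines below `notes`
are read as one multiline value.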
## The `DEFAULT` section

This section contains global configuration values. It can also be used to set
default values for the other sections.

- `working_dir`: The directory PFERD operates in. Set to an absolute path to
  make PFERD operate the same regardless of where it is executed. All other
  paths in the config file are interpreted relative to this path. If this path
  is relative, it is interpreted relative to the script's working dir. `~` is
  expanded to the current user's home directory. (Default: `.`)

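
The path handling described for `working_dir` could look roughly like the
following sketch. The helpers `resolve_working_dir` and `resolve_config_path`
are hypothetical names chosen for this example and are not part of PFERD. The
last line shows the `configparser` behaviour that makes `DEFAULT` usable for
default values: its keys are visible from every other section.

```python
import os
from configparser import ConfigParser
from pathlib import Path

EXAMPLE = """\
[DEFAULT]
working_dir = ~/sync

[crawl:example]
output_dir = lectures
"""


def resolve_working_dir(raw: str) -> Path:
    """Expand `~` and make a relative path absolute (hypothetical helper)."""
    # expanduser() handles `~`. abspath() resolves a relative path against
    # the current working directory of the process.
    return Path(os.path.abspath(Path(raw).expanduser()))


def resolve_config_path(working_dir: Path, raw: str) -> Path:
    """Interpret any other config path relative to working_dir (hypothetical)."""
    return working_dir / raw


parser = ConfigParser()
parser.read_string(EXAMPLE)

# Fall back to the documented default `.` if the option is not set.
working_dir = resolve_working_dir(parser["DEFAULT"].get("working_dir", "."))

section = parser["crawl:example"]
output_dir = resolve_config_path(working_dir, section["output_dir"])

# Keys set in DEFAULT are visible from every other section.
inherited = section["working_dir"]

print(working_dir, output_dir, inherited)
```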
## The `crawl:*` sections

Sections whose names start with `crawl:` are used to configure crawlers. The
rest of the section name specifies the name of the crawler.

A crawler synchronizes a remote resource to a local directory. There are
different types of crawlers for different kinds of resources, e.g. ILIAS
courses or lecture websites.

Each crawl section represents an instance of a specific type of crawler. The
`type` option is used to specify the crawler type. The crawler's name is usually
used as the name for the output directory. New crawlers can be created simply by
adding a new crawl section to the config file.

Depending on a crawler's type, it may have different options. For more details,
see the type's documentation below. The following options are common to all
crawlers:

- `type`: The types are specified in [this section](#crawler-types).
- `output_dir`: The directory the crawler synchronizes files to. A crawler will
  never place any files outside of this directory. (Default: crawler's name)
- `transform`: Rules for renaming and excluding certain files and directories.
  For more details, see [this section](#transformation-rules). (Default: empty)

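
As an illustration of the naming scheme and the defaults listed above, the
following sketch collects all `crawl:` sections from a config. It is not how
PFERD itself loads crawlers. The crawler names and the type `example-type` are
placeholders, since no crawler types are documented yet.

```python
from configparser import ConfigParser

# Hypothetical config. `example-type` is a placeholder, not a real crawler type.
EXAMPLE = """\
[crawl:lecture-notes]
type = example-type

[crawl:exercises]
type = example-type
output_dir = sheets
"""

PREFIX = "crawl:"

parser = ConfigParser()
parser.read_string(EXAMPLE)

crawlers = {}
for section_name in parser.sections():
    if not section_name.startswith(PREFIX):
        continue
    # The rest of the section name is the crawler's name.
    name = section_name[len(PREFIX):]
    section = parser[section_name]
    crawlers[name] = {
        "type": section["type"],
        # Documented defaults: the crawler's name and an empty rule list.
        "output_dir": section.get("output_dir", name),
        "transform": section.get("transform", ""),
    }

for name, options in crawlers.items():
    print(name, options)
```

Here `lecture-notes` falls back to an output directory named after the crawler,
while `exercises` overrides it with `sheets`.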
## The `auth:*` sections

Sections whose names start with `auth:` are used to configure authenticators. An
authenticator provides login credentials to one or more crawlers.

Authenticators work similarly to crawlers: A section represents an authenticator
instance, whose name is the rest of the section name. The type is specified by
the `type` option.

Depending on an authenticator's type, it may have different options. For more
details, see the type's documentation below. The only option common to all
authenticators is `type`:

- `type`: The types are specified in [this section](#authenticator-types).

## Crawler types

TODO Fill in as crawlers are implemented

## Authenticator types

TODO Fill in as authenticators are implemented

## Transformation rules

Transformation rules are rules for renaming and excluding files and directories.
They are specified line-by-line in a crawler's `transform` option. When a
crawler needs to apply a rule to a path, it goes through this list top-to-bottom
and chooses the first matching rule.

Each line has the format `SOURCE ARROW TARGET` where `TARGET` is optional.
`SOURCE` is either a normal path without spaces (e.g. `foo/bar`), or a string
literal delimited by `"` or `'` (e.g. `"foo\" bar/baz"`). Python's string
escape syntax is supported. Trailing slashes are ignored. `TARGET` can be
formatted like `SOURCE`, but it can also be a single exclamation mark without
quotes (`!`). `ARROW` is one of `-->`, `-exact->` and `-re->`.

If a rule's target is `!`, this means that when the rule matches on a path, the
corresponding file or directory is ignored. If a rule's target is missing, the
path is matched but not modified.

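
The line format above can be sketched as a small parser. This is only an
illustration and not PFERD's actual parser: the names `Rule`, `read_token` and
`parse_rule` are invented for this example, and the sketch assumes that the
three parts of a rule are separated by whitespace.

```python
import ast
import re
from typing import NamedTuple, Optional, Tuple

ARROWS = ("-->", "-exact->", "-re->")


class Rule(NamedTuple):
    source: str
    arrow: str
    target: Optional[str]  # None if the target is missing or is `!`
    ignore: bool           # True if the target is an unquoted `!`


def read_token(text: str, pos: int) -> Tuple[str, int]:
    """Read a plain or quoted path at `pos`; return (value, end position)."""
    quote = text[pos]
    if quote in "\"'":
        end = pos + 1
        while end < len(text) and text[end] != quote:
            # Skip the character after a backslash so \" does not end the literal.
            end += 2 if text[end] == "\\" else 1
        if end >= len(text):
            raise ValueError("unterminated string literal")
        # ast.literal_eval applies Python's string escape rules.
        return ast.literal_eval(text[pos:end + 1]), end + 1
    match = re.compile(r"\S+").match(text, pos)
    return match.group(), match.end()


def parse_rule(line: str) -> Rule:
    line = line.strip()
    if not line:
        raise ValueError("empty rule")
    source, pos = read_token(line, 0)
    rest = line[pos:].strip()
    arrow = next((a for a in ARROWS if rest.startswith(a)), None)
    if arrow is None:
        raise ValueError(f"expected an arrow in {line!r}")
    rest = rest[len(arrow):].strip()
    source = source.rstrip("/")  # trailing slashes are ignored
    if not rest:
        return Rule(source, arrow, None, False)
    if rest == "!":
        return Rule(source, arrow, None, True)
    target, end = read_token(rest, 0)
    if rest[end:].strip():
        raise ValueError(f"unexpected text after target in {line!r}")
    return Rule(source, arrow, target.rstrip("/"), False)


print(parse_rule("foo/bar --> baz"))
print(parse_rule("tutorials/ -exact->"))
print(parse_rule(r'"foo\" bar/baz" --> !'))
```

Keeping `ignore` separate from `target` lets a quoted `"!"` stay an ordinary
path, since only the unquoted exclamation mark has the special meaning.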
### The `-->` arrow

The `-->` arrow is a basic renaming operation. If a path begins with `SOURCE`,
that part of the path is replaced with `TARGET`. This means that the rule
`foo/bar --> baz` would convert `foo/bar` into `baz`, but also `foo/bar/xyz`
into `baz/xyz`. The rule `foo --> !` would ignore a directory named `foo` as
well as all its contents.

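
A minimal sketch of the renaming behaviour follows. The function name
`apply_normal` is hypothetical, and the sketch assumes that "begins with" is
meant in terms of whole path components, so `foo/barbaz` would not match the
source `foo/bar`. The `!` target is left to the caller.

```python
from pathlib import PurePosixPath
from typing import Optional


def apply_normal(path: str, source: str, target: str) -> Optional[str]:
    """Return the renamed path, or None if `path` does not begin with `source`."""
    try:
        # relative_to() only succeeds if `source` is a component-wise prefix.
        rest = PurePosixPath(path).relative_to(PurePosixPath(source))
    except ValueError:
        return None
    return str(PurePosixPath(target) / rest)


print(apply_normal("foo/bar", "foo/bar", "baz"))
print(apply_normal("foo/bar/xyz", "foo/bar", "baz"))
print(apply_normal("foo/other", "foo/bar", "baz"))
```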
### The `-exact->` arrow

The `-exact->` arrow requires the path to match `SOURCE` exactly. This means
that the rule `foo/bar -exact-> baz` would still convert `foo/bar` into `baz`,
but `foo/bar/xyz` would be unaffected. Also, `foo -exact-> !` would only ignore
`foo`, but not its contents (if it has any). The examples below show why this is
useful.

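
In the same spirit, exact matching can be sketched as a plain comparison of the
two paths. The function name `apply_exact` is hypothetical. A missing target is
represented as `None` here, in which case the path is returned unchanged.

```python
from pathlib import PurePosixPath
from typing import Optional


def apply_exact(path: str, source: str, target: Optional[str]) -> Optional[str]:
    """Return the new path, or None if `path` is not exactly `source`."""
    # Comparing PurePosixPath objects ignores trailing slashes.
    if PurePosixPath(path) != PurePosixPath(source):
        return None
    return path if target is None else target


print(apply_exact("foo/bar", "foo/bar", "baz"))
print(apply_exact("foo/bar/xyz", "foo/bar", "baz"))
print(apply_exact("tutorials", "tutorials", None))
```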
### The `-re->` arrow

The `-re->` arrow uses regular expressions. `SOURCE` is a regular expression
that must match the entire path. If this is the case, then the capturing groups
are available in `TARGET` for formatting.

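
The text above does not spell out how capturing groups are referenced in
`TARGET`. The following sketch therefore makes an assumption that should not be
read as PFERD's actual syntax: it uses Python's `str.format` with positional
fields, where `{0}` is the whole match and `{1}`, `{2}`, ... are the capturing
groups. The function name `apply_re` is hypothetical as well.

```python
import re
from typing import Optional


def apply_re(path: str, source: str, target: str) -> Optional[str]:
    """Return the formatted target, or None if `source` does not match fully."""
    # fullmatch() requires the regex to match the entire path.
    match = re.fullmatch(source, path)
    if match is None:
        return None
    # Assumed placeholder syntax: {0} is the whole match, {1}.. are the groups.
    return target.format(match.group(0), *match.groups())


print(apply_re("tutorials/tut_02/sheet.pdf", r"tutorials/tut_(\d+)/(.*)", "tut{1}/{2}"))
print(apply_re("foo/bar/xyz", r"foo/bar", "baz"))
```

The second call returns `None` because `foo/bar` matches only the beginning of
the path and not the entire path.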
### Example: Tutorials

You have an ILIAS course with lots of tutorials, but are only interested in a
single one?

```
tutorials/
  |- tut_01/
  |- tut_02/
  |- tut_03/
  ...
```

You can use a mix of normal and exact arrows to get rid of the other ones and
move the `tutorials/tut_02/` folder to `my_tut/`:

```
tutorials/tut_02 --> my_tut
tutorials -exact->
tutorials --> !
```

The second rule is required for many crawlers since they use the rules to decide
which directories to crawl. If it were missing, the third rule would match when
the crawler looks at `tutorials/`. This means the crawler would not crawl the
`tutorials/` directory and thus not discover that `tutorials/tut_02/` existed.

Since the second rule is only relevant for crawling, the `TARGET` is left out.
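
To see the first-match behaviour of these three rules in action, here is a small
simulation. It is a sketch and not PFERD's implementation: the function
`transform` is hypothetical, the rules are written as already parsed tuples,
prefix matching is assumed to work on whole path components, and paths that
match no rule are assumed to stay unchanged.

```python
from pathlib import PurePosixPath
from typing import List, Optional, Tuple

# (SOURCE, ARROW, TARGET) with None for a missing target.
ParsedRule = Tuple[str, str, Optional[str]]

RULES: List[ParsedRule] = [
    ("tutorials/tut_02", "-->", "my_tut"),
    ("tutorials", "-exact->", None),
    ("tutorials", "-->", "!"),
]


def transform(path: str, rules: List[ParsedRule]) -> Optional[str]:
    """Apply the first matching rule. Return None if the path is ignored."""
    p = PurePosixPath(path)
    for source, arrow, target in rules:
        s = PurePosixPath(source)
        if arrow == "-exact->":
            if p != s:
                continue
            rest = PurePosixPath()
        else:
            try:
                rest = p.relative_to(s)
            except ValueError:
                continue
        if target == "!":
            return None  # ignored
        if target is None:
            return str(p)  # matched but not modified
        return str(PurePosixPath(target) / rest)
    return str(p)  # assumption: unmatched paths are kept as they are


for path in ["tutorials", "tutorials/tut_01", "tutorials/tut_02/sheet.pdf"]:
    print(path, "->", transform(path, RULES))

# Without the second rule, `tutorials` itself is ignored.
without_exact = [RULES[0], RULES[2]]
print("tutorials", "->", transform("tutorials", without_exact))
```

With all three rules, `tutorials` is kept, `tutorials/tut_01` is ignored, and
`tutorials/tut_02/sheet.pdf` becomes `my_tut/sheet.pdf`. Dropping the second
rule makes the lookup for `tutorials` fall through to `tutorials --> !`.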