# PFERD
**P**rogramm zum **F**lotten, **E**infachen **R**unterladen von **D**ateien
(German for "program for the swift, simple downloading of files")

Other resources:

- [Config file format](CONFIG.md)
- [Changelog](CHANGELOG.md)
- [Development Guide](DEV.md)
## Installation
### Direct download
Binaries for Linux, Windows and Mac can be downloaded directly from the
[latest release](https://github.com/Garmelon/PFERD/releases/latest).
### With pip
Ensure you have at least Python 3.9 installed. Run the following command to
install PFERD or upgrade it to the latest version:
```
$ pip install --upgrade git+https://github.com/Garmelon/PFERD@latest
```
The use of [venv](https://docs.python.org/3/library/venv.html) is recommended.
### With package managers
Unofficial packages are available for:
- [AUR](https://aur.archlinux.org/packages/pferd)
- [brew](https://formulae.brew.sh/formula/pferd)
- [conda-forge](https://github.com/conda-forge/pferd-feedstock)
- [nixpkgs](https://github.com/NixOS/nixpkgs/blob/master/pkgs/tools/misc/pferd/default.nix)
- [PyPI](https://pypi.org/project/pferd)

See also PFERD's [repology page](https://repology.org/project/pferd/versions).
## Basic usage
PFERD can be run directly from the command line with no config file. Run
`pferd -h` to get an overview of available commands and options. Run
`pferd <command> -h` to see which options a command has.

For example, you can download your personal desktop from the KIT ILIAS like
this:
```
$ pferd kit-ilias-web desktop <output_directory>
```
Also, you can download most ILIAS pages directly like this:
```
$ pferd kit-ilias-web <url> <output_directory>
```
However, the CLI only lets you download a single thing at a time, and the
resulting command can grow long quite quickly. Because of this, PFERD can also
be used with a config file.

To get started, just take a command you've been using and add `--dump-config`
directly after `pferd`, like this:
```
$ pferd --dump-config kit-ilias-web <url> <output_directory>
```
This will make PFERD write its current configuration to its default config file
path. You can then run `pferd` without a command and it will execute the config
file. Alternatively, you can use `--dump-config-to` and specify a path yourself.
Using `--dump-config-to -` will print the configuration to stdout instead of a
file, which is a good way to see what is actually going on when using a CLI
command.

Another good way to see what PFERD is doing is the `--explain` option. When
enabled, PFERD explains in detail what it is doing and why. This can help with
debugging your own config.

If you don't want to run all crawlers from your config file, you can specify the
crawlers you want to run with `--crawler` or `-C`, like this:
```
$ pferd -C crawler1 -C crawler2
```
## Advanced usage
PFERD supports lots of different options. For example, you can configure PFERD
to [use your system's keyring](CONFIG.md#the-keyring-authenticator) instead of
prompting you for your username and password. PFERD also supports
[transformation rules](CONFIG.md#transformation-rules) that let you rename or
exclude certain files.
For more details, see the comprehensive [config format documentation](CONFIG.md).
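
PFERD's config file uses an INI-style layout, as the Example section of this
README shows. As a rough illustration only, the sketch below reads a shortened
copy of that example with Python's standard `configparser` and shows two
things: which sections define crawlers, and how values from `[DEFAULT]` are
visible in every other section. It assumes the file behaves like a standard
`configparser` file. It is not PFERD's own loading code, and the variable names
are made up for this sketch.

```python
from configparser import ConfigParser

# Shortened copy of the example config from this README.
CONFIG_TEXT = """
[DEFAULT]
working_dir = ~/stud
on_conflict = no-delete

[auth:ilias]
type = keyring
username = foo

[crawl:Foo]
type = kit-ilias-web
auth = auth:ilias
target = 1234567
"""

parser = ConfigParser()
parser.read_string(CONFIG_TEXT)

# Sections whose name starts with "crawl:" describe crawlers.
# sections() never includes the special [DEFAULT] section.
crawlers = [name for name in parser.sections() if name.startswith("crawl:")]
print(crawlers)

# Values from [DEFAULT] act as fallbacks in every other section
# (standard configparser behaviour, assumed to apply here).
print(parser["crawl:Foo"]["working_dir"])
print(parser["crawl:Foo"]["on_conflict"])
```

Running `pferd --dump-config-to -` as described under Basic usage is a quick
way to get a real config to inspect.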
## Example
This example downloads a few courses from the KIT ILIAS with a common keyring
authenticator. It reorganizes and ignores some files.
```ini
[DEFAULT]
# All paths will be relative to this.
# The crawler output directories will be <working_dir>/Foo and <working_dir>/Bar.
working_dir = ~/stud

# If files vanish from ILIAS the local files are not deleted, allowing us to
# take a look at them before deleting them ourselves.
on_conflict = no-delete

[auth:ilias]
type = keyring
username = foo

[crawl:Foo]
type = kit-ilias-web
auth = auth:ilias
# Crawl a course by its ID (found as `ref_id=ID` in the URL)
target = 1234567

# Plaintext files are easier to read by other tools
links = plaintext

transform =
  # Ignore unneeded folders
  Online-Tests --> !
  Vorlesungswerbung --> !

  # Rename folders
  Lehrbücher --> Vorlesung
  # Note the ">>" arrow head which lets us apply further rules to files moved to "Übung"
  Übungsunterlagen -->> Übung

  # Move exercises to own folder. Rename them to "Blatt-XX.pdf" to make them sort properly
  "Übung/(\d+). Übungsblatt.pdf" -re-> Blätter/Blatt-{i1:02}.pdf
  # Move solutions to own folder. Rename them to "Blatt-XX-Lösung.pdf" to make them sort properly
  "Übung/(\d+). Übungsblatt.*Musterlösung.pdf" -re-> Blätter/Blatt-{i1:02}-Lösung.pdf

  # The course has nested folders with the same name - flatten them
  "Übung/(.+?)/\\1" -re-> Übung/{g1}

[crawl:Bar]
type = kit-ilias-web
auth = auth:ilias
target = 1337420
```
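
To make the `-re->` rules above more concrete, here is a small Python sketch of
how such a rule could map one path to another. It is an illustration under
assumptions, not PFERD's implementation: `apply_regex_rule` is a hypothetical
helper, the pattern is assumed to have to match the whole path, `{g1}` is
assumed to be the first capture group as text, and `{i1}` the same group parsed
as an integer (so that `{i1:02}` zero-pads it to two digits). See
[CONFIG.md](CONFIG.md#transformation-rules) for the authoritative rule
semantics.

```python
import re
from typing import Optional


def apply_regex_rule(pattern: str, template: str, path: str) -> Optional[str]:
    """Rename `path` via `template` if `pattern` matches the entire path.

    Hypothetical helper for illustration only. Returns None when the
    pattern does not match.
    """
    match = re.fullmatch(pattern, path)
    if match is None:
        return None

    fields = {}
    for index, text in enumerate(match.groups(), start=1):
        fields[f"g{index}"] = text  # capture group as plain text
        try:
            fields[f"i{index}"] = int(text)  # capture group as integer
        except (TypeError, ValueError):
            pass  # group is not numeric, so no {iN} field for it

    return template.format(**fields)


# Exercise sheet: "3" becomes "03" because of the ":02" format spec.
print(apply_regex_rule(
    r"Übung/(\d+). Übungsblatt.pdf",
    "Blätter/Blatt-{i1:02}.pdf",
    "Übung/3. Übungsblatt.pdf",
))

# Nested folder with the same name is flattened. The config writes \\1 inside
# a quoted string; this sketch assumes that denotes the backreference \1.
print(apply_regex_rule(
    r"Übung/(.+?)/\1",
    "Übung/{g1}",
    "Übung/Blatt 1/Blatt 1",
))
```

Zero-padding the sheet number is what makes `Blatt-03.pdf` sort before
`Blatt-12.pdf` in a plain alphabetical listing.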