Improve README

- Added a table of contents
- Reworked the transform section
- Fixed the commented example
Joscha 2020-07-11 18:16:33 +00:00
parent 5b38ab8cf1
commit c28347122e

190
README.md

@ -2,35 +2,40 @@
**P**rogramm zum **F**lotten, **E**infachen **R**unterladen von **D**ateien **P**rogramm zum **F**lotten, **E**infachen **R**unterladen von **D**ateien
- [Installation](#installation)
- [Upgrading from 2.0.0 to 2.1.0+](#upgrading-from-200-to-210)
- [Example setup](#example-setup)
- [Usage](#usage)
- [General concepts](#general-concepts)
- [Constructing transforms](#constructing-transforms)
- [Transform creators](#transform-creators)
- [Transform combinators](#transform-combinators)
- [A short, but commented example](#a-short-but-commented-example)
## Installation ## Installation
Ensure that you have at least Python 3.8 installed. Ensure that you have at least Python 3.8 installed.
To install PFERD or update your installation to the latest version, run this To install PFERD or update your installation to the latest version, run this
wherever you want to install/have installed PFERD: wherever you want to install or have already installed PFERD:
``` ```
$ pip install git+https://github.com/Garmelon/PFERD@v2.1.2 $ pip install git+https://github.com/Garmelon/PFERD@v2.1.2
``` ```
The use of [venv](https://docs.python.org/3/library/venv.html) is recommended. The use of [venv] is recommended.
[venv]: https://docs.python.org/3/library/venv.html
### Upgrading from 2.0.0 to 2.1.0+ ### Upgrading from 2.0.0 to 2.1.0+
The `IliasDirectoryType` type was renamed to `IliasElementType` and is now far - The `IliasDirectoryType` type was renamed to `IliasElementType` and is now far more detailed:
more detailed. The new values are: `REGULAR_FOLDER`, `VIDEO_FOLDER`, `EXERCISE_FOLDER`, `REGULAR_FILE`, `VIDEO_FILE`, `FORUM`, `EXTERNAL_LINK`.
The new values are: REGULAR_FOLDER, VIDEO_FOLDER, - Forums and external links are skipped automatically if you use the `kit_ilias` helper.
EXERCISE_FOLDER, REGULAR_FILE, VIDEO_FILE, FORUM, EXTERNAL_LINK.
Forums and external links are skipped automatically if you use the `kit_ilias` helper.
## Example setup ## Example setup
In this example, `python3` refers to at least Python 3.8. In this example, `python3` refers to at least Python 3.8.
If you just want to get started and crawl *your entire ILIAS Desktop* instead
of a given set of courses, please replace `example_config.py` with
`example_config_personal_desktop.py` in all of the instructions below (`curl` call and
`python3` run command).
A full example setup and initial use could look like: A full example setup and initial use could look like:
``` ```
$ mkdir Vorlesungen $ mkdir Vorlesungen
@ -51,50 +56,93 @@ $ python3 example_config.py
$ deactivate $ deactivate
``` ```
If you just want to get started and crawl *your entire ILIAS Desktop* instead
of a given set of courses, please replace `example_config.py` with
`example_config_personal_desktop.py` in all of the instructions below (`curl` call and
`python3` run command).
## Usage ## Usage
### General concepts
A PFERD config is a normal Python file that starts multiple *synchronizers* A PFERD config is a normal Python file that starts multiple *synchronizers*
which do all the heavy lifting. While you can create and wire them up manually, which do all the heavy lifting. While you can create and wire them up manually,
you are encouraged to use the helper methods provided in `PFERD.Pferd`. you are encouraged to use the helper methods provided in `PFERD.Pferd`.
The synchronizers take some input arguments specific to their service and a The synchronizers take some input arguments specific to their service and a
*transformer*. The transformer receives the computed path of an element in *transform*. The transform receives the computed path of an element in ILIAS and
ILIAS and can return either an output path (so you can rename files or move can return either an output path (so you can rename files or move them around as
them around as you wish) or `None` if you do not want to save the given file. you wish) or `None` if you do not want to save the given file.
Additionally the ILIAS synchronizer allows you to define a *crawl filter*. This Additionally the ILIAS synchronizer allows you to define a *crawl filter*. This
filter also receives the computed path as the input, but is only called for filter also receives the computed path as the input, but is only called for
*directories*. If you return `True`, the directory will be crawled and *directories*. If you return `True`, the directory will be crawled and
searched. If you return `False` the directory will be ignored and nothing in it searched. If you return `False` the directory will be ignored and nothing in it
will be passed to the transformer. will be passed to the transform.
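To make the transform contract above concrete, here is a minimal hand-written sketch. It only assumes what the text states: a transform takes a path and returns either an output path or `None`. The function name `skip_videos` and the renaming rule are hypothetical and not part of PFERD.

```py
from pathlib import PurePath
from typing import Optional


# Hypothetical hand-written transform, for illustration only.
def skip_videos(path: PurePath) -> Optional[PurePath]:
    # Returning None means the file should not be saved.
    if path.suffix.lower() == ".mp4":
        return None
    # Returning a path decides where the file ends up. Here the directory is
    # kept and spaces in the file name are replaced by underscores.
    return path.with_name(path.name.replace(" ", "_"))
```

A function like this could be passed wherever a transform is expected. The helpers described in the following sections exist so that you rarely need to write one by hand.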
In order to help you with writing your own transformers and filters, PFERD ### Constructing transforms
ships with a few powerful building blocks:
| Method | Description | While transforms are just normal Python functions, writing them by hand can
|--------|-------------| quickly become tedious. In order to help you with writing your own transforms
| `glob` | Returns a transform that returns `None` if the glob does not match and the unmodified path otherwise. | and filters, PFERD defines a few useful transform creators and combinators in
| `predicate` | Returns a transform that returns `None` if the predicate does not match the path and the unmodified path otherwise. | the `PFERD.transform` module:
| `move_dir(source, target)` | Returns a transform that moves all files from the `source` to the `target` dir. |
| `move(source, target)` | Returns a transform that moves the `source` file to `target`. |
| `rename(old, new)` | Renames a single file. |
| `re_move(regex, sub)` | Moves all files matching the given regular expression. The different captured groups are available under their index and can be used together with normal python format methods: `re_move(r"Blatt (\d+)\.pdf", "Blätter/Blatt_{1:0>2}.pdf"),`. |
| `re_rename(old, new)` | Same as `re_move` but operates on the path *names* instead of the full path. |
And PFERD also offers a few combinator functions: #### Transform creators
* **`keep`** These methods let you create a few basic transform building blocks:
`keep` just returns the input path unchanged. It can be very useful as the
last argument in an `attempt` call, to leave everything not matching a rule - **`glob(glob)`**
unchanged. Creates a transform that returns the unchanged path if the glob matches the path and `None` otherwise.
* **`optionally(transformer)`** See also [Path.match].
Wraps a given transformer and returns its result if it is not `None`. Example: `glob("Übung/*.pdf")`
- **`predicate(pred)`**
Creates a transform that returns the unchanged path if `pred(path)` returns a truthy value.
Returns `None` otherwise.
Example: `predicate(lambda path: len(path.parts) == 3)`
- **`move_dir(source, target)`**
Creates a transform that moves all files from the `source` to the `target` directory.
Example: `move_dir("Übung/", "Blätter/")`
- **`move(source, target)`**
Creates a transform that moves the `source` file to `target`.
Example: `move("Vorlesung/VL02_Automten.pdf", "Vorlesung/VL02_Automaten.pdf")`
- **`rename(source, target)`**
Creates a transform that renames all files named `source` to `target`.
This transform works on the file names, not paths, and thus works no matter where the file is located.
Example: `rename("VL02_Automten.pdf", "VL02_Automaten.pdf")`
- **`re_move(regex, target)`**
Creates a transform that moves all files matching `regex` to `target`.
The transform calls `str.format` on the `target` string with the contents of the capturing groups before returning it.
The capturing groups can be accessed via their index.
See also [Match.group].
Example: `re_move(r"Übung/Blatt (\d+)\.pdf", "Blätter/Blatt_{1:0>2}.pdf")`
- **`re_rename(regex, target)`**
Creates a transform that renames all files matching `regex` to `target`.
This transform works on the file names, not paths, and thus works no matter where the file is located.
Example: `re_rename(r"VL(\d+)(.*)\.pdf", "Vorlesung_Nr_{1}__{2}.pdf")`
All movement or rename transforms above return `None` if a file doesn't match
their movement or renaming criteria. This enables them to be used as building
blocks to build up more complex transforms.
In addition, `PFERD.transform` also defines the `keep` transform which returns its input path unchanged.
This behaviour can be very useful when creating more complex transforms.
See below for example usage.
[Path.match]: https://docs.python.org/3/library/pathlib.html#pathlib.Path.match
[Match.group]: https://docs.python.org/3/library/re.html#re.Match.group
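The way capturing groups feed into the `target` string of `re_move` can be sketched in plain Python. This is an illustrative re-implementation and not PFERD's actual code: the name `re_move_sketch` is hypothetical, and it assumes that the regex has to match the whole path and that index `0` holds the whole match, as with [Match.group].

```py
import re
from pathlib import PurePath
from typing import Callable, Optional

Transform = Callable[[PurePath], Optional[PurePath]]


# Hypothetical stand-in for re_move, written only to show the formatting step.
def re_move_sketch(regex: str, target: str) -> Transform:
    def inner(path: PurePath) -> Optional[PurePath]:
        # Assumption: the regex must match the entire path.
        match = re.fullmatch(regex, path.as_posix())
        if match is None:
            # No match, so another transform may handle this path instead.
            return None
        # Index 0 is the whole match, so "{1}" is the first capturing group.
        groups = [match.group(0), *match.groups()]
        return PurePath(target.format(*groups))

    return inner


move_sheet = re_move_sketch(r"Übung/Blatt (\d+)\.pdf", "Blätter/Blatt_{1:0>2}.pdf")
```

With this sketch, `Übung/Blatt 3.pdf` is mapped to `Blätter/Blatt_03.pdf`, because the format spec `{1:0>2}` right-aligns the first group to a width of two and pads it with zeros.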
#### Transform combinators
These methods let you combine transforms into more complex transforms:
- **`optionally(transform)`**
Wraps a given transform and returns its result if it is not `None`.
Otherwise returns the input path unchanged. Otherwise returns the input path unchanged.
* **`do(transformers)`** See below for example usage.
`do` accepts a series of transformers and applies them in the given order to - **`do(transforms)`**
the result of the previous one. If any transformer returns `None`, do Accepts a series of transforms and applies them in the given order to the result of the previous one.
short-circuits and also returns `None`. This can be used to perform multiple If any transform returns `None`, `do` short-circuits and also returns `None`.
renames in a row: This can be used to perform multiple renames in a row:
```py ```py
do( do(
# Move them # Move them
@ -103,13 +151,12 @@ And PFERD also offers a few combinator functions:
optionally(re_rename("(.*).m4v.mp4", "{1}.mp4")), optionally(re_rename("(.*).m4v.mp4", "{1}.mp4")),
# Remove the 'dbs' prefix (if they have any) # Remove the 'dbs' prefix (if they have any)
optionally(re_rename("(?i)dbs-(.+)", "{1}")), optionally(re_rename("(?i)dbs-(.+)", "{1}")),
), )
``` ```
* **`attempt(transformers)`** - **`attempt(transforms)`**
`attempt` applies the passed transformers in the given order until it finds Applies the passed transforms in the given order until it finds one that does not return `None`.
one that does not return `None`. If it does not find any, it returns `None`. If it does not find any, it returns `None`.
This can be used to give a list of possible transformations and it will This can be used to give a list of possible transformations and automatically pick the first one that fits:
automatically pick the first one that fits:
```py ```py
attempt( attempt(
# Move all videos. If a video is passed in, this `re_move` will succeed # Move all videos. If a video is passed in, this `re_move` will succeed
@ -122,17 +169,26 @@ And PFERD also offers a few combinator functions:
) )
``` ```
All of these combinators are used in the provided example config, if you want All of these combinators are used in the provided example configs, if you want
to see some more true-to-live usages. to see some more real-life usages.
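The combinator behaviour described above can also be written out as a small, self-contained sketch. The functions below are illustrative re-implementations based only on the descriptions in this section, not the code shipped in `PFERD.transform`. The helper `only_pdf` is hypothetical.

```py
from pathlib import PurePath
from typing import Callable, Optional

Transform = Callable[[PurePath], Optional[PurePath]]


def keep(path: PurePath) -> Optional[PurePath]:
    # Returns the input path unchanged.
    return path


def optionally(transform: Transform) -> Transform:
    def inner(path: PurePath) -> Optional[PurePath]:
        result = transform(path)
        # Fall back to the unchanged input if the wrapped transform declines.
        return path if result is None else result

    return inner


def do(*transforms: Transform) -> Transform:
    def inner(path: PurePath) -> Optional[PurePath]:
        current: Optional[PurePath] = path
        for transform in transforms:
            current = transform(current)
            # Short-circuit as soon as one step returns None.
            if current is None:
                return None
        return current

    return inner


def attempt(*transforms: Transform) -> Transform:
    def inner(path: PurePath) -> Optional[PurePath]:
        for transform in transforms:
            result = transform(path)
            # The first transform that does not return None wins.
            if result is not None:
                return result
        return None

    return inner


# Hypothetical helper: only accepts PDF files.
def only_pdf(path: PurePath) -> Optional[PurePath]:
    return path if path.suffix == ".pdf" else None
```

In this sketch, `do(only_pdf, keep)` drops every file that is not a PDF, while `attempt(only_pdf, keep)` and `optionally(only_pdf)` both leave such files unchanged. That is why `keep` is useful as the last argument of `attempt`.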
### A short, but commented example ### A short, but commented example
```py ```py
def filter_course(path: PurePath) -> bool: from pathlib import Path, PurePath
# Note that glob returns a Transformer from PFERD import Pferd
# - a function from PurePath -> Optional[PurePath] from PFERD.ilias import IliasElementType
# So we need to apply the result of 'glob' to our input path. from PFERD.transform import *
# We need to crawl the 'Tutorien' folder as it contains the one we want.
# This filter will later be used by the ILIAS crawler to decide whether it
# should crawl a directory (or directory-like structure).
def filter_course(path: PurePath, type: IliasElementType) -> bool:
# Note that glob returns a Transform, which is a function from PurePath ->
# Optional[PurePath]. Because of this, we need to apply the result of
# 'glob' to our input path. The returned value will be truthy (a Path) if
# the transform succeeded, or `None` if it failed.
# We need to crawl the 'Tutorien' folder as it contains one that we want.
if glob("Tutorien/")(path): if glob("Tutorien/")(path):
return True return True
# If we found 'Tutorium 10', keep it! # If we found 'Tutorium 10', keep it!
@ -145,21 +201,35 @@ def filter_course(path: PurePath) -> bool:
# All other dirs (including subdirs of 'Tutorium 10') should be searched :) # All other dirs (including subdirs of 'Tutorium 10') should be searched :)
return True return True
enable_logging() # needed once before calling a Pferd method
# Create a Pferd instance rooted in the same directory as the script file # This transform will later be used to rename a few files. It can also be used
# This is not a test run, so files will be downloaded (default, can be omitted) # to ignore some files.
transform_course = attempt(
# We don't care about the other tuts and would instead prefer a cleaner
# directory structure.
move_dir("Tutorien/Tutorium 10/", "Tutorium/"),
# We don't want to modify any other files, so we're going to keep them
# exactly as they are.
keep
)
# Enable and configure the text output. Needs to be called before calling any
# other PFERD methods.
Pferd.enable_logging()
# Create a Pferd instance rooted in the same directory as the script file. This
# is not a test run, so files will be downloaded (default, can be omitted).
pferd = Pferd(Path(__file__).parent, test_run=False) pferd = Pferd(Path(__file__).parent, test_run=False)
# Use the ilias_kit helper to synchronize an ILIAS course # Use the ilias_kit helper to synchronize an ILIAS course
pferd.ilias_kit( pferd.ilias_kit(
# The folder all of the course's content should be placed in # The directory that all of the downloaded files should be placed in
Path("My cool course"), "My_cool_course/",
# The course ID (found in the URL when on the course page in ILIAS) # The course ID (found in the URL when on the course page in ILIAS)
"course id", "course id",
# A path to a cookie jar. If you synchronize multiple ILIAS courses, setting this # A path to a cookie jar. If you synchronize multiple ILIAS courses,
# to a common value requires you to only login once. # setting this to a common value requires you to only log in once.
cookies=Path("ilias_cookies.txt"), cookies=Path("ilias_cookies.txt"),
# A transform to apply to all found paths # A transform can rename, move or filter out certain files
transform=transform_course, transform=transform_course,
# A crawl filter limits what paths the crawler searches # A crawl filter limits what paths the crawler searches
dir_filter=filter_course, dir_filter=filter_course,