Compare commits


90 Commits

Author SHA1 Message Date
1c2b6bf994 Bump version 2020-12-13 19:57:29 +01:00
ee39aaf08b Fix merge marker in LICENSE 2020-12-07 22:55:28 +01:00
93e6329901 Use the least destructive conflict resolver if there are multiple 2020-12-06 13:28:08 +01:00
f47b137b59 Fix ILIAS init.py and Pferd.py authenticators 2020-12-06 13:15:32 +01:00
83ea15ee83 Use system keyring service for password auth 2020-12-06 13:15:30 +01:00
75471c46d1 Use credential file 2020-12-05 23:44:09 +01:00
1e0343bba6 sync_url: Add username and password args 2020-12-05 23:30:09 +01:00
0f5e55648b Tell user when the conflict resolver kept existing files 2020-12-05 14:12:45 +01:00
57259e21f4 Print download summary in sync_url 2020-12-05 14:09:09 +01:00
4ce385b262 Treat file overwrite and marked file overwrite differently 2020-12-05 14:03:43 +01:00
2d64409542 Fix handling of empty args.folder 2020-12-05 13:50:46 +01:00
fcb3884a8f Add --remote-first, --local-first and --no-delete flags 2020-12-05 13:49:05 +01:00
9f6dc56a7b Use a strategy to decide conflict resolution 2020-12-02 19:32:57 +01:00
56ab473611 Merge pull request #17 from TheChristophe/master
Add flag to make sync_url use defaults instead of prompting
2020-12-02 19:04:46 +01:00
6426060804 Fix relative paths bug
Introduced in 74ea039458
2020-12-02 18:40:45 +01:00
49a0ca7a7c Add myself to LICENSE
This should've been done back when I added a PR for adding sync_url but people are lazy smh.
2020-12-02 18:24:07 +01:00
f3a4663491 Add passive/no_prompt flag 2020-12-02 18:24:07 +01:00
ecdbca8fb6 Make sync_url work relative to cwd like sane programs 2020-12-02 18:24:04 +01:00
9cbea5fe06 Add requirements.txt 2020-11-23 10:16:40 +01:00
ba3c7f85fa Replace "\" in ILIAS paths as well
I am not sure whether anybody really uses a backslash in their names,
but I guess it can't hurt to do this for windows users.
2020-11-19 19:37:28 +01:00
ba9215ebe8 Bump version 2020-11-18 10:09:45 +01:00
8ebf0eab16 Sort download summary 2020-11-17 21:36:04 +01:00
cd90a60dee Move "sanitize_windows_path" to PFERD.transform 2020-11-12 20:52:46 +01:00
98834c9c95 Bump version 2020-11-12 20:23:36 +01:00
55e9e719ad Sanitize "/" in ilias path names 2020-11-12 20:21:24 +01:00
a0ae9aee27 Sanitize individual path parts 2020-11-11 09:36:20 +01:00
1486a63854 Do not collapse directory structure when sanitizing 2020-11-10 22:53:47 +01:00
733e1ae136 Bump version 2020-11-10 20:50:31 +01:00
4ac51048c1 Use "_" as a replacement for illegal characters 2020-11-10 20:49:14 +01:00
f2aba970fd [sync_url] Sanitize path names on windows 2020-11-10 17:16:14 +01:00
9c4759103a Bump patch version 2020-11-05 11:25:06 +01:00
316b9d7bf4 Prevent too many retries when fetching an ILIAS page 2020-11-04 22:23:56 +01:00
6f30adcd22 Fix quote type in README 2020-11-04 22:13:08 +01:00
6f78fef604 Add quoting instructions to README 2020-11-04 22:08:33 +01:00
f830b42a36 Fix duplicate files in download summary 2020-11-04 21:49:35 +01:00
ef343dec7c Merge organizer download summaries 2020-11-04 15:06:58 +01:00
0da2fafcd8 Fix links outside tables 2020-11-04 14:46:15 +01:00
f4abe3197c Add ipd crawler 2020-11-03 21:15:40 +01:00
38d4f5b4c9 Do not fail only empty courses 2020-11-03 20:09:54 +01:00
9ea03bda3e Adjust release names 2020-10-30 18:14:02 +01:00
07de5bea8b Explain how to run sync_url on Mac 2020-10-30 17:53:55 +01:00
f0d572c110 Fix a few typos in release body 2020-10-30 17:32:04 +01:00
076067e22d Bump version 2020-10-30 17:28:34 +01:00
ebb6e63c5c Add MacOS to CI 2020-10-30 17:23:27 +01:00
0c3f35a2d2 Do not provide a shorthand for "no-videos" 2020-10-30 17:01:10 +01:00
521890ae78 Update README.md 2020-10-28 23:24:18 +01:00
3f7c73df80 Release new minor version 2020-10-07 09:32:17 +02:00
43100f69d5 Merge pull request #10 from Garmelon/sync-url
Add "Sync url" script from Christophe and release it automatically
2020-10-07 09:29:48 +02:00
d73c778b0a Add sync_url instructions to README 2020-10-06 17:50:28 +02:00
73c3eb0984 Add option to skip videos in sync_url 2020-10-06 17:20:47 +02:00
a519cbe05d Add sync_url workflow 2020-10-06 12:42:20 +02:00
b3ad9783c4 Ignore pyinstaller files 2020-10-06 11:43:20 +02:00
c1ccb6c53e Allow crawling videos with sync_url 2020-10-06 10:46:06 +02:00
51a713fa04 Allow crawling courses or folders with sync_url
Video folders do not work, if they are passed directly. Their containing
folder must be specified instead.
2020-09-28 20:00:01 +02:00
74ea039458 Fix a few lint errors and pferd quirks in sync_url 2020-09-28 19:42:59 +02:00
aaa6a2b6a4 Merge pull request #9 from TheChristophe/master
Add simple course-download-by-url script
2020-09-28 19:25:45 +02:00
e32a49480b Expose methods to look up course/element names by id / url 2020-09-28 19:16:52 +02:00
be65051f9d Support downloading folders in get-by-url script 2020-09-28 18:16:33 +02:00
3387bc5f20 Add simple course-download-by-url script 2020-09-28 17:49:36 +02:00
3f0ae729d6 Expand "is course" check to not download magazines or other weird things 2020-09-28 16:43:58 +02:00
8e8c1c031a Version 2.3.0 2020-09-03 21:47:10 +02:00
55678d7fee Pass string down to FileCookieJar
Some python versions just can't handle it *despite the documentation
stating they should*.
2020-08-12 09:09:14 +02:00
a57ee8b96b Add timeout to video downloads to work around requests IPv6 bug 2020-08-11 14:40:30 +02:00
e367da925e Bump version to 2.2.1 2020-07-28 19:55:32 +00:00
77a109bb7e Fix ilias shibboleth authenticator
The shibboleth site got a visual overhaul that slightly changed the classes of a
form we need.
2020-07-28 19:13:51 +00:00
a3e1864a26 Allow long paths on windows
If you start PFERD a few folders deep in your home directory, it is
quite easy to reach the maximum path length limit on Windows (260
chars). This patch opts in to long paths ("\\?\" prefix) which lift that
restriction at the cost of ugly path names.
2020-07-25 13:44:49 +02:00
41cbcc509c Update version to 2.2.0 2020-07-15 22:47:44 +02:00
77874b432b Also add personal_desktop to download summary 2020-07-15 22:47:44 +02:00
5c4c785e60 Fix HTML file downloading
Previously PFERD thought any HTML file was a "Error, no access" page
when downloading. Now it checks whether ILIAS sends a
content-disposition header, telling the browser to download the file. If
that is the case, it was just a HTML file uploaded to ILIAS. If it has
no header, it is probably an error message.
2020-07-15 15:12:14 +02:00
2aed4f6d1f Only query the dir_filter for directories 2020-07-13 13:36:12 +02:00
34152fbe54 Set mtime and atime to ILIAS dates where possible 2020-07-13 13:29:18 +02:00
4047fe78f3 Fix README formatting 2020-07-11 18:22:33 +00:00
c28347122e Improve README
- Added a table of contents
- Reworked the transform section
- Fixed the commented example
2020-07-11 18:16:33 +00:00
5b38ab8cf1 Add MIT license 2020-07-08 09:46:27 +00:00
bb25d32f03 Fix typo in README 2020-06-29 16:18:33 +02:00
ecaedea709 Merge pull request #8 from pavelzw/master
Fix version number
2020-06-26 17:52:05 +02:00
f05d1b1261 Fix version number 2020-06-26 17:49:47 +02:00
6aaa3071f9 Update README with new version 2020-06-26 17:35:03 +02:00
c26c9352f1 Make DownloadSummary private, provide property accessors 2020-06-26 17:30:45 +02:00
d9ea688145 Use pretty logger for summaries 2020-06-26 17:24:36 +02:00
e8be6e498e Add summary to example_config_personal_desktop 2020-06-26 17:24:36 +02:00
e4b1fac045 Satisfy pylint 2020-06-26 15:38:22 +02:00
402ae81335 Fix type hints 2020-06-26 13:17:44 +00:00
52f31e2783 Add type hints to DownloadSummary 2020-06-26 13:02:37 +02:00
739522a151 Move download summary into a separate class 2020-06-25 23:07:11 +02:00
6c034209b6 Add deleted files to summary 2020-06-25 22:00:28 +02:00
f6fbd5e4bb Add download summary 2020-06-25 19:19:34 +02:00
7024db1f13 Use transient progessbar
This will ensure no pesky newline ends up in the output, even on
windows.
2020-06-25 18:03:12 +02:00
23bfa42a0d Never use the direct download button, as it is currently broken 2020-06-11 13:31:01 +02:00
fdb57884ed Touch files with same content to update timestamps 2020-05-31 20:27:15 +02:00
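The HTML-download fix above (5c4c785e60) distinguishes error pages from real HTML uploads via the Content-Disposition header. A minimal sketch of that check, using a plain dict of headers (the real code reads them off a `requests` response, whose headers are case-insensitive; here the keys are lower-cased by hand):

```python
def looks_like_ilias_error_page(headers: dict) -> bool:
    """Heuristic from the fix: an HTML response *without* a
    content-disposition header is probably an error page, not a file
    uploaded to ILIAS."""
    lowered = {k.lower(): v for k, v in headers.items()}
    content_type = lowered.get("content-type", "")
    has_disposition = "content-disposition" in lowered
    return content_type.startswith("text/html") and not has_disposition
```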
22 changed files with 1195 additions and 123 deletions

.github/workflows/package.yml (new file, +74)

@@ -0,0 +1,74 @@
name: Package Application with Pyinstaller

on:
  push:
    branches:
      - "*"
    tags:
      - "v*"

jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]

    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.x'
      - name: "Install dependencies"
        run: "pip install setuptools pyinstaller rich requests beautifulsoup4 -f --upgrade"
      - name: "Install sync_url.py"
        run: "pyinstaller sync_url.py -F"
      - name: "Move artifact"
        run: "mv dist/sync_url* dist/sync_url-${{ matrix.os }}"
      - uses: actions/upload-artifact@v2
        with:
          name: "Pferd Sync URL"
          path: "dist/sync_url*"

  release:
    name: Release
    needs: [build]
    runs-on: ubuntu-latest
    if: startsWith(github.ref, 'refs/tags/')
    env:
      GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - name: "Checkout"
        uses: actions/checkout@v2
      - name: "Download artifacts"
        uses: actions/download-artifact@v2
        with:
          name: "Pferd Sync URL"
      - name: "look at folder structure"
        run: "ls -lah"
      - name: "Rename releases"
        run: "mv sync_url-macos-latest pferd_sync_url_mac && mv sync_url-ubuntu-latest pferd_sync_url_linux && mv sync_url-windows-latest pferd_sync_url.exe"
      - name: "Create release"
        uses: softprops/action-gh-release@v1
      - name: "Upload release artifacts"
        uses: softprops/action-gh-release@v1
        with:
          body: "Download the correct sync_url for your platform and run it in the terminal or CMD. You might need to make it executable on Linux/Mac with `chmod +x <file>`. Also please enclose the *url you pass to the program in double quotes* or your shell might silently screw it up!"
          files: |
            pferd_sync_url_mac
            pferd_sync_url_linux
            pferd_sync_url.exe

.gitignore (+7)

@@ -1,7 +1,14 @@
 __pycache__/
 .venv/
+venv/
+.idea/
+build/
 .mypy_cache/
 .tmp/
 .env
 .vscode
 ilias_cookies.txt
+
+# PyInstaller
+sync_url.spec
+dist/

LICENSE (new file, +18)

@@ -0,0 +1,18 @@
Copyright 2019-2020 Garmelon, I-Al-Istannen, danstooamerican, pavelzw, TheChristophe, Scriptim

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


@@ -3,8 +3,19 @@ General authenticators useful in many situations
 """

 import getpass
+import logging
 from typing import Optional, Tuple

+from .logging import PrettyLogger
+
+LOGGER = logging.getLogger(__name__)
+PRETTY = PrettyLogger(LOGGER)
+
+try:
+    import keyring
+except ImportError:
+    PRETTY.warning("Keyring module not found, KeyringAuthenticator won't work!")
+

 class TfaAuthenticator:
     # pylint: disable=too-few-public-methods
@@ -123,3 +134,81 @@ class UserPassAuthenticator:
         if self._given_username is not None and self._given_password is not None:
             self._given_username = None
             self._given_password = None
+
+
+class KeyringAuthenticator(UserPassAuthenticator):
+    """
+    An authenticator for username-password combinations that stores the
+    password using the system keyring service and prompts the user for missing
+    information.
+    """
+
+    def get_credentials(self) -> Tuple[str, str]:
+        """
+        Returns a tuple (username, password). Prompts user for username or
+        password when necessary.
+        """
+        if self._username is None and self._given_username is not None:
+            self._username = self._given_username
+
+        if self._password is None and self._given_password is not None:
+            self._password = self._given_password
+
+        if self._username is not None and self._password is None:
+            self._load_password()
+
+        if self._username is None or self._password is None:
+            print(f"Enter credentials ({self._reason})")
+
+        username: str
+        if self._username is None:
+            username = input("Username: ")
+            self._username = username
+        else:
+            username = self._username
+
+        if self._password is None:
+            self._load_password()
+
+        password: str
+        if self._password is None:
+            password = getpass.getpass(prompt="Password: ")
+            self._password = password
+            self._save_password()
+        else:
+            password = self._password
+
+        return (username, password)
+
+    def _load_password(self) -> None:
+        """
+        Loads the saved password associated with self._username from the system
+        keyring service (or None if not password has been saved yet) and stores
+        it in self._password.
+        """
+        self._password = keyring.get_password("pferd-ilias", self._username)
+
+    def _save_password(self) -> None:
+        """
+        Saves self._password to the system keyring service and associates it
+        with self._username.
+        """
+        keyring.set_password("pferd-ilias", self._username, self._password)
+
+    def invalidate_credentials(self) -> None:
+        """
+        Marks the credentials as invalid. If only a username was supplied in
+        the constructor, assumes that the username is valid and only the
+        password is invalid. If only a password was supplied in the
+        constructor, assumes that the password is valid and only the username
+        is invalid. Otherwise, assumes that username and password are both
+        invalid.
+        """
+        try:
+            keyring.delete_password("pferd-ilias", self._username)
+        except keyring.errors.PasswordDeleteError:
+            pass
+
+        super().invalidate_credentials()

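The KeyringAuthenticator above falls back to prompting only when the keyring has no entry, then saves what the user typed. A minimal sketch of that flow, with a hypothetical in-memory stand-in for the `keyring` module so it runs without a system keyring (the real class uses `keyring.get_password`/`set_password` with the "pferd-ilias" service name, as in the diff):

```python
class InMemoryKeyring:
    """Hypothetical stand-in for the `keyring` module, for offline demo only."""

    def __init__(self):
        self._store = {}

    def get_password(self, service, username):
        return self._store.get((service, username))

    def set_password(self, service, username, password):
        self._store[(service, username)] = password


def get_credentials(keyring_backend, username, prompt):
    """Prompt for the password only if the keyring has none, then cache it."""
    password = keyring_backend.get_password("pferd-ilias", username)
    if password is None:
        password = prompt()  # getpass.getpass() in the real authenticator
        keyring_backend.set_password("pferd-ilias", username, password)
    return (username, password)
```

On the second call the password comes straight from the keyring, so the user is not prompted again.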

@@ -22,7 +22,7 @@ class CookieJar:
         if cookie_file is None:
             self._cookies = LWPCookieJar()
         else:
-            self._cookies = LWPCookieJar(cookie_file)
+            self._cookies = LWPCookieJar(str(cookie_file.resolve()))

     @property
     def cookies(self) -> LWPCookieJar:

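This change matches commit 55678d7fee ("Pass string down to FileCookieJar"): some Python versions choke on a `Path` filename despite the documentation, so the path is stringified first. A small standalone sketch of the safe pattern (the temp-file name here is illustrative):

```python
import tempfile
from http.cookiejar import LWPCookieJar
from pathlib import Path

# Hypothetical cookie file location, standing in for ilias_cookies.txt
cookie_file = Path(tempfile.mkdtemp()) / "demo_cookies.txt"

# Pass str(...), not the Path itself, to stay compatible everywhere
jar = LWPCookieJar(str(cookie_file.resolve()))
jar.save()  # writes an (empty) LWP cookie file to disk
```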
PFERD/download_summary.py (new file, +75)

@@ -0,0 +1,75 @@
"""
Provides a summary that keeps track of new, modified or deleted files.
"""
from pathlib import Path
from typing import List


def _mergeNoDuplicate(first: List[Path], second: List[Path]) -> List[Path]:
    tmp = list(set(first + second))
    tmp.sort(key=lambda x: str(x.resolve()))
    return tmp


class DownloadSummary:
    """
    Keeps track of all new, modified or deleted files and provides a summary.
    """

    def __init__(self) -> None:
        self._new_files: List[Path] = []
        self._modified_files: List[Path] = []
        self._deleted_files: List[Path] = []

    @property
    def new_files(self) -> List[Path]:
        """
        Returns all new files.
        """
        return self._new_files.copy()

    @property
    def modified_files(self) -> List[Path]:
        """
        Returns all modified files.
        """
        return self._modified_files.copy()

    @property
    def deleted_files(self) -> List[Path]:
        """
        Returns all deleted files.
        """
        return self._deleted_files.copy()

    def merge(self, summary: 'DownloadSummary') -> None:
        """
        Merges ourselves with the passed summary. Modifies this object, but not the passed one.
        """
        self._new_files = _mergeNoDuplicate(self._new_files, summary.new_files)
        self._modified_files = _mergeNoDuplicate(self._modified_files, summary.modified_files)
        self._deleted_files = _mergeNoDuplicate(self._deleted_files, summary.deleted_files)

    def add_deleted_file(self, path: Path) -> None:
        """
        Registers a file as deleted.
        """
        self._deleted_files.append(path)

    def add_modified_file(self, path: Path) -> None:
        """
        Registers a file as changed.
        """
        self._modified_files.append(path)

    def add_new_file(self, path: Path) -> None:
        """
        Registers a file as new.
        """
        self._new_files.append(path)

    def has_updates(self) -> bool:
        """
        Returns whether this summary has any updates.
        """
        return bool(self._new_files or self._modified_files or self._deleted_files)

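The set-union-then-sort merge above is what lets per-organizer summaries combine without duplicates (see commits "Merge organizer download summaries" and "Fix duplicate files in download summary"). A self-contained sketch of the same helper and its effect:

```python
from pathlib import Path
from typing import List


def merge_no_duplicate(first: List[Path], second: List[Path]) -> List[Path]:
    """Mirror of _mergeNoDuplicate: union via set, sorted by resolved path."""
    merged = list(set(first + second))
    merged.sort(key=lambda p: str(p.resolve()))
    return merged


a = [Path("notes.pdf"), Path("slides.pdf")]
b = [Path("slides.pdf"), Path("exercise.pdf")]  # slides.pdf appears in both
merged = merge_no_duplicate(a, b)
```

Since all three example paths resolve into the same directory, the result is simply alphabetical by file name, with `slides.pdf` kept once.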

@@ -37,8 +37,12 @@ class KitShibbolethAuthenticator(IliasAuthenticator):
     Authenticate via KIT's shibboleth system.
     """

-    def __init__(self, username: Optional[str] = None, password: Optional[str] = None) -> None:
-        self._auth = UserPassAuthenticator("KIT ILIAS Shibboleth", username, password)
+    def __init__(self, authenticator: Optional[UserPassAuthenticator] = None) -> None:
+        if authenticator:
+            self._auth = authenticator
+        else:
+            self._auth = UserPassAuthenticator("KIT ILIAS Shibboleth")
+
         self._tfa_auth = TfaAuthenticator("KIT ILIAS Shibboleth")

     def authenticate(self, sess: requests.Session) -> None:
@@ -67,7 +71,7 @@ class KitShibbolethAuthenticator(IliasAuthenticator):
         while not self._login_successful(soup):
             # Searching the form here so that this fails before asking for
             # credentials rather than after asking.
-            form = soup.find("form", {"class": "form2", "method": "post"})
+            form = soup.find("form", {"class": "full content", "method": "post"})
             action = form["action"]

             # Equivalent: Enter credentials in


@@ -26,6 +26,10 @@
 LOGGER = logging.getLogger(__name__)
 PRETTY = PrettyLogger(LOGGER)

+
+def _sanitize_path_name(name: str) -> str:
+    return name.replace("/", "-").replace("\\", "-")
+

 class IliasElementType(Enum):
     """
     The type of an ilias element.
@@ -38,6 +42,12 @@ class IliasElementType(Enum):
     FORUM = "FORUM"
     EXTERNAL_LINK = "EXTERNAL_LINK"

+    def is_folder(self) -> bool:
+        """
+        Returns whether this type is some kind of folder.
+        """
+        return "FOLDER" in str(self.name)
+

 IliasDirectoryFilter = Callable[[Path, IliasElementType], bool]

@@ -110,6 +120,16 @@ class IliasCrawler:

         return urlunsplit((scheme, netloc, path, new_query_string, fragment))

+    def recursive_crawl_url(self, url: str) -> List[IliasDownloadInfo]:
+        """
+        Crawls a given url *and all reachable elements in it*.
+
+        Args:
+            url {str} -- the *full* url to crawl
+        """
+        start_entries: List[IliasCrawlerEntry] = self._crawl_folder(Path(""), url)
+        return self._iterate_entries_to_download_infos(start_entries)
+
     def crawl_course(self, course_id: str) -> List[IliasDownloadInfo]:
         """
         Starts the crawl process for a course, yielding a list of elements to (potentially)
@@ -128,7 +148,7 @@ class IliasCrawler:
         if not self._is_course_id_valid(root_url, course_id):
             raise FatalException(
-                "Invalid course id? The URL the server returned did not contain my id."
+                "Invalid course id? I didn't find anything looking like a course!"
             )

         # And treat it as a folder
@@ -137,7 +157,34 @@ class IliasCrawler:
     def _is_course_id_valid(self, root_url: str, course_id: str) -> bool:
         response: requests.Response = self._session.get(root_url)
-        return course_id in response.url
+        # We were redirected ==> Non-existant ID
+        if course_id not in response.url:
+            return False
+
+        link_element: bs4.Tag = self._get_page(root_url, {}).find(id="current_perma_link")
+        if not link_element:
+            return False
+        # It wasn't a course but a category list, forum, etc.
+        return "crs_" in link_element.get("value")
+
+    def find_course_name(self, course_id: str) -> Optional[str]:
+        """
+        Returns the name of a given course. None if it is not a valid course
+        or it could not be found.
+        """
+        course_url = self._url_set_query_param(
+            self._base_url + "/goto.php", "target", f"crs_{course_id}"
+        )
+        return self.find_element_name(course_url)
+
+    def find_element_name(self, url: str) -> Optional[str]:
+        """
+        Returns the name of the element at the given URL, if it can find one.
+        """
+        focus_element: bs4.Tag = self._get_page(url, {}).find(id="il_mhead_t_focus")
+        if not focus_element:
+            return None
+        return focus_element.text

     def crawl_personal_desktop(self) -> List[IliasDownloadInfo]:
         """
@@ -167,7 +214,7 @@ class IliasCrawler:
                 PRETTY.not_searching(entry.path, "forum")
                 continue

-            if not self.dir_filter(entry.path, entry.entry_type):
+            if entry.entry_type.is_folder() and not self.dir_filter(entry.path, entry.entry_type):
                 PRETTY.not_searching(entry.path, "user filter")
                 continue

@@ -202,13 +249,22 @@ class IliasCrawler:
         """
         soup = self._get_page(url, {})

+        if soup.find(id="headerimage"):
+            element: bs4.Tag = soup.find(id="headerimage")
+            if "opencast" in element.attrs["src"].lower():
+                PRETTY.warning(f"Switched to crawling a video at {folder_path}")
+                if not self.dir_filter(folder_path, IliasElementType.VIDEO_FOLDER):
+                    PRETTY.not_searching(folder_path, "user filter")
+                    return []
+                return self._crawl_video_directory(folder_path, url)
+
         result: List[IliasCrawlerEntry] = []

         # Fetch all links and throw them to the general interpreter
         links: List[bs4.Tag] = soup.select("a.il_ContainerItemTitle")
         for link in links:
             abs_url = self._abs_url_from_link(link)
-            element_path = Path(folder_path, link.getText().strip())
+            element_path = Path(folder_path, _sanitize_path_name(link.getText().strip()))
             element_type = self._find_type_from_link(element_path, link, abs_url)

             if element_type == IliasElementType.REGULAR_FILE:
@@ -325,7 +381,7 @@ class IliasCrawler:
         modification_date = demangle_date(modification_date_str)

         # Grab the name from the link text
-        name = link_element.getText()
+        name = _sanitize_path_name(link_element.getText())
         full_path = Path(path, name + "." + file_type)

         return [
@@ -425,7 +481,8 @@ class IliasCrawler:
         results: List[IliasCrawlerEntry] = []

         # We can download everything directly!
-        if len(direct_download_links) == len(video_links):
+        # FIXME: Sadly the download button is currently broken, so never do that
+        if False and len(direct_download_links) == len(video_links):
             for link in direct_download_links:
                 results += self._crawl_single_video(video_dir_path, link, True)
         else:
@@ -455,7 +512,7 @@ class IliasCrawler:
         ).getText().strip()
         title += ".mp4"

-        video_path: Path = Path(parent_path, title)
+        video_path: Path = Path(parent_path, _sanitize_path_name(title))

         video_url = self._abs_url_from_link(link)

@@ -527,6 +584,7 @@ class IliasCrawler:
             # Two divs, side by side. Left is the name, right is the link ==> get left
             # sibling
             file_name = file_link.parent.findPrevious(name="div").getText().strip()
+            file_name = _sanitize_path_name(file_name)
             url = self._abs_url_from_link(file_link)

             LOGGER.debug("Found file %r at %r", file_name, url)
@@ -540,10 +598,17 @@ class IliasCrawler:

         return results

-    def _get_page(self, url: str, params: Dict[str, Any]) -> bs4.BeautifulSoup:
+    def _get_page(self, url: str, params: Dict[str, Any],
+                  retry_count: int = 0) -> bs4.BeautifulSoup:
         """
         Fetches a page from ILIAS, authenticating when needed.
         """
+
+        if retry_count >= 4:
+            raise FatalException("Could not get a proper page after 4 tries. "
+                                 "Maybe your URL is wrong, authentication fails continuously, "
+                                 "your ILIAS connection is spotty or ILIAS is not well.")
+
         LOGGER.debug("Fetching %r", url)

         response = self._session.get(url, params=params)
@@ -564,7 +629,7 @@ class IliasCrawler:
             self._authenticator.authenticate(self._session)

-            return self._get_page(url, params)
+            return self._get_page(url, params, retry_count + 1)

     @staticmethod
     def _is_logged_in(soup: bs4.BeautifulSoup) -> bool:

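The retry cap added to `_get_page` (commit 316b9d7bf4) turns an unbounded authenticate-and-retry recursion into one that gives up after four attempts. A self-contained sketch of that control flow, with the fetch/login-check/authenticate steps abstracted into hypothetical callables:

```python
class FatalException(Exception):
    """Raised when the page cannot be fetched after the retry limit."""


def get_page(fetch, is_logged_in, authenticate, retry_count=0):
    """Sketch of IliasCrawler._get_page's retry logic: each failed
    authentication bumps retry_count; after 4 tries we bail out."""
    if retry_count >= 4:
        raise FatalException("Could not get a proper page after 4 tries.")
    page = fetch()
    if is_logged_in(page):
        return page
    authenticate()
    return get_page(fetch, is_logged_in, authenticate, retry_count + 1)
```

With a login check that never succeeds, `authenticate` runs exactly four times before the fatal error; on a logged-in page the fetch returns immediately.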

@@ -2,6 +2,8 @@

 import datetime
 import logging
+import math
+import os
 from pathlib import Path, PurePath
 from typing import Callable, List, Optional, Union

@@ -82,9 +84,13 @@ class IliasDownloader:
             session: requests.Session,
             authenticator: IliasAuthenticator,
             strategy: IliasDownloadStrategy,
+            timeout: int = 5
     ):
         """
         Create a new IliasDownloader.
+
+        The timeout applies to the download request only, as bwcloud uses IPv6
+        and requests has a problem with that: https://github.com/psf/requests/issues/5522
         """

         self._tmp_dir = tmp_dir
@@ -92,6 +98,7 @@ class IliasDownloader:
         self._session = session
         self._authenticator = authenticator
         self._strategy = strategy
+        self._timeout = timeout

     def download_all(self, infos: List[IliasDownloadInfo]) -> None:
         """
@@ -119,7 +126,15 @@ class IliasDownloader:
                 LOGGER.info("Retrying download: %r", info)
                 self._authenticator.authenticate(self._session)

-        self._organizer.accept_file(tmp_file, info.path)
+        dst_path = self._organizer.accept_file(tmp_file, info.path)
+        if dst_path and info.modification_date:
+            os.utime(
+                dst_path,
+                times=(
+                    math.ceil(info.modification_date.timestamp()),
+                    math.ceil(info.modification_date.timestamp())
+                )
+            )

     def _try_download(self, info: IliasDownloadInfo, target: Path) -> bool:
         url = info.url()
@@ -127,10 +142,11 @@ class IliasDownloader:
             PRETTY.warning(f"Could not download {str(info.path)!r} as I got no URL :/")
             return True

-        with self._session.get(url, stream=True) as response:
+        with self._session.get(url, stream=True, timeout=self._timeout) as response:
             content_type = response.headers["content-type"]
+            has_content_disposition = "content-disposition" in response.headers

-            if content_type.startswith("text/html"):
+            if content_type.startswith("text/html") and not has_content_disposition:
                 if self._is_logged_in(soupify(response)):
                     raise ContentTypeException("Attempting to download a web page, not a file")
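The `os.utime` call added above stamps downloaded files with the ILIAS modification date (commit 34152fbe54), rounding the timestamp up with `math.ceil`. A standalone sketch of that step on a throwaway temp file (the file path here is illustrative, not PFERD's own):

```python
import datetime
import math
import os
import tempfile

# Hypothetical file standing in for a freshly downloaded ILIAS file
fd, path = tempfile.mkstemp()
os.close(fd)

mod = datetime.datetime(2020, 7, 13, 12, 0, 0)

# Ceiling avoids sub-second truncation that could make the local file
# look perpetually older than the server copy on the next sync
stamp = math.ceil(mod.timestamp())
os.utime(path, times=(stamp, stamp))  # set atime and mtime alike
```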
PFERD/ipd.py (new file, +151)

@ -0,0 +1,151 @@
"""
Utility functions and a scraper/downloader for the IPD pages.
"""
import datetime
import logging
import math
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Optional
from urllib.parse import urljoin
import bs4
import requests
from PFERD.errors import FatalException
from PFERD.utils import soupify
from .logging import PrettyLogger
from .organizer import Organizer
from .tmp_dir import TmpDir
from .transform import Transformable
from .utils import stream_to_path
LOGGER = logging.getLogger(__name__)
PRETTY = PrettyLogger(LOGGER)
@dataclass
class IpdDownloadInfo(Transformable):
"""
Information about an ipd entry.
"""
url: str
modification_date: Optional[datetime.datetime]
IpdDownloadStrategy = Callable[[Organizer, IpdDownloadInfo], bool]
def ipd_download_new_or_modified(organizer: Organizer, info: IpdDownloadInfo) -> bool:
"""
Accepts new files or files with a more recent modification date.
"""
resolved_file = organizer.resolve(info.path)
if not resolved_file.exists():
return True
if not info.modification_date:
PRETTY.ignored_file(info.path, "could not find modification time, file exists")
return False
resolved_mod_time_seconds = resolved_file.stat().st_mtime
# Download if the info is newer
if info.modification_date.timestamp() > resolved_mod_time_seconds:
return True
PRETTY.ignored_file(info.path, "local file has newer or equal modification time")
return False
class IpdCrawler:
    # pylint: disable=too-few-public-methods
    """
    A crawler for IPD pages.
    """

    def __init__(self, base_url: str):
        self._base_url = base_url

    def _abs_url_from_link(self, link_tag: bs4.Tag) -> str:
        """
        Create an absolute url from an <a> tag.
        """
        return urljoin(self._base_url, link_tag.get("href"))

    def crawl(self) -> List[IpdDownloadInfo]:
        """
        Crawls the page given in the constructor.
        """
        page = soupify(requests.get(self._base_url))

        items: List[IpdDownloadInfo] = []

        for link in page.find_all(name="a", attrs={"href": lambda x: x and x.endswith("pdf")}):
            href: str = link.attrs.get("href")
            name = href.split("/")[-1]

            modification_date: Optional[datetime.datetime] = None
            try:
                enclosing_row: bs4.Tag = link.find_parent(name="tr")
                if enclosing_row:
                    date_text = enclosing_row.find(name="td").text
                    modification_date = datetime.datetime.strptime(date_text, "%d.%m.%Y")
            except ValueError:
                modification_date = None

            items.append(IpdDownloadInfo(
                Path(name),
                url=self._abs_url_from_link(link),
                modification_date=modification_date
            ))

        return items


class IpdDownloader:
    """
    A downloader for IPD files.
    """

    def __init__(self, tmp_dir: TmpDir, organizer: Organizer, strategy: IpdDownloadStrategy):
        self._tmp_dir = tmp_dir
        self._organizer = organizer
        self._strategy = strategy
        self._session = requests.session()

    def download_all(self, infos: List[IpdDownloadInfo]) -> None:
        """
        Download multiple files one after the other.
        """
        for info in infos:
            self.download(info)

    def download(self, info: IpdDownloadInfo) -> None:
        """
        Download a single file.
        """
        if not self._strategy(self._organizer, info):
            self._organizer.mark(info.path)
            return

        with self._session.get(info.url, stream=True) as response:
            if response.status_code == 200:
                tmp_file = self._tmp_dir.new_path()
                stream_to_path(response, tmp_file, info.path.name)
                dst_path = self._organizer.accept_file(tmp_file, info.path)

                # Preserve the modification date listed on the page
                if dst_path and info.modification_date:
                    os.utime(
                        dst_path,
                        times=(
                            math.ceil(info.modification_date.timestamp()),
                            math.ceil(info.modification_date.timestamp())
                        )
                    )
            elif response.status_code == 403:
                raise FatalException("Received 403. Are you not using the KIT VPN?")
            else:
                PRETTY.warning(f"Could not download file, got response {response.status_code}")
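The `os.utime` call above is the heart of the timestamp handling: the `%d.%m.%Y` date scraped from the page is parsed and stamped onto the downloaded file. A minimal standalone sketch of that step (the file name and date are made up):

```python
import datetime
import math
import os
import tempfile

# Parse a date in the "%d.%m.%Y" format used on the crawled pages
date_text = "13.12.2020"
modification_date = datetime.datetime.strptime(date_text, "%d.%m.%Y")

# Create a dummy file standing in for a downloaded PDF
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"dummy content")

# Stamp the parsed date onto the file as access and modification time
timestamp = math.ceil(modification_date.timestamp())
os.utime(tmp.name, times=(timestamp, timestamp))

print(os.path.getmtime(tmp.name) == timestamp)
```

Since `strptime` produces a naive datetime, `timestamp()` interprets it in the local timezone, which matches how the dates on the page are meant to be read.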


@@ -3,14 +3,18 @@ Contains a few logger utility functions and implementations.
"""
import logging
from pathlib import Path
from typing import List, Optional

from rich import print as rich_print
from rich._log_render import LogRender
from rich.console import Console
from rich.panel import Panel
from rich.style import Style
from rich.text import Text
from rich.theme import Theme

from .download_summary import DownloadSummary
from .utils import PathLike, to_path

STYLE = "{"
@@ -111,6 +115,15 @@ class PrettyLogger:
f"[bold green]Created {self._format_path(path)}.[/bold green]"
)
def deleted_file(self, path: PathLike) -> None:
"""
A file has been deleted.
"""
self.logger.info(
f"[bold red]Deleted {self._format_path(path)}.[/bold red]"
)
def ignored_file(self, path: PathLike, reason: str) -> None:
"""
File was not downloaded or modified.
@@ -138,6 +151,23 @@ class PrettyLogger:
f"([/dim]{reason}[dim]).[/dim]"
)
def summary(self, download_summary: DownloadSummary) -> None:
"""
Prints a download summary.
"""
self.logger.info("")
self.logger.info("[bold cyan]Download Summary[/bold cyan]")
if not download_summary.has_updates():
self.logger.info("[bold dim]Nothing changed![/bold dim]")
return
for new_file in download_summary.new_files:
self.new_file(new_file)
for modified_file in download_summary.modified_files:
self.modified_file(modified_file)
for deleted_file in download_summary.deleted_files:
self.deleted_file(deleted_file)
def starting_synchronizer(
self,
target_directory: PathLike,


@@ -5,10 +5,13 @@ An organizer is bound to a single directory.
import filecmp
import logging
import os
import shutil
from enum import Enum
from pathlib import Path, PurePath
from typing import Callable, List, Optional, Set

from .download_summary import DownloadSummary
from .location import Location
from .logging import PrettyLogger
from .utils import prompt_yes_no

@@ -17,6 +20,51 @@ LOGGER = logging.getLogger(__name__)
PRETTY = PrettyLogger(LOGGER)
class ConflictType(Enum):
"""
The type of the conflict. A file might not exist anymore and will be deleted
or it might be overwritten with a newer version.
FILE_OVERWRITTEN: An existing file will be updated
MARKED_FILE_OVERWRITTEN: A file is written for the second+ time in this run
FILE_DELETED: The file was deleted
"""
FILE_OVERWRITTEN = "overwritten"
MARKED_FILE_OVERWRITTEN = "marked_file_overwritten"
FILE_DELETED = "deleted"
class FileConflictResolution(Enum):
"""
The reaction when confronted with a file conflict:
DESTROY_EXISTING: Delete/overwrite the current file
KEEP_EXISTING: Keep the current file
DEFAULT: Do whatever the PFERD authors thought is sensible
PROMPT: Interactively ask the user
"""
DESTROY_EXISTING = "destroy"
KEEP_EXISTING = "keep"
DEFAULT = "default"
PROMPT = "prompt"
FileConflictResolver = Callable[[PurePath, ConflictType], FileConflictResolution]
def resolve_prompt_user(_path: PurePath, conflict: ConflictType) -> FileConflictResolution:
"""
Resolves conflicts by asking the user if a file was written twice or will be deleted.
"""
if conflict == ConflictType.FILE_OVERWRITTEN:
return FileConflictResolution.DESTROY_EXISTING
return FileConflictResolution.PROMPT
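The commit history mentions `--local-first` and `--remote-first` flags; under this callback protocol those presumably map onto resolvers that always return the same resolution. A sketch of such resolvers (the enums here are stand-ins mirroring the definitions above, so the snippet runs standalone; in PFERD itself they live in `PFERD.organizer`):

```python
from enum import Enum
from pathlib import PurePath


class ConflictType(Enum):
    # Stand-in mirroring the ConflictType enum defined above
    FILE_OVERWRITTEN = "overwritten"
    MARKED_FILE_OVERWRITTEN = "marked_file_overwritten"
    FILE_DELETED = "deleted"


class FileConflictResolution(Enum):
    # Stand-in mirroring the FileConflictResolution enum defined above
    DESTROY_EXISTING = "destroy"
    KEEP_EXISTING = "keep"
    DEFAULT = "default"
    PROMPT = "prompt"


def resolve_local_first(_path: PurePath, _conflict: ConflictType) -> FileConflictResolution:
    """Always keep the local file; never overwrite or delete it."""
    return FileConflictResolution.KEEP_EXISTING


def resolve_remote_first(_path: PurePath, _conflict: ConflictType) -> FileConflictResolution:
    """Always prefer the remote state, overwriting or deleting local files."""
    return FileConflictResolution.DESTROY_EXISTING
```

Because the resolver receives the path as well, a more selective policy (e.g. protecting only hand-annotated PDFs) fits the same signature.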
class FileAcceptException(Exception):
"""An exception while accepting a file."""

@@ -24,7 +72,7 @@ class FileAcceptException(Exception):
class Organizer(Location):
"""A helper for managing downloaded files."""

def __init__(self, path: Path, conflict_resolver: FileConflictResolver = resolve_prompt_user):
"""Create a new organizer for a given path."""
super().__init__(path)
self._known_files: Set[Path] = set()
@@ -32,10 +80,30 @@
# Keep the root dir
self._known_files.add(path.resolve())

self.download_summary = DownloadSummary()
self.conflict_resolver = conflict_resolver
def accept_file(self, src: Path, dst: PurePath) -> Optional[Path]:
"""
Move a file to this organizer and mark it.
Returns the path the file was moved to, to allow the caller to adjust the metadata.
As you might still need to adjust the metadata when the file was identical
(e.g. update the timestamp), the path is also returned in this case.
In all other cases (ignored, not overwritten, etc.) this method returns None.
"""
# Windows limits the path length to 260 for *some* historical reason
# If you want longer paths, you will have to add the "\\?\" prefix in front of
# your path...
# See:
# https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file#maximum-path-length-limitation
if os.name == 'nt':
src_absolute = Path("\\\\?\\" + str(src.resolve()))
dst_absolute = Path("\\\\?\\" + str(self.resolve(dst)))
else:
src_absolute = src.resolve()
dst_absolute = self.resolve(dst)
if not src_absolute.exists():
raise FileAcceptException("Source file does not exist")

@@ -47,17 +115,20 @@
if self._is_marked(dst):
PRETTY.warning(f"File {str(dst_absolute)!r} was already written!")
conflict = ConflictType.MARKED_FILE_OVERWRITTEN
if not self._resolve_conflict("Overwrite file?", dst_absolute, conflict, default=False):
PRETTY.ignored_file(dst_absolute, "file was written previously")
return None

# Destination file is directory
if dst_absolute.exists() and dst_absolute.is_dir():
prompt = f"Overwrite folder {dst_absolute} with file?"
conflict = ConflictType.FILE_OVERWRITTEN
if self._resolve_conflict(prompt, dst_absolute, conflict, default=False):
shutil.rmtree(dst_absolute)
else:
PRETTY.warning(f"Could not add file {str(dst_absolute)!r}")
return None

# Destination file exists
if dst_absolute.exists() and dst_absolute.is_file():
@@ -65,10 +136,18 @@
# Bail out, nothing more to do
PRETTY.ignored_file(dst_absolute, "same file contents")
self.mark(dst)
return dst_absolute
prompt = f"Overwrite file {dst_absolute}?"
conflict = ConflictType.FILE_OVERWRITTEN
if not self._resolve_conflict(prompt, dst_absolute, conflict, default=True):
PRETTY.ignored_file(dst_absolute, "user conflict resolution")
return None
self.download_summary.add_modified_file(dst_absolute)
PRETTY.modified_file(dst_absolute)
else:
self.download_summary.add_new_file(dst_absolute)
PRETTY.new_file(dst_absolute)

# Create parent dir if needed
@@ -80,6 +159,8 @@
self.mark(dst)

return dst_absolute

def mark(self, path: PurePath) -> None:
"""Mark a file as used so it will not get cleaned up."""
absolute_path = self.resolve(path)
@@ -100,6 +181,8 @@
self._cleanup(self.path)

def _cleanup(self, start_dir: Path) -> None:
if not start_dir.exists():
return
paths: List[Path] = list(start_dir.iterdir())

# Recursively clean paths
@@ -115,9 +198,27 @@
if start_dir.resolve() not in self._known_files and dir_empty:
start_dir.rmdir()

def _delete_file_if_confirmed(self, path: Path) -> None:
prompt = f"Do you want to delete {path}"
if self._resolve_conflict(prompt, path, ConflictType.FILE_DELETED, default=False):
self.download_summary.add_deleted_file(path)
path.unlink()
else:
PRETTY.ignored_file(path, "user conflict resolution")
def _resolve_conflict(
self, prompt: str, path: Path, conflict: ConflictType, default: bool
) -> bool:
if not self.conflict_resolver:
return prompt_yes_no(prompt, default=default)
result = self.conflict_resolver(path, conflict)
if result == FileConflictResolution.DEFAULT:
return default
if result == FileConflictResolution.KEEP_EXISTING:
return False
if result == FileConflictResolution.DESTROY_EXISTING:
return True
return prompt_yes_no(prompt, default=default)


@@ -6,16 +6,20 @@ import logging
from pathlib import Path
from typing import Callable, List, Optional, Union

from .authenticators import UserPassAuthenticator
from .cookie_jar import CookieJar
from .diva import (DivaDownloader, DivaDownloadStrategy, DivaPlaylistCrawler,
diva_download_new)
from .download_summary import DownloadSummary
from .errors import FatalException, swallow_and_print_errors
from .ilias import (IliasAuthenticator, IliasCrawler, IliasDirectoryFilter,
IliasDownloader, IliasDownloadInfo, IliasDownloadStrategy,
KitShibbolethAuthenticator, download_modified_or_new)
from .ipd import (IpdCrawler, IpdDownloader, IpdDownloadInfo,
IpdDownloadStrategy, ipd_download_new_or_modified)
from .location import Location
from .logging import PrettyLogger, enable_logging
from .organizer import FileConflictResolver, Organizer, resolve_prompt_user
from .tmp_dir import TmpDir
from .transform import TF, Transform, apply_transform
from .utils import PathLike, to_path

@@ -42,6 +46,7 @@ class Pferd(Location):
):
super().__init__(Path(base_dir))
self._download_summary = DownloadSummary()
self._tmp_dir = TmpDir(self.resolve(tmp_dir))
self._test_run = test_run

@@ -60,6 +65,13 @@
for transformable in transformables:
LOGGER.info(transformable.path)
@staticmethod
def _get_authenticator(
username: Optional[str], password: Optional[str]
) -> KitShibbolethAuthenticator:
inner_auth = UserPassAuthenticator("ILIAS - Pferd.py", username, password)
return KitShibbolethAuthenticator(inner_auth)
def _ilias(
self,
target: PathLike,
@@ -70,16 +82,19 @@
dir_filter: IliasDirectoryFilter,
transform: Transform,
download_strategy: IliasDownloadStrategy,
timeout: int,
clean: bool = True,
file_conflict_resolver: FileConflictResolver = resolve_prompt_user
) -> Organizer:
# pylint: disable=too-many-locals
cookie_jar = CookieJar(to_path(cookies) if cookies else None)
session = cookie_jar.create_session()
tmp_dir = self._tmp_dir.new_subdir()
organizer = Organizer(self.resolve(to_path(target)), file_conflict_resolver)

crawler = IliasCrawler(base_url, session, authenticator, dir_filter)
downloader = IliasDownloader(tmp_dir, organizer, session,
authenticator, download_strategy, timeout)

cookie_jar.load_cookies()
info = crawl_function(crawler)
@@ -110,6 +125,8 @@
password: Optional[str] = None,
download_strategy: IliasDownloadStrategy = download_modified_or_new,
clean: bool = True,
timeout: int = 5,
file_conflict_resolver: FileConflictResolver = resolve_prompt_user
) -> Organizer:
"""
Synchronizes a folder with the ILIAS instance of the KIT.
@@ -135,11 +152,16 @@
be downloaded. Can save bandwidth and reduce the number of requests.
(default: {download_modified_or_new})
clean {bool} -- Whether to clean up when the method finishes.
timeout {int} -- The download timeout for opencast videos. Sadly needed due to a
requests bug.
file_conflict_resolver {FileConflictResolver} -- A function specifying how to deal
with overwriting or deleting files. The default always asks the user.
""" """
# This authenticator only works with the KIT ilias instance. # This authenticator only works with the KIT ilias instance.
authenticator = KitShibbolethAuthenticator(username=username, password=password) authenticator = Pferd._get_authenticator(username=username, password=password)
PRETTY.starting_synchronizer(target, "ILIAS", course_id) PRETTY.starting_synchronizer(target, "ILIAS", course_id)
return self._ilias(
organizer = self._ilias(
target=target, target=target,
base_url="https://ilias.studium.kit.edu/", base_url="https://ilias.studium.kit.edu/",
crawl_function=lambda crawler: crawler.crawl_course(course_id), crawl_function=lambda crawler: crawler.crawl_course(course_id),
@ -149,8 +171,20 @@ class Pferd(Location):
transform=transform, transform=transform,
download_strategy=download_strategy, download_strategy=download_strategy,
clean=clean, clean=clean,
timeout=timeout,
file_conflict_resolver=file_conflict_resolver
)
self._download_summary.merge(organizer.download_summary)
return organizer
def print_summary(self) -> None:
"""
Prints the accumulated download summary.
"""
PRETTY.summary(self._download_summary)
@swallow_and_print_errors
def ilias_kit_personal_desktop(
self,
@@ -162,6 +196,8 @@
password: Optional[str] = None,
download_strategy: IliasDownloadStrategy = download_modified_or_new,
clean: bool = True,
timeout: int = 5,
file_conflict_resolver: FileConflictResolver = resolve_prompt_user
) -> Organizer:
"""
Synchronizes a folder with the ILIAS instance of the KIT. This method will crawl the ILIAS
@@ -186,11 +222,16 @@
be downloaded. Can save bandwidth and reduce the number of requests.
(default: {download_modified_or_new})
clean {bool} -- Whether to clean up when the method finishes.
timeout {int} -- The download timeout for opencast videos. Sadly needed due to a
requests bug.
file_conflict_resolver {FileConflictResolver} -- A function specifying how to deal
with overwriting or deleting files. The default always asks the user.
""" """
# This authenticator only works with the KIT ilias instance. # This authenticator only works with the KIT ilias instance.
authenticator = KitShibbolethAuthenticator(username=username, password=password) authenticator = Pferd._get_authenticator(username, password)
PRETTY.starting_synchronizer(target, "ILIAS", "Personal Desktop") PRETTY.starting_synchronizer(target, "ILIAS", "Personal Desktop")
return self._ilias(
organizer = self._ilias(
target=target, target=target,
base_url="https://ilias.studium.kit.edu/", base_url="https://ilias.studium.kit.edu/",
crawl_function=lambda crawler: crawler.crawl_personal_desktop(), crawl_function=lambda crawler: crawler.crawl_personal_desktop(),
@ -200,8 +241,139 @@ class Pferd(Location):
transform=transform, transform=transform,
download_strategy=download_strategy, download_strategy=download_strategy,
clean=clean, clean=clean,
timeout=timeout,
file_conflict_resolver=file_conflict_resolver
)
self._download_summary.merge(organizer.download_summary)
return organizer
@swallow_and_print_errors
def ilias_kit_folder(
self,
target: PathLike,
full_url: str,
dir_filter: IliasDirectoryFilter = lambda x, y: True,
transform: Transform = lambda x: x,
cookies: Optional[PathLike] = None,
username: Optional[str] = None,
password: Optional[str] = None,
download_strategy: IliasDownloadStrategy = download_modified_or_new,
clean: bool = True,
timeout: int = 5,
file_conflict_resolver: FileConflictResolver = resolve_prompt_user
) -> Organizer:
"""
Synchronizes a folder with a given folder on the ILIAS instance of the KIT.
Arguments:
target {Path} -- the target path to write the data to
full_url {str} -- the full url of the folder/videos/course to crawl
Keyword Arguments:
dir_filter {IliasDirectoryFilter} -- A filter for directories. It is applied at the
crawler level; filtered directories and all of their content are skipped.
(default: {lambda x, y: True})
transform {Transform} -- A transformation function for the output paths. Return None
to ignore a file. (default: {lambda x: x})
cookies {Optional[Path]} -- The path to store and load cookies from.
(default: {None})
username {Optional[str]} -- The SCC username. If none is given, it will prompt
the user. (default: {None})
password {Optional[str]} -- The SCC password. If none is given, it will prompt
the user. (default: {None})
download_strategy {DownloadStrategy} -- A function to determine which files need to
be downloaded. Can save bandwidth and reduce the number of requests.
(default: {download_modified_or_new})
clean {bool} -- Whether to clean up when the method finishes.
timeout {int} -- The download timeout for opencast videos. Sadly needed due to a
requests bug.
file_conflict_resolver {FileConflictResolver} -- A function specifying how to deal
with overwriting or deleting files. The default always asks the user.
"""
# This authenticator only works with the KIT ilias instance.
authenticator = Pferd._get_authenticator(username=username, password=password)
PRETTY.starting_synchronizer(target, "ILIAS", "An ILIAS element by url")
if not full_url.startswith("https://ilias.studium.kit.edu"):
raise FatalException("Not a valid KIT ILIAS URL")
organizer = self._ilias(
target=target,
base_url="https://ilias.studium.kit.edu/",
crawl_function=lambda crawler: crawler.recursive_crawl_url(full_url),
authenticator=authenticator,
cookies=cookies,
dir_filter=dir_filter,
transform=transform,
download_strategy=download_strategy,
clean=clean,
timeout=timeout,
file_conflict_resolver=file_conflict_resolver
)
self._download_summary.merge(organizer.download_summary)
return organizer
@swallow_and_print_errors
def ipd_kit(
self,
target: Union[PathLike, Organizer],
url: str,
transform: Transform = lambda x: x,
download_strategy: IpdDownloadStrategy = ipd_download_new_or_modified,
clean: bool = True,
file_conflict_resolver: FileConflictResolver = resolve_prompt_user
) -> Organizer:
"""
Synchronizes a folder with a KIT IPD page.
Arguments:
target {Union[PathLike, Organizer]} -- The organizer / target folder to use.
url {str} -- the url to the page
Keyword Arguments:
transform {Transform} -- A transformation function for the output paths. Return None
to ignore a file. (default: {lambda x: x})
download_strategy {IpdDownloadStrategy} -- A function to determine which files need to
be downloaded. Can save bandwidth and reduce the number of requests.
(default: {ipd_download_new_or_modified})
clean {bool} -- Whether to clean up when the method finishes.
file_conflict_resolver {FileConflictResolver} -- A function specifying how to deal
with overwriting or deleting files. The default always asks the user.
"""
tmp_dir = self._tmp_dir.new_subdir()
if target is None:
PRETTY.starting_synchronizer("None", "IPD", url)
raise FatalException("Got 'None' as target directory, aborting")
if isinstance(target, Organizer):
organizer = target
else:
organizer = Organizer(self.resolve(to_path(target)), file_conflict_resolver)
PRETTY.starting_synchronizer(organizer.path, "IPD", url)
elements: List[IpdDownloadInfo] = IpdCrawler(url).crawl()
transformed = apply_transform(transform, elements)
if self._test_run:
self._print_transformables(transformed)
return organizer
downloader = IpdDownloader(tmp_dir=tmp_dir, organizer=organizer, strategy=download_strategy)
downloader.download_all(transformed)
if clean:
organizer.cleanup()
self._download_summary.merge(organizer.download_summary)
return organizer
@swallow_and_print_errors
def diva_kit(
self,
@@ -209,7 +381,8 @@
playlist_location: str,
transform: Transform = lambda x: x,
download_strategy: DivaDownloadStrategy = diva_download_new,
clean: bool = True,
file_conflict_resolver: FileConflictResolver = resolve_prompt_user
) -> Organizer:
"""
Synchronizes a folder with a DIVA playlist.
@@ -226,6 +399,8 @@
be downloaded. Can save bandwidth and reduce the number of requests.
(default: {diva_download_new})
clean {bool} -- Whether to clean up when the method finishes.
file_conflict_resolver {FileConflictResolver} -- A function specifying how to deal
with overwriting or deleting files. The default always asks the user.
""" """
tmp_dir = self._tmp_dir.new_subdir() tmp_dir = self._tmp_dir.new_subdir()
@ -241,7 +416,7 @@ class Pferd(Location):
if isinstance(target, Organizer): if isinstance(target, Organizer):
organizer = target organizer = target
else: else:
organizer = Organizer(self.resolve(to_path(target))) organizer = Organizer(self.resolve(to_path(target)), file_conflict_resolver)
PRETTY.starting_synchronizer(organizer.path, "DIVA", playlist_id) PRETTY.starting_synchronizer(organizer.path, "DIVA", playlist_id)
@ -260,4 +435,6 @@ class Pferd(Location):
if clean: if clean:
organizer.cleanup() organizer.cleanup()
self._download_summary.merge(organizer.download_summary)
return organizer


@@ -7,8 +7,7 @@ from types import TracebackType
from typing import Optional, Type

import requests
from rich.console import Console
from rich.progress import (BarColumn, DownloadColumn, Progress, TaskID,
TextColumn, TimeRemainingColumn,
TransferSpeedColumn)
@@ -23,7 +22,8 @@ _progress: Progress = Progress(
TransferSpeedColumn(),
"",
TimeRemainingColumn(),
console=Console(file=sys.stdout),
transient=True
)

@@ -61,18 +61,6 @@ def progress_for(settings: Optional[ProgressSettings]) -> 'ProgressContextManage
return ProgressContextManager(settings)

class ProgressContextManager:
"""
A context manager used for displaying progress.
@@ -113,9 +101,6 @@ class ProgressContextManager:
_progress.stop()
_progress.refresh()

return None

def advance(self, amount: float) -> None:


@@ -5,6 +5,8 @@ only files whose names match a regex, or renaming files from one numbering
scheme to another.
"""
import os
import re
from dataclasses import dataclass
from pathlib import PurePath
from typing import Callable, List, Optional, TypeVar
@@ -45,7 +47,8 @@
# Transform combinators

def keep(path: PurePath) -> Optional[PurePath]:
return path

def attempt(*args: Transform) -> Transform:
def inner(path: PurePath) -> Optional[PurePath]:
@@ -125,3 +128,15 @@ def re_rename(regex: Regex, target: str) -> Transform:
return path.with_name(target.format(*groups))
return None
return inner
def sanitize_windows_path(path: PurePath) -> Optional[PurePath]:
"""
A small function to escape characters that are forbidden in windows path names.
This method is a no-op on other operating systems.
"""
# Escape windows illegal path characters
if os.name == 'nt':
sanitized_parts = [re.sub(r'[<>:"/|?]', "_", x) for x in list(path.parts)]
return PurePath(*sanitized_parts)
return path
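The substitution in `sanitize_windows_path` can be exercised directly. This sketch copies the function as defined above so it runs standalone (on non-Windows systems the function itself is a no-op, so the replacement is also shown on its own):

```python
import os
import re
from pathlib import PurePath
from typing import Optional


def sanitize_windows_path(path: PurePath) -> Optional[PurePath]:
    # Copy of the transform above, for standalone experimentation
    if os.name == 'nt':
        sanitized_parts = [re.sub(r'[<>:"/|?]', "_", x) for x in list(path.parts)]
        return PurePath(*sanitized_parts)
    return path


# The replacement itself, independent of the operating system:
# every forbidden character is mapped to an underscore.
print(re.sub(r'[<>:"/|?]', "_", 'Lecture 01: "Intro"'))
```

Each path component is sanitized separately, so directory separators produced by `PurePath.parts` are never touched.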

README.md

@@ -2,35 +2,66 @@
**P**rogramm zum **F**lotten, **E**infachen **R**unterladen von **D**ateien
- [Quickstart with `sync_url`](#quickstart-with-sync_url)
- [Installation](#installation)
- [Upgrading from 2.0.0 to 2.1.0+](#upgrading-from-200-to-210)
- [Example setup](#example-setup)
- [Usage](#usage)
- [General concepts](#general-concepts)
- [Constructing transforms](#constructing-transforms)
- [Transform creators](#transform-creators)
- [Transform combinators](#transform-combinators)
- [A short, but commented example](#a-short-but-commented-example)
## Quickstart with `sync_url`
The `sync_url` program allows you to just synchronize a given ILIAS URL (of a
course, a folder, your personal desktop, etc.) without any extra configuration
or setting up. Download the program, open ILIAS, copy the URL from the address
bar and pass it to sync_url.
It bundles everything it needs in one executable and is easy to
use, but doesn't expose all the configuration options and tweaks a full install
does.
1. Download the `sync_url` binary from the [latest release](https://github.com/Garmelon/PFERD/releases/latest).
2. Note that you most likely need to enclose the URL in double quotes (`"..."`) to prevent your shell from interpreting `&` and other special characters
3. Run the binary in your terminal (`./sync_url` or `sync_url.exe` in the CMD) to see the help and use it. I'd recommend using the `--cookies` option.
If you are on **Linux/Mac**, you need to *make the file executable* using `chmod +x <file>`.
If you are on **Mac**, you need to allow this unverified program to run (see e.g. [here](https://www.switchingtomac.com/tutorials/osx/how-to-run-unverified-apps-on-macos/))
## Installation

Ensure that you have at least Python 3.8 installed.

To install PFERD or update your installation to the latest version, run this
wherever you want to install or have already installed PFERD:
```
$ pip install git+https://github.com/Garmelon/PFERD@v2.5.0
```

The use of [venv] is recommended.
[venv]: https://docs.python.org/3/library/venv.html
### Upgrading from 2.0.0 to 2.1.0+
- The `IliasDirectoryType` type was renamed to `IliasElementType` and is now far more detailed.
The new values are: `REGULAR_FOLDER`, `VIDEO_FOLDER`, `EXERCISE_FOLDER`, `REGULAR_FILE`, `VIDEO_FILE`, `FORUM`, `EXTERNAL_LINK`.
- Forums and external links are skipped automatically if you use the `kit_ilias` helper.
## Example setup

In this example, `python3` refers to at least Python 3.8.
If you just want to get started and crawl *your entire ILIAS Desktop* instead
of a given set of courses, please replace `example_config.py` with
`example_config_personal_desktop.py` in all of the instructions below (`curl` call and
`python3` run command).
A full example setup and initial use could look like:
```
$ mkdir Vorlesungen
$ cd Vorlesungen
$ python3 -m venv .venv
$ .venv/bin/activate
$ pip install git+https://github.com/Garmelon/PFERD@v2.5.0
$ curl -O https://raw.githubusercontent.com/Garmelon/PFERD/v2.5.0/example_config.py
$ python3 example_config.py
$ deactivate
```
@ -43,50 +74,93 @@ $ python3 example_config.py
$ deactivate $ deactivate
``` ```
If you just want to get started and crawl *your entire ILIAS Desktop* instead
of a given set of courses, please replace `example_config.py` with
`example_config_personal_desktop.py` in all of the instructions below (`curl` call and
`python3` run command).
## Usage
### General concepts
A PFERD config is a normal python file that starts multiple *synchronizers*
which do all the heavy lifting. While you can create and wire them up manually,
you are encouraged to use the helper methods provided in `PFERD.Pferd`.

The synchronizers take some input arguments specific to their service and a
*transform*. The transform receives the computed path of an element in ILIAS and
can return either an output path (so you can rename files or move them around as
you wish) or `None` if you do not want to save the given file.

Additionally the ILIAS synchronizer allows you to define a *crawl filter*. This
filter also receives the computed path as the input, but is only called for
*directories*. If you return `True`, the directory will be crawled and
searched. If you return `False` the directory will be ignored and nothing in it
will be passed to the transform.
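For illustration, a transform and a crawl filter are just plain functions with the shapes described above. This is a minimal sketch with hypothetical example paths, not code from PFERD itself:

```python
from pathlib import PurePath
from typing import Optional


# A transform maps PurePath -> Optional[PurePath]; returning None skips the file.
def example_transform(path: PurePath) -> Optional[PurePath]:
    # Move everything under "Übung/" into "Blätter/", skip all other files.
    if path.parts[:1] == ("Übung",):
        return PurePath("Blätter", *path.parts[1:])
    return None


# A crawl filter maps a directory's PurePath -> bool; False prunes that subtree.
def example_crawl_filter(path: PurePath) -> bool:
    # Only descend into the "Übung" directory (and the root itself).
    return len(path.parts) == 0 or path.parts[0] == "Übung"
```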
### Constructing transforms

While transforms are just normal python functions, writing them by hand can
quickly become tedious. In order to help you with writing your own transforms
and filters, PFERD defines a few useful transform creators and combinators in
the `PFERD.transform` module:

#### Transform creators

These methods let you create a few basic transform building blocks:

- **`glob(glob)`**
  Creates a transform that returns the unchanged path if the glob matches the path and `None` otherwise.
  See also [Path.match].
  Example: `glob("Übung/*.pdf")`
- **`predicate(pred)`**
  Creates a transform that returns the unchanged path if `pred(path)` returns a truthy value.
  Returns `None` otherwise.
  Example: `predicate(lambda path: len(path.parts) == 3)`
- **`move_dir(source, target)`**
  Creates a transform that moves all files from the `source` to the `target` directory.
  Example: `move_dir("Übung/", "Blätter/")`
- **`move(source, target)`**
  Creates a transform that moves the `source` file to `target`.
  Example: `move("Vorlesung/VL02_Automten.pdf", "Vorlesung/VL02_Automaten.pdf")`
- **`rename(source, target)`**
  Creates a transform that renames all files named `source` to `target`.
  This transform works on the file names, not paths, and thus works no matter where the file is located.
  Example: `rename("VL02_Automten.pdf", "VL02_Automaten.pdf")`
- **`re_move(regex, target)`**
  Creates a transform that moves all files matching `regex` to `target`.
  The transform calls `str.format` on the `target` string with the contents of the capturing groups before returning it.
  The capturing groups can be accessed via their index.
  See also [Match.group].
  Example: `re_move(r"Übung/Blatt (\d+)\.pdf", "Blätter/Blatt_{1:0>2}.pdf")`
- **`re_rename(regex, target)`**
  Creates a transform that renames all files matching `regex` to `target`.
  This transform works on the file names, not paths, and thus works no matter where the file is located.
  Example: `re_rename(r"VL(\d+)(.*)\.pdf", "Vorlesung_Nr_{1}__{2}.pdf")`
All movement or rename transforms above return `None` if a file doesn't match
their movement or renaming criteria. This enables them to be used as building
blocks to build up more complex transforms.
In addition, `PFERD.transform` also defines the `keep` transform which returns its input path unchanged.
This behaviour can be very useful when creating more complex transforms.
See below for example usage.
[Path.match]: https://docs.python.org/3/library/pathlib.html#pathlib.Path.match
[Match.group]: https://docs.python.org/3/library/re.html#re.Match.group
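To make the semantics concrete, `glob` and `predicate` could be sketched roughly like this. This is a simplified illustration of the documented behaviour, not the actual `PFERD.transform` source:

```python
from pathlib import PurePath
from typing import Callable, Optional

# A transform is a function from PurePath to Optional[PurePath].
Transform = Callable[[PurePath], Optional[PurePath]]


def glob(pattern: str) -> Transform:
    # PurePath.match matches the pattern against the path from the right,
    # so "*.pdf" matches any path whose last component ends in ".pdf".
    return lambda path: path if path.match(pattern) else None


def predicate(pred: Callable[[PurePath], bool]) -> Transform:
    # Keep the path unchanged if the predicate holds, drop it otherwise.
    return lambda path: path if pred(path) else None
```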
#### Transform combinators
These methods let you combine transforms into more complex transforms:
- **`optionally(transform)`**
  Wraps a given transform and returns its result if it is not `None`.
  Otherwise returns the input path unchanged.
  See below for example usage.
- **`do(transforms)`**
  Accepts a series of transforms and applies them in the given order to the result of the previous one.
  If any transform returns `None`, `do` short-circuits and also returns `None`.
  This can be used to perform multiple renames in a row:
  ```py
  do(
      # Move them
      # …
      optionally(re_rename("(.*).m4v.mp4", "{1}.mp4")),
      # Remove the 'dbs' prefix (if they have any)
      optionally(re_rename("(?i)dbs-(.+)", "{1}")),
  )
  ```
- **`attempt(transforms)`**
  Applies the passed transforms in the given order until it finds one that does not return `None`.
  If it does not find any, it returns `None`.
  This can be used to give a list of possible transformations and automatically pick the first one that fits:
  ```py
  attempt(
      # Move all videos. If a video is passed in, this `re_move` will succeed
      # …
  )
  ```
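The combinators themselves are small. Here is a rough sketch of how `optionally`, `do` and `attempt` could be implemented — an illustration of the documented behaviour, not the actual `PFERD.transform` source:

```python
from pathlib import PurePath
from typing import Callable, Optional

Transform = Callable[[PurePath], Optional[PurePath]]


def optionally(transform: Transform) -> Transform:
    # Return the transform's result, or the untouched path if it returned None.
    def inner(path: PurePath) -> Optional[PurePath]:
        result = transform(path)
        return result if result is not None else path
    return inner


def do(*transforms: Transform) -> Transform:
    # Feed each transform the previous result; short-circuit on None.
    def inner(path: PurePath) -> Optional[PurePath]:
        current: Optional[PurePath] = path
        for transform in transforms:
            if current is None:
                return None
            current = transform(current)
        return current
    return inner


def attempt(*transforms: Transform) -> Transform:
    # Return the first non-None result, or None if every transform fails.
    def inner(path: PurePath) -> Optional[PurePath]:
        for transform in transforms:
            result = transform(path)
            if result is not None:
                return result
        return None
    return inner
```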
All of these combinators are used in the provided example configs; take a look
at those if you want to see some more real-life usage.
### A short, but commented example

```py
from pathlib import Path, PurePath

from PFERD import Pferd
from PFERD.ilias import IliasElementType
from PFERD.transform import *


# This filter will later be used by the ILIAS crawler to decide whether it
# should crawl a directory (or directory-like structure).
def filter_course(path: PurePath, type: IliasElementType) -> bool:
    # Note that glob returns a Transform, which is a function from PurePath ->
    # Optional[PurePath]. Because of this, we need to apply the result of
    # 'glob' to our input path. The returned value will be truthy (a Path) if
    # the transform succeeded, or `None` if it failed.

    # We need to crawl the 'Tutorien' folder as it contains one that we want.
    if glob("Tutorien/")(path):
        return True
    # If we found 'Tutorium 10', keep it!
    # …
    # All other dirs (including subdirs of 'Tutorium 10') should be searched :)
    return True

# This transform will later be used to rename a few files. It can also be used
# to ignore some files.
transform_course = attempt(
    # We don't care about the other tuts and would instead prefer a cleaner
    # directory structure.
    move_dir("Tutorien/Tutorium 10/", "Tutorium/"),
    # We don't want to modify any other files, so we're going to keep them
    # exactly as they are.
    keep
)

# Enable and configure the text output. Needs to be called before calling any
# other PFERD methods.
Pferd.enable_logging()

# Create a Pferd instance rooted in the same directory as the script file. This
# is not a test run, so files will be downloaded (default, can be omitted).
pferd = Pferd(Path(__file__).parent, test_run=False)

# Use the ilias_kit helper to synchronize an ILIAS course
pferd.ilias_kit(
    # The directory that all of the downloaded files should be placed in
    "My_cool_course/",
    # The course ID (found in the URL when on the course page in ILIAS)
    "course id",
    # A path to a cookie jar. If you synchronize multiple ILIAS courses,
    # setting this to a common value requires you to only log in once.
    cookies=Path("ilias_cookies.txt"),
    # A transform can rename, move or filter out certain files
    transform=transform_course,
    # A crawl filter limits what paths the crawler searches
    dir_filter=filter_course,
)
```
@ -124,6 +124,8 @@ def main() -> None:
        cookies="ilias_cookies.txt",
    )

    # Prints a summary listing all new, modified or deleted files
    pferd.print_summary()


if __name__ == "__main__":
    main()
@ -30,6 +30,9 @@ def main() -> None:
        cookies="ilias_cookies.txt",
    )

    # Prints a summary listing all new, modified or deleted files
    pferd.print_summary()


if __name__ == "__main__":
    main()
@ -3,5 +3,5 @@ disallow_untyped_defs = True
disallow_incomplete_defs = True
no_implicit_optional = True

[mypy-rich.*,bs4,keyring]
ignore_missing_imports = True
requirements.txt (new file):
requests>=2.21.0
beautifulsoup4>=4.7.1
rich>=2.1.0
keyring>=21.5.0
@ -2,12 +2,13 @@ from setuptools import find_packages, setup
setup(
    name="PFERD",
    version="2.5.0",
    packages=find_packages(),
    install_requires=[
        "requests>=2.21.0",
        "beautifulsoup4>=4.7.1",
        "rich>=2.1.0",
        "keyring>=21.5.0"
    ],
)
sync_url.py (new executable file):
#!/usr/bin/env python

"""
A simple script to download a course by name from ILIAS.
"""

import argparse
import logging
import sys
from pathlib import Path, PurePath
from typing import Optional
from urllib.parse import urlparse

from PFERD import Pferd
from PFERD.authenticators import KeyringAuthenticator, UserPassAuthenticator
from PFERD.cookie_jar import CookieJar
from PFERD.ilias import (IliasCrawler, IliasElementType,
                         KitShibbolethAuthenticator)
from PFERD.logging import PrettyLogger, enable_logging
from PFERD.organizer import (ConflictType, FileConflictResolution,
                             FileConflictResolver, resolve_prompt_user)
from PFERD.transform import sanitize_windows_path
from PFERD.utils import to_path

_LOGGER = logging.getLogger("sync_url")
_PRETTY = PrettyLogger(_LOGGER)


def _extract_credentials(file_path: Optional[str]) -> UserPassAuthenticator:
    if not file_path:
        return UserPassAuthenticator("KIT ILIAS Shibboleth", None, None)

    if not Path(file_path).exists():
        _PRETTY.error("Credential file does not exist")
        sys.exit(1)

    with open(file_path, "r") as file:
        first_line = file.read().splitlines()[0]
        read_name, *read_password = first_line.split(":", 1)

    name = read_name if read_name else None
    password = read_password[0] if read_password else None

    return UserPassAuthenticator("KIT ILIAS Shibboleth", username=name, password=password)


def _resolve_remote_first(_path: PurePath, _conflict: ConflictType) -> FileConflictResolution:
    return FileConflictResolution.DESTROY_EXISTING


def _resolve_local_first(_path: PurePath, _conflict: ConflictType) -> FileConflictResolution:
    return FileConflictResolution.KEEP_EXISTING


def _resolve_no_delete(_path: PurePath, conflict: ConflictType) -> FileConflictResolution:
    # Update files
    if conflict == ConflictType.FILE_OVERWRITTEN:
        return FileConflictResolution.DESTROY_EXISTING
    if conflict == ConflictType.MARKED_FILE_OVERWRITTEN:
        return FileConflictResolution.DESTROY_EXISTING
    # But do not delete them
    return FileConflictResolution.KEEP_EXISTING


def main() -> None:
    enable_logging(name="sync_url")

    parser = argparse.ArgumentParser()
    parser.add_argument("--test-run", action="store_true")
    parser.add_argument('-c', '--cookies', nargs='?', default=None, help="File to store cookies in")
    parser.add_argument('-u', '--username', nargs='?', default=None, help="Username for Ilias")
    parser.add_argument('-p', '--password', nargs='?', default=None, help="Password for Ilias")
    parser.add_argument('--credential-file', nargs='?', default=None,
                        help="Path to a file containing credentials for Ilias. The file must have "
                             "one line in the following format: '<user>:<password>'")
    parser.add_argument("-k", "--keyring", action="store_true",
                        help="Use the system keyring service for authentication")
    parser.add_argument('--no-videos', nargs='?', default=None, help="Don't download videos")
    parser.add_argument('--local-first', action="store_true",
                        help="Don't prompt for confirmation, keep existing files")
    parser.add_argument('--remote-first', action="store_true",
                        help="Don't prompt for confirmation, delete and overwrite local files")
    parser.add_argument('--no-delete', action="store_true",
                        help="Don't prompt for confirmation, overwrite local files, don't delete")
    parser.add_argument('url', help="URL to the course page")
    parser.add_argument('folder', nargs='?', default=None, help="Folder to put stuff into")
    args = parser.parse_args()

    cookie_jar = CookieJar(to_path(args.cookies) if args.cookies else None)
    session = cookie_jar.create_session()

    if args.keyring:
        if not args.username:
            _PRETTY.error("Keyring auth selected but no --username passed!")
            return
        inner_auth: UserPassAuthenticator = KeyringAuthenticator(
            "KIT ILIAS Shibboleth", username=args.username, password=args.password
        )
    else:
        inner_auth = _extract_credentials(args.credential_file)

    username, password = inner_auth.get_credentials()
    authenticator = KitShibbolethAuthenticator(inner_auth)

    url = urlparse(args.url)
    crawler = IliasCrawler(url.scheme + '://' + url.netloc, session,
                           authenticator, lambda x, y: True)

    cookie_jar.load_cookies()

    if args.folder is None:
        element_name = crawler.find_element_name(args.url)
        if not element_name:
            print("Error, could not get element name. Please specify a folder yourself.")
            return
        folder = Path(element_name)
        cookie_jar.save_cookies()
    else:
        folder = Path(args.folder)

    # Files may not escape the pferd_root via relative paths.
    # Note: Path(Path.cwd(), Path(folder)) == Path(folder) if folder is an absolute path.
    pferd_root = Path(Path.cwd(), Path(folder)).parent
    target = folder.name
    pferd = Pferd(pferd_root, test_run=args.test_run)

    def dir_filter(_: Path, element: IliasElementType) -> bool:
        if args.no_videos:
            return element not in [IliasElementType.VIDEO_FILE, IliasElementType.VIDEO_FOLDER]
        return True

    if args.local_first:
        file_conflict_resolver: FileConflictResolver = _resolve_local_first
    elif args.no_delete:
        file_conflict_resolver = _resolve_no_delete
    elif args.remote_first:
        file_conflict_resolver = _resolve_remote_first
    else:
        file_conflict_resolver = resolve_prompt_user

    pferd.enable_logging()

    # Fetch the target folder
    pferd.ilias_kit_folder(
        target=target,
        full_url=args.url,
        cookies=args.cookies,
        dir_filter=dir_filter,
        username=username,
        password=password,
        file_conflict_resolver=file_conflict_resolver,
        transform=sanitize_windows_path
    )

    pferd.print_summary()


if __name__ == "__main__":
    main()