Commit Graph

509 Commits

Author SHA1 Message Date
Joscha
77a109bb7e Fix ilias shibboleth authenticator
The shibboleth site got a visual overhaul that slightly changed the classes of a
form we need.
2020-07-28 19:13:51 +00:00
I-Al-Istannen
a3e1864a26 Allow long paths on windows
If you start PFERD a few folders deep in your home directory, it is
quite easy to reach the maximum path length limit on Windows (260
chars). This patch opts in to long paths ("\\?\" prefix), which lifts that
restriction at the cost of ugly path names.
2020-07-25 13:44:49 +02:00
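The long-path opt-in described in the commit above can be sketched roughly as follows. This is not PFERD's actual code: `to_long_path` and `LONG_PATH_PREFIX` are hypothetical names, and the sketch assumes the path is an absolute drive-letter path (UNC paths need a different prefix form and are rejected here).

```python
from pathlib import PureWindowsPath

# The four literal characters \\?\ that opt a path in to long-path handling.
LONG_PATH_PREFIX = "\\\\?\\"


def to_long_path(path: str) -> str:
    """Return the path with the Windows long-path prefix (hypothetical helper)."""
    if path.startswith(LONG_PATH_PREFIX):
        return path  # already opted in

    pure = PureWindowsPath(path)
    if not pure.is_absolute():
        # The prefix disables relative-path resolution, so require an absolute path.
        raise ValueError("long-path prefix requires an absolute path")
    if pure.drive.startswith("\\\\"):
        # UNC shares use a different prefix form, which this sketch does not cover.
        raise ValueError("UNC paths are not handled by this sketch")

    # str() normalises forward slashes to backslashes, which the prefix requires.
    return LONG_PATH_PREFIX + str(pure)
```

The "ugly path names" mentioned in the commit message are the prefixed strings this returns, which then show up wherever the path is printed.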
I-Al-Istannen
77874b432b Also add personal_desktop to download summary 2020-07-15 22:47:44 +02:00
I-Al-Istannen
5c4c785e60 Fix HTML file downloading
Previously PFERD thought any HTML file was an "Error, no access" page
when downloading. Now it checks whether ILIAS sends a
content-disposition header, telling the browser to download the file. If
that is the case, it was just an HTML file uploaded to ILIAS. If it has
no header, it is probably an error message.
2020-07-15 15:12:14 +02:00
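A minimal sketch of the header check described in the commit above, assuming the response headers are available as a plain mapping. The function names are hypothetical and the real PFERD code may differ.

```python
from typing import Mapping


def looks_like_file_download(headers: Mapping[str, str]) -> bool:
    """Return True if the response carries a content-disposition header."""
    # HTTP header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower() for name in headers}
    return "content-disposition" in lowered


def classify_html_response(headers: Mapping[str, str]) -> str:
    """Classify an HTML response as an uploaded file or a probable error page."""
    return "file" if looks_like_file_download(headers) else "error page"
```

Only the presence of the header is checked, matching the commit message; the header's value is not inspected.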
I-Al-Istannen
2aed4f6d1f Only query the dir_filter for directories 2020-07-13 13:36:12 +02:00
I-Al-Istannen
34152fbe54 Set mtime and atime to ILIAS dates where possible 2020-07-13 13:29:18 +02:00
I-Al-Istannen
c26c9352f1 Make DownloadSummary private, provide property accessors 2020-06-26 17:30:45 +02:00
I-Al-Istannen
d9ea688145 Use pretty logger for summaries 2020-06-26 17:24:36 +02:00
I-Al-Istannen
e4b1fac045 Satisfy pylint 2020-06-26 15:38:22 +02:00
Joscha
402ae81335 Fix type hints 2020-06-26 13:17:44 +00:00
Daniel Augustin
52f31e2783 Add type hints to DownloadSummary 2020-06-26 13:02:37 +02:00
Daniel Augustin
739522a151 Move download summary into a separate class 2020-06-25 23:07:11 +02:00
Daniel Augustin
6c034209b6 Add deleted files to summary 2020-06-25 22:00:28 +02:00
Daniel Augustin
f6fbd5e4bb Add download summary 2020-06-25 19:19:34 +02:00
I-Al-Istannen
7024db1f13 Use transient progressbar
This will ensure no pesky newline ends up in the output, even on
windows.
2020-06-25 18:03:12 +02:00
I-Al-Istannen
23bfa42a0d Never use the direct download button, as it is currently broken 2020-06-11 13:31:01 +02:00
I-Al-Istannen
fdb57884ed Touch files with same content to update timestamps 2020-05-31 20:27:15 +02:00
I-Al-Istannen
8198c9ecaa Reorder methods a bit 2020-05-30 19:06:36 +02:00
I-Al-Istannen
086b15d10f Crawl a bit more iteratively 2020-05-30 15:47:15 +02:00
I-Al-Istannen
9d6ce331a5 Use IliasCrawlerEntry entries in the ilias scraper 2020-05-30 15:20:51 +02:00
I-Al-Istannen
821c7ade26 Move video url extraction logic to crawler 2020-05-30 00:22:31 +02:00
I-Al-Istannen
b969a1854a Remove unneeded whitespace 2020-05-30 00:22:31 +02:00
I-Al-Istannen
62535b4452 Unpack videos in ILIAS downloader 2020-05-21 22:12:52 +02:00
I-Al-Istannen
c0056e5669 Correctly crawl video pages with multiple pages 2020-05-21 21:38:07 +02:00
I-Al-Istannen
03a801eecc Correctly type hint swallow_and_print_errors decorator 2020-05-12 21:03:53 +02:00
Joscha
072c6630bf Avoid logging import in config 2020-05-12 18:19:23 +00:00
I-Al-Istannen
4f56c8f192 Pass element type to ilias directory filter 2020-05-12 14:41:13 +02:00
I-Al-Istannen
4fdb67128d Fetch correct diva playlist id 2020-05-11 00:25:34 +02:00
I-Al-Istannen
a0f9d31d94 Use PrettyLogger warning everywhere 2020-05-10 21:56:12 +02:00
I-Al-Istannen
e7b08420ba Warn when a marked file is added again 2020-05-10 21:42:30 +02:00
I-Al-Istannen
c1b21f7772 Only remove a progress task when we added it 2020-05-10 12:28:30 +02:00
I-Al-Istannen
9850ab1d73 Allow crawling the ILIAS Personal Desktop 2020-05-10 12:16:42 +02:00
I-Al-Istannen
9950144e97 Allow passing a playlist URL to diva instead of an id 2020-05-10 11:17:13 +02:00
I-Al-Istannen
f6faacabb0 Move FatalException to errors.py 2020-05-09 00:11:21 +02:00
I-Al-Istannen
19c1e3ac6f Fail on invalid ILIAS course ids 2020-05-09 00:11:20 +02:00
I-Al-Istannen
afa48c2d2d Swallow and print errors instead of crashing 2020-05-09 00:10:54 +02:00
I-Al-Istannen
a4c518bf4c Update date find regex 2020-05-08 22:17:58 +02:00
I-Al-Istannen
057135022f Try to accept that life sometimes is in English 2020-05-08 22:10:43 +02:00
I-Al-Istannen
755e9aa0d3 Try to add support for Shibboleth TFA token 2020-05-08 21:52:51 +02:00
I-Al-Istannen
c9deca19ca Remove walrus to lower needed python version 2020-05-08 21:21:33 +02:00
I-Al-Istannen
a0c5572b59 Fix progress bars swallowing a line when they shouldn't 2020-05-08 19:55:53 +02:00
I-Al-Istannen
2d20d2934c Color warning differently 2020-05-08 19:52:45 +02:00
I-Al-Istannen
2c48ab66d4 Use rich for log colorization 2020-05-08 19:31:54 +02:00
I-Al-Istannen
56f2394001 Add a download progress bar 2020-05-08 17:09:56 +02:00
I-Al-Istannen
bee3d70998 Added a diva playlist downloader 2020-04-30 17:18:45 +02:00
I-Al-Istannen
42345ecc61 Demangle "Morgen" too 2020-04-30 12:05:25 +02:00
I-Al-Istannen
920d521d68 Change PrettyLogger.warn to PrettyLogger.warning 2020-04-25 20:11:51 +02:00
I-Al-Istannen
e0b46a306a Use warn method in IliasCrawler 2020-04-25 20:07:40 +02:00
I-Al-Istannen
8a42a2a396 Move logging into its own file 2020-04-25 20:02:01 +02:00
I-Al-Istannen
80247400a4 Debug log when starting an ilias download 2020-04-25 13:02:07 +02:00
Joscha
1aaa6e7ab5 Use PathLike everywhere 2020-04-24 18:41:14 +00:00
Joscha
7f53543324 Satisfy pylint and add todo 2020-04-24 18:26:28 +00:00
Joscha
292e516297 Change crawler and downloader output 2020-04-24 18:24:44 +00:00
Joscha
8258fa8919 Add test run option to PFERD 2020-04-24 18:00:21 +00:00
Joscha
5b929f09a2 Move download strategies to downloader
Also fixes an issue where the downloader didn't mark files that were not
downloaded due to the strategy used.
2020-04-24 14:27:40 +00:00
Joscha
4d32f863bc Clean up organizer after synchronizing 2020-04-24 14:17:23 +00:00
Joscha
4e7333b396 Allow specifying paths as strings in Pferd 2020-04-24 11:50:40 +00:00
I-Al-Istannen
4c0e3b493a Use download_modified_or_new as default strategy 2020-04-24 13:48:06 +02:00
Joscha
2de079a5d3 Add a few Transform combinators 2020-04-24 11:35:46 +00:00
I-Al-Istannen
509e624d47 Satisfy pylint. Useful docstrings? Not quite sure. 2020-04-23 20:35:59 +02:00
I-Al-Istannen
980f69b5af Fix organizer marking itself causing an error 2020-04-23 20:02:05 +02:00
I-Al-Istannen
0b00a9c26b Log when starting to synchronize 2020-04-23 19:56:37 +02:00
Joscha
1ef85c45e5 Switch Transform to PurePath 2020-04-23 17:40:43 +00:00
Joscha
5ef5a56e69 Extract Location into separate file 2020-04-23 17:38:28 +00:00
I-Al-Istannen
f3f4be2690 More free functions 2020-04-23 19:21:49 +02:00
I-Al-Istannen
076b8c5a1f Add download strategies to save bandwidth
Only download files that are newer than the local version.
2020-04-23 18:29:20 +02:00
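The "only download newer files" rule from the commit above might look something like the sketch below. The function name and the handling of missing timestamps are assumptions made for illustration, not PFERD's actual strategy code; both timestamps are assumed to be in the same timezone convention.

```python
from datetime import datetime
from typing import Optional


def should_download(
    local_modified: Optional[datetime],
    remote_modified: Optional[datetime],
) -> bool:
    """Decide whether a remote file should be fetched (hypothetical helper).

    local_modified is None when no local copy exists;
    remote_modified is None when the server reports no date.
    """
    if local_modified is None:
        return True  # nothing on disk yet
    if remote_modified is None:
        return True  # cannot compare, so download to be safe (assumption)
    return remote_modified > local_modified
```

Skipping files whose remote date is not newer is what saves the bandwidth; a file with an unknown remote date is still fetched in this sketch.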
I-Al-Istannen
13bc78c889 Display reason for ignoring an element in ilias crawler 2020-04-23 13:54:58 +02:00
I-Al-Istannen
dc964a9d98 Remove finished TODOs 2020-04-23 13:30:34 +02:00
I-Al-Istannen
c2b14f3db9 ilias crawler: Use direct download link if possible 2020-04-23 13:08:12 +02:00
Joscha
4b59a7c375 Move around TODOs 2020-04-23 10:49:01 +00:00
I-Al-Istannen
bef210ae77 Rename and implement IliasDirectoryFilter 2020-04-23 12:35:18 +02:00
I-Al-Istannen
ea005517cf Only remove folders if they exist in tmpdir 2020-04-23 12:09:45 +02:00
Joscha
df0eb84a44 Fix TmpDir and Location
TmpDir: Clean up before and after, not just after
Location: Resolve path so that parent check works properly
2020-04-23 09:50:32 +00:00
Joscha
2de4255a78 Add Pferd class 2020-04-23 09:50:32 +00:00
Joscha
3c808879c9 Add Transforms and Transformables 2020-04-22 18:25:09 +00:00
I-Al-Istannen
a051e3bcca ilias crawler: Add some unhelpful documentation 2020-04-22 17:58:19 +02:00
I-Al-Istannen
eb7df036df WIP: ilias crawler: Also crawl assignments 2020-04-22 14:32:20 +02:00
I-Al-Istannen
23db59e733 WIP: ilias-crawler: Demangle dates 2020-04-22 12:58:44 +02:00
I-Al-Istannen
ac65b06a8e Satisfy pylint a bit 2020-04-22 01:37:34 +02:00
I-Al-Istannen
8891041069 WIP: crawler: Add opencast video crawler 2020-04-21 23:01:19 +02:00
I-Al-Istannen
70d63e3e90 WIP: Start small ILIAS crawler 2020-04-21 13:32:03 +02:00
I-Al-Istannen
b2a7af2e3e Store modification_date in IliasDownloadInfo, remove parameters 2020-04-21 13:31:50 +02:00
I-Al-Istannen
23bed48c8c Satisfy autopep8 2020-04-21 13:30:42 +02:00
Joscha
0926d33798 Use downloader-specific data classes 2020-04-20 18:07:45 +00:00
I-Al-Istannen
55ba2f4070 Fix pylint in downloaders 2020-04-20 19:49:15 +02:00
I-Al-Istannen
d18b48aaf4 Stream in http downloader 2020-04-20 19:45:25 +02:00
Joscha
4ef0ffe3bf Listen to pylint and mypy 2020-04-20 17:44:58 +00:00
Joscha
ce77995c8f Rename http downloader module 2020-04-20 17:08:51 +00:00
I-Al-Istannen
ed9245c14d Remove old organizer 2020-04-20 18:50:23 +02:00
I-Al-Istannen
01e6972c96 Add ilias downloader 2020-04-20 18:49:01 +02:00
I-Al-Istannen
8181ae5b17 Guard http response in context manager 2020-04-20 18:47:46 +02:00
Joscha
6407190ae0 Soupify requests responses properly 2020-04-20 16:38:30 +00:00
I-Al-Istannen
87395faac2 Add base for simple HTTP downloader 2020-04-20 17:43:59 +02:00
I-Al-Istannen
a9e6e7883d Create temp dir folder in constructor 2020-04-20 17:43:59 +02:00
Joscha
154d6b29dd Listen to pylint 2020-04-20 15:16:22 +00:00
I-Al-Istannen
62ac569ec4 Revert "Add proposed crawler entry type"
This reverts commit 9f1a0a58ab.

Each crawler will have its own data class.
2020-04-20 16:59:20 +02:00
I-Al-Istannen
9f1a0a58ab Add proposed crawler entry type 2020-04-20 16:54:47 +02:00
Joscha
879a2c7c80 Rewrite ILIAS authenticator 2020-04-20 14:26:30 +00:00
Joscha
ff06c5215e Fix authenticator 2020-04-20 14:26:29 +00:00
I-Al-Istannen
135a8dce4b Fix resolve_path allowing paths outside its folder
This happened if the directory name was a prefix of the offending file name.
2020-04-20 16:07:14 +02:00
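One plausible way such a prefix bug arises is a plain string comparison. The sketch below contrasts it with a component-wise check; the function names are hypothetical, the actual PFERD fix is not shown in this log, and both paths are assumed to be already resolved (no ".." segments).

```python
from pathlib import PurePosixPath


def is_inside_naive(base: str, target: str) -> bool:
    """Buggy check: compares raw strings, so a sibling sharing a prefix passes."""
    return target.startswith(base)


def is_inside(base: str, target: str) -> bool:
    """Component-wise check: target must be base itself or have it as an ancestor."""
    base_path = PurePosixPath(base)
    target_path = PurePosixPath(target)
    return target_path == base_path or base_path in target_path.parents


# "/sync/dir_evil.txt" starts with "/sync/dir" as a string but lives outside it.
print(is_inside_naive("/sync/dir", "/sync/dir_evil.txt"))  # True (wrong)
print(is_inside("/sync/dir", "/sync/dir_evil.txt"))        # False
print(is_inside("/sync/dir", "/sync/dir/a/b.txt"))         # True
```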
I-Al-Istannen
63bbcad918 Add resolve method to tmp_dir 2020-04-20 15:40:07 +02:00
I-Al-Istannen
6584d6a905 Elaborate accept_file in new_organizer 2020-04-20 15:40:07 +02:00
Joscha
5990098ef8 Add UserPassAuthenticator 2020-04-20 13:26:45 +00:00
I-Al-Istannen
f3d3d6bb65 Add some docs to cookie_jar 2020-04-20 14:38:03 +02:00
I-Al-Istannen
b2fe7cc064 Add preliminary logging to organizer and tmp_dir 2020-04-20 14:37:44 +02:00
I-Al-Istannen
930d821dd7 Add a simple organizer 2020-04-20 14:29:48 +02:00
I-Al-Istannen
5c2ff14839 Add "prompt_yes_no" to utils 2020-04-20 14:29:48 +02:00
I-Al-Istannen
a3d6dc7873 Clean up temp_folder 2020-04-20 14:29:48 +02:00
Joscha
53ad1c924b Add cookie jar 2020-04-20 11:35:26 +00:00
I-Al-Istannen
8c431c7d81 Add a simple temporary folder 2020-04-20 12:08:52 +02:00
Joscha
d5dd5aac06 Fix some mypy errors 2020-04-20 01:54:47 +00:00
Joscha
25043a4aaa Remove unnecessary files
Also document some plans for the new program structure in REWRITE.md
2020-04-19 19:49:43 +00:00
I-Al-Istannen
cf3553175f Add OS_Exams synchronizer 2020-02-27 14:51:29 +01:00
I-Al-Istannen
bf8b3cf9f7 Hack in support for TI exams
This just adds an additional crawl check for AlteKlausuren. This is not
present on the root site but at the suffix `/Klausuren`.
Example config:

```py
# The "Klausur" needs to be copied verbatim!
ti.synchronize("Klausur", "sync dir name",
               transform=ro_19_klausur_transform, filter=ro_19_klausur_filter)
```
2020-02-24 20:58:27 +01:00
I-Al-Istannen
f5bc49160f Lose 50 minutes of my life (and fix the TGI tut) 2019-12-12 12:50:16 +01:00
I-Al-Istannen
4433696509 [TGI] Add TGi tut 2019-11-18 09:58:16 +01:00
I-Al-Istannen
1407c6d264 Download all TGI files and not just lectures 2019-10-17 22:14:32 +02:00
I-Al-Istannen
1973c931bd Add support for other years in TGI downloader 2019-10-15 15:37:52 +02:00
I-Al-Istannen
458cc1c6d6 Add support for TGI website 2019-10-15 15:34:59 +02:00
Joscha
f94629a7fa Fix exceptions with weird content types
(hopefully)
2019-09-22 11:55:47 +00:00
I-Al-Istannen
2752e98621 Fix relative url joining in ti downloader 2019-07-26 10:06:01 +02:00
Joscha
ea01dc7cb2 Allow even more types of files 2019-07-05 08:48:43 +00:00
Joscha
77056e6f8d Allow more types of files 2019-07-04 12:16:42 +00:00
Joscha
d468a45662 Allow wolfram files 2019-06-11 12:42:55 +00:00
I-Al-Istannen
67da4e69fa Add colorful log output
Highlight the important operations (new, modified) in different colours.
2019-06-07 13:28:55 +02:00
Joscha
2016f61bf8 Crawl more of the TI page 2019-05-09 11:04:24 +00:00
Joscha
c72e92db18 Make Ti downloader authentication more robust 2019-05-06 12:04:01 +00:00
Joscha
44b4204517 Add basic Ti downloader 2019-05-06 11:54:36 +00:00
Joscha
d730d0064c Conform to other files' __all__ 2019-04-26 09:45:24 +00:00
Joscha
ae6cc40fb5 Rename ILIAS crawler to ilias
To be consistent with the other classes' capitalisation of acronyms
2019-04-26 04:29:12 +00:00
Joscha
0891e7f1bc Fix logging messages not appearing 2019-04-26 03:58:11 +00:00
Joscha
9693e1d968 Make logging easier 2019-04-25 19:53:13 +00:00
Joscha
f1ba618378 Remove unnecessary files 2019-04-25 19:18:19 +00:00
Joscha
dfddc93039 Move norbert from aiohttp to requests
Also fix streaming (when downloading) in the other classes.
2019-04-25 19:15:36 +00:00
Joscha
f0c42ce8ec Clean up
Use shorter name for responses, like in the requests doc.

Change Organizer's __all__ to be more in line with the other __all__s.
2019-04-25 19:02:48 +00:00
Joscha
82adeb324f Move ffm stuff from aiohttp to requests 2019-04-25 19:01:53 +00:00
Joscha
9bae030186 Move ilias stuff from aiohttp to requests 2019-04-25 18:52:48 +00:00
Joscha
c7a9a42b3d Allow files of type application/msword 2019-04-24 12:34:50 +00:00
Joscha
5a1bf2188b Switch from tabs to spaces 2019-04-24 12:34:20 +00:00
Joscha
3019e4255b Replace "/" in file names with "." 2018-12-14 09:27:12 +00:00
Joscha
616a8d96a2 Sort norbert files while downloading 2018-12-05 11:44:35 +00:00
Joscha
2d9223b8e6 Add norbert synchronizer 2018-11-29 10:26:58 +00:00
Joscha
bdc0e8ad03 Remember files correctly for cleaning up 2018-11-28 08:59:07 +00:00
Joscha
dad33b8c7f Save identically named files under different names 2018-11-27 17:23:32 +00:00
Joscha
98a2b5db34 Fix tut crawling 2018-11-27 10:28:39 +00:00
Joscha
c824ae4f6d Add more allowed file types 2018-11-27 10:27:20 +00:00
Joscha
8b1a34233a Add and use utility functions for changing paths
This fixes a small bug in the example config, where some files were
put in the wrong locations.
2018-11-27 08:52:27 +00:00
Joscha
a084b05433 Change log message
for better readability
2018-11-26 17:33:27 +00:00
Joscha
068fe77dcf Clean up minor things
- improve logging messages
- allow more download file formats
- strip file names
2018-11-26 17:00:17 +00:00
Joscha
34da5d4d19 Sync files from ILIAS 2018-11-26 13:39:06 +00:00
Joscha
529c4a7dda Don't overwrite files if the contents match 2018-11-26 13:37:01 +00:00
Joscha
2034c9d426 Add FfM (Fachschaft für Mathematik) synchronizer
This commit moves exceptions and some other things into utils.py and
renames files according to python's file naming guides (kinda).

It also adds a new example config using the new FfM downloader.
2018-11-24 08:27:33 +00:00
Joscha
5732268084 Clean up
- detect whether authenticating is really necessary when attempting to
download a file
- add a get_website_refid() function
- move often-used goto.php url into constant
- and some comments
2018-11-23 17:45:07 +00:00
Joscha
2afcd38f1c Rename Ilias-specific stuff 2018-11-23 10:09:03 +00:00
Joscha
5d5f60e21f Log properly 2018-11-23 10:08:31 +00:00
Joscha
282d0252eb Add file organizer 2018-11-23 08:56:59 +00:00
Joscha
4e6912591c Download files to some local file 2018-11-23 08:53:49 +00:00
Joscha
cf9d43fe84 Fix authenticating bug 2018-11-21 06:59:34 +00:00
Joscha
95646b0b29 Authenticate with ILIAS and get pages by refid 2018-11-20 05:55:41 +00:00