Update README.md

Kovachev 2023-06-05 19:32:13 +01:00 committed by GitHub
parent de98026b95
commit b716454c58

@@ -4,17 +4,24 @@ We recently removed the `yomi` (alias: `y`) parameter from the pronunciation tem
now-deprecated and unnecessary parameter from entries in this category.
## Method of operation
The script iterates over all the entries in this category, then performs a simple substitution to get rid of the yomis for each occurrence of
{{ja-pron}}.
The regular expression that does this is: `({{ja-pron(?:\|[^\|]+?=[^\|]+?|\|[^\|]+)*?)\|(?:y|yomi)=(?:o|on|go|goon|ko|kan|kanon|so|soon|to|toon|ky|kanyo|kanyoon|k|kun|j|ju|y|yu|i|irr|irreg|irregular)((?:\|[^\|]+?=[^\|]+?|\|[^\|]+)*}})`
The pattern has two capturing groups, one before the "yomi" portion and one after. Together these two comprise the entire template and its arguments,
except for the yomi argument, so we simply replace any match for this pattern with the two groups concatenated.
For example, given `{{ja-pron|しょう|y=kan|acc=0}}`, the match contains the two groups `{{ja-pron|しょう` and `|acc=0}}`; putting them
together gives `{{ja-pron|しょう|acc=0}}`, and in this way the yomi is removed.
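The substitution described above can be sketched with Python's `re` module. This is only an illustration, not the script's actual code: the names `ARG`, `YOMI_VALUES`, `YOMI_PATTERN` and `remove_yomi` are hypothetical, and the braces are escaped for safety, but the pattern is otherwise the one quoted above.

```python
import re

# One template argument: either "|name=value" or a positional "|value".
ARG = r"(?:\|[^\|]+?=[^\|]+?|\|[^\|]+)"
# Accepted yomi values, copied from the regular expression above.
YOMI_VALUES = (
    "o|on|go|goon|ko|kan|kanon|so|soon|to|toon|ky|kanyo|kanyoon"
    "|k|kun|j|ju|y|yu|i|irr|irreg|irregular"
)

YOMI_PATTERN = re.compile(
    r"(\{\{ja-pron" + ARG + r"*?)"                # group 1: everything before the yomi
    + r"\|(?:y|yomi)=(?:" + YOMI_VALUES + r")"    # the yomi argument itself (dropped)
    + r"(" + ARG + r"*\}\})"                      # group 2: everything after the yomi
)


def remove_yomi(wikitext: str) -> str:
    """Replace each match with its two groups concatenated (hypothetical helper)."""
    return YOMI_PATTERN.sub(r"\1\2", wikitext)
```

Calling `remove_yomi("{{ja-pron|しょう|y=kan|acc=0}}")` returns `{{ja-pron|しょう|acc=0}}`, matching the example above.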
The script iterates over all the entries in this category, then performs a simple API call in `mwparserfromhell` to get rid of the yomis for each occurrence of {{ja-pron}}.
The way this is done is to iterate over all templates (`for template in parsed.ifilter(forcetype=Template, recursive=False)`) and, for any one whose
name is `ja-pron`, remove any `y` or `yomi` parameters if they exist.
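```
for template in parsed.ifilter(forcetype=Template, recursive=False):
    # Skip every template that is not {{ja-pron}}.
    if template.name != "ja-pron":
        continue
    # Drop the deprecated parameter under either of its names.
    if template.has("y"):
        template.remove("y")
    if template.has("yomi"):
        template.remove("yomi")
```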
## Method to guarantee no malfunction
Although I believe the script to have no flaws, an unforeseen bug may still occur, especially with a complicated regex
like this. The safeguard I have is to check the page text before and after editing: we expect all the {{ja-pron}}s to stay in the same order
that they were originally, since my script doesn't change the fundamental arrangement of the page, but with each one having one fewer argument
(determinable by counting the number of |s) than it did before the edit. We assert this every time an edit is made, so that if this invariant
is somehow broken, meaning an error must have occurred somewhere, the program halts immediately, before making the edit at all.
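A minimal sketch of such a check follows. It is not the script's actual code: `JA_PRON` and `invariant_holds` are hypothetical names, and the simple pattern assumes that no {{ja-pron}} contains a nested template.

```python
import re

# Assumption: {{ja-pron}} never contains nested braces, so a simple pattern finds it.
JA_PRON = re.compile(r"\{\{ja-pron[^{}]*\}\}")


def invariant_holds(before: str, after: str) -> bool:
    """True if every {{ja-pron}} is still present, in order, with one fewer '|'."""
    old = JA_PRON.findall(before)
    new = JA_PRON.findall(after)
    if len(old) != len(new):
        return False
    # Templates are paired by position; each must have lost exactly one argument.
    return all(n.count("|") == o.count("|") - 1 for o, n in zip(old, new))
```

The caller would then run something like `assert invariant_holds(old_text, new_text)` before saving the page, so that a failed check stops the program before the edit is made.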
I also keep backups using Python's `difflib`: a unified diff is created for every edit that is made and stored to disk.
If any issue arises with the bot's edits, these diffs can be used to undo them.
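As a rough sketch of how such a backup might be written, the helpers below use `difflib.unified_diff` from the standard library. The function names, the `backup_dir` argument and the file-naming scheme are all assumptions made for illustration, not the script's actual layout.

```python
import difflib
from pathlib import Path


def make_diff(title: str, before: str, after: str) -> str:
    """Build a unified diff of one page edit (hypothetical helper)."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{title} (before)",
        tofile=f"{title} (after)",
    )
    return "".join(lines)


def save_backup(title: str, before: str, after: str, backup_dir: str) -> Path:
    """Write the diff to <backup_dir>/<title>.diff; assumes title is a valid file name."""
    path = Path(backup_dir) / f"{title}.diff"
    path.write_text(make_diff(title, before, after), encoding="utf-8")
    return path
```

Because a unified diff records both the removed and the added lines, the original page text can be recovered from the stored file and the edited page.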