Update README.md

2023-06-05 19:32:13 +01:00 · b716454c58
commit b716454c58
parent de98026b95
1 changed files with 16 additions and 9 deletions
--- a/ja-yomi/README.md
+++ b/ja-yomi/README.md
@@ -4,17 +4,24 @@ We recently removed the `yomi` (alias: `y`) parameter from the pronunciation tem
 now-deprecated and unnecessary parameter from entries in this category.
 ## Method of operation
-The script iterates over all the entries in this category, then performs a simple substitution to get rid of the yomis for each occurrence of
+The script iterates over all the entries in this category, then uses the `mwparserfromhell` API to remove the yomi parameter from each occurrence of {{ja-pron}}.
-{{ja-pron}}.
+It does this by iterating over all top-level templates (`for template in parsed.ifilter(forcetype=Template, recursive=False)`) and, for any one whose
-The regular expression that does this is: `({{ja-pron(?:\|[^\|]+?=[^\|]+?|\|[^\|]+)*?)\|(?:y|yomi)=(?:o|on|go|goon|ko|kan|kanon|so|soon|to|toon|ky|kanyo|kanyoon|k|kun|j|ju|y|yu|i|irr|irreg|irregular)((?:\|[^\|]+?=[^\|]+?|\|[^\|]+)*}})`
+name is `ja-pron`, remove any `y` or `yomi` parameters if they exist.
-You will see two capturing groups, one before the "yomi" portion, and one after; given that these two together comprise the entire template and its arguments,
+```
-except for the yomi argument, we simply replace any match for this pattern with the two matching groups concatenated together,
+    for template in parsed.ifilter(forcetype=Template, recursive=False):
-e.g. if we have `{{ja-pron|しょう|y=kan|acc=0}}`, the match would contain the two subgroups  `{{ja-pron|しょう` and `|acc=0}}`, so when we put them
+        if template.name != "ja-pron":
-together, we get `{{ja-pron|しょう|acc=0}}`, and in this way the yomi is removed.
+            continue
+        if template.has("y"):
+            template.remove("y")
+        if template.has("yomi"):
+            template.remove("yomi")
 ```
 ## Method to guarantee no malfunction
-Although I believe the script to have no flaws, it is natural for an unforeseen bug to potentially occur, especially with a complicated regex
+The safeguard I have is to check the page text before and after editing: we expect all the {{ja-pron}}s to stay in the same order
-like this. The safeguard I have is to check the page text before and after editing: we expect all the {{ja-pron}}s to stay in the same order
 that they were originally, since my script doesn't change the fundamental arrangement of the page, but with each one having one fewer argument
 (determinable by counting the number of |s) than it did before the edit. We assert this on every single edit, so that if this invariant
 is somehow broken, meaning an error must have occurred somewhere, the program halts immediately, before the edit is made at all.
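The before/after check described above could look roughly like the following. This is a minimal sketch, not the bot's actual code: the names `JA_PRON`, `pipe_counts` and `check_invariant` are hypothetical, and the regular expression assumes that no other template is nested inside {{ja-pron}}.

```python
import re

# Hypothetical pattern: assumes no template is nested inside {{ja-pron ...}}.
JA_PRON = re.compile(r"\{\{ja-pron[^{}]*\}\}")


def pipe_counts(text):
    """Return the number of '|' separators in each {{ja-pron}}, in page order."""
    return [m.group(0).count("|") for m in JA_PRON.finditer(text)]


def check_invariant(before, after):
    """Raise AssertionError unless every {{ja-pron}} lost exactly one argument."""
    old, new = pipe_counts(before), pipe_counts(after)
    assert len(old) == len(new), "number of {{ja-pron}} templates changed"
    assert all(n == o - 1 for o, n in zip(old, new)), "unexpected argument count"


before = "==Japanese==\n{{ja-pron|しょう|y=kan|acc=0}}\n"
after = "==Japanese==\n{{ja-pron|しょう|acc=0}}\n"
check_invariant(before, after)  # passes silently; a failure aborts before saving
```

Calling `check_invariant` right before the save step means a failed assertion stops the run without the page being touched.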
 I also keep backups: Python's `difflib` generates a unified diff for every edit that is made, and each diff is stored on disk.
 If any issue arises with the bot's edits, these diffs can be used to undo them.
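As an illustration of the backup step, here is a minimal sketch using `difflib.unified_diff` from the standard library. The function names, the one-file-per-page layout and the `.diff` naming are assumptions made for the example, not necessarily how the bot stores its backups; page titles containing path separators are not handled.

```python
import difflib
import tempfile
from pathlib import Path


def make_backup_diff(title, before, after):
    """Build a unified diff of one page's text before and after the edit."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{title} (before)",
        tofile=f"{title} (after)",
    )
    return "".join(lines)


def save_backup(directory, title, before, after):
    """Write the diff to <directory>/<title>.diff (hypothetical layout)."""
    path = Path(directory) / f"{title}.diff"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(make_backup_diff(title, before, after), encoding="utf-8")
    return path


before = "==Japanese==\n{{ja-pron|しょう|y=kan|acc=0}}\n"
after = "==Japanese==\n{{ja-pron|しょう|acc=0}}\n"
with tempfile.TemporaryDirectory() as tmp:
    saved = save_backup(tmp, "example-page", before, after)
    saved_text = saved.read_text(encoding="utf-8")
```

Because the stored diff holds both the removed and the added lines, the old page text can be reconstructed from it if an edit has to be reverted.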