kovachev-bot/ja-yomi
2023-05-29 16:47:28 +01:00
..
ja-yomi-remove.py Add backup functionality 2023-05-29 16:47:28 +01:00
README.md Create README.md 2023-05-29 16:21:29 +01:00

Japanese yomi removal script

This script is intended to traverse the tracking template/category https://en.wiktionary.org/wiki/Special:WhatLinksHere/Template:tracking/ja-pron/yomi on the English Wiktionary. We recently removed the yomi (alias: y) parameter from the pronunciation template {{ja-pron}}, so this script is intended to remove all occurrences of this now-deprecated and unnecessary parameter from entries in this category.

Method of operation

The script iterates over all the entries in this category, then performs a simple substitution to get rid of the yomis for each occurrence of {{ja-pron}}. The regular expression that does this is: ({{ja-pron(?:\|[^\|]+?=[^\|]+?|\|[^\|]+)*?)\|(?:y|yomi)=(?:o|on|go|goon|ko|kan|kanon|so|soon|to|toon|ky|kanyo|kanyoon|k|kun|j|ju|y|yu|i|irr|irreg|irregular)((?:\|[^\|]+?=[^\|]+?|\|[^\|]+)*}}) You will see two capturing groups, one before the "yomi" portion, and one after; given that these two together comprise the entire template and its arguments, except for the yomi argument, we simply replace any match for this pattern with the two matching groups concatenated together, e.g. if we have {{ja-pron|しょう|y=kan|acc=0}}, the match would contain the two subgroups {{ja-pron|しょう and |acc=0}}, so when we put them together, we get {{ja-pron|しょう|acc=0}}, and in this way the yomi is removed.

Method to guarantee no malfunction

Although I believe the script to have no flaws, it is natural for an unforeseen bug to potentially occur, especially with a complicated regex like this. The safeguard I have is to check the page text before and after editing: we expect all the {{ja-pron}}s to stay in the same order that they were originally, since my script doesn't change the fundamental arrangement of the page, but with each one having one less argument (determinable by counting the number of |s) than it did before the edit. We assert this every single time an edit is made, so that if this invariant is somehow broken, and an error must have occurred somewhere, the program will halt immediately and before making the edit at all.