Update README.md
parent de98026b95
commit b716454c58
@@ -4,17 +4,24 @@ We recently removed the `yomi` (alias: `y`) parameter from the pronunciation tem
|
|||||||
now-deprecated and unnecessary parameter from entries in this category.
|
now-deprecated and unnecessary parameter from entries in this category.
|
||||||
|
|
||||||
## Method of operation
|
## Method of operation
|
||||||
The script iterates over all the entries in this category, then performs a simple substitution to get rid of the yomis for each occurrence of
|
The script iterates over all the entries in this category, then performs a simple API call in `mwparserfromhell` to get rid of the yomis for each occurrence of {{ja-pron}}.
|
||||||
{{ja-pron}}.
|
The way this is done is to iterate over all templates (`for template in parsed.ifilter(forcetype=Template, recursive=False)`) and, for any one whose
|
||||||
The regular expression that does this is: `({{ja-pron(?:\|[^\|]+?=[^\|]+?|\|[^\|]+)*?)\|(?:y|yomi)=(?:o|on|go|goon|ko|kan|kanon|so|soon|to|toon|ky|kanyo|kanyoon|k|kun|j|ju|y|yu|i|irr|irreg|irregular)((?:\|[^\|]+?=[^\|]+?|\|[^\|]+)*}})`
|
name is `ja-pron`, remove any `y` or `yomi` parameters if they exist.
|
||||||
You will see two capturing groups, one before the "yomi" portion, and one after; given that these two together comprise the entire template and its arguments,
|
```
|
||||||
except for the yomi argument, we simply replace any match for this pattern with the two matching groups concatenated together,
|
for template in parsed.ifilter(forcetype=Template, recursive=False):
|
||||||
e.g. if we have `{{ja-pron|しょう|y=kan|acc=0}}`, the match would contain the two subgroups `{{ja-pron|しょう` and `|acc=0}}`, so when we put them
|
    if template.name != "ja-pron":
|
||||||
together, we get `{{ja-pron|しょう|acc=0}}`, and in this way the yomi is removed.
|
        continue
|
||||||
|
|
||||||
|
    if template.has("y"):
|
||||||
|
        template.remove("y")
|
||||||
|
    if template.has("yomi"):
|
||||||
|
        template.remove("yomi")
|
||||||
|
```
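For comparison, the regex-based substitution on the removed side of this diff can be sketched with the standard library alone. The pattern is the one quoted in the old README text, split into pieces for readability; the names `ARG`, `YOMI_VALUES`, `YOMI_RE` and `strip_yomi` are hypothetical and not taken from the script.

```python
import re

# One template argument, either named (|key=value) or positional (|value).
ARG = r"(?:\|[^\|]+?=[^\|]+?|\|[^\|]+)"
# Accepted yomi values, as listed in the old README's pattern.
YOMI_VALUES = (
    "o|on|go|goon|ko|kan|kanon|so|soon|to|toon|ky|kanyo|kanyoon|"
    "k|kun|j|ju|y|yu|i|irr|irreg|irregular"
)
# Group 1: everything before the yomi argument; group 2: everything after it.
YOMI_RE = re.compile(
    r"({{ja-pron" + ARG + r"*?)"
    r"\|(?:y|yomi)=(?:" + YOMI_VALUES + r")"
    r"(" + ARG + r"*}})"
)


def strip_yomi(text: str) -> str:
    """Replace each match with its two groups concatenated, dropping the yomi."""
    return YOMI_RE.sub(r"\1\2", text)


print(strip_yomi("{{ja-pron|しょう|y=kan|acc=0}}"))  # {{ja-pron|しょう|acc=0}}
```

Text without a `y` or `yomi` argument is returned unchanged, since the pattern only matches when that argument is present.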
|
||||||
|
|
||||||
## Method to guarantee no malfunction
|
## Method to guarantee no malfunction
|
||||||
Although I believe the script to have no flaws, it is natural for an unforeseen bug to potentially occur, especially with a complicated regex
|
The safeguard I have is to check the page text before and after editing: we expect all the {{ja-pron}}s to stay in the same order
|
||||||
like this. The safeguard I have is to check the page text before and after editing: we expect all the {{ja-pron}}s to stay in the same order
|
|
||||||
that they were originally, since my script doesn't change the fundamental arrangement of the page, but with each one having one less argument
|
that they were originally, since my script doesn't change the fundamental arrangement of the page, but with each one having one less argument
|
||||||
(determinable by counting the number of |s) than it did before the edit. We assert this every single time an edit is made, so that if this invariant
|
(determinable by counting the number of |s) than it did before the edit. We assert this every single time an edit is made, so that if this invariant
|
||||||
is somehow broken, and an error must have occurred somewhere, the program will halt immediately and before making the edit at all.
|
is somehow broken, and an error must have occurred somewhere, the program will halt immediately and before making the edit at all.
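A minimal sketch of such a before/after check, using only the standard library. It assumes that `{{ja-pron}}` templates contain no nested templates, so a simple regex can find them, and that every `{{ja-pron}}` on the page loses exactly one argument, as described above. The names `JA_PRON` and `invariant_holds` are hypothetical, not the script's own.

```python
import re

# Assumption: no nested templates inside {{ja-pron}}, so braces never appear
# between the opening "{{" and the closing "}}".
JA_PRON = re.compile(r"\{\{ja-pron\s*(?:\|[^{}]*)?\}\}")


def invariant_holds(before: str, after: str) -> bool:
    """Return True if every {{ja-pron}} kept its position and lost one argument."""
    old = JA_PRON.findall(before)
    new = JA_PRON.findall(after)
    if len(old) != len(new):
        return False
    # Templates are paired by position; each must have exactly one fewer "|".
    return all(n.count("|") == o.count("|") - 1 for o, n in zip(old, new))


before = "{{ja-pron|しょう|y=kan|acc=0}}"
after = "{{ja-pron|しょう|acc=0}}"
# Halt before saving the edit if the invariant is broken.
assert invariant_holds(before, after)
```

Running the assertion before the save call means a failed check stops the program with the page still untouched.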
|
||||||
|
I also keep backups: Python's `difflib` generates a unified diff for every edit that is made, and each diff is stored to disk.
|
||||||
|
If any issue arises with the bot's edits, these diffs can be used to undo them.
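The backup step might look roughly like the sketch below. `difflib.unified_diff` is the standard-library call; the function names, the `.diff` file naming, and the replacement of `/` in page titles are assumptions made for illustration, not details of the actual script.

```python
import difflib
from pathlib import Path


def make_backup_diff(title: str, before: str, after: str) -> str:
    """Build a unified diff of the page text before and after the edit."""
    lines = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{title} (before)",
        tofile=f"{title} (after)",
    )
    return "".join(lines)


def save_backup(directory: Path, title: str, before: str, after: str) -> Path:
    """Write the diff to disk, one file per edited page (hypothetical layout)."""
    directory.mkdir(parents=True, exist_ok=True)
    # Assumption: "/" in a page title is replaced so it is a valid file name.
    path = directory / (title.replace("/", "_") + ".diff")
    path.write_text(make_backup_diff(title, before, after), encoding="utf-8")
    return path
```

Because each file records both the removed and the added lines, a bad edit can be reverted by applying the stored diff in reverse.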
|
||||||
|