Check-in [293badd94c]
Overview
Comment: Expand D-T backend hooks, abbreviate any ln-CT language specifier, document new backends.
SHA1: 293badd94caefe4ba87d2c48b09f7fec
User & Date: mario 2021-05-13 15:41:11
Context
2021-05-14
03:19  Simplified params["backend"] string instead of individual flags, shorten mapping and parameterization in deep_translator backend, abbreviate D-T and T-P in new config dialog, add DT duplicates, minor manual updates. check-in: c28be6ec87 user: mario tags: trunk
2021-05-13
15:41  Expand D-T backend hooks, abbreviate any ln-CT language specifier, document new backends. check-in: 293badd94c user: mario tags: trunk
05:28  support all 4 new Deep-Translate backends, still needs some rework to omit "auto" source language check-in: 7ad6ace92e user: mario tags: trunk
Changes
Changes to help/en/vnd.include-once.pagetranslate/config.page.
︙ | ︙ | |||
<title>Translation settings</title>

<p>The options page can be found under
<guiseq><gui>Tools</gui> → <gui>Options</gui> → <gui>๐</gui> → <gui>Language Settings</gui> → <gui>PageTranslate</gui></guiseq>.</p>

<section id="service">
  <title>Translation service to use</title>

  <p>There are a few built-in backends:</p>

  <terms>
    <item>
      <title>Google Translate</title>
      <p><link href="https://translate.google.com/">Google Translate</link> is the default option,
      and suitable for both text selections and whole pages. Provides pretty good machine translations.
      It incurs some delays for longer texts, as they have to be transferred in chunks of 1900 characters
      (sentences/paragraphs) each (managed automatically, no user interaction necessary).</p>
    </item>
    <item>
      <title>MyMemory</title>
      <p>For <link href="https://mymemory.translated.net/">MyMemory</link> you should specify an
      email address in the corresponding input box (it's optional, but unlocks more requests).
      No longer requires the python-translate module, but does need <file>langdetect</file> (for supplying
      the correct source language). Which is why it sometimes fails, and possibly requires the
      Tools → PageTranslate → From → To option. Doesn't yield quite as good machine translations.
      But it's an open source service.</p>
    </item>
    <item>
      <title>Command line tool</title>
      <p>Allows sending each text paragraph to a local application. To use it, set the command in the
      corresponding input field. Placeholders are `{lang}` for the target language, and `{text}` for the
      paragraphs or current text section. (Both get escaped automatically.)
      For <cmd>translate-cli</cmd> you might need the <var>-p</var> provider option as well. See also the
      <link href="https://pypi.org/project/translate/">translate-python documentation</link> on how to
      prepare a separate <file>~/.python-translate.cfg</file>.
      Or use <link href="https://github.com/nidhaloff/deep-translator">deep-translator cli</link> with
      for example <cmd>deep_translator -trans "google" -src "auto" -tg {lang} -txt {text}</cmd>.</p>
    </item>
    <item>
      <title>DeepL API</title>
      <p>Utilizes the speedier <link href="https://www.deepl.com/pro">DeepL Pro API</link> to translate
      documents. As of yet untested. Requires an API key and paid subscription. No XML mode (to retain
      full inline formatting) yet; it still translates each text segment/paragraph/sentence individually.</p>
    </item>
    <item>
      <title>DeepL Free API</title>
      <p>You can now get a free API key for limited usage (500K characters per month - around 1 or 2
      documents per day). This secondary API might not be as well maintained. And signup still requires
      a credit card (use one of the privacy or temporary online credit card services).</p>
    </item>
    <item>
      <title>DeepL web interface</title>
      <p>Utilizes web scraping on the <link href="https://www.deepl.com/translator/">DeepL online
      translator</link>. Only suitable for testing and translating single paragraphs or text selections,
      because it quickly blocks with "error 429 - too many requests". It's also kinda redundant now that
      there's a Free API option.</p>
    </item>
  </terms>

  <p>Some provided via <cmd>pip install <link href="https://pypi.org/project/translate/">translate-python</link></cmd>:</p>

  <terms>
    <item>
      <title>Microsoft Translator</title>
      <p>Requires an authorization key. There's also a free/test
      <link href="https://azure.microsoft.com/en-us/pricing/details/cognitive-services/translator/">subscription
      for an API key</link>. Not tested within PageTranslate yet.</p>
    </item>
  </terms>

  <p>And more via <cmd>pip install <link href="https://pypi.org/project/deep-translator/">deep-translator</link></cmd>:</p>

  <terms>
    <item>
      <title>QCRI Machine Translation</title>
      <p>Requires a <link href="https://mt.qcri.org/api/">free API key</link>, but is suitable for
      whole-document translations.</p>
    </item>
    <item>
      <title>Yandex Translation</title>
      <p>Also requires its own <link href="https://translate.yandex.com/">API key</link>.</p>
    </item>
    <item>
      <title>Linguee Dictionary</title>
      <p>Performs word-wise <link href="https://www.linguee.com/">translation</link> lookups, so it's
      not suitable for translating whole documents, just text selections. PageTranslate will split up
      sentences and pipe each word through the service, but that won't yield a readable machine
      translation.</p>
    </item>
    <item>
      <title>Pons Dictionary</title>
      <p>Is also more of a <link href="https://de.pons.com/">dictionary</link> than a translation
      service. Suitable for text selections, but probably not paragraphs or whole documents.
      PageTranslate will split-process longer selections word-wise through the Pons Translation
      interface.</p>
    </item>
  </terms>
</section>

<section id="service">
  <title>Parameters</title>

  <terms>
    <item>
      <title>API key</title>
      <p>You can set an API or OAuth key for services that require one. The same input field serves
      all backends, so you can't switch between them without changing this entry first. (Having
      multiple API subscriptions isn't really a common use case.)</p>
    </item>
    <item>
      <title>Email adr</title>
      <p>An email address is only used by MyMemory. And strictly speaking it's not even
      required; it just allows for more translations.</p>
    </item>
    <item>
      <title>Command</title>
      <p>This field defines the CLI tool to use for translation. You can use something other than
      `translate-cli` or `deep-translator` of course. Placeholders like {lang} and {text} can be used
      here.</p>
    </item>
  </terms>
</section>

<section id="flags">
  <title>Options / Flags</title>

  <terms>
    <item>
      <title>quick linebreak handling</title>
      <p>Might speed up table processing with Google Translate, as it avoids sending each
      newline-split sentence separately. It simply joins multiple lines temporarily with
      <cmd>"/#§/"</cmd> in place of a linebreak (and then splits them again), so there are fewer
      requests. Primarily helps with tables, less so with documents with lengthy paragraphs.</p>
︙ | ︙ |
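The "Command line tool" entry above says that `{lang}` and `{text}` are substituted into the configured command and escaped automatically. The following is only a minimal sketch of how such a substitution could look, assuming POSIX shell quoting through `shlex.quote`; the helper name `build_command` is hypothetical and not taken from PageTranslate itself.

```python
import re
import shlex

def build_command(template, lang, text):
    """Fill the {lang} and {text} placeholders with shell-quoted values."""
    # hypothetical helper: each value becomes one safely quoted shell token
    values = {"lang": shlex.quote(lang), "text": shlex.quote(text)}
    # single pass, so a "{lang}" that occurs inside the text is not expanded again
    return re.sub(r"\{(lang|text)\}", lambda m: values[m.group(1)], template)

cmd = build_command(
    'deep_translator -trans "google" -src "auto" -tg {lang} -txt {text}',
    "de",
    "It's a test",
)
print(cmd)
```

Splitting the result with `shlex.split(cmd)` yields the paragraph as one argument again, which is what a tool like `deep_translator` would receive after `-txt`.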
Changes to help/en/vnd.include-once.pagetranslate/config.xhp.
<?xml version="1.0" encoding="UTF-8"?>
<helpdocument version="1.0">
<meta>
  <topic id="topic_d1e3" indexer="include" status="PUBLISH">
    <title xml-lang="en" id="title_d1e3">Translation settings</title>
    <filename>/help/vnd.include-once.pagetranslate/config.xhp</filename>
  </topic>
  <history>
    <created date="2020-02-02T22:22:22"/>
    <lastedited date="2021-05-13T17:38:33.093+02:00"/>
  </history>
</meta>
<body>
<bookmark id="bm_d1e7" branch="hid/vnd.include-once.pagetranslate:OptionsPageTranslate" xml-lang="en">
  <bookmark_value>PageTranslate settings</bookmark_value>
</bookmark>
<bookmark id="helpindex_d1e9" branch="index" xml-lang="en">
  <bookmark_value>translation; pagetranslate; options</bookmark_value>
</bookmark>
<paragraph id="hd_d1e15" role="heading" level="1" xml-lang="en">Translation settings</paragraph>
<paragraph id="par_d1e18" role="paragraph" xml-lang="en">The options page can be found under
  <item type="gui">Tools</item> → <item type="gui">Options</item> → <item type="gui">๐</item> → <item type="gui">Language Settings</item> → <item type="gui">PageTranslate</item>.</paragraph>
<paragraph id="sect_d1e37" role="section" xml-lang="en">
  <paragraph id="hd_d1e39" role="heading" level="2" xml-lang="en">Translation service to use</paragraph>
  <paragraph id="par_d1e42" role="paragraph" xml-lang="en">There are a few built-in backends:</paragraph>
  <list id="terms_d1e45" xml-lang="en">
    <listitem id="item_d1e47" xml-lang="en">
      <emph>Google Translate</emph>
      <br/>
      <paragraph id="par_d1e52" role="paragraph" xml-lang="en">
        <link href="https://translate.google.com/">Google Translate</link> is the default option,
        and suitable for both text selections and whole pages. Provides pretty good machine translations.
        It incurs some delays for longer texts, as they have to be transferred in chunks of 1900 characters
        (sentences/paragraphs) each (managed automatically, no user interaction necessary).</paragraph>
    </listitem>
    <listitem id="item_d1e58" xml-lang="en">
      <emph>MyMemory</emph>
      <br/>
      <paragraph id="par_d1e63" role="paragraph" xml-lang="en">For
        <link href="https://mymemory.translated.net/">MyMemory</link> you should specify an
        email address in the corresponding input box (it's optional, but unlocks more requests).
        No longer requires the python-translate module, but does need
        <item type="fileitem">langdetect</item> (for supplying the correct source language).
        Which is why it sometimes fails, and possibly requires the
        Tools → PageTranslate → From → To option. Doesn't yield quite as good machine translations.
        But it's an open source service.</paragraph>
    </listitem>
    <listitem id="item_d1e73" xml-lang="en">
      <emph>Command line tool</emph>
      <br/>
      <paragraph id="par_d1e78" role="paragraph" xml-lang="en">Allows sending each text paragraph to a
        local application. To use it, set the command in the corresponding input field. Placeholders are
        `{lang}` for the target language, and `{text}` for the paragraphs or current text section.
        (Both get escaped automatically.)
        For <item type="command">translate-cli</item> you might need the
        <item type="variable">-p</item> provider option as well. See also the
        <link href="https://pypi.org/project/translate/">translate-python documentation</link> on how to
        prepare a separate <item type="fileitem">~/.python-translate.cfg</item>.
        Or use <link href="https://github.com/nidhaloff/deep-translator">deep-translator cli</link> with
        for example <item type="command">deep_translator -trans "google" -src "auto" -tg {lang} -txt {text}</item>.</paragraph>
    </listitem>
    <listitem id="item_d1e101" xml-lang="en">
      <emph>DeepL API</emph>
      <br/>
      <paragraph id="par_d1e106" role="paragraph" xml-lang="en">Utilizes the speedier
        <link href="https://www.deepl.com/pro">DeepL Pro API</link> to translate documents.
        As of yet untested. Requires an API key and paid subscription. No XML mode (to retain
        full inline formatting) yet; it still translates each text segment/paragraph/sentence
        individually.</paragraph>
    </listitem>
    <listitem id="item_d1e113" xml-lang="en">
      <emph>DeepL Free API</emph>
      <br/>
      <paragraph id="par_d1e118" role="paragraph" xml-lang="en">You can now get a free API key for
        limited usage (500K characters per month - around 1 or 2 documents per day). This secondary
        API might not be as well maintained. And signup still requires a credit card (use one of the
        privacy or temporary online credit card services).</paragraph>
    </listitem>
    <listitem id="item_d1e123" xml-lang="en">
      <emph>DeepL web interface</emph>
      <br/>
      <paragraph id="par_d1e128" role="paragraph" xml-lang="en">Utilizes web scraping on the
        <link href="https://www.deepl.com/translator/">DeepL online translator</link>. Only suitable
        for testing and translating single paragraphs or text selections, because it quickly blocks
        with "error 429 - too many requests". It's also kinda redundant now that there's a Free API
        option.</paragraph>
    </listitem>
  </list>
  <paragraph id="par_d1e136" role="paragraph" xml-lang="en">Some provided via
    <item type="command">pip install <link href="https://pypi.org/project/translate/">translate-python</link>
    </item>:</paragraph>
  <list id="terms_d1e144" xml-lang="en">
    <listitem id="item_d1e146" xml-lang="en">
      <emph>Microsoft Translator</emph>
      <br/>
      <paragraph id="par_d1e151" role="paragraph" xml-lang="en">Requires an authorization key.
        There's also a free/test
        <link href="https://azure.microsoft.com/en-us/pricing/details/cognitive-services/translator/">subscription
        for an API key</link>. Not tested within PageTranslate yet.</paragraph>
    </listitem>
  </list>
  <paragraph id="par_d1e160" role="paragraph" xml-lang="en">And more via
    <item type="command">pip install <link href="https://pypi.org/project/deep-translator/">deep-translator</link>
    </item>:</paragraph>
  <list id="terms_d1e168" xml-lang="en">
    <listitem id="item_d1e170" xml-lang="en">
      <emph>QCRI Machine Translation</emph>
      <br/>
      <paragraph id="par_d1e175" role="paragraph" xml-lang="en">Requires a
        <link href="https://mt.qcri.org/api/">free API key</link>, but is suitable for
        whole-document translations.</paragraph>
    </listitem>
    <listitem id="item_d1e182" xml-lang="en">
      <emph>Yandex Translation</emph>
      <br/>
      <paragraph id="par_d1e187" role="paragraph" xml-lang="en">Also requires its own
        <link href="https://translate.yandex.com/">API key</link>.</paragraph>
    </listitem>
    <listitem id="item_d1e194" xml-lang="en">
      <emph>Linguee Dictionary</emph>
      <br/>
      <paragraph id="par_d1e199" role="paragraph" xml-lang="en">Performs word-wise
        <link href="https://www.linguee.com/">translation</link> lookups, so it's not suitable for
        translating whole documents, just text selections. PageTranslate will split up sentences and
        pipe each word through the service, but that won't yield a readable machine
        translation.</paragraph>
    </listitem>
    <listitem id="item_d1e206" xml-lang="en">
      <emph>Pons Dictionary</emph>
      <br/>
      <paragraph id="par_d1e211" role="paragraph" xml-lang="en">Is also more of a
        <link href="https://de.pons.com/">dictionary</link> than a translation service. Suitable for
        text selections, but probably not paragraphs or whole documents. PageTranslate will
        split-process longer selections word-wise through the Pons Translation interface.</paragraph>
    </listitem>
  </list>
</paragraph>
<paragraph id="sect_d1e220" role="section" xml-lang="en">
  <paragraph id="hd_d1e222" role="heading" level="2" xml-lang="en">Parameters</paragraph>
  <list id="terms_d1e225" xml-lang="en">
    <listitem id="item_d1e227" xml-lang="en">
      <emph>API key</emph>
      <br/>
      <paragraph id="par_d1e232" role="paragraph" xml-lang="en">You can set an API or OAuth key for
        services that require one. The same input field serves all backends, so you can't switch
        between them without changing this entry first. (Having multiple API subscriptions isn't
        really a common use case.)</paragraph>
    </listitem>
    <listitem id="item_d1e236" xml-lang="en">
      <emph>Email adr</emph>
      <br/>
      <paragraph id="par_d1e241" role="paragraph" xml-lang="en">An email address is only used by
        MyMemory. And strictly speaking it's not even required; it just allows for more
        translations.</paragraph>
    </listitem>
    <listitem id="item_d1e245" xml-lang="en">
      <emph>Command</emph>
      <br/>
      <paragraph id="par_d1e250" role="paragraph" xml-lang="en">This field defines the CLI tool to use
        for translation. You can use something other than `translate-cli` or `deep-translator` of
        course. Placeholders like {lang} and {text} can be used here.</paragraph>
    </listitem>
  </list>
</paragraph>
<paragraph id="sect_d1e257" role="section" xml-lang="en">
  <paragraph id="hd_d1e259" role="heading" level="2" xml-lang="en">Options / Flags</paragraph>
  <list id="terms_d1e262" xml-lang="en">
    <listitem id="item_d1e264" xml-lang="en">
      <emph>quick linebreak handling</emph>
      <br/>
      <paragraph id="par_d1e269" role="paragraph" xml-lang="en">Might speed up table processing with
        Google Translate, as it avoids sending each newline-split sentence separately. It simply joins
        multiple lines temporarily with <item type="command">"/#§/"</item> in place of a linebreak
        (and then splits them again), so there are fewer requests. Primarily helps with tables, less so
        with documents with lengthy paragraphs.</paragraph>
    </listitem>
    <listitem id="item_d1e276" xml-lang="en">
      <emph>also iterate over TextFrames</emph>
      <br/>
      <paragraph id="par_d1e281" role="paragraph" xml-lang="en">Handles normal and floating
        TextFrames. Those are essentially subdocuments in a Writer page. But you probably don't need
        this option for standard office documents.</paragraph>
    </listitem>
    <listitem id="item_d1e285" xml-lang="en">
      <emph>super slow mode</emph>
      <br/>
      <paragraph id="par_d1e290" role="paragraph" xml-lang="en">Iterates over paragraph segments, to
        keep more inline formatting - but seriously harms mid-sentence translations. And currently the
        formatting still bleeds into adjoining paragraph segments, so it's not very useful in practice
        yet.</paragraph>
    </listitem>
    <listitem id="item_d1e294" xml-lang="en">
      <emph>debug mode</emph>
      <br/>
      <paragraph id="par_d1e299" role="paragraph" xml-lang="en">Will fill up the
        <item type="fileitem">/tmp/pagetranslate-libreoffice.txt</item> log file quicker. Currently
        the debug mode is enabled by default anyway.</paragraph>
    </listitem>
  </list>
</paragraph>
</body>
</helpdocument>
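The "quick linebreak handling" flag described in the help text joins several lines with the `"/#§/"` marker, sends them as one request, and splits the answer again. A minimal sketch of that idea follows; it assumes the backend hands the marker back unchanged, and both `translate_lines` and the upper-casing stand-in backend are hypothetical names for illustration only.

```python
MARKER = "/#§/"

def translate_lines(lines, translate):
    """Translate many short lines (e.g. table cells) with a single backend call."""
    joined = MARKER.join(lines)
    # assumption: the service returns the marker verbatim, so splitting recovers the lines
    return translate(joined).split(MARKER)

requests = []

def fake_translate(text):
    """Stand-in for a real backend: records the request and upper-cases the text."""
    requests.append(text)
    return text.upper()

cells = translate_lines(["one", "two", "three"], fake_translate)
print(cells, len(requests))
```

With three input lines only one request reaches the stand-in backend, which is where the speed-up for tables would come from.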
Changes to pythonpath/translationbackends.py.
︙ | ︙ | |||
# Registration is broken (error 10040 or whatever, "contact support" lel), even though
# it seems to create an account regardless; but API yields SSL or connection errors.
# Thus STILL UNTESTED.
#
class deepl_free_api(deepl_api):

    def __init__(self, params):
        self.params = params
        self.api_url = "https://api.deepl.com/v2/translate"


# Translate-python
# requires `pip install translate`
#
#  · provides "microsoft" backend (requires OAuth secret in api_key)
#
︙ | ︙ | |||
        try:
            from translate import Translator
        except ImportError:
            log.error(format_exc())
            raise Exception("Run `pip install translate` to use this module.")

        # interestingly this backend function might just work as is.
        if re.search("mymemory", params.get("backend", ""), re.I):
            self.translate = Translator(
                provider="mymemory", to_lang=params["lang"], email=params.get("email", "")
            ).translate
        else:
            self.translate = Translator(
                provider="microsoft", to_lang=params["lang"], secret_access_key=params["api_key"]
            ).translate
︙ | ︙ | |||
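The hunk above picks the translate-python provider by searching `params["backend"]` for "mymemory" and otherwise falls back to "microsoft". The selection logic can be tried without installing the `translate` package by returning only the keyword arguments; `translator_kwargs` is a hypothetical helper for illustration, while the keyword names are the ones used in the diff.

```python
import re

def translator_kwargs(params):
    """Return the keyword arguments the hunk would pass to translate.Translator."""
    if re.search("mymemory", params.get("backend", ""), re.I):
        # the email is optional, so a missing key falls back to an empty string
        return dict(provider="mymemory", to_lang=params["lang"], email=params.get("email", ""))
    # every other backend name ends up here and needs params["api_key"]
    return dict(provider="microsoft", to_lang=params["lang"], secret_access_key=params["api_key"])

print(translator_kwargs({"backend": "MyMemory", "lang": "de"}))
print(translator_kwargs({"backend": "Microsoft Translator", "lang": "fr", "api_key": "secret"}))
```

Note that the fallback branch raises a `KeyError` when no `api_key` is configured, since it indexes the dictionary directly instead of using `.get()`.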
    #linebreakwise = None


# deep-translator
# requires `pip install deep-translator`
#  · more backends than pytranslate,
#    though PONS+Linguee are just dictionaries
#  → https://github.com/nidhaloff/deep-translator
#
class deep_translator(google):

    def __init__(self, params={}):
        # config+argparse
        self.params = params
        backend = params.get("backend", "Pons")
        source = self.coarse_lang(params.get("from", "auto"))
        target = self.coarse_lang(params.get("lang", "en"))
        # import
        import functools
        import deep_translator
        # map to backends / uniform decorators
        if re.search("linguee", backend, re.I):
            self.translate = self.from_words(
                deep_translator.LingueeTranslator(source=source, target=target).translate
            )
        elif re.search("pons", backend, re.I):
            self.translate = self.from_words(
                deep_translator.PonsTranslator(source=source, target=target).translate
            )
        elif re.search("QCRI", backend, re.I):
            self.translate = functools.partial(
                deep_translator.QCRI(params["api_key"]).translate, source=source, target=target
            )
        elif re.search("yandex", backend, re.I):
            self.translate = functools.partial(
                deep_translator.YandexTranslator(params["api_key"]).translate, source=source, target=target
            )

    # shorten language co-DE to just two-letter moniker
    # (raw string avoids the invalid "\w" escape warning; "zh-..." codes stay untouched)
    def coarse_lang(self, id):
        if id.find("-") > 0:
            id = re.sub(r"(?<!zh)-\w+", "", id)
        return id

    # decorator to translate word-wise
    def from_words(self, fn):
        def translate(text):
            words = re.findall(r"(\w+)", text)
            words = { w: fn(w) for w in list(set(words)) }
            text = re.sub(r"(\w+)", lambda m: words.get(m[0], m[0]), text)
︙ | ︙ |
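The two helpers at the end of this hunk can be tried outside of LibreOffice. The sketch below restates them as plain functions; since the diff is cut off inside `from_words`, the two `return` statements are an assumption about how it ends, and the small lookup table merely stands in for a dictionary service such as Pons or Linguee.

```python
import re

def coarse_lang(lang):
    """Shorten e.g. "en-US" to "en", but leave "zh-CN" style codes untouched."""
    if lang.find("-") > 0:
        lang = re.sub(r"(?<!zh)-\w+", "", lang)
    return lang

def from_words(fn):
    """Wrap a single-word lookup so that it can process longer text word by word."""
    def translate(text):
        # look up each distinct word only once
        words = {w: fn(w) for w in set(re.findall(r"\w+", text))}
        # assumed ending: substitute every word and return the rebuilt text
        return re.sub(r"\w+", lambda m: words.get(m[0], m[0]), text)
    return translate

lookup = {"good": "gut", "morning": "Morgen"}  # stand-in for a dictionary backend
word_wise = from_words(lambda w: lookup.get(w, w))
print(coarse_lang("en-US"), coarse_lang("zh-CN"), word_wise("good morning!"))
```

Punctuation and whitespace survive because only `\w+` runs are replaced, which also shows why the result is a word-for-word gloss rather than a readable translation.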