LibreOffice plugin to pipe whole Writer documents through Google Translate, that ought to keep most of the page formatting.



options

Translation settings

The options page can be found under Tools ▸ Options ▸ 🗔 ▸ Language Settings ▸ PageTranslate.
Or via the shortcut in Tools ▸ PageTranslate ▸ Options.

Translation service to use

Machine translations can vary wildly between services, which is why PageTranslate provides multiple backends to choose from, and to fall back on:

Google Translate

Google Translate is the default option, and suitable for both text selections and whole pages. It provides pretty good machine translations. It incurs some delay for longer texts, as the text has to be transferred in chunks of at most 1900 characters (split at sentences/paragraphs; managed automatically, no user interaction necessary).
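The chunking can be pictured roughly like this. It is only a sketch of the idea, not the splitting code PageTranslate actually uses; the function name split_chunks and the choice of break points (sentence end, line break, then space) are assumptions for illustration.

```python
def split_chunks(text, limit=1900):
    """Split text into pieces of at most `limit` characters (illustrative only)."""
    chunks = []
    while len(text) > limit:
        window = text[:limit]
        # Prefer the last sentence end or line break inside the window ...
        cut = max(window.rfind(". "), window.rfind("\n"))
        if cut <= 0:
            # ... then the last space ...
            cut = window.rfind(" ")
        if cut <= 0:
            # ... and cut hard if there is no break point at all.
            cut = limit - 1
        chunks.append(text[:cut + 1])
        text = text[cut + 1:]
    if text:
        chunks.append(text)
    return chunks
```

Each chunk would then be one request to the service, which is where the delay for long documents comes from.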

MyMemory

For MyMemory you should specify an email address in the corresponding input box (it's optional, but unlocks more requests). It no longer requires the python-translate module, but does need langdetect (for supplying the correct source language). Which is why it sometimes fails, and possibly requires the Tools → PageTranslate → From ➜ To option. Doesn't yield quite as good machine translations. But it's an open source service.

PONS Text Translation

With Pons you can also translate whole text documents. This service however requires an explicit source language (From→To dialog). Autodetection in PageTranslate is somewhat frail. (Note that this is distinct from the Pons Dict support in deep-translator.)

Command line tool

Allows sending each text paragraph to a local application. To use it, set the command in the corresponding input field. Placeholders are {lang} for the target language, and {text} for the paragraphs or current text section. (Both get escaped automatically.) For translate-cli you might need the -p provider option or a prepared ~/.python-translate.cfg for API keys.
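To illustrate what filling in and escaping the placeholders amounts to, here is a small sketch. The helper build_command and the my-translator command line are hypothetical; PageTranslate's own escaping may work differently.

```python
import shlex

def build_command(template, text, lang):
    """Fill {text} and {lang} into a command template, shell-quoting both values."""
    return template.format(text=shlex.quote(text), lang=shlex.quote(lang))

# Hypothetical command template, as it might be typed into the input field.
command = build_command("my-translator --to {lang} {text}", "it's a test", "de")
```

Quoting matters here because paragraphs routinely contain apostrophes, quotes or semicolons that a shell would otherwise interpret.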

ArgosTranslate (OpenNMT)

ArgosTranslate is an offline translation library based on CTranslate2 and OpenNMT models. It's thus independent of online services and connections, but requires prior setup. Specifically you need to run pip3 install argos-translate and argos-translate-gui to download language packs beforehand. This usually only works with LibreOffice installations provided through Linux distro package managers (due to the way the bundled Python is configured). You can use the command line tool in any case, however. Notably this backend might be slower for long documents, but provides fairly good results.

DeepL API

Utilizes the speedier DeepL Pro API to translate documents. As of yet untested. Requires an API key and paid subscription. No XML mode (to retain full inline formatting) yet, still translates each text segment/paragraph/sentence individually.

DeepL Free API

You can now get a free API key for limited usage (500K characters per month - around 1 or 2 documents per day). This secondary API might not be as well maintained. And signup still requires a credit card (use one of the privacy or temporary online credit card services).

DeepL web interface

Utilizes web scraping on the DeepL online translator. Only suitable for testing and translating single paragraphs or text selection, because it quickly blocks with "error 429 - too many requests". It's also kinda redundant now that there's a Free API option.

GoogleApis Ajax Translate

Is basically just an alternative endpoint for Google Translate, which due to JSON/AJAX might work faster or more reliably even. (But it might just as well get blocked sooner for clients like this. This is built in merely as another fallback option.)

SYSTRAN translate Pro API

Systran is an established machine-translation service, which also offers various APIs. Unfortunately the test keys are worthless for testing; so not sure if this backend works at all.

Some are provided via translate-python (TP):

Microsoft Translator

Requires an authorization key. There's also a free/test subscription for an API key. Not tested within PageTranslate yet.

And more via deep-translator (DT). These won't work in OpenOffice 4.x due to its Python 2.7 runtime:

Yandex Translation

Also requires its own API key. (Unclear if you can still get one though). It's supposed to support automatic language detection, and provides a vast range of target languages.

QCRI Machine Translation

Requires a free API key, and only supports Arabic/Spanish/English translations. Also doesn't support auto-detection, and probably needs the From-To selection.

Papago Web Translator

Might be based on DeepL. Requires a client_id and secret_key in the API field - separated by colon [c123:pw678].
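A minimal sketch of how such a combined entry could be taken apart, assuming the value is typed without the surrounding brackets. The function name split_papago_key is made up for illustration and is not part of PageTranslate.

```python
def split_papago_key(api_key):
    """Split a 'client_id:secret_key' entry into its two parts (hypothetical helper)."""
    client_id, sep, secret_key = api_key.strip().partition(":")
    if not sep or not client_id or not secret_key:
        raise ValueError("expected client_id:secret_key")
    return client_id, secret_key
```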

Linguee Dictionary

Performs word-wise translation lookups, so it's not suitable for translating whole documents, only text selections. PageTranslate will split up sentences and pipe each word through the service, but that won't yield a readable machine translation.

Pons Dictionary

Also is more of a dictionary than a translation service. Suitable for text-selections, but probably not paragraphs or whole documents. PageTranslate will split-process longer selections word-wise through the Pons Translation interface.
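Word-wise processing for the dictionary backends boils down to something like the following. This is a simplified sketch with an invented lookup callback standing in for the actual service; it also shows why the result is not a real translation, since word order and grammar are left untouched.

```python
def translate_wordwise(text, lookup):
    """Look up each whitespace-separated word on its own and rejoin the results."""
    return " ".join(lookup(word) for word in text.split())

# Stand-in for a dictionary service: unknown words are passed through unchanged.
sample_dict = {"good": "gut", "morning": "Morgen"}
result = translate_wordwise("good morning everyone", lambda w: sample_dict.get(w, w))
```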

Some DT: entries are duplicates, and could be used as fallback / in case of errors.

Parameters

API key

You can set an API or OAuth key for services that require one. The same input field serves for all backends, so you can't switch between them without also changing this entry first. (Not a common use case to have multiple API subscriptions really. And ideally this would utilize .netrc, but nobody cares about standardization anymore.)

Email adr

An email address is only required by MyMemory. And strictly speaking it's not even required; it just allows for more translations.

Command

This field defines the CLI tool to use for translating. Placeholders can be written with {text} curly braces, or in shell $lang or %from% percent syntax. The Python translate, deep-translator and argos-translate packages provide CLI wrappers, each with a sample configuration in the combobox dropdown.
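How PageTranslate treats the three notations internally isn't documented here. One way to picture it is to rewrite the shell and percent forms into the curly-brace form first. The helper below is an assumption for illustration, limited to the three placeholder names mentioned on this page.

```python
import re

# Matches $text, $lang, $from as well as %text%, %lang%, %from%.
PLACEHOLDER = re.compile(r"\$(text|lang|from)\b|%(text|lang|from)%")

def normalize_placeholders(command):
    """Rewrite $name and %name% placeholders into {name} (hypothetical helper)."""
    return PLACEHOLDER.sub(lambda m: "{" + (m.group(1) or m.group(2)) + "}", command)
```

After that, a single str.format call can fill in all values regardless of which notation was typed.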

Options / Flags

☐ quick linebreak handling

Might speed up table processing with Google Translate, as it avoids sending each newline-split sentence separately. It simply joins multiple lines temporarily with "/#§/" in place of a linebreak (and restores the linebreaks afterwards), so there are fewer requests. Primarily helps with tables, but less so for documents with lengthy paragraphs.
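In code the trick looks roughly like this. It's a sketch that assumes the service hands the marker back unchanged; the translate argument is a placeholder for whichever backend call is in use.

```python
MARKER = "/#§/"

def translate_lines(text, translate):
    """Send a multi-line text as one request by swapping linebreaks for a marker."""
    joined = text.replace("\n", MARKER)
    translated = translate(joined)
    # Put the original linebreaks back where the marker survived.
    return translated.replace(MARKER, "\n")
```

A table cell with five short lines thus costs one request instead of five, which is where the speedup comes from.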

☐ also iterate over TextFrames

Handles normal and floating TextFrames. Those are essentially subdocuments in a Writer page. But you probably don't need this option for standard office documents.

☐ super slow mode

Iterates over paragraph segments, to keep more inline formatting - but seriously harms mid-sentence translations. And currently the formatting still bleeds into adjoining paragraph segments, so not very useful in practice yet.

☑ debug mode

Will fill up the /tmp/pagetranslate-libreoffice.txt log file quicker. Currently the debug mode is enabled by default anyway.

🏴 button default behaviour/target language

locale

By default uses the Office/system language as target.

paragraph

Uses the "paragraph" locale as set in the Writer/language status bar.

select

Always brings up the explicit From→To 🗺 language selection popup (useful for MyMemory or Pons backends).

en, de, it, fr, ...

You can set this field to any two-letter language code - to be used as default target.

mri-debug

Requires the MRI extension, and brings up an introspection dialog on the document when invoked.
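The language-related values of this field can be summarized as a small decision function. This is only a sketch: resolve_target and its parameters are invented names, and mri-debug is left out because it opens an introspection dialog rather than picking a language.

```python
import re

def resolve_target(setting, office_lang, paragraph_lang):
    """Map the button setting to a target language code (hypothetical helper).

    Returns None when the From→To dialog should be shown instead.
    """
    if setting == "locale":
        return office_lang       # Office/system language
    if setting == "paragraph":
        return paragraph_lang    # language of the current paragraph
    if setting == "select":
        return None              # caller brings up the selection popup
    if re.fullmatch(r"[a-z]{2}", setting):
        return setting           # explicit two-letter code such as "de"
    raise ValueError("unsupported setting: " + setting)
```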


Attachments:

  • options.png added by mario on 2021-05-13 18:48:54.