<?xml version="1.0"?>
<?xml-stylesheet href="./mallard2xhp.xsl" type="text/xsl"?>
  <!--%origin%/help/en/vnd.include-once.pagetranslate/-->
<!-- PageTranslate: LibreOffice plugin to pipe whole Writer documents through Google Translate, that ought to keep most of the page formatting.
     File help/en/vnd.include-once.pagetranslate/config.page, artifact 828a96105e1e188ff96323cf683c770055f0a0cf,
     part of check-in [2606a632a2] at 2022-10-19 12:43:49 on branch trunk ("DDG was merged into tb"; user: mario, size: 13508) -->
<page
    xmlns="http://projectmallard.org/1.0/"
    type="topic" group="first"
    id="config">

<info>
    <link type="guide" xref="OptionsPageTranslate"/>
    <link type="index" xref="translation; pagetranslate; options"/>
    <desc>PageTranslate settings</desc>
</info>

<title>Translation settings</title>
<p>The options page can be found under <guiseq><gui>Tools</gui>
<gui>Options</gui> <gui>🗔</gui> <gui>Language Settings</gui>
<gui>PageTranslate</gui></guiseq>.
Alternatively, use the shortcut <guiseq><gui>Tools</gui><gui>PageTranslate</gui><gui>Options</gui></guiseq>.
<media type="image" src="https://fossil.include-once.org/pagetranslate/raw/d35739991b9e453b352ca83e44c2f7f5c7383927?m=image/png" mime="image/png" />
</p>

<section id="service">
  <title>Translation service to use</title>
  <p>Machine translation quality varies widely between services, which is why
  PageTranslate provides multiple backends, both for choice and as fallbacks:</p>
  <terms>
    <item>
      <title>Google Translate</title>
      <p><link href="https://translate.google.com/">Google Translate</link>
      is the default option, and suitable for both text selections and
      whole pages.  It provides pretty good machine translations.
      Longer texts incur some delay, as they have to be transferred in
      chunks of at most 1900 characters (sentences/paragraphs) each; this is
      managed automatically, with no user interaction necessary.</p>
    </item>
    <item>
      <title>MyMemory</title>
      <p>For <link href="https://mymemory.translated.net/">MyMemory</link>
      you should specify an email address in the corresponding input box
      (it's optional, but unlocks more requests).  It no longer requires the
      python-translate module, but does need <file>langdetect</file> (to supply
      the correct source language).  Because of that, it sometimes fails and
      may require the Tools → PageTranslate → From ➜ To option.
      Its machine translations aren't quite as good, but it's an open
      source service.</p>
    </item>
    <item>
      <title>PONS Text Translation</title>
      <p>With <link href="https://en.pons.com/text-translation">Pons</link>
      you can also translate whole text documents. This service however
      requires an explicit source language (From→To dialog). Autodetection
      in PageTranslate is somewhat frail. (Note that this is distinct from
      the Pons Dict support in deep-translator.)</p>
    </item>
    <item>
      <title>Command line tool</title>
      <p>Sends each text paragraph to a local application.  To use it, set
      the command in the corresponding <gui>Command</gui> input field.  Placeholders
      are <code>{lang}</code> for the target language, and <code>{text}</code> for the
      paragraphs or current text section (both get escaped automatically).  For
      <cmd>translate-cli</cmd> you might need the <var>-p</var> provider
      option or a prepared <file>~/.python-translate.cfg</file> for API keys.
      </p>
    </item>
    <item>
      <title>ArgosTranslate (OpenNMT)</title>
      <p>ArgosTranslate is an offline translation library based on
      CTranslate2 and OpenNMT models. It's thus independent from online
      services and connections, but requires prior setup. Specifically
      you need to run <cmd>pip3 install argos-translate</cmd> and
      <cmd>argos-translate-gui</cmd> to download language packs beforehand.
      The integrated backend usually only works with LibreOffice installations
      provided by Linux distribution package managers (due to the way the
      bundled Python is configured).  You can use the command-line tool in
      any case, however.  This backend may be slower for long documents,
      but provides fairly good results.
      </p>
    </item>
    <item>
      <title>DeepL API</title>
      <p>Uses the faster <link href="https://www.deepl.com/pro">DeepL
      Pro API</link> to translate documents.  Not yet tested.  Requires
      an API key and a paid subscription.  There's no XML mode (to retain full
      inline formatting) yet, so it still translates each text
      segment/paragraph/sentence individually.</p>
    </item>
    <item>
      <title>DeepL Free API</title>
      <p>You can now get a free API key for limited usage (500K characters
      per month - around 1 or 2 documents per day).  This secondary API
      might not be as well maintained.  Signup also still requires a credit
      card (you can use one of the privacy-focused or temporary online credit
      card services).</p>
    </item>
    <item>
      <title>DeepL web interface</title>
      <p>Scrapes the <link
      href="https://www.deepl.com/translator/">DeepL online
      translator</link>.  Only suitable for testing, and for translating single
      paragraphs or text selections, because it quickly blocks with "error
      429 - too many requests".  It's also somewhat redundant now that there's
      a Free API option.</p>
    </item>
    <item>
      <title>GoogleApis Ajax Translate</title>
      <p>Basically just an alternative endpoint for Google Translate, which
      might even work faster or more reliably thanks to its JSON/AJAX interface.
      (But it might just as well get blocked sooner for clients like this; it's
      built in merely as another fallback option.)</p>
    </item>
    <item>
      <title>SYSTRAN translate Pro API</title>
      <p>Systran is an established machine-translation service, which also
      offers various APIs. Unfortunately the test keys are useless for actual
      testing, so it's unclear whether this backend works at all.</p>
    </item>
    <item>
      <title>LibreTranslate</title>
      <p>LibreTranslate is a web service built on ArgosTranslate/OpenNMT. The
      <link href="https://libretranslate.de/">free version</link>
      only permits a handful of requests before blocking (requests
      from PageTranslate cycle across alternative instances). The
      <link href="https://libretranslate.com/">paid option</link>
      might not be worth the money compared to running a
      <link href="https://github.com/LibreTranslate/LibreTranslate">local instance</link>.
      (The config dialog has an option for e.g. localhost:5000.)</p>
    </item>
    <item>
      <title>DuckDuckGo</title>
      <p>DDG provides a compact interface for
      <link href="https://duckduckgo.com/?q=translate">text translations</link>,
      which uses Microsoft Translator behind the
      scenes, but with an added privacy proxy/instance from DDG.
      It might not cope with excessive queries or long documents, but is
      certainly an interesting alternative.</p>
    </item>
  </terms>
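  <p>The 1900-character transfer limit mentioned for Google Translate can be
  pictured with a minimal Python sketch.  It assumes chunks are cut after
  sentence-ending punctuation and that overlong sentences are hard-split;
  <code>chunk_text</code> is a hypothetical helper for illustration, not
  PageTranslate's actual implementation.</p>

```python
import re
from typing import List

MAX_CHARS = 1900  # per-request limit mentioned for Google Translate


def chunk_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into consecutive pieces of at most `limit` characters.

    Hypothetical helper: prefers breaks after sentence-ending punctuation
    and hard-splits any single sentence longer than `limit`.
    """
    # The capturing group keeps the ". " separators in the result list,
    # so joining everything back together reproduces the input exactly.
    pieces = re.split(r"([.!?]+\s+)", text)
    sentences = ["".join(pieces[i:i + 2]) for i in range(0, len(pieces), 2)]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Flush the current chunk if this sentence would overflow it
        if current and len(current) + len(sentence) > limit:
            chunks.append(current)
            current = ""
        # A single sentence longer than the limit gets cut into fixed slices
        while len(sentence) > limit:
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        current += sentence
    if current:
        chunks.append(current)
    return chunks
```

  <p>Each chunk would then go out as a separate request, and the translated
  pieces get concatenated again afterwards.</p>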
  <p>Some are provided via <link href="https://pypi.org/project/translate/">translate-python</link> (TP):</p>
  <terms>
    <item>
      <title>Microsoft Translator</title>
      <p>Requires an authorization key. There's also a free/test <link
      href="https://azure.microsoft.com/en-us/pricing/details/cognitive-services/translator/">subscription
      for an API key</link>.  Not tested within PageTranslate yet.</p>
    </item>
  </terms>
  <p>And more via <link href="https://pypi.org/project/deep-translator/">deep-translator</link> (DT). These won't work in OpenOffice 4.x due to its Python 2.7 runtime:</p>
  <terms>
    <item>
      <title>Yandex Translation</title> <p>Also requires its own <link
      href="https://translate.yandex.com/">API key</link> (it's unclear whether
      you can still get one, though). It's supposed to support automatic language
      detection, and provides a vast range of target languages.</p>
    </item>
    <item>
      <title>QCRI Machine Translation</title>
      <p>Requires a <link href="https://mt.qcri.org/api/">free API
      key</link>, and only supports Arabic/Spanish/English translations.
      Also doesn't support auto-detection, and probably needs the From-To
      selection.</p>
    </item>
    <item>
      <title>Papago Web Translator</title>
      <p>Might be based on DeepL. Requires a client_id and secret_key in the
      API key field, separated by a colon (e.g. <cmd>c123:pw678</cmd>).</p>
    </item>
    <item>
      <title>Linguee Dictionary</title>
      <p>Performs word-wise <link
      href="https://www.linguee.com/">translation</link> lookups, so it's not
      suitable for translating whole documents, only text selections.
      PageTranslate will split up sentences and pipe each word through the
      service, but that won't yield a readable machine translation.
      </p>
    </item>
    <item>
      <title>Pons Dictionary</title>
      <p>Is also more of a <link href="https://de.pons.com/">dictionary</link>
      than a translation service.  Suitable for text-selections, but
      probably not paragraphs or whole documents.  PageTranslate will
      split-process longer selections word-wise through the Pons Translation
      interface.</p>
    </item>
  </terms>
  <p>Some DT: entries duplicate backends listed above, and can be used as fallbacks in case of errors.</p>
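  <p>Using duplicate entries as fallbacks boils down to trying one backend
  after another until one succeeds.  A minimal Python sketch, assuming each
  backend is wrapped as a plain function taking the text and target language;
  <code>translate_with_fallback</code>, <code>blocked</code> and
  <code>echo</code> are hypothetical names, not PageTranslate's internals.</p>

```python
from typing import Callable, List, Tuple

# A backend is modelled as (title, function(text, target_lang) returning the translation)
Backend = Tuple[str, Callable[[str, str], str]]


def translate_with_fallback(text: str, lang: str, backends: List[Backend]) -> str:
    """Return the first successful translation, trying backends in order."""
    errors = []
    for title, translate in backends:
        try:
            return translate(text, lang)
        except Exception as exc:  # e.g. HTTP 429 blocks or network errors
            errors.append(f"{title}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


def blocked(text: str, lang: str) -> str:
    # Hypothetical backend that always fails, like a rate-limited service
    raise ConnectionError("429 too many requests")


def echo(text: str, lang: str) -> str:
    # Hypothetical stand-in backend that just tags the text
    return f"[{lang}] {text}"
```

  <p>In such a setup the list order decides priority, so a preferred service
  would come first and its DT: duplicate further down.</p>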
</section>

<section id="params">
  <title>Parameters</title>
  <terms>
    <item>
      <title>API key</title>
      <p>You can set an API or OAuth key for services that require one.  The
      same input field serves for all backends, so you can't switch between
      them without also changing this entry first.  (Having multiple API
      subscriptions isn't a common use case, really.  Ideally this would use
      <file>.netrc</file>, but nobody cares about standardization anymore.)</p>
    </item>
    <item>
      <title>Email adr</title>
      <p>An email address is only required by MyMemory.  And strictly
      speaking it's not even required; it just allows for more
      translations.</p>
    </item>
    <item>
      <title>Command</title>
      <p>This field defines the CLI tool to use for translating. Placeholders
      can be written as {text} in curly braces, or in shell $lang or %from% percent
      syntax. The Python
      <link href="https://pypi.org/project/translate/">translate</link>,
      <link href="https://pypi.org/project/deep-translator/">deep-translator</link> and
      <link href="https://pypi.org/project/argostranslate/">argos-translate</link>
      packages provide CLI wrappers, each with a sample configuration in the combobox
      dropdown.</p>
    </item>
  </terms>
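  <p>How placeholder substitution with automatic escaping might work can be
  sketched in Python.  This assumes all three notations ({text}, $lang,
  %from%) map to the same values, and that "escaping" means POSIX shell
  quoting via the standard <code>shlex.quote</code>; <code>build_command</code>
  and <code>run_command</code> are hypothetical names, not PageTranslate's code.</p>

```python
import re
import shlex
import subprocess

# Matches {name}, $name and %name% placeholders in a single pass
PLACEHOLDER = re.compile(r"\{(\w+)\}|\$(\w+)|%(\w+)%")


def build_command(template: str, text: str, lang: str, src: str = "auto") -> str:
    """Fill in a command template, shell-quoting every substituted value."""
    values = {"text": text, "lang": lang, "from": src}

    def substitute(match):
        name = match.group(1) or match.group(2) or match.group(3)
        if name in values:
            return shlex.quote(values[name])
        return match.group(0)  # leave unknown placeholders untouched

    # One pass, so placeholder-like strings inside `text` are not expanded again
    return PLACEHOLDER.sub(substitute, template)


def run_command(template: str, text: str, lang: str, src: str = "auto") -> str:
    """Run the filled-in command through the shell and return its stdout."""
    cmd = build_command(template, text, lang, src)
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

  <p>With this sketch, the template translate-cli -t {lang} {text} and the
  text Hello world would turn into translate-cli -t de 'Hello world' for a
  German target.</p>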
</section>

<section id="flags">
  <title>Options / Flags</title>
  <terms>
    <item>
      <title>❏ quick linebreak handling</title>
      <p>Might speed up table processing with Google Translate, as it avoids sending each newline-split sentence separately.
      It simply joins multiple lines temporarily with <cmd>"/#§/"</cmd> in place of each
      linebreak (and restores the linebreaks afterwards), so there are fewer requests. Primarily helps with
      tables, less so with documents that have lengthy paragraphs.</p>
    </item>
    <item>
      <title>❏ also iterate over TextFrames</title>
      <p>Handles normal and floating TextFrames. Those are essentially subdocuments in a Writer page.
      But you probably don't need this option for standard office documents.</p>
    </item>
    <item>
      <title>❏ slow mode</title>
      <p>Iterates over paragraph segments to keep more inline formatting, but this seriously harms the translation of sentences split across segments.
      And currently the formatting still bleeds into adjoining paragraph segments, so it's not very useful in practice yet.</p>
    </item>
    <item>
      <title>❏ selection-only mode</title>
      <p>Disables whole-document translation. This turns PageTranslate into the original Tradutor plugin,
      which can be useful to avoid long waits on big documents when only a partial translation is wanted.
      It simply triggers a warning if no paragraph selection is active.</p>
    </item>
    <item>
      <title>❏ original text annotations</title>
      <p>This option injects lots of comments holding the original text of translated sections. There's usually one
      per paragraph or table cell, but long lines may get several comment markers (every 1900 characters, if the API breakup requires it).
      </p>
    </item>
    <item>
      <title>☑ debug mode</title>
      <p>Fills up the <file>/tmp/pagetranslate-libreoffice.txt</file> log file more quickly.
      Currently debug mode is enabled by default anyway.</p>
    </item>
  </terms>
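  <p>The effect of quick linebreak handling on the number of requests can be
  illustrated with a small Python sketch.  It assumes the translation service
  is a plain function and that the "/#§/" marker survives translation
  unchanged; <code>translate_quick</code> and <code>translate_per_line</code>
  are hypothetical names, not PageTranslate's code.</p>

```python
from typing import Callable

MARKER = "/#§/"  # temporary stand-in for linebreaks, as described above


def translate_per_line(text: str, translate: Callable[[str], str]) -> str:
    """Default behaviour: one request per newline-split line."""
    return "\n".join(translate(line) for line in text.split("\n"))


def translate_quick(text: str, translate: Callable[[str], str]) -> str:
    """Quick mode: join lines with MARKER, send one request, restore linebreaks.

    Assumes the service passes MARKER through untouched; a real service
    might alter or re-space it, which this sketch does not handle.
    """
    return translate(text.replace("\n", MARKER)).replace(MARKER, "\n")
```

  <p>For a table cell with three lines, the first variant issues three
  requests and the second just one.</p>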
</section>

<section id="flag_action">
  <title>&#x1F3F4; button default behaviour/target language</title>
  <terms>
    <item>
      <title>locale</title>
      <p> By default, uses the Office/system language as the target. </p>
    </item>
    <item>
      <title>paragraph</title>
      <p> Uses the "paragraph" locale as set in the Writer/language status bar. </p>
    </item>
    <item>
      <title>select</title>
      <p> Always brings up the explicit From→To🗺 language selection popup (useful for MyMemory or Pons backends).</p>
    </item>
    <item>
      <title>en, de, it, fr, ...</title>
      <p> You can set this field to any two-letter language code, to be used as the default target. </p>
    </item>
    <item>
      <title>backend=GoogleAjax&amp;from=auto&amp;lang=es</title>
      <p> Since v2.1, you can also set this to a combination of arguments, in query string format (key=value pairs joined with &amp;).
      Arguments can be the target lang=XY, the backend=DuckDuck (fuzzy title matching), or even the quick=1 mode or selectonly=1 flags. </p>
    </item>
  </terms>
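  <p>Parsing such a combined setting could look roughly like the following
  Python sketch.  It uses the standard <code>urllib.parse.parse_qsl</code> for
  the key=value pairs; the subsequence-based backend matching, the
  <code>BACKENDS</code> title list and all function names are assumptions for
  illustration, since the actual fuzzy matching in PageTranslate isn't
  documented here.</p>

```python
import re
from typing import Dict, Optional, Sequence
from urllib.parse import parse_qsl

# Hypothetical subset of backend titles, taken from the service list above
BACKENDS = ("Google Translate", "GoogleApis Ajax Translate", "MyMemory",
            "DeepL Free API", "LibreTranslate", "DuckDuckGo")


def _normalize(name: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def _is_subsequence(needle: str, haystack: str) -> bool:
    """True if all characters of needle appear in haystack, in order."""
    chars = iter(haystack)
    return all(ch in chars for ch in needle)


def match_backend(name: str, titles: Sequence[str] = BACKENDS) -> Optional[str]:
    """Assumed fuzzy matching: shortest title containing `name` as a subsequence."""
    wanted = _normalize(name)
    hits = [t for t in titles if _is_subsequence(wanted, _normalize(t))]
    return min(hits, key=len) if hits else None


def parse_flag_setting(value: str) -> Dict[str, object]:
    """Turn the button setting into a dict of options."""
    if "=" not in value:
        # Plain value: locale, paragraph, select, or a language code like "de"
        return {"target": value}
    opts: Dict[str, object] = dict(parse_qsl(value))
    if "backend" in opts:
        opts["backend"] = match_backend(str(opts["backend"]))
    for flag in ("quick", "selectonly"):
        if flag in opts:
            opts[flag] = opts[flag] == "1"
    return opts
```

  <p>With this sketch, backend=GoogleAjax&amp;from=auto&amp;lang=es resolves
  the backend to "GoogleApis Ajax Translate" and keeps from=auto and lang=es
  as given.</p>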
</section>

</page>