LibreOffice plugin that pipes whole Writer documents through Google Translate and aims to keep most of the page formatting.

⌈⌋ ⎇ branch:  PageTranslate


Check-in [c436e44ae3]


Overview
Comment: Restructure Addons.xcu to prepare for additional toolbar buttons. Fix =="auto" check. Wrap initial document checks into try/catch (for yet unsupported Draw docs).
SHA1: c436e44ae3b2bd380232b66be22ad5398137f2c3
User & Date: mario 2020-05-04 13:32:42
Context
2020-05-04
13:52
Use gettempdir() for log file. check-in: 5611c1345a user: mario tags: trunk, 0.9
13:32
Restructure Addons.xcu to prepare for additional toolbar buttons. Fix =="auto" check. Wrap initial document checks into try/catch (for yet unsupported Draw docs). check-in: c436e44ae3 user: mario tags: trunk
2020-05-03
19:02
More language options in Extra> menu, explicit support for paragraph/system language targets; to unify option handling for textselection method. check-in: 4e60f20156 user: mario tags: trunk
Changes

Changes to META-INF/manifest.xml.

Old version (lines 1-9):

<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest>
 <manifest:file-entry manifest:full-path="pkg-desc/pkg-description.txt" manifest:media-type="application/vnd.sun.star.package-bundle-description"/>
 <manifest:file-entry manifest:full-path="pkg-desc/pkg-description.en" manifest:media-type="application/vnd.sun.star.package-bundle-description;locale=en"/>
 <manifest:file-entry manifest:media-type="application/vnd.sun.star.configuration-data"
                       manifest:full-path="registry/data/org/openoffice/Office/Addons.xcu"/>
  <manifest:file-entry manifest:media-type="application/vnd.sun.star.uno-component;type=Python" manifest:full-path="pagetranslate.py"/>
  <manifest:file-entry manifest:media-type="image/png" manifest:full-path="icons/flags.png"/>
</manifest:manifest>


New version (lines 1-7):

<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest>
 <manifest:file-entry manifest:media-type="application/vnd.sun.star.package-bundle-description" manifest:full-path="pkg-desc/pkg-description.txt"/>

 <manifest:file-entry manifest:media-type="application/vnd.sun.star.configuration-data" manifest:full-path="registry/data/org/openoffice/Office/Addons.xcu"/>

 <manifest:file-entry manifest:media-type="application/vnd.sun.star.uno-component;type=Python" manifest:full-path="pagetranslate.py"/>
 <manifest:file-entry manifest:media-type="image/png" manifest:full-path="icons/flags.png"/>
</manifest:manifest>

Changes to description.xml.

Old version (lines 1-23):

<?xml version='1.0' encoding='UTF-8'?>
<description
 xmlns="http://openoffice.org/extensions/description/2006"
 xmlns:dep="http://openoffice.org/extensions/description/2006"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<identifier value="vnd.include-once.pagetranslate"/>
	<version value="0.8"/>
	  <display-name>
        <name lang="en">PageTranslate</name>
    </display-name>
	
	<dependencies>
		<OpenOffice.org-minimal-version value="3.0" dep:name="OpenOffice.org 3.0"/>
	</dependencies>
	<registration>
	<simple-license  accept-by="admin" default-license-id="en" suppress-on-update="true" suppress-if-required="true" >
		<license-text xlink:href="registration/lgpl-en.txt" lang="en" license-id="en" />
	</simple-license>
	</registration>
	<publisher>                                                          
		<name xlink:href="mailto:milky@users.sf.net" lang="en">Mario</name>
	</publisher>



</description>






New version (lines 1-26):

<?xml version='1.0' encoding='UTF-8'?>
<description
 xmlns="http://openoffice.org/extensions/description/2006"
 xmlns:dep="http://openoffice.org/extensions/description/2006"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<identifier value="vnd.include-once.pagetranslate"/>
	<version value="0.9"/>
	  <display-name>
        <name lang="en">PageTranslate</name>
    </display-name>
	
	<dependencies>
		<OpenOffice.org-minimal-version value="3.0" dep:name="OpenOffice.org 3.0"/>
	</dependencies>
	<registration>
	<simple-license  accept-by="admin" default-license-id="en" suppress-on-update="true" suppress-if-required="true" >
		<license-text xlink:href="registration/lgpl-en.txt" lang="en" license-id="en" />
	</simple-license>
	</registration>
	<publisher>                                                          
		<name xlink:href="mailto:milky@users.sf.net" lang="en">Mario</name>
	</publisher>
        <icon>
		<default xlink:href="icons/flags.png" />
	</icon>
</description>
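
The version bump to 0.9 and the new icon entry can be checked mechanically. The following is a minimal sketch, assuming a trimmed copy of the new description.xml held in a hypothetical `SAMPLE` string; the helper name `read_description` is illustrative and not part of the extension.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in description.xml
NS = {"d": "http://openoffice.org/extensions/description/2006"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Trimmed, hypothetical excerpt of the new description.xml
SAMPLE = """<description
 xmlns="http://openoffice.org/extensions/description/2006"
 xmlns:xlink="http://www.w3.org/1999/xlink">
  <identifier value="vnd.include-once.pagetranslate"/>
  <version value="0.9"/>
  <icon>
    <default xlink:href="icons/flags.png" />
  </icon>
</description>"""


def read_description(xml_text):
    """Return identifier, version and icon path from a description.xml string."""
    root = ET.fromstring(xml_text)
    icon = root.find("d:icon/d:default", NS)
    return {
        "identifier": root.find("d:identifier", NS).get("value"),
        "version": root.find("d:version", NS).get("value"),
        # The icon element is optional (the 0.8 file had none)
        "icon": icon.get(XLINK_HREF) if icon is not None else None,
    }


print(read_description(SAMPLE))
```

Such a check could run before packaging the .oxt to confirm that the version in description.xml matches the `# version:` header in pagetranslate.py.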

Added icons/flags.png.

cannot compute difference between binary files

Added icons/flags_16.bmp.

cannot compute difference between binary files

Added icons/flags_26.bmp.

cannot compute difference between binary files

Changes to pagetranslate.py.

Old version (lines 1-28):

#!/usr/bin/python
# encoding: utf-8
# api: uno
# type: callback
# category: transform
# title: PageTranslate
# description: Action button to get whole Writer document translated
# version: 0.8
# state: experimental
# author: mario
# forked-from: TradutorLibreText (Claudemir de Almeida Rosa)
# license: GNU LGPL 2.1

# config: -
# 
# LibreOffice plugin for translating documents that's supposed to retain formatting.
# In contrast to the original extension does not use the text selection.
# But currently just translates anything to English. It's also incredibly slow,
# since all paragraphs are piped through the remote service one by one.
#
# Possibly going to reenable the cursor/selection mode later on. Perhaps even
# the language configurability.
#


# OpenOffice UNO bridge
import uno, unohelper
from com.sun.star.task import XJobExecutor
from com.sun.star.awt.MessageBoxButtons import BUTTONS_OK, BUTTONS_OK_CANCEL, BUTTONS_YES_NO, BUTTONS_YES_NO_CANCEL, BUTTONS_RETRY_CANCEL, BUTTONS_ABORT_IGNORE_RETRY




New version (lines 1-35):

#!/usr/bin/python
# encoding: utf-8
# api: uno
# type: callback
# category: language
# title: PageTranslate
# description: Action button to get whole Writer document translated
# version: 0.9
# state: beta
# author: mario
# url: https://fossil.include-once.org/pagetranslate/
# license: GNU LGPL 2.1
# forked-from: TradutorLibreText (Claudemir de Almeida Rosa)
# config: -
# 
# LibreOffice plugin for translating documents that's supposed to retain formatting.
# Per default does not require a text selection to operate, but works on the whole
# page. The original mode (TradutorLibreText) is still supported. But also uses the
# default target language (English).
#
# Additional operation modes/languages are available through the Extra> add-on menu.
#


# Beware that Writer freezes during the dozens of translation calls to Google.
# In particular long documents might take ages, because each paragraph/line or
# text longer than 1900 chars causes another roundtrip.
#
# Not yet tested with Draw or other document types.
# Always creates a log file: /tmp/pagetranslate-libreoffice.log
#


# OpenOffice UNO bridge
import uno, unohelper
from com.sun.star.task import XJobExecutor
from com.sun.star.awt.MessageBoxButtons import BUTTONS_OK, BUTTONS_OK_CANCEL, BUTTONS_YES_NO, BUTTONS_YES_NO_CANCEL, BUTTONS_RETRY_CANCEL, BUTTONS_ABORT_IGNORE_RETRY
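
The header comment names a fixed log path under /tmp, and the follow-up check-in 5611c1345a is described as "Use gettempdir() for log file". A minimal sketch of that idea, assuming the same log file name; the helper names `log_path` and `setup_logging` are hypothetical and not taken from the extension:

```python
import logging
import os
import tempfile


def log_path(name="pagetranslate-libreoffice.log"):
    """Build the log file path inside the platform's temp directory."""
    return os.path.join(tempfile.gettempdir(), name)


def setup_logging():
    """Point the root logger at the temp-dir log file (not called on import)."""
    logging.basicConfig(filename=log_path(), level=logging.DEBUG)


print(log_path())
```

On Linux this usually resolves to the same /tmp location as before, while on Windows it falls back to the user's temp folder instead of a path that may not exist.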

Old version (lines 42-66):

if sys.platform != 'win32':
    import ssl
    myssl = ssl.create_default_context();
    myssl.check_hostname = False
    myssl.verify_mode = ssl.CERT_NONE
    ssl_args["context"] = myssl
http_headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux; LibreOffice/6.3; TradutorLibreText/1.3+PageTranslate/0.5)"
}
# log file
import logging
logging.basicConfig(filename='/tmp/pagetranslate-libreoffice.log', level=logging.DEBUG)
# regex
import re
rx_gtrans = re.compile('class="t0">(.+?)</div>', re.S)
rx_splitpara = re.compile("(.{1,1895\.}|.{1,1900}\s|.*$)", re.S)
rx_empty = re.compile("^[\s\d,.:;§():-]+$")
rx_letters = re.compile("\w\w+", re.UNICODE)





# Office plugin
class pagetranslate(unohelper.Base, XJobExecutor):








New version (lines 49-74):

if sys.platform != 'win32':
    import ssl
    myssl = ssl.create_default_context();
    myssl.check_hostname = False
    myssl.verify_mode = ssl.CERT_NONE
    ssl_args["context"] = myssl
http_headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux; LibreOffice/6.3), TradutorLibreText/1.3+PageTranslate/0.9"
}
# log file
import logging
logging.basicConfig(filename='/tmp/pagetranslate-libreoffice.log', level=logging.DEBUG)
# regex
import re
rx_gtrans = re.compile('class="t0">(.+?)</div>', re.S)
rx_splitpara = re.compile("(.{1,1895\.}|.{1,1900}\s|.*$)", re.S)
rx_empty = re.compile("^[\s\d,.:;§():-]+$")
rx_letters = re.compile("\w\w+", re.UNICODE)
rx_breakln = re.compile("\s?/\s?#\s?§\s?/\s?")




# Office plugin
class pagetranslate(unohelper.Base, XJobExecutor):
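
The `rx_splitpara` pattern is what breaks long paragraphs into chunks of roughly 1900 characters. The listing writes its first alternative as `.{1,1895\.}`; the sketch below assumes the intent was `.{1,1895}\.` (cut after a period when one is in range, else after whitespace, else take the rest). The function name `split_segments` and the small `limit` used for the demonstration are illustrative assumptions, not part of the extension.

```python
import re


def split_segments(text, limit=1900):
    """Split text into chunks, preferring sentence ends, then whitespace."""
    pattern = re.compile(
        r"(.{1,%d}\.|.{1,%d}\s|.*$)" % (limit - 5, limit), re.S
    )
    # findall() also yields one empty match at the very end; drop it
    return [segment for segment in pattern.findall(text) if segment]


# Tiny limit so the splitting is visible on a short string
parts = split_segments("One two. Three four five. Six", limit=12)
print(parts)
```

Joining the returned segments reproduces the input unchanged. A stretch without any period or whitespace in range falls through to the last alternative, so a single chunk can still exceed the limit.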

Old version (lines 92-149):

        else:
            logging.warning("NO TRANSLATION RESULT EXTRACTED: " + html)
            logging.debug("ORIG TEXT: " + repr(text))
        return text

    # iterate over text segments (1900 char limit)        
    def translate(self, text, lang="auto"):
        if lang == auto:
            lang = self.params["lang"]
        #logging.debug("translate %d chars" % len(text))
        if len(text) < 2:
            logging.debug("skipping/len<2")
            return text
        elif rx_empty.match(text):
            logging.debug("skipping/empty")
            return text
        elif not rx_letters.search(text):
            logging.debug("skipping/noletters")
            return text
        elif len(text) >= 1900:
            logging.debug("spliterate/1900+")
            return " ".join(self.askgoogle(segment, lang) for segment in rx_splitpara.findall(text))
        else:
            return self.askgoogle(text, lang)
            
    # translate w/ preserving paragraph breaks (meant for table cell content)
    def linebreakwise(self, text, lang="auto"):
        return "\n\n".join(self.translate(text, lang) for text in text.split("\n\n"))
        # alternatively, use a temp placeholder '/#/'


    # invoked from toolbar button
    def trigger(self, args):
        logging.debug(".trigger(args=%s) invoked" % repr(args))
        self.argparse(args)
        # check for text selection, and switch to TradutorLibreText method then

        selection = self.document.getCurrentController().getSelection().getByIndex(0)
        if len(selection.getString()):
            return self.rewrite_selection(selection)
        # else iterate over text snippets
        tree = self.document.getText().createEnumeration()
        logging.info("TextDocument.Enumeration…")
        try:
            self.traverse(tree)
        except Exception as exc:
            logging.error(format_exc())
            self.MessageBox(format_exc())
        logging.info("----")

    
    # break up UNO service: url query string (-> perhaps switch to JSON?)
    def argparse(self, args):
        # leading ?action&
        self.params["mode"] = re.findall("^(\w*)(?=&|$)", args)[0]
        # key=value pairs
        for pair in re.findall("(\w+)=([\w-]+)", args):
            self.params[pair[0]] = pair[1]
        # replace default locale
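
The check-in comment mentions a fix to the `=="auto"` check. In the old `translate` method the comparison is written against the bare name `auto`, which Python resolves as a variable rather than a string. A minimal sketch of the difference, using standalone functions with a hypothetical `default` parameter standing in for `self.params["lang"]`:

```python
def pick_lang_buggy(lang, default="en"):
    # `auto` is an undefined name here, so this line raises NameError when run
    if lang == auto:  # noqa: F821
        lang = default
    return lang


def pick_lang_fixed(lang, default="en"):
    # Comparing against the string literal behaves as intended
    if lang == "auto":
        lang = default
    return lang


try:
    pick_lang_buggy("auto")
except NameError as exc:
    print("old check fails:", exc)

print(pick_lang_fixed("auto"), pick_lang_fixed("de"))
```

Because the name is only looked up when the line executes, the old code imports cleanly and fails on the first call to `translate`.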







New version (lines 100-157):

        else:
            logging.warning("NO TRANSLATION RESULT EXTRACTED: " + html)
            logging.debug("ORIG TEXT: " + repr(text))
        return text

    # iterate over text segments (1900 char limit)        
    def translate(self, text, lang="auto"):
        if lang == "auto":
            lang = self.params["lang"]
        #logging.debug("translate %d chars" % len(text))
        if len(text) < 2:
            logging.debug("skipping/len<2")
            return text
        elif rx_empty.match(text):
            logging.debug("skipping/empty")
            return text
        elif not rx_letters.search(text):
            logging.debug("skipping/noletters")
            return text
        elif len(text) >= 1900:
            logging.debug("spliterate/1900+")
            return " ".join(self.askgoogle(segment, lang) for segment in rx_splitpara.findall(text))
        else:
            return self.askgoogle(text, lang)
            
    # translate w/ preserving paragraph breaks (meant for table cell content)
    def linebreakwise(self, text, lang="auto"):
        return "\n\n".join(self.translate(text, lang) for text in text.split("\n\n"))
        # alternatively, use a temp placeholder '/#§/'


    # invoked from toolbar button
    def trigger(self, args):
        logging.debug(".trigger(args=%s) invoked" % repr(args))
        self.argparse(args)
        # check for text selection, and switch to TradutorLibreText method then
        try:
            selection = self.document.getCurrentController().getSelection().getByIndex(0)
            if len(selection.getString()):
                return self.rewrite_selection(selection)
            # else iterate over text snippets
            tree = self.document.getText().createEnumeration()
            logging.info("TextDocument.Enumeration…")

            self.traverse(tree)
        except Exception as exc:
            logging.error(format_exc())
            self.MessageBox(format_exc())
        logging.info("----")

    
    # break up UNO service: url query string `.pagetranslate?page&lang=en`
    def argparse(self, args):
        # leading ?action&
        self.params["mode"] = re.findall("^(\w*)(?=&|$)", args)[0]
        # key=value pairs
        for pair in re.findall("(\w+)=([\w-]+)", args):
            self.params[pair[0]] = pair[1]
        # replace default locale
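
The `argparse` method splits the part after `?` in the service URL into a mode word and key=value pairs. A standalone sketch using the same two regular expressions; the function name `parse_args` and the `defaults` parameter are assumptions for illustration, since the method in the listing writes into `self.params` instead:

```python
import re


def parse_args(args, defaults=None):
    """Parse 'mode&key=value&...' into a dict, mirroring the listing's regexes."""
    params = dict(defaults or {})
    # Leading action word, terminated by '&' or end of string
    params["mode"] = re.findall(r"^(\w*)(?=&|$)", args)[0]
    # Remaining key=value pairs; values may contain hyphens (e.g. pt-BR)
    for key, value in re.findall(r"(\w+)=([\w-]+)", args):
        params[key] = value
    return params


print(parse_args("trigger&lang=en"))
print(parse_args("page&lang=pt-BR", defaults={"lang": "en"}))
```

An argument string without a leading mode word, such as `lang=en`, gives no match for the first pattern, so the `[0]` index raises `IndexError`. The URLs in Addons.xcu always start with a mode word (`trigger`), so that case would only arise with hand-written URLs.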

Changes to registry/data/org/openoffice/Office/Addons.xcu.

Old version (lines 1-31):

<?xml version="1.0" encoding="UTF-8"?>
<oor:component-data xmlns:oor="http://openoffice.org/2001/registry" xmlns:xs="http://www.w3.org/2001/XMLSchema" oor:name="Addons" oor:package="org.openoffice.Office">
  <node oor:name="AddonUI">

    <node oor:name="OfficeMenuBarMerging">
      <node oor:name="PageTranslate.OfficeToolBar" oor:op="replace">
        <node oor:name="S1" oor:op="replace">
          <prop oor:name="MergePoint">
            <value>.uno:ToolsMenu\.uno:WordCountDialog</value>
          </prop>
          <prop oor:name="MergeCommand">
            <value>AddAfter</value>
          </prop>
          <prop oor:name="MergeFallback">
            <value>AddPath</value>
          </prop>
          <node oor:name="MenuItems">
            <node oor:name="M1" oor:op="replace">
              <prop oor:name="Title">
                <value xml:lang="en">PageTranslate</value>
                <!--value xml:lang="pt-BR">TradutorLibreText</value-->
              </prop>
              <prop oor:name="ImageIdentifier" oor:type="xs:string">
                <value>%origin%/icons/flags.png</value>
              </prop>
              <node oor:name="Submenu">
                <node oor:name="M2" oor:op="replace">
                  <prop oor:name="Context" oor:type="xs:string"><value/></prop>
                  <prop oor:name="URL" oor:type="xs:string"><value>service:org.openoffice.comp.pyuno.pagetranslate?trigger&amp;lang=en</value></prop>
                  <prop oor:name="Title" oor:type="xs:string"><value/><value xml:lang="en-US">→English</value></prop>
                  <prop oor:name="Target" oor:type="xs:string"><value>_self</value></prop>
                </node>



New version (lines 1-25):

<?xml version="1.0" encoding="UTF-8"?>
<oor:component-data xmlns:oor="http://openoffice.org/2001/registry" xmlns:xs="http://www.w3.org/2001/XMLSchema" oor:name="Addons" oor:package="org.openoffice.Office">
  <node oor:name="AddonUI">

    <node oor:name="OfficeMenuBarMerging">
      <node oor:name="PageTranslate.OfficeToolBar" oor:op="replace">
        <node oor:name="S1" oor:op="replace">
          <prop oor:name="MergePoint"><value>.uno:ToolsMenu\.uno:WordCountDialog</value></prop>


          <prop oor:name="MergeCommand"><value>AddAfter</value></prop>


          <prop oor:name="MergeFallback"><value>AddPath</value></prop>


          <node oor:name="MenuItems">
            <node oor:name="M1" oor:op="replace">
              <prop oor:name="Title">
                <value xml:lang="en">PageTranslate</value>

              </prop>
              <!--prop oor:name="ImageIdentifier" oor:type="xs:string">
                <value>name.io.pagetranslate.image1</value>
              </prop-->
              <node oor:name="Submenu">
                <node oor:name="M2" oor:op="replace">
                  <prop oor:name="Context" oor:type="xs:string"><value/></prop>
                  <prop oor:name="URL" oor:type="xs:string"><value>service:org.openoffice.comp.pyuno.pagetranslate?trigger&amp;lang=en</value></prop>
                  <prop oor:name="Title" oor:type="xs:string"><value/><value xml:lang="en-US">→English</value></prop>
                  <prop oor:name="Target" oor:type="xs:string"><value>_self</value></prop>
                </node>

Old version (lines 55-119):

                </node>
              </node>
            </node>
          </node>
        </node>
      </node>
    </node>

    <node oor:name="OfficeToolbarMerging">
      <node oor:name="org.openoffice.test.testcomponent" oor:op="replace">
        <node oor:name="T1" oor:op="replace">
          <prop oor:name="MergeToolBar">
            <value>standardbar</value>
          </prop>
          <prop oor:name="MergePoint">
            <value>.uno:Forms</value>
          </prop>
          <prop oor:name="MergeCommand">
            <value>AddAfter</value>
          </prop>
          <prop oor:name="MergeFallback">
            <value>AddLast</value>
          </prop>
          <prop oor:name="MergeContext">
            <value/>
          </prop>
          <node oor:name="ToolBarItems">
            <node oor:name="m1" oor:op="replace">
              <prop oor:name="URL" oor:type="xs:string">
                <value>private:separator</value>
              </prop>
            </node>
            <node oor:name="m2" oor:op="replace">
              <prop oor:name="URL" oor:type="xs:string">
                <value>service:org.openoffice.comp.pyuno.pagetranslate?trigger&amp;lang=en</value>
              </prop>
              <prop oor:name="ImageIdentifier" oor:type="xs:string">
                <value/>
              </prop>
              <prop oor:name="Target" oor:type="xs:string">
                <value>_self</value>
              </prop>
              <prop oor:name="Context" oor:type="xs:string">
                <!--value/-->
                <value>com.sun.star.text.TextDocument,com.sun.star.drawing.DrawingDocument</value>
              </prop>
              <prop oor:name="ControlType" oor:type="xs:string">
                <value>Checkbutton</value>
              </prop>
              <prop oor:name="Title" oor:type="xs:string">
                <value/>


                <value xml:lang="pt-BR">Traduzir</value>
                <value xml:lang="en-US">T→🇬🇧 </value>
              </prop>
            </node>




            <node oor:name="m3" oor:op="replace">

              <prop oor:name="URL" oor:type="xs:string">




                <value>private:separator</value>
              </prop>


            </node>
          </node>
        </node>
      </node>
    </node>
  </node>
</oor:component-data>







New version (lines 49-103):

                </node>
              </node>
            </node>
          </node>
        </node>
      </node>
    </node>

    <node oor:name="OfficeToolbarMerging">
      <node oor:name="org.openoffice.test.testcomponent" oor:op="replace">
        <node oor:name="T1" oor:op="replace">
          <prop oor:name="MergeToolBar"><value>standardbar</value></prop>


          <prop oor:name="MergePoint"><value>.uno:Forms</value></prop>


          <prop oor:name="MergeCommand"><value>AddAfter</value></prop>


          <prop oor:name="MergeFallback"><value>AddLast</value></prop>


          <prop oor:name="MergeContext"><value/></prop>


          <node oor:name="ToolBarItems">
            <node oor:name="T2" oor:op="replace">
              <prop oor:name="URL" oor:type="xs:string"><value>private:separator</value></prop>


            </node>
            <node oor:name="T3" oor:op="replace">

              <prop oor:name="URL" oor:type="xs:string"><value>service:org.openoffice.comp.pyuno.pagetranslate?trigger&amp;lang=en</value></prop>

              <prop oor:name="ImageIdentifier" oor:type="xs:string"><value></value></prop>


              <prop oor:name="Target" oor:type="xs:string"><value>_self</value></prop>


              <prop oor:name="Context" oor:type="xs:string"><value>com.sun.star.text.TextDocument,com.sun.star.drawing.DrawingDocument</value></prop>



              <prop oor:name="ControlType" oor:type="xs:string"><value>Checkbutton</value></prop>


              <prop oor:name="Title" oor:type="xs:string"><value xml:lang="en-US">T→🇬🇧</value></prop>
            </node>
             <!--🇪🇺 -->
            <node oor:name="T5" oor:op="replace">
              <prop oor:name="URL" oor:type="xs:string"><value>private:separator</value></prop>

            </node>
          </node>
        </node>
      </node>
    </node>

    <node oor:name="Images">
      <node oor:name="name.io.pagetranslate.image1" oor:op="replace">
        <prop oor:name="URL">
            <value>service:org.openoffice.comp.pyuno.pagetranslate?trigger&amp;lang=en</value>
        </prop>
        <node oor:name="UserDefinedImages">
            <prop oor:name="ImageSmallURL" oor:type="xs:string">
                <value>%origin%/icons/flags_16.png</value>
            </prop>
            <prop oor:name="ImageBigURL" oor:type="xs:string">
                <value>%origin%/icons/flags_26.png</value>
            </prop>
        </node>
      </node>
    </node>

  </node>
</oor:component-data>
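
With the properties collapsed onto single lines, each toolbar item becomes a small set of name/value pairs. The following sketch reads them back with the standard library; the `SAMPLE` string is a trimmed, hypothetical excerpt of the restructured file, and `node_props` is an illustrative helper, not part of the extension.

```python
import xml.etree.ElementTree as ET

OOR = "http://openoffice.org/2001/registry"
NAME_ATTR = "{%s}name" % OOR

# Trimmed, hypothetical excerpt of the restructured Addons.xcu
SAMPLE = """<oor:component-data xmlns:oor="http://openoffice.org/2001/registry" xmlns:xs="http://www.w3.org/2001/XMLSchema" oor:name="Addons" oor:package="org.openoffice.Office">
  <node oor:name="AddonUI">
    <node oor:name="OfficeToolbarMerging">
      <node oor:name="T3" oor:op="replace">
        <prop oor:name="URL" oor:type="xs:string"><value>service:org.openoffice.comp.pyuno.pagetranslate?trigger&amp;lang=en</value></prop>
        <prop oor:name="Target" oor:type="xs:string"><value>_self</value></prop>
        <prop oor:name="ImageIdentifier" oor:type="xs:string"><value></value></prop>
      </node>
    </node>
  </node>
</oor:component-data>"""


def node_props(xml_text, node_name):
    """Return {prop name: value text} for the first <node> with the given oor:name."""
    root = ET.fromstring(xml_text)
    for node in root.iter("node"):
        if node.get(NAME_ATTR) == node_name:
            return {
                prop.get(NAME_ATTR): prop.findtext("value", default="")
                for prop in node.findall("prop")
            }
    return {}


props = node_props(SAMPLE, "T3")
print(props["URL"])
```

The parser decodes `&amp;` to `&`, so the URL comes back in the form that is later handed to `trigger`.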