LibreOffice plugin that pipes whole Writer documents through Google Translate and aims to keep most of the page formatting.



Check-in [c03c78cbb6]


Overview
Comment: Add preliminary support for Draw/Impress documents. Second toolbar button for translation to system language.
SHA1: c03c78cbb623c485a14d9f147cb5ea752c89abe8
User & Date: mario 2020-05-06 17:08:55
Context
2020-05-08
19:47
version update before zip check-in: a11baf6627 user: mario tags: trunk
2020-05-06
17:08
Add preliminary support for Draw/Impress documents. Second toolbar button for translation to system language. check-in: c03c78cbb6 user: mario tags: trunk, 1.0
17:04
Addons.xcu image references require `vnd.sun.star.extension://vnd.include-once.pagetranslate/` instead of just `%origin%` to actually get picked up. Also, PNGs work just fine, no BMP required. Add secondary button to translate to system locale/language. check-in: b1775e90db user: mario tags: trunk
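The `lang=locale` handling in `argparse` (further down in this check-in) is presumably what the second toolbar button relies on; that link is an assumption, since the Addons.xcu entry itself is not shown here. A minimal standalone sketch of the query-string parsing, with `parse_args`, `defaults` and `get_locale` as hypothetical stand-ins for the method, `self.params` and `getOoLocale()`:

```python
import re

def parse_args(args, defaults, get_locale):
    """Split a string such as "page&lang=en" into a params dict."""
    params = dict(defaults)  # copy, so the defaults stay untouched
    # leading action word, e.g. "page" in "page&lang=en"
    params["mode"] = re.findall(r"^(\w*)(?=&|$)", args)[0]
    # key=value pairs
    for key, value in re.findall(r"(\w+)=([\w-]+)", args):
        params[key] = value
    # "locale" is swapped for whatever the callback reports
    if params.get("lang") == "locale":
        params["lang"] = get_locale()
    return params

if __name__ == "__main__":
    defaults = {"mode": "page", "lang": "en", "crlf": "iterate"}
    # the lambda stands in for the real locale lookup
    print(parse_args("page&lang=locale", defaults, lambda: "de"))
```

Note that, like the original expression, this raises an `IndexError` when the string does not start with a bare action word (for example `lang=en` on its own).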
Changes

Changes to description.xml.

<?xml version='1.0' encoding='UTF-8'?>
<description
 xmlns="http://openoffice.org/extensions/description/2006"
 xmlns:dep="http://openoffice.org/extensions/description/2006"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<identifier value="vnd.include-once.pagetranslate"/>
	<version value="0.9"/>
	  <display-name>
        <name lang="en">PageTranslate</name>
    </display-name>
	
	<dependencies>
		<OpenOffice.org-minimal-version value="3.0" dep:name="OpenOffice.org 3.0"/>
	</dependencies>
	<registration>
	<simple-license  accept-by="admin" default-license-id="en" suppress-on-update="true" suppress-if-required="true" >
		<license-text xlink:href="registration/lgpl-en.txt" lang="en" license-id="en" />
	</simple-license>
	</registration>
	<publisher>                                                          
		<name xlink:href="mailto:milky@users.sf.net" lang="en">Mario</name>
	</publisher>
        <icon>
		<default xlink:href="icons/flags.png" />
	</icon>
</description>

<?xml version="1.0" encoding="UTF-8"?>

<description xmlns="http://openoffice.org/extensions/description/2006" xmlns:dep="http://openoffice.org/extensions/description/2006" xmlns:xlink="http://www.w3.org/1999/xlink">


  <identifier value="vnd.include-once.pagetranslate"/>
  <version value="1.0"/>
  <display-name>
    <name lang="en">PageTranslate</name>
  </display-name>

  <dependencies>
    <OpenOffice.org-minimal-version value="3.0" dep:name="OpenOffice.org 3.0"/>
  </dependencies>
  <registration>
    <simple-license accept-by="admin" default-license-id="en" suppress-on-update="true" suppress-if-required="true">
      <license-text xlink:href="registration/lgpl-en.txt" lang="en" license-id="en"/>
    </simple-license>
  </registration>
  <publisher>
    <name xlink:href="mailto:milky@users.sf.net" lang="en">Mario</name>
  </publisher>
  <icon>
    <default xlink:href="icons/flags.png"/>
  </icon>
</description>

Deleted icons/flags_16.bmp.

cannot compute difference between binary files

Added icons/flags_16.png.

cannot compute difference between binary files

Deleted icons/flags_26.bmp.

cannot compute difference between binary files

Added icons/flags_26.png.

cannot compute difference between binary files

Changes to pagetranslate.py.

#!/usr/bin/python
# encoding: utf-8
# api: uno
# type: callback
# category: language
# title: PageTranslate
# description: Action button to get whole Writer document translated
# version: 0.9
# state: beta
# author: mario
# url: https://fossil.include-once.org/pagetranslate/
# license: GNU LGPL 2.1
# forked-from: TradutorLibreText (Claudemir de Almeida Rosa)
# config: -
# 







#!/usr/bin/python
# encoding: utf-8
# api: uno
# type: callback
# category: language
# title: PageTranslate
# description: Action button to get whole Writer document translated
# version: 1.0
# state: beta
# author: mario
# url: https://fossil.include-once.org/pagetranslate/
# license: GNU LGPL 2.1
# forked-from: TradutorLibreText (Claudemir de Almeida Rosa)
# config: -
# 





    myssl.check_hostname = False
    myssl.verify_mode = ssl.CERT_NONE
    ssl_args["context"] = myssl
http_headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux; LibreOffice/6.3), TradutorLibreText/1.3+PageTranslate/0.9"
}
# log file
import logging
logging.basicConfig(filename='%s/pagetranslate-libreoffice.log'%gettempdir(), level=logging.DEBUG)
# regex
import re
rx_gtrans = re.compile('class="t0">(.+?)</div>', re.S)
rx_splitpara = re.compile("(.{1,1895\.}|.{1,1900}\s|.*$)", re.S)
rx_empty = re.compile("^[\s\d,.:;§():-]+$")
rx_letters = re.compile("\w\w+", re.UNICODE)
rx_breakln = re.compile("\s?/\s?#\s?§\s?/\s?")




# Office plugin
class pagetranslate(unohelper.Base, XJobExecutor):


    def __init__(self, ctx):
        logging.info("init")
        self.params = {"mode":"page", "lang":"en"}
        self.ctx = ctx
        desktop = self.ctx.ServiceManager.createInstanceWithContext( "com.sun.star.frame.Desktop", self.ctx )
        self.document = desktop.getCurrentComponent()
        #self.dispatcher = self.ctx.ServiceManager.createInstanceWithContext("com.sun.star.frame.DispatchHelper", self.ctx)


    # request text translation from google







    myssl.check_hostname = False
    myssl.verify_mode = ssl.CERT_NONE
    ssl_args["context"] = myssl
http_headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux; LibreOffice/6.3), TradutorLibreText/1.3+PageTranslate/0.9"
}
# log file
import logging as log
log.basicConfig(filename='%s/pagetranslate-libreoffice.log'%gettempdir(), level=log.DEBUG)
# regex
import re
rx_gtrans = re.compile('class="t0">(.+?)</div>', re.S)
rx_splitpara = re.compile("(.{1,1895\.}|.{1,1900}\s|.*$)", re.S)
rx_empty = re.compile("^[\s\d,.:;§():-]+$")
rx_letters = re.compile("\w\w+", re.UNICODE)
rx_breakln = re.compile("\s?/\s?#\s?§\s?/\s?")




# Office plugin
class pagetranslate(unohelper.Base, XJobExecutor):


    def __init__(self, ctx):
        log.info("init")
        self.params = dict(
            mode = "page",
            lang = "en",
            crlf = "iterate",
            log = "debug"
        )
        self.ctx = ctx
        desktop = self.ctx.ServiceManager.createInstanceWithContext( "com.sun.star.frame.Desktop", self.ctx )
        self.document = desktop.getCurrentComponent()
        #self.dispatcher = self.ctx.ServiceManager.createInstanceWithContext("com.sun.star.frame.DispatchHelper", self.ctx)


    # request text translation from google
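The `rx_splitpara` pattern in this hunk is meant to cut long text into pieces below the 1900-character limit that `translate` checks for. Python's `re` apparently does not read `{1,1895\.}` as a quantifier, because the escaped dot sits inside the braces, so the sentence-end alternative would never apply. A sketch of what the pattern presumably intends (sentence end first, then whitespace, then a hard cut); `split_segments`, `rx_split` and `MAX_LEN` are hypothetical names, not part of the plugin:

```python
import re

# Assumed limit, taken from the "1900 char limit" comment in the source.
MAX_LEN = 1900

# Hypothetical corrected pattern: prefer a sentence end, then whitespace,
# then fall back to a hard cut. Every alternative matches at most
# MAX_LEN characters.
rx_split = re.compile(r"(.{1,1895}\.|.{1,1899}\s|.{1,1900})", re.S)

def split_segments(text):
    """Return chunks of at most MAX_LEN characters that rejoin to `text`."""
    return rx_split.findall(text)

if __name__ == "__main__":
    sample = "A sentence. " * 300
    parts = split_segments(sample)
    print(len(parts), max(len(p) for p in parts))
```

Because the chunks keep their trailing whitespace, they concatenate back to the input unchanged; joining translated chunks with an extra `" "`, as the source does, may therefore double some spaces.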


        # extract content from text <div>
        m = rx_gtrans.search(html)
        if m:
            text = m.group(1)
            text = text.replace("&#39;", "'").replace("&amp;", "&").replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", '"')
            #@todo: https://stackoverflow.com/questions/2087370/decode-html-entities-in-python-string
        else:
            logging.warning("NO TRANSLATION RESULT EXTRACTED: " + html)
            logging.debug("ORIG TEXT: " + repr(text))
        return text

    # iterate over text segments (1900 char limit)        
    def translate(self, text, lang="auto"):
        if lang == "auto":
            lang = self.params["lang"]
        #logging.debug("translate %d chars" % len(text))
        if len(text) < 2:
            logging.debug("skipping/len<2")
            return text
        elif rx_empty.match(text):
            logging.debug("skipping/empty")
            return text
        elif not rx_letters.search(text):
            logging.debug("skipping/noletters")
            return text
        elif len(text) >= 1900:
            logging.debug("spliterate/1900+")
            return " ".join(self.askgoogle(segment, lang) for segment in rx_splitpara.findall(text))
        else:
            return self.askgoogle(text, lang)
            
    # translate w/ preserving paragraph breaks (meant for table cell content)
    def linebreakwise(self, text, lang="auto"):
        return "\n\n".join(self.translate(text, lang) for text in text.split("\n\n"))

        # alternatively, use a temp placeholder '/#§/'

    # invoked from toolbar button
    def trigger(self, args):
        logging.debug(".trigger(args=%s) invoked" % repr(args))
        self.argparse(args)
        # check for text selection, and switch to TradutorLibreText method then
        try:
            selection = self.document.getCurrentController().getSelection().getByIndex(0)
            if len(selection.getString()):
                return self.rewrite_selection(selection)
            # else iterate over text snippets
            tree = self.document.getText().createEnumeration()
            logging.info("TextDocument.Enumeration…")
            self.traverse(tree)
        except Exception as exc:
            logging.error(format_exc())
            self.MessageBox(format_exc())
        logging.info("----")

    
    # break up UNO service: url query string `.pagetranslate?page&lang=en`
    def argparse(self, args):
        # leading ?action&
        self.params["mode"] = re.findall("^(\w*)(?=&|$)", args)[0]
        # key=value pairs
        for pair in re.findall("(\w+)=([\w-]+)", args):
            self.params[pair[0]] = pair[1]
        # replace default locale
        if self.params.get("lang","-") == "locale":
            self.params["lang"] = self.getOoLocale()
        # log
        logging.info(repr(self.params))


    # iterate over TextContent/TextTable nodes
    def traverse(self, tree):
        while tree.hasMoreElements():
            para = tree.nextElement()
            logging.info(para)
            # table/cells
            if para.supportsService("com.sun.star.text.TextTable"):
                for cellname in para.getCellNames():
                    logging.debug(cellname)
                    text = para.getCellByName(cellname).getText()
                    #self.traverse(text.createEnumeration())
                    text.setString(self.linebreakwise(text.getString())) # or .translate #linebreakwise
                pass
            # normal flow text
            elif para.supportsService("com.sun.star.text.TextContent"):
                text = para.getString()
                text = self.translate(text)
                para.setString(text)
                # the paragraph itself can be enumerated for text portions,
                # but for now it's really slow enough
            else:
                logging.warning("Unsupported document element.")
        #logging.info(para.getSelection().getByIndex(0).getString())
        pass


    # TradutorLibreText (selection rewrite)
    def rewrite_selection(self, xTextRange):
        logging.info("rewrite text selection")

        # Get selected text
        string = xTextRange.getString()
        if self.params["lang"] == "paragraph":
            self.params["lang"] = xTextRange.CharLocale.Language
        elif self.params["mode"] == "tradutor":
            code = self.getOoLocale()
            self.params["lang"] = self.getParaLang(xTextRange).Language

        try:
            trans = self.linebreakwise(string)
            trans = trans.replace('\\n',"\n").replace('\\r',"\n")
            xTextRange.setString(trans)

        except Exception as e:
            try:
                self.MessageBox(str(e))
            except Exception as e:
                logging.info(e)

    # Query system locale
    def getOoLocale(self):
        self.language = self.ctx.ServiceManager.createInstanceWithContext("com.sun.star.i18n.LocaleData", self.ctx)
        self.lang = self.ctx.ServiceManager.createInstanceWithContext("com.sun.star.configuration.ConfigurationProvider", self.ctx)
        properties = []
        arg = PropertyValue()
        arg.Name = "nodepath"
        arg.Value = "/org.openoffice.Setup/L10N"
        properties.append(arg)
        properties = tuple(properties)
        code = self.lang.createInstanceWithArguments("com.sun.star.configuration.ConfigurationAccess", properties).getByName("ooLocale")
        logging.info("ooLocale="+repr(code))
        return code

    # Langinfo=(com.sun.star.i18n.LanguageCountryInfo){ Language = (string)"de", LanguageDefaultName = (string)"German", Country = (string)"DE", CountryDefaultName = (string)"Germany", Variant = (string)"" }
    def getParaLang(self, xTextRange):
        Langinfo = self.language.getLanguageCountryInfo(xTextRange.CharLocale)
        logging.info("Langinfo="+repr(Langinfo))
        return Langinfo

    # user notifications
    def MessageBox(self,MsgText, MsgTitle="", MsgType=MESSAGEBOX, MsgButtons=BUTTONS_OK):
        ParentWin = self.document.getCurrentController().Frame.ContainerWindow
        ctx = uno.getComponentContext()
        sm = ctx.ServiceManager
        sv = sm.createInstanceWithContext("com.sun.star.awt.Toolkit", ctx)
        myBox = sv.createMessageBox(ParentWin, MsgType, MsgButtons, MsgTitle, MsgText)
        return myBox.execute()



# register with LibreOffice
g_ImplementationHelper = unohelper.ImplementationHelper()

g_ImplementationHelper.addImplementation(pagetranslate,

        "org.openoffice.comp.pyuno.pagetranslate",
        ("com.sun.star.task.Job",),)
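The `@todo` next to the chained `.replace()` calls in this listing points at decoding HTML entities generically. Under Python 3 (an assumption; the shebang only says `python`), the standard library's `html.unescape` covers that. A minimal sketch, where `extract_translation` and the sample markup are made up for illustration:

```python
import html
import re

# Same pattern as `rx_gtrans` in the source.
rx_gtrans = re.compile('class="t0">(.+?)</div>', re.S)

def extract_translation(page, fallback):
    """Return the decoded text of the first class="t0" div, else `fallback`."""
    m = rx_gtrans.search(page)
    if not m:
        return fallback
    # html.unescape handles named and numeric entities alike
    return html.unescape(m.group(1))

if __name__ == "__main__":
    # invented sample response, not real Google Translate output
    sample = '<div dir="ltr" class="t0">Tom &amp; Jerry&#39;s &quot;day&quot;</div>'
    print(extract_translation(sample, "untranslated"))
```

Returning the untranslated input as the fallback mirrors what the original method does when no result can be extracted.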








        # extract content from text <div>
        m = rx_gtrans.search(html)
        if m:
            text = m.group(1)
            text = text.replace("&#39;", "'").replace("&amp;", "&").replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", '"')
            #@todo: https://stackoverflow.com/questions/2087370/decode-html-entities-in-python-string
        else:
            log.warning("NO TRANSLATION RESULT EXTRACTED: " + html)
            log.debug("ORIG TEXT: " + repr(text))
        return text

    # iterate over text segments (1900 char limit)        
    def translate(self, text, lang="auto"):
        if lang == "auto":
            lang = self.params["lang"]
        #log.debug("translate %d chars" % len(text))
        if len(text) < 2:
            log.debug("skipping/len<2")
            return text
        elif rx_empty.match(text):
            log.debug("skipping/empty")
            return text
        elif not rx_letters.search(text):
            log.debug("skipping/noletters")
            return text
        elif len(text) >= 1900:
            log.debug("spliterate/1900+")
            return " ".join(self.askgoogle(segment, lang) for segment in rx_splitpara.findall(text))
        else:
            return self.askgoogle(text, lang)
            
    # translate w/ preserving paragraph breaks (meant for table cell content)
    def linebreakwise(self, text, lang="auto"):
        if self.params["crlf"] != "quick":
            # split on linebreaks and translate each individually
            text = "\n\n".join(self.translate(text, lang) for text in text.split("\n\n"))
        else:
            # use temporary placeholder `/#§/`
            text = self.translate(text.replace("\n\n", "/#§/"), lang)
            text = re.sub(rx_breakln, "\n\n", text)
        return text


    # invoked from toolbar button
    def trigger(self, args):
        log.debug(".trigger(args=%s) invoked" % repr(args))
        self.argparse(args)

        try:
            log.debug(dir(self.document))
            # Draw/Impress?
            if self.document.supportsService("com.sun.star.drawing.DrawingDocument") or self.document.supportsService("com.sun.star.presentation.PresentationDocument"):
                log.info(self.document)
                self.drawtranslate(self.document.getDrawPages())
                return
            # check for text selection, and switch to TradutorLibreText method then
            selection = self.document.getCurrentController().getSelection().getByIndex(0)
            if len(selection.getString()):
                return self.rewrite_selection(selection)
            # else iterate over text snippets
            tree = self.document.getText().createEnumeration()

            self.traverse(tree)
        except Exception as exc:
            log.error(format_exc())
            self.MessageBox(format_exc())
        log.info("----")

    
    # break up UNO service: url query string `.pagetranslate?page&lang=en`
    def argparse(self, args):
        # leading ?action&
        self.params["mode"] = re.findall("^(\w*)(?=&|$)", args)[0]
        # key=value pairs
        for pair in re.findall("(\w+)=([\w-]+)", args):
            self.params[pair[0]] = pair[1]
        # replace default locale
        if self.params.get("lang","-") == "locale":
            self.params["lang"] = self.getOoLocale()
        # log
        #log.basicConfig(level=log.__dict__[params["log"].upper()])
        log.info(repr(self.params))


    # iterate over TextContent/TextTable nodes
    def traverse(self, tree):
        log.info("TextDocument.Enumeration…")
        while tree.hasMoreElements():
            para = tree.nextElement()
            log.info(para)
            # table/cells
            if para.supportsService("com.sun.star.text.TextTable"):
                for cellname in para.getCellNames():
                    log.debug(cellname)
                    text = para.getCellByName(cellname).getText()
                    #self.traverse(text.createEnumeration())
                    text.setString(self.linebreakwise(text.getString())) # or .translate #linebreakwise
                pass
            # normal flow text
            elif para.supportsService("com.sun.star.text.TextContent"):
                text = para.getString()
                text = self.translate(text)
                para.setString(text)
                # the paragraph itself can be enumerated for text portions,
                # but for now it's really slow enough
            else:
                log.warning("Unsupported document element.")

    # iterate over DrawPages and TextShapes
    def drawtranslate(self, pages):
        for pi in range(0, pages.getCount()):
            page = pages.getByIndex(pi)
            for si in range(0, page.getCount()):
                shape = page.getByIndex(si)
                if shape.supportsService("com.sun.star.drawing.TextShape"):
                    log.info(shape)
                    shape.Text.setString(self.translate(shape.Text.getString()))


    # TradutorLibreText (selection rewrite)
    def rewrite_selection(self, xTextRange):
        log.info("rewrite text selection")

        # Get selected text
        string = xTextRange.getString()
        if self.params["lang"] == "paragraph":
            self.params["lang"] = xTextRange.CharLocale.Language
        elif self.params["mode"] == "tradutor":
            code = self.getOoLocale()
            self.params["lang"] = self.getParaLang(xTextRange).Language

        try:
            trans = self.linebreakwise(string)
            trans = trans.replace('\\n',"\n").replace('\\r',"\n")
            xTextRange.setString(trans)

        except Exception as e:
            try:
                self.MessageBox(str(e))
            except Exception as e:
                log.info(e)

    # Query system locale
    def getOoLocale(self):
        self.language = self.ctx.ServiceManager.createInstanceWithContext("com.sun.star.i18n.LocaleData", self.ctx)
        self.lang = self.ctx.ServiceManager.createInstanceWithContext("com.sun.star.configuration.ConfigurationProvider", self.ctx)
        properties = []
        arg = PropertyValue()
        arg.Name = "nodepath"
        arg.Value = "/org.openoffice.Setup/L10N"
        properties.append(arg)
        properties = tuple(properties)
        code = self.lang.createInstanceWithArguments("com.sun.star.configuration.ConfigurationAccess", properties).getByName("ooLocale")
        log.info("ooLocale="+repr(code))
        return code

    # Langinfo=(com.sun.star.i18n.LanguageCountryInfo){ Language = (string)"de", LanguageDefaultName = (string)"German", Country = (string)"DE", CountryDefaultName = (string)"Germany", Variant = (string)"" }
    def getParaLang(self, xTextRange):
        Langinfo = self.language.getLanguageCountryInfo(xTextRange.CharLocale)
        log.info("Langinfo="+repr(Langinfo))
        return Langinfo

    # user notifications
    def MessageBox(self,MsgText, MsgTitle="", MsgType=MESSAGEBOX, MsgButtons=BUTTONS_OK):
        ParentWin = self.document.getCurrentController().Frame.ContainerWindow
        ctx = uno.getComponentContext()
        sm = ctx.ServiceManager
        sv = sm.createInstanceWithContext("com.sun.star.awt.Toolkit", ctx)
        myBox = sv.createMessageBox(ParentWin, MsgType, MsgButtons, MsgTitle, MsgText)
        return myBox.execute()



# register with LibreOffice
g_ImplementationHelper = unohelper.ImplementationHelper()

g_ImplementationHelper.addImplementation(
    pagetranslate,
    "org.openoffice.comp.pyuno.pagetranslate",
    ("com.sun.star.task.Job",),
)
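
The two `linebreakwise` strategies in this listing can be tried without any network access. The sketch below is a standalone approximation rather than the plugin's code: `fake_translate` is a hypothetical stand-in for `translate`, and the `quick` flag replaces the `crlf` parameter. It assumes the placeholder written into the text is the same `/#§/` that `rx_breakln` searches for; otherwise the paragraph breaks would not be restored.

```python
import re

PLACEHOLDER = "/#§/"
# Same pattern as `rx_breakln` in the source: tolerate spaces that the
# translation service may insert around the placeholder characters.
rx_breakln = re.compile(r"\s?/\s?#\s?§\s?/\s?")

def linebreakwise(text, translate, quick=False):
    if not quick:
        # one request per paragraph; breaks are preserved exactly
        return "\n\n".join(translate(part) for part in text.split("\n\n"))
    # single request; breaks travel through as a placeholder
    out = translate(text.replace("\n\n", PLACEHOLDER))
    return rx_breakln.sub("\n\n", out)

def fake_translate(text):
    """Hypothetical stand-in for the real request; it only upper-cases."""
    return text.upper()

if __name__ == "__main__":
    print(repr(linebreakwise("one\n\ntwo", fake_translate)))
    print(repr(linebreakwise("one\n\ntwo", fake_translate, quick=True)))
```

The iterating mode costs one request per paragraph, while the quick mode needs a single request but depends on the placeholder surviving translation in a recognisable form.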

Deleted pkg-desc/pkg-description.en.

Tool to translate texts in several languages


Changes to pkg-desc/pkg-description.txt.

Tool to translate texts in several languages
Translate document or text selection into English (or other languages)