Auto-wrap trans templatetags for HTML

Please note that this is not a perfect solution. It does most of the work for you, but you'll have to do some clean-up manually afterward.
- The current implementation achieves around 80% coverage and keeps the original formatting.
- The error rate is around 15%.

Using it with version control is recommended. Do a git diff after the translation pass to review the changes.

#!/usr/bin/env python3

import re, os, sys, html
from bs4 import BeautifulSoup
from bs4.element import Comment

try:
	target_file = sys.argv[ 1 ]
except IndexError:
	print( "Please specify the file" )
	sys.exit( 1 )

# Abort once the same text has been replaced more than this many times in a row
infinTigger = 10

def html_escape( text ):
	return html.escape( text ).replace( "×", "&times;" )

def auto_trans( file_path ):

	otext = None
	with open( file_path, 'r' ) as f:
		otext = f.read()

	loopGuard = None
	loopCount = 0

	while True:
		# Re-parse on every pass so the soup reflects the text replaced so far
		soup = BeautifulSoup( otext, "html.parser" )
		texts = soup.find_all( string = True )

		doBreak = True

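		for element in texts:
			stext = element.strip()
			stext = html_escape( stext )
			# Skip empty strings and text that already starts with a template tag
			if not stext or stext.startswith( "{" ):
				continue

			# Skip text that looks like a single HTML entity, e.g. &times;
			if stext.startswith( "&" ) and stext.endswith( ";" ):
				continue

			# Skip HTML comments
			eType = type( element )
			if eType is Comment:
				continue

			# Text mixed with template tags has to be handled manually
			if "{%" in stext and "%}" in stext:
				print( "WARNING: Partial trans[ " + stext + " ]")
				continue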

			text_cont = element.parent
			oelement = str( element )
			otext = otext.replace( oelement, element.replace( stext, "{% trans '" + stext + "' %}" ) )

			if stext == loopGuard:
				loopCount = loopCount + 1
			else:
				loopCount = 0

			if infinTigger < loopCount:
				print( "Fatal Error, infinite loop occurred on: %s -> %s" % ( text_cont, stext ) )
				print( "Possibly escaped text" )
				print( "Exiting." )
				sys.exit( 1 )

			loopGuard = stext

			doBreak = False
			break

		if doBreak:
			break

	with open( file_path, 'w' ) as f:
		f.write( otext )

auto_trans( target_file )

Concept

Replaces text with {% trans 'text' %}. Visible text is found using Beautiful Soup. To keep the formatting, replacements are made on the original text, which is then re-parsed with Beautiful Soup after each replacement.
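
As a rough illustration of a single replacement step, here is a minimal sketch. The helper name wrap_text_node is hypothetical and not part of the script above. It assumes the node text appears verbatim in the source and contains no characters that need HTML escaping.

```python
def wrap_text_node(source, node_text):
    """Wrap the visible part of node_text in a trans tag inside source."""
    stripped = node_text.strip()
    if not stripped:
        # Whitespace-only nodes are left alone
        return source
    # Replace only the stripped part so leading/trailing whitespace survives
    wrapped = node_text.replace(stripped, "{% trans '" + stripped + "' %}")
    # Like the script, this rewrites every occurrence of node_text in source
    return source.replace(node_text, wrapped)


source = "<p>\n\tHello world\n</p>"
print(wrap_text_node(source, "\n\tHello world\n"))
```

Because the edit is applied to the original string rather than to the parsed tree, indentation and line breaks stay exactly as they were.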

Drawback

You cannot have an HTML tag name as the content of a text element. For example:

<div>
	div
</div>

The above will cause you trouble because the script replaces every div with {% trans 'div' %}, including the ones inside the tags.
To counter this, you can add a final clean-up step to revert those changes.

Something like this:

output.replace( /<(\/?){% trans '([^']+)' %}/g, "<$1$2" );

You'll have to build your own list of special cases.
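
In Python, such a clean-up step could be sketched as below. The names WRAPPED_TAG and revert_tag_names are hypothetical. The sketch assumes the only damage is a tag name wrapped directly after < or </, so other special cases would need their own patterns.

```python
import re

# Matches a tag name that was wrapped by mistake, right after "<" or "</"
WRAPPED_TAG = re.compile(r"<(/?)\{% trans '([^']+)' %\}")


def revert_tag_names(text):
    """Turn <{% trans 'div' %}> back into <div> (and likewise for </...>)."""
    return WRAPPED_TAG.sub(r"<\1\2", text)


# Reproduce the damage: a blanket replace also hits the tag names
broken = "<div>div</div>".replace("div", "{% trans 'div' %}")
fixed = revert_tag_names(broken)
print(fixed)
```

Only the wrapped names that sit right behind an angle bracket are reverted, so the translated text node itself stays wrapped.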

Tag(s): gettext django python3 templatetags trans bs4 i18n

斟酌鵬兄

Thu Apr 27 2017 10:18:25 GMT+0000 (Coordinated Universal Time)

Last modified: Sun May 28 2017 16:33:13 GMT+0000 (Coordinated Universal Time)