Python : Telegram Bot Webhook Handler : A Multi-Bot/Multi-Language Design (8)


Multiple-Language Functionality

In this post, we add multi-language functionality to the polybot project so that our bots can speak several languages, based on each user’s preference.

The plan is that a bot speaks the language of the user’s choice. To make the bot remember the user’s preference, we need a mechanism that preserves and retrieves a session, acting on a unique persistent data-ware for each bot-user pair. This is the session data-ware. In practice, the handle_chat function loads the user-bot session data (or the default data if none exists) as soon as the user sending the message and the bot referenced in the webhook URL are known, and saves the session data just before handle_chat returns. Saving at the end is necessary because the message-processing functions may modify the session data for the next updates; for example, when the user sends a bot command to change some preference.

The language of the user’s choice, together with the gender of the bot, determines which sentence code from which language database is picked as the message text whenever the bot sends a text or a reply message back to the user.

It is possible to use any SQL database (MySQL, SQLite, …) with Unicode character set support to provide the sentence codes of the supported languages. That’s beyond the scope of this blog post; if you plan to implement such a system, you may consider converting my approach to a database-based sentence look-up. It’s not trivial, but not difficult either.

My approach is a simple lookup in Python config files, starting with code.ini; that is, ini files in the [section] key=value format. The number of sentences used by the bots is small enough, and loading multiple config files (one for each language) is light and fast for our purpose.

Setting up code.ini File

The principal definitions are placed in a settings file code.ini as:

[main]
default_lang = en
languages = en, fa, fr
default_timezone = 'Europe/London'

In my design, English must always exist and should never be removed. You nerds could figure out how to eliminate English if you wish and make your own language the default. It’s not mandatory, but it is useful and straightforward to use the two-letter ISO 639 language codes (https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) to identify the languages. Therefore, en, fa and fr are the codes for English, Persian (Farsi) and French respectively. Fortunately, I understand these three languages at the time of writing this post. French is a useful test case because it clearly distinguishes male and female speakers in some cases, and Persian because it is (like Arabic) a right-to-left (RTL) script. English and French are left-to-right (LTR).

The variable default_timezone is reserved for later, when we examine the user-bot interactive conversation. It is used to make the bot remember the user’s timezone when generating a date/time string to be used in the message text.

This file, code.ini, is placed in the main directory of our project.
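
As a minimal sketch of how these settings could be read, the snippet below parses the [main] section with the standard configparser module. The helper name read_main_settings is hypothetical and is not part of lang_code.py. Note that configparser keeps the quotes around the default_timezone value literally, so the sketch assumes they should be stripped.

```python
import configparser

# Same content as the [main] section shown above, inlined so the sketch runs on its own.
SAMPLE_CODE_INI = """
[main]
default_lang = en
languages = en, fa, fr
default_timezone = 'Europe/London'
"""


def read_main_settings(ini_text):
    """Parse the [main] section into plain Python values (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    main = parser['main']

    # 'en, fa, fr' -> ['en', 'fa', 'fr']
    languages = [code.strip() for code in main['languages'].split(',') if code.strip()]

    # configparser keeps the surrounding quotes as part of the value, so strip them.
    timezone = main['default_timezone'].strip('\'"')

    return {
        'default_lang': main['default_lang'].strip(),
        'languages': languages,
        'default_timezone': timezone,
    }


settings = read_main_settings(SAMPLE_CODE_INI)
```

In a real project, parser.read('code.ini') would replace read_string.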

Setting up English code-en.ini codebook

The must-exist codebook file is code-en.ini, and it lives in a sub-directory codes. It is also used as a template for the codebook of any other language. For each language defined in the code.ini settings above, there must be a matching code-xx.ini file, in which xx is the two-letter ISO 639 language code. Therefore, according to this setting, there must be two extra ini files: code-fa.ini and code-fr.ini.

We could have chosen to search the directory codes to find the installed languages automatically, and it would work fine. However, I decided that the setting:

languages = en, fa, fr

is a better choice if we want to quickly enable or disable some language whose associated code-xx.ini exists in the directory codes. Therefore, to install a new language: first, we must provide a properly translated codebook file code-xx.ini for that language and place it in the codes directory. Then, we must add its ISO code to the languages list in the code.ini file.
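
Since the languages list and the files in the codes directory are maintained by hand, a small consistency check at startup can help. The following is only a sketch; the function name missing_codebooks is an assumption and does not appear in the project.

```python
import os
import tempfile


def missing_codebooks(languages, codes_dir):
    """Return the language codes whose code-xx.ini file is absent from codes_dir."""
    return [
        lang for lang in languages
        if not os.path.isfile(os.path.join(codes_dir, f'code-{lang}.ini'))
    ]


# Demo in a throwaway directory: only the English and Persian codebooks exist.
with tempfile.TemporaryDirectory() as codes_dir:
    for lang in ('en', 'fa'):
        with open(os.path.join(codes_dir, f'code-{lang}.ini'), 'w', encoding='utf-8') as f:
            f.write('[settings]\n')

    missing = missing_codebooks(['en', 'fa', 'fr'], codes_dir)
```

Here missing lists the enabled languages that have no codebook yet, so the application could refuse to start or simply skip them.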

The basic template for code-en.ini is:

[settings]
name = English
abbr = en
dir = ltr

[cal]
TIME_FORMAT = %Y-%m-%d   %H:%M:%S
LONG_TIME_FORMAT = %A %B %d, %Y   %H:%M:%S

month = JANUARY, FEBRUARY, MARCH, APRIL, MAY, JUNE, JULY, AUGUST, SEPTEMBER, OCTOBER, NOVEMBER, DECEMBER
weekday[s] = SAT, SUN, MON, TUE, WED, THU, FRI
weekday = SATURDAY, SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY

[file]
file_id_1 = file_name_1
file_id_2 = file_name_2
file_id_3[female] = file_name_female_3
file_id_3[male] = file_name_male_3

[codes]
code_1 = text_1
code_2 = text_2
code_3[female] = female version of text_3
code_3[male] = male version of text_3

[info]
code_1 = text_1
code_3[female] = female version of text_3
code_3[male] = male version of text_3

[warning]
code_2 = text_2
code_3[female] = female version of text_3
code_3[male] = male version of text_3

[error]
code_1 = text_1
code_2 = text_2
code_3[female] = female version of text_3
code_3[male] = male version of text_3

[question]
code_1 = text_1
code_2 = text_2
code_3[female] = female version of text_3
code_3[male] = male version of text_3

[tip]
code_1 = text_1
code_2 = text_2
code_3[female] = female version of text_3
code_3[male] = male version of text_3

[help]
code_1 = text_1
code_2 = text_2
code_3[female] = female version of text_3
code_3[male] = male version of text_3

There are two well-defined sections, [settings] and [cal]. Section [settings] defines the general information about the language: the native name of the language, its ISO code or abbreviation, and its flow direction (LTR or RTL). Section [cal] defines the calendar conventions: the long and short representations of date/time (according to Python’s datetime module: https://docs.python.org/3/library/datetime.html), the names of the months, and the short and long names of the weekdays.
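
To illustrate how the [cal] entries might be consumed, here is a sketch that looks up the month and weekday names for a given date. It assumes the weekday list starts on Saturday, as in the template, while datetime.weekday() counts from Monday. The helper names are hypothetical.

```python
import configparser
from datetime import datetime

CAL_INI = """
[cal]
TIME_FORMAT = %Y-%m-%d   %H:%M:%S
month = JANUARY, FEBRUARY, MARCH, APRIL, MAY, JUNE, JULY, AUGUST, SEPTEMBER, OCTOBER, NOVEMBER, DECEMBER
weekday = SATURDAY, SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY
"""

# ExtendedInterpolation leaves the % signs of the strftime patterns alone;
# the default BasicInterpolation would reject them when the value is read.
parser = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
parser.read_string(CAL_INI)
cal = parser['cal']


def split_list(value):
    """'A, B, C' -> ['A', 'B', 'C']"""
    return [item.strip() for item in value.split(',')]


def localized_names(cal, when):
    """Return (month name, weekday name) for the datetime 'when'."""
    months = split_list(cal['month'])
    weekdays = split_list(cal['weekday'])
    # datetime.weekday(): Monday=0 ... Sunday=6; the codebook list starts at Saturday.
    return months[when.month - 1], weekdays[(when.weekday() + 2) % 7]


when = datetime(2019, 5, 3, 14, 30, 0)
month_name, weekday_name = localized_names(cal, when)
short_stamp = when.strftime(cal['TIME_FORMAT'])
```

In a translated codebook, the same lookup would return the month and weekday names of that language.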

The section [file] is used to point to external files; that is, the values of the variables in this section are filenames rather than sentences in English or any other language. The content of each file is the text that a bot would send as a message text or reply to a user. Using the ExtendedInterpolation feature of the configparser module (https://docs.python.org/3/library/configparser.html), it is possible to assign an automatic name to our files:

file_id_n = file_n_prefix-${settings:abbr}.txt

If we keep this pattern in all code-xx.ini files, then all the files are automatically labelled by each language; for example, file_n_prefix-en.txt, file_n_prefix-fa.txt, file_n_prefix-fr.txt, …

The external files are kept in a relative files sub-directory.
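
A quick sketch of this interpolation at work is shown below. The key name welcome and its file prefix are made up for illustration.

```python
import configparser
import os

CODEBOOK = """
[settings]
name = English
abbr = en
dir = ltr

[file]
welcome = welcome-${settings:abbr}.txt
"""

# ${section:key} references are resolved only with ExtendedInterpolation.
parser = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
parser.read_string(CODEBOOK)

welcome_file = parser['file']['welcome']
# The external files live in a relative 'files' sub-directory.
welcome_path = os.path.join('files', welcome_file)
```

With abbr = fa in code-fa.ini, the same line would resolve to welcome-fa.txt without any change to the [file] section.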

The other sections, [codes], [info], [warning], [error], [question], [tip] and [help], are the mood or icon sections. In my design, the [codes] section holds the default, moodless sentence codes. If a key is missing in any other mood section, its value is searched for in this [codes] section. This is a feature that we add to our design; it is not the default behavior of the configparser module, whose default section is called [DEFAULT]. The other mood sections indicate which emoji must be used automatically along with the given sentence. The section names describe the emoji well; for example, for the variables in [info], each sentence is prefixed with the ℹ️ emoji. For [tip], the emoji is a turned-on electric bulb, and it is meant to suggest some tips.
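
The fallback to [codes] could be sketched as follows. This is not the code of lang_code.py; the function name lookup and the sample sentences are assumptions, and only the two emoji named above (ℹ️ for [info], a light bulb for [tip]) are included.

```python
import configparser

CODEBOOK = """
[codes]
HELLO = Hello!
BYE = Goodbye!

[info]
BYE = See you later.
"""

# Emoji prefixes per mood section; only the two named in the text are listed here.
MOOD_PREFIX = {'info': 'ℹ️', 'tip': '💡'}

parser = configparser.ConfigParser()
parser.read_string(CODEBOOK)


def lookup(parser, section, key):
    """Find 'key' in 'section', falling back to the moodless [codes] section."""
    if parser.has_option(section, key):
        text = parser.get(section, key)
    else:
        text = parser.get('codes', key)

    # Assumption: the emoji follows the requested mood section, even after a fallback.
    prefix = MOOD_PREFIX.get(section)
    return f'{prefix} {text}' if prefix else text


own_entry = lookup(parser, 'info', 'BYE')       # found in [info]
fallback = lookup(parser, 'info', 'HELLO')      # missing in [info], taken from [codes]
moodless = lookup(parser, 'codes', 'HELLO')     # no emoji prefix
```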

The section [help] is a special section for automatic help-string generation for any language in our design. This section has an entry for each visible bot command, so that the /help command can generate a help menu in each language. It could also be used to generate a help-string for each command for BotFather using /setcommands. Unfortunately, at the time of writing, the Telegram system does not support multiple languages for this purpose, and we can do that for only one language. The Telegram system does not behave perfectly for an RTL language either. So, our options reduce to English or French only. In my design, the choice is English. The section [help] is also moodless.

For each phrase code code_n, there can be two varieties of entries in each mood section: code_n[male] and code_n[female], which are selected automatically based on the gender of the bot. The rule is simple; for example, for a female bot, if code_n[female] exists, it is used … otherwise, the default code_n is used. We use a smart function, so that we always use the key code_n (without the gender suffix) whenever we want to send the associated sentence as a message text on behalf of the bot, regardless of its gender.
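
The gender rule can be sketched with a plain dictionary standing in for one mood section. The function name pick_variant and the French sample sentences are illustrative only.

```python
def pick_variant(section_items, key, gender):
    """Prefer key[gender] when it exists, otherwise use the plain key."""
    gendered_key = f'{key}[{gender}]'
    if gendered_key in section_items:
        return section_items[gendered_key]
    return section_items[key]


# A toy [codes] section from a French codebook.
codes_fr = {
    'HELLO': 'Bonjour !',
    'READY[female]': 'Je suis prête.',
    'READY[male]': 'Je suis prêt.',
}

female_ready = pick_variant(codes_fr, 'READY', 'female')
male_ready = pick_variant(codes_fr, 'READY', 'male')
any_hello = pick_variant(codes_fr, 'HELLO', 'female')   # no gendered entry: default is used
```

The caller always passes the bare code (READY, HELLO) and never needs to know whether a gendered entry exists.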

In my design, I use this criterion to classify the sentences in the mood sections: if a sentence is moodless or could have multiple moods, it is placed under the [codes] section. If it has one clear mood, it is placed under that mood section.

For a realistic example, you could see the English version of code-xx.ini, that is, code-en.ini in the shared link: https://www.pythonanywhere.com/user/soheilnb/shares/1234e7c1702144b3b2ae3abe1171eda0/. Merex uses this database for the English version of the texts.

Dedicated module for multiple language support: lang_code.py

The module lang_code.py is independent of the rest of the code and works directly with the code.ini and code-xx.ini files. It has two main parts: an init code which loads the language database, and a function bot_says, with the following signature, which is the central function for all automatic translations:

def bot_says(gender, lang, text, section=None, mood=None) :
    pass

The parameter gender is either “male” or “female” and specifies the gender of the bot. The parameter lang is the two-letter ISO language code. The parameter text is actually the code of the sentence to be picked for the given language and bot gender. There is no need at all to specify the [male] or [female] suffix for the code. That is done automatically by this function.

The parameter section is the explicit mood section of the text code if it is not specified in the text parameter itself. As we will soon see in the function implementation, the preferred method of specifying the mood is in the text code itself:

# preferred method : main mood is given in the text code 
reply_text = bot_says(my_bot_gender, user_lang, 'SOME_TEXT_CODE@info')

and

# alternative method : main mood is given as the section keyword
reply_text = bot_says(my_bot_gender, user_lang, 'SOME_TEXT_CODE', section='info')

In my design, both of these function calls return (well, almost) the same thing. The first convention is the preferred and more convenient way to specify the mood section. The second method is useful whenever the text parameter is a variable and not a string literal. It spares us the ugly combination text_var + '@info' for the text parameter.

The parameter mood is not to be mistaken for the mood sections; if you wish to append some emoji at the end of the text, you pass the emoji as the mood parameter. The mood icons determined by the mood section names always come before the sentence text, and the extra emoji (given by the mood parameter) is appended at the end of the sentence. If one wishes a custom emoji at the beginning, they could either modify my code to cover more mood sections, or simply insert the emoji where they like in the sentence value itself and place the sentence in the moodless [codes] section, which has no default emoji at the beginning.
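
To make the two calling conventions concrete, here is a small sketch of how the text parameter might be split and how the mood emoji might be appended. The helper names split_code and decorate are hypothetical. The sketch assumes that an inline @section wins over the section argument and that a single space separates the sentence from the emoji; the real bot_says may differ on both points.

```python
def split_code(text, section=None):
    """'HELLO@info' -> ('HELLO', 'info'); without '@', use 'section' or the moodless 'codes'."""
    code, separator, inline_section = text.partition('@')
    if separator:
        return code, inline_section
    return code, section or 'codes'


def decorate(sentence, mood=None):
    """Append the optional mood emoji at the end of the sentence."""
    return f'{sentence} {mood}' if mood else sentence


preferred = split_code('SOME_TEXT_CODE@info')
alternative = split_code('SOME_TEXT_CODE', section='info')
moodless = split_code('SOME_TEXT_CODE')
with_flower = decorate('Hello!', mood='🌺')
```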

The implementation of the lang_code.py module is available in the shared link: https://www.pythonanywhere.com/user/soheilnb/shares/c4ce01244fb64f6abc1e117c6a208eb5/.

The auto_say function in proenv data-ware

Our bot may speak in a specific language only in a reply to a user’s command or a user’s conversation in general. In the context of a webhook operation, it is when an update arrives from the Telegram server, and our app.route function route_polybot_hook handles the update in the form of a request. Fortunately, the flask global g variable is always available during the processing of the request.

We almost never need to call the above function, bot_says, directly. We can use a more convenient function, auto_say, which is defined as simply as:

def auto_say(text, section=None, mood=None) :
     gender = g.bot_info.gender
     lang = g.session.lang
     return bot_says(gender, lang, text, section=section, mood=mood)

proenv.auto_say = auto_say

We assumed that the session data-ware is available whenever this function is called, and it contains the user’s preferred language as the lang attribute or key name of session. The gender of bot is also available in the bot_info data-ware.

This auto_say function frees our mind from the constant questions of whether our bot is male or female, and which language the user has chosen.

This function is placed in flask_app.py main module anywhere in the global area after the definition of proenv data-ware.

In other modules, if g is available under the name extras as the function parameter, we could use this function in the body code of that function as:

reply_text = extras.proenv.auto_say('Hello@codes')

which returns, for example, the bot’s greetings text in the language of the corresponding user’s choice … regarding the bot’s gender if that makes a difference.

Preserving and Retrieving user’s preferences: the session Data-ware

Two global functions, load_session and save_session, are used to retrieve and preserve the session data, including the users’ various preferences, per bot-user pair conversation:

def load_session(bot_name, sender_id, _dict_data=None, **kw_args) :
    pass

def save_session(data, bot_name, sender_id) :
    pass

The parameters bot_name and sender_id are the bot-user pair information used to uniquely identify which session data are referenced for saving or retrieval.

In load_session, the default data (used if the session data is missing or lacks some required information) can be passed either as a Python dictionary or expando object (_dict_data), or as key=value pairs, which kw_args collects into a dictionary. It is possible to pass both at the same time, that is, a dictionary object as well as key=value pairs; in that case, the kw_args values take precedence when a key appears in both. This function returns the loaded or the default session data-ware, which is an expando object.
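
The precedence rules can be illustrated with plain dictionaries standing in for the expando objects. This is a sketch only; it assumes expando_update behaves like dict.update, and the two helper names are hypothetical.

```python
def build_defaults(_dict_data=None, **kw_args):
    """Combine both kinds of defaults; key=value pairs win over the dictionary."""
    defaults = dict(_dict_data or {})
    defaults.update(kw_args)
    return defaults


def merge_session(defaults, saved):
    """Saved session values win over defaults; defaults fill in the missing keys."""
    merged = dict(defaults)
    merged.update(saved)
    return merged


defaults = build_defaults({'lang': 'en', 'tz': 'Europe/London'}, lang='fr')
# Pretend this came from the pickle file of a user who switched to Persian earlier.
session = merge_session(defaults, {'lang': 'fa'})
```

So the order of precedence, from weakest to strongest, is: the dictionary defaults, the key=value defaults, and finally whatever was saved in the previous conversation.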

In save_session, the data to be preserved is passed to the function as the first parameter (data). It is the session data-ware and must be the same expando object that was obtained from the load_session function … possibly with some values modified by the message-processing code between the load and save operations.

The implementation is easy. It uses two helper functions, load_pickle_data(pickle_file, default_data) and save_pickle_data(data, pickle_file), which load and save a Python pickle file in general, and a get_session_pickle_path(bot_name, sender_id) function, which returns a unique path based on the current bot name and user id. The directory containing this path can safely be assumed to exist once the path is obtained; it is created if it does not exist.

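import os
import pickle

# Expando, expando_update and session_path are defined elsewhere in the project.

def load_pickle_data(pickle_file, default_data) :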
    data = Expando(default_data)
    if os.path.exists(pickle_file) :
        with open(pickle_file, 'rb') as ifile :
            saved_data = pickle.load(ifile)
            expando_update(data, saved_data)

    return data


def save_pickle_data(data, pickle_file) :
    with open(pickle_file, 'wb') as ofile :
        pickle.dump(data, ofile)



def get_session_pickle_path(bot_name, sender_id) :
    path = os.path.join(session_path, bot_name, f'U{sender_id}')
    if not os.path.exists(path) :
        os.makedirs(path, exist_ok=True)

    return os.path.join(path, '.session.pickle')


def load_session(bot_name, sender_id, _dict_data=None, **kw_args) :
    session_data = Expando(_dict_data)
    expando_update(session_data, kw_args)
    session_file = get_session_pickle_path(bot_name, sender_id)
    return load_pickle_data(session_file, session_data)


def save_session(data, bot_name, sender_id) :
    session_file = get_session_pickle_path(bot_name, sender_id)
    save_pickle_data(data, session_file)

A normal session-related operation is a sequence of three steps in this exact order:

1) load the session data (remembered from the last conversation)
2) process the message ... possibly modifying some variables in the session data
3) save the session data (to be used for the next conversation)

It is very important that these three steps take place as a single atomic operation: no other session load or save on the same bot-user pair may occur between the beginning of loading the data and the end of saving it. We have set our webhook URL with the Telegram server using max_connections=1. That should ensure that only one update is being processed at any time, and therefore that there are never two or more simultaneous attempts to load or save the same session data. If we do not want to rely on this assumption, and we wish to guarantee an atomic operation on the session data, we could use a locking mechanism:

from lockfile import LockFile

...

def handle_chat(message, extras) :
    flavor = telepot.flavor(message)
    assert flavor == 'chat'

    content_type, chat_type, chat_id = telepot.glance(message, flavor=flavor)
    sender_id = message['from']['id']
    expando_update(extras.provars, content_type=content_type, chat_type=chat_type, chat_id=chat_id, sender_id=sender_id)
    extras.provars.sender_info = get_sender_info_summary(message)
    extras.provars.chat_info = get_chat_info_summary(message)
    extras.provars.auth = get_user_auth(message, extras)

    TEMP_SUFFIX = app.config['TEMP_SUFFIX']
    lock = LockFile(f'/tmp/.LOCK_polybot_{extras.bot_info.name}_U{sender_id}_{TEMP_SUFFIX}.lock')
    with lock:
        extras.session = load_session(extras.bot_info.name, sender_id, Expando(lang='en', tz='Europe/London', last_date=None))

        if content_type == 'text' :
            Command.process_text_message(message, extras)

        else :
            text = '{} says : You sent me a "{}" message.'.format(extras.bot_info.name, content_type)
            bot = extras.bot_info.bot
            bot.sendMessage(chat_id, text)


        extras.session.last_date = DT.now()
        save_session(extras.session, extras.bot_info.name, sender_id)


The constant TEMP_SUFFIX is another magic unique string. You could define it in config.py module like:

import hashlib

...

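# hashlib.md5 needs bytes, so encode SECRET_KEY first if it is a str
key_bytes = SECRET_KEY if isinstance(SECRET_KEY, bytes) else SECRET_KEY.encode('utf-8')
TEMP_SUFFIX = hashlib.md5(key_bytes).hexdigest()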

That is, using the MD5 hashing algorithm, you could create a unique hex string from your SECRET_KEY value. Then, use the TEMP_SUFFIX value to create a unique lock path in the system’s global /tmp/ directory.
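
For readers who prefer not to depend on the lockfile package, the same load-process-save critical section can be sketched with the standard fcntl module. This is an alternative technique, not what the polybot code uses; it works on Unix-like systems only, and the name session_lock and the demo lock path are assumptions.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def session_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path for the duration of the block."""
    with open(lock_path, 'w') as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)      # blocks until the lock is free
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)  # released even if processing raises


lock_path = os.path.join(tempfile.gettempdir(), '.LOCK_polybot_demo.lock')

steps = []
with session_lock(lock_path):
    steps.append('load')       # load_session(...) would go here
    steps.append('process')    # message processing
    steps.append('save')       # save_session(...) would go here
```

Another process entering session_lock with the same path waits at the flock call until the first one leaves the block.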

That’s it! Now, our bots could speak multiple languages … one for each user based on their choice. Well, almost done! A few more alterations are to be done yet.

Modifying reply_to function

Most often, when any of the bots is meant to send a user a text message, it would be a reply to their message. In the blog post series number 6 : Providing Decorator for Bot Commands Using a Dedicated Message Text Processing Class (https://911programming.wordpress.com/2019/05/03/python-telegram-bot-webhook-handler-a-multi-bot-multi-language-design-6/) for the monobot project, a handy function, reply_to has been defined to simplify this reply. Now, we could write the polybot version with multi-language support as:

def reply_to(message, text, extras, forced_reply=False) :
    if text.startswith('AUTO:') :
        text = extras.proenv.auto_say(text[5:])

    bot = extras.bot_info.bot
    chat_id = extras.provars.chat_id
    bot.sendChatAction(chat_id, 'typing')

    if forced_reply or extras.provars.chat_type != 'private' :
        reply_to_message_id = message['message_id']

    else :
        reply_to_message_id = None

    bot.sendMessage(chat_id, text, reply_to_message_id=reply_to_message_id)

It’s basically the same function with a slight difference: if the text parameter is already translated to the target language, we simply pass it to the function. Otherwise, if we have the phrase code and we want the translation to take place inside reply_to, we prepend the sentence code with the magic prefix ‘AUTO:’, in which case the translation to the target language is performed in this function.
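
The prefix handling can be isolated and tried without a bot. In this sketch, resolve_text is a hypothetical helper, and a dictionary lookup stands in for extras.proenv.auto_say; the French sentence is illustrative.

```python
AUTO_PREFIX = 'AUTO:'


def resolve_text(text, translate):
    """Translate 'AUTO:CODE@section' through 'translate'; pass other texts through unchanged."""
    if text.startswith(AUTO_PREFIX):
        return translate(text[len(AUTO_PREFIX):])
    return text


# Stand-in for auto_say: a tiny French "codebook".
fake_auto_say = {'NO_COMMAND@error': 'Aucune commande.'}.__getitem__

translated = resolve_text('AUTO:NO_COMMAND@error', fake_auto_say)
untouched = resolve_text('Already translated text', fake_auto_say)
```

Using len(AUTO_PREFIX) instead of the literal 5 keeps the slice correct if the prefix is ever changed.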

This reply_to function is a global function. We can add it as a static method of the Command decorator class as:

Command.reply_to = reply_to

The new faces of the two static methods in Command decorator: act_bad_command and act_no_command for polybot project would be:

    @staticmethod
    def act_bad_command(command, message, extras) :
        text = extras.proenv.auto_say('BAD_COMMAND@error') + ':\n\n' + command
        reply_to(message, text, extras, forced_reply=True)


    @staticmethod
    def act_no_command(command, message, extras) :
        reply_to(message, 'AUTO:NO_COMMAND@error', extras, forced_reply=True)

‘BAD_COMMAND’ and ‘NO_COMMAND’ are two sentence codes under the error mood section, with corresponding sentences in each language. In these two examples, you can see clearly how to obtain the translated text before calling reply_to (the first case), and how to let the reply_to function translate the sentence code automatically (the second case, using the AUTO: prefix).

Modifying the bot act functions

Now, it’s time to modify the bot act functions (see the previous blog posts of this series regarding the monobot project). It’s easy and straightforward. For example, for the hello command (act_hello):

It was:

@Command(commands=['hello', 'hi'])
def act_hello(command, message, extras) :
    '''Greetings with this bot.'''
    text = f'Hello! Good to see you again! 🌺'
    Command.reply_to(message, text, extras)

Now it is :

@Command(commands=['hello', 'hi'])
def act_hello(command, message, extras) :
    '''hello@help'''
    text = extras.proenv.auto_say('HELLO@codes', mood='🌺')
    Command.reply_to(message, text, extras)

The obvious differences:

  • the function doc-string is a sentence code in [help] mood section now (hello@help). That’s for the automatic help generation in the language of user’s choice.
  • We don’t use the English texts for the replies anymore. We use phrase codes with auto_say function to generate the ultimate translated text.
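
To show why the doc-string is now a sentence code, here is a sketch of how a /help menu might be assembled from it. The registry HANDLERS, the function build_help and the sample help sentences are assumptions for illustration; the actual Command class may keep its commands differently.

```python
def build_help(handlers, translate):
    """Build one help line per command whose doc-string is a '...@help' sentence code."""
    lines = []
    for name, func in handlers.items():
        doc = (func.__doc__ or '').strip()
        if doc.endswith('@help'):
            lines.append(f'/{name} - {translate(doc)}')
    return '\n'.join(lines)


def act_hello(command, message, extras):
    '''hello@help'''


def act_start(command, message, extras):
    '''start@help'''


def act_secret(command, message, extras):
    pass  # no doc-string, so it stays out of the help menu


# Hypothetical command registry and English [help] entries.
HANDLERS = {'hello': act_hello, 'start': act_start, 'secret': act_secret}
HELP_EN = {
    'hello@help': 'Greetings with this bot.',
    'start@help': 'Start talking to this bot.',
}

help_text = build_help(HANDLERS, HELP_EN.__getitem__)
```

Passing auto_say as the translate argument would produce the same menu in the language of the current user.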

Another example:

@Command(commands=['start'])
def act_start(command, message, extras) :
    '''start@help'''
    Command.reply_to(message, 'AUTO:START@codes', extras)

This example uses AUTO: prefix.

Language settings:

Merex et al. do the language and timezone settings at startup. That exact way of bot-user interactive conversation is beyond the scope of this blog post at the moment. I will leave that for a later blog post in this series.

Well, for now, our bots only speak the preset default language: English! Now, how could we allow the user to choose their preferred language?

The simplest way is a dedicated command /language (could be abbreviated as /lang) for this purpose. We could write it such that without any argument, it reports the current language of the user’s choice. But with a single two-letter ISO language code (if valid) as the command parameter, it modifies the speaking language:

@Command(commands=['language', 'lang'])
def act_language(command, message, extras) :
    '''language@help'''
    if len(extras.provars.args) == 0 :
        Command.reply_to(message, 'AUTO:I_SPEAK@info', extras)

    else :
        new_lang = extras.provars.args[0]

        if new_lang in extras.proenv.languages :
            extras.session.lang = new_lang
            Command.reply_to(message, 'AUTO:LANG_CHANGED@info', extras)

        else :
            text = extras.proenv.auto_say('UNKNOWN_LANGUAGE@error') + ':\n\n' + new_lang
            Command.reply_to(message, text, extras)

The line extras.session.lang = new_lang sets the new value for the language preference. The following sentence codes support this function in each language:

[info]
I_SPEAK = I speak English.
LANG_CHANGED = I speak English now.

[error]
UNKNOWN_LANGUAGE = I do not know this language

with the exact punctuation. For other languages, these sentences must be properly translated mentioning the correct language in the place of English.

Shared links

The full source code of the polybot project so far is shared at PythonAnywhere at locations:

You could find other necessary shared links from the previous blog posts of this series.

That’s all. This is the end of this series of blog posts about multi-bot/multi-language webhook handler for the Telegram bots. In future, we will talk about other aspects of the design … such as implementing the interactive mode, user’s preferences and settings and helps, handling timezone, etc … in some independent blog posts.
