Monday, June 25, 2018

Encoding and Decoding in Web Development


In this post, we will see how to handle encoding and decoding of data in JavaScript. Recently, I worked on a project where I had to read paragraphs of text stored in a database column of type VARCHAR2 or CLOB. The data had to be transmitted as JSON and, after some processing, displayed on a web page. It contained special characters such as à. When I rendered the content in a browser, I was surprised to see weird characters in place of the à. Debugging this felt like a nightmare until I understood the concepts of encoding and decoding. So let's understand these concepts first and then look at a solution to such problems.


As you all know, a computer cannot store "letters", "numbers", "pictures" or anything else. The only thing it can store and work with is bits. A bit can have only two values: yes or no, true or false, 1 or 0. To use bits to represent anything besides bits, we need rules. We need to convert a sequence of bits into something like letters, numbers and pictures using an encoding scheme, or encoding for short.

One widely used encoding scheme is ASCII. A string of 1s and 0s is broken down into parts of eight bits each (a byte for short). The ASCII encoding specifies a table translating bytes into human-readable letters. Here's a short excerpt of that table:

Bits        Character
01000001    A
01000010    B
01000011    C
01100001    a
01100010    b
01100011    c


The ASCII encoding covers a character set of 128 characters. Since this charset does not include all the symbols used in different languages, many other charsets were invented to cover them, and their number has grown considerably over time.
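To make the byte-to-letter lookup concrete, here is a minimal sketch that turns a string of bits into ASCII text. The helper name decodeAsciiBits is hypothetical and only serves this illustration; it assumes the input is made of space-separated groups of eight bits.

```javascript
// Hypothetical helper: decode space-separated 8-bit groups as ASCII text.
function decodeAsciiBits(bits) {
  return bits
    .trim()
    .split(/\s+/)
    .map(function (byte) {
      // Interpret the group of 1s and 0s as a base-2 number.
      const code = parseInt(byte, 2);
      // ASCII only defines the codes 0 to 127.
      if (Number.isNaN(code) || code > 127) {
        throw new RangeError('Not an ASCII byte: ' + byte);
      }
      return String.fromCharCode(code);
    })
    .join('');
}

console.log(decodeAsciiBits('01001000 01101001')); // "Hi"
```

Anything above 127 is rejected here on purpose, because that is exactly where plain ASCII stops and other charsets take over.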

All you need to know is this: data may be saved using any encoding scheme, but to read it correctly you have to know which scheme was used so that you can decode it accordingly. Remember this whenever you deal with text or any other content as a developer: use one specific encoding to transmit the data and the same one to read it.
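The following sketch shows what happens when the reader guesses the wrong scheme. It assumes an environment that provides TextEncoder and TextDecoder (modern browsers and recent Node.js versions); the variable names are only illustrative.

```javascript
// 'à' (U+00E0) encoded as UTF-8 becomes two bytes: 0xC3 0xA0.
const utf8Bytes = new TextEncoder().encode('à');

// Decoding with the same scheme restores the original character.
const decodedCorrectly = new TextDecoder('utf-8').decode(utf8Bytes);

// Treating every byte as its own Latin-1 character yields two wrong characters.
const decodedWrongly = Array.from(utf8Bytes, function (b) {
  return String.fromCharCode(b);
}).join('');

console.log(decodedCorrectly); // "à"
console.log(decodedWrongly);   // "Ã" followed by a non-breaking space
```

This kind of mismatch is a likely source of the weird characters that show up in place of à on a web page.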

Base64 is a group of similar binary-to-text encoding schemes that represent binary data in an ASCII string format by translating it into a radix-64 representation. The term Base64 originates from a specific MIME content transfer encoding. Base64 encoding schemes are commonly used when binary data needs to be stored in, or transferred over, media that are designed to deal with textual data. This ensures that the data remains intact, without modification, during transport.
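As a rough illustration of what "radix-64" means, the sketch below encodes an array of bytes by hand: every group of three bytes (24 bits) is split into four 6-bit values, and each value selects one character from a 64-character alphabet. The names BASE64_ALPHABET and base64FromBytes are hypothetical, and in practice you would use the built-in functions rather than this code.

```javascript
// The standard Base64 alphabet: 64 characters, one per 6-bit value.
const BASE64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper: encode an array of byte values (0-255) as Base64.
function base64FromBytes(bytes) {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const hasSecond = i + 1 < bytes.length;
    const hasThird = i + 2 < bytes.length;
    // Pack up to three bytes into one 24-bit number (missing bytes count as 0).
    const chunk =
      (bytes[i] << 16) |
      ((hasSecond ? bytes[i + 1] : 0) << 8) |
      (hasThird ? bytes[i + 2] : 0);
    // Cut the 24 bits into four 6-bit values; '=' pads incomplete groups.
    out += BASE64_ALPHABET[(chunk >> 18) & 63];
    out += BASE64_ALPHABET[(chunk >> 12) & 63];
    out += hasSecond ? BASE64_ALPHABET[(chunk >> 6) & 63] : '=';
    out += hasThird ? BASE64_ALPHABET[chunk & 63] : '=';
  }
  return out;
}

console.log(base64FromBytes([77, 97, 110])); // "TWFu" (the bytes of "Man")
console.log(base64FromBytes([10]));          // "Cg==" (a single newline byte)
```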

In JavaScript, there are two functions for decoding and encoding Base64 strings, respectively:

The atob() function decodes a string of data that has been encoded using Base64. Conversely, the btoa() function creates a Base64-encoded ASCII string from a "string" of binary data. Both atob() and btoa() work on strings.
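A quick round trip with plain ASCII input shows the two functions working together. This assumes an environment where btoa and atob are available as globals (browsers and recent Node.js versions).

```javascript
// Encode a plain ASCII string to Base64, then decode it back.
const encoded = btoa('Hello');
const decoded = atob(encoded);

console.log(encoded); // "SGVsbG8="
console.log(decoded); // "Hello"
```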

However, simply using these functions did not help me: a few special characters were still unreadable on my web page.

The "Unicode Problem"
Since DOMStrings are 16-bit-encoded strings, in most browsers calling window.btoa on a Unicode string will cause a Character Out Of Range exception (an InvalidCharacterError) if a character exceeds the range of an 8-bit byte (0x00~0xFF). Please refer to the documentation for more details.
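The sketch below reproduces the problem. The wrapper tryBtoa is a hypothetical helper that only exists to catch the exception. Note that à (U+00E0) still fits into a single byte, so btoa accepts it, but it encodes the one byte 0xE0 rather than the two UTF-8 bytes, which can still confuse a consumer that expects UTF-8.

```javascript
// Hypothetical wrapper: report the error name instead of throwing.
function tryBtoa(input) {
  try {
    return btoa(input);
  } catch (err) {
    return 'failed: ' + err.name;
  }
}

// U+00E0 is within 0x00-0xFF, so this works, but it is not UTF-8.
console.log(tryBtoa('à')); // "4A=="

// U+2713 is outside the 8-bit range, so btoa throws.
console.log(tryBtoa('✓')); // "failed: InvalidCharacterError"
```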

One solution is to escape the whole string first (as UTF-8, see encodeURIComponent) and then encode it:

function b64EncodeUnicode(str) {
    // First, encodeURIComponent produces percent-encoded UTF-8.
    // Then each %XX escape is converted into the raw byte it stands for,
    // which gives a string that btoa can accept.
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
        function toSolidBytes(match, p1) {
            // p1 holds the two hex digits; parse them explicitly as base 16.
            return String.fromCharCode(parseInt(p1, 16));
        }));
}

b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64EncodeUnicode('\n'); // "Cg=="


To decode the Base64-encoded value back into a String:
function b64DecodeUnicode(str) {
    // Going backwards: from bytestream, to percent-encoding, to original string.
    return decodeURIComponent(atob(str).split('').map(function(c) {
        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
    }).join(''));
}
 
b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
b64DecodeUnicode('Cg=='); // "\n"
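As an alternative to the percent-encoding trick, the same round trip can be sketched with TextEncoder and TextDecoder. This is a different technique from the functions above, not a replacement endorsed by them. The names utf8ToBase64 and base64ToUtf8 are hypothetical, and the sketch assumes an environment that provides TextEncoder, TextDecoder, btoa and atob.

```javascript
// Hypothetical helper: UTF-8 encode the string, then Base64 encode the bytes.
function utf8ToBase64(str) {
  const bytes = new TextEncoder().encode(str);
  let binary = '';
  // Build a "binary string" with one character per byte, as btoa expects.
  bytes.forEach(function (b) {
    binary += String.fromCharCode(b);
  });
  return btoa(binary);
}

// Hypothetical helper: Base64 decode to bytes, then decode the bytes as UTF-8.
function base64ToUtf8(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, function (ch) {
    return ch.charCodeAt(0);
  });
  return new TextDecoder('utf-8').decode(bytes);
}

console.log(utf8ToBase64('✓ à la mode'));          // "4pyTIMOgIGxhIG1vZGU="
console.log(base64ToUtf8('4pyTIMOgIGxhIG1vZGU=')); // "✓ à la mode"
```

Both approaches produce the same Base64 text because both encode the string as UTF-8 bytes before calling btoa.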


If you need to do the encoding in PL/SQL before sending the data to the client, as I had to, you can Base64-encode it in SQL. I will share a post on that soon. Please subscribe to my blog for updates.

