Sunday, January 20, 2013

[Informatica] - Processing UNICODE Characters in Informatica PowerCenter Workflow


A couple of days back, a friend mailed me to say he was not able to process Arabic characters in an Informatica PowerCenter workflow. You might have faced the same issue when processing scripts such as Arabic, Hebrew, or Chinese. Let's discuss how to process such non-English scripts in Informatica PowerCenter workflows.

Before we jump into the Informatica PowerCenter configuration, let's understand a couple of key concepts behind processing different character sets.
Character Set : A code that pairs a set of natural language characters, such as an alphabet or symbols, with a set of numbers. For example, the ASCII character set uses the numbers 0 through 127 to represent the English letters, digits, and punctuation as well as special control characters. UNICODE is the most widely used character set; it can represent over 110,000 characters covering 100 scripts, including Arabic, Hebrew, and Chinese.
Character Encoding : A scheme that translates the numbers defined in a character set into bytes, so that a computer can store, read, and display the characters. UTF-8 is the most popular encoding for the UNICODE character set.
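To make the difference between the two concepts concrete, here is a minimal sketch in plain Python (not PowerCenter). The sample characters and the `describe` helper are hypothetical and chosen only for illustration. The number returned by `ord` is the character set part (the UNICODE code point), and the bytes returned by `encode` are the character encoding part (UTF-8).

```python
# Hypothetical sample characters, chosen for illustration only:
# Latin "A", Arabic "seen", Hebrew "shin", Chinese "zhong".
samples = ["A", "\u0633", "\u05e9", "\u4e2d"]


def describe(ch):
    """Return (code point, UTF-8 bytes) for a single character."""
    # ord() gives the number assigned by the character set (UNICODE);
    # encode() applies the character encoding (UTF-8) to get bytes.
    return ord(ch), ch.encode("utf-8")


for ch in samples:
    code_point, encoded = describe(ch)
    print(f"U+{code_point:04X} -> {encoded.hex(' ')} ({len(encoded)} byte(s))")
```

Note that the ASCII letter needs a single byte in UTF-8, while the Arabic and Hebrew letters need two and the Chinese character needs three. A system that assumes one byte per character cannot handle such data correctly.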

From the description above, it is evident that the character set and the character encoding are the keys to processing foreign characters correctly. The Informatica PowerCenter Integration Service and Repository Service must be configured to process all the characters that might come from your data sources.

INTEGRATION SERVICE CONFIGURATION

You can choose the character set supported by the integration service during the initial configuration or you can change it later from the Administrator console.

During Informatica PowerCenter Installation

During PowerCenter installation, we can set the supported character set, or Data Movement Mode, as shown in the image below. Check the Informatica PowerCenter Installation Guide for step-by-step installation instructions.
Informatica PowerCenter Unicode Setting

After Informatica PowerCenter Installation

You can also change the character set after PowerCenter installation, from the Admin Console.
Log on to the Admin Console using the admin user ID and password, and choose the Integration Service from the Domain Navigator as shown in the image below.
Click Edit to change the Data Movement Mode (Character Set).
Informatica PowerCenter Unicode Setting
Choose 'Unicode' from the drop-down list and click OK.
Informatica PowerCenter Unicode Setting
See the Informatica PowerCenter Installation Guide for the complete installation steps.

REPOSITORY SERVICE CONFIGURATION

The character set of the Repository Service can only be set during the service configuration. It cannot be changed later. See the highlighted option in the image below, and check the Informatica PowerCenter Installation Guide for the complete steps.
Informatica PowerCenter Unicode Setting
With this configuration, Informatica PowerCenter is able to handle any character within the UNICODE character set.

WORKFLOW CONFIGURATION

During the configuration of each workflow, you need to choose the code page (character encoding) for the source data and the target data.
You can choose the code page of the source data from the source properties, as shown in the image below.
Informatica PowerCenter Unicode Setting
Similarly, you can choose the code page of the target data from the target properties, as shown in the image below.
Informatica PowerCenter Unicode Setting
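The reason the source and target code pages matter can be sketched in plain Python. This is only an illustration of the general principle, not of how PowerCenter works internally; the sample text and variable names are hypothetical. The same bytes are read once with the correct code page and once with a wrong one.

```python
# Hypothetical sample data: a Hebrew word as it would arrive from a UTF-8 source.
original = "\u05e9\u05dc\u05d5\u05dd"
raw_bytes = original.encode("utf-8")

# Reading the bytes with the code page that matches the data recovers the text.
read_correctly = raw_bytes.decode("utf-8")

# Reading the same bytes with a mismatched single-byte code page does not fail,
# but silently produces garbage characters instead of the original text.
read_wrongly = raw_bytes.decode("latin-1")

# A target code page that cannot represent the characters fails outright.
try:
    original.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(read_correctly == original)  # matching code page round-trips the text
print(read_wrongly == original)    # mismatched code page corrupts it
print(ascii_ok)                    # ASCII cannot hold Hebrew characters
```

The mismatched read is the more dangerous case because no error is raised: the corrupted characters simply end up in the target. Choosing the code page that matches the actual data on both the source and the target side avoids this.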