Wednesday, January 18, 2012

Enabling High Precision in Informatica

Summary:
The technical document talks about the enabling high precision scenarios in informatica.
Introduction:

In Informatica the decimal is considered as double function which rounded to only two decimals .So if you we have requirement in informatica to calculate a decimal value column with greater than two decimal and using aggregator function as well .Then we need to enable the high precision property.
First Scenario:

In this scenario if we are calculating a sum of decimal values in aggregator transformation and expecting the value more than two decimals .Then we need to enable high precision value so that the integration service treats the value as a decimal value else it treats the value as double and rounds the value to two decimals.
Second Scenario:

In this scenario if we are calculating a sum of decimal values up to 28 decimals for any scientific calculation as well we can enable high precison .







Conclusion:

By default if for any decimal calculation if we enable high precision integration service will not round the value to decimal or convert the value to double while calculation along with that if you want to calculate up to 28 decimals for any scientific calculation also we can enable high precision option in informatica.

Alphanumeric value calculation in informatica

All screen shots discussed in the below chain represent Alpha.out as source file and Alpha11.out as a target file.

1) To_decimal (value1), ASCII (value1) is the output of target file for two output field values, where value1 is Alpha.out input source file values .In this Screen shot the source file is taken as two outputs for to_decimal and ASCII values for each value it was noticed that only the first numeric character is been considered .
It means only the calculation has been for values for
1
2
3
4
5 and remaining values values for decimal or for ASCII were not been considered.

2) To_decimal (value1), ASCII (value1) is the output of target file for two output filed values, where value1 is Alpha.out source file values To_Decimal will return a zero value for non numeric values and ASCII values will not change as it is single character. And ASCII value for each value is in the second value.





3) Sum (To_decimal(value1)), sum (ASCII(value1)) is the output of target file for two output filed values, where value1 is Alpha.out source file values .It is just the extension of the second screen shot but just sum of the values.

4, 5, 6) The output is for sum (to_decimal) and sum (ASCII) it is noted here that the value will be only taken for the only one character.All the three screen shots , irrespective of any number of no numeric characters you consider it will take the first character and convert the value to decimal. So the output remains unchanged.



Sum (To_decimal (value1)), sum (ASCII(value1)) is the output of target file, where value1 is Alpha.out source file values for two output filed values.




Sum (To_decimal (value1)), sum (ASCII (value1)) is the output of target file, where value1 is Alpha.out source file values for two output filed values.



Sum (To_decimal (value1)), sum (ASCII (value1)) is the output of target file, where value1 is Alpha.out source file values for two output filed values.

CDC-MD5 value in informatica

Summary:
The technical document talks Message digest function which can used to identify the changes in the data more effectively , rather than checking all the columns using lookup .Message digest as termed as MD5 saves good amount of performance when compared to lookups when the comparison columns are more than 10 columns.
Introduction:

In Informatica we have two functions MD5 (Message Digest 5 algorithm) and CRC32 (Cyclic Redundancy Check) for effectively handling of change data capture MD5 is mostly recommended. This function becomes very useful when we have no primary key columns for comparison and where using lookup for comparison for more than ten columns is not recommended.


Message Digest 5 algorithms:

The algorithm basically returns as 32 bit hexadecimal number by calculating a checksum of the input value. It is been recommended to used for hash key generation and can effectively be handled for change data capture. Whereas Cyclic Redundancy Check is mostly recommended for transmission errors.





Traditional Approach:

To identify records for updates and inserts ideally we use a lookup transformation .But the cache built by the lookup cache basically depends on two factors, the number of columns in the comparison condition and the amount of data in the lookup tables. When we do not have primary key columns to identify the changes they is ideally two ways either to compare all the columns in the lookup and to compare the data using the concept of power exchange change data capture. Which can degrade the performance as well.



MD5 Approach:


MD5 using termed as is as message digest algorithm which convert all the input string data to a hexadecimal data .So while comparison we can concatenate all the columns and generate a MD5 value and if any column is modified ,the MD5 values changes .Thus by comparing the MD5 values we can identify whether the data is changed or not.









Conclusion:

Thus by using the MD5 values we can identify whether the data is changed or not without the performance being degraded and can effectively handle the data as well .But MD5 value is always recommended for more comparison columns and literally no primary key columns in the lookup table . And there is limitation as well the input to the MD5 values needs to a string by data type and it returns a 32 bit Hexa decimal.

Special Characters handling in informatica

In order to avoid the unwanted behavior by ETL/Informatica you need to handle in three different places.
1) The Source Connection code page should in sync with the database.
2) The source and target definition of the character filed data type should be nvarchar.
3) The Source qualifier data type should be set to nstring for the column.
4) The column for all the other transformation should be set to nvarchar.
5) If the Target/Source is a flat file, the code page of the flat file should also be changed to ISO Western European or UTF 8 as per the defined characters.