Wednesday, January 18, 2012

Enabling High Precision in Informatica

Summary:
This technical document describes the scenarios that require enabling high precision in Informatica.
Introduction:

In Informatica, when high precision is disabled, the Integration Service treats a Decimal value as a Double, and in the scenarios below the result was rounded to only two decimal places. So if we have a requirement to calculate a decimal column to more than two decimal places, with an aggregator function involved as well, then we need to enable the high precision property.
First Scenario:

In this scenario, if we are calculating a sum of decimal values in an Aggregator transformation and expect a result with more than two decimal places, we need to enable high precision so that the Integration Service treats the value as a Decimal; otherwise it treats the value as a Double and rounds the result to two decimal places.
Second Scenario:

In this scenario, if we are calculating a sum of decimal values that requires up to 28 digits of precision, for a scientific calculation for example, we can enable high precision as well.
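
A minimal sketch of what this looks like in a mapping; the transformation and port names here are assumptions, not taken from the original scenarios:

-- Aggregator transformation, output port TOTAL (names assumed).
-- AMOUNT is declared as decimal(28,10) in the mapping.
TOTAL: SUM(AMOUNT)
-- With the session property "Enable high precision" checked, the
-- Integration Service processes AMOUNT as a Decimal and keeps the
-- full declared precision; with it unchecked, AMOUNT is converted
-- to Double before the sum is computed.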

Conclusion:

By default the Integration Service converts Decimal values to Double during calculations; if we enable high precision for a decimal calculation, it will neither round the value nor convert it to Double. Along with that, if you want to calculate up to 28 digits of precision, for scientific calculations for example, you can enable the high precision option in Informatica.

Alphanumeric value calculation in informatica

All screenshots discussed below use Alpha.out as the source file and Alpha11.out as the target file.

1) TO_DECIMAL(value1) and ASCII(value1) are the two output field values written to the target file, where value1 holds the Alpha.out input source file values. In this screenshot it was noticed that, for each value, only the first numeric character is considered: the calculation was applied only to the values 1, 2, 3, 4 and 5, and the remaining characters were considered neither for the decimal nor for the ASCII output.
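
As a sketch, the expression logic behind this screenshot could look as follows; the output port names are assumptions:

-- Expression transformation output ports (port names assumed);
-- value1 is a string port read from Alpha.out.
OUT_DECIMAL: TO_DECIMAL(value1)
-- ASCII() evaluates only the first character of the string,
-- e.g. ASCII('1XYZ') = 49 and ASCII('A12') = 65.
OUT_ASCII: ASCII(value1)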

2) TO_DECIMAL(value1) and ASCII(value1) are the two output field values written to the target file, where value1 holds the Alpha.out source file values. TO_DECIMAL returns zero for non-numeric values, while the ASCII value does not change, since it is based on a single character; the ASCII code of each value appears in the second output field.

3) SUM(TO_DECIMAL(value1)) and SUM(ASCII(value1)) are the two output field values written to the target file, where value1 holds the Alpha.out source file values. This is simply an extension of the second screenshot, with the values summed.

4, 5, 6) The output is SUM(TO_DECIMAL) and SUM(ASCII); note that the value is taken from only one character. In all three screenshots, irrespective of how many non-numeric characters a value contains, only the first character is considered and converted to a decimal, so the output remains unchanged.

(Screenshots 4-6: SUM(TO_DECIMAL(value1)) and SUM(ASCII(value1)) are the two output field values written to the target file, where value1 holds the Alpha.out source file values.)
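
As a sketch, the Aggregator ports behind screenshots 3 to 6 could look like this (port names assumed); with no group-by port, the whole file aggregates into a single output row:

-- Aggregator transformation output ports (names assumed):
SUM_DECIMAL: SUM(TO_DECIMAL(value1))
SUM_ASCII:   SUM(ASCII(value1))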

CDC-MD5 value in informatica

Summary:
This technical document describes the message digest function, which can be used to identify changes in the data more effectively than checking all the columns using a lookup. The message digest function, termed MD5, saves a good amount of performance compared to lookups when there are more than ten comparison columns.
Introduction:

In Informatica we have two functions for effectively handling change data capture: MD5 (Message Digest 5 algorithm) and CRC32 (Cyclic Redundancy Check); MD5 is the one mostly recommended. This function becomes very useful when we have no primary key columns for comparison and when using a lookup to compare more than ten columns is not advisable.


Message Digest 5 algorithm:

The algorithm returns a 32-character hexadecimal string (a 128-bit checksum) calculated from the input value. It is recommended for hash key generation and can be used effectively for change data capture, whereas the Cyclic Redundancy Check is mostly recommended for detecting transmission errors.

Traditional Approach:

To identify records for updates and inserts, we would ideally use a Lookup transformation. But the cache built by the lookup depends on two factors: the number of columns in the comparison condition and the amount of data in the lookup table. When we do not have primary key columns to identify the changes, there are ideally two ways: either compare all the columns in the lookup, or capture the changes using PowerExchange change data capture. Comparing all the columns can degrade performance.
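
As a sketch of how the traditional comparison grows with the column count (all port names here are assumed), the change flag after the lookup needs one predicate per compared column:

-- Expression transformation after the lookup (port names assumed):
CHANGE_FLAG: IIF( SRC_COL1 != LKP_COL1
               OR SRC_COL2 != LKP_COL2
               OR SRC_COL3 != LKP_COL3
               -- ...one line per additional comparison column...
               , 'U', 'N' )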



MD5 Approach:


MD5, termed the message digest algorithm, converts an input string to hexadecimal data. So for the comparison we can concatenate all the columns and generate an MD5 value; if any column is modified, the MD5 value changes. Thus, by comparing the MD5 values we can identify whether the data has changed or not.
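
A minimal sketch of the MD5 comparison; the column and port names are assumptions. A delimiter between the columns keeps, for example, ('AB','C') and ('A','BC') from producing the same digest, and non-string columns are converted first because MD5 expects string input:

-- Expression transformation (column and port names assumed):
MD5_NEW: MD5( COL1 || '|' || COL2 || '|' || TO_CHAR(COL3) )
-- Compare against the digest stored with the target row:
CHANGE_FLAG: IIF( MD5_NEW != MD5_OLD, 'U', 'N' )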

Conclusion:

Thus, by using MD5 values we can identify whether the data has changed without degrading performance, and handle the data effectively as well. But MD5 is recommended mainly when there are many comparison columns and practically no primary key columns in the lookup table. There is a limitation as well: the input to MD5 needs to be of string data type, and it returns a 32-character hexadecimal string.

Special Characters handling in informatica

In order to avoid unwanted behavior from ETL/Informatica, you need to handle special characters in the following places:
1) The source connection code page should be in sync with the database.
2) The source and target definitions of the character field data type should be nvarchar.
3) The Source Qualifier data type should be set to nstring for the column.
4) The column in all the other transformations should be set to nstring.
5) If the target/source is a flat file, the code page of the flat file should also be changed to ISO Western European or UTF-8, according to the characters being handled.

Monday, April 5, 2010

Mutated from Update in informatica

This happens when you have an update override in the target, set 'Treat source rows as' to Update, and set the target properties to 'Update else Insert' (with Insert enabled).

This is an informational message stating that your session has inserted 411 rows of which 411 (all) were intended to be updated in the target. But none of those 411 rows were updated (no row with the matching key was found in the target) so their "row-type" has "mutated" into an "insert".
That's it. It's a normal operation.

Reference link: http://technet.informatica.com/node/111332

Tuesday, May 5, 2009

Pivot query in oracle

Scenario 1:
CREATE TABLE t1 (
id NUMBER PRIMARY KEY,
sum1 NUMBER,
sum2 NUMBER,
sum3 NUMBER
);
INSERT INTO t1 VALUES (1,20,40,50);
INSERT INTO t1 VALUES (2,30,20,25);
INSERT INTO t1 VALUES (3,15,60,55);
COMMIT;
ID  SUM1  SUM2  SUM3
--  ----  ----  ----
 1    20    40    50
 2    30    20    25
 3    15    60    55
SELECT id, num, DECODE(num, '1', sum1, '2', sum2, '3', sum3) data
FROM   t1,
       (SELECT '1' num FROM dual UNION ALL
        SELECT '2' num FROM dual UNION ALL
        SELECT '3' num FROM dual)
ORDER  BY id, num;

ID  NUM  DATA
--  ---  ----
 1  1      20
 1  2      40
 1  3      50
 2  1      30
 2  2      20
 2  3      25
 3  1      15
 3  2      60
 3  3      55
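
On Oracle 11g and later the same rows can be produced with the UNPIVOT clause; a sketch against the t1 table above:

-- UNPIVOT turns the sum1..sum3 columns into (num, data) rows,
-- matching the DECODE query above.
SELECT id, num, data
FROM   t1
UNPIVOT (data FOR num IN (sum1 AS '1', sum2 AS '2', sum3 AS '3'))
ORDER  BY id, num;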



Scenario 2:
scott@DEV816> select job, deptno, count(*)
              from emp
              group by job, deptno
              /

JOB        DEPTNO  COUNT(*)
---------  ------  --------
ANALYST        20         2
CLERK          10         1
CLERK          20         2
CLERK          30         1
MANAGER        10         1
MANAGER        20         1
MANAGER        30         1
PRESIDENT      10         1
SALESMAN       30         4

9 rows selected.

And you would like to make DEPTNO be a column. We have four deptnos in EMP: 10, 20, 30 and 40. We can make columns dept_10, dept_20, dept_30 and dept_40 that hold the values currently in the count column. It would look like this:

scott@DEV816> select job,
                     max( decode( deptno, 10, cnt, null ) ) dept_10,
                     max( decode( deptno, 20, cnt, null ) ) dept_20,
                     max( decode( deptno, 30, cnt, null ) ) dept_30,
                     max( decode( deptno, 40, cnt, null ) ) dept_40
              from ( select job, deptno, count(*) cnt
                     from emp
                     group by job, deptno )
              group by job
              /

JOB        DEPT_10  DEPT_20  DEPT_30  DEPT_40
---------  -------  -------  -------  -------
ANALYST                   2
CLERK            1        2        1
MANAGER          1        1        1
PRESIDENT        1
SALESMAN                           4
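
Likewise, on Oracle 11g and later the cross-tab can be written with the PIVOT clause; a sketch (note that COUNT(*) here yields 0 rather than NULL for missing JOB/DEPTNO combinations, unlike the MAX(DECODE(...)) version):

SELECT *
FROM   (SELECT job, deptno FROM emp)
PIVOT  (COUNT(*) FOR deptno IN (10 AS dept_10, 20 AS dept_20,
                                30 AS dept_30, 40 AS dept_40))
ORDER  BY job;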

Monday, May 4, 2009

Informatica Architecture --Integration Service vs Power center Server

The difference between 7.x and 8.x is in the architecture: PowerCenter 8.x primarily differs from PowerCenter 7.x in the administration of the product. PowerCenter has adopted a Service-Oriented Architecture (SOA), which replaces several servers that were used within the PowerCenter 7.x installation. A Tomcat web server runs an Informatica 'domain' comprising several repository or integration 'services'. An integration service is responsible for data movement.
SOA allows a single console tool to be used to administer the PowerCenter installation. This tool is web based and can be accessed from anywhere in your network. SOA supports several integration or repository services running concurrently, which allows high availability to be implemented: if a server hosting an integration service fails, its tasks can be 'failed over' onto another node in the domain.
Additionally, PowerCenter 8.x supports the use of Java in mappings, the ability to 'push down' processing into the database, and other similar improvements. There is no significant change in the client interface.
Additional Information: Several terms have acquired a new meaning with the change in product architecture.
PowerCenter 7.x                    PowerCenter 8.x
Repository Server and repository   Repository Service
PowerCenter Server                 Integration Service

Some other enhancements, such as pushdown processing, more flexible Java transformations, SQL transformations and string parsing functions, were newly added in 8.x. The only visible change in administration is a single place where all the administration work is done: the Repository Server is renamed the Repository Service, and the Informatica PowerCenter Server is renamed the Integration Service. There is not much change in the client interface, as we still have the Repository Manager, Designer, Workflow Manager and Workflow Monitor.
Also on 8.5 and 8.6, security, user and group administration has been moved to the administration console.
Comparing the 7.1 and 8.1 architectures, the concepts of domain, node and integration service were added in 8.1. The domain is administered through a web-based admin console; a node is a logical representation of a machine running the Informatica (PowerCenter) server processes; the integration service takes over the role of the old PowerCenter Server, just as the repository service takes over the role of the old Repository Server. The domain, node and service configuration is kept in one place, the domain database. In 8.1 there is no separate admin console application or repository agent, you can install multiple nodes on a single machine, and you can initiate multiple processes simultaneously. Informatica 8.1 also has more security: you cannot create a repository without privileges, and you no longer need to start the services from the Control Panel every time. Some transformations were added in 8.1 as well, namely the SQL and Java transformations. It supports unstructured data such as emails, Word documents and PDFs, and the concept of flat file update was introduced.



Effective in PowerCenter 8.0, you might notice an increase in memory and CPU resource usage on machines running PowerCenter Services. The following factors can increase memory and CPU resource usage:
Updates to Integration Service and Repository Service processes. The pmserver and pmdtm process for the Integration Service and the pmrepagent process for the Repository Service include updated functionality that requires more memory and CPU resource usage.
High availability. You configure a domain for high availability, service restart, failover, and recovery operations.
Java Runtime Environment (JRE). The PowerCenter service-oriented architecture uses JRE to run the PowerCenter domains and nodes. Domains and nodes require more memory and CPU resources than when you run a PowerCenter Server or Repository Server in PowerCenter 7.x.
Log Manager. The Log Manager requires additional memory and CPU resources when it collects and processes a large number of log events.