
Thursday, December 27

Informatica - Beginner's Guide

Informatica is a data integration tool mainly used for ETL purposes.
ETL stands for Extract, Transform and Load.

Extract – Extracting the data from one or many sources.
The source can be relational (Ex: Oracle, Microsoft SQL Server, etc.) or file-based (fixed-width or delimited flat files, XML files, etc.).
     
Transform – Transforming the data according to the business needs, by cleansing the data.
Ex: Converting the date format coming from the source to the format required in the target, name cleansing, etc.

Load – Loading the data into the target (either a table or a flat file).
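As a small illustration of the Transform step, the date-format conversion mentioned above could be written in SQL as follows (the staging table STG_ORDERS and its column are made up for the example):

-- source stores dates as text in MM/DD/YYYY; target expects YYYY-MM-DD
SELECT TO_CHAR(TO_DATE(ORDER_DATE, 'MM/DD/YYYY'), 'YYYY-MM-DD') AS ORDER_DATE
FROM   STG_ORDERS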

Commonly used client tools are as follows:
1) R – Repository Manager
2) D – Designer
3) W – Workflow Manager
4) M – Workflow Monitor

1) REPOSITORY MANAGER:
The repository is the place where all the metadata for the ETL operations is stored.
The Repository Manager is mainly used by the administrator for creating folders and users and for granting privileges (access, authentication), etc.

Steps to add a repository:
1) Open the Repository Manager window.
2) Go to the Repository menu and click on Add Repository.

The below screen will be displayed:

Once the repository name and the username are entered, click OK.
3) The repository gets added to the Repository Manager.

4) Then right-click on the newly added repository name in the left panel; the below screen will be displayed:

5) Enter the password and click on the More button; the following screen will be displayed.
6) Click on the Add button to add the domain information, such as the domain name, gateway host and gateway port, as shown below:

7) Once the domain is added, click on “OK” and then on “Connect”.
By doing so, you will be connected to the configured repository.
2) DESIGNER:
The Designer includes the following tools:
i) Source Analyzer:
Source Tables or flat files that will be used in the mapping are imported here.
ii) Target Designer:
Target tables or flat files that will be used in the mapping are imported here.
iii) Transformation Developer:
Used to Create the Transformations that can be reused in multiple mappings.
iv) Mapping Designer:
Used to create the mappings that connect the source to the target through transformations.
Transformations created in a mapping cannot be reused in other mappings unless they are configured to be reusable.


v) Mapplet Designer:
Used to create and configure mapplets, i.e., sets of transformations that can be reused in multiple mappings.
3) TRANSFORMATION:
The Designer provides a set of transformations that perform specific functions.

The transformation types can be classified based on the following:
a) Whether the number of rows coming from the source is affected
b) Whether the transformation is connected or unconnected


Based on whether the number of rows coming from the source is affected or not, transformations are classified as follows:

In an Active transformation, the total number of output rows is not the same as the total number of input rows.
If “m” is the number of rows in the source, the row count in the target can be m+n (e.g., Joiner transformation) or
m-n (e.g., Filter transformation).
For example, in a Filter transformation, if we need to filter the records based on a certain condition, the number of records in the output will be less than in the input.
In a Passive transformation, the total number of input rows is exactly equal to the total number of output rows.
Example: Expression Transformation
In an Expression transformation we just perform certain calculations, which do not change the number of rows.
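As a rough SQL analogy (the EMPLOYEES table is illustrative), a filter changes the row count while a calculation does not:

-- active: the WHERE clause can return fewer rows than the source (m - n)
SELECT EMP_ID, SAL FROM EMPLOYEES WHERE SAL > 1000;
-- passive: a derived column changes values, not the number of rows
SELECT EMP_ID, SAL * 1.10 AS REVISED_SAL FROM EMPLOYEES;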
Based on whether the transformation is connected or unconnected, transformations are classified as follows:
A Connected transformation has its input/output ports connected to another transformation or to the source/target.
Example: Aggregator Transformation
An Unconnected transformation is not connected to any other transformation in the mapping; it is called from within another transformation.
Example: Lookup Transformation
Note: The Lookup transformation can be used as either a connected or an unconnected transformation.
Commonly Used Transformations:
Source Qualifier
Aggregator
Lookup
Filter
Router
Expression
Joiner

Transformations are created using the Designer tools.
SOURCE: Only the columns that are needed should be taken into the Source Qualifier instead of all the columns, because this improves performance.
Whenever a source is dragged into the Mapping Designer, a Source Qualifier comes along with the source. We can either use or delete this transformation.
SOURCE QUALIFIER TRANSFORMATION (SQ):
1) Filter conditions can be given if we need only particular records from the source.
2) The same Source Qualifier can be used if we need to join two or more source tables, which may even belong to another schema, as long as they are homogeneous (from the same database).
Note: If we need to join tables from different databases or file systems (i.e., heterogeneous sources), the Source Qualifier transformation cannot be used; instead, a Joiner or Lookup transformation can be used.
Important Points to be noted while using SQ:
i) The number of columns in the SELECT statement should match the number of ports projected from the SQ (the order should also be the same). If the number of columns or the order does not match, the following error is displayed:
“Query should exactly match “n” columns projected from the source”
ii) To check whether the query given in the SQ is valid, we should first establish ODBC connectivity:
1) Right-click the SQ and go to Properties.
2) At the bottom of the window there is a “connect to database” option. If a prior connection exists, we just need to select the ODBC entry from the drop-down and give the username and password.

Username: the schema name used to connect to the database.
Password: the password used to connect to the database.

Otherwise, the following steps are to be followed:
3) Against the ODBC data source field, click on the LOV box.

The following screen will be displayed:
4) Click on the Add button. Then the following screen will be displayed:
            
5) Select “ODBC for Oracle” if Oracle is used as the database.
         

Then the below screen will be displayed:


Data Source Name: a meaningful name
Description: any meaningful description

Username: the username with which we connect to the database (Ex: Trainee1 from the example above)

Server: the database node or connection string (Ex: Training from the example below)
After all the above steps, click “Validate”; the message “No Errors Detected” will appear if there are no errors.
The following are the sample Queries that can be done using SQ:
Example 1: There are two tables, CUSTOMER and PRODUCT:

SELECT
CUSTOMER.CUSTOMER_ID,
CUSTOMER.CUSTOMER_NAME,
CUSTOMER.CUSTOMER_ADDRESS,
CUSTOMER.CUSTOMER_CODE,
PRODUCT.PRODUCT_ID,
PRODUCT.PRODUCT_NAME,
PRODUCT.PRODUCT_CODE
FROM
CUSTOMER, PRODUCT
WHERE CUSTOMER.CUSTOMER_ID = '100'
AND PRODUCT.PRODUCT_ID BETWEEN 1 AND 500
AND CUSTOMER.CUSTOMER_CODE = PRODUCT.PRODUCT_CODE


Example 2:
There is a table LOCATION containing the column START_DATE. If we need the start date and also the minimum of the START_DATE, it can be done in the SQ:
With the below query we get the minimum of START_DATE, while START_DATE itself (all dates) is taken directly from the table.
SELECT LOCATION.START_DATE,
       B.MIN_START_DATE
FROM LOCATION,
     (SELECT MIN(START_DATE) AS MIN_START_DATE FROM LOCATION) B

Note: If we need to select the minimum date column from the same table, the query in Example 2 can be used; also, a MIN_START_DATE column should be added to the PORTS of the SQ, with the same data type as START_DATE.


AGGREGATOR TRANSFORMATION:
The Aggregator is used to perform GROUP BY operations on particular columns.
For example, if out of 5 columns in the select clause we need distinct values for only 2 columns, an Aggregator can be placed. It is also used for getting SUM(), AVG(), MIN(), MAX() of columns grouped on something (like AVG(SAL) per department).
Important Points:
i) In an Aggregator, the number of input and output ports should be equal, else the mapping will be invalid.
ii) The order of the columns coming out of the Aggregator should be the same as in the next transformation. Else the following error will be displayed:
“Column names/data type mismatch”
The example under “Few useful Tips” below illustrates the logic.

Few useful Tips:
Consider Source Table “S”:
EMP_ID
EMP_NAME
DEPT_ID
SAL

Consider Target “T1”
TGT_EMP_ID
TGT_EMP_NAME
TGT_DEPT_ID
TGT_SAL

Consider Target “T2”
TGT_EMP_ID
TGT_DEPT_ID
TGT_SAL

T1 and T2 must both have SUM(SAL) grouped on DEPT_ID.
Before T1, an Aggregator has to be placed to obtain SUM(SAL) grouped on DEPT_ID.
But the same Aggregator cannot be used for T2, since the numbers of columns in T1 and T2 are different. So a separate Aggregator must be placed before T2 to obtain SUM(SAL) grouped on DEPT_ID.
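In SQL terms, each of these Aggregators computes something like the following sketch (using the source table S above; the SQL is only an analogy for the grouping logic):

-- one row per department with the summed salary
SELECT DEPT_ID,
       SUM(SAL) AS SUM_SAL
FROM   S
GROUP  BY DEPT_ID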


LOOKUP TRANSFORMATION:
A lookup can be made on either the SOURCE or the TARGET.
There are 2 types of lookups:
1) Unconnected Lookup
2) Connected Lookup
1) Unconnected Lookup:
Unconnected lookups are used when there are many inputs and many lookup columns, but only one output column is required from the lookup.
“I” stands for Input port, “O” stands for Output port, “L” stands for Lookup port and “R” stands for Return port.
1) For input ports, only “I” should be checked.
2) “L” should be checked for the ports on which we are performing the lookup.
3) The return port is the output column that is the result of the lookup transformation. So for this column, “O”, “L” and “R” will be checked.
4) For unconnected lookups, we should create an output port in an Expression transformation and pass the input parameters.
5) The following syntax should be given in the output port:

:LKP.<lookup_name> (PORT1, PORT2, PORT3)

where PORT1, PORT2 and PORT3 are the input ports of the lookup.
Note: The order of the ports mentioned in the lookup call should be the same as in the lookup transformation.
For example:
:LKP.LKP_LOCATION (LOCATION_ID, LOCATION_CD, SYSDATE)

Note: The ports inside the brackets are passed in order, just as in the example above.
6) The condition for the lookup must also be specified in the “Condition” tab.
Here in this example, the following are the lookup conditions:

LOCATION_ID = i_LOCATION_ID,
LOCATION_CD = i_LOCATION_CD,
START_DT = i_DATE

7) In the “Properties” tab of the lookup transformation, we have an option for handling multiple matches, i.e., when multiple rows satisfy the lookup condition. The following options are available:

>> Return the first matching value
>> Return the last matching value
>> Return any matching value
>> Report error

Using the above options, we can determine what happens when the Lookup transformation finds multiple rows that match the lookup condition.
We can return the first or last matching value from the cache or lookup source, report an error, or allow the Lookup transformation to use any value. “Return the first matching value” and “Report error” are preferred for better performance.
When we configure the Lookup transformation to return any matching value, the transformation returns the first value that matches the lookup condition, and it creates an index based on the key ports rather than on all Lookup transformation ports.
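Conceptually, the unconnected lookup call above behaves like a scalar subquery in SQL. The sketch below uses illustrative names (SRC_LOCATION for the source, LOCATION_LKP for the lookup table, RETURN_COLUMN for whatever return port the lookup defines):

SELECT S.LOCATION_ID,
       S.LOCATION_CD,
       -- the lookup returns a single value per input row
       (SELECT L.RETURN_COLUMN
          FROM LOCATION_LKP L
         WHERE L.LOCATION_ID = S.LOCATION_ID
           AND L.LOCATION_CD = S.LOCATION_CD
           AND L.START_DT = SYSDATE) AS LKP_VALUE
FROM   SRC_LOCATION S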


2) Connected Lookup:
In case we need to take more than one output port by doing a lookup on a table, we use a connected lookup.
“I” stands for Input port, “O” stands for Output port, “L” stands for Lookup port and “R” stands for Return port.
1) For input ports, only “I” should be checked.
2) “L” should be checked for the ports on which we are performing the lookup.
3) For output columns that are the result of the lookup, “O” and “L” will be checked.

Note: Here the return port will be unchecked.
1. It is always preferable to place conditions with the equality sign (=) first in the list of conditions, because this improves performance.
2. If we have the privilege to modify the database, adding an index to the lookup columns improves the performance of the lookup.
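In the same spirit, a connected lookup that returns several columns behaves roughly like a left outer join in SQL: unmatched rows come back as NULLs, just as the lookup returns NULL when no row satisfies the condition. Table names below are illustrative:

SELECT S.CUSTOMER_ID,
       -- output/lookup ports taken from the lookup table
       L.CUSTOMER_NAME,
       L.CUSTOMER_ADDRESS
FROM   SRC_CUSTOMER S
LEFT OUTER JOIN LKP_CUSTOMER L
  ON   L.CUSTOMER_ID = S.CUSTOMER_ID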
FILTER TRANSFORMATION:
It is used to filter records when only particular records are needed in the target, which may be either a table or a flat file.
The following example loads the target table / flat file only with records where CUSTOMER_NAME is 'JOY' and with two weeks of data from the start date.
The condition should be given as:

CUSTOMER_NAME = 'JOY'
AND LOAD_DATE <= ADD_TO_DATE(MIN_START_DATE, 'DD', 14)
ROUTER TRANSFORMATION:
Whenever there is a need to direct values from a source to different targets, where each target is loaded based on some condition, the Router transformation is used.
Example 1: There is a Customer Details table with:
Customer_Id,
Customer_Name,
Customer_Location,
Customer_Order_Qty
And there are multiple targets with the same definition, i.e., Customer_Name and Customer_Order_Qty, but one target table for each Customer_Location, say one for Chicago, one for Plano and one for Texas.
Then we need to use a Router transformation to route values based on the location names.
There are two types of output groups:

User-defined groups: here in this example:
Group 1: Customer_Location = ‘Chicago’
Group 2: Customer_Location = ‘Plano’
Group 3: Customer_Location = ‘Texas’

Default group: The Designer creates the default group after you create one new user-defined group. The Designer does not allow you to edit or delete the default group. This group does not have a group filter condition associated with it. If all of the conditions evaluate to FALSE, the Integration Service passes the row to the default group. Here in this example, all Customer details apart from ‘Chicago’, ’Plano’ and ‘Texas’ will fall under Default group.

Example2:

Suppose we have to route records with Customer_Id null or 0 to a reject file, and records with Customer_Id > 0 to some other target file.

i) We need to create a group in the Router and place the below condition under that group:

ISNULL(CUSTOMER_ID) OR CUSTOMER_ID = 0

ii) Also create another group and place the below condition for the good records:

CUSTOMER_ID > 0

iii) Finally, connect the ports from the appropriate group to whichever target is needed.
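As a rough SQL analogy for the routing in Example 2, an Oracle multi-table insert does something similar (the target table names are illustrative); each WHEN corresponds to a user-defined group and ELSE to the default group:

INSERT FIRST
  -- group 1: bad records (null or zero id) go to the reject target
  WHEN CUSTOMER_ID IS NULL OR CUSTOMER_ID = 0 THEN
    INTO T_REJECTED_CUSTOMERS VALUES (CUSTOMER_ID, CUSTOMER_NAME)
  -- group 2: good records
  WHEN CUSTOMER_ID > 0 THEN
    INTO T_GOOD_CUSTOMERS VALUES (CUSTOMER_ID, CUSTOMER_NAME)
  -- anything else falls through, like the router's default group
  ELSE
    INTO T_DEFAULT_CUSTOMERS VALUES (CUSTOMER_ID, CUSTOMER_NAME)
SELECT CUSTOMER_ID, CUSTOMER_NAME
FROM   CUSTOMER_DETAILS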

EXPRESSION TRANSFORMATION:
The Expression transformation can be used for mathematical calculations, and also, when we use unconnected lookups in our mappings, to call the lookup and fetch its return value.
The Expression transformation is a passive and connected transformation, and is one of the most commonly used and effective transformations.
The following are the few syntax / examples for reference:
i) The Expression transformation can be used for comparisons by specifying conditions:
IIF(v_PRODUCT_ID > 1 AND v_PRODUCT_ID < 500, 7.5, ROUND(v_PRODUCT))

ii) It can be used for converting a number to a character or vice versa.

Conversion: TO_CHAR(variable name or column name)
                    
iii)  It can be used for concatenation of two or more columns.
                        
Column1 || Column2 || Column3 || Column4

iv) If a few columns contain null values and these need to be replaced with some default values, the Expression transformation can be used (COLUMN1 and COLUMN2 stand for the variables or columns being checked):

IIF(ISNULL(COLUMN1) OR COLUMN1 = 0,
    IIF(ISNULL(COLUMN2) OR COLUMN2 = 0, 0, COLUMN2),
    COLUMN1)
                   
v) The expression can be used as a configuration option for the output port; the return value of the output port needs to match the return value of the expression:

IIF(column_name = some_value, :LKP.LOOKUPNAME(PORT1, PORT2, PORT3, etc.), 0)

vi) Date function: ADD_TO_DATE(column_name, 'DD', number_of_days)

JOINER TRANSFORMATION:
The Joiner transformation is an active and connected transformation.
It can be used to join source data from two related heterogeneous sources residing in different locations or file systems, whereas the Source Qualifier is used to join homogeneous sources. It can also be used to join data from the same source.
It joins sources based on conditions with at least one matching column.
In SQL, a join is a relational operator that combines data from multiple tables into a single result set. The Joiner transformation is similar to an SQL join, except that the data can originate from different types of sources.
We can define the join type on the Properties tab in the transformation. The Joiner transformation supports the following types of joins:
• Normal
• Master Outer
• Detail Outer
• Full Outer
Note: A normal or master outer join performs faster than a full outer or detail outer join.
If a result set includes fields that do not contain data from either source, the Joiner transformation populates the empty fields with NULL values. If we know a field will return NULL and we do not want to insert NULLs into the target, we can set a default value on the Ports tab for the corresponding port.
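As a rough SQL analogy for the four join types (DETAIL and MASTER are illustrative table names; note that in Informatica the master outer join keeps all the detail rows):

-- Normal: equivalent to an inner join (matching rows only)
SELECT D.ID, M.NAME FROM DETAIL D INNER JOIN MASTER M ON D.ID = M.ID;
-- Master outer: all detail rows plus the matching master rows
SELECT D.ID, M.NAME FROM DETAIL D LEFT OUTER JOIN MASTER M ON D.ID = M.ID;
-- Detail outer: all master rows plus the matching detail rows
SELECT D.ID, M.NAME FROM DETAIL D RIGHT OUTER JOIN MASTER M ON D.ID = M.ID;
-- Full outer: all rows from both sources
SELECT D.ID, M.NAME FROM DETAIL D FULL OUTER JOIN MASTER M ON D.ID = M.ID;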
Rules:

1) All the ports that need to be joined should be brought to the Joiner.

2) There are two pipelines, the master pipeline and the detail pipeline. We should take the ports from the detail pipeline through to the target, while the master pipeline normally ends at the Joiner; in the case of an equi-join we can take ports from either pipeline.

3) The Joiner transformation cannot be used when either of the input pipelines contains an Update Strategy transformation.

4) It cannot be used when a Sequence Generator transformation is connected directly before the Joiner.


Important Points to be noted:
i) Either a Joiner or a connected Lookup can be used; both perform a similar function, depending on the requirements.

ii) Performance-wise, a Joiner takes more time than a Lookup, so a Lookup is preferred when we have to retrieve data from two tables based on some join conditions.

iii) We can improve session performance by configuring the Joiner transformation to use sorted input. The improvement is noticeable when working with large data sets.

iv) Performing a join in the database is faster than performing it in the session, so providing the join condition in the Source Qualifier improves performance.

To configure a mapping to use sorted data, we should establish and maintain a sort order in the mapping, so that sorted data reaches the Joiner transformation. The following tasks configure the mapping:
Configure the sort order: configure the sort order of the data we want to join. We can join sorted flat files, or we can sort relational data using a Source Qualifier transformation. We can also use a Sorter transformation before the Joiner transformation.
Configure the Joiner transformation: configure the Joiner transformation to use sorted data, and configure the join condition to use the sort-origin ports. The sort origin represents the source of the sorted data.
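For a relational source, one way to supply sorted input is to add an ORDER BY on the join keys in the Source Qualifier query; a sketch with illustrative names:

SELECT ORDER_ID,
       CUSTOMER_ID,
       ORDER_AMOUNT
FROM   ORDERS
ORDER  BY CUSTOMER_ID   -- sorted on the join key before reaching the Joiner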
4) WORKFLOW MANAGER:
This is used for creating the sessions and associating the sessions with a workflow.
In the Properties tab:
Session log file directory: <Directory name (UNIX)>\SessLogs
If the session is performing an INSERT operation, select “Treat source rows as INSERT”.
In the Config Object tab:
It is always preferable to set “Stop on errors” to 1.
1) Click on Sources; under Connections, give the connection name if it has already been created.
2) Click on the particular connection and check whether it is the one used for connecting to the database.
3) Click on the target and set the connection for it.
4) In the properties section at the bottom:
5) The target load type should be “Normal”.
6) The truncate table option should be checked if the target has to be truncated, and to avoid errors (violating a unique key constraint) while executing.
7) The reject file directory should be given as <Directory name (UNIX)>\BadFiles.
8) For the remaining transformations, the cache directory should be <Directory name (UNIX)>\Cache.

Note: If a new connection has to be created for the workflow:

Go to Workflow Manager → Connections → Relational, then give the appropriate information (user, password, connect string).


Go to Workflows → Create → give the name of the workflow.

Below that, the Integration Service should be selected as specified by the client.

Next, in the workflow properties, the workflow log directory <Directory name (UNIX)>\WorkflowLogs\ should be given.

                    
Fig: Snapshot of the “Edit properties” of a session


If the source / target used in the mapping is a relational table, then the following session configurations must be done:

Select the particular source / target from the left panel. If the selected source / target is a relational table, then “Relational Reader” (for a source) or “Relational Writer” (for a target) needs to be selected in the topmost drop-down in the snapshot shown above (i.e., in the “Mapping” tab of the session's “Edit Task”). Once this is done, under the “Connections” heading of the snapshot, the database connection needs to be selected.

Similarly, for a flat-file source / target, “File Reader” / “File Writer” needs to be selected in the topmost drop-down.

                      


In the above snapshot, “File Writer” is selected in the topmost drop-down. That signifies that the target selected (highlighted) in the left panel is a flat file. For a flat-file source / target, additional session properties must be mentioned:
Source / Output file directory – this can be a file path on the UNIX box.
Source / Output filename.

In case the target is a flat file, we need to go to the UNIX box and check whether it is loaded properly.

Steps:
Run → telnet <server name> (as mentioned by the client)
Login: as mentioned by the client
Password: as mentioned by the client

Go to the specified directory with cd <Directory name> and list the files with ls.
After this, give cat <filename> to list the records, or vi <filename> to view the file.

5) WORKFLOW MONITOR:
This is used to check the status of the mappings being executed.
Right-click on the session to get the session log, which is very useful in case the session has failed. Also click on the run properties to know the number of records loaded into the target and the number of records fetched from the source.
