Several ways to pull Incremental Data from source - Informatica
We often come across a scenario where we are asked to pull yesterday’s records or the latest records after the last run to avoid unnecessary fetching of old records from the source.
We can think of SYSDATE -1 here to solve this but it’s not always correct in all cases suppose the workflow was scheduled in such a way that it should only run on business days and imagine you are pulling INVOICE related data so source table are updated on only business days , So if we run the workflow on Monday and if we have used SYSDATE-1 logic we end up in pulling Sundays records were as Sunday there was no Sales we are wrong here. We need to pull last business day record ie either Friday or Saturday depending on the business. Also we end up with same situation when there are any holidays.
We can pull last business day invoices from many ways, here I am going to share the two ways I have came across.
1. Mapping Variable:
I believe it’s the easiest one of all where we declare a mapping level variable in the mapping and assign it with SESSIONSTARTTIME and use it in SQL override of the same mapping.
Example:
Consider the below simple mapping
Sourceà SQà EXPà Target
Follow these steps to implement this logic
· Go to mapping tab and then Parameter and variable option, there create a mapping variable like $$LASTRUNDT of type date. There you can see initial value; either you can assign that value with some particular date or NULL. Note the date format here is MM/DD/YYYY HH24:MI:SS, so be careful when you are assigning the date there.
· Now use the mapping variable $$LASTRUNDT in the Source qualifier like
Select COL1, COL2, COL3….. COLn from SALES.INVOICE
Where INVOICE_DATE >= TO_DATE(‘$$LASTRUNDT’,’ MM/DD/YYYY HH24:MI:SS’)
· In Expression transformation create a output port LASTRUN and in the expression use
SETVARIABLE($$LASTRUNDT,SESSSTARTTIME). So here you are assigning the SESSIONSTARTTIME value to your mapping variable $$LASTRUNDT.
So how this gone a work now, it’s quite simple at the 1st run it will use default value if you have declared any initial value while creating mapping variable or IS will pick the default value i.e. 01/01/1753 00:00:00. This is how 1st run work, it is advised to initialize the variable with required date.
Then we have used setvariable function in expression right that will assign the session start time to the variable $$LASTRUNDT.
When you run the session again IS will pick the updated $$LASTRUNDT value which hold previous run’s timestamp. So this way we can implement a system which pulls last business day records or latest records after the last run.
2. Mapping Parameter:
Parameter file can also be used to store the last run timestamp as a parameter and then use the parameter in the mapping. This system needs two mapping and session objects to perform the last business day load.
Let’s see how we can achieve it
· Create a mapping parameter in the data load mapping where you are loading the last business day records as $$LASTRUNDT
· Now use the mapping parameter $$LASTRUNDT in the Source qualifier like
Select COL1, COL2, COL3….. COLn from SALES.INVOICE
Where INVOICE_DATE >= TO_DATE(‘$$LASTRUNDT’,’ MM/DD/YYYY HH24:MI:SS’)
· Create a mapping to write the parameter file like below
Source à SQ à Exp à Target(Parameter file).
· In the expression transformation create a output port with expression as below
[Session Name]
$$LASTRUNDT = SESSSTARTTIME
Drag it and connect with the mapping parameter port.
· So here we are writing the last run timestamp to a mapping parameter file, which will be used in the next mapping in filter condition.
So how this will work, here it is during the 1st run u can either initialize the parameter value with some date or you can keep the date value in the parameter file itself so it will take the initial value for the filter condition and pull the record.
Next runs is the mapping parameter session which is going to write the last run timestamp to the parameter. This will be used during next cycle where IS will pick the $$LASTRUNDT parameter value from the parameter which was written by the previous cycle. This is how we can implement data pulling of last business day by using the parameter file.
Hope this was useful to all readers to implement such logic in there work.
Happy Learning!!!
Corrections and suggestions are always welcomed J
While using mapping parameter, should we not use parameter file? I think in general practice its done. And that is also difference between mapping variable & parameter.
ReplyDeleteNow I have a question here, if parameter file is used how parameter will be updated before next run. Is it done through shell scripts or another informatica map?
Informatica parameters has to be placed in paramater file, mapping variable is stored in informatica repository where as parameter will be in stored in infa param directory. This is the practice followed widely.
ReplyDeleteComing to how the parameter file will be updated with load date value, we need to overwrite the param file after each run using a expression transformation in a different mapping or in a same mapping in a different pipeline using the below syntax
[Session Name]
$$LASTRUNDT = $$SESSSTARTTIME
This will make sure when you run the session for next run it will pick incremental data from last run date by using the filter in source qualifier INVOICE_DATE >= TO_DATE(‘$$LASTRUNDT’,’ MM/DD/YYYY HH24:MI:SS’).
Hope this helped
Nice post! This blog gives very important info about BI tools
ReplyDeleteThanks for sharing Informatica Online Training
Thank you so much for this nice information. Hope so many people will get aware of this and useful as well. And please keep update like this.
ReplyDeleteBig Data Consulting Services
Data Lake Solutions
Advanced Analytics Services
Full Stack Development Solutions
Happy to know if this was useful to you.
Delete