Posts

Showing posts from July, 2013

Implementing SDC 2 with MD5/CRC32 function(CHECKSUM)

Image
In general when requirement come to implement SCD type 2, we look out for any date field in the Target. If not date field we go for SCD type 2 with flag column, still we can go for SCD type two with versioning. This all scenario holds good when there is a date column or flag column in the table it’s easy for a developer to implement SCD type2. Let’s say we come across a situation where there are no date columns or flag column, then how to implement SCD type 2. Usually we come across such situation when you are dealing with very old legacy system which doesn’t have such fields in the table. Then the solution is to generate hash code for the entire row and use it as identifier column for reference. To achieve this informatica allow us to use hash function namely MD5 () – Message Digest and CRC32 () – Cyclic Redundancy Check. The hash code in ETL perspective is referred as checksum value Now we will try this in a simple example to see how does it works. I am considering a simple fla...

Several ways to pull Incremental Data from source - Informatica

We often come across a scenario where we are asked to pull yesterday’s records or the latest records after the last run to avoid unnecessary fetching of old records from the source. We can think of SYSDATE -1 here to solve this but it’s not always correct in all cases suppose the workflow was scheduled in such a way that it should only run on business days and imagine you are pulling INVOICE related data so source table are updated on only business days , So if we run the workflow on Monday and if we have used SYSDATE-1 logic we end up in pulling Sundays records were as Sunday there was no Sales we are wrong here. We need to pull last business day record ie either Friday or Saturday depending on the business. Also we end up with same situation when there are any holidays. We can pull last business day invoices from many ways, here I am going to share the two ways I have came across. 1.       Mapping Variable: I believe it’s the easiest one of all wh...

Data Cleansing: Handle Non Printable and Special Characters – Informatica

Data cleansing is important aspect while performing ETL. There are many cases were we get non printable and special characters in the input file or table. Informatica Regular expression and function are very handy here. So here is how we can avoid Non Printable and Special Characters when we are loading the target tables. Use REGREPLACE function for handling Non Printable characters and REPLACESTR for replacing multiple special characters in the field. Examples: Non Printable : Syntax : 1.   REG_REPLACE( subject, pattern, replace, numReplacements ) Argument Required/Optional Description subject Required String datatype. Passes the string you want to search. pattern Required String datatype. Passes the character string to be replaced. You must use perl compatible regular expression syntax. Enclose the pattern in single quotes. replace Required String datatype. Passes the new character string. numReplacements Optional Numeri...