Tuesday, August 6, 2013

Aggregator TF in Detail


Aggregator Transformation in Detail:

1)        Aggregator TF: It is a connected, Active transformation (the number of input rows need not equal the number of output rows).

2)        Incremental Aggregation (IA): (in the session-level properties)
Incremental Aggregation is the process of capturing the changes in the source and calculating the aggregations within a session. It enables the Integration Service to update the target incrementally and avoids recalculating the aggregations over the entire source.
If the source changes after the first session run, the next run does not consider the entire source data for the aggregation process; only the newly added source data is considered. This is explained in the section Understanding of IA below. Incremental Aggregation is useful when the changes affect only a small part of the target.

Note: We cannot use the Sorted Input option together with Incremental Aggregation, because the records arriving via SORTED INPUT are presorted only for that particular run, whereas the data in the historical cache (used by IA) may not be in sorted order.
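The idea behind Incremental Aggregation can be sketched outside Informatica as well. The following Python snippet is purely illustrative (not Informatica code, and the function name is invented for this example): a persisted dictionary plays the role of the historical aggregate cache, and each later run feeds in only the newly added rows.

```python
def aggregate_incrementally(cache, new_rows):
    """Update per-group SUMs using only newly arrived rows.

    cache    -- dict mapping group key -> running sum (the "historical cache")
    new_rows -- iterable of (group_key, value) pairs added since the last run
    """
    for key, value in new_rows:
        cache[key] = cache.get(key, 0) + value  # fold the delta into the cache
    return cache

# First run processes the full source; later runs pass only the delta.
cache = aggregate_incrementally({}, [("A", 10), ("B", 5)])
cache = aggregate_incrementally(cache, [("A", 3)])   # incremental run
# cache == {"A": 13, "B": 5}
```

Note that the second call never touches the rows from the first run, which is exactly why IA avoids reprocessing the entire source.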

3)        Reinitialize Aggregate Cache: (in the session-level properties)
This option makes the session treat the aggregation as if it were starting for the first time along with the workflow session.
It removes all the previous session information related to the Aggregator. This is explained in the section Understanding of IA below.

4)        Sorted Input: (in the session-level mapping)
This option can be used to improve session performance. You can use it only when the input to the Aggregator transformation is sorted on the group by ports. The Integration Service assumes all the data is sorted by group and performs the aggregate calculations as it reads the rows for each group.
The Integration Service then processes the Aggregator transformation in memory as the rows stream through, rather than caching the entire source in the aggregate cache.

If you specify the Sorted Input option without actually sorting the data, the Integration Service fails the session.
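Why sorted input helps can be seen in a small, purely illustrative Python sketch (the function name is invented for this example): when rows arrive pre-sorted on the group key, each group can be aggregated and emitted as soon as it ends, so only one group needs to be held in memory at a time.

```python
from itertools import groupby

def sorted_sum(rows):
    """rows: (group_key, value) pairs, already sorted by group_key."""
    result = []
    for key, grp in groupby(rows, key=lambda r: r[0]):
        # Only the current group is in memory; it is emitted when it ends.
        result.append((key, sum(v for _, v in grp)))
    return result

rows = [("A", 1), ("A", 2), ("B", 5)]      # sorted on the group key
# sorted_sum(rows) -> [("A", 3), ("B", 5)]
```

If the input were not actually sorted, groups would repeat and the output would be wrong, which mirrors why the Integration Service refuses to run a session that claims Sorted Input on unsorted data.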

IMP: How are NULL values handled in the Aggregator?

You can configure the Integration Service to treat null values in aggregate functions as NULL or as zero. By default, the Integration Service treats null values as NULL in aggregate functions.

        Understanding of IA:
                        After the end of each session execution, the Integration Service saves the aggregate cache (the index and data files) so it can be reused as the historical cache in the next run.
                                    


Notes:
·          If the mapping is changed and saved, the existing aggregate cache is deleted and a new aggregate cache is created.
·          If we configure the session with the Reinitialize Aggregate Cache option enabled, it deletes the old index and data files and creates new ones.
·          If the number of partitions is decreased, or the aggregate cache files are moved or deleted, a new cache is created.

IS → Integration Service: The Integration Service is, in essence, the ETL process. The Integration Service starts one or more Integration Service processes to run and monitor workflows. When you run a workflow, the Integration Service process starts and locks the workflow, runs the workflow tasks, and starts the processes that run sessions. Informatica Integration Services let you access, understand, transform, and manage enterprise data assets. These services profile, cleanse, match, transform, move, federate, and monitor the quality of data, wherever and in whatever form it resides, to ensure that the enterprise receives data that is consistent, accurate, and timely.
Aggregate Cache:
The PowerCenter server stores key values in the index cache and output values in the data cache.
Both the index file and the data file are used to store historical data.
Index Cache (File) →
The index cache holds group information from the group by ports.
All group by columns are stored in the AGGREGATOR INDEX CACHE.
Index cache (default size) – 12 MB (12,000,000 bytes).
The index cache size can be increased up to 24 MB.
Data Cache (File) →
The DATA CACHE is generally larger than the AGGREGATOR INDEX CACHE.
Columns in the data cache:
  • Variable ports, if any
  • Non group by input/output ports
  • Non group by input ports used in non-aggregate output expressions
  • Ports containing aggregate functions
Data cache (default size) – 24 MB (24,000,000 bytes).
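The index/data cache split can be illustrated with a small, purely hypothetical Python sketch (the function and variable names are invented for this example): group-by keys go into one structure standing in for the index cache, while the aggregated values live in another standing in for the data cache.

```python
def build_caches(rows, group_cols, value_col):
    """Toy model of the Aggregator's two caches for a SUM aggregation."""
    index_cache = {}   # group-by key tuple -> group id   ("index cache")
    data_cache = {}    # group id -> running SUM          ("data cache")
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        gid = index_cache.setdefault(key, len(index_cache))
        data_cache[gid] = data_cache.get(gid, 0) + row[value_col]
    return index_cache, data_cache

rows = [{"dept": "HR", "salary": 100},
        {"dept": "HR", "salary": 50},
        {"dept": "IT", "salary": 200}]
idx, data = build_caches(rows, ["dept"], "salary")
# idx  == {("HR",): 0, ("IT",): 1}
# data == {0: 150, 1: 200}
```

The sketch also hints at why the data cache is usually the larger of the two: the index cache holds only the group keys, while the data cache holds every non-group-by value being accumulated.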

Nested Aggregate Functions:
You can nest one aggregate function within another aggregate function. You can either use single-level aggregate functions or multiple nested functions in an aggregate transformation. You cannot use both single-level and nested aggregate functions in an aggregator transformation. The Mapping designer marks the mapping as invalid if an aggregator transformation contains both single-level and nested aggregate functions. If you want to create both single-level and nested aggregate functions, create separate aggregate transformations.
Explanation: A nested aggregate function returns only one output row, whereas single-level aggregate functions can return one row per group. Since the number of rows returned is not the same, you cannot use both single-level and nested aggregate functions in the same transformation. If you include both in the same Aggregator, the Designer marks the mapping or mapplet as invalid, so you need to create separate Aggregator transformations.

Note: We can nest up to two levels only.

Example: MAX(SUM(sales))
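What MAX(SUM(sales)) computes can be mimicked in plain Python (an illustrative analogue only, with an invented function name): the inner SUM produces one value per group, and the outer MAX collapses those into a single output row.

```python
def max_of_sums(rows):
    """Rough analogue of MAX(SUM(sales)) over (group_key, value) pairs."""
    sums = {}
    for key, value in rows:
        sums[key] = sums.get(key, 0) + value   # inner SUM, one value per group
    return max(sums.values())                  # outer MAX over the group sums

rows = [("east", 100), ("east", 50), ("west", 120)]
# max_of_sums(rows) -> 150
```

This also shows why mixing levels is disallowed: the inner SUM yields one row per group, but the outer MAX reduces everything to a single row.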

Conditional clauses:

You can reduce the number of rows processed in the aggregation by specifying a conditional clause.
Example: SUM(salary, salary > 1000)

This will include only the salaries which are greater than 1000 in the SUM calculation.
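The effect of such a conditional clause can be sketched in Python (illustrative only, with an invented function name): only the values that satisfy the condition contribute to the aggregate.

```python
def conditional_sum(values, condition):
    """Analogue of SUM(value, condition): sum only rows passing the test."""
    return sum(v for v in values if condition(v))

salaries = [800, 1200, 1500, 900]
# Mirrors SUM(salary, salary > 1000):
# conditional_sum(salaries, lambda s: s > 1000) -> 2700
```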

Non-aggregate functions:
You can also use non-aggregate functions in the Aggregator transformation.
Example: IIF( SUM(sales) < 20000, 0, SUM(sales) )

Note: Even without an Aggregator TF we can compute a sum: use a Sorter TF (if the input is not already sorted), an Expression TF to keep a running total and set a flag on the last row of each group, and a Filter TF to pass only the rows where Flag = 1.
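The Sorter + Expression + Filter pattern above can be sketched as follows. This is an illustrative Python analogue (not Informatica code, and the function name is invented): rows arrive sorted by group, a running total plays the Expression TF's role, and a flag on the last row of each group stands in for the Filter TF.

```python
def sum_without_aggregator(rows):
    """rows: (group_key, value) pairs, sorted on group_key."""
    out = []
    running, prev = 0, None
    for i, (key, value) in enumerate(rows):
        if key != prev:
            running = 0                       # new group: reset the total
        running += value                      # Expression TF: running total
        prev = key
        is_last = (i + 1 == len(rows)) or rows[i + 1][0] != key
        if is_last:                           # flag = 1 -> Filter TF passes it
            out.append((key, running))
    return out

rows = [("A", 1), ("A", 2), ("B", 5)]
# sum_without_aggregator(rows) -> [("A", 3), ("B", 5)]
```

Only the last row of each group survives the filter, so each group contributes exactly one summary row, just as the Aggregator would produce.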


Performance Tuning on the Aggregator TF:
1)        Unless it is really required, do not use an Aggregator TF.
2)        Try to perform the aggregation in the Source Qualifier SQL query
(exceptions: the source is a flat file, or the aggregated fields are derived inside the mapping).
3)        Ensure that the columns used in the group by are sorted in the same order as the group by ports to enhance performance.
4)        Check the “Sorted Input” option in the Aggregator properties to enhance performance
(Sorted Input: sort the data before passing it into the Aggregator).
5)        Filter out unwanted data before aggregating.
6)        Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.

