Saturday, July 3, 2010

Oracle GoldenGate Best Practices and Tips

Lately I've been working, once again, with GoldenGate (now Oracle GoldenGate) data integration software. GoldenGate offers tremendously useful capabilities, including CDC (Change Data Capture), data warehouse ETL, efficient low-impact data replication from diverse database management systems, real-time standby database maintenance (for high availability, upgrades, and patches), feeding Oracle Data Integrator (ODI), and data distribution. So, I thought I'd offer some GoldenGate best practices and tips that I've learned largely by making mistakes:

I. Best Practices

PARALLEL PROCESSING

Ensure the system has enough memory. GoldenGate Extract and Replicat run as operating system processes, and each one can consume 25-50 MB or more of system memory. That memory is no longer available to the Oracle DBMS, especially the SGA, so size the host accordingly.

Use parallel Replicat groups on the target system to reduce latency through parallelism. Consider parallel Extract groups for tables that are fetch-intensive (e.g., those that trigger SQL procedures).

Group tables that have referential integrity (RI) with each other into the same Extract-Replicat pair.

Pair each Replicat with its own trail and corresponding Extract process.

When using parallel Replicats, configure each one to process a different portion of the overall data, for example by hash-splitting rows with the @RANGE function (sketched below).
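Here is a minimal sketch of that split, assuming a table hr.orders with key column order_id and two Replicat groups rep1 and rep2 (all names here are my own illustrations, not from any particular environment):

-- rep1.prm
REPLICAT rep1
USERID ggs_admin, PASSWORD ggs_pw
ASSUMETARGETDEFS
MAP hr.orders, TARGET hr.orders, FILTER (@RANGE(1, 2, order_id));

-- rep2.prm: identical except it takes the second bucket
REPLICAT rep2
USERID ggs_admin, PASSWORD ggs_pw
ASSUMETARGETDEFS
MAP hr.orders, TARGET hr.orders, FILTER (@RANGE(2, 2, order_id));

@RANGE hashes the key column into the stated number of buckets, so the two groups apply disjoint halves of the rows and can run concurrently without colliding.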

PASSTHRU PARAMETER

Consider using this parameter when no filtering, conversion, or mapping is required and you're using a data pump.

In pass-through mode, the Extract process does not look up table definitions, either from the database or from a data-definitions file. Pass-through mode increases the throughput of the data pump because all of the functionality that looks up object definitions is bypassed, saving database fetches and improving performance.
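For example, a pass-through pump parameter file can be as small as this (the host, port, trail path, and schema are assumptions for illustration):

-- pump1.prm
EXTRACT pump1
PASSTHRU
RMTHOST tgthost, MGRPORT 7809
RMTTRAIL /ggs/dirdat/rt
TABLE hr.*;

Note that with PASSTHRU the pump needs no database login for definition lookups, and its TABLE statements may not perform filtering or column mapping.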

INSERTAPPEND

A new GoldenGate 10.4 feature.

Use it for large transactions.

It causes Replicat to apply inserts with an APPEND hint, placing records at the end of the table rather than doing a more costly insert into other areas of the table.
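A hedged sketch of how it sits in a Replicat parameter file (group and table names are assumed); NOINSERTAPPEND switches the behavior back off for subsequent MAP statements:

-- repbulk.prm
REPLICAT repbulk
USERID ggs_admin, PASSWORD ggs_pw
ASSUMETARGETDEFS
INSERTAPPEND
MAP sales.big_fact, TARGET sales.big_fact;
NOINSERTAPPEND
MAP sales.small_dim, TARGET sales.small_dim;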


DATAPUMP (not the Oracle DB utility)

1. The primary Extract group writes to a trail on the source system.

2. The data pump reads this trail and sends the data across the network to a remote trail on the target.

3. A pump adds storage flexibility and also serves to isolate the primary Extract process from TCP/IP activity.

4. It can be configured for online or batch processing.

5. It can perform data filtering, mapping, and conversion, or it can be configured in pass-through mode, where data is passively transferred as-is, without manipulation.

6. Use it to perform filtering, thereby removing that processing overhead from the primary Extract group.

7. Use one or more pumps for each source and each target for parallelism (a GGSCI sketch follows this list).
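Creating a pump in GGSCI takes roughly this shape; the group name and trail paths here are assumed placeholders, not values from the guide:

ADD EXTRACT pump1, EXTTRAILSOURCE /ggs/dirdat/lt
ADD RMTTRAIL /ggs/dirdat/rt, EXTRACT pump1

The first command attaches the pump to the local trail written by the primary Extract; the second registers the remote trail on the target that the pump will write to.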

In most business cases, it is best practice to use a data pump. Some reasons for using a data pump include the following:

● Protection against network and target failures:
In a basic GoldenGate configuration, with only a trail on the target system, there is nowhere on the source system to store data that the Extract process continuously extracts into memory. If the network or the target system becomes unavailable, the primary Extract could run out of memory and abend. However, with a trail and data pump on the source system, captured data can be moved to disk, preventing the abend. When connectivity is restored, the data pump extracts the data from the source trail and sends it to the target system(s).

● You are implementing several phases of data filtering or transformation. When using complex filtering or data transformation configurations, you can configure a data pump to perform the first transformation either on the source system or on the target system,
and then use another data pump or the Replicat group to perform the second transformation.

● Consolidating data from many sources to a central target. When synchronizing multiple source databases with a central target database, you can store extracted data on each source system and use data pumps on each of those systems to send the data to a trail
on the target system. Dividing the storage load between the source and target systems reduces the need for massive amounts of space on the target system to accommodate data arriving from multiple sources.

● Synchronizing one source with multiple targets. When sending data to multiple target systems, you can configure data pumps on the source system for each target. If network connectivity to any of the targets fails, data can still be sent to the other targets (see the sketch below).
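As an illustration, a one-source, two-target layout could use one pump per target (hosts, ports, and trail paths are my own assumptions):

-- pump_a.prm
EXTRACT pump_a
RMTHOST targeta, MGRPORT 7809
RMTTRAIL /ggs/dirdat/ta
TABLE hr.*;

-- pump_b.prm
EXTRACT pump_b
RMTHOST targetb, MGRPORT 7809
RMTTRAIL /ggs/dirdat/tb
TABLE hr.*;

If targetb becomes unreachable, only pump_b stops; pump_a keeps delivering, and pump_b resumes from the source trail once the connection is restored.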


STEP-BY-STEP DATA PUMP CONFIGURATION

ON THE SOURCE SYSTEM

To configure the Manager process (Reference: Oracle GoldenGate Administration Guide, Version 10.4):

1. On the source, configure the Manager process according to the instructions in Chapter 2.

2. In the Manager parameter file, use the PURGEOLDEXTRACTS parameter to control the purging of files from the local trail.
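A minimal Manager parameter file along those lines might read as follows (the port, trail prefix, and retention window are assumptions for illustration):

-- mgr.prm
PORT 7809
PURGEOLDEXTRACTS /ggs/dirdat/lt*, USECHECKPOINTS, MINKEEPDAYS 3

USECHECKPOINTS ensures a trail file is purged only after every process reading it has checkpointed past it, so the pump or Replicat never loses data it still needs.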

To configure the primary Extract group:

3. On the source, use the ADD EXTRACT command to create a primary Extract group. For documentation purposes, this group is called ext.

ADD EXTRACT ext, TRANLOG, BEGIN NOW
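TRANLOG tells Extract to capture from the database transaction log, and BEGIN NOW starts capture from the current time (a timestamp can be given instead). The corresponding parameter file for ext might then look like this minimal sketch (the login, trail path, and table list are my own assumed examples, not from the guide):

-- ext.prm
EXTRACT ext
USERID ggs_admin, PASSWORD ggs_pw
EXTTRAIL /ggs/dirdat/lt
TABLE hr.*;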