Tuesday, December 31, 2013

Introduction to Golden Gate

In 2009, Oracle acquired Golden Gate to strengthen its data replication technology as part of future strategies within data warehouse and real time transaction database. Before this acquisition, Oracle  had its own replication technology which mainly constitutes basic replication, advance replication and stream replication.

Oracle's basic replication process had two different methods: log based and trigger based. In log based replication, snapshot schema and DB links had to set up between source and target database. Data was transported from online redo log to target database over network.

In advance replication Oracle included multiple master replications from multiple environments and trigger based replication method.

In later versions of Oracle 9i, stream replication was introduced with improvements in its earlier replication technologies. It uses log based replication where committed transaction’s online redo log of source system is mined to target system over network. Also a new background process to manage communication and operations of replication activities has been introduced

However these replication technologies couldn't stand long as it involved complicacy and complex software development to harness the power of Oracle to Non Oracle database environment to allow the transaction to be moved between environment.

In 1990, a small software company Golden Gate came with different approach on data replication between cross platform. Instead of using different formats, Golden Gate implemented a uniform format and used a command prompt GGSCI (Golden Gate Software Command Interface) to perform data replication operation. Oracle Goldengate 11g supports both DDL and DML operation. Legacy Oracle 8i version or prior requires an upgrade to 9i or later to implement the Goldengate.

Goldengate supports all popular databases such as Oracle, MySQL, DB2, Sybase, Teradata, SQL server etc.

Goldengate contains three installable components which can be downloaded separately from Oracle's website and can be installed on both, source and target system depending on requirement. Below is the list of installable components for Goldengate:
  •  Oracle Goldengate which includes GGSCI, DEFGEN, LOGDUMP and REVERSE.
  • Oracle Goldengate Director which includes Director Service application, Monitor Agent, Director Client and Administrator client.
  • Oracle Goldengate Veridata which includes Agent, CLI, Web interface.

 Architecture

Manager Process
  • Start and manage Goldengate processes such as collector.
  • Manage port numbers.
  • Trail Management.
  • Create  error, event and reports.
  • Process command GGSCI.

Extract Process – Extract DML and DDL data from source database. 

Data Pump Process: Option process but recommended. Read records from source trails, process them and passes over to target system.

Collector Process: This background process runs on Target system.  It writes records on remote trail.

Replicate Process: Reads Remote trail, process them and applies changes to target Database.


Trail: It is a Goldengate intermittent file written in proprietary format by either extract process (source system) or collector background process (target system).

Wednesday, December 4, 2013

Introduction to Oracle Cluster Table


Table clustering is an optional technique of storing data. In table clustering, rows from one or more tables those are associated with each other on common data entity are physically stored adjacent to each other on same data block. This common entity is called ‘Cluster key’. Identifying the cluster key is the first step to implement the oracle table clustering.

Once good candidate for cluster key is identified, two more challenges come on the way to make it success. First challenge is avoid disk block chaining and second challenge is large variance or standard deviation.

The Disk Block chaining is very mush possible in cluster table. If combined size of all rows of tables in single row cluster exceeds the size of disk block that would lead to immediate disk block chaining.  Chaining is linking together two separate disk block.  Disk block chaining reduces the advantage of Oracle Cluster table.
Second Challenge come through variance in Cluster row size which causes either significant disk block chaining or wasted disk space.

When creating oracle cluster, an optional size parameter should be used to set the max number of cluster key with their corresponding rows allowed within disk block. Oracle divides available disk block space by this optional parameter to calculate the maximum no of cluster keys. In data warehouse system where data is almost static and does not grow, this optional parameter would not necessary. Whenever update or Insert statement executes this causes Cluster row gets larger. It is quite possible that Cluster row will significantly increase in size. If accumulated increase in size of all cluster rows consumes all the available remaining space within a disk block, Disk chaining will happen. This is undesirable and require expensive disk seek operation to access it.
On other hand large variance in the size of cluster rows can cause either disk chaining or either wastage of disk space. By looking both these challenges in Oracle cluster table optional size parameter play a vital role. If average cluster row size is small, there will be more Cluster Row with in disk block. On the other hand average cluster row size is big which will reduce the MAXIMUM limit.

In select statement execution sorting and join filter are almost mandatory operation performed before the result gets displayed.  By introducing Oracle Cluster Table both these operations skip from query execution. The fields of the cluster key are more often indexed, so high maintenance indexes for these fields are reduced.
Example of crusting
EMP and DEPT table are associated to each other via DEPTNO column and it is primary key

Query execution for heap table is explained below. In later section Oracle Cluster table is explained in detail.
SELECT E.EMPNO,
         E.ENAME,
         E.SAL,
         E.COMM,
         E.HIREDATE,
         D.DEPNTO,
         D.DNAME,
         D.LOC
    FROM EMP E, DEPT D
   WHERE E.DEPTNO = D.DEPTNO
ORDER BY D.DEPTNO;

The IO operation of the above query will involve following steps.
  1. Traversing DEPTNO index to get the next dept no.
  2. Fetch rows from DEPT table that correspond to DEPTNO using ROWID identified from Index  scan. 
  3. Find each row from EMP table correspond to DEPTNO unless there is another index available on DEPTNO.
  4. Fetch Row from EMP table.
 
Once the cluster table created, the IO operation of the above query will involve below steps.
  1. Traversing through DEPTNO ( cluster key) index. 
  2. Fetch cluster row, consisting row from DEPT table and all rows from EMP table correspond to DEPTNO.

Create cluster, Table and Index

CREATE CLUSTER CLUS_EMP_DEPT (DEPTNO NUMBER (3)) -- Cluster Key
    SIZE 600                      -- Average Cluster Row size in bytes
    TABLESPACE MY_TABLESPACE      --if table space omitted then it
    STORAGE (INITIAL 500K         -- will create in default tablespace
             NEXT    500K        -- Cluster index is created by
             MINEXTENTS 2        -- default
             PCTINCREASE 20);

CREATE TABLE EMP
(
   EMPNO    NUMBER (5) PRIMARY KEY,
   ENAME    VARCHAR2 (15) NOT NULL,
   DEPTNO   NUMBER (3)    REFERENCES DEPT
)
CLUSTER CLUS_EMP_DEPT ( DEPTNO );
 
CREATE TABLE DEPT
(
   DEPTNO   NUMBER (3) PRIMARY KEY,
   DNAME    VARCHAR2 (20),
   LOC      VARCHAR2 (30)
)
CLUSTER CLUS_EMP_DEPT ( DEPTNO );
 
CREATE INDEX CLUS_EMP_DEPT_INDEX                          -- explicitly create
   ON CLUSTER CLUS_EMP_DEPT                               -- index if required
   TABLESPACE MY_INDEX_TABLESPACE
   STORAGE (INITIAL 80 K NEXT 50 K MINEXTENTS 2 MAXEXTENTS 12 PCTINCREASE 20);
 
 

External Table

Oracle External Table External tables are defined as tables that do not resides in the database allows you to access data that is stor...