Tuesday, 1 July 2008

JDBC Batch Updates with Oracle

The JDBC batch update feature allows multiple SQL statements to be submitted to Oracle at the same time. This can have a significant impact on performance by reducing network round trips.

With Oracle, there are two batch update techniques; standard JDBC batch updates and Oracle batch updates. We will look at both of these and demonstrate the performance benefits of each.

This simple table is used for our tests.
CREATE TABLE batch_test
( id NUMBER,
name VARCHAR2(50)
);
Table created.

Without Batching

First we will execute an INSERT statement ten thousand times without batching. This will give us a baseline to compare with the batched examples later on.

Autocommit has been disabled (as it always should be) and the updates are committed explicitly after one hundred rows.
public static void testNoBatching() throws SQLException {
    Connection con = ConnectionFactory.getNewConnection();
PreparedStatement pstmt =
con.prepareStatement("INSERT INTO batch_test (id, name) " +
"VALUES (?, rpad('x', 50, 'x'))");
    long start = System.currentTimeMillis();
for (int i = 1; i <= 10000; i++) {
pstmt.setInt(1, i);
pstmt.executeUpdate();
if (i % 100 == 0) {
con.commit();
}
}
    System.out.printf("Inserting 10,000 rows took %4.2f seconds\n",  
(System.currentTimeMillis() - start) / 1000f);

pstmt.close();
con.close();
}
------------------------------------------------------------------------------
Inserting 10,000 rows took 12.83 seconds

This test uses a PreparedStatement and bind variables to ensure that the SQL is executed as efficiently as possible. Even so, without batching, the inserts took nearly 13 seconds.

Standard JDBC Batch Updates

With standard JDBC batch updates, the SQL statements are batched and submitted explicitly.
public static void testStandardBatching() throws SQLException {
    Connection con = ConnectionFactory.getNewConnection();
PreparedStatement pstmt =
con.prepareStatement("INSERT INTO batch_test (id, name) " +
"VALUES (?, rpad('x', 50, 'x'))");
    if (!con.getMetaData().supportsBatchUpdates()) {
throw new SQLException("Batch updates not supported");
}
    long start = System.currentTimeMillis();
for (int i = 1; i <= 10000; i++) {
pstmt.setInt(1, i);
pstmt.addBatch();
if (i % 100 == 0) {
int[] updateCounts = pstmt.executeBatch();
con.commit();
}
}
    System.out.printf("Inserting 10,000 rows took %4.2f seconds\n",  
(System.currentTimeMillis() - start) / 1000f);
    pstmt.close();
con.close();
}
------------------------------------------------------------------------------
Inserting 10,000 rows took 0.40 seconds
The addBatch method adds an SQL statement to the current batch. At an appropriate point the batch is submitted to Oracle by calling the executeBatch method. Here we submit the batch after one hundred inserts, just before we commit. executeBatch returns an array of integers which contains the number of rows affected by each statement in the batch.

This time, the entire test took less than half a second, a fraction of the time taken to perform the original test.

Oracle Batch Updates

Oracle update batching takes a different approach. The batch size is set on the OraclePreparedStatement object and SQL statements are implicitly batched by the JDBC driver until the batch size is reached. At this point the SQL is automatically submitted to Oracle.
public static void testOracleBatching() throws SQLException {
    Connection con = ConnectionFactory.getNewConnection();
PreparedStatement pstmt =
con.prepareStatement("INSERT INTO batch_test (id, name) " +
"VALUES (?, rpad('x', 50, 'x'))");
                                 
((OraclePreparedStatement)pstmt).setExecuteBatch (100);
long start = System.currentTimeMillis();
for (int i = 1; i <= 10000; i++) {
pstmt.setInt(1, i);
int updateCount = pstmt.executeUpdate();
if (i % 100 == 0) {
con.commit();
}
}
    System.out.printf("Inserting 10,000 rows took %4.2f seconds\n",  
(System.currentTimeMillis() - start) / 1000f);
    pstmt.close();
con.close();
}
------------------------------------------------------------------------------
Inserting 10,000 rows took 0.39 seconds
This time we implement batching with a single call to the setExecuteBatch method on OraclePreparedStatement. The update count returned from executeUpdate will return zero if the call is adding the statement to the batch, or the total number of rows affected by all the update statements if the batch is being submitted to Oracle. Once again, with the batch size set to one hundred, the ten thousand rows are inserted in less than half a second.

Summary

These examples show that there can be a significant improvement in performance when batch updates are used rather than single statement updates. In our tests, execution times were reduced from over twelve seconds to less than half a second when batching was introduced.

Of course, not every application is suited to batch updates but, when used appropriately, the performance benefits can be considerable.

Monday, 30 June 2008

Using Byte and Char with Character Datatypes

Character datatypes such as CHAR, VARCHAR2 and CLOB can be defined using byte or character length semantics as follows.
CREATE TABLE t(x VARCHAR2(10),   
y VARCHAR2(20 BYTE),
z VARCHAR2(30 CHAR));
Table created.
If CHAR or BYTE is not specified then the value defined by the NLS_LENGTH_SEMANTICS parameter is used as the default. We can check the current value of this parameter as follows.
SELECT value
FROM v$nls_parameters
WHERE parameter = 'NLS_LENGTH_SEMANTICS';
VALUE                                                            
---------------
BYTE
1 rows selected
In the above example, table t was created with columns x and y with maximum lengths of 10 and 20 bytes respectively and column z with a maximum length of 30 characters.

But what does this actually mean?

Character Set

During database creation, the character set to be used by CHAR, VARCHAR2 and CLOB datatypes is specified. These are some of the values available for selection.

US7ASCII ASCII 7-bit American (SB)
WE8ISO8859P1 ISO 8859-1 West European 8-bit (SB)
EE8ISO8859P2 ISO 8859-2 East European 8-bit (SB)
JA16VMS JVMS 16-bit Japanese (MB)
KO16KSCCS KSCCS 16-bit Korean (MB)
AL16UTF16 Unicode 3.2 UTF-16 Universal character set (MB)
AL32UTF8 Unicode 3.2 UTF-8 Universal character set (MB)

SB = Single Byte
MB = Multi Byte
The character set for a database can be determined with the following SQL.
SELECT value
FROM v$nls_parameters
WHERE parameter = 'NLS_CHARACTERSET';
VALUE                                    
---------------
WE8ISO8859P1
1 rows selected

When using a single byte character set such as WE8ISO8859P1, a character is always held in a single byte so specifying the BYTE or CHAR qualifier has no effect on storage allocation. However, with a multi-byte character set such as EE8ISO8859P2, a single character may require up to 4 bytes of storage. In this case it may be useful to specify the length of the column in characters and let Oracle figure out how many bytes to allocate.

Consideration must be given to the maximum number of bytes that can be allocated. When using BYTE semantics the maximum is 2000 for CHAR datatypes, and 4000 for VARCHAR2. However, when using the CHAR length qualifier the maximum size that can be specified may be reduced in order to accommodate the character set.

National Character Set

The Unicode or national character set is also specified when creating the database. For example.


AL32UTF8 UTF-8
AL16UTF16 UTF-16

This setting is used by the Unicode datatypes NCHAR, NVARCHAR2 and NCLOB and the size of the column is always specified in characters.

Use the following SQL to determine the national character set details for a database.
SELECT VALUE
FROM nls_database_parameters
WHERE parameter = 'NLS_NCHAR_CHARACTERSET';
VALUE                                    
---------------
UTF8
1 rows selected

Fast Dual in Oracle

Oracle 10g introduces FAST DUAL. Now, when the the DUAL table is accessed, no logical I/O is performed which improves performance.

We can demonstrate this by generating a million rows using the DUAL table.
set timi on
set autot trace stat
SELECT rownum FROM dual CONNECT BY LEVEL <= 1000000;
1000000 rows selected.
Elapsed: 00:00:08.07
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
0 consistent gets
0 physical reads
0 redo size
14380219 bytes sent via SQL*Net to client
733838 bytes received via SQL*Net from client
66668 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1000000 rows processed
The statistics trace shows that 1 million rows have been processed without any logical I/O being performed.

Saturday, 14 June 2008

Oracle Quick Tip: Divide a result set into groups using NTILE

The NTILE analytic function divides a result set into a specified number of groups or buckets.

The following SQL takes a generated result set and, using NTILE, adds a column containing three distinct values with an equal number of rows for each value.
SELECT id, val, ntile(3) OVER(ORDER BY id) nt
FROM (
SELECT rownum id, 'xyz' || rownum val
FROM dual
CONNECT BY LEVEL <=12
)
ORDER BY id;
ID       VAL          NT
-------- ------------ ---------
1 xyz1 1
2 xyz2 1
3 xyz3 1
4 xyz4 1
5 xyz5 2
6 xyz6 2
7 xyz7 2
8 xyz8 2
9 xyz9 3
10 xyz10 3
11 xyz11 3
12 xyz12 3
12 rows selected

Friday, 13 June 2008

Oracle Flashback Table

The FLASHBACK TABLE command can be used to restore an earlier state of a table in the event of human or application error.

Restoring a dropped table

The first use of FLASHBACK TABLE is to retrieve a table from the recycle bin after it has been dropped.
DROP TABLE dept;
Table dropped.

SELECT *
FROM dept;
SQL Error: ORA-00942: table or view does not exist

SELECT object_name, original_name
FROM recyclebin;
OBJECT_NAME                    ORIGINAL_NAME                    
------------------------------ --------------------------------
BIN$QzJJ24I1hx7gQAAK+wFg5Q==$0 DEPT
1 rows selected

FLASHBACK TABLE dept TO BEFORE DROP;
Flashback complete.

SELECT *
FROM dept;
DEPTNO                 DNAME          LOC           
---------------------- -------------- -------------
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
4 rows selected
The RENAME TO clause can be used to give the table a new name during the restore. This is useful if a new table has been created with the same name since the old one was deleted.
FLASHBACK TABLE dept TO BEFORE DROP RENAME TO dept_before_drop;
Flashback complete.
Any indexes or triggers associated with the table will still have their recycled names and will need to be re-instated manually. Also, views and procedures which rely on the table will have been invalidated and will need to be recompiled.

Restoring a table to an earlier state

FLASHBACK TABLE can also be used to revert the state of a table back to a specified time or SCN.

Let's see this in action using the EMP table in the Oracle demo schema.
SELECT deptno, count(*)
FROM emp
GROUP BY deptno;
DEPTNO       COUNT(*)               
------------ --------------
10 3
20 5
30 6
3 rows selected
Suppose we wish to delete all entries for department 10 but we mistype the SQL and commit before we notice our mistake.
DELETE FROM emp
WHERE deptno >= 10;
14 rows deleted

COMMIT;
Commit complete.

SELECT deptno, count(*)
FROM emp
GROUP BY deptno;
DEPTNO       COUNT(*)               
------------ --------------
0 rows selected
All the rows have been deleted and the transaction committed. But fear not, flashback technology was introduced to deal with just this kind of problem. We can restore the table back to the state it was in, say, one minute ago. Before we can do this, however, we must ensure that row movement is enabled on the table.
ALTER TABLE emp ENABLE ROW MOVEMENT;
Table altered.

FLASHBACK TABLE emp TO TIMESTAMP systimestamp - INTERVAL '1' MINUTE;
Flashback complete.

SELECT deptno, count(*)
FROM emp
GROUP BY deptno;
DEPTNO       COUNT(*)               
------------ --------------
10 3
20 5
30 6
3 rows selected
The table has been 'rewound' to a prior point in time. All the rows that existed on the table one minute ago have been restored. It is as if the delete never happened.

Alternatively, we could restore to a particular SCN. To use this method we must know the exact SCN we wish to restore to, so we will need to record the value before the update is attempted.
SELECT current_scn
FROM v$database;
CURRENT_SCN            
----------------------
3779819703
1 rows selected

DELETE FROM emp
WHERE deptno >= 10;
14 rows deleted

COMMIT;
COMMIT succeeded.

FLASHBACK TABLE emp TO SCN 3779819703;
FLASHBACK TABLE succeeded.

SELECT deptno, count(*)
FROM emp
GROUP BY deptno;
DEPTNO       COUNT(*)               
------------ --------------
10 3
20 5
30 6
3 rows selected

Restore Point

Oracle 10g Release 2 introduced the concept of restore points. A restore point is a name which is associated with the SCN of the database at the time the restore point was created. We can reference a restore point name in the FLASHBACK TABLE command.

For example.
CREATE RESTORE POINT before_changes;
Restore point created.
If we run in to any problems during our updates, we can simply rewind one or more tables back to the restore point.
DELETE FROM emp
WHERE deptno >= 10;
14 rows deleted

COMMIT;
COMMIT succeeded.

FLASHBACK TABLE emp TO RESTORE POINT before_changes;
FLASHBACK TABLE succeeded.

SELECT deptno, count(*)
FROM emp
GROUP BY deptno;
DEPTNO       COUNT(*)               
------------ --------------
10 3
20 5
30 6
3 rows selected

Restrictions

There are are some rules and restrictions to be aware of when using FLASHBACK TABLE.
  • This feature is only available with Oracle Enterprise Edition.
  • You will need the appropriate privileges (such as FLASHBACK ANY TABLE) to execute these commands. See the Oracle documentation for more details.
  • Oracle may restore rows to a different location and allocate new rowids. Any references to the original rowids will no longer be valid.
  • There must be sufficient redo data available in the database to restore the data. This can be guaranteed for a predetermined period of time if the UNDO_RETENTION initialization parameter is used.
  • You cannot restore a table to an earlier state across any DDL operations that change the structure of the table (including TRUNCATE).
  • Restore points were only introduced at 10g Release 2.

Precision and scale do not specify column size

Precision and scale do not determine the physical storage allocation for a number, only the maximum size of the value that can be stored in it. A NUMBER column will occupy as much or as little space as it needs to, up to a maximum of 22 bytes. The physical size will grow and shrink depending on the size of the number to be held. In this sense, it behaves in a similar way to a VARCHAR2 datatype which also dynamically allocates enough bytes to hold the value up to the maximum specified size.

To demonstrate this we will use the VSIZE function which returns the number of bytes in the internal representation of a column.

CREATE TABLE t (
num_col NUMBER(16)
);
CREATE TABLE succeeded.

INSERT INTO t (num_col) 
VALUES (1234567890);
1 rows inserted

SELECT num_col, VSIZE(num_col) 
FROM t;
NUM_COL                VSIZE(NUM_COL)         
---------------------- ----------------------
1234567890 6
1 rows selected
If we change the value we see that the internal size, the number of bytes required to hold our number, has changed.
UPDATE t 
SET num_col = 1;
1 rows updated

SELECT num_col, VSIZE(num_col) 
FROM t;
NUM_COL                VSIZE(NUM_COL)         
---------------------- ----------------------
1 2
1 rows selected
So when we specify a precision and scale for a number we are not telling Oracle to reserve a fixed number of bytes for every number in the column. We are only providing the maximum size and the number of decimal places. Oracle will store the number in the minimum number of bytes necessary to hold the value.

Oracle Quick Tip: Setting the Session Date Format

The date or timestamp format for the current session can be altered with the following commands.
ALTER SESSION SET NLS_DATE_FORMAT = [format model];
ALTER SESSION SET NLS_TIMESTAMP_FORMAT = [format model];
The current settings for the session can seen with this SQL.
SELECT *
FROM v$nls_parameters
WHERE parameter IN ('NLS_DATE_FORMAT', 'NLS_TIMESTAMP_FORMAT');
PARAMETER             VALUE                                                     
--------------------- -----------------------------
NLS_DATE_FORMAT DD-MON-RR
NLS_TIMESTAMP_FORMAT DD-MON-RR HH24.MI.SSXFF
2 rows selected
This can be seen when a date and timestamp are displayed.
SELECT sysdate "Date", to_timestamp(sysdate) "Timestamp" FROM dual;
Date          Timestamp  
------------- -----------------------------------
13-JUN-08 13-JUN-08 22.29.27.000000000
1 rows selected
To change the format, issue the ALTER SESSION command and specify the new format model.
ALTER SESSION SET nls_date_format = 'dd-mon-yyyy hh24:mi:ss';
Session altered.

ALTER SESSION SET nls_timestamp_format = 'dd-mon-yy hh:mi:ss';
Session altered.
The date and timestamp are now displayed with the revised format.
SELECT sysdate "Date", to_timestamp(sysdate) "Timestamp" FROM dual;
Date                  Timestamp  
--------------------- -----------------------------------
13-jun-2008 22:31:21 13-jun-08 10:31:21
1 rows selected