LEAD and LAG analytical functions provide access to more than one row of a table at the same time, without the need for a self join. LEAD allows access to a row at a physical offset beyond the current row and LAG allows access to a row at a physical offset prior to the current row.Here are a couple of examples which show
LEAD and LAG in action.Upper and Lower Bounds
TheSALGRADE table, supplied as part of the Oracle demo schema, holds lower and upper salary values for a grade as follows.Another way to maintain this information is to store only the upper bound and derive the lower bound from the the previous upper bound plus one.SELECT *
FROM salgrade
ORDER BY grade;GRADE LOSAL HISAL
----------- ------------ ----------
1 1 1200
2 1201 1400
3 1401 2000
4 2001 3000
5 3001 99995 rows selected.
First, we create our new table containing just the grade and high salary values.
UsingCREATE TABLE new_salgrade AS
SELECT grade, hisal
FROM salgrade;CREATE TABLE succeeded.SELECT *
FROM new_salgrade;GRADE HISAL
----------- -----------
1 1200
2 1400
3 2000
4 3000
5 99995 rows selected.
LAG we can construct a view which will produce the data in the same format as the original table.One advantage with this technique is that the table is easier to maintain. If a salary grade is changed, then only theCREATE VIEW salgrade_v AS
SELECT
grade,
LAG(hisal, 1, 0) OVER (ORDER BY hisal) + 1 AS losal,
hisal
FROM new_salgrade;CREATE VIEW succeeded.SELECT *
FROM salgrade_v
ORDER BY grade;GRADE LOSAL HISAL
---------- ---------- -----------
1 1 1200
2 1201 1400
3 1401 2000
4 2001 3000
5 3001 99995 rows selected.
HISAL on one row needs to be updated. The LOSAL on the next row is adjusted automatically.UPDATE new_salgrade
SET hisal = 2200
WHERE grade = 3;1 rows updatedCOMMIT;Commit complete.SELECT *
FROM salgrade_v;GRADE LOSAL HISAL
----------- ---------- -------------
1 1 1200
2 1201 1400
3 1401 2200
4 2201 3000
5 3001 99995 rows selected.
External Log File
The next example processes metric data which has been collected and recorded in an external log file.The contents of the file look like this.
Suppose we are required to produce the following report with one line for each metric sample.21-sep-07 13:51:12
type: metric
low: 33
avg: 121
high: 23021-sep-07 19:23:55
type: metric
low: 66
avg: 75
high: 12021-sep-07 23:40:17
type: metric
low: 13
avg: 98
high: 55422-sep-07 01:39:22
type: metric
low: 121
avg: 375
high: 52022-sep-07 10:57:15
type: metric
low: 43
avg: 99
high: 148
We will use an external table to access the contents of the file.recorded_at duration low avg high
-------------------- ------------- ---- ---- ----
dd-mon-yy hh:mi:ss 9 9:99:99.9 999 999 999
dd-mon-yy hh:mi:ss 9 9:99:99.9 999 999 999
dd-mon-yy hh:mi:ss 9 9:99:99.9 999 999 999
Now we can start to construct our SQL statement.CREATE OR REPLACE DIRECTORY tmp_dir AS '/tmp';Directory created.CREATE TABLE event_log (
rec VARCHAR2(200))
ORGANIZATION EXTERNAL(
DEFAULT DIRECTORY tmp_dir
ACCESS PARAMETERS(
records delimited by newline
fields
missing field VALUES ARE NULL
reject rows with all null fields)
LOCATION('event.log'));Table created.SELECT *
FROM event_log
WHERE rownum <=10;REC
-----------------------------
21-sep-07 13:51:12
type:
metric
low: 33
avg: 121
high: 230
21-sep-07 19:23:55
type:
metric
low: 66
avg: 75
high: 12010 rows selected.
First, we will remove any labels from the records using
REGEXP_REPLACE. Labels are identified as a single word at the start of a line followed by a colon and a space.Next, we use theSELECT REGEXP_REPLACE(rec,'^[[:alpha:]]*: ', null) rec
FROM event_log
WHERE rownum <= 10;REC
-----------------------------
21-sep-07 13:51:12
metric
33
121
230
21-sep-07 19:23:55
metric
66
75
12010 rows selected
LEAD function to to merge several records from the file into a single row for each sample.And finally, now we have all the required columns with the correct data types, we can use theSELECT
TO_TIMESTAMP(recorded_at, 'dd-mon-yy hh24:mi:ss') recorded_at,
TO_NUMBER(low) low,
TO_NUMBER(avg) avg,
TO_NUMBER(high) high
FROM (
SELECT
rec,
LAG(rec, 1, null) OVER (ORDER BY rownum) recorded_at,
LEAD(rec, 1, null) OVER (ORDER BY rownum) low,
LEAD(rec, 2, null) OVER (ORDER BY rownum) avg,
LEAD(rec, 3, null) OVER (ORDER BY rownum) high
FROM (
SELECT REGEXP_REPLACE(rec,'^[[:alpha:]]*: ', null) rec
FROM event_log
)
)
WHERE rec = 'metric';recorded_at low avg high
------------------ ------ ------ ------
21-SEP-07 13.51.12 33 121 230
21-SEP-07 19.23.55 66 75 120
21-SEP-07 23.40.17 13 98 554
22-SEP-07 01.39.22 121 375 520
22-SEP-07 10.57.15 43 99 1485 rows selected.
LAG function to calculate the duration between the current sample and the previous one.So there we have it. In this example, we read an external file, stripped out the labels and and usedSELECT
recorded_at,
recorded_at - LAG(recorded_at, 1, null)
OVER (ORDER BY recorded_at) duration,
low,
avg,
high
FROM (
SELECT
TO_TIMESTAMP(recorded_at, 'dd-mon-yy hh24:mi:ss') recorded_at,
TO_NUMBER(low) low,
TO_NUMBER(avg) avg,
TO_NUMBER(high) high
FROM (
SELECT
rec,
LAG(rec, 1, null) OVER (ORDER BY rownum) recorded_at,
LEAD(rec, 1, null) OVER (ORDER BY rownum) low,
LEAD(rec, 2, null) OVER (ORDER BY rownum) avg,
LEAD(rec, 3, null) OVER (ORDER BY rownum) high
FROM (
SELECT REGEXP_REPLACE(rec,'^[[:alpha:]]*: ', null) rec
FROM event_log
)
)
WHERE rec = 'metric'
);recorded_at duration low avg high
------------------- ------------ ------ ----- ------
21-SEP-07 13.51.12 33 121 230
21-SEP-07 19.23.55 0 5:32:43.0 66 75 120
21-SEP-07 23.40.17 0 4:16:22.0 13 98 554
22-SEP-07 01.39.22 0 1:59:5.0 121 375 520
22-SEP-07 10.57.15 0 9:17:53.0 43 99 1485 rows selected.
LEAD to pivot several rows into columns. Then we used LAG to calculate the duration between adjacent samples.


0 comments:
Post a Comment