From DB2 v10, you can specify
that a portion of a LOB column will be stored in the base TableSpace. This
improves the performance of applications that access LOB data, because it
reduces the need to access the auxiliary LOB TableSpace. That also enables the
creation of expression-based indexes on the inline portion of a LOB column,
which can help improving the performance of searches through a LOB column [more
info on IBM website].
Inline LOBs means that, for LOBs
that are less than or equal to the size of the specified inline length, DB2
will store the entire LOB data in the base TableSpace, so more or less as if it
would be a VARCHAR. However, for LOBs that are bigger than the inline length,
the inline portion of the LOB resides in the base TableSpace, and the remainder
of the LOB in the LOB auxiliary TableSpace. In order to use this feature,
assuming you are currently using standard LOBs columns, you can alter existing
LOB columns to become inline (using an ALTER TABLE statement).
But, what is the optimal INLINE
LENGTH that you should specify ? The question is almost philosophical, since
the answer greatly depends on the LOB Lengths Distribution in the column, in
other words, it depends on the type of data that are stored in the LOB. And the
DBA creating the table does not necessarily know what it contains, nor what it
will contain…
Let’s examine 5 types of LOB
length distributions that could exist in a LOB column.
1st case
: almost all LOBs data are small, limited in size, with a few exceptions.
This case is almost too good to
be true, because the answer of the optimal INLINE LENGTH is obvious. You should
setup the INLINE LENGTH “threshold” so that it fits the “almost all small data”
:
2nd case
: the LOBs lengths are equally distributed, from the minimum length to the
maximum length
This case is also a no brainer,
and you will want to setup a threshold based on the percentage of LOBs row that
should be self-contained in the inline LOB. For example, a 80 / 20 distribution
as illustrated in the graph below:
3rd case
: the LOBS lengths is are linearly distributed (ascending or descending)
Similarly to the 2nd
case, you will want to setup a threshold based on the percentage of LOBs row
that should be self-contained in the inline LOB. For example, a 80 / 20
distribution as illustrated in the graph below:
4th case
: the LOBs length follow a normal distribution (Gauss).
This case might be more common,
but essentially (if you forget about the small amount of low length LOBs on the
left of the graph), you end up with a situation very similar as the 1st
Case, where you will want to set the Inline LOB Length very close to the end of
the Gauss curve. You might also use a 80 / 20 percent approach.
5th case
: the LOBs length distribution is sinusoidal
For this case, it is fairly
complicated to provide an educated suggestion as to where to set the Inline LOB
length “threshold”. In fact this looks like a succession of Gauss curves (4th
case), so you could setup the threshold after any of these curves as
illustrated below. Furthermore, you might want to analyze which data get
accessed more frequently, an thus determine after which curve to place the
threshold. Hopefully, this case will not be that common in real environements.
From
Theory to Practice …
The above describes distribution
models, which is nice in theory… Well, in practice, you will want some sort of
tool, to analyze existing length of LOB data, to understand in which case your
LOBs are, and adapt the INLINE length accordingly (or confirm the setup is good
the way it is). Read below …
The following SQL statement will
list the columns in tables which are using INLINE LOB technique in your
subsystem:
SELECT
TBCREATOR,TBNAME,"NAME",LENGTH,LENGTH2,COLTYPE
FROM
"SYSIBM".SYSCOLUMNS
WHERE
( COLTYPE =
'CLOB'
OR
COLTYPE =
'BLOB'
OR
COLTYPE = 'DBCLOB'
)
AND
LENGTH > 4
;
“LENGTH” > 4 indicates that the
LOB column uses the INLINE LOB technique, the actual Inline LOB Length is
LENGTH-4
“LENGTH2” reflects the maximum
length of the LOB column (inline + stored in Auxiliary)
As an example, the below test
was made on a SYSIBM catalog table SYSIBM.SYSVIEWS
that contains a LOB column PARSETREE, a 1GB BLOB
(1073741824) with Inline LENGTH 27674-4 = 27670
Here the sample SQL statement
that I used to visualize the length distribution of a LOB column. It simply
categorizes the various lengths of the LOBs into LOB length ranges.
SELECT
RANGE, COUNT(LENGTH) AS ROW_COUNT FROM
(
SELECT
LENGTH(PARSETREE) AS
LENGTH,
(CEIL(27670/100)*(1+CEIL( 100 * LENGTH(PARSETREE) / 27670))) AS RANGE
FROM
SYSIBM.SYSVIEWS) AS
TABLE_WITH_RANGE
GROUP
BY
RANGE
ORDER
BY RANGE
;
When exporting the output of the
query into Excel, one can make a simple graph representation of the LOB length
distribution and visualize if the INLINE LENGTH value is properly set. The
analyzes shows that we are pretty much in the scenario of 1st Case,
which I would assume to be the most common one anyway. The Inline LOB Length
“threshold” seems adequately placed, for this LOB column.
No comments:
Post a Comment