Monday, November 25, 2013

List Indexes and IX Columns for a given table

If you ever wanted to see the indexes associated with a given table, the SYSIBM.SYSINDEXES catalog table is for you.
The columns that are indexed are stored in yet another catalog table, SYSIBM.SYSKEYS, which has no direct relationship to the table itself.

So, if you want to see all related indexes and IX columns for a given table, you must do a JOIN. So far so good, except that you end up with one row for each column of each index.
That is, if you have a table with 2 indexes that reference 3 and 5 columns respectively, you will get 8 rows.

   Table  Index  ColName  ColSort
   TB1    IX1    COL1     A
   TB1    IX1    COL2     D
   TB1    IX1    COL3     A
   TB1    IX2    COL1     A
   TB1    IX2    COL5     A
   TB1    IX2    COL2     A
   TB1    IX2    COL3     A
   TB1    IX2    COL7     A
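For reference, the plain join that produces this one-row-per-column output would look roughly like the sketch below (the table name 'TB1' is a placeholder):

-- One row per (index, column) combination
SELECT I.TBNAME, I.NAME AS INDEX_NAME, K.COLNAME, K.ORDERING
  FROM SYSIBM.SYSINDEXES I
 INNER JOIN SYSIBM.SYSKEYS K
    ON I.CREATOR = K.IXCREATOR
   AND I.NAME    = K.IXNAME
 WHERE I.TBNAME = 'TB1'
 ORDER BY I.NAME, K.COLSEQ ;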

I wanted to simplify the results to get only 2 rows (one per index), each containing the list of columns, something like:

   Table  Index  Columns
   TB1    IX1    COL1,A,COL2,D,COL3,A
   TB1    IX2    COL1,A,COL5,A,COL2,A,COL3,A,COL7,A

Browsing the internet, I learnt that some database systems offer a special GROUP_CONCAT(expr) aggregate function that can be used with the GROUP BY syntax, but it is not available in DB2 for z/OS.
Similar results could be achieved programmatically, or using a UDF, or 
 using the new XML functions now available in DB2 for z/OS.
Since I never really played with XML functions, that was a good reason to start!

The following SQL, tested under DB2 version 10, provides the expected results. All the magic happens via the function XMLAGG. The rest (XML2CLOB, CAST, TRIM, and the REPLACEs) is just there to make the results nicer.
If you want to use it, just replace the table name in the WHERE clause, and execute!

  SELECT I.TBCREATOR, I.TBNAME, I.CREATOR, I.NAME, I.UNIQUERULE,
  TRIM(L ',' FROM REPLACE(REPLACE(CAST(XML2CLOB(
  XMLAGG(XMLELEMENT(NAME "C", K.COLNAME, ',', K.ORDERING)
         ORDER BY K.COLSEQ)
  ) AS VARCHAR(2000)),'<C>',','),'</C>','')) AS "COLUMNS"
  FROM SYSIBM.SYSINDEXES I
  INNER JOIN SYSIBM.SYSKEYS K
  ON I.CREATOR = K.IXCREATOR AND I.NAME = K.IXNAME
  WHERE I.TBNAME = 'PIN_ACCOUNT'
  GROUP BY I.TBCREATOR, I.TBNAME, I.CREATOR, I.NAME, I.UNIQUERULE
  ORDER BY I.TBCREATOR, I.TBNAME, I.CREATOR, I.NAME, I.UNIQUERULE ;

Sample output

  TBCREATOR TBNAME       CREATOR   NAME      UNIQUERULE  COLUMNS
  $APEPCDB  PIN_ACCOUNT  $APEPCDB  XPEP0180  P           ACCTNUM,A,PIN_COMPANY_ID,A
  $APEPCDB  PIN_ACCOUNT  $APEPCDB  XPEP0280  D           BANK_NUMBER,A


Monday, October 21, 2013

Improve your embedded SQL statements

Review this 3-minute video and discover how you can use the Expert Rules feature of CA Plan Analyzer for DB2 for z/OS to tune your embedded SQL statements. The product can be used by DB2 database administrators to analyze existing applications, or by application programmers to improve their SQL statements before their programs reach the production environment.


CA Plan Analyzer for DB2 for z/OS allows you to bypass frustrating and time-consuming methods for plan analysis such as the DB2 EXPLAIN facility, the PLAN_TABLE output or catalog statistics collection. Determine which access paths DB2 has chosen and why, with easy-to-understand reports that can be tailored to individual skill levels.


Friday, October 18, 2013

Discard processing during a Reorg

Review this 3-minute video and discover, through a simple use case scenario, how you can use the Discard processing feature of CA Rapid Reorg for DB2 for z/OS to simplify your business workflow.


CA Rapid Reorg for DB2 for z/OS reorganizes your tablespaces and indexes to alleviate problems caused by disorganized data. CA Rapid Reorg for DB2 for z/OS can reclaim space used by dropped tables, recluster data, remove overflow pointers, reestablish free space, rebalance index trees and reduce the number of levels required.

Thursday, October 17, 2013

csDUG Conference 2013 in Prague


35 attendees from the Czech and Slovak Republics met at the CA Technologies office in Prague on the 8th of October 2013 for a one-day conference. It was the first DB2 user conference in the Czech and Slovak region.



The attendees were a mix of database administrators, systems programmers, application developers and other specialists in the DB2 for z/OS and mainframe area. 

Steen Rasmussen (CA) talking about changes in DB2 10 catalog
The conference focused on the new features in DB2 10 and 11 and the IBM DB2 Analytics Accelerator. The speakers were experienced DB2 specialists and architects from IBM and CA Technologies. The attendees appreciated the value of the sessions because they received a lot of new information related to their jobs.


MichaƂ Bialecki (IBM), Philippe Dubost (CA), Peter Priehoda (IBM) and Robin Hopper (CA)
The event was free and sponsored by CA and IBM. There were 5 sessions plus an opening and a closing session. You can see the full agenda in the October 8, 2013 – csDUG Conference post.

One of the attendees won a free pass for the next IDUG conference in Phoenix, Arizona (USA).

Winner of the free pass for the next IDUG
If you missed the conference this year, you will have the opportunity to attend next year. As we shape the next event, we would like to hear your preferred city (Prague, Brno, Ostrava or Bratislava), the most convenient time of year for you, and the topics you are interested in. Please post a comment on this blog article. Thank you.








Friday, October 4, 2013

My embedded SQL changed... do I really need to bind again ?

Review this 3-minute video and discover how you can use CA Bind Analyzer for DB2 for z/OS to avoid re-calculating the access path when the changes in an embedded SQL statement are only “cosmetic”, and what the benefits of not binding again are.


CA Bind Analyzer for DB2 for z/OS helps accelerate the application development lifecycle and decreases CPU and I/O resource usage by helping you avoid unnecessary binds.


Tuesday, October 1, 2013

Ensuring That You Can Recover DB2 Data

Many things can go wrong during the recovery process, and it is often too late when you find that out in the middle of a recovery.

For example, this can happen:
    ‱ An image copy that is required to recover your table space does not exist or is corrupted (a quick catalog check is sketched after this list). 
    ‱ The archive log that is required for recovery is not in the BSDS or is not cataloged. 
    ‱ The table space selected for recovery is not recoverable because data was loaded via a load utility with logging turned off and no image copy was taken after the load, so the RECOVER utility cannot process it. The same thing can happen when you run REORG without logging. 
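To illustrate the first point, here is a minimal catalog query (a sketch, not taken from the product; the database name is a placeholder) that lists the most recent full image copy recorded for each table space of a database. Objects with no row at all, or with a very old timestamp, deserve a closer look:

-- 'MYDB001' is a placeholder database name; ICTYPE = 'F' means full image copy
SELECT DBNAME, TSNAME, MAX(TIMESTAMP) AS LAST_FULL_COPY
  FROM SYSIBM.SYSCOPY
 WHERE ICTYPE = 'F'
   AND DBNAME = 'MYDB001'
 GROUP BY DBNAME, TSNAME
 ORDER BY LAST_FULL_COPY ;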
CA Recovery Analyzerℱ for DB2 for z/OS can help you with that. You can group table spaces and other data objects into strategies using various selection criteria, and CA Recovery Analyzer will generate the recovery JCL for these objects in an optimal way. During the analysis phase, CA Recovery Analyzer verifies that the recovery can really be done by checking the availability of the required image copies and archive logs, the status of the table spaces, and that the selected recovery point is valid.

If your environment is complex, you will have many strategies defined and will want to verify that they are valid and that you can recover your data using them.

Strategy Reporting Services in CA Recovery Analyzer are useful for verifying that your strategies do not overlap and that all the spaces you need to recover are included in a recovery strategy. They can also verify that all your strategies are valid and generate the recovery JCL, so you will have it ready when you need it.

If you are already using CA Recovery Analyzer and have your strategies defined, you may benefit from grouping them using Reporting Services and having them analyzed regularly via a batch job.

Here is how you can do it:
    1. Include all of your strategies in one reporting strategy 
    2. Set Generate the Analyze Strategies Error Report and Generate the JCL for each strategy to Y, and specify the output DSN 
    3. Analyze the reporting strategy – this creates the JCL to execute the reporting 
    4. Submit the JCL to execute the reporting strategy – it analyzes the included strategies and generates the recovery jobs


Screenshots:









Release 18.0 of CA Recovery Analyzer will provide an easier way to create a batch job that performs the regular analysis of your recovery strategies.


Wednesday, September 25, 2013

Improve your DB2 Disaster Recovery process

Review this 2-minute video and discover how CA Merge/Modify for DB2 for z/OS can help your business proactively prepare for a fast recovery from system failures, outages and other disasters, while your users continue to perform all their normal DB2 operations. It also reduces table space downtime and speeds up image copy processing by allowing the creation of additional copies of image copies.


CA Merge/Modify for DB2 for z/OS merges DB2 log data with existing image copies, thus creating new, up-to-date image copies without impacting DB2 systems, applications or data availability. Alternatively, the change accumulation functionality lets you accumulate log records from multiple DB2 logs into a single, compact file. In addition, the COPY-IMAGE-COPY feature allows creating copies of existing image copies with no impact on online availability of the tables.

Friday, September 13, 2013

Who corrupted the e-mail addresses in my DB2 table ?

Review this 2-minute video to discover the UNDO/REDO features of CA Log Analyzer for DB2 for z/OS, how they can save you time in discovering who corrupted your data, and how they can minimize the effort needed to fix the mess!


CA Log Analyzer for DB2 for z/OS is a powerful product that analyzes DB2 log and SMF records to aid in auditing data changes, recovering data, backing out errant updates without impacting application availability and migrating changes to other subsystems or RDBMSs. It can generate required UNDO SQL statements without losing access to DB2 and can focus on specific data ranges.


Monday, August 26, 2013

October 8, 2013 – csDUG Conference

The Czech Republic and Slovakia DB2 Users Group (csDUG) is a new Regional User Group (RUG) that was formed recently.
On October 8th 2013, we will be hosting our FIRST csDUG EVENT, sponsored by CA Technologies and IBM!


This is a full-day FREE conference with top speakers from IBM and CA Technologies - a great opportunity for DB2 users to meet industry experts!

Agenda of the conference


10.00 - 10.15   Welcome, csDUG info & agenda                               Philippe Dubost & Peter Priehoda
10.15 - 11.00   DB2 v11 technical overview                                 Michal Bialecki (IBM)
11.00 - 11.45   zIIP offloads                                              Robin Hopper (CA)
11.45 - 13.00   LUNCH
13.00 - 14.00   The DB2 V10 catalog – a revolution                         Steen Rasmussen (CA)
14.00 - 15.00   A Review of Bind and Rebind Parameters for DB2 for z/OS    Steve Thomas (CA)
15.00 - 16.00   IDAA demo                                                  Saso Prek (IBM)
16.00 - ?       Networking


Location


CA lab 3rd floor, building 12
V Parku 12, 148 00 Prague 4, Chodov
Czech Republic

Are you a qualified DB2 specialist? Do you want to discover what DB2 is or what it can do for you? Are you an experienced DBA working with other databases who needs to support DB2 for a new project? Are you interested in the role DB2 can play in the architecture of your business?
     --> Join the October 8 csDUG conference to listen to industry experts and network with other people interested in DB2!

One of the attendees will win a FREE pass for the next IDUG conference in Phoenix, US!

Register for this FREE conference by sending an email to Register.csDUG.8.OCT@gmail.com; please specify your name, job title, and company.

Tuesday, August 20, 2013

Upcoming European DB2 User Group events

A very short post today. Since it’s sometimes difficult to follow all the DB2-related events that happen around the world, here is a probably-non-exhaustive list of upcoming DB2 User Group events in Europe:

User Group Name     Event location   Date(s)          Agenda
SQL Adria           Zagreb           September 27     [link]
csDUG               Prague           October 8        [link]
TDUG                Istanbul         October 8        [link]
Guide DB2 France    Paris            October 10       N/A
IDUG                Barcelona        October 13 to 18 [link]
spDUG               Madrid           October 22       [link]
DB2 YTR             Helsinki         December ?       [link]


I hope this information is useful to you. If you are aware of any event missing from this list, leave a comment and I will update this post.



Thursday, August 8, 2013

Save CPU by including extra columns in a unique index

Indexes are mainly used to improve the performance of SELECT statements. They however need to be used with caution, because indexes consume a lot of disk space. One could argue that nowadays DASD is relatively cheap, so that is not a big deal. In fact, there is another big reason to be cautious about adding indexes: each additional index adds roughly 30% CPU overhead to INSERTs.

IBM DB2 v10 introduces a new feature: the ability to include non-key columns in a unique index [more info on IBM site].

What is it good for ?


The new INCLUDE feature in DB2 v10 allows you to reduce the number of indexes, which results in cost savings in DASD and in CPU.

Let’s see how, through a very simple scenario: a table composed of 2 columns, where COL1 is a unique ID and COL2 is a non-unique column. You obviously need to define a unique index IX1 on COL1 to ensure the uniqueness of the rows in COL1. But because a lot of queries retrieve data from COL1 and COL2 with ORDER BY, you decided (prior to DB2 v10) to create another index IX2 on COL1 and COL2, to improve application performance (SORT avoidance).
Now, in DB2 v10, you can simply INCLUDE COL2 in IX1 as a non-key column, meaning that IX1 can both enforce the uniqueness requirement on COL1 and carry the key values for COL2. As a result, IX2 becomes obsolete and can be dropped (saving DASD space and improving INSERT performance).
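As a minimal sketch, using the hypothetical names TB1, IX1 and IX2 from this scenario, the change could look like this:

-- Prior to DB2 v10: two indexes were needed (hypothetical names)
CREATE UNIQUE INDEX IX1 ON TB1 (COL1);
CREATE        INDEX IX2 ON TB1 (COL1, COL2);

-- In DB2 v10: add COL2 to IX1 as a non-key column, then drop the redundant IX2
ALTER INDEX IX1 ADD INCLUDE (COL2);
DROP  INDEX IX2;
COMMIT;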



Obviously, the unique key of an index can be composed of several columns, but the idea is the same.



In the scenario above, you will want to include COL3, COL4, and COL5 in the unique index IX1, and get rid of IX2.



Note that you can INCLUDE several non-key columns in a unique index, but you can only INCLUDE them one at a time 
 meaning that in the above scenario, you need to execute 3 SQL statements to INCLUDE the 3 non-key columns:
ALTER INDEX IX1 ADD INCLUDE (COL3);
ALTER INDEX IX1 ADD INCLUDE (COL4);
ALTER INDEX IX1 ADD INCLUDE (COL5);

From Theory to Practice …


The above theory is nice, but such a scenario might be very difficult to locate manually in a production DB2 environment composed of thousands of tables and indexes.

The following SQL statement will scan the DB2 catalog to locate some of these cases, namely non-unique indexes whose columns include the key of a unique index (when the unique index is composed of only 1 column).

SELECT IX.CREATOR, IX.NAME,
       IX.TBCREATOR, IX.TBNAME,
       IX.UNIQUERULE, IX.COLCOUNT,
       KEYS.IXCREATOR, KEYS.IXNAME,
       KEYS.COLNAME, KEYS.COLNO,
       IX2.CREATOR, IX2.NAME,
       IX2.TBCREATOR, IX2.TBNAME,
       IX2.UNIQUERULE, IX2.COLCOUNT,
       KEYS2.IXCREATOR, KEYS2.IXNAME,
       KEYS2.COLNAME, KEYS2.COLNO
  FROM SYSIBM.SYSINDEXES IX,
       SYSIBM.SYSKEYS    KEYS,
       SYSIBM.SYSINDEXES IX2,
       SYSIBM.SYSKEYS    KEYS2
 WHERE IX.UNIQUERULE  = 'U'              -- a unique index ...
   AND IX.COLCOUNT    = 1                -- ... on a single column
   AND IX.CREATOR     = KEYS.IXCREATOR
   AND IX.NAME        = KEYS.IXNAME
   AND IX.TBCREATOR   = IX2.TBCREATOR    -- another index on the same table ...
   AND IX.TBNAME      = IX2.TBNAME
   AND IX2.UNIQUERULE = 'D'              -- ... that allows duplicates
   AND IX2.CREATOR    = KEYS2.IXCREATOR
   AND IX2.NAME       = KEYS2.IXNAME
   AND KEYS.COLNAME   = KEYS2.COLNAME ;  -- and contains the unique key column

With the result of this SQL statement, you can prepare your ALTER INDEX statements to INCLUDE non-key columns in the unique indexes listed, and get rid of the corresponding, now-superfluous, non-unique indexes.
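With hypothetical index and column names, the cleanup for one pair of indexes identified by the query could look like this (the remaining key columns of the non-unique index become the INCLUDE columns of the unique one):

-- Hypothetical names; derive the real ones from the query output above
ALTER INDEX PRODSCHEM.IX_CUSTOMER_ID ADD INCLUDE (CUSTOMER_NAME);
DROP  INDEX PRODSCHEM.IX_CUSTOMER_ID_NAME;
COMMIT;
-- Plans and packages that used the dropped index should then be rebound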


If the unique index is composed of several columns, getting such a simple output would be much more complex in pure SQL, because of the necessity of some kind of "table transpose" involving 2 UNIONs and several INNER JOINs, due to the structure of the catalog. That said, if one removes (or changes the value of) the AND IX.COLCOUNT = 1 predicate in the above statement, and looks carefully at the output (IX.COLCOUNT and the number of matching IX2 rows), locating the other cases should be pretty straightforward.

Monday, July 29, 2013

What does Big Data mean to you?

Whether you are an insurance provider, a bank, an advertising company, or any company of medium-to-large size, you need to consume and exploit huge amounts of data internally. You would also benefit from analyzing additional, external data that is available to you but can be difficult to collect and process in a simple and meaningful way.

As an insurance company, you might want to intercept Facebook updates from clients who post skiing activities while receiving insurance payments for a broken leg.

As an international bank, you might want to monitor the GPS location of mobile transactions to intercept two transactions made on the same personal account at the same minute, one from South Africa, and the second from China.

As an advertising company, you might want to monitor, analyze, and react accordingly to the tweets on a particular theme, in real time. Similarly, as a manufacturer (for example a shoe or car manufacturer), you might want to monitor discussions and complaints on specialized forums to gather data regarding the quality of your products and understand which parts / models break more often, in order to improve the quality of your manufactured products (using, for instance, the Six Sigma model).

As a software company, you might want data regarding the real utilization of each function / feature in your products, in order to make educated decisions and proactively improve quality and focus development accordingly (using, for instance, the Agile/Scrum methodology).

As a political party, you might want much better and much more granular information regarding voting intentions, and especially the ability to locate (and convince) undecided voters who are most likely to accept your political opinions (for example, if your party wants to abolish the 2nd Amendment, there is probably no chance of convincing an individual who renewed his NRA membership last month, whose favorite movie is Rambo II, and who just bought yet another rifle to add to the 20-piece collection in his living room).

And to some degree, any company would benefit from a better understanding of their clients’ needs, satisfaction level, profiles, … you name it.

The term “Big Data” is just emerging, and its meaning may evolve, but so far it covers:

  1. The idea of using vast amounts of data from various sources, ranging from internal company records to social media activity.
  2. The idea of analyzing the above-described data in real time.
  3. The idea that this real-time analysis will help trigger actions (as automatically as possible) to:
  ‱ Improve security
  ‱ Detect fraud
  ‱ Better understand the market / the clients
  ‱ Monitor / adjust ideas and discussions on social networking web-sites
For almost a decade, millions of people have openly shared their profiles (Facebook), shared short updates with the world (Twitter), and published opinions and concerns on various blogs, forums, and the like. Big Data is a logical extension of this trend, an exploitation of the data generated by this new mentality, a mentality that can be summarized by the openness and visibility of individuals’ profiles and opinions. So 
 Big Data = Big Brother? Yes, in a way, in a Big way. And as with any evolution in IT, some companies will make use of these new trends and dynamics, and some will stay behind. Will you?

Monday, July 22, 2013

Is the length of your inline LOBs optimal ?

Starting with DB2 v10, you can specify that a portion of a LOB column will be stored in the base TableSpace. This improves the performance of applications that access LOB data, because it reduces the need to access the auxiliary LOB TableSpace. It also enables the creation of expression-based indexes on the inline portion of a LOB column, which can help improve the performance of searches through a LOB column [more info on IBM website].

Inline LOBs mean that, for LOBs that are smaller than or equal to the specified inline length, DB2 stores the entire LOB data in the base TableSpace, more or less as if it were a VARCHAR. For LOBs that are bigger than the inline length, the inline portion resides in the base TableSpace and the remainder in the auxiliary LOB TableSpace. To use this feature on existing, standard LOB columns, you can alter them to become inline (using an ALTER TABLE statement), as sketched below.
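As a hedged sketch, with hypothetical table and column names and an arbitrary example length, the alteration could look like this; on DB2 10 this is a pending change that is materialized by a subsequent REORG of the base table space:

-- Hypothetical names; 27670 is just an example inline length
ALTER TABLE MYSCHEMA.DOCUMENTS
  ALTER COLUMN DOC_TEXT SET INLINE LENGTH 27670;
COMMIT;
-- Then run REORG on the base table space to materialize the new inline length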

But what is the optimal INLINE LENGTH to specify? The question is almost philosophical, since the answer greatly depends on the distribution of LOB lengths in the column; in other words, it depends on the type of data stored in the LOB. And the DBA creating the table does not necessarily know what it contains, nor what it will contain


Let’s examine 5 types of LOB length distributions that could exist in a LOB column.

1st case: almost all LOB data is small, limited in size, with a few exceptions.



This case is almost too good to be true, because the optimal INLINE LENGTH is obvious. You should set the INLINE LENGTH “threshold” so that it covers the “almost all small” data:


2nd case: the LOB lengths are evenly distributed, from the minimum length to the maximum length.




This case is also a no-brainer: you will want to set a threshold based on the percentage of LOB rows that should be entirely contained in the inline portion. For example, an 80/20 distribution as illustrated in the graph below:



3rd case: the LOB lengths are linearly distributed (ascending or descending).


Similarly to the 2nd case, you will want to set a threshold based on the percentage of LOB rows that should be entirely contained in the inline portion, for example an 80/20 distribution as illustrated in the graph below:


4th case: the LOB lengths follow a normal (Gaussian) distribution.



This case might be more common, but essentially (if you ignore the small number of short LOBs on the left of the graph), you end up with a situation very similar to the 1st case, where you will want to set the inline LOB length very close to the end of the Gaussian curve. You might also use an 80/20 approach.


5th case: the LOB length distribution is sinusoidal.


For this case, it is fairly complicated to provide an educated suggestion as to where to set the inline LOB length “threshold”. In fact, this looks like a succession of Gaussian curves (4th case), so you could set the threshold after any of these curves as illustrated below. Furthermore, you might want to analyze which data gets accessed more frequently, and thus determine after which curve to place the threshold. Hopefully, this case will not be that common in real environments.
  


 From Theory to Practice …

The above describes distribution models, which is nice in theory. In practice, you will want some sort of tool to analyze the existing lengths of your LOB data, understand which case your LOBs fall into, and adapt the INLINE length accordingly (or confirm the setup is good the way it is). Read below 


The following SQL statement lists the table columns that use the inline LOB technique in your subsystem:

SELECT TBCREATOR, TBNAME, NAME, LENGTH, LENGTH2, COLTYPE
  FROM SYSIBM.SYSCOLUMNS
 WHERE COLTYPE IN ('CLOB', 'BLOB', 'DBCLOB')
   AND LENGTH > 4 ;

“LENGTH” > 4 indicates that the LOB column uses the inline LOB technique; the actual inline LOB length is LENGTH - 4.
“LENGTH2” reflects the maximum length of the LOB column (inline portion + portion stored in the auxiliary TableSpace).
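If you prefer to read the inline length directly, a small variation of the previous query (same predicates, just computed and renamed columns) could look like this:

SELECT TBCREATOR, TBNAME, NAME,
       LENGTH - 4 AS INLINE_LENGTH,   -- inline portion stored in the base table space
       LENGTH2    AS MAX_LOB_LENGTH,  -- maximum declared length of the LOB column
       COLTYPE
  FROM SYSIBM.SYSCOLUMNS
 WHERE COLTYPE IN ('CLOB', 'BLOB', 'DBCLOB')
   AND LENGTH > 4 ;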

As an example, the test below was made on the catalog table SYSIBM.SYSVIEWS, which contains a LOB column PARSETREE: a 1 GB BLOB (1073741824 bytes) with LENGTH 27674, i.e. an inline length of 27674 - 4 = 27670.

Here is the sample SQL statement that I used to visualize the length distribution of a LOB column. It simply categorizes the various LOB lengths into length ranges.

SELECT RANGE, COUNT(LENGTH) AS ROW_COUNT
  FROM ( SELECT LENGTH(PARSETREE) AS LENGTH,
                (CEIL(27670/100) * (1 + CEIL(100 * LENGTH(PARSETREE) / 27670))) AS RANGE
           FROM SYSIBM.SYSVIEWS
       ) AS TABLE_WITH_RANGE
 GROUP BY RANGE
 ORDER BY RANGE ;

When exporting the output of the query into Excel, one can build a simple graph of the LOB length distribution and visualize whether the INLINE LENGTH value is properly set. The analysis shows that we are pretty much in the scenario of the 1st case, which I would assume to be the most common one anyway. The inline LOB length “threshold” seems adequately placed for this LOB column.


The nice thing about this technique is that it is reusable: the same query can be run over time, as the LOB data evolves, to monitor the LOB length distribution and make sure the INLINE LENGTH value is still optimal.

Monday, July 15, 2013

Impact of SELECT * versus SELECTing specific columns

A couple of weeks ago, I attended a DB2 for z/OS technical conference, and one of the sessions focused on tuning SQL statements and other topics related to DB2 performance. For the record, the presentation was named “DB2 10 for z/OS: selected performance topics” and the presenter was Cristian Molaro (Independent DB2 Consultant, IBM Gold Consultant & IBM Champion).

Anyway, I learned that there is a notable performance impact when doing a quick-and-dirty SELECT * compared to selecting only the specific columns you need. There were no benchmarks however, so I thought it would be a good test to perform: create some simple tests with various tables and measure the impact of this SELECT * bad coding practice. First thing to mention: the performance degradation is not visible when dealing with small tables (a small number of rows). My tests with a 1000-row table did not show any difference. But when I performed tests on a more realistic table size (1,000,000 rows), I did notice meaningful differences.

The graphs below represent the results of a performance test (CPU time and elapsed time) for a 1,000,000-row table residing in a DB2 v10 New Function Mode (NFM) subsystem. This is a fairly simple table with only 7 columns (integer, dates, and timestamps), and the test compared a SELECT * with a SELECT COL1, COL2.
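For illustration, the two statements compared were of the following form (the table and column names here are placeholders, not the actual test table):

-- Retrieves all 7 columns, whether the application needs them or not
SELECT * FROM TESTSCHEMA.T_1M_ROWS;

-- Retrieves only the two columns the application actually uses
SELECT COL1, COL2 FROM TESTSCHEMA.T_1M_ROWS;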


The results of this test speak for themselves: there is a 70% overhead in CPU time consumed and a 17% overhead in elapsed time when using the SELECT * statement, i.e. when retrieving all columns, some of which are probably not needed by “the application”. Other tests performed show similar results, although the overhead of SELECT * obviously varies depending on the number of columns specified in the SELECT COL1, COL2, 
 list and the size of the table (number of rows).

Digging deeper: a few years ago I used a product that examines embedded SQL statements in COBOL programs: CA Plan Analyzer (it also works with other programming languages). While writing this blog article, I went back to this tool to see if it would detect a SELECT * statement. It did. Without going into too much detail about this tool, there is a rule (a sort of trigger) called Expert Rule 0064 that is triggered if an embedded SELECT * SQL statement is discovered in the application / plan / package analyzed. In addition to the performance impact detailed above, CA Plan Analyzer also notifies the user with the following recommendation (which makes a lot of sense, and IMHO is another good reason why application developers should not use SELECT * type SQL statements in their applications):
                                                                             
         This should be avoided because of problems that can be encountered 
         when adding and removing columns from the underlying table(s). Your
         application program host variables will not correspond to the      
         added/removed columns.                                             

In a nutshell, if you are a DB2 application developer and you care about your applications’ performance: do not use SELECT * statements!
If you are a DB2 administrator, you may want to share this post with your application developers, and/or look for products that can detect the use of embedded SELECT * statements running in your DB2 environment.

Monday, July 8, 2013

Improving the response time of Native SQL procedures in DB2 v10

IBM claims better performance “out of the box” when upgrading to DB2 version 10. Some additional manual steps are however recommended to gain further performance in your DB2 applications and/or environment. One of them concerns Native SQL Procedures.

DB2 version 9 introduced a new type of stored procedures called Native SQL Procedures. These procedures execute in the DBM1 address space and provide SQL performance improvements due to fewer cross-memory calls compared to external stored procedures.
Several improvements were made in this area in DB2 version 10. IBM however mentions that the response time improvement (up to 20%) can be achieved only if the existing Native SQL Procedures are dropped and re-created. That is, if you created Native SQL Procedures under DB2 version 9 and upgraded to DB2 version 10, you might want to DROP/CREATE them.

I created a SQL statement to locate Native SQL Procedures that were created prior to the migration to DB2 version 10 NFM. Since I did not know the exact date when the subsystems were upgraded, I figured out that I could use a trick to get this information. As with every new DB2 version, the DB2 catalog contains additional tables that are created during the migration process. I took one of them, SYSIBM.SYSAUTOALERTS, and queried SYSIBM.SYSTABLES to get its CREATEDTS value, a timestamp that indicates when a table was created. So the following SQL statement indicates when DB2 was either installed at the version 10 level or upgraded to version 10 NFM:

-- When was DB2 upgraded to v10 NFM ?
SELECT CREATEDTS
  FROM SYSIBM.SYSTABLES
 WHERE NAME    = 'SYSAUTOALERTS'
   AND CREATOR = 'SYSIBM' ;

The game is then to locate Native SQL Procedures created prior to that date. Stored procedures are listed in SYSIBM.SYSROUTINES, and ORIGIN = 'N' indicates a Native SQL Procedure. The following statement thus provides a list of the Native SQL Procedures created before DB2 was upgraded to version 10 NFM:

-- List of Native SQL Procedures to re-create
SELECT CREATEDBY,OWNER,NAME,ORIGIN,CREATEDTS        
  FROM SYSIBM.SYSROUTINES                           
 WHERE ORIGIN = 'N'                                 
   AND CREATEDTS < ( SELECT CREATEDTS               
                       FROM SYSIBM.SYSTABLES        
                      WHERE NAME    = 'SYSAUTOALERTS'
                        AND CREATOR = 'SYSIBM' );    

With that, I got the list of Native SQL Procedures that I want to DROP / RE-CREATE. Hopefully you have the DDL for these objects stored in a dataset, but nothing is less certain. If not, you can use tools to generate the DDL statements from the information in the catalog; in this example I used CA RC/Query for DB2 for z/OS to locate a particular Native SQL Procedure and generate its DDL:


This command produces the following output :


All that’s needed is to update the generated SQL, add the DROP syntax and a couple of COMMITs, and the job is done!
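As a minimal sketch of the DROP / re-CREATE pattern, with a hypothetical procedure name and a deliberately trivial body (the re-created procedure must of course keep its original body and options, as generated above):

DROP PROCEDURE MYSCHEMA.GET_TABLE_COUNT;
COMMIT;

CREATE PROCEDURE MYSCHEMA.GET_TABLE_COUNT
  (IN P_CREATOR VARCHAR(128), OUT P_TABLE_COUNT INTEGER)
  LANGUAGE SQL
BEGIN
  -- Trivial example body; a real procedure keeps its original logic
  SELECT COUNT(*) INTO P_TABLE_COUNT
    FROM SYSIBM.SYSTABLES
   WHERE CREATOR = P_CREATOR;
END;
COMMIT;
-- Note: adjust your tool's SQL terminator so the semicolons inside the body
-- are not treated as end-of-statement markers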

I hope that this post is useful to you. If you used the technique described above, I will be glad to read your comments, especially if you noticed and quantified performance improvements when dropping and re-creating your Native SQL Procedures!