How to make your SQL run faster

---- People who use SQL easily fall into the trap of caring only about whether the results are correct, while ignoring the performance differences that may exist between different ways of writing a statement. This difference is especially pronounced in large or complex database environments, such as online transaction processing (OLTP) or decision support systems (DSS). In my working practice I have found that poor SQL often comes from unreasonable index design, insufficient join conditions, and where clauses that cannot be optimized. After proper optimization, execution speed improves significantly. Below I discuss each of these three areas in turn:
---- To illustrate the problem intuitively, any SQL statement whose measured run time did not exceed 1 second is shown as (< 1 second).

---- Test environment:

---- Host: HP LH II
---- Clock speed: 330 MHz
---- Memory: 128 MB
---- Operating system: OpenServer 5.0.4
---- Database: Sybase 11.0.3

First, unreasonable index design

---- Example: the table record has 620,000 rows. Consider how the following SQL statements perform under several different index designs:
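---- For reference, the index layouts tested in cases 1 through 4 below might be created with statements roughly like these (a sketch in Sybase T-SQL; the index names are hypothetical, and only one layout exists at a time):
create nonclustered index idx_record_date on record (date)                -- case 1
create clustered index idx_record_date_cl on record (date)                -- case 2
create nonclustered index idx_record_pda on record (place, date, amount)  -- case 3
create nonclustered index idx_record_dpa on record (date, place, amount)  -- case 4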

---- 1. A non-clustered index on date
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (25 seconds)
select date, sum(amount) from record group by date (55 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (27 seconds)
---- Analysis:
---- date has a large number of duplicate values. Under a non-clustered index the data is physically stored randomly across the data pages, so a range lookup must scan the whole table to find all the rows within the range.

---- 2. A clustered index on date
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (14 seconds)
select date, sum(amount) from record group by date (28 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (14 seconds)
---- Analysis:
---- Under a clustered index the data is stored on the data pages in physical order, and duplicate values sit next to each other. A range search can therefore locate the start and end points of the range and scan only the data pages within it, avoiding a wide scan and speeding up the query.

---- 3. A composite index on (place, date, amount)
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (26 seconds)
select date, sum(amount) from record group by date (27 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (< 1 second)
---- Analysis:
---- This is not a very reasonable composite index: its leading column is place, and the first and second SQL statements do not reference place, so they cannot use the index. The third SQL does use place, and all the columns it references are contained in the composite index, so it is covered by the index (index covering) and runs very fast.
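---- As an illustration of the leading-column rule, a hypothetical variant of the first query that also restricts place could make use of this (place, date, amount) index (this is an added example, not a timing from the original tests):
select count(*) from record where place = 'BJ' and date > '19991201' and date < '19991214' and amount > 2000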

---- 4. A composite index on (date, place, amount)
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (< 1 second)
select date, sum(amount) from record group by date (11 seconds)
select count(*) from record where date > '19990901' and place in ('BJ', 'SH') (< 1 second)
---- Analysis:
---- This is a reasonable composite index. It uses date as the leading column, so every SQL statement can take advantage of the index, and the first and third statements are also covered by the index, which is why they achieve the best performance.

---- 5. Summary:
---- The index created by default is a non-clustered index, but that is not always the best choice; a reasonable index design must be based on analysis and prediction of the various queries. In general:
---- (1) For columns with many duplicate values that are often searched by range (between, >, <, >=, <=) or used in order by or group by, consider a clustered index;
---- (2) For multiple columns that are often accessed together, each containing duplicate values, consider a composite index;
---- (3) A composite index should try to cover the key queries (index covering), and its leading column must be the most frequently used column.
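---- For example, with the composite index on (date, place, amount) from case 4 above, a query such as the following references only columns contained in the index, so in principle it can be answered from the index pages alone (index covering) without reading the data pages:
select date, sum(amount) from record where date > '19991201' group by date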

Second, insufficient join conditions:

---- Example: the table card has 7,896 rows with a non-clustered index on card_no, and the table account has 191,122 rows with a non-clustered index on account_no. Consider how the following two SQL statements execute under different join conditions:
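---- The two indexes described above could be created roughly as follows (a sketch; the index names are hypothetical):
create nonclustered index idx_card_cardno on card (card_no)
create nonclustered index idx_account_accountno on account (account_no)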

select sum(a.amount) from account a, card b where a.card_no = b.card_no (20 seconds)
---- Change the SQL to:
select sum(a.amount) from account a, card b where a.card_no = b.card_no and a.account_no = b.account_no (< 1 second)
---- Analysis:
---- Under the first join condition, the best query plan uses account as the outer table and card as the inner table, and relies on the index on card. The number of I/Os can be estimated as:
---- 22,541 pages on the outer table account + (191,122 rows in the outer table account * 3 pages to look up the matching row on the inner table card for each outer row) = 595,907 I/Os
---- Under the second join condition, the best query plan uses card as the outer table and account as the inner table, and relies on the index on account. The number of I/Os can be estimated as:
---- 1,944 pages on the outer table card + (7,896 rows in the outer table card * 4 pages to look up the matching row on the inner table account for each outer row) = 33,528 I/Os
---- Clearly, only with sufficient join conditions will the truly best plan be executed.

---- Summary:

---- 1. Before a multi-table query is actually executed, the query optimizer lists several possible join plans according to the join conditions and picks the one with the lowest cost. The join conditions should fully take into account the tables that have indexes and the tables with many rows; the choice of outer table can be determined by the formula: number of matching rows in the outer table * number of lookups in the inner table for each of those rows. The plan with the smallest product is the best.
---- 2. To view the chosen plan, turn on the showplan option with set showplan on; you can then see the join order and which indexes are used. To see more detailed information, the sa role is required to run dbcc traceon(3604, 310, 302).
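---- For instance, in an isql session the plan information could be inspected roughly like this (a sketch; the query is the second one from the example above):
set showplan on
go
dbcc traceon(3604, 310, 302)
go
select sum(a.amount) from account a, card b where a.card_no = b.card_no and a.account_no = b.account_no
go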

Third, where clauses that cannot be optimized

---- 1. Example: the columns in the where clauses of the following SQL statements all have suitable indexes, yet the statements execute very slowly:
select * from record where substring(card_no, 1, 4) = '5378' (13 seconds)
select * from record where amount / 30 < 1000 (11 seconds)
select * from record where convert(char(10), date, 112) = '19991201' (10 seconds)
---- Analysis:
---- Any operation on a column in the where clause is evaluated column by column at SQL run time, so the statement has to search the whole table and cannot use the index on that column. If the result of the expression can be obtained at query compile time, the SQL optimizer can optimize it and use the index, avoiding the table search. So the statements are rewritten as follows:
select * from record where card_no like '5378%' (< 1 second)
select * from record where amount < 1000 * 30 (< 1 second)
select * from record where date = '1999/12/01' (< 1 second)
---- You will find that the SQL statements become noticeably faster!

---- 2. Example: the table stuff has 200,000 rows and a non-clustered index on id_no. Look at the following SQL:
select count(*) from stuff where id_no in ('0', '1') (23 seconds)
---- Analysis:
---- The 'in' in the where condition is logically equivalent to 'or', so the parser converts in ('0', '1') into id_no = '0' or id_no = '1' for execution. We would expect it to search according to each or clause separately and then add up the results, so that the index on id_no could be used; but in fact (as shown by showplan) it adopts the "OR strategy": it extracts the rows satisfying each or clause into a worktable in tempdb, builds a unique index on it to remove duplicate rows, and finally computes the result from that temporary table. The actual process therefore does not use the index on id_no, and the completion time is also affected by the performance of the tempdb database.
---- Practice has shown that the more rows the table has, the worse the worktable performs: when stuff has 620,000 rows, the execution time reaches 220 seconds! It is better to split the or clauses apart:
select count(*) from stuff where id_no = '0'
select count(*) from stuff where id_no = '1'
---- and get two results, at the cost of one extra addition. Because each statement uses the index, the execution time is only 3 seconds; even with 620,000 rows the time is only 4 seconds. Or, better still, write a simple stored procedure:
create proc count_stuff as
declare @a int
declare @b int
declare @c int
declare @d char(10)
begin
select @a = count(*) from stuff where id_no = '0'
select @b = count(*) from stuff where id_no = '1'
end
select @c = @a + @b
select @d = convert(char(10), @c)
print @d
---- The result is calculated directly, and the execution time is as fast as with the approach above!
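---- The procedure would then be invoked like this (a usage sketch; the total is printed to the client):
exec count_stuff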

---- Summary:

---- As can be seen, "optimized" means that the where clause makes use of an index, while "cannot be optimized" means that a table scan or extra overhead occurs.

---- 1. Any operation on a column causes a table scan; this includes database functions, computed expressions, and so on. When querying, move the operation to the right-hand side of the equal sign whenever possible.
---- 2. in and or clauses often use worktables and defeat the index; if they do not produce a large number of duplicate values, consider splitting the clause apart; the split clauses should make use of the index.
---- 3. Be good at using stored procedures, which make SQL more flexible and efficient.

---- It can be seen from the examples above that the essence of SQL optimization is, on the premise that the results are correct, to use statements the optimizer can recognize, to take full advantage of indexes, to reduce the number of I/Os caused by table scans, and to avoid table searches as much as possible. In fact, SQL performance optimization is a complex process; the points above are only its reflection at the application level. Deeper study would also involve resource allocation at the database layer, flow control at the network layer, and the overall design of the operating system layer.