Why so many I/Os for simple index query?

This is a discussion on Why so many I/Os for simple index query? within the sybase forums in Other Databases category; Hi. This has probably been asked 1E+06 times, and I apologize, but I couldn't find any info on it. I have gotten the same basic results on a variety of ASE versions from 11.9.2.6 to 15.0.2 ESD#2 and ESD#5. I have a table with a million rows. There is a nonclustered index on a column. There are 50 distinct values of the column spread throughout the table. I realize that with so many rows per distinct index value, there will generally be lots of I/Os even if the index is ...

Go Back   Database Forum > Other Databases > sybase

Database Forums

Register FAQ Calendar Search Today's Posts Mark Forums Read
  #1  
Old 08-15-2008, 01:48 PM
Default Why so many I/Os for simple index query?

Hi.

This has probably been asked 1E+06 times, and I apologize, but I couldn't
find any info on it. I have gotten the same basic results on a variety of
ASE versions from 11.9.2.6 to 15.0.2 ESD#2 and ESD#5.

I have a table with a million rows. There is a nonclustered index on a
column. There are 50 distinct values of the column spread throughout the
table. I realize that with so many rows per distinct index value, there will
generally be lots of I/Os even if the index is being used. But still I am
confused when I do any of these queries:

select distinct my_indexed_column from mytable
select my_indexed_column from mytable group by my_indexed_column
select count(distinct my_indexed_column) from mytable

I have verified that in all cases my index is being used, and it's a
covering index (i.e. "Base table will not be read".) But each of those
queries costs 11,000 I/Os. It just seems like logically ASE should be able
to extract the 50 distinct values out of the index with a bare minimum of
I/Os (like maybe, I don't know, 10?). Why so many I/Os?

Thanks.
- John.


Reply With Quote
  #2  
Old 08-15-2008, 04:21 PM
Default Re: Why so many I/Os for simple index query?

Because it requires a full index scan (leaf level) of the index to obtain
those 50 values, as it's the leaf level of the index which is the only level
which contains ALL of the values.

Look at the structure of a b-tree index. There is nothing in the btree that
says which page each (first occurence?) of the distinct value of the column
resides.

I'll do my best in ascii to illustrate the issue. Presume that I have 4
index leaf level pages with key values as below:

--pg1--
100
100
100
200
--pg2--
200
300
400
400
400
--pg3--
400
400
400
400
--pg4--
400
500
500

The btree level above this leaf level doesn't tell me that I can skip
reading p3 to find unique values, nor does it necessarily (it might, but
it's a dense structure, so can't be guaranteed) tell me exactly which page
contains the first occurence of "200". The challenge comes in finding a
method to sort all of these values somehow, and I think that you'll find the
methods used in ASE 15 (particularly "hash distinct") better performing at
this sorting step especially with large sets of data to plow through.
Remember, it's only the leaf level which contains all of those unique values
that you are after.

I find the phonebook analogy useful in these situations as well. Pretend
that I ask you for a list of all of the unique last names from the white
pages of the phone book. And of course, each page of the phone book has an
index on it which lists the beginning and ending last names on those pages.
You can't satisfy my request by simply reading the indexed names on the top
of the page. You have to look at each name on every page of the phone book
and sort out the distinct names by keeping track of which ones you've
already seen. Not only that, you have to at least "look" at every page.
Even if multiple pages contain "Smith", you have to open and look at the
first entry of the next page to see if there is a different name starting on
that page. If it's another "Smith", you then MUST at least look at the
first entry of the next page, etc. Those are your page IO's, and you have
to open/read each page.

"John Flynn" wrote in message
news:48a5b35c-at-forums-1-dub...
> Hi.
>
> This has probably been asked 1E+06 times, and I apologize, but I couldn't
> find any info on it. I have gotten the same basic results on a variety of
> ASE versions from 11.9.2.6 to 15.0.2 ESD#2 and ESD#5.
>
> I have a table with a million rows. There is a nonclustered index on a
> column. There are 50 distinct values of the column spread throughout the
> table. I realize that with so many rows per distinct index value, there
> will generally be lots of I/Os even if the index is being used. But still
> I am confused when I do any of these queries:
>
> select distinct my_indexed_column from mytable
> select my_indexed_column from mytable group by my_indexed_column
> select count(distinct my_indexed_column) from mytable
>
> I have verified that in all cases my index is being used, and it's a
> covering index (i.e. "Base table will not be read".) But each of those
> queries costs 11,000 I/Os. It just seems like logically ASE should be able
> to extract the 50 distinct values out of the index with a bare minimum of
> I/Os (like maybe, I don't know, 10?). Why so many I/Os?
>
> Thanks.
> - John.
>
>



Reply With Quote
  #3  
Old 08-15-2008, 04:59 PM
Default Re: Why so many I/Os for simple index query?

Hi Kevin,

> Because it requires a full index scan (leaf level) of the index to
> obtain those 50 values, as it's the leaf level of the index which is
> the only level which contains ALL of the values.


Thanks for the explanation, I really appreciate it. It makes sense to me
now.

- John.


Reply With Quote
  #4  
Old 08-19-2008, 06:21 PM
Default Re: Why so many I/Os for simple index query?

If this index was created just as a covering index to support aggregate
queries - creating it with ignore dupe key might help.....

John Flynn wrote:
> Hi Kevin,
>
>> Because it requires a full index scan (leaf level) of the index to
>> obtain those 50 values, as it's the leaf level of the index which is
>> the only level which contains ALL of the values.

>
> Thanks for the explanation, I really appreciate it. It makes sense to me
> now.
>
> - John.
>
>

Reply With Quote
  #5  
Old 08-19-2008, 06:44 PM
Default Re: Why so many I/Os for simple index query?

????

Ignore_dup_key is only for creating unique indexes, and only affects
inserts/updates that attempt to create duplicate keys. An index on such a
column as the OP has is obviously not unique. I'm not sure I understand the
recommendation.

"Jeff Tallman" wrote in message
news:48ab3962$1-at-forums-1-dub...
> If this index was created just as a covering index to support aggregate
> queries - creating it with ignore dupe key might help.....
>
> John Flynn wrote:
>> Hi Kevin,
>>
>>> Because it requires a full index scan (leaf level) of the index to
>>> obtain those 50 values, as it's the leaf level of the index which is
>>> the only level which contains ALL of the values.

>>
>> Thanks for the explanation, I really appreciate it. It makes sense to me
>> now.
>>
>> - John.
>>


Reply With Quote
  #6  
Old 08-19-2008, 11:06 PM
Default Re: Why so many I/Os for simple index query?


It was a brain freeze while on a concall and trying to look at these -
proof that multitasking in humans is impossible - I was thinking it
might reject just the index keys - but instead it rejects the entire row
(uuugghhhh). Been too long since I used these as usually they are a
serious indication of a logic problem.... Oh for the want of sparse
indexing!!!

Sherlock, Kevin wrote:
> ????
>
> Ignore_dup_key is only for creating unique indexes, and only affects
> inserts/updates that attempt to create duplicate keys. An index on such a
> column as the OP has is obviously not unique. I'm not sure I understand the
> recommendation.
>
> "Jeff Tallman" wrote in message
> news:48ab3962$1-at-forums-1-dub...
>> If this index was created just as a covering index to support aggregate
>> queries - creating it with ignore dupe key might help.....
>>
>> John Flynn wrote:
>>> Hi Kevin,
>>>
>>>> Because it requires a full index scan (leaf level) of the index to
>>>> obtain those 50 values, as it's the leaf level of the index which is
>>>> the only level which contains ALL of the values.
>>> Thanks for the explanation, I really appreciate it. It makes sense to me
>>> now.
>>>
>>> - John.
>>>

>

Reply With Quote
  #7  
Old 08-25-2008, 10:41 PM
Default Re: Why so many I/Os for simple index query?

Kevin has provided a great explanation to "the question". Let me add
my two cents, as I come across this all the time at cust sites. Of
course I am treating your examples as real.

> On 2008-08-16 02:48:28 +1000, "John Flynn" said:
>
> select distinct my_indexed_column from mytable
> select my_indexed_column from mytable group by my_indexed_column


If you have 50 distinct values over a 1m row table and you are doing
this reasonably often, you really have a normalisation problem here.
Low cardinality indices are prime candidates for performance
improvement. You may be better off with a Parent table containing the
50 values; which will most probably be a description not the PK; create
a proper PK; implement an FK to the child table; and my_indexed_column
NOT indexed on the child table. That will reduce the above selects to
50 rows.

The 11,000 I/Os to scan the index indicate that you probably have the
description in the FK, carried in the child, which is not good.

And of course, an Indexed column should never be var length or nullable.

> select count(distinct my_indexed_column) from mytable


If the performance of that is not acceptable, you can put the count in
a column in the parent table, and update it whenever you insert/delete
the child. After all, it is a data base of facts, not a heap that
needs to be counted again and again.
--
Cheers
Derek
Senior Sybase DBA / Information Architect
Copyright © 2008 Software Gems Pty Ltd
Quality Standards = Zero Maintenance + Zero Surprises
Performance Standards = Predictability + Scaleability

Reply With Quote
  #8  
Old 08-28-2008, 08:47 PM
Default Re: Why so many I/Os for simple index query?

Derek Asirvadem wrote:
> If you have 50 distinct values over a 1m row table and you are doing
> this reasonably often, you really have a normalisation problem here.
> Low cardinality indices are prime candidates for performance
> improvement. You may be better off with a Parent table containing the
> 50 values; which will most probably be a description not the PK;
> create a proper PK; implement an FK to the child table; and
> my_indexed_column NOT indexed on the child table. That will reduce
> the above selects to 50 rows.


If I'm understanding correctly what you're saying, then what you've
described is very close to how I already have it. The "50 distinct values"
are foreign keys to a master table.

The master table is a table of branch offices. There are, say, 70 branch
offices. Each branch office has a PK. Meanwhile, the million-row table is a
table of sales. Each one is identified with which branch office it "belongs"
to, by way of a FK that relates to the PK in the master table. Twenty of the
branch offices don't have any sales in the big table. So the sales table
only has 50 distinct values of the FK. I want to find out which those 50
are. I don't see how reading my master table of 70 branch offices helps me
to get that answer.

> The 11,000 I/Os to scan the index indicate that you probably have the
> description in the FK, carried in the child, which is not good.


No. The FK is a true FK.

>> select count(distinct my_indexed_column) from mytable

>
> If the performance of that is not acceptable, you can put the count in
> a column in the parent table, and update it whenever you insert/delete
> the child. After all, it is a data base of facts, not a heap that
> needs to be counted again and again.


Agreed.

Thanks.
- John.


Reply With Quote
  #9  
Old 09-19-2008, 08:19 AM
Default Re: Why so many I/Os for simple index query?

Going back to your original request, ie, how many branches have (at least 1) record in sales:

select b.PK
from branches b
where exists(select s.FK from sales s where s.FK = b.PK)

- the optimizer will table scan branches as the outer table

- for each of the 70 PKs in branches an exists() sub-query is run against sales

- each exists() sub-query will use the index on s.FK and require a few IOs to return a true/false with just a few IOs



John Flynn wrote:
> The master table is a table of branch offices. There are, say, 70 branch
> offices. Each branch office has a PK. Meanwhile, the million-row table is a
> table of sales. Each one is identified with which branch office it "belongs"
> to, by way of a FK that relates to the PK in the master table. Twenty of the
> branch offices don't have any sales in the big table. So the sales table
> only has 50 distinct values of the FK. I want to find out which those 50
> are. I don't see how reading my master table of 70 branch offices helps me
> to get that answer.

Reply With Quote
  #10  
Old 09-22-2008, 04:15 PM
Default Re: Why so many I/Os for simple index query?

Thanks Derek and Mark, that is very helpful.


Reply With Quote
Reply


Thread Tools
Display Modes



All times are GMT -4. The time now is 08:40 PM.


Powered by vBulletin® Version 3.6.8
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Integrated by bbpixel2008 :: jvbPlugin R1013.368.1

Search Engine Friendly URLs by vBSEO 3.1.0
vB Ad Management by =RedTyger=
In an effort to better serve ads to our visitors, cookies are used on Mydatabasesupport.com. For more information, check out our Privacy Policy.