Re: [GENERAL] strange explain analyze output

This is a discussion on Re: [GENERAL] strange explain analyze output within the postgresql forums in Other Databases category; Jeff Davis writes: > This is in version 8.3.1 (I also tried 8.3.3). > It looks like the sort is producing more rows than the input. The hash > aggregate produces 10k, but the sort produces 10M. > Merge Join (cost=199211.12..660979.37 rows=9998773 width=12) (actual > time=8887.540..27866.804 rows=10000000 loops=1) > Merge Cond: (public.t.a = public.t.a) > -> Index Scan using t_a_idx on t (cost=0.00..286789.72 rows=9998773 > width=8) (actual time=19.784..5676.407 rows=10000000 loops=1) > -> Sort (cost=199211.12..199217.72 rows=2641 ...

Go Back   Database Forum > Other Databases > postgresql

Database Forums

Register FAQ Calendar Search Today's Posts Mark Forums Read
  #1  
Old 08-28-2008, 01:42 AM
Default Re: [GENERAL] strange explain analyze output

Jeff Davis writes:
> This is in version 8.3.1 (I also tried 8.3.3).
> It looks like the sort is producing more rows than the input. The hash
> aggregate produces 10k, but the sort produces 10M.


> Merge Join (cost=199211.12..660979.37 rows=9998773 width=12) (actual
> time=8887.540..27866.804 rows=10000000 loops=1)
> Merge Cond: (public.t.a = public.t.a)
> -> Index Scan using t_a_idx on t (cost=0.00..286789.72 rows=9998773
> width=8) (actual time=19.784..5676.407 rows=10000000 loops=1)
> -> Sort (cost=199211.12..199217.72 rows=2641 width=8) (actual
> time=8867.749..11692.015 rows=10000000 loops=1)
> Sort Key: public.t.a
> Sort Method: quicksort Memory: 647kB
> -> HashAggregate (cost=199001.60..199034.61 rows=2641
> width=8) (actual time=8854.848..8859.306 rows=10001 loops=1)


What this shows is that the HashAggregate emitted 10001 output rows,
which necessarily was the number of rows sorted. The Sort node was
(successfully) called on to produce an output row 10000000 times.

The reason that these statements are not inconsistent is that the
Sort is the inner relation for a mergejoin. In the presence of
duplicate keys in the outer relation, a mergejoin will "rewind" and
rescan duplicate keys in the inner relation; that is, any row in the
inner relation will be fetched approximately as many times as it has
matches in the outer relation. So it looks like you've got roughly
1000X duplication in these tables?

(BTW, the planner knows that this is expensive and will avoid mergejoins
when there are many duplicate keys. But apparently a hash join seemed
even worse for the stats of this particular pair of tables.)

regards, tom lane

--
Sent via pgsql-general mailing list (pgsql-general-at-postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply With Quote
Reply


Thread Tools
Display Modes



All times are GMT -4. The time now is 09:26 AM.


Powered by vBulletin® Version 3.6.8
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Integrated by bbpixel2008 :: jvbPlugin R1013.368.1

Search Engine Friendly URLs by vBSEO 3.1.0
vB Ad Management by =RedTyger=
In an effort to better serve ads to our visitors, cookies are used on Mydatabasesupport.com. For more information, check out our Privacy Policy.