Computer Webmaster Gaming Console Graphics Forum

Welcome to the Computer Webmaster Gaming Console Graphics Forum forums.

You are currently viewing our boards as a guest which gives you limited access to view most discussions and access our other features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload content and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!

If you have any problems with the registration process or your account login, please contact contact us.

MK PitStop Main Earn $25 Earn Money Posting Extras Members Blogs Image Hosting User Pages
Go Back   Computer Webmaster Gaming Console Graphics Forum > Webmaster Forum > Website Coding > Database
Register FAQ/Rules Become A V.I.P. Member Search Today's Posts Mark Forums Read

Database Database problems or need to ask a question? maybe something to do with sql injections or a database software question. Database topics cover MySQL, PostgreSQL, Oracle, SQL Server or anything else related to databases.

Google
Closed Thread
 
LinkBack Thread Tools Display Modes
Old 07-01-2007, 7:31 PM   #1
Duke Ionescu
 
Duke  Ionescu's Avatar
 
Posts: n/a
My Photos: (0)

Banked:
MK Cash: $

I am Worth:
MK Cash: $
Donate

Recent Blog: None

Default Settle a debate - to index or not to index?

I have 2 tables:

mysql> desc a;
+---------+------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+------------------+------+-----+---------+-------+
| item_id | int(10) unsigned | NO | | 0 | |
+---------+------------------+------+-----+---------+-------+
1 row in set (0.00 sec)

mysql> select count(*) from a;
+----------+
| count(*) |
+----------+
| 38364373 |
+----------+
1 row in set (0.00 sec)

mysql> desc b;
+---------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra
|
+---------+---------------------+------+-----+---------+----------------+
| item_id | int(10) unsigned | NO | PRI | NULL | auto_increment
|
| col2 | int(10) unsigned | NO | MUL | |
|
+---------+---------------------+------+-----+---------+----------------+
5 rows in set (0.01 sec)

mysql> show index from b;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name |
Collation | Cardinality | Sub_part | Packed | Null | Index_type |
Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| b | 0 | PRIMARY | 1 | item_id | A
| 60688657 | NULL | NULL | | BTREE | |
| b | 1 | col2 | 1 | col2 | A
| NULL | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.00 sec)

mysql> select count(*) from b;
+----------+
| count(*) |
+----------+
| 60688657 |
+----------+
1 row in set (0.00 sec)

There are no duplicates in table a. I want to run the following query:
SELECT col2 FROM a JOIN b USING(`item_id`)

I am arguing that an index (a primary key) on the only column in table
a will not make any difference whatsoever. Our DBA is telling me
otherwise, claiming that w/o an index "it will go thru the whole table
every time it can't find a matching row." It is not clear to me what
he is trying to say; I disagree that it will be any slower to run the
query w/o an index on table a.

I see mysql doing the following: going thru each row in table a using
natural row order and joining table b on it. On the join it uses the
index on `item_id` in table b to see if there is a match and selects
col2 if found. I do not see how an index on table a could optimize
this process.

The DBA also claims that because it would do lookups out of order, it
would have to "seek" more. I don't think one can assume which way it
would be faster - it all depends on how the data is physically laid out
on disk, which we don't know, so no assumptions can be made either way.
He also mentioned having to move the row pointer more; I do not know
enough about the innards of mysql to make a comment on this.

EXPLAIN seems to support my reasoning: I created a copy of table a and
added a primary key on it; EXPLAIN says it will scan the same number of
rows in both cases - all of them:
mysql> EXPLAIN SELECT col2 FROM a JOIN b USING(`item_id`);
+----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len
| ref | rows | Extra |
+----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
| 1 | SIMPLE | a | ALL | NULL | NULL | NULL
| NULL | 38364373 | |
| 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4
| a.item_id | 1 | Using where |
+----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
2 rows in set (0.00 sec)

mysql> EXPLAIN SELECT col2 FROM a_indexed JOIN b USING(`item_id`);
+----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
| 1 | SIMPLE | a_indexed | index | PRIMARY | PRIMARY | 4
| NULL | 38364373 | Using index |
| 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4
| a_indexed.item_id | 1 | Using where |
+----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
2 rows in set (0.01 sec)

The DBA said that EXPLAIN is just an approximation. While I agree with
this statement, I don't see how it applies to this problem.

Am I wrong? Am I right? Please enlighten me (with a detailed technical
explanation).


P.S. The names of the tables & columns have been changed for privacy.

 
Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!Spurl this Post!Reddit!
Old 07-01-2007, 7:31 PM   #2
Jerry Stuckle
 
Jerry Stuckle's Avatar
 
Posts: n/a
My Photos: (0)

Banked:
MK Cash: $

I am Worth:
MK Cash: $
Donate

Recent Blog: None

Default Settle a debate - to index or not to index?

Duke Ionescu wrote:
> I have 2 tables:
>
> mysql> desc a;
> +---------+------------------+------+-----+---------+-------+
> | Field | Type | Null | Key | Default | Extra |
> +---------+------------------+------+-----+---------+-------+
> | item_id | int(10) unsigned | NO | | 0 | |
> +---------+------------------+------+-----+---------+-------+
> 1 row in set (0.00 sec)
>
> mysql> select count(*) from a;
> +----------+
> | count(*) |
> +----------+
> | 38364373 |
> +----------+
> 1 row in set (0.00 sec)
>
> mysql> desc b;
> +---------+---------------------+------+-----+---------+----------------+
> | Field | Type | Null | Key | Default | Extra
> |
> +---------+---------------------+------+-----+---------+----------------+
> | item_id | int(10) unsigned | NO | PRI | NULL | auto_increment
> |
> | col2 | int(10) unsigned | NO | MUL | |
> |
> +---------+---------------------+------+-----+---------+----------------+
> 5 rows in set (0.01 sec)
>
> mysql> show index from b;
> +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
> | Table | Non_unique | Key_name | Seq_in_index | Column_name |
> Collation | Cardinality | Sub_part | Packed | Null | Index_type |
> Comment |
> +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
> | b | 0 | PRIMARY | 1 | item_id | A
> | 60688657 | NULL | NULL | | BTREE | |
> | b | 1 | col2 | 1 | col2 | A
> | NULL | NULL | NULL | | BTREE | |
> +-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
> 3 rows in set (0.00 sec)
>
> mysql> select count(*) from b;
> +----------+
> | count(*) |
> +----------+
> | 60688657 |
> +----------+
> 1 row in set (0.00 sec)
>
> There are no duplicates in table a. I want to run the following query:
> SELECT col2 FROM a JOIN b USING(`item_id`)
>
> I am arguing that an index (a primary key) on the only column in table
> a will not make any difference whatsoever. Our DBA is telling me
> otherwise, claiming that w/o an index "it will go thru the whole table
> every time it can't find a matching row." It is not clear to me what
> he is trying to say; I disagree that it will be any slower to run the
> query w/o an index on table a.
>
> I see mysql doing the following: going thru each row in table a using
> natural row order and joining table b on it. On the join it uses the
> index on `item_id` in table b to see if there is a match and selects
> col2 if found. I do not see how an index on table a could optimize
> this process.
>
> The DBA also claims that because it would do lookups out of order, it
> would have to "seek" more. I don't think one can assume which way it
> would be faster - it all depends on how the data is physically laid out
> on disk, which we don't know, so no assumptions can be made either way.
> He also mentioned having to move the row pointer more; I do not know
> enough about the innards of mysql to make a comment on this.
>
> EXPLAIN seems to support my reasoning: I created a copy of table a and
> added a primary key on it; EXPLAIN says it will scan the same number of
> rows in both cases - all of them:
> mysql> EXPLAIN SELECT col2 FROM a JOIN b USING(`item_id`);
> +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
> | id | select_type | table | type | possible_keys | key | key_len
> | ref | rows | Extra |
> +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
> | 1 | SIMPLE | a | ALL | NULL | NULL | NULL
> | NULL | 38364373 | |
> | 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4
> | a.item_id | 1 | Using where |
> +----+-------------+-------+--------+---------------+---------+---------+-----------+----------+-------------+
> 2 rows in set (0.00 sec)
>
> mysql> EXPLAIN SELECT col2 FROM a_indexed JOIN b USING(`item_id`);
> +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
> | id | select_type | table | type | possible_keys | key |
> key_len | ref | rows | Extra |
> +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
> | 1 | SIMPLE | a_indexed | index | PRIMARY | PRIMARY | 4
> | NULL | 38364373 | Using index |
> | 1 | SIMPLE | b | eq_ref | PRIMARY | PRIMARY | 4
> | a_indexed.item_id | 1 | Using where |
> +----+-------------+-----------+--------+---------------+---------+---------+-------------------+----------+-------------+
> 2 rows in set (0.01 sec)
>
> The DBA said that EXPLAIN is just an approximation. While I agree with
> this statement, I don't see how it applies to this problem.
>
> Am I wrong? Am I right? Please enlighten me (with a detailed technical
> explanation).
>
>
> P.S. The names of the tables & columns have been changed for privacy.
>


I would argue that your DBA is correct, When you select from a_indexed, an
index would allow you to select in numeric order. As the join values would now
be in numeric order, there would potentially be less I/O on table b.

For instance - with an index it could select in order of 1,2,3,4, etc. But
without the index it could select in the order of 25694, 30125, 81320, 301976, etc.

The first would be able to use many of the rows already buffered by the index in
b. The second would have to select several different blocks from the index for b.

Now - if your buffers are big enough to hold all the indexes, than probably
MySQLr would eventually find everything in the buffer (you'd only have to read
each block of the index once). But if your buffers are too small, the second
may have to re-read the same block several times.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================
 
Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!Spurl this Post!Reddit!
Featured Websites
Free Space
Free Space
Free Space Free Space
Closed Thread
Tags: , ,




Currently Active Users Viewing This Thread: 1 (0 members and 1 guests)
 
Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On

Similar Threads
Thread Thread Starter Forum Replies Last Post
renumber index/key field? Wm PHP 1 07-01-2007 4:20 PM
WDS does not index some outlook folders pjgrun MSN questions 0 06-26-2007 12:47 AM
How to unlock INDEX bferree Graphics in general 2 06-11-2007 9:46 PM
Duplicate Index for Outlook PST Tommo B MSN questions 0 05-28-2007 1:45 AM
Cannot index Outlook Jerry MSN questions 2 05-28-2007 1:43 AM


Featured Websites




All times are GMT +1. The time now is 4:40 AM.


Powered by: vBulletin Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
LinkBacks Enabled by vBSEO 3.0.0
Cheap Computers
MK PitStop Copyright 2005 - 2008

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98