If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|
Thread Tools | Display Modes |
#41
|
|||
|
|||
GROUP BY vs DISTINCT
"Michel Walsh" vanderghast@VirusAreFunnierThanSpam wrote in
: I should have said NOT EXISTS rather than NOT IN. There are also problems with Jet's implementation of IN clauses, as it doesn't seem to always use the indexes on both sides, one case where it often doesn't is NOT IN(), in fact. -- David W. Fenton http://www.dfenton.com/ usenet at dfenton dot com http://www.dfenton.com/DFA/ |
#42
|
|||
|
|||
GROUP BY vs DISTINCT
They are NOT logically equivalent? Well, I totally disagree.
To summarize: SELECT DISTINCT f1 FROM foobar is logically the same as SELECT f1 FROM foobar GROUP BY f1 You may claim the reverse with all your might, it would be better to supply a counter example where these two statements differ, else, your might won't matter much: SQL is math (set theory) based, not "intention" based, not "claim" based. The results described by each of the above statement ***are*** logically the same. Vanderghast, Access MVP "David W. Fenton" wrote in message ... "Michel Walsh" vanderghast@VirusAreFunnierThanSpam wrote in : The purpose of an optimizer is to find the best method. Yes. Given the commands given. Jet optimizer fails to find the best method in that specific case, so Jet optimizer didn't optimize as it should. Because it was given the wrong commands. I hardly see how can someone can say that this is fine and acceptable state. So, indeed, Jet is crippled, in that specific case, as it is in the NOT IN() construction. All SQL variants have certain things that aren't well-implemented. If a better method is called a 'fix-up' in your dictionary, then, no problem with me, as long as NO ONE think there is no need to bring these 'fix-ups' into Jet. It would be nice if Jet could fix the user error, as long as it didn't break performance in other cases. Finally, again, SQL is descriptive, and logically equivalent results should be optimized the same way, indecently of the exact syntax... in particular if the query is a basic elementary one. But they aren't logically equivalent -- they are only equivalent in the *results* displayed. -- David W. Fenton http://www.dfenton.com/ usenet at dfenton dot com http://www.dfenton.com/DFA/ |
#43
|
|||
|
|||
GROUP BY vs DISTINCT
Michel Walsh wrote:
They are NOT logically equivalent? Well, I totally disagree. To summarize: SELECT DISTINCT f1 FROM foobar is logically the same as SELECT f1 FROM foobar GROUP BY f1 You may claim the reverse with all your might, it would be better to supply a counter example where these two statements differ, else, your might won't matter much: SQL is math (set theory) based, not "intention" based, not "claim" based. The results described by each of the above statement ***are*** logically the same. To me, "Group By" implies aggregating across the groups created. If you have no intention to aggregate anything then why create groupings? I mean I *can* turn off the light by unscrewing the bulb, but why do that when there is a switch on the wall? If I want distinct values I use the DISTINCT clause. Some other mechanism that might provide the same result is of no interest to me as long as the mechanism specifically designed for what I want to accomplish is there and works. -- Rick Brandt, Microsoft Access MVP Email (as appropriate) to... RBrandt at Hunter dot com |
#44
|
|||
|
|||
GROUP BY vs DISTINCT
Ok, let us do it the other way.
Consider: SELECT f1 FROM table GROUP BY f1 and SELECT DISTINCT f1 FROM table In MS SQL Server, both queries takes the same execution time, say 1 Unit of Execution Time. In Jet, the second query uses the same plan than MS SQL Server, so say the second query, in Jet, also takes 1 UET, but the first query, in Jet, easily takes up to 10 times that. So, up to now, someone can tell, ok, Jet is not as performing for the GROUP BY than for the DISTINCT, so, as you said, let use DISTINCT when possible, and GROUP BY otherwise. (same as not using NOT EXISTS when an outer join can do). BUT now, that *is* the problem, consider: SELECT f1, MAX(f2) FROM table GROUP BY f1 Again, MS SQL Server uses 1 UET ! because computing the MAX once the groups are 'made' is almost negligible in comparison (less that half one percent of the UET). And that last query plan uses the SAME plan that for the very first query, technique that Jet knows and used for DISTINCT, remember that point. This time, though, YOU cannot write the query, explicitly, using DISTINCT, so, Jet, uses the same procedure than for the first query, which is up to10 times slower than it should, and all that while Jet has the capabilities, as MS SQL Server does, to do it efficiently. So,while Jet has already the knowledge about how to do it 10 times faster (it did, proof being that it does it with DISTINCT), it is a BUG if it does not uses it. And this time, the user cannot do anything for it. I was not aware of that crippling feature of Jet until recently, since the query IS basic and elementary, I was on the impression that Jet was indeed doing the same as MS SQL Server. But now that I know that ALL Jet queries implying GROUP BY are probably up to 10 times slower than they should, I should say everything is well, no problemo, business as usual? Now, about "intention". Again, no, no, and no, SQL is not about "if I want". In theory, two logically equivalent statements should be evaluated the same way. Example: SELECT * FROM f1 INNER JOIN f2 ON f1.g1=f2.g2 is equivalent to SELECT * FROM f1, f2 WHERE g1=g2 and NO, it is NOT a valid reason to say that since the SYNTAX is not the same, it is then acceptable that the query plan would not be the same! In SQL, since you don't say HOW TO solve the problem, just what is the result you want, you delegate the responsibility to find the optimum way to solve the problem to the database engine, and the database should then be able to do an acceptable work in finding the best solution (based on what is described by the SQL statement). In fact, the db can use an approach you would have never ever imagine! That is not a problem, your work is to describe the result, and the optimizer job is to find the best solution given the circumstances (which depends on statistics too, such as the size of the table, if there are indexes or not, etc). And in the same way, it is not relevant if you want to aggregate or not, DISTINCT and GROUP BY are of the same family of descriptors. Your "intentions" are not part of the SQL language, neither as tag, comment, neither in the case of DISTINCT/GROUP BY, if you want or not aggregate. (Aggregating is almost free, anyhow, in comparison with the execution time for the sorting). And in ANY CASES, even * if * it was 'intention based'' as in an imperative programmation language, and it is not, unless you can re-write: SELECT f1, MAX(f2) FROM table GROUP BY f1 using DISTINCT, then JET ***is crippled***, as it is now, since there is an execution plan that Jet could run up to 10 times faster, and while Jet has all the knowledge to do it, it is just too ... crippled... to do it. And that, for ALMOST ALL queries with a GROUP BY clause, ALMOST ALL of them! up to 10 times! Now, back to your question: if I don't want to use group, as with your 'intention' in using DISTINCT, why creating them? simply because BY DEFINITION, they are the same thing. With DISTINCT, to know if f1="a", f2="b" has already been selected, or not, what will you do? With GROUP BY, to know if f1="a" and f2="b" has already been created or not, what will you do? In both cases, you create some LIST of what is already "selected/created", and find a way to easily answer to the question: is (f1, f2) already in this LIST or not. You see, in terms of bits and bytes, "selected" (DISTINCT) or "created" (GROUP) is irrelevant, same thing in math and set theory ... a bit and a byte don't really see what is your 'intention'. Vanderghast, Access MVP "Rick Brandt" wrote in message et... Michel Walsh wrote: They are NOT logically equivalent? Well, I totally disagree. To summarize: SELECT DISTINCT f1 FROM foobar is logically the same as SELECT f1 FROM foobar GROUP BY f1 You may claim the reverse with all your might, it would be better to supply a counter example where these two statements differ, else, your might won't matter much: SQL is math (set theory) based, not "intention" based, not "claim" based. The results described by each of the above statement ***are*** logically the same. To me, "Group By" implies aggregating across the groups created. If you have no intention to aggregate anything then why create groupings? I mean I *can* turn off the light by unscrewing the bulb, but why do that when there is a switch on the wall? If I want distinct values I use the DISTINCT clause. Some other mechanism that might provide the same result is of no interest to me as long as the mechanism specifically designed for what I want to accomplish is there and works. -- Rick Brandt, Microsoft Access MVP Email (as appropriate) to... RBrandt at Hunter dot com |
#45
|
|||
|
|||
What GROUP BY is not...
To illustrate what GROUP BY is and is NOT, so things can be clear in mind.
SELECT f1, f2, f3, MAX(f4), MIN(f5), SUM(f6) FROM somewhere GROUP BY f1, f2, f3 can be 'described' as: do all different (distinct) groups of possible triples {f1, f2, f3} and once these groups are made, for each of them, compute the aggregate. If you imagine that it is what is effectively done, ie. first, define 'm' buckets, one per triple {f1, f2, f3}, fill the each bucket with the records that belong to them, THEN, once it is done, look again on each bucket and compute the aggregate,... well, that is a possible 'plan', but a very not-efficient one since you 'touch' each record twice. We can easily solve the query by touching each record only ONCE. Indeed, assume there is already a "group" created for f1='a', f2='b', f3='c', then once a new record has the same values for that triple, do we have to store it? Well, surely we don't need to store f1, f2, and f3 again, since they are the same values than those of the 'group' identifier. What about f4 of the new record? again, the existing group can already has the MAXupToNowOf(f4), MINupToNowOf(f5) and SUMupToNowOf(f6). So, why not using f4 of the new record and 'aggregate it one term at a time' with MaxUpToNowOf(f4), and same thing with f5, and with f6. Do we need to 'touch' that new record? no! So not only you don't need to touch each record more than once, but each 'bucket' is at most, one row deep. That is a 'line', or a 'row', if you wish. Then, what look can take the whole set of 'rows'? you got it, a list, a SORTED list, no dup on the sorted fields: f1, f2, f3. So, what DISTINCT is? Vanderghast, Access MVP |
#46
|
|||
|
|||
What GROUP BY is not...
Michel Walsh wrote:
To illustrate what GROUP BY is and is NOT, so things can be clear in mind. [snip] I don't see where you are going here. The OP was about what was quickest; Group By (without aggregating) or DISTINCT. You are now going on about the fact that Jet might not use the best plan for Group By queries. That is beside the point isn't it? If I need to aggregate on groups then I must use Group By and whether Jet does that in the absolute most efficient manner is irrelevant. And as I stated before if I don't need to aggregate but only want distinct values then I use the DISTINCT clause. I see no point in creating off the wall query solutions that are not going to self document in order to save a few fractions of a second. If it will save more than that then either the database is poorly designed or has been built with the wrong database engine. -- Rick Brandt, Microsoft Access MVP Email (as appropriate) to... RBrandt at Hunter dot com |
#47
|
|||
|
|||
What GROUP BY is not...
The OP did asked what was the quickest, yes, but some answers, to stay
polite, let call it that way, he got where about there was a whole world of difference between a GROUP BY and DISTINCT. I continue to claim both syntax are logically the same, and bring back-up about my claim (as MS SQL Server using the same plan of executions, and, recently, explanations about this plan, since some people seem to continue to think those are two different things). Few milliseconds, for small tables, but as someone else pointed it out, and I found the same order of magnitude with using one indexed field, with JET, the ratio of execution time is by an order of magnitude, so that while DISTINCT took, for me, on average, 1 sec, the GROUP BY took 11 sec. It is not ONLY by a few milliseconds. And since the message seems to have problem to pass, I am glad to repeat it: SQL is not about what are your intentions are, neither how to solve a problem but ABOUT A DESCRIPTION of what we want. DISTINCT and GROUP BY (without aggregate) are logically equivalent, and even more, GROUP BY WITH AGGREGATE is free lunch by comparison with the same statement WITHOUT AGGREGATE, since computing the aggregate is negligible in comparison with the sorting. If you use DISTINCT when you don't want aggregate, great, but you still use inefficient GROUP BY when you need aggregate. If you are glad and happy camper with that, I am glad for you, should I not? Vanderghast, Access MVP "Rick Brandt" wrote in message et... Michel Walsh wrote: To illustrate what GROUP BY is and is NOT, so things can be clear in mind. [snip] I don't see where you are going here. The OP was about what was quickest; Group By (without aggregating) or DISTINCT. You are now going on about the fact that Jet might not use the best plan for Group By queries. That is beside the point isn't it? If I need to aggregate on groups then I must use Group By and whether Jet does that in the absolute most efficient manner is irrelevant. And as I stated before if I don't need to aggregate but only want distinct values then I use the DISTINCT clause. I see no point in creating off the wall query solutions that are not going to self document in order to save a few fractions of a second. If it will save more than that then either the database is poorly designed or has been built with the wrong database engine. -- Rick Brandt, Microsoft Access MVP Email (as appropriate) to... RBrandt at Hunter dot com |
#48
|
|||
|
|||
What GROUP BY is not...
If the user is not expert and not familiar with editing SQL, have only used
the GUI for simple queries, then a click on the GROUP BY button, however amatuerish, will probably get them to the desired result quicker. If they are even less expert than that, learning what the GROUP BY button does will probably be beneficial sooner that learning what DISTINCT does, and how to apply it. "Michel Walsh" vanderghast@VirusAreFunnierThanSpam wrote in message ... The OP did asked what was the quickest, yes, but some answers, to stay polite, let call it that way, he got where about there was a whole world of difference between a GROUP BY and DISTINCT. I continue to claim both syntax are logically the same, and bring back-up about my claim (as MS SQL Server using the same plan of executions, and, recently, explanations about this plan, since some people seem to continue to think those are two different things). Few milliseconds, for small tables, but as someone else pointed it out, and I found the same order of magnitude with using one indexed field, with JET, the ratio of execution time is by an order of magnitude, so that while DISTINCT took, for me, on average, 1 sec, the GROUP BY took 11 sec. It is not ONLY by a few milliseconds. And since the message seems to have problem to pass, I am glad to repeat it: SQL is not about what are your intentions are, neither how to solve a problem but ABOUT A DESCRIPTION of what we want. DISTINCT and GROUP BY (without aggregate) are logically equivalent, and even more, GROUP BY WITH AGGREGATE is free lunch by comparison with the same statement WITHOUT AGGREGATE, since computing the aggregate is negligible in comparison with the sorting. If you use DISTINCT when you don't want aggregate, great, but you still use inefficient GROUP BY when you need aggregate. If you are glad and happy camper with that, I am glad for you, should I not? Vanderghast, Access MVP "Rick Brandt" wrote in message et... Michel Walsh wrote: To illustrate what GROUP BY is and is NOT, so things can be clear in mind. [snip] I don't see where you are going here. The OP was about what was quickest; Group By (without aggregating) or DISTINCT. You are now going on about the fact that Jet might not use the best plan for Group By queries. That is beside the point isn't it? If I need to aggregate on groups then I must use Group By and whether Jet does that in the absolute most efficient manner is irrelevant. And as I stated before if I don't need to aggregate but only want distinct values then I use the DISTINCT clause. I see no point in creating off the wall query solutions that are not going to self document in order to save a few fractions of a second. If it will save more than that then either the database is poorly designed or has been built with the wrong database engine. -- Rick Brandt, Microsoft Access MVP Email (as appropriate) to... RBrandt at Hunter dot com |
#50
|
|||
|
|||
What GROUP BY is not...
one question then :
Does the jet engine sort the selected data before grouping them? If I had to write the function my self, I would create two queries one for a small amount of data and the other for considerable amount of data. Because sorting the data would be useful to avoid make a useless loop on the list of data to display to check if it's displayed already or not, because it's the same as the previous record of the sorted data... "Michel Walsh" vanderghast@VirusAreFunnierThanSpam a écrit dans le message de news: ... The OP did asked what was the quickest, yes, but some answers, to stay polite, let call it that way, he got where about there was a whole world of difference between a GROUP BY and DISTINCT. I continue to claim both syntax are logically the same, and bring back-up about my claim (as MS SQL Server using the same plan of executions, and, recently, explanations about this plan, since some people seem to continue to think those are two different things). Few milliseconds, for small tables, but as someone else pointed it out, and I found the same order of magnitude with using one indexed field, with JET, the ratio of execution time is by an order of magnitude, so that while DISTINCT took, for me, on average, 1 sec, the GROUP BY took 11 sec. It is not ONLY by a few milliseconds. And since the message seems to have problem to pass, I am glad to repeat it: SQL is not about what are your intentions are, neither how to solve a problem but ABOUT A DESCRIPTION of what we want. DISTINCT and GROUP BY (without aggregate) are logically equivalent, and even more, GROUP BY WITH AGGREGATE is free lunch by comparison with the same statement WITHOUT AGGREGATE, since computing the aggregate is negligible in comparison with the sorting. If you use DISTINCT when you don't want aggregate, great, but you still use inefficient GROUP BY when you need aggregate. If you are glad and happy camper with that, I am glad for you, should I not? Vanderghast, Access MVP "Rick Brandt" wrote in message et... Michel Walsh wrote: To illustrate what GROUP BY is and is NOT, so things can be clear in mind. [snip] I don't see where you are going here. The OP was about what was quickest; Group By (without aggregating) or DISTINCT. You are now going on about the fact that Jet might not use the best plan for Group By queries. That is beside the point isn't it? If I need to aggregate on groups then I must use Group By and whether Jet does that in the absolute most efficient manner is irrelevant. And as I stated before if I don't need to aggregate but only want distinct values then I use the DISTINCT clause. I see no point in creating off the wall query solutions that are not going to self document in order to save a few fractions of a second. If it will save more than that then either the database is poorly designed or has been built with the wrong database engine. -- Rick Brandt, Microsoft Access MVP Email (as appropriate) to... RBrandt at Hunter dot com |
Thread Tools | |
Display Modes | |
|
|