A Microsoft Office (Excel, Word) forum. OfficeFrustration

If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

Go Back   Home » OfficeFrustration forum » Microsoft Excel » General Discussion
Site Map Home Register Authors List Search Today's Posts Mark Forums Read  

searching a large database with a long list of search terms



 
 
Thread Tools Display Modes
  #31  
Old January 8th, 2006, 12:13 PM posted to microsoft.public.excel.misc,microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: n/a
Default searching a large database with a long list of search terms

... I'm not sure if he (or you or Max) ensured that
there were no duplicates between the sheets.


Pete,

I followed the OP's response (below)
to David (the first responder) as the "state-of-events":

David asked:
.. The other 6 sheets, where the lookup is taking place, there are not
duplicate lookup values, ie aaa, bbb, ccc would only have a single entry
somewhere on the six other sheets and only one city associated with each
lookup value?


OP's reply to David:
Yes, these are unique.


Also, as clarified in my responses since:

.. I presumed what the OP would like to have
was to match & extract separate returns from each of the 6 sheets
for all the 3000 lines. Then he could assess further what he wants done.


My suggestion simply lines up all the returns from the 6 sheets
for each of the 3000 lines in 6 cols.

--
Rgds
Max
xl 97
---
Singapore, GMT+8
xdemechanik
http://savefile.com/projects/236895
--


  #32  
Old January 8th, 2006, 07:39 PM posted to microsoft.public.excel.misc,microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: n/a
Default searching a large database with a long list of search terms

Roger,

I still have my test files. I'll tinker around with this sometime Sunday
night.

The thing that surprised me the most was the file size. I knew it would be
big but did not think it would be 27 - 28 Mb big!

Biff

"Roger Govier" wrote in message
...
Hi Biff

Thank you for this mammoth effort, and for sharing the results with us.
They make fascinating reading.
Whilst for some while now I have tried to avoid Indirect functions when
there are lots of formulae and/or data involved, I always used to use
Vlookup, but more recently I have made much more use of INDEX(), MATCH().

I wondered whether, with this mass of data, there would be any significant
difference in calculation time if one used the format
=INDEX(Sheet2!A:B,MATCH(A2,Sheet1!A:A,0),2)
in place of
=VLOOKUP(A2,Sheet1!A:B,2,0)
throughout the formulae.
The formulae would be longer, and look more horrendous, but I wonder
whether there would be any speed difference.

If you had the time (and inclination) to carry out this test with the data
you already have set up, I would be most interested to see the results.

--
Regards

Roger Govier


"Biff" wrote in message
...
Here are the results of 3 tests:

Computer specs:

Pentium P4, 2.0 ghz, 256 Mb ram, WinXP (all service packs, all patches),
Excel 2002 (XP) (all service packs)
Other than the operating system, Excel is the only app running.

File configuration: (based on the OPs description)

7 sheets total, 1 summary, 6 data

Summary sheet(1): 3 columns x 3000 rows. Lookup values in column A,
A1:A3000
Data sheets(6): 2 columns x 65536 rows

Test 1 (based on the reply from Max)

File size (wo/formulas) - 26.6 Mb
File size (w/formulas) - 27.5 Mb

This formula was copied to 6 columns x 3000 rows:

=IF($A1="","",IF(ISNA(MATCH($A1,INDIRECT("'"&COLUM N(A1)&"'!A:A"),0)),"NO
ENTRY",VLOOKUP($A1,INDIRECT("'"&COLUMN(A1)&"'!A:B" ),2,0)))

I was unable to copy/drag in a single operation. When I tried, Excel
"froze-up". I had to use Task Manager to regain control. Tried twice and
Excel "froze" both times. I had to drag copy in increments of ~200 rows
at a time. I didn't time this but to copy to all 3000 rows took at least
an hour. (calculation was on automatic) After all formulas were copied:

Calc time (data sheets unsorted) ~6:45 (m:ss)
Calc time (data sheets sorted ascending) ~3:30 (m:ss)

Test 2

Deleted all the above formulas, reset the used range.

Used this array formula copied to 1 column x 3000 rows:

=IF(SUMPRODUCT(COUNTIF(INDIRECT("'"&WSlist&"'!A:A" ),A1)),VLOOKUP(A1,INDIRECT("'"&INDEX(WSlist,MATCH( TRUE,COUNTIF(INDIRECT("'"&WSlist&"'!A:A"),A1)0,0) )&"'!A:B"),2,0),"")

After all formulas were copied:

File size (w/formulas) - 27.2 Mb
Calc time (data sheets sorted ascending) ~3:35 (m:ss)

Test 3

Deleted all the above formulas. Decided to try a monster nested IF
formula but I hit the nested function limit so I split the formula into 2
cells. I cell formula did the lookup on sheets 2,3,4. The other cell
formula did the lookup on sheets 5,6,7.

=IF(NOT(ISERROR(VLOOKUP(A1,Sheet2!A:B,2,0))),VLOOK UP(A1,Sheet2!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sh eet3!A:B,2,0))),VLOOKUP(A1,Sheet3!A:B,2,0),IF(NOT( ISERROR(VLOOKUP(A1,Sheet4!A:B,2,0))),VLOOKUP(A1,Sh eet4!A:B,2,0),"")))

=IF(D1="",IF(NOT(ISERROR(VLOOKUP(A1,Sheet5!A:B,2,0 ))),VLOOKUP(A1,Sheet5!A:B,2,0),IF(NOT(ISERROR(VLOO KUP(A1,Sheet6!A:B,2,0))),VLOOKUP(A1,Sheet6!A:B,2,0 ),IF(NOT(ISERROR(VLOOKUP(A1,Sheet7!A:B,2,0))),VLOO KUP(A1,Sheet7!A:B,2,0),"No
Entry"))),"")

After all formulas were copied:

File size (w/formulas) - 28.2 Mb
Calc time (data sheets sorted ascending) ~1 second

I did not test using unsorted data sheets in tests 2 and 3.

Conclusion:

Sorting the data can speed up calc time significantly in "large" files.
The use of 1000's of volatile functions should be avoided at all costs!
Monster formulas aren't all bad!

Comments/suggestions welcome!

Biff

"Biff" wrote in message
...
If anyone is still following this thread I'll do some tests and post the
results. Stay tuned!

Biff

"Pete_UK" wrote in message
ups.com...
Max,

The OP didn't get back to me when I asked about sorting the reference
data beforehand. Can you sort your random data in the 6 sheets then
re-apply your formula to take advantage of this to see if there is a
big increase in speed? In theory, the binary search technique applied
if the data is sorted should make a massive difference to 6 * 65536
entries.

Pete









  #33  
Old January 9th, 2006, 03:14 AM posted to microsoft.public.excel.misc
external usenet poster
 
Posts: n/a
Default searching a large database with a long list of search terms

I went skiing today so was only able to get to this tonight. Thanks
guys, this is good info. Most of what you're talking about is way
beyond me though

You know what I should do since this is a little much for me it
seems... I should just write up exactly what I'm after and then let
folks give me a price to do it (probably with a macro). Is that
allowed on this group? I really don't have access to "excel"
programmers locally and if I did they would propabably charge a great
deal of money per hour where some sharp on this list could do it for
extra cash. I'm guessing that it probably wouldn't take more than a
couple of hours, maybe even an hour, for someone familiar with macros.

Good tip about Access but I don't have that program and if I did would
be even more lost than I am with Excel

Joe

  #34  
Old January 10th, 2006, 05:58 AM posted to microsoft.public.excel.misc,microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: n/a
Default searching a large database with a long list of search terms

Hmmm........

This is turning into a "major" undertaking!

Although the calc times I reported in my other post are correct and
"accurate", I failed to disclose that the calc times do not reflect the fact
that no changes were made to any of the lookup table data. This means that
the lookup formulas did not have to recalculate (excluding those with
volatile functions) with any subsequent workbook calculations. I just
assumed that at this point the formulas and tables would remain "static" and
the calc times I reported were for any new data or formulas entered that
were not associated with the lookups!

That was a major gaff on my part!

So, that ~1 second calc time for the monster IF/VLOOKUP only pertains to
calc time for new data/formula entry.

To thoroughly test these operations/concepts in this context takes a lot of
time and patience. I think that it has already been demonstrated that under
these parameters Excel is going to be "very" slow. For a non-static,
actively used spreadsheet to take upwards of 3 to 4 minutes (at the minimum
end) to calculate, is to me, unacceptable.

I guess this would be "easier" to do if one had benchmarking software!

And with those observations I think I'll end my "tests".

Biff

"Biff" wrote in message
...
Roger,

I still have my test files. I'll tinker around with this sometime Sunday
night.

The thing that surprised me the most was the file size. I knew it would be
big but did not think it would be 27 - 28 Mb big!

Biff

"Roger Govier" wrote in message
...
Hi Biff

Thank you for this mammoth effort, and for sharing the results with us.
They make fascinating reading.
Whilst for some while now I have tried to avoid Indirect functions when
there are lots of formulae and/or data involved, I always used to use
Vlookup, but more recently I have made much more use of INDEX(), MATCH().

I wondered whether, with this mass of data, there would be any
significant difference in calculation time if one used the format
=INDEX(Sheet2!A:B,MATCH(A2,Sheet1!A:A,0),2)
in place of
=VLOOKUP(A2,Sheet1!A:B,2,0)
throughout the formulae.
The formulae would be longer, and look more horrendous, but I wonder
whether there would be any speed difference.

If you had the time (and inclination) to carry out this test with the
data you already have set up, I would be most interested to see the
results.

--
Regards

Roger Govier


"Biff" wrote in message
...
Here are the results of 3 tests:

Computer specs:

Pentium P4, 2.0 ghz, 256 Mb ram, WinXP (all service packs, all patches),
Excel 2002 (XP) (all service packs)
Other than the operating system, Excel is the only app running.

File configuration: (based on the OPs description)

7 sheets total, 1 summary, 6 data

Summary sheet(1): 3 columns x 3000 rows. Lookup values in column A,
A1:A3000
Data sheets(6): 2 columns x 65536 rows

Test 1 (based on the reply from Max)

File size (wo/formulas) - 26.6 Mb
File size (w/formulas) - 27.5 Mb

This formula was copied to 6 columns x 3000 rows:

=IF($A1="","",IF(ISNA(MATCH($A1,INDIRECT("'"&COLUM N(A1)&"'!A:A"),0)),"NO
ENTRY",VLOOKUP($A1,INDIRECT("'"&COLUMN(A1)&"'!A:B" ),2,0)))

I was unable to copy/drag in a single operation. When I tried, Excel
"froze-up". I had to use Task Manager to regain control. Tried twice and
Excel "froze" both times. I had to drag copy in increments of ~200 rows
at a time. I didn't time this but to copy to all 3000 rows took at least
an hour. (calculation was on automatic) After all formulas were copied:

Calc time (data sheets unsorted) ~6:45 (m:ss)
Calc time (data sheets sorted ascending) ~3:30 (m:ss)

Test 2

Deleted all the above formulas, reset the used range.

Used this array formula copied to 1 column x 3000 rows:

=IF(SUMPRODUCT(COUNTIF(INDIRECT("'"&WSlist&"'!A:A" ),A1)),VLOOKUP(A1,INDIRECT("'"&INDEX(WSlist,MATCH( TRUE,COUNTIF(INDIRECT("'"&WSlist&"'!A:A"),A1)0,0) )&"'!A:B"),2,0),"")

After all formulas were copied:

File size (w/formulas) - 27.2 Mb
Calc time (data sheets sorted ascending) ~3:35 (m:ss)

Test 3

Deleted all the above formulas. Decided to try a monster nested IF
formula but I hit the nested function limit so I split the formula into
2 cells. I cell formula did the lookup on sheets 2,3,4. The other cell
formula did the lookup on sheets 5,6,7.

=IF(NOT(ISERROR(VLOOKUP(A1,Sheet2!A:B,2,0))),VLOOK UP(A1,Sheet2!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sh eet3!A:B,2,0))),VLOOKUP(A1,Sheet3!A:B,2,0),IF(NOT( ISERROR(VLOOKUP(A1,Sheet4!A:B,2,0))),VLOOKUP(A1,Sh eet4!A:B,2,0),"")))

=IF(D1="",IF(NOT(ISERROR(VLOOKUP(A1,Sheet5!A:B,2,0 ))),VLOOKUP(A1,Sheet5!A:B,2,0),IF(NOT(ISERROR(VLOO KUP(A1,Sheet6!A:B,2,0))),VLOOKUP(A1,Sheet6!A:B,2,0 ),IF(NOT(ISERROR(VLOOKUP(A1,Sheet7!A:B,2,0))),VLOO KUP(A1,Sheet7!A:B,2,0),"No
Entry"))),"")

After all formulas were copied:

File size (w/formulas) - 28.2 Mb
Calc time (data sheets sorted ascending) ~1 second

I did not test using unsorted data sheets in tests 2 and 3.

Conclusion:

Sorting the data can speed up calc time significantly in "large" files.
The use of 1000's of volatile functions should be avoided at all costs!
Monster formulas aren't all bad!

Comments/suggestions welcome!

Biff

"Biff" wrote in message
...
If anyone is still following this thread I'll do some tests and post
the results. Stay tuned!

Biff

"Pete_UK" wrote in message
ups.com...
Max,

The OP didn't get back to me when I asked about sorting the reference
data beforehand. Can you sort your random data in the 6 sheets then
re-apply your formula to take advantage of this to see if there is a
big increase in speed? In theory, the binary search technique applied
if the data is sorted should make a massive difference to 6 * 65536
entries.

Pete











  #35  
Old January 10th, 2006, 06:23 AM posted to microsoft.public.excel.misc,microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: n/a
Default searching a large database with a long list of search terms

Hi Biff

Thank you for your efforts and observations.
I must admit I was very surprised at the ~ 1 second time, especially for
such a large file size. Thankfully I have never had to deal with
anything of that size but one client's data file does grow to around 9
MB by the end of each year, and that is where I am always looking to see
if I can make changes to improve the existing performance, which does at
times border on the limits of patience!!!

--
Regards

Roger Govier


"Biff" wrote in message
...
Hmmm........

This is turning into a "major" undertaking!

Although the calc times I reported in my other post are correct and
"accurate", I failed to disclose that the calc times do not reflect
the fact that no changes were made to any of the lookup table data.
This means that the lookup formulas did not have to recalculate
(excluding those with volatile functions) with any subsequent workbook
calculations. I just assumed that at this point the formulas and
tables would remain "static" and the calc times I reported were for
any new data or formulas entered that were not associated with the
lookups!

That was a major gaff on my part!

So, that ~1 second calc time for the monster IF/VLOOKUP only pertains
to calc time for new data/formula entry.

To thoroughly test these operations/concepts in this context takes a
lot of time and patience. I think that it has already been
demonstrated that under these parameters Excel is going to be "very"
slow. For a non-static, actively used spreadsheet to take upwards of
3 to 4 minutes (at the minimum end) to calculate, is to me,
unacceptable.

I guess this would be "easier" to do if one had benchmarking software!

And with those observations I think I'll end my "tests".

Biff

"Biff" wrote in message
...
Roger,

I still have my test files. I'll tinker around with this sometime
Sunday night.

The thing that surprised me the most was the file size. I knew it
would be big but did not think it would be 27 - 28 Mb big!

Biff

"Roger Govier" wrote in message
...
Hi Biff

Thank you for this mammoth effort, and for sharing the results with
us. They make fascinating reading.
Whilst for some while now I have tried to avoid Indirect functions
when there are lots of formulae and/or data involved, I always used
to use Vlookup, but more recently I have made much more use of
INDEX(), MATCH().

I wondered whether, with this mass of data, there would be any
significant difference in calculation time if one used the format
=INDEX(Sheet2!A:B,MATCH(A2,Sheet1!A:A,0),2)
in place of
=VLOOKUP(A2,Sheet1!A:B,2,0)
throughout the formulae.
The formulae would be longer, and look more horrendous, but I wonder
whether there would be any speed difference.

If you had the time (and inclination) to carry out this test with
the data you already have set up, I would be most interested to see
the results.

--
Regards

Roger Govier


"Biff" wrote in message
...
Here are the results of 3 tests:

Computer specs:

Pentium P4, 2.0 ghz, 256 Mb ram, WinXP (all service packs, all
patches), Excel 2002 (XP) (all service packs)
Other than the operating system, Excel is the only app running.

File configuration: (based on the OPs description)

7 sheets total, 1 summary, 6 data

Summary sheet(1): 3 columns x 3000 rows. Lookup values in column A,
A1:A3000
Data sheets(6): 2 columns x 65536 rows

Test 1 (based on the reply from Max)

File size (wo/formulas) - 26.6 Mb
File size (w/formulas) - 27.5 Mb

This formula was copied to 6 columns x 3000 rows:

=IF($A1="","",IF(ISNA(MATCH($A1,INDIRECT("'"&COLUM N(A1)&"'!A:A"),0)),"NO
ENTRY",VLOOKUP($A1,INDIRECT("'"&COLUMN(A1)&"'!A:B" ),2,0)))

I was unable to copy/drag in a single operation. When I tried,
Excel "froze-up". I had to use Task Manager to regain control.
Tried twice and Excel "froze" both times. I had to drag copy in
increments of ~200 rows at a time. I didn't time this but to copy
to all 3000 rows took at least an hour. (calculation was on
automatic) After all formulas were copied:

Calc time (data sheets unsorted) ~6:45 (m:ss)
Calc time (data sheets sorted ascending) ~3:30 (m:ss)

Test 2

Deleted all the above formulas, reset the used range.

Used this array formula copied to 1 column x 3000 rows:

=IF(SUMPRODUCT(COUNTIF(INDIRECT("'"&WSlist&"'!A:A" ),A1)),VLOOKUP(A1,INDIRECT("'"&INDEX(WSlist,MATCH( TRUE,COUNTIF(INDIRECT("'"&WSlist&"'!A:A"),A1)0,0) )&"'!A:B"),2,0),"")

After all formulas were copied:

File size (w/formulas) - 27.2 Mb
Calc time (data sheets sorted ascending) ~3:35 (m:ss)

Test 3

Deleted all the above formulas. Decided to try a monster nested IF
formula but I hit the nested function limit so I split the formula
into 2 cells. I cell formula did the lookup on sheets 2,3,4. The
other cell formula did the lookup on sheets 5,6,7.

=IF(NOT(ISERROR(VLOOKUP(A1,Sheet2!A:B,2,0))),VLOOK UP(A1,Sheet2!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sh eet3!A:B,2,0))),VLOOKUP(A1,Sheet3!A:B,2,0),IF(NOT( ISERROR(VLOOKUP(A1,Sheet4!A:B,2,0))),VLOOKUP(A1,Sh eet4!A:B,2,0),"")))

=IF(D1="",IF(NOT(ISERROR(VLOOKUP(A1,Sheet5!A:B,2,0 ))),VLOOKUP(A1,Sheet5!A:B,2,0),IF(NOT(ISERROR(VLOO KUP(A1,Sheet6!A:B,2,0))),VLOOKUP(A1,Sheet6!A:B,2,0 ),IF(NOT(ISERROR(VLOOKUP(A1,Sheet7!A:B,2,0))),VLOO KUP(A1,Sheet7!A:B,2,0),"No
Entry"))),"")

After all formulas were copied:

File size (w/formulas) - 28.2 Mb
Calc time (data sheets sorted ascending) ~1 second

I did not test using unsorted data sheets in tests 2 and 3.

Conclusion:

Sorting the data can speed up calc time significantly in "large"
files. The use of 1000's of volatile functions should be avoided at
all costs! Monster formulas aren't all bad!

Comments/suggestions welcome!

Biff

"Biff" wrote in message
...
If anyone is still following this thread I'll do some tests and
post the results. Stay tuned!

Biff

"Pete_UK" wrote in message
ups.com...
Max,

The OP didn't get back to me when I asked about sorting the
reference
data beforehand. Can you sort your random data in the 6 sheets
then
re-apply your formula to take advantage of this to see if there
is a
big increase in speed? In theory, the binary search technique
applied
if the data is sorted should make a massive difference to 6 *
65536
entries.

Pete













 




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Format cells with a formula (7 conditions). danindenver General Discussion 3 January 2nd, 2006 02:40 PM
Tasks, Assignments and Projects Database Structure. Bernard Piette Database Design 2 December 21st, 2005 01:30 PM
HELP! CANNOT CONNECT TO SQL SERVER Glint General Discussion 19 May 9th, 2005 02:47 PM
SUGGESTION: Shape search enhancements tlonski Visio 1 November 27th, 2004 09:39 PM
synchronizing form and list box Deb Smith Using Forms 8 June 21st, 2004 08:15 PM


All times are GMT +1. The time now is 09:22 PM.


Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 OfficeFrustration.
The comments are property of their posters.