A Microsoft Office (Excel, Word) forum. OfficeFrustration

If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

Go Back   Home » OfficeFrustration forum » Microsoft Excel » General Discussion
Site Map Home Register Authors List Search Today's Posts Mark Forums Read  

searching a large database with a long list of search terms



 
 
Thread Tools Display Modes
  #21  
Old January 7th, 2006, 04:20 AM posted to microsoft.public.excel.misc,microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: n/a
Default searching a large database with a long list of search terms

See my test results.

I probably would've gone with what I did in my test 2.

As you can see, the calc times were pretty much the same (what I would've
done versus what you did). I don't like to use a lot of helper cells if I
don't have to. The monster formula was the hands-down winner, but who
"likes" monster formulas? They tend to scare people away!

Biff

"Max" wrote in message
...
Biff,

What would have been your suggestion to the OP,
had not the "volume" got in the way ?
--
Rgds
Max
xl 97
---
Singapore, GMT+8
xdemechanik
http://savefile.com/projects/236895
--




  #22  
Old January 7th, 2006, 04:37 AM posted to microsoft.public.excel.misc,microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: n/a
Default searching a large database with a long list of search terms

Comments/suggestions welcome!

My compliments on your effort, Biff. Amazing.

Sorting the data can speed up calc time significantly in "large" files.
The use of 1000's of volatile functions should be avoided at all costs!
Monster formulas aren't all bad!


Looks like what I suggested should hence be trashed permanently g
Better to have 6 similar formulas pointing to each of the 6 sheets

I was unable to copy/drag in a single operation.


FWIW, my m/c was able to fill 6 C x 3000 R, over here.
(no freezing)

Btw, think there was a slight interp difference in that I presumed what the
OP would like to have was to match & extract separate returns from each of
the 6 sheets for all the 3000 lines. Then he could assess further what he
wants done.
I didn't assume any "precedence" order in the 6 data sheets.

--
Rgds
Max
xl 97
---
Singapore, GMT+8
xdemechanik
http://savefile.com/projects/236895
--


  #23  
Old January 7th, 2006, 11:32 AM posted to microsoft.public.excel.misc,microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: n/a
Default searching a large database with a long list of search terms

I probably would've gone with what I did in my test 2.

... the super complex array g ?
filled down only in 1 col x 3000 rows
(and which also contains the same volatile INDIRECT ?)

2 observations:

1. The returns are different, ref explanation in my response to you earlier
(pasted below), because of different interps of what the OP wanted. My
suggestion simply lines up all the returns from the 6 sheets for each of the
3000 lines in 6 cols. Yours return results in 1 col, with an implicit
precedence order assumed in the 6 data sheets. Nothing wrong there, just
different interps.

Btw, think there was a slight interp difference in that I presumed what

the
OP would like to have was to match & extract separate returns from each of
the 6 sheets for all the 3000 lines. Then he could assess further what he
wants done.
I didn't assume any "precedence" order in the 6 data sheets.


As you can see, the calc times were pretty much the same ..


2. Wondering whether the calc times would be any different if say, your
array formula were to be modified to return similarly as mine the results in
6 cols ?

Just some thoughts, Biff g
--
Rgds
Max
xl 97
---
Singapore, GMT+8
xdemechanik
http://savefile.com/projects/236895
--


  #24  
Old January 7th, 2006, 11:40 AM posted to microsoft.public.excel.misc,microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: n/a
Default searching a large database with a long list of search terms

Better to have 6 similar formulas pointing to each of the 6 sheets

What's meant is simply ..

In D1:
=IF($A1="","",IF(ISNA(MATCH($A1,'1'!$A:$A,0)),"NO
ENTRY",VLOOKUP($A1,'1'!$A:$B,2,0)))

In E1:
=IF($A1="","",IF(ISNA(MATCH($A1,'2'!$A:$A,0)),"NO
ENTRY",VLOOKUP($A1,'2'!$A:$B,2,0)))

and so on in F1 till I1 (the same formula essentially but with the
sheetnames changed accordingly to '3', '4', '5', and '6')

D1:I1 is copied down 3000 rows
--
Rgds
Max
xl 97
---
Singapore, GMT+8
xdemechanik
http://savefile.com/projects/236895
--


  #25  
Old January 7th, 2006, 11:56 AM posted to microsoft.public.excel.misc,microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: n/a
Default searching a large database with a long list of search terms

In self-retrospect, I should have suggested the foregoing simpler formulas
(w/o the INDIRECT). It only takes less than a minute to manually edit the
other 5 sheetnames ! urgh ..
--
Rgds
Max
xl 97
---
Singapore, GMT+8
xdemechanik
http://savefile.com/projects/236895
--


  #26  
Old January 7th, 2006, 01:28 PM posted to microsoft.public.excel.misc,microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: n/a
Default searching a large database with a long list of search terms

Biff,

my congratulations also on a mammoth effort!

My interpretation of the OP's original request was that he wanted one
result - I had imagined that he had one massive lookup table (of 393000
rows) which had to be split into 6 because of Excel's row limit of 64k.
If this were the case, I'm not sure if he (or you or Max) ensured that
there were no duplicates between the sheets.

Very interesting - well done!

Pete

  #27  
Old January 7th, 2006, 06:32 PM posted to microsoft.public.excel.misc,microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: n/a
Default searching a large database with a long list of search terms

1. The returns are different, ref explanation in my response to you earlier
(pasted below), because of different interps of what the OP wanted. My
suggestion simply lines up all the returns from the 6 sheets for each of
the
3000 lines in 6 cols. Yours return results in 1 col, with an implicit
precedence order assumed in the 6 data sheets.


I'm not following you on this????

My interp is that on the summary are lookup values that may or may not be on
any one of 6 sheets. Basically, it's just a lookup across multiple sheets. I
don't understand your use of "precedence" ???????

(and which also contains the same volatile INDIRECT ?)


See my "conclusion" !

The problem with this is convincing people (maybe even ones' self) that the
use of a monster nested IF/VLOOKUP is the best way to go!

Biff

"Max" wrote in message
...
I probably would've gone with what I did in my test 2.


.. the super complex array g ?
filled down only in 1 col x 3000 rows
(and which also contains the same volatile INDIRECT ?)

2 observations:

1. The returns are different, ref explanation in my response to you
earlier
(pasted below), because of different interps of what the OP wanted. My
suggestion simply lines up all the returns from the 6 sheets for each of
the
3000 lines in 6 cols. Yours return results in 1 col, with an implicit
precedence order assumed in the 6 data sheets. Nothing wrong there, just
different interps.

Btw, think there was a slight interp difference in that I presumed what

the
OP would like to have was to match & extract separate returns from each
of
the 6 sheets for all the 3000 lines. Then he could assess further what
he
wants done.
I didn't assume any "precedence" order in the 6 data sheets.


As you can see, the calc times were pretty much the same ..


2. Wondering whether the calc times would be any different if say, your
array formula were to be modified to return similarly as mine the results
in
6 cols ?

Just some thoughts, Biff g
--
Rgds
Max
xl 97
---
Singapore, GMT+8
xdemechanik
http://savefile.com/projects/236895
--




  #28  
Old January 7th, 2006, 10:53 PM posted to microsoft.public.excel.misc,microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: n/a
Default searching a large database with a long list of search terms

Hi Biff

Thank you for this mammoth effort, and for sharing the results with us.
They make fascinating reading.
Whilst for some while now I have tried to avoid Indirect functions when
there are lots of formulae and/or data involved, I always used to use
Vlookup, but more recently I have made much more use of INDEX(),
MATCH().

I wondered whether, with this mass of data, there would be any
significant difference in calculation time if one used the format
=INDEX(Sheet2!A:B,MATCH(A2,Sheet1!A:A,0),2)
in place of
=VLOOKUP(A2,Sheet1!A:B,2,0)
throughout the formulae.
The formulae would be longer, and look more horrendous, but I wonder
whether there would be any speed difference.

If you had the time (and inclination) to carry out this test with the
data you already have set up, I would be most interested to see the
results.

--
Regards

Roger Govier


"Biff" wrote in message
...
Here are the results of 3 tests:

Computer specs:

Pentium P4, 2.0 ghz, 256 Mb ram, WinXP (all service packs, all
patches), Excel 2002 (XP) (all service packs)
Other than the operating system, Excel is the only app running.

File configuration: (based on the OPs description)

7 sheets total, 1 summary, 6 data

Summary sheet(1): 3 columns x 3000 rows. Lookup values in column A,
A1:A3000
Data sheets(6): 2 columns x 65536 rows

Test 1 (based on the reply from Max)

File size (wo/formulas) - 26.6 Mb
File size (w/formulas) - 27.5 Mb

This formula was copied to 6 columns x 3000 rows:

=IF($A1="","",IF(ISNA(MATCH($A1,INDIRECT("'"&COLUM N(A1)&"'!A:A"),0)),"NO
ENTRY",VLOOKUP($A1,INDIRECT("'"&COLUMN(A1)&"'!A:B" ),2,0)))

I was unable to copy/drag in a single operation. When I tried, Excel
"froze-up". I had to use Task Manager to regain control. Tried twice
and Excel "froze" both times. I had to drag copy in increments of ~200
rows at a time. I didn't time this but to copy to all 3000 rows took
at least an hour. (calculation was on automatic) After all formulas
were copied:

Calc time (data sheets unsorted) ~6:45 (m:ss)
Calc time (data sheets sorted ascending) ~3:30 (m:ss)

Test 2

Deleted all the above formulas, reset the used range.

Used this array formula copied to 1 column x 3000 rows:

=IF(SUMPRODUCT(COUNTIF(INDIRECT("'"&WSlist&"'!A:A" ),A1)),VLOOKUP(A1,INDIRECT("'"&INDEX(WSlist,MATCH( TRUE,COUNTIF(INDIRECT("'"&WSlist&"'!A:A"),A1)0,0) )&"'!A:B"),2,0),"")

After all formulas were copied:

File size (w/formulas) - 27.2 Mb
Calc time (data sheets sorted ascending) ~3:35 (m:ss)

Test 3

Deleted all the above formulas. Decided to try a monster nested IF
formula but I hit the nested function limit so I split the formula
into 2 cells. I cell formula did the lookup on sheets 2,3,4. The other
cell formula did the lookup on sheets 5,6,7.

=IF(NOT(ISERROR(VLOOKUP(A1,Sheet2!A:B,2,0))),VLOOK UP(A1,Sheet2!A:B,2,0),IF(NOT(ISERROR(VLOOKUP(A1,Sh eet3!A:B,2,0))),VLOOKUP(A1,Sheet3!A:B,2,0),IF(NOT( ISERROR(VLOOKUP(A1,Sheet4!A:B,2,0))),VLOOKUP(A1,Sh eet4!A:B,2,0),"")))

=IF(D1="",IF(NOT(ISERROR(VLOOKUP(A1,Sheet5!A:B,2,0 ))),VLOOKUP(A1,Sheet5!A:B,2,0),IF(NOT(ISERROR(VLOO KUP(A1,Sheet6!A:B,2,0))),VLOOKUP(A1,Sheet6!A:B,2,0 ),IF(NOT(ISERROR(VLOOKUP(A1,Sheet7!A:B,2,0))),VLOO KUP(A1,Sheet7!A:B,2,0),"No
Entry"))),"")

After all formulas were copied:

File size (w/formulas) - 28.2 Mb
Calc time (data sheets sorted ascending) ~1 second

I did not test using unsorted data sheets in tests 2 and 3.

Conclusion:

Sorting the data can speed up calc time significantly in "large"
files. The use of 1000's of volatile functions should be avoided at
all costs! Monster formulas aren't all bad!

Comments/suggestions welcome!

Biff

"Biff" wrote in message
...
If anyone is still following this thread I'll do some tests and post
the results. Stay tuned!

Biff

"Pete_UK" wrote in message
ups.com...
Max,

The OP didn't get back to me when I asked about sorting the
reference
data beforehand. Can you sort your random data in the 6 sheets then
re-apply your formula to take advantage of this to see if there is a
big increase in speed? In theory, the binary search technique
applied
if the data is sorted should make a massive difference to 6 * 65536
entries.

Pete







  #29  
Old January 7th, 2006, 11:31 PM posted to microsoft.public.excel.misc,microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: n/a
Default searching a large database with a long list of search terms

My interp is that on the summary are lookup values
that may or may not be on any one of 6 sheets.
Basically, it's just a lookup across multiple sheets. I
don't understand your use of "precedence" ???????


My presumption was that there could be multiple "city" returns for the same
lookup value in col A within the 6 sheets. And that the OP might want to
see all of it before deciding next steps.

An illustration ..
For eg: for "aaa", there could be the data
for "aaa" in sheets: 1,2,3,4 such as:

aaa new york (in sheet: 1)
aaa chicago (in sheet: 2)
aaa miami (in sheet: 3)
aaa houston (in sheet: 4)
[ No "aaa" in sheets 5 & 6 (say) ]

Using nested IF(ISNA(VLOOKUP1),IF(ISNA(VLOOKUP2), ... ), or, your array
formula would return only the "1st" matching value, depending on how the
nested "IF(ISNA(VLOOKUP.." is structured (i.e. the "precedence" order: Check
sheet: 1 first, then check sheet; 2, then sheet: 3, and so on). Or, in your
array, depending on the order that the sheets are listed in WSList. If I
list: 1 as the 1st sheet (at the top in WSList), it returns: new york.
Change the top to: 2, it'll return: chicago. "3" will return: miami. And so
on.
(Btw, it was a nice array, Biff.)

Hope the above clarifies a little better what I meant by "precedence".
Just slightly different interps on the OP's needs, nothing wrong either way.
--
Rgds
Max
xl 97
---
Singapore, GMT+8
xdemechanik
http://savefile.com/projects/236895
--


  #30  
Old January 8th, 2006, 02:16 AM posted to microsoft.public.excel.misc,microsoft.public.excel.programming,microsoft.public.excel
external usenet poster
 
Posts: n/a
Default searching a large database with a long list of search terms

Hope the above clarifies a little better what I meant by "precedence".

OK, now I see.

Biff

"Max" wrote in message
...
My interp is that on the summary are lookup values
that may or may not be on any one of 6 sheets.
Basically, it's just a lookup across multiple sheets. I
don't understand your use of "precedence" ???????


My presumption was that there could be multiple "city" returns for the
same
lookup value in col A within the 6 sheets. And that the OP might want to
see all of it before deciding next steps.

An illustration ..
For eg: for "aaa", there could be the data
for "aaa" in sheets: 1,2,3,4 such as:

aaa new york (in sheet: 1)
aaa chicago (in sheet: 2)
aaa miami (in sheet: 3)
aaa houston (in sheet: 4)
[ No "aaa" in sheets 5 & 6 (say) ]

Using nested IF(ISNA(VLOOKUP1),IF(ISNA(VLOOKUP2), ... ), or, your array
formula would return only the "1st" matching value, depending on how the
nested "IF(ISNA(VLOOKUP.." is structured (i.e. the "precedence" order:
Check
sheet: 1 first, then check sheet; 2, then sheet: 3, and so on). Or, in
your
array, depending on the order that the sheets are listed in WSList. If I
list: 1 as the 1st sheet (at the top in WSList), it returns: new york.
Change the top to: 2, it'll return: chicago. "3" will return: miami. And
so
on.
(Btw, it was a nice array, Biff.)

Hope the above clarifies a little better what I meant by "precedence".
Just slightly different interps on the OP's needs, nothing wrong either
way.
--
Rgds
Max
xl 97
---
Singapore, GMT+8
xdemechanik
http://savefile.com/projects/236895
--




 




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Format cells with a formula (7 conditions). danindenver General Discussion 3 January 2nd, 2006 02:40 PM
Tasks, Assignments and Projects Database Structure. Bernard Piette Database Design 2 December 21st, 2005 01:30 PM
HELP! CANNOT CONNECT TO SQL SERVER Glint General Discussion 19 May 9th, 2005 02:47 PM
SUGGESTION: Shape search enhancements tlonski Visio 1 November 27th, 2004 09:39 PM
synchronizing form and list box Deb Smith Using Forms 8 June 21st, 2004 08:15 PM


All times are GMT +1. The time now is 07:42 AM.


Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 OfficeFrustration.
The comments are property of their posters.