If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|
Thread Tools | Display Modes |
#41
|
|||
|
|||
Need formula to extract a numeric value from a free-format tex
Thanks. I'll need to take some time to decipher this. But first, can you
explain: - What's the function of the double minus sign in the formula? - Why does the date separator come into play here at all? The only thing I need to do is to look for a 7-digit number in a text field where we know nothing at all about the rest of the contents. Why should the operation of a formula be dependent upon extraneous information, such as how a date is formatted? The rest of the text string needs to be considered as a fully arbitrary series of characters; in other words, the function needs to work entirely irrespective of the remainder of the field. Thanks for your help. "Rick Rothstein" wrote: I forgot to include the note regarding array-entered formulas. Here is my message again, but with the note... In case you want to consider it, here is a non-RegEx array-entered** formula that will do what you want... =MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),ROW(1:30),7))* ISERR(SEARCH("e",MID(F5,ROW(1:30),7)))*ISERR(FIND( "/",MID(F5,ROW (1:30),7))),ROW(1:30))),7) **Commit this formula using Ctrl+Shift+Enter, not just Enter by itself Note though, that this formula is dependent on what your default date separator is. Mine is the slash character (/) and that is what I used in the FIND function call... if your default date separator is a different symbol, then just replace my slash with that character. -- Rick (MVP - Excel) "Rick Rothstein" wrote in message ... In case you want to consider it, here is a non-RegEx array-entered** formula that will do what you want... =MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),ROW(1:30),7))* ISERR(SEARCH("e",MID(F5,ROW(1:30),7)))*ISERR(FIND( "/",MID(F5,ROW (1:30),7))),ROW(1:30))),7) Note though, that this formula is dependent on what your default date separator is. Mine is the slash character (/) and that is what I used in the FIND function call... if your default date separator is a different symbol, then just replace my slash with that character. -- Rick (MVP - Excel) "Eric_NY" wrote in message ... I just read your message from last Friday. The text is free format. Users can enter it in whatever format they want. The 7-digit number is somewhere within the text. I've glanced through it and in the samples I've seen, there's no consistency in what appears before or after the 7-digit number. I used the regex solution that Ron Rosenfeld suggested, and adjusted the regular expression by removing the "\b" before and after the "\d{7}". "Rick Rothstein" wrote: I would still be interested in the answer to my questions... "Is the number always precede by a space when the number is interior to the text? What about that "dot" after it... is there always a dot following it? If the dot might not always be there, is there always a space after the number when it is interior to the text?" -- Rick (MVP - Excel) "Eric_NY" wrote in message ... "However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. " I can't find that option. Where should I look for it? Which menu? "Ron Rosenfeld" wrote: On Thu, 16 Jul 2009 12:04:04 -0700, Eric_NY wrote: Ron - Thanks. I'm going to try the Longre morefunc add-in. If I use the add-in functions and then send the sheet to someone by email, will the formulas still work? Or does the recipient also have to install the add-in separately on his own machine? Thanks for your help. Ordinarily no. They would have to install it themselves. However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. If you do that, it will then be usable by the recipient with no particular effort on his part. One caution concerning the add-in -- it will not work on strings that are longer than 255 characters. This, apparently is an .xll limitation, and there is no good way around it within the add-in. By the way, if you should use the UDF approach, the UDF should be embedded within the workbook, so its use should be transparent to your user. But there are a lot of other useful functions in morefunc. --ron |
#42
|
|||
|
|||
Need formula to extract a numeric value from a free-format tex
First off, before answering your questions, here is a revised formula (the
previous one could not detect an 8 or more digit number in front of the 7 digit number you wanted to find... =MID(A1,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(A1," ","x"),ROW(1:99),7))* ISERR(SEARCH("e",MID(A1,ROW(1:99),7)))*ISERR(FIND( "/",MID(A1,ROW (1:99),7)))*ISERROR(--MID(A1,ROW(1:99),8))*IF(ROW(1:99)1,ISERROR (--MID(A1,ROW(1:99),ROW(1:99)-1))),ROW(1:99))),7) I also changed the limit to text strings up to 99 characters long. If you could have longer text strings for this formula to process, then change *all* the 99's in my formula to a number that is equal to the largest total number of characters that your text could be. I also changed the formulas cell reference from the arbitrary F5 I used in my previous submission to the more standard A1 cell reference that is used when the actual cell reference is unknown. Okay, first off, the double minus sign is a means of converting the text representation of a number into an actual numeric value. Excel will does this conversion automatically whenever the text representation of a numeric value is used in a numerical calculation. Putting the double minus sign in front force Excel to attempt to multiply the text by minus one twice (--TextNumber is the same as doing this... -1*-1*TextNumber If TextNumber is the actual representation of a numerical value, then that number will be returned (because minus one times minus one is equivalent to plus 1); if it is not the actual representation of a numerical value, attempting to multiply it by the first minus sign (which is equivalent to minus 1) will generate an error. So my code uses the double minus sign to attempt to change the text at each point of the array's iteration to see if ISNUMBER and ISERROR is true or not in order to calculate the values necessary to retrieve the 7 digit number you are after. And the above explanation is the key to understanding the second part of your question. There are two problems with converting the text representation of a number to an actual value... numbers that are powers of 10 (Excel uses and E, for exponent I'm guessing, to indicate a power of 10; for example 1.23E4 is the same thing as 1234) and dates (which Excel works with as offsets for January 1, 1900; so the human readable date of 7/20/2009 is really 40014). So, as the array formula iterates down the text, it will would see text constructions like these 7 character long pieces of text... 1.23E45 and 07/20/09... as numbers when the double minus sign is applied to them, so I needed to filter these "false positives" out if they occurred before your actual 7 digit number. I handled this by looking for an embedded "e" (either upper or lower case), for the E-Notation problem, or a date separator symbol which is the slash in my system (but which can vary depending on regional settings). -- Rick (MVP - Excel) "Eric_NY" wrote in message ... Thanks. I'll need to take some time to decipher this. But first, can you explain: - What's the function of the double minus sign in the formula? - Why does the date separator come into play here at all? The only thing I need to do is to look for a 7-digit number in a text field where we know nothing at all about the rest of the contents. Why should the operation of a formula be dependent upon extraneous information, such as how a date is formatted? The rest of the text string needs to be considered as a fully arbitrary series of characters; in other words, the function needs to work entirely irrespective of the remainder of the field. Thanks for your help. "Rick Rothstein" wrote: I forgot to include the note regarding array-entered formulas. Here is my message again, but with the note... In case you want to consider it, here is a non-RegEx array-entered** formula that will do what you want... =MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),ROW(1:30),7))* ISERR(SEARCH("e",MID(F5,ROW(1:30),7)))*ISERR(FIND( "/",MID(F5,ROW (1:30),7))),ROW(1:30))),7) **Commit this formula using Ctrl+Shift+Enter, not just Enter by itself Note though, that this formula is dependent on what your default date separator is. Mine is the slash character (/) and that is what I used in the FIND function call... if your default date separator is a different symbol, then just replace my slash with that character. -- Rick (MVP - Excel) "Rick Rothstein" wrote in message ... In case you want to consider it, here is a non-RegEx array-entered** formula that will do what you want... =MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),ROW(1:30),7))* ISERR(SEARCH("e",MID(F5,ROW(1:30),7)))*ISERR(FIND( "/",MID(F5,ROW (1:30),7))),ROW(1:30))),7) Note though, that this formula is dependent on what your default date separator is. Mine is the slash character (/) and that is what I used in the FIND function call... if your default date separator is a different symbol, then just replace my slash with that character. -- Rick (MVP - Excel) "Eric_NY" wrote in message ... I just read your message from last Friday. The text is free format. Users can enter it in whatever format they want. The 7-digit number is somewhere within the text. I've glanced through it and in the samples I've seen, there's no consistency in what appears before or after the 7-digit number. I used the regex solution that Ron Rosenfeld suggested, and adjusted the regular expression by removing the "\b" before and after the "\d{7}". "Rick Rothstein" wrote: I would still be interested in the answer to my questions... "Is the number always precede by a space when the number is interior to the text? What about that "dot" after it... is there always a dot following it? If the dot might not always be there, is there always a space after the number when it is interior to the text?" -- Rick (MVP - Excel) "Eric_NY" wrote in message ... "However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. " I can't find that option. Where should I look for it? Which menu? "Ron Rosenfeld" wrote: On Thu, 16 Jul 2009 12:04:04 -0700, Eric_NY wrote: Ron - Thanks. I'm going to try the Longre morefunc add-in. If I use the add-in functions and then send the sheet to someone by email, will the formulas still work? Or does the recipient also have to install the add-in separately on his own machine? Thanks for your help. Ordinarily no. They would have to install it themselves. However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. If you do that, it will then be usable by the recipient with no particular effort on his part. One caution concerning the add-in -- it will not work on strings that are longer than 255 characters. This, apparently is an .xll limitation, and there is no good way around it within the add-in. By the way, if you should use the UDF approach, the UDF should be embedded within the workbook, so its use should be transparent to your user. But there are a lot of other useful functions in morefunc. --ron |
#43
|
|||
|
|||
Need formula to extract a numeric value from a free-format tex
Thanks for the clarification so the formula i posted above should be robust.
Did you any luck with it? "Eric_NY" wrote: The text is user-entered and free-format. It can contain anything. Except in case of input error, it always contains the 7-digit number, which is what I'm trying to extract. "Rick Rothstein" wrote: I saw that, but was wondering if that 7-digit number could ever be encased in text like this... abc1234567def... hence my question about a leading space (assuming the number were not at the beginning of the text) and a following "dot" (as shown in his example) or possibly a following space to offset it from other text. I've been playing around with an array-entered** formula concept like this... =MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"), ROW(1:30),7)),ROW(1:30))),7) **Commit this formula using Ctrl+Shift+Enter, not just Enter by itself I'm not really concerned about the leading space any more (the SUBSTITUTE function has taken care of that), but I am interested in what trails the 7-digit number. Of course there are still problems with the above formula; for example, things constructed like this...17Jul09... and like this... 1.23e45... will register as "7-digit numbers". Unfortunate, the first 7 digits of a number longer than 7 digits will also register as a "hit". I just figured it would be nice to know if there were *any* formatting type things to lock onto before attempting to patch the above formula any further. As a matter-of-fact, if the OP is reading this... I would also like to know if the text could contain 8-digit or longer numbers, date-looking text like 17Jul09 and floating point power of ten numbers like 1.23e45? -- Rick (MVP - Excel) "JoeU2004" wrote in message ... "Rick Rothstein" wrote: I would still be interested in the answer to my questions... In response to my questions, Eric wrote: "Unfortunately there's no precise format. I just need to look for a 7-digit number, which can be anywhere in the string. (I could restrict it to looking for the first 7-digit number, since I know that there's never more than one.)" ----- original message ----- "Rick Rothstein" wrote in message ... I would still be interested in the answer to my questions... "Is the number always precede by a space when the number is interior to the text? What about that "dot" after it... is there always a dot following it? If the dot might not always be there, is there always a space after the number when it is interior to the text?" -- Rick (MVP - Excel) "Eric_NY" wrote in message ... "However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. " I can't find that option. Where should I look for it? Which menu? "Ron Rosenfeld" wrote: On Thu, 16 Jul 2009 12:04:04 -0700, Eric_NY wrote: Ron - Thanks. I'm going to try the Longre morefunc add-in. If I use the add-in functions and then send the sheet to someone by email, will the formulas still work? Or does the recipient also have to install the add-in separately on his own machine? Thanks for your help. Ordinarily no. They would have to install it themselves. However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. If you do that, it will then be usable by the recipient with no particular effort on his part. One caution concerning the add-in -- it will not work on strings that are longer than 255 characters. This, apparently is an .xll limitation, and there is no good way around it within the add-in. By the way, if you should use the UDF approach, the UDF should be embedded within the workbook, so its use should be transparent to your user. But there are a lot of other useful functions in morefunc. --ron |
#44
|
|||
|
|||
Need formula to extract a numeric value from a free-format tex
Very clever!
I've seen that double minus before and never understood the point. Thanks for explaining. Where in your formula is the logic which checks for the number starting first in character position 1, then position 2, then 3, etc.? The only way I could think of was something unwieldy, such as: IF(ISNUMBER(MID(A1,1,7)),MID(A1,1,7),IF(ISNUMBER(M ID(A1,2,7)),MID(A1,2,7).... )))))))). Obviously that's not a great solution. Yours is obviously more compact and better - if I can figure out how it works! Thanks again for your help. "Rick Rothstein" wrote: First off, before answering your questions, here is a revised formula (the previous one could not detect an 8 or more digit number in front of the 7 digit number you wanted to find... =MID(A1,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(A1," ","x"),ROW(1:99),7))* ISERR(SEARCH("e",MID(A1,ROW(1:99),7)))*ISERR(FIND( "/",MID(A1,ROW (1:99),7)))*ISERROR(--MID(A1,ROW(1:99),8))*IF(ROW(1:99)1,ISERROR (--MID(A1,ROW(1:99),ROW(1:99)-1))),ROW(1:99))),7) I also changed the limit to text strings up to 99 characters long. If you could have longer text strings for this formula to process, then change *all* the 99's in my formula to a number that is equal to the largest total number of characters that your text could be. I also changed the formulas cell reference from the arbitrary F5 I used in my previous submission to the more standard A1 cell reference that is used when the actual cell reference is unknown. Okay, first off, the double minus sign is a means of converting the text representation of a number into an actual numeric value. Excel will does this conversion automatically whenever the text representation of a numeric value is used in a numerical calculation. Putting the double minus sign in front force Excel to attempt to multiply the text by minus one twice (--TextNumber is the same as doing this... -1*-1*TextNumber If TextNumber is the actual representation of a numerical value, then that number will be returned (because minus one times minus one is equivalent to plus 1); if it is not the actual representation of a numerical value, attempting to multiply it by the first minus sign (which is equivalent to minus 1) will generate an error. So my code uses the double minus sign to attempt to change the text at each point of the array's iteration to see if ISNUMBER and ISERROR is true or not in order to calculate the values necessary to retrieve the 7 digit number you are after. And the above explanation is the key to understanding the second part of your question. There are two problems with converting the text representation of a number to an actual value... numbers that are powers of 10 (Excel uses and E, for exponent I'm guessing, to indicate a power of 10; for example 1.23E4 is the same thing as 1234) and dates (which Excel works with as offsets for January 1, 1900; so the human readable date of 7/20/2009 is really 40014). So, as the array formula iterates down the text, it will would see text constructions like these 7 character long pieces of text... 1.23E45 and 07/20/09... as numbers when the double minus sign is applied to them, so I needed to filter these "false positives" out if they occurred before your actual 7 digit number. I handled this by looking for an embedded "e" (either upper or lower case), for the E-Notation problem, or a date separator symbol which is the slash in my system (but which can vary depending on regional settings). -- Rick (MVP - Excel) "Eric_NY" wrote in message ... Thanks. I'll need to take some time to decipher this. But first, can you explain: - What's the function of the double minus sign in the formula? - Why does the date separator come into play here at all? The only thing I need to do is to look for a 7-digit number in a text field where we know nothing at all about the rest of the contents. Why should the operation of a formula be dependent upon extraneous information, such as how a date is formatted? The rest of the text string needs to be considered as a fully arbitrary series of characters; in other words, the function needs to work entirely irrespective of the remainder of the field. Thanks for your help. "Rick Rothstein" wrote: I forgot to include the note regarding array-entered formulas. Here is my message again, but with the note... In case you want to consider it, here is a non-RegEx array-entered** formula that will do what you want... =MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),ROW(1:30),7))* ISERR(SEARCH("e",MID(F5,ROW(1:30),7)))*ISERR(FIND( "/",MID(F5,ROW (1:30),7))),ROW(1:30))),7) **Commit this formula using Ctrl+Shift+Enter, not just Enter by itself Note though, that this formula is dependent on what your default date separator is. Mine is the slash character (/) and that is what I used in the FIND function call... if your default date separator is a different symbol, then just replace my slash with that character. -- Rick (MVP - Excel) "Rick Rothstein" wrote in message ... In case you want to consider it, here is a non-RegEx array-entered** formula that will do what you want... =MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),ROW(1:30),7))* ISERR(SEARCH("e",MID(F5,ROW(1:30),7)))*ISERR(FIND( "/",MID(F5,ROW (1:30),7))),ROW(1:30))),7) Note though, that this formula is dependent on what your default date separator is. Mine is the slash character (/) and that is what I used in the FIND function call... if your default date separator is a different symbol, then just replace my slash with that character. -- Rick (MVP - Excel) "Eric_NY" wrote in message ... I just read your message from last Friday. The text is free format. Users can enter it in whatever format they want. The 7-digit number is somewhere within the text. I've glanced through it and in the samples I've seen, there's no consistency in what appears before or after the 7-digit number. I used the regex solution that Ron Rosenfeld suggested, and adjusted the regular expression by removing the "\b" before and after the "\d{7}". "Rick Rothstein" wrote: I would still be interested in the answer to my questions... "Is the number always precede by a space when the number is interior to the text? What about that "dot" after it... is there always a dot following it? If the dot might not always be there, is there always a space after the number when it is interior to the text?" -- Rick (MVP - Excel) "Eric_NY" wrote in message ... "However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. " I can't find that option. Where should I look for it? Which menu? "Ron Rosenfeld" wrote: On Thu, 16 Jul 2009 12:04:04 -0700, Eric_NY wrote: Ron - Thanks. I'm going to try the Longre morefunc add-in. If I use the add-in functions and then send the sheet to someone by email, will the formulas still work? Or does the recipient also have to install the add-in separately on his own machine? Thanks for your help. Ordinarily no. They would have to install it themselves. However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. If you do that, it will then be usable by the recipient with no particular effort on his part. One caution concerning the add-in -- it will not work on strings that are longer than 255 characters. This, apparently is an .xll limitation, and there is no good way around it within the add-in. By the way, if you should use the UDF approach, the UDF should be embedded within the workbook, so its use should be transparent to your user. But there are a lot of other useful functions in morefunc. --ron |
#45
|
|||
|
|||
Need formula to extract a numeric value from a free-format tex
On Mon, 20 Jul 2009 07:48:01 -0700, Eric_NY
wrote: I used the regex solution that Ron Rosenfeld suggested, and adjusted the regular expression by removing the "\b" before and after the "\d{7}". If you remove the "\b"'s, then the regex solution will return, for example, 7 digit portions of 8 digit numbers; or if you have an entry like NH123456789, it will return 1234567. Is that what you want? --ron |
#46
|
|||
|
|||
Need formula to extract a numeric value from a free-format tex
I see what you mean.
The problem I was getting was that the "\b" was excluding some cases that I found in my data (such as "SR1234567" and "1234567remedy"). there were other cases with a non-space immediately adjacent to the 7-digit sequence. So I just took out the "\b" part of the pattern. (I was also embarrassed when I presented the results and realized that my 7-digit numbers actually began at 987262 - i.e., a 6-digit number, so not all of them were in fact 7 digits. So my logic was wrong and I missed one that I should have found.) But for my current purposes this is good enough. I'm doing a one-time analysis of several thousand records, and don't need to develop a permanent, perfect solution. I revised the regular expression to be "good enough" considering the data that I saw in front of me. Many thanks for your help. "Ron Rosenfeld" wrote: On Mon, 20 Jul 2009 07:48:01 -0700, Eric_NY wrote: I used the regex solution that Ron Rosenfeld suggested, and adjusted the regular expression by removing the "\b" before and after the "\d{7}". If you remove the "\b"'s, then the regex solution will return, for example, 7 digit portions of 8 digit numbers; or if you have an entry like NH123456789, it will return 1234567. Is that what you want? --ron |
#47
|
|||
|
|||
Need formula to extract a numeric value from a free-format tex
I needed to make one more modification to my formula to correct an area
where it incorrectly missed the 7 digit number. I have tested this version and I now believe it to work correctly in all situation. It is still an array-entered** formula... =MID(A1,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(A1," ","x"),ROW(1:99),7))* ISERR(SEARCH("e",MID(A1,ROW(1:99),7)))*ISERR(FIND( "/",MID(A1, ROW(1:99),7)))*ISERROR(--MID(A1&"x",ROW(1:99),8))*ISERROR(--MID( "x"&A1,ROW(2:100)-1,1)),ROW(1:99))),7) **Commit this formula using Ctrl+Shift+Enter, not just Enter by itself -- Rick (MVP - Excel) "Eric_NY" wrote in message ... Very clever! I've seen that double minus before and never understood the point. Thanks for explaining. Where in your formula is the logic which checks for the number starting first in character position 1, then position 2, then 3, etc.? The only way I could think of was something unwieldy, such as: IF(ISNUMBER(MID(A1,1,7)),MID(A1,1,7),IF(ISNUMBER(M ID(A1,2,7)),MID(A1,2,7).... )))))))). Obviously that's not a great solution. Yours is obviously more compact and better - if I can figure out how it works! Thanks again for your help. "Rick Rothstein" wrote: First off, before answering your questions, here is a revised formula (the previous one could not detect an 8 or more digit number in front of the 7 digit number you wanted to find... =MID(A1,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(A1," ","x"),ROW(1:99),7))* ISERR(SEARCH("e",MID(A1,ROW(1:99),7)))*ISERR(FIND( "/",MID(A1,ROW (1:99),7)))*ISERROR(--MID(A1,ROW(1:99),8))*IF(ROW(1:99)1,ISERROR (--MID(A1,ROW(1:99),ROW(1:99)-1))),ROW(1:99))),7) I also changed the limit to text strings up to 99 characters long. If you could have longer text strings for this formula to process, then change *all* the 99's in my formula to a number that is equal to the largest total number of characters that your text could be. I also changed the formulas cell reference from the arbitrary F5 I used in my previous submission to the more standard A1 cell reference that is used when the actual cell reference is unknown. Okay, first off, the double minus sign is a means of converting the text representation of a number into an actual numeric value. Excel will does this conversion automatically whenever the text representation of a numeric value is used in a numerical calculation. Putting the double minus sign in front force Excel to attempt to multiply the text by minus one twice (--TextNumber is the same as doing this... -1*-1*TextNumber If TextNumber is the actual representation of a numerical value, then that number will be returned (because minus one times minus one is equivalent to plus 1); if it is not the actual representation of a numerical value, attempting to multiply it by the first minus sign (which is equivalent to minus 1) will generate an error. So my code uses the double minus sign to attempt to change the text at each point of the array's iteration to see if ISNUMBER and ISERROR is true or not in order to calculate the values necessary to retrieve the 7 digit number you are after. And the above explanation is the key to understanding the second part of your question. There are two problems with converting the text representation of a number to an actual value... numbers that are powers of 10 (Excel uses and E, for exponent I'm guessing, to indicate a power of 10; for example 1.23E4 is the same thing as 1234) and dates (which Excel works with as offsets for January 1, 1900; so the human readable date of 7/20/2009 is really 40014). So, as the array formula iterates down the text, it will would see text constructions like these 7 character long pieces of text... 1.23E45 and 07/20/09... as numbers when the double minus sign is applied to them, so I needed to filter these "false positives" out if they occurred before your actual 7 digit number. I handled this by looking for an embedded "e" (either upper or lower case), for the E-Notation problem, or a date separator symbol which is the slash in my system (but which can vary depending on regional settings). -- Rick (MVP - Excel) "Eric_NY" wrote in message ... Thanks. I'll need to take some time to decipher this. But first, can you explain: - What's the function of the double minus sign in the formula? - Why does the date separator come into play here at all? The only thing I need to do is to look for a 7-digit number in a text field where we know nothing at all about the rest of the contents. Why should the operation of a formula be dependent upon extraneous information, such as how a date is formatted? The rest of the text string needs to be considered as a fully arbitrary series of characters; in other words, the function needs to work entirely irrespective of the remainder of the field. Thanks for your help. "Rick Rothstein" wrote: I forgot to include the note regarding array-entered formulas. Here is my message again, but with the note... In case you want to consider it, here is a non-RegEx array-entered** formula that will do what you want... =MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),ROW(1:30),7))* ISERR(SEARCH("e",MID(F5,ROW(1:30),7)))*ISERR(FIND( "/",MID(F5,ROW (1:30),7))),ROW(1:30))),7) **Commit this formula using Ctrl+Shift+Enter, not just Enter by itself Note though, that this formula is dependent on what your default date separator is. Mine is the slash character (/) and that is what I used in the FIND function call... if your default date separator is a different symbol, then just replace my slash with that character. -- Rick (MVP - Excel) "Rick Rothstein" wrote in message ... In case you want to consider it, here is a non-RegEx array-entered** formula that will do what you want... =MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),ROW(1:30),7))* ISERR(SEARCH("e",MID(F5,ROW(1:30),7)))*ISERR(FIND( "/",MID(F5,ROW (1:30),7))),ROW(1:30))),7) Note though, that this formula is dependent on what your default date separator is. Mine is the slash character (/) and that is what I used in the FIND function call... if your default date separator is a different symbol, then just replace my slash with that character. -- Rick (MVP - Excel) "Eric_NY" wrote in message ... I just read your message from last Friday. The text is free format. Users can enter it in whatever format they want. The 7-digit number is somewhere within the text. I've glanced through it and in the samples I've seen, there's no consistency in what appears before or after the 7-digit number. I used the regex solution that Ron Rosenfeld suggested, and adjusted the regular expression by removing the "\b" before and after the "\d{7}". "Rick Rothstein" wrote: I would still be interested in the answer to my questions... "Is the number always precede by a space when the number is interior to the text? What about that "dot" after it... is there always a dot following it? If the dot might not always be there, is there always a space after the number when it is interior to the text?" -- Rick (MVP - Excel) "Eric_NY" wrote in message ... "However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. " I can't find that option. Where should I look for it? Which menu? "Ron Rosenfeld" wrote: On Thu, 16 Jul 2009 12:04:04 -0700, Eric_NY wrote: Ron - Thanks. I'm going to try the Longre morefunc add-in. If I use the add-in functions and then send the sheet to someone by email, will the formulas still work? Or does the recipient also have to install the add-in separately on his own machine? Thanks for your help. Ordinarily no. They would have to install it themselves. However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. If you do that, it will then be usable by the recipient with no particular effort on his part. One caution concerning the add-in -- it will not work on strings that are longer than 255 characters. This, apparently is an .xll limitation, and there is no good way around it within the add-in. By the way, if you should use the UDF approach, the UDF should be embedded within the workbook, so its use should be transparent to your user. But there are a lot of other useful functions in morefunc. --ron |
#48
|
|||
|
|||
Need formula to extract a numeric value from a free-format tex
On Mon, 20 Jul 2009 13:16:01 -0700, Eric_NY
wrote: I see what you mean. The problem I was getting was that the "\b" was excluding some cases that I found in my data (such as "SR1234567" and "1234567remedy"). there were other cases with a non-space immediately adjacent to the 7-digit sequence. So I just took out the "\b" part of the pattern. Aha. I see the problem. It's possible (but not necessary in view of what you wrote below) to account for that. For example, one could look for the 7 digits to be bounded by either a non-digit or the beginning or end of the line. (I was also embarrassed when I presented the results and realized that my 7-digit numbers actually began at 987262 - i.e., a 6-digit number, so not all of them were in fact 7 digits. So my logic was wrong and I missed one that I should have found.) Again, if you required a permanent solution, that could be adjusted for. But for my current purposes this is good enough. They do say that "perfect is the enemy of good enough" :-) I'm doing a one-time analysis of several thousand records, and don't need to develop a permanent, perfect solution. I revised the regular expression to be "good enough" considering the data that I saw in front of me. Many thanks for your help. Glad to. I learn also. Thanks for the feedback. --ron |
Thread Tools | |
Display Modes | |
|
|