If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. |
|
|
Thread Tools | Display Modes |
#31
|
|||
|
|||
Need formula to extract a numeric value from a free-format text
Ignore this message... it was mis-sent.
-- Rick (MVP - Excel) "Rick Rothstein" wrote in message ... I am still not sure if the OP wanted the 7-digit number to stand alone (as a "word") or not, so I just went for the first isolated 7 digits in a row whether imbedded in other text or not. The reason I thought that is because of the "dot" that followed the 7-digit number in the OP's posted example text. In thinking about it, I'm guessing you took that to be a period at the end of a sentence. -- Rick (MVP - Excel) "Rick Rothstein" wrote in message ... I know you asked for a formula, but would a UDF (user defined function) be acceptable (it would require allowing macros to run)? If so, press Alt+F11 to go into the VB editor and, once there, add a Module (Insert/Module from its menu bar). Next copy/paste the following into the code window that opened up... Function First7DigitNumber(S As String) As String Dim X As Long For X = 1 To Len(S) If Mid(" " & S & " ", X, 9) Like "[!0-9]#######[!0-9]" Then First7DigitNumber = Mid(S, X, 7) End If Next End Function Now, go back to your worksheet and use this formula in whatever cell you want (changing the A1 reference to the cell address containing your text)... =First7DigitNumber(A1) This UDF finds the first "isolated" 7 digit number (that is, a 7 digit number at the beginning or end of the text or, if interior to the text, with non-digit characters in front and behind it). -- Rick (MVP - Excel) "Eric_NY" wrote in message ... I have a column containing text values like "User requested authority of emergency ID for reason NHUSER23 Restore of object to library LEVEL2 under remedy 1074317. 06/04/09 17:46 QPGMR". Some of the cells contain a 7-digit number and others don't. The 7-digit number does not start at a fixed location in the text. Is it possible to write a formula which looks for the 7-digit number, and returns the number if present, and another value (such as 0 or "*MISSING*") if not? I'm using Excel 2003. Thanks. |
#32
|
|||
|
|||
Need formula to extract a numeric value from a free-format text
I found this definition of a word boundary in Regular Expressions...
"A word boundary represents the spot where a letter or number meets a space, apostrophe, a period, or anything else that isn't a letter or number" Given that, this modification of my function should do what your RegExp solution does... Function First7DigitNumber(S As String) As String Dim X As Long For X = 1 To Len(S) If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9]#######[!a-zA-Z0-9]" Then First7DigitNumber = Mid(S, X, 7) Exit Function End If Next First7DigitNumber = "*MISSING*" End Function -- Rick (MVP - Excel) "Ron Rosenfeld" wrote in message ... On Sat, 18 Jul 2009 10:54:43 -0400, "Rick Rothstein" wrote: Function First7DigitNumber(S As String) As String Dim X As Long For X = 1 To Len(S) If Mid(" " & S & " ", X, 9) Like "[!0-9]#######[!0-9]" Then First7DigitNumber = Mid(S, X, 7) End If Next End Function I note that given the following modification of the OP's test string: "User requested authority of emergency ID for reason NHUSER1234567 Restore of object to library LEVEL2 under remedy 61074317. 06/04/09 17:46 QPGMR" your routine returns 1234567 whereas my UDF returns "MISSING" since there are no seven digit words. (Lori's formulas return 1074317) --ron |
#33
|
|||
|
|||
Need formula to extract a numeric value from a free-format text
On Sat, 18 Jul 2009 12:33:47 -0400, "Rick Rothstein"
wrote: The reason I thought that is because of the "dot" that followed the 7-digit number in the OP's posted example text. In thinking about it, I'm guessing you took that to be a period at the end of a sentence. Yes, I did. I found this definition of a word boundary in Regular Expressions... "A word boundary represents the spot where a letter or number meets a space, apostrophe, a period, or anything else that isn't a letter or number" Not quite what I understand it to be (but close)./ The definitions I've seen indicate that a word boundary "Matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W) as well as at the start and/or end of the string if the first and/or last characters in the string are word characters." And, at least in VBScript, a word character is a digit, letter or underscore e.g: [A-Za-z0-9_] Given that, this modification of my function should do what your RegExp solution does... Function First7DigitNumber(S As String) As String Dim X As Long For X = 1 To Len(S) If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9]#######[!a-zA-Z0-9]" Then First7DigitNumber = Mid(S, X, 7) Exit Function End If Next First7DigitNumber = "*MISSING*" End Function It comes pretty close. Just change this line to include the underscores: If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9_]#######[!a-zA-Z0-9_]" Then --ron |
#34
|
|||
|
|||
Need formula to extract a numeric value from a free-format text
I had the damnedest time trying to find what RegExp considered a word
boundary (basically, each sight just kept say use \w without listing what it consider the boundary). When I finally found the one I cited, I figured it was a universal definition. Now I'm guessing there might be version differences between the various RegExp engines. Yes, I fix to account for the underbar is as you have shown it. For the archives, here is the UDF with the change you indicated... Function First7DigitNumber(S As String) As String Dim X As Long For X = 1 To Len(S) If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9_]#######[!a-zA-Z0-9_]" Then First7DigitNumber = Mid(S, X, 7) Exit Function End If Next First7DigitNumber = "*MISSING*" End Function -- Rick (MVP - Excel) "Ron Rosenfeld" wrote in message ... On Sat, 18 Jul 2009 12:33:47 -0400, "Rick Rothstein" wrote: The reason I thought that is because of the "dot" that followed the 7-digit number in the OP's posted example text. In thinking about it, I'm guessing you took that to be a period at the end of a sentence. Yes, I did. I found this definition of a word boundary in Regular Expressions... "A word boundary represents the spot where a letter or number meets a space, apostrophe, a period, or anything else that isn't a letter or number" Not quite what I understand it to be (but close)./ The definitions I've seen indicate that a word boundary "Matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W) as well as at the start and/or end of the string if the first and/or last characters in the string are word characters." And, at least in VBScript, a word character is a digit, letter or underscore e.g: [A-Za-z0-9_] Given that, this modification of my function should do what your RegExp solution does... Function First7DigitNumber(S As String) As String Dim X As Long For X = 1 To Len(S) If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9]#######[!a-zA-Z0-9]" Then First7DigitNumber = Mid(S, X, 7) Exit Function End If Next First7DigitNumber = "*MISSING*" End Function It comes pretty close. Just change this line to include the underscores: If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9_]#######[!a-zA-Z0-9_]" Then --ron |
#35
|
|||
|
|||
Need formula to extract a numeric value from a free-format tex
Good points although I'm pretty sure this is academic from the parenthesised
reply to JoeU above. From the context given, if it's known there's no more than one seven digit number it seems unlikely there are any longer numbers either. Given the high risk of typos in freeform text fields it's possible a udf may be more reliable but there are just too many unknowns. A thorough check will need to take place whatever the method. "Ron Rosenfeld" wrote: On Sat, 18 Jul 2009 04:40:01 -0700, Lori wrote: I'm not sure what you mean by "numbers that are part of other strings". I thought that's what was wanted but i'm probably missing something. Sorry. That statement was not clear. What I meant is that your routine seems to return seven digits from substrings that are longer. NH1234567890 -- 4567890 45678901234 -- 8901234 I interpreted the OP's requirements to indicate that he wanted to extract the first seven digit WORD; whereas your routines extract the first (or last) seven consecutive digits, even if they are part of another word. In the regex I presented, WORD is defined as a seven digit string of digits surrounded by a non-word character. A non-word character is anything except a digit, letter or underscore. So, mine is somewhat flawed in that it would return seven digits if they were prepended by, let us say, an asterisk or ampersand; but it would return **MISSING** in the above instances. But given &1234567*, mine would also return the seven digits. This could be taken care of, if necessary, by defining word a bit differently. --ron |
#36
|
|||
|
|||
Need formula to extract a numeric value from a free-format text
On Sat, 18 Jul 2009 19:26:09 -0400, "Rick Rothstein"
wrote: I had the damnedest time trying to find what RegExp considered a word boundary (basically, each sight just kept say use \w without listing what it consider the boundary). When I finally found the one I cited, I figured it was a universal definition. Now I'm guessing there might be version differences between the various RegExp engines. Yes, I fix to account for the underbar is as you have shown it. For the archives, here is the UDF with the change you indicated... Function First7DigitNumber(S As String) As String Dim X As Long For X = 1 To Len(S) If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9_]#######[!a-zA-Z0-9_]" Then First7DigitNumber = Mid(S, X, 7) Exit Function End If Next First7DigitNumber = "*MISSING*" End Function And, to keep them together, here is the Regex version: =================== Option Explicit Function Seven(s As String) As String Dim re As Object, mc As Object Set re = CreateObject("vbscript.regexp") re.Pattern = "\b\d{7}\b" If re.test(s) = True Then Set mc = re.Execute(s) Seven = mc(0) Else Seven = "*MISSING*" End If End Function ==================== Here are some definitions with regard to word boundaries and some of the differences: There are three different positions that qualify as word boundaries: 1. Before the first character in the string, if the first character is a word character. 2. After the last character in the string, if the last character is a word character. 3. Between two characters in the string, where one is a word character and the other is not a word character. In all flavors, the characters [a-zA-Z0-9_] are word characters. And that is the case for VBScript. However, some of the other flavors will recognize characters from other languages, and/or unicode characters, as word characters. And I believe there is one flavor (?Python) where you can even set flags to change the definition of a word character. --ron |
#37
|
|||
|
|||
Need formula to extract a numeric value from a free-format tex
I just read your message from last Friday.
The text is free format. Users can enter it in whatever format they want. The 7-digit number is somewhere within the text. I've glanced through it and in the samples I've seen, there's no consistency in what appears before or after the 7-digit number. I used the regex solution that Ron Rosenfeld suggested, and adjusted the regular expression by removing the "\b" before and after the "\d{7}". "Rick Rothstein" wrote: I would still be interested in the answer to my questions... "Is the number always precede by a space when the number is interior to the text? What about that "dot" after it... is there always a dot following it? If the dot might not always be there, is there always a space after the number when it is interior to the text?" -- Rick (MVP - Excel) "Eric_NY" wrote in message ... "However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. " I can't find that option. Where should I look for it? Which menu? "Ron Rosenfeld" wrote: On Thu, 16 Jul 2009 12:04:04 -0700, Eric_NY wrote: Ron - Thanks. I'm going to try the Longre morefunc add-in. If I use the add-in functions and then send the sheet to someone by email, will the formulas still work? Or does the recipient also have to install the add-in separately on his own machine? Thanks for your help. Ordinarily no. They would have to install it themselves. However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. If you do that, it will then be usable by the recipient with no particular effort on his part. One caution concerning the add-in -- it will not work on strings that are longer than 255 characters. This, apparently is an .xll limitation, and there is no good way around it within the add-in. By the way, if you should use the UDF approach, the UDF should be embedded within the workbook, so its use should be transparent to your user. But there are a lot of other useful functions in morefunc. --ron |
#38
|
|||
|
|||
Need formula to extract a numeric value from a free-format tex
The text is user-entered and free-format. It can contain anything. Except in
case of input error, it always contains the 7-digit number, which is what I'm trying to extract. "Rick Rothstein" wrote: I saw that, but was wondering if that 7-digit number could ever be encased in text like this... abc1234567def... hence my question about a leading space (assuming the number were not at the beginning of the text) and a following "dot" (as shown in his example) or possibly a following space to offset it from other text. I've been playing around with an array-entered** formula concept like this... =MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"), ROW(1:30),7)),ROW(1:30))),7) **Commit this formula using Ctrl+Shift+Enter, not just Enter by itself I'm not really concerned about the leading space any more (the SUBSTITUTE function has taken care of that), but I am interested in what trails the 7-digit number. Of course there are still problems with the above formula; for example, things constructed like this...17Jul09... and like this... 1.23e45... will register as "7-digit numbers". Unfortunate, the first 7 digits of a number longer than 7 digits will also register as a "hit". I just figured it would be nice to know if there were *any* formatting type things to lock onto before attempting to patch the above formula any further. As a matter-of-fact, if the OP is reading this... I would also like to know if the text could contain 8-digit or longer numbers, date-looking text like 17Jul09 and floating point power of ten numbers like 1.23e45? -- Rick (MVP - Excel) "JoeU2004" wrote in message ... "Rick Rothstein" wrote: I would still be interested in the answer to my questions... In response to my questions, Eric wrote: "Unfortunately there's no precise format. I just need to look for a 7-digit number, which can be anywhere in the string. (I could restrict it to looking for the first 7-digit number, since I know that there's never more than one.)" ----- original message ----- "Rick Rothstein" wrote in message ... I would still be interested in the answer to my questions... "Is the number always precede by a space when the number is interior to the text? What about that "dot" after it... is there always a dot following it? If the dot might not always be there, is there always a space after the number when it is interior to the text?" -- Rick (MVP - Excel) "Eric_NY" wrote in message ... "However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. " I can't find that option. Where should I look for it? Which menu? "Ron Rosenfeld" wrote: On Thu, 16 Jul 2009 12:04:04 -0700, Eric_NY wrote: Ron - Thanks. I'm going to try the Longre morefunc add-in. If I use the add-in functions and then send the sheet to someone by email, will the formulas still work? Or does the recipient also have to install the add-in separately on his own machine? Thanks for your help. Ordinarily no. They would have to install it themselves. However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. If you do that, it will then be usable by the recipient with no particular effort on his part. One caution concerning the add-in -- it will not work on strings that are longer than 255 characters. This, apparently is an .xll limitation, and there is no good way around it within the add-in. By the way, if you should use the UDF approach, the UDF should be embedded within the workbook, so its use should be transparent to your user. But there are a lot of other useful functions in morefunc. --ron |
#39
|
|||
|
|||
Need formula to extract a numeric value from a free-format tex
In case you want to consider it, here is a non-RegEx array-entered** formula
that will do what you want... =MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),ROW(1:30),7))* ISERR(SEARCH("e",MID(F5,ROW(1:30),7)))*ISERR(FIND( "/",MID(F5,ROW (1:30),7))),ROW(1:30))),7) Note though, that this formula is dependent on what your default date separator is. Mine is the slash character (/) and that is what I used in the FIND function call... if your default date separator is a different symbol, then just replace my slash with that character. -- Rick (MVP - Excel) "Eric_NY" wrote in message ... I just read your message from last Friday. The text is free format. Users can enter it in whatever format they want. The 7-digit number is somewhere within the text. I've glanced through it and in the samples I've seen, there's no consistency in what appears before or after the 7-digit number. I used the regex solution that Ron Rosenfeld suggested, and adjusted the regular expression by removing the "\b" before and after the "\d{7}". "Rick Rothstein" wrote: I would still be interested in the answer to my questions... "Is the number always precede by a space when the number is interior to the text? What about that "dot" after it... is there always a dot following it? If the dot might not always be there, is there always a space after the number when it is interior to the text?" -- Rick (MVP - Excel) "Eric_NY" wrote in message ... "However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. " I can't find that option. Where should I look for it? Which menu? "Ron Rosenfeld" wrote: On Thu, 16 Jul 2009 12:04:04 -0700, Eric_NY wrote: Ron - Thanks. I'm going to try the Longre morefunc add-in. If I use the add-in functions and then send the sheet to someone by email, will the formulas still work? Or does the recipient also have to install the add-in separately on his own machine? Thanks for your help. Ordinarily no. They would have to install it themselves. However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. If you do that, it will then be usable by the recipient with no particular effort on his part. One caution concerning the add-in -- it will not work on strings that are longer than 255 characters. This, apparently is an .xll limitation, and there is no good way around it within the add-in. By the way, if you should use the UDF approach, the UDF should be embedded within the workbook, so its use should be transparent to your user. But there are a lot of other useful functions in morefunc. --ron |
#40
|
|||
|
|||
Need formula to extract a numeric value from a free-format tex
I forgot to include the note regarding array-entered formulas. Here is my
message again, but with the note... In case you want to consider it, here is a non-RegEx array-entered** formula that will do what you want... =MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),ROW(1:30),7))* ISERR(SEARCH("e",MID(F5,ROW(1:30),7)))*ISERR(FIND( "/",MID(F5,ROW (1:30),7))),ROW(1:30))),7) **Commit this formula using Ctrl+Shift+Enter, not just Enter by itself Note though, that this formula is dependent on what your default date separator is. Mine is the slash character (/) and that is what I used in the FIND function call... if your default date separator is a different symbol, then just replace my slash with that character. -- Rick (MVP - Excel) "Rick Rothstein" wrote in message ... In case you want to consider it, here is a non-RegEx array-entered** formula that will do what you want... =MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),ROW(1:30),7))* ISERR(SEARCH("e",MID(F5,ROW(1:30),7)))*ISERR(FIND( "/",MID(F5,ROW (1:30),7))),ROW(1:30))),7) Note though, that this formula is dependent on what your default date separator is. Mine is the slash character (/) and that is what I used in the FIND function call... if your default date separator is a different symbol, then just replace my slash with that character. -- Rick (MVP - Excel) "Eric_NY" wrote in message ... I just read your message from last Friday. The text is free format. Users can enter it in whatever format they want. The 7-digit number is somewhere within the text. I've glanced through it and in the samples I've seen, there's no consistency in what appears before or after the 7-digit number. I used the regex solution that Ron Rosenfeld suggested, and adjusted the regular expression by removing the "\b" before and after the "\d{7}". "Rick Rothstein" wrote: I would still be interested in the answer to my questions... "Is the number always precede by a space when the number is interior to the text? What about that "dot" after it... is there always a dot following it? If the dot might not always be there, is there always a space after the number when it is interior to the text?" -- Rick (MVP - Excel) "Eric_NY" wrote in message ... "However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. " I can't find that option. Where should I look for it? Which menu? "Ron Rosenfeld" wrote: On Thu, 16 Jul 2009 12:04:04 -0700, Eric_NY wrote: Ron - Thanks. I'm going to try the Longre morefunc add-in. If I use the add-in functions and then send the sheet to someone by email, will the formulas still work? Or does the recipient also have to install the add-in separately on his own machine? Thanks for your help. Ordinarily no. They would have to install it themselves. However, with the morefunc add-in there is an option, I believe it is a menu option, to install the add-in as part of the workbook. If you do that, it will then be usable by the recipient with no particular effort on his part. One caution concerning the add-in -- it will not work on strings that are longer than 255 characters. This, apparently is an .xll limitation, and there is no good way around it within the add-in. By the way, if you should use the UDF approach, the UDF should be embedded within the workbook, so its use should be transparent to your user. But there are a lot of other useful functions in morefunc. --ron |
Thread Tools | |
Display Modes | |
|
|