A Microsoft Office (Excel, Word) forum. OfficeFrustration

If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

Go Back   Home » OfficeFrustration forum » Microsoft Excel » General Discussion
Site Map Home Register Authors List Search Today's Posts Mark Forums Read  

Need formula to extract a numeric value from a free-format text



 
 
Thread Tools Display Modes
  #31  
Old July 18th, 2009, 05:11 PM posted to microsoft.public.excel.misc
Rick Rothstein[_2_]
external usenet poster
 
Posts: 2,013
Default Need formula to extract a numeric value from a free-format text

Ignore this message... it was mis-sent.

--
Rick (MVP - Excel)


"Rick Rothstein" wrote in message
...
I am still not sure if the OP wanted the 7-digit number to stand alone (as
a "word") or not, so I just went for the first isolated 7 digits in a row
whether imbedded in other text or not. The reason I thought that is because
of the "dot" that followed the 7-digit number in the OP's posted example
text. In thinking about it, I'm guessing you took that to be a period at
the end of a sentence.

--
Rick (MVP - Excel)


"Rick Rothstein" wrote in message
...
I know you asked for a formula, but would a UDF (user defined function) be
acceptable (it would require allowing macros to run)? If so, press Alt+F11
to go into the VB editor and, once there, add a Module (Insert/Module from
its menu bar). Next copy/paste the following into the code window that
opened up...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!0-9]#######[!0-9]" Then
First7DigitNumber = Mid(S, X, 7)
End If
Next
End Function

Now, go back to your worksheet and use this formula in whatever cell you
want (changing the A1 reference to the cell address containing your
text)...

=First7DigitNumber(A1)

This UDF finds the first "isolated" 7 digit number (that is, a 7 digit
number at the beginning or end of the text or, if interior to the text,
with non-digit characters in front and behind it).

--
Rick (MVP - Excel)


"Eric_NY" wrote in message
...
I have a column containing text values like "User requested authority of
emergency ID for reason NHUSER23 Restore of object to library LEVEL2
under remedy 1074317. 06/04/09 17:46 QPGMR".

Some of the cells contain a 7-digit number and others don't. The 7-digit
number does not start at a fixed location in the text.

Is it possible to write a formula which looks for the 7-digit number,
and
returns the number if present, and another value (such as 0 or
"*MISSING*")
if not?

I'm using Excel 2003.

Thanks.




  #32  
Old July 18th, 2009, 05:33 PM posted to microsoft.public.excel.misc
Rick Rothstein[_2_]
external usenet poster
 
Posts: 2,013
Default Need formula to extract a numeric value from a free-format text

I found this definition of a word boundary in Regular Expressions...

"A word boundary represents the spot where a letter or
number meets a space, apostrophe, a period, or anything
else that isn't a letter or number"

Given that, this modification of my function should do what your RegExp
solution does...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9]#######[!a-zA-Z0-9]" Then
First7DigitNumber = Mid(S, X, 7)
Exit Function
End If
Next
First7DigitNumber = "*MISSING*"
End Function

--
Rick (MVP - Excel)


"Ron Rosenfeld" wrote in message
...
On Sat, 18 Jul 2009 10:54:43 -0400, "Rick Rothstein"
wrote:

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!0-9]#######[!0-9]" Then
First7DigitNumber = Mid(S, X, 7)
End If
Next
End Function


I note that given the following modification of the OP's test string:

"User requested authority of emergency ID for reason NHUSER1234567
Restore
of object to library LEVEL2 under remedy 61074317. 06/04/09 17:46 QPGMR"

your routine returns 1234567 whereas my UDF returns "MISSING" since there
are
no seven digit words.

(Lori's formulas return 1074317)

--ron


  #33  
Old July 18th, 2009, 10:57 PM posted to microsoft.public.excel.misc
Ron Rosenfeld
external usenet poster
 
Posts: 3,719
Default Need formula to extract a numeric value from a free-format text

On Sat, 18 Jul 2009 12:33:47 -0400, "Rick Rothstein"
wrote:

The reason I thought that is because
of the "dot" that followed the 7-digit number in the OP's posted example
text. In thinking about it, I'm guessing you took that to be a period at the
end of a sentence.


Yes, I did.


I found this definition of a word boundary in Regular Expressions...

"A word boundary represents the spot where a letter or
number meets a space, apostrophe, a period, or anything
else that isn't a letter or number"


Not quite what I understand it to be (but close)./

The definitions I've seen indicate that a word boundary "Matches at the
position between a word character (anything matched by \w) and a non-word
character (anything matched by [^\w] or \W) as well as at the start and/or end
of the string if the first and/or last characters in the string are word
characters."

And, at least in VBScript, a word character is a digit, letter or underscore
e.g: [A-Za-z0-9_]



Given that, this modification of my function should do what your RegExp
solution does...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9]#######[!a-zA-Z0-9]" Then
First7DigitNumber = Mid(S, X, 7)
Exit Function
End If
Next
First7DigitNumber = "*MISSING*"
End Function


It comes pretty close. Just change this line to include the underscores:

If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9_]#######[!a-zA-Z0-9_]" Then

--ron
  #34  
Old July 19th, 2009, 12:26 AM posted to microsoft.public.excel.misc
Rick Rothstein[_2_]
external usenet poster
 
Posts: 2,013
Default Need formula to extract a numeric value from a free-format text

I had the damnedest time trying to find what RegExp considered a word
boundary (basically, each sight just kept say use \w without listing what it
consider the boundary). When I finally found the one I cited, I figured it
was a universal definition. Now I'm guessing there might be version
differences between the various RegExp engines. Yes, I fix to account for
the underbar is as you have shown it. For the archives, here is the UDF with
the change you indicated...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9_]#######[!a-zA-Z0-9_]"
Then
First7DigitNumber = Mid(S, X, 7)
Exit Function
End If
Next
First7DigitNumber = "*MISSING*"
End Function

--
Rick (MVP - Excel)


"Ron Rosenfeld" wrote in message
...
On Sat, 18 Jul 2009 12:33:47 -0400, "Rick Rothstein"
wrote:

The reason I thought that is because
of the "dot" that followed the 7-digit number in the OP's posted example
text. In thinking about it, I'm guessing you took that to be a period at
the
end of a sentence.


Yes, I did.


I found this definition of a word boundary in Regular Expressions...

"A word boundary represents the spot where a letter or
number meets a space, apostrophe, a period, or anything
else that isn't a letter or number"


Not quite what I understand it to be (but close)./

The definitions I've seen indicate that a word boundary "Matches at the
position between a word character (anything matched by \w) and a non-word
character (anything matched by [^\w] or \W) as well as at the start and/or
end
of the string if the first and/or last characters in the string are word
characters."

And, at least in VBScript, a word character is a digit, letter or
underscore
e.g: [A-Za-z0-9_]



Given that, this modification of my function should do what your RegExp
solution does...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9]#######[!a-zA-Z0-9]"
Then
First7DigitNumber = Mid(S, X, 7)
Exit Function
End If
Next
First7DigitNumber = "*MISSING*"
End Function


It comes pretty close. Just change this line to include the underscores:

If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9_]#######[!a-zA-Z0-9_]" Then

--ron


  #35  
Old July 19th, 2009, 12:32 AM posted to microsoft.public.excel.misc
Lori
external usenet poster
 
Posts: 673
Default Need formula to extract a numeric value from a free-format tex

Good points although I'm pretty sure this is academic from the parenthesised
reply to JoeU above. From the context given, if it's known there's no more
than one seven digit number it seems unlikely there are any longer numbers
either.

Given the high risk of typos in freeform text fields it's possible a udf may
be more reliable but there are just too many unknowns. A thorough check will
need to take place whatever the method.


"Ron Rosenfeld" wrote:

On Sat, 18 Jul 2009 04:40:01 -0700, Lori
wrote:

I'm not sure what you mean by "numbers that are part of other strings".
I thought that's what was wanted but i'm probably missing something.


Sorry. That statement was not clear.

What I meant is that your routine seems to return seven digits from substrings
that are longer.

NH1234567890 -- 4567890
45678901234 -- 8901234

I interpreted the OP's requirements to indicate that he wanted to extract the
first seven digit WORD; whereas your routines extract the first (or last) seven
consecutive digits, even if they are part of another word.

In the regex I presented, WORD is defined as a seven digit string of digits
surrounded by a non-word character. A non-word character is anything except a
digit, letter or underscore.

So, mine is somewhat flawed in that it would return seven digits if they were
prepended by, let us say, an asterisk or ampersand; but it would return
**MISSING** in the above instances.

But given &1234567*, mine would also return the seven digits. This could be
taken care of, if necessary, by defining word a bit differently.
--ron

  #36  
Old July 19th, 2009, 01:26 AM posted to microsoft.public.excel.misc
Ron Rosenfeld
external usenet poster
 
Posts: 3,719
Default Need formula to extract a numeric value from a free-format text

On Sat, 18 Jul 2009 19:26:09 -0400, "Rick Rothstein"
wrote:

I had the damnedest time trying to find what RegExp considered a word
boundary (basically, each sight just kept say use \w without listing what it
consider the boundary). When I finally found the one I cited, I figured it
was a universal definition. Now I'm guessing there might be version
differences between the various RegExp engines. Yes, I fix to account for
the underbar is as you have shown it. For the archives, here is the UDF with
the change you indicated...

Function First7DigitNumber(S As String) As String
Dim X As Long
For X = 1 To Len(S)
If Mid(" " & S & " ", X, 9) Like "[!a-zA-Z0-9_]#######[!a-zA-Z0-9_]"
Then
First7DigitNumber = Mid(S, X, 7)
Exit Function
End If
Next
First7DigitNumber = "*MISSING*"
End Function


And, to keep them together, here is the Regex version:

===================
Option Explicit
Function Seven(s As String) As String
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = "\b\d{7}\b"
If re.test(s) = True Then
Set mc = re.Execute(s)
Seven = mc(0)
Else
Seven = "*MISSING*"
End If
End Function
====================

Here are some definitions with regard to word boundaries and some of the
differences:

There are three different positions that qualify as word boundaries:

1. Before the first character in the string, if the first character is
a word character.
2. After the last character in the string, if the last character is a
word character.
3. Between two characters in the string, where one is a word character
and the other is not a word character.

In all flavors, the characters [a-zA-Z0-9_] are word characters. And that is
the case for VBScript. However, some of the other flavors will recognize
characters from other languages, and/or unicode characters, as word characters.
And I believe there is one flavor (?Python) where you can even set flags to
change the definition of a word character.
--ron
  #37  
Old July 20th, 2009, 03:48 PM posted to microsoft.public.excel.misc
Eric_NY
external usenet poster
 
Posts: 88
Default Need formula to extract a numeric value from a free-format tex

I just read your message from last Friday.

The text is free format. Users can enter it in whatever format they want.
The 7-digit number is somewhere within the text. I've glanced through it and
in the samples I've seen, there's no consistency in what appears before or
after the 7-digit number.

I used the regex solution that Ron Rosenfeld suggested, and adjusted the
regular expression by removing the "\b" before and after the "\d{7}".

"Rick Rothstein" wrote:

I would still be interested in the answer to my questions...

"Is the number always precede by a space when the number
is interior to the text? What about that "dot" after it... is there
always a dot following it? If the dot might not always be
there, is there always a space after the number when it is
interior to the text?"

--
Rick (MVP - Excel)


"Eric_NY" wrote in message
...
"However, with the morefunc add-in there is an option, I believe it is a
menu
option, to install the add-in as part of the workbook. "

I can't find that option. Where should I look for it? Which menu?

"Ron Rosenfeld" wrote:

On Thu, 16 Jul 2009 12:04:04 -0700, Eric_NY

wrote:

Ron - Thanks. I'm going to try the Longre morefunc add-in.

If I use the add-in functions and then send the sheet to someone by
email,
will the formulas still work? Or does the recipient also have to install
the
add-in separately on his own machine?

Thanks for your help.

Ordinarily no. They would have to install it themselves. However, with
the
morefunc add-in there is an option, I believe it is a menu option, to
install
the add-in as part of the workbook. If you do that, it will then be
usable by
the recipient with no particular effort on his part.

One caution concerning the add-in -- it will not work on strings that are
longer than 255 characters. This, apparently is an .xll limitation, and
there
is no good way around it within the add-in.

By the way, if you should use the UDF approach, the UDF should be
embedded
within the workbook, so its use should be transparent to your user. But
there
are a lot of other useful functions in morefunc.
--ron



  #38  
Old July 20th, 2009, 03:51 PM posted to microsoft.public.excel.misc
Eric_NY
external usenet poster
 
Posts: 88
Default Need formula to extract a numeric value from a free-format tex

The text is user-entered and free-format. It can contain anything. Except in
case of input error, it always contains the 7-digit number, which is what I'm
trying to extract.

"Rick Rothstein" wrote:

I saw that, but was wondering if that 7-digit number could ever be encased
in text like this... abc1234567def... hence my question about a leading
space (assuming the number were not at the beginning of the text) and a
following "dot" (as shown in his example) or possibly a following space to
offset it from other text. I've been playing around with an array-entered**
formula concept like this...

=MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),
ROW(1:30),7)),ROW(1:30))),7)

**Commit this formula using Ctrl+Shift+Enter, not just Enter by itself

I'm not really concerned about the leading space any more (the SUBSTITUTE
function has taken care of that), but I am interested in what trails the
7-digit number. Of course there are still problems with the above formula;
for example, things constructed like this...17Jul09... and like this...
1.23e45... will register as "7-digit numbers". Unfortunate, the first 7
digits of a number longer than 7 digits will also register as a "hit". I
just figured it would be nice to know if there were *any* formatting type
things to lock onto before attempting to patch the above formula any
further. As a matter-of-fact, if the OP is reading this... I would also like
to know if the text could contain 8-digit or longer numbers, date-looking
text like 17Jul09 and floating point power of ten numbers like 1.23e45?

--
Rick (MVP - Excel)


"JoeU2004" wrote in message
...
"Rick Rothstein" wrote:
I would still be interested in the answer to my questions...


In response to my questions, Eric wrote:

"Unfortunately there's no precise format. I just need to look for a
7-digit
number, which can be anywhere in the string. (I could restrict it to
looking
for the first 7-digit number, since I know that there's never more than
one.)"


----- original message -----

"Rick Rothstein" wrote in message
...
I would still be interested in the answer to my questions...

"Is the number always precede by a space when the number
is interior to the text? What about that "dot" after it... is there
always a dot following it? If the dot might not always be
there, is there always a space after the number when it is
interior to the text?"

--
Rick (MVP - Excel)


"Eric_NY" wrote in message
...
"However, with the morefunc add-in there is an option, I believe it is a
menu
option, to install the add-in as part of the workbook. "

I can't find that option. Where should I look for it? Which menu?

"Ron Rosenfeld" wrote:

On Thu, 16 Jul 2009 12:04:04 -0700, Eric_NY

wrote:

Ron - Thanks. I'm going to try the Longre morefunc add-in.

If I use the add-in functions and then send the sheet to someone by
email,
will the formulas still work? Or does the recipient also have to
install the
add-in separately on his own machine?

Thanks for your help.

Ordinarily no. They would have to install it themselves. However,
with the
morefunc add-in there is an option, I believe it is a menu option, to
install
the add-in as part of the workbook. If you do that, it will then be
usable by
the recipient with no particular effort on his part.

One caution concerning the add-in -- it will not work on strings that
are
longer than 255 characters. This, apparently is an .xll limitation,
and there
is no good way around it within the add-in.

By the way, if you should use the UDF approach, the UDF should be
embedded
within the workbook, so its use should be transparent to your user.
But there
are a lot of other useful functions in morefunc.
--ron





  #39  
Old July 20th, 2009, 04:25 PM posted to microsoft.public.excel.misc
Rick Rothstein[_2_]
external usenet poster
 
Posts: 2,013
Default Need formula to extract a numeric value from a free-format tex

In case you want to consider it, here is a non-RegEx array-entered** formula
that will do what you want...

=MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),ROW(1:30),7))*
ISERR(SEARCH("e",MID(F5,ROW(1:30),7)))*ISERR(FIND( "/",MID(F5,ROW
(1:30),7))),ROW(1:30))),7)

Note though, that this formula is dependent on what your default date
separator is. Mine is the slash character (/) and that is what I used in the
FIND function call... if your default date separator is a different symbol,
then just replace my slash with that character.

--
Rick (MVP - Excel)


"Eric_NY" wrote in message
...
I just read your message from last Friday.

The text is free format. Users can enter it in whatever format they want.
The 7-digit number is somewhere within the text. I've glanced through it
and
in the samples I've seen, there's no consistency in what appears before or
after the 7-digit number.

I used the regex solution that Ron Rosenfeld suggested, and adjusted the
regular expression by removing the "\b" before and after the "\d{7}".

"Rick Rothstein" wrote:

I would still be interested in the answer to my questions...

"Is the number always precede by a space when the number
is interior to the text? What about that "dot" after it... is there
always a dot following it? If the dot might not always be
there, is there always a space after the number when it is
interior to the text?"

--
Rick (MVP - Excel)


"Eric_NY" wrote in message
...
"However, with the morefunc add-in there is an option, I believe it is
a
menu
option, to install the add-in as part of the workbook. "

I can't find that option. Where should I look for it? Which menu?

"Ron Rosenfeld" wrote:

On Thu, 16 Jul 2009 12:04:04 -0700, Eric_NY

wrote:

Ron - Thanks. I'm going to try the Longre morefunc add-in.

If I use the add-in functions and then send the sheet to someone by
email,
will the formulas still work? Or does the recipient also have to
install
the
add-in separately on his own machine?

Thanks for your help.

Ordinarily no. They would have to install it themselves. However,
with
the
morefunc add-in there is an option, I believe it is a menu option, to
install
the add-in as part of the workbook. If you do that, it will then be
usable by
the recipient with no particular effort on his part.

One caution concerning the add-in -- it will not work on strings that
are
longer than 255 characters. This, apparently is an .xll limitation,
and
there
is no good way around it within the add-in.

By the way, if you should use the UDF approach, the UDF should be
embedded
within the workbook, so its use should be transparent to your user.
But
there
are a lot of other useful functions in morefunc.
--ron




  #40  
Old July 20th, 2009, 04:34 PM posted to microsoft.public.excel.misc
Rick Rothstein[_2_]
external usenet poster
 
Posts: 2,013
Default Need formula to extract a numeric value from a free-format tex

I forgot to include the note regarding array-entered formulas. Here is my
message again, but with the note...

In case you want to consider it, here is a non-RegEx array-entered** formula
that will do what you want...

=MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),ROW(1:30),7))*
ISERR(SEARCH("e",MID(F5,ROW(1:30),7)))*ISERR(FIND( "/",MID(F5,ROW
(1:30),7))),ROW(1:30))),7)

**Commit this formula using Ctrl+Shift+Enter, not just Enter by itself

Note though, that this formula is dependent on what your default date
separator is. Mine is the slash character (/) and that is what I used in the
FIND function call... if your default date separator is a different symbol,
then just replace my slash with that character.

--
Rick (MVP - Excel)


"Rick Rothstein" wrote in message
...
In case you want to consider it, here is a non-RegEx array-entered**
formula that will do what you want...

=MID(F5,MIN(IF(ISNUMBER(--MID(SUBSTITUTE(F5," ","x"),ROW(1:30),7))*
ISERR(SEARCH("e",MID(F5,ROW(1:30),7)))*ISERR(FIND( "/",MID(F5,ROW
(1:30),7))),ROW(1:30))),7)

Note though, that this formula is dependent on what your default date
separator is. Mine is the slash character (/) and that is what I used in
the FIND function call... if your default date separator is a different
symbol, then just replace my slash with that character.

--
Rick (MVP - Excel)


"Eric_NY" wrote in message
...
I just read your message from last Friday.

The text is free format. Users can enter it in whatever format they want.
The 7-digit number is somewhere within the text. I've glanced through it
and
in the samples I've seen, there's no consistency in what appears before
or
after the 7-digit number.

I used the regex solution that Ron Rosenfeld suggested, and adjusted the
regular expression by removing the "\b" before and after the "\d{7}".

"Rick Rothstein" wrote:

I would still be interested in the answer to my questions...

"Is the number always precede by a space when the number
is interior to the text? What about that "dot" after it... is there
always a dot following it? If the dot might not always be
there, is there always a space after the number when it is
interior to the text?"

--
Rick (MVP - Excel)


"Eric_NY" wrote in message
...
"However, with the morefunc add-in there is an option, I believe it is
a
menu
option, to install the add-in as part of the workbook. "

I can't find that option. Where should I look for it? Which menu?

"Ron Rosenfeld" wrote:

On Thu, 16 Jul 2009 12:04:04 -0700, Eric_NY

wrote:

Ron - Thanks. I'm going to try the Longre morefunc add-in.

If I use the add-in functions and then send the sheet to someone by
email,
will the formulas still work? Or does the recipient also have to
install
the
add-in separately on his own machine?

Thanks for your help.

Ordinarily no. They would have to install it themselves. However,
with
the
morefunc add-in there is an option, I believe it is a menu option, to
install
the add-in as part of the workbook. If you do that, it will then be
usable by
the recipient with no particular effort on his part.

One caution concerning the add-in -- it will not work on strings that
are
longer than 255 characters. This, apparently is an .xll limitation,
and
there
is no good way around it within the add-in.

By the way, if you should use the UDF approach, the UDF should be
embedded
within the workbook, so its use should be transparent to your user.
But
there
are a lot of other useful functions in morefunc.
--ron





 




Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Forum Jump


All times are GMT +1. The time now is 04:22 PM.


Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2004-2024 OfficeFrustration.
The comments are property of their posters.