How to use Regular Expressions (Regex) in Microsoft Excel both in-cell
and loops
itemprop="text">
How can I use regular
expressions in Excel and take advantage of Excel's powerful grid-like setup for data
manipulation?
- In-cell
function to return a matched pattern or replaced value in a
string.
- Sub to loop through a column of data and extract
matches to adjacent cells.
- What setup is
necessary?
- What are Excel's special characters for
Regular expressions?
/>
I understand Regex is not ideal for many
situations (To use or not to
use regular expressions?) since excel can use Left
,
Mid
, Right
,
Instr
type commands for similar manipulations.
style="font-weight: bold;">
Answer
href="http://en.wikipedia.org/wiki/Regular_expressions" rel="noreferrer">Regular
expressions are used for Pattern Matching.
To use in Excel follow these steps
:
Step 1: Add VBA
reference to "Microsoft VBScript Regular Expressions
5.5"
- Select
"Developer" tab ( rel="noreferrer">I don't have this tab what do I
do?)
- Select "Visual Basic" icon from 'Code'
ribbon section
- In "Microsoft Visual Basic for
Applications" window select "Tools" from the top
menu.
- Select
"References"
- Check the box next to "Microsoft VBScript
Regular Expressions 5.5" to include in your
workbook.
- Click
"OK"
Step
2: Define your
pattern
Basic
definitions:
-
Range.
- E.g.
a-z
matches an lower case letters from a to
z
- E.g.
0-5
matches any number
from 0 to
5
[]
Match exactly one of the objects inside these
brackets.
- E.g.
[a]
matches the letter a
- E.g.
[abc]
matches a single letter which can be a, b or
c
- E.g.
[a-z]
matches any single
lower case letter of the
alphabet.
()
Groups different matches for return purposes. See examples
below.
{}
Multiplier
for repeated copies of pattern defined before
it.
- E.g.
[a]{2}
matches two consecutive lower case letter a:
aa
- E.g.
[a]{1,3}
matches at least one and up to three lower case letter
a
, aa
, aaa
+
Match at least one, or more, of the pattern defined before
it.
- E.g.
a+
will match consecutive a's a
,
aa
, aaa
, and so
on
?
Match zero or one of the pattern defined before
it.
- E.g. Pattern may or
may not be present but can only be matched one time.
- E.g.
[a-z]?
matches empty string or any single lower case
letter.
*
Match zero or more of the pattern defined before it.
- E.g.
Wildcard for pattern that may or may not be present.
- E.g.
[a-z]*
matches empty string or string of lower case
letters.
.
Matches any
character except newline
\n
- E.g.
a.
Matches a two character string starting with a and ending
with anything except
\n
|
OR operator
- E.g.
a|b
means either a
or
b
can be matched.
- E.g.
red|white|orange
matches exactly one of the
colors.
^
NOT operator
- E.g.
[^0-9]
character can not contain a
number
- E.g.
[^aA]
character can not be lower case a
or upper case
A
\
Escapes special character that follows (overrides above
behavior)
- E.g.
\.
, \\
, \(
,
\?
, \$
,
\^
/>
Anchoring
Patterns:
^
Match must occur at start of
string
- E.g.
^a
First character must be lower case letter
a
- E.g.
^[0-9]
First character must be a
number.
$
Match must occur at end of
string
- E.g.
a$
Last character must be lower case letter
a
/>
Precedence
table:
Order
Name Representation
1 Parentheses ( )
2 Multipliers ? + * {m,n} {m,
n}?
3 Sequence & Anchors abc ^ $
4 Alternation
|
/>
Predefined Character
Abbreviations:
abr
same as meaning
\d [0-9] Any single digit
\D [^0-9] Any single
character that's not a digit
\w [a-zA-Z0-9_] Any word character
\W
[^a-zA-Z0-9_] Any non-word character
\s [ \r\t\n\f] Any space
character
\S [^ \r\t\n\f] Any non-space character
\n [\n] New
line
/>
Example 1:
Run as macro
The following
example macro looks at the value in cell A1
to see if the first
1 or 2 characters are digits. If so, they are removed and the rest of the string is
displayed. If not, then a box appears telling you that no match is found. Cell
A1
values of 12abc
will return
abc
, value of 1abc
will return
abc
, value of abc123
will return "Not
Matched" because the digits were not at the start of the
string.
Private Sub
simpleRegex()
Dim strPattern As String: strPattern = "^[0-9]{1,2}"
Dim strReplace As String: strReplace = ""
Dim regEx As New
RegExp
Dim strInput As String
Dim Myrange As
Range
Set Myrange = ActiveSheet.Range("A1")
If strPattern <> "" Then
strInput = Myrange.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End
With
If regEx.Test(strInput) Then
MsgBox
(regEx.Replace(strInput, strReplace))
Else
MsgBox ("Not
matched")
End If
End If
End
Sub
/>
Example 2:
Run as an in-cell
function
This example is the same as
example 1 but is setup to run as an in-cell function. To use, change the code to
this:
Function
simpleCellRegex(Myrange As Range) As String
Dim regEx As New
RegExp
Dim strPattern As String
Dim strInput As String
Dim strReplace As String
Dim strOutput As
String
strPattern =
"^[0-9]{1,3}"
If strPattern <> "" Then
strInput = Myrange.Value
strReplace = ""
With
regEx
.Global = True
.MultiLine = True
.IgnoreCase =
False
.Pattern = strPattern
End
With
If regEx.test(strInput) Then
simpleCellRegex =
regEx.Replace(strInput, strReplace)
Else
simpleCellRegex = "Not
matched"
End If
End If
End
Function
Place
your strings ("12abc") in cell A1
. Enter this formula
=simpleCellRegex(A1)
in cell B1
and
the result will be "abc".
src="https://i.stack.imgur.com/q3RRC.png" alt="enter image description
here">
/>
Example 3:
Loop Through Range
This
example is the same as example 1 but loops through a range of
cells.
Private Sub
simpleRegex()
Dim strPattern As String: strPattern = "^[0-9]{1,2}"
Dim strReplace As String: strReplace = ""
Dim regEx As New RegExp
Dim strInput As String
Dim Myrange As Range
Set Myrange
= ActiveSheet.Range("A1:A5")
For Each cell In
Myrange
If strPattern <> "" Then
strInput =
cell.Value
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern =
strPattern
End With
If regEx.Test(strInput)
Then
MsgBox (regEx.Replace(strInput, strReplace))
Else
MsgBox ("Not matched")
End If
End If
Next
End
Sub
/>
Example 4: Splitting
apart different patterns
This example loops
through a range (A1
, A2
&
A3
) and looks for a string starting with three digits followed
by a single alpha character and then 4 numeric digits. The output splits apart the
pattern matches into adjacent cells by using the ()
.
$1
represents the first pattern matched within the first set of
()
.
Private
Sub splitUpRegexPattern()
Dim regEx As New RegExp
Dim strPattern
As String
Dim strInput As String
Dim Myrange As
Range
Set Myrange =
ActiveSheet.Range("A1:A3")
For Each C In Myrange
strPattern = "(^[0-9]{3})([a-zA-Z])([0-9]{4})"
If strPattern
<> "" Then
strInput = C.Value
With
regEx
.Global = True
.MultiLine = True
.IgnoreCase =
False
.Pattern = strPattern
End With
If
regEx.test(strInput) Then
C.Offset(0, 1) = regEx.Replace(strInput,
"$1")
C.Offset(0, 2) = regEx.Replace(strInput, "$2")
C.Offset(0, 3) = regEx.Replace(strInput, "$3")
Else
C.Offset(0, 1)
= "(Not matched)"
End If
End If
Next
End
Sub
Results:
src="https://i.stack.imgur.com/9eCZ5.png" alt="enter image description
here">
/>
Additional Pattern
Examples
String
Regex Pattern Explanation
a1aaa [a-zA-Z][0-9][a-zA-Z]{3} Single alpha, single
digit, three alpha characters
a1aaa [a-zA-Z]?[0-9][a-zA-Z]{3} May
or may not have preceeding alpha character
a1aaa [a-zA-Z][0-9][a-zA-Z]{0,3}
Single alpha, single digit, 0 to 3 alpha characters
a1aaa
[a-zA-Z][0-9][a-zA-Z]* Single alpha, single digit, followed by any number of alpha
characters
\<\/[a-zA-Z][0-9]\> Exact non-word
character except any single alpha followed by any single
digit
No comments:
Post a Comment