Thursday 16 November 2017

How to use Regular Expressions (Regex) in Microsoft Excel both in-cell and loops

itemprop="text">


How can I use regular
expressions in Excel and take advantage of Excel's powerful grid-like setup for data
manipulation?




  • In-cell
    function to return a matched pattern or replaced value in a
    string.

  • Sub to loop through a column of data and extract
    matches to adjacent cells.

  • What setup is
    necessary?

  • What are Excel's special characters for
    Regular expressions?



/>


I understand Regex is not ideal for many
situations (To use or not to
use regular expressions?
) since excel can use Left,
Mid, Right,
Instr type commands for similar manipulations.


style="font-weight: bold;">

Answer




href="http://en.wikipedia.org/wiki/Regular_expressions" rel="noreferrer">Regular
expressions are used for Pattern Matching.



To use in Excel follow these steps
:



Step 1: Add VBA
reference to "Microsoft VBScript Regular Expressions
5.5"





  • Select
    "Developer" tab ( rel="noreferrer">I don't have this tab what do I
    do?
    )

  • Select "Visual Basic" icon from 'Code'
    ribbon section

  • In "Microsoft Visual Basic for
    Applications" window select "Tools" from the top
    menu.

  • Select
    "References"

  • Check the box next to "Microsoft VBScript
    Regular Expressions 5.5" to include in your
    workbook.

  • Click
    "OK"



Step
2
: Define your
pattern




Basic
definitions:



-
Range.




  • E.g.
    a-z matches an lower case letters from a to
    z

  • E.g. 0-5 matches any number
    from 0 to
    5



[]
Match exactly one of the objects inside these
brackets.





  • E.g.
    [a] matches the letter a

  • E.g.
    [abc] matches a single letter which can be a, b or
    c

  • E.g. [a-z] matches any single
    lower case letter of the
    alphabet.



()
Groups different matches for return purposes. See examples
below.



{} Multiplier
for repeated copies of pattern defined before
it.





  • E.g.
    [a]{2} matches two consecutive lower case letter a:
    aa

  • E.g.
    [a]{1,3} matches at least one and up to three lower case letter
    a, aa, aaa



+
Match at least one, or more, of the pattern defined before
it.




  • E.g.
    a+ will match consecutive a's a,
    aa, aaa, and so
    on




?
Match zero or one of the pattern defined before
it.




  • E.g. Pattern may or
    may not be present but can only be matched one time.

  • E.g.
    [a-z]? matches empty string or any single lower case
    letter.



*
Match zero or more of the pattern defined before it.

- E.g.
Wildcard for pattern that may or may not be present.
- E.g.
[a-z]* matches empty string or string of lower case
letters.



. Matches any
character except newline
\n




  • E.g.
    a. Matches a two character string starting with a and ending
    with anything except
    \n



|
OR operator





  • E.g.
    a|b means either a or
    b can be matched.

  • E.g.
    red|white|orange matches exactly one of the
    colors.



^
NOT operator




  • E.g.
    [^0-9] character can not contain a
    number


  • E.g. [^aA]
    character can not be lower case a or upper case
    A



\
Escapes special character that follows (overrides above
behavior)




  • E.g.
    \., \\, \(,
    \?, \$,
    \^



/>


Anchoring
Patterns:



^
Match must occur at start of
string




  • E.g.
    ^a First character must be lower case letter
    a

  • E.g.
    ^[0-9] First character must be a
    number.




$
Match must occur at end of
string




  • E.g.
    a$ Last character must be lower case letter
    a



/>

Precedence
table:




Order
Name Representation
1 Parentheses ( )
2 Multipliers ? + * {m,n} {m,
n}?
3 Sequence & Anchors abc ^ $
4 Alternation
|


/>

Predefined Character
Abbreviations:




abr
same as meaning
\d [0-9] Any single digit
\D [^0-9] Any single
character that's not a digit
\w [a-zA-Z0-9_] Any word character
\W
[^a-zA-Z0-9_] Any non-word character
\s [ \r\t\n\f] Any space
character
\S [^ \r\t\n\f] Any non-space character
\n [\n] New
line



/>

Example 1:
Run as macro



The following
example macro looks at the value in cell A1 to see if the first
1 or 2 characters are digits. If so, they are removed and the rest of the string is
displayed. If not, then a box appears telling you that no match is found. Cell
A1 values of 12abc will return
abc, value of 1abc will return
abc, value of abc123 will return "Not
Matched" because the digits were not at the start of the
string.



Private Sub
simpleRegex()
Dim strPattern As String: strPattern = "^[0-9]{1,2}"

Dim strReplace As String: strReplace = ""

Dim regEx As New
RegExp
Dim strInput As String
Dim Myrange As
Range

Set Myrange = ActiveSheet.Range("A1")


If strPattern <> "" Then
strInput = Myrange.Value


With regEx

.Global = True
.MultiLine = True

.IgnoreCase = False
.Pattern = strPattern
End
With

If regEx.Test(strInput) Then
MsgBox
(regEx.Replace(strInput, strReplace))
Else
MsgBox ("Not
matched")

End If
End If
End
Sub


/>

Example 2:
Run as an in-cell
function



This example is the same as
example 1 but is setup to run as an in-cell function. To use, change the code to
this:




Function
simpleCellRegex(Myrange As Range) As String
Dim regEx As New
RegExp
Dim strPattern As String
Dim strInput As String

Dim strReplace As String
Dim strOutput As
String


strPattern =
"^[0-9]{1,3}"


If strPattern <> "" Then

strInput = Myrange.Value
strReplace = ""

With
regEx
.Global = True
.MultiLine = True
.IgnoreCase =
False
.Pattern = strPattern

End
With

If regEx.test(strInput) Then
simpleCellRegex =
regEx.Replace(strInput, strReplace)
Else
simpleCellRegex = "Not
matched"
End If
End If
End
Function



Place
your strings ("12abc") in cell A1. Enter this formula
=simpleCellRegex(A1) in cell B1 and
the result will be "abc".



src="https://i.stack.imgur.com/q3RRC.png" alt="enter image description
here">



/>

Example 3:
Loop Through Range



This
example is the same as example 1 but loops through a range of
cells.




Private Sub
simpleRegex()
Dim strPattern As String: strPattern = "^[0-9]{1,2}"

Dim strReplace As String: strReplace = ""
Dim regEx As New RegExp

Dim strInput As String
Dim Myrange As Range

Set Myrange
= ActiveSheet.Range("A1:A5")


For Each cell In
Myrange
If strPattern <> "" Then
strInput =
cell.Value

With regEx
.Global = True

.MultiLine = True
.IgnoreCase = False
.Pattern =
strPattern
End With


If regEx.Test(strInput)
Then
MsgBox (regEx.Replace(strInput, strReplace))
Else

MsgBox ("Not matched")
End If
End If
Next
End
Sub



/>

Example 4: Splitting
apart different patterns



This example loops
through a range (A1, A2 &
A3) and looks for a string starting with three digits followed
by a single alpha character and then 4 numeric digits. The output splits apart the
pattern matches into adjacent cells by using the ().
$1 represents the first pattern matched within the first set of
().



Private
Sub splitUpRegexPattern()
Dim regEx As New RegExp
Dim strPattern
As String

Dim strInput As String
Dim Myrange As
Range

Set Myrange =
ActiveSheet.Range("A1:A3")

For Each C In Myrange

strPattern = "(^[0-9]{3})([a-zA-Z])([0-9]{4})"

If strPattern
<> "" Then
strInput = C.Value


With
regEx
.Global = True
.MultiLine = True
.IgnoreCase =
False
.Pattern = strPattern
End With

If
regEx.test(strInput) Then
C.Offset(0, 1) = regEx.Replace(strInput,
"$1")

C.Offset(0, 2) = regEx.Replace(strInput, "$2")

C.Offset(0, 3) = regEx.Replace(strInput, "$3")
Else
C.Offset(0, 1)
= "(Not matched)"
End If
End If
Next
End
Sub



Results:



src="https://i.stack.imgur.com/9eCZ5.png" alt="enter image description
here">



/>

Additional Pattern
Examples



String
Regex Pattern Explanation
a1aaa [a-zA-Z][0-9][a-zA-Z]{3} Single alpha, single
digit, three alpha characters

a1aaa [a-zA-Z]?[0-9][a-zA-Z]{3} May
or may not have preceeding alpha character
a1aaa [a-zA-Z][0-9][a-zA-Z]{0,3}
Single alpha, single digit, 0 to 3 alpha characters
a1aaa
[a-zA-Z][0-9][a-zA-Z]* Single alpha, single digit, followed by any number of alpha
characters

\<\/[a-zA-Z][0-9]\> Exact non-word
character except any single alpha followed by any single
digit

No comments:

Post a Comment

php - file_get_contents shows unexpected output while reading a file

I want to output an inline jpg image as a base64 encoded string, however when I do this : $contents = file_get_contents($filename); print &q...