Regular Expressions – Visual Basic Tutorial

Many applications need to process text. For example, a Web form that collects a user's address needs to verify that the zip code is in a valid format. Similarly, a company phone directory needs to verify that phone numbers match the standard format. Some applications generate text-based log files, while others process input contained in text files.

While the String class contains a set of methods for string searching and manipulation, regular expressions offer a more powerful and concise way to process text. For example, the call Regex.IsMatch(s, "^\(?\d{3}\)?[\s\-]?\d{3}\-?\d{4}$") returns True if the String s is in a valid phone number format.
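As a minimal sketch of that call in a console application, the following wraps the pattern in a helper. The module name PhoneCheck, the helper IsPhoneNumber, and the sample numbers are assumptions made for illustration; only the pattern comes from the text.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PhoneCheck
    ' Pattern copied from the paragraph above.
    Private Const PhonePattern As String = "^\(?\d{3}\)?[\s\-]?\d{3}\-?\d{4}$"

    ' Hypothetical helper: True when s has a phone number shape.
    Function IsPhoneNumber(s As String) As Boolean
        Return Regex.IsMatch(s, PhonePattern)
    End Function

    Sub Main()
        Console.WriteLine(IsPhoneNumber("(555) 123-4567"))  ' True
        Console.WriteLine(IsPhoneNumber("555-123-4567"))    ' True
        Console.WriteLine(IsPhoneNumber("5551234567"))      ' True
        Console.WriteLine(IsPhoneNumber("555-1234"))        ' False: too few digits
    End Sub
End Module
```

Because each parenthesis is optional on its own, this pattern also accepts unbalanced input such as "(555 123-4567", so treat it as a format check rather than full validation.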

After reading this tutorial, you will be able to:

  • Validate data with a regular expression
  • Extract data with a regular expression
  • Format data using regular expressions

Sample Regular Expression Diagrammed

Regular expressions use codes to match different parts of a string. For example, ^ matches the beginning of a string, and $ matches the end of a string.

Placing a ? after a character or code makes it optional, meaning the regular expression can match whether or not the characters appear.

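Regular expressions use the \ character both to introduce special codes (such as \d) and to escape characters that would otherwise have a special meaning. For example, if you want to match an asterisk in a string, you can't simply specify the * character, because regular expressions use it as a quantifier meaning "zero or more of the preceding element". Instead, to match an asterisk, you must specify \*. Similarly, to match a question mark, use \?. You can also write \- to match a hyphen, although a hyphen only needs escaping inside square brackets.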
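The escaping rule can be tried out with a short sketch like the one below. The module name EscapeDemo, the helper ContainsLiteral, and the sample strings are hypothetical; Regex.Escape is the .NET method that adds the backslashes for you when the text to match comes from a variable.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EscapeDemo
    ' Hypothetical helper: searches input for literal text,
    ' letting Regex.Escape add any backslashes that are needed.
    Function ContainsLiteral(input As String, literal As String) As Boolean
        Return Regex.IsMatch(input, Regex.Escape(literal))
    End Function

    Sub Main()
        Console.WriteLine(Regex.IsMatch("5 * 3", "\*"))    ' True: literal asterisk found
        Console.WriteLine(Regex.IsMatch("5 x 3", "\*"))    ' False: no asterisk
        Console.WriteLine(Regex.IsMatch("Ready?", "\?$"))  ' True: ends with a question mark
        Console.WriteLine(Regex.Escape("2*3?"))            ' 2\*3\?
        Console.WriteLine(ContainsLiteral("2*3?=6", "2*3?"))  ' True
    End Sub
End Module
```

Note that Visual Basic string literals do not treat the backslash as an escape character, so the pattern can be typed exactly as it appears.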

\d matches a single numeric digit (0-9), and you can match several digits in a row by specifying a count in braces. For example, \d{3} matches three digits (such as '932') and \d{8} matches eight digits.
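A small sketch of the count syntax follows. The module name DigitCountDemo and the helper HasExactDigits are assumptions for illustration; the ^ and $ anchors are added so that the count applies to the whole input rather than to any run of digits inside it.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module DigitCountDemo
    ' Hypothetical helper: True when s consists of exactly n digits.
    Function HasExactDigits(s As String, n As Integer) As Boolean
        Return Regex.IsMatch(s, "^\d{" & n.ToString() & "}$")
    End Function

    Sub Main()
        Console.WriteLine(HasExactDigits("932", 3))       ' True
        Console.WriteLine(HasExactDigits("93", 3))        ' False: only two digits
        Console.WriteLine(HasExactDigits("9321", 3))      ' False: four digits
        Console.WriteLine(HasExactDigits("12345678", 8))  ' True
        ' Without anchors, \d{3} is satisfied by three digits anywhere in the input.
        Console.WriteLine(Regex.IsMatch("9321", "\d{3}")) ' True
    End Sub
End Module
```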

You can group a set of characters and codes together using parentheses. For example, in the diagram, the group (\-\d{4}) matches a hyphen followed by four digits. Because a ? follows the group, the entire group is optional.

Regular Expressions Explained

In this video, I explain an example regular expression, ^\d{5}(\-\d{4})?$ (transcript below):

This regular expression can be used to evaluate user input to determine whether it is a valid zip code in the United States. Now, let’s walk through the process the regular expression evaluator would use to test user input.

The first character, a ^, must match the beginning of the string.

The \d code represents a numeric digit. Because it follows the caret, the first character must be a digit. Because it has {5} after it, there must be five digits in a row.

The next code is actually a group: the set of characters surrounded by parentheses.

This group has a question mark after it, indicating that the entire group is optional. Therefore, it’s OK that it doesn’t exist in the user input.

The last character, a $, must match the end of the user input. If the regular expression didn't include this, the user input could have extra characters following the zip code. By including the caret at the beginning and the $ at the end, the regular expression requires the user input to match exactly, with no other characters at the beginning or end.

We now know that the user did provide a valid zip code, because it matched the regular expression.
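To see what the caret and the $ contribute, the sketch below runs the five-digit part of the pattern with and without them. The module name AnchorDemo, the helper Check, and the sample inputs are assumptions for illustration. The last two lines show a .NET detail worth knowing: $ also matches just before a final line feed, while \z matches only at the very end of the string.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module AnchorDemo
    ' Hypothetical helper so the variants can be compared side by side.
    Function Check(input As String, pattern As String) As Boolean
        Return Regex.IsMatch(input, pattern)
    End Function

    Sub Main()
        ' No anchors: five digits anywhere in the input are enough.
        Console.WriteLine(Check("abc12345xyz", "\d{5}"))      ' True
        ' Start anchor only: trailing characters still slip through.
        Console.WriteLine(Check("12345 extra", "^\d{5}"))     ' True
        ' Both anchors: the whole input must be exactly five digits.
        Console.WriteLine(Check("12345 extra", "^\d{5}$"))    ' False
        Console.WriteLine(Check("12345", "^\d{5}$"))          ' True
        ' $ tolerates one trailing line feed; \z does not.
        Console.WriteLine(Check("12345" & vbLf, "^\d{5}$"))   ' True
        Console.WriteLine(Check("12345" & vbLf, "^\d{5}\z"))  ' False
    End Sub
End Module
```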

Now, let’s evaluate user input that includes the optional four-digit extension to the zip code.

As before, the first characters match the beginning of the user input.

This time, though, there is additional input to be evaluated. The first character in the group matches the hyphen.

The next several characters match the next four numeric digits, completing the evaluation of the user input.

Now, let’s consider what would happen if the user entered invalid input accidentally.

As before, the first several characters match the valid zip code.

Now, the regular expression evaluator attempts to match the optional group to the next character, a space. It doesn't match, but that's OK, because this group is optional (as indicated by the question mark following the group).

The regular expression evaluator attempts to match the $ symbol, which represents the end of the user input, with the space. The space isn’t the end of the user input, so the regular expression evaluator rejects the input.
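The three cases walked through above can be reproduced with a sketch like this one. The transcript does not state the exact inputs, so "12345", "12345-6789", and "12345 " (with a trailing space) are assumed stand-ins, and the module name ZipWalkthrough and helper Describe are hypothetical. Regex.Match returns a Match object whose Groups(1).Success property reports whether the optional group took part in the match.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ZipWalkthrough
    ' Hypothetical helper: reports how the zip code pattern treated the input.
    Function Describe(input As String) As String
        Dim m As Match = Regex.Match(input, "^\d{5}(\-\d{4})?$")
        If Not m.Success Then Return "rejected"
        ' Groups(1) is the optional (\-\d{4}) group.
        If m.Groups(1).Success Then Return "accepted with extension " & m.Groups(1).Value
        Return "accepted without extension"
    End Function

    Sub Main()
        Console.WriteLine(Describe("12345"))       ' accepted without extension
        Console.WriteLine(Describe("12345-6789"))  ' accepted with extension -6789
        Console.WriteLine(Describe("12345 "))      ' rejected
    End Sub
End Module
```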

Testing Regular Expressions

After importing the System.Text.RegularExpressions namespace into your application (Imports System.Text.RegularExpressions), you can call the static Regex.IsMatch method, which returns a Boolean, to determine whether a string matches a regular expression. For example, the following line displays True if the first parameter is a valid zip code, or False if it does not match the regular expression defined in the second parameter.

Console.WriteLine(Regex.IsMatch("01331", "^\d{5}(\-\d{4})?$"))

"78756", "38292-3933", and "83000-0002" would all match the specified regular expression. "abcdef", "83292-", and "3838-38383" would not match the regular expression.
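One way to confirm those six results is to loop over the sample strings, as in the sketch below. The module name ZipCheck and the helper IsZip are hypothetical; the pattern and the inputs are the ones given above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ZipCheck
    ' Hypothetical helper wrapping the zip code pattern from this section.
    Function IsZip(s As String) As Boolean
        Return Regex.IsMatch(s, "^\d{5}(\-\d{4})?$")
    End Function

    Sub Main()
        Dim samples As String() = {"78756", "38292-3933", "83000-0002",
                                   "abcdef", "83292-", "3838-38383"}
        ' The first three print True; the last three print False.
        For Each s As String In samples
            Console.WriteLine(s & " -> " & IsZip(s).ToString())
        Next
    End Sub
End Module
```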

Extracting Data with Regular Expressions

Besides using regular expressions to determine whether input matches a pattern, you can use regular expressions to extract specific data from user input. For example, if a user types in their city, state, and zip code in a single line, you can use a regular expression to save the city, state, and zip code as separate strings.
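The city, state, and zip code case might look like the following sketch. It assumes the input has the shape "City, ST 12345" or "City, ST 12345-6789"; that format, the pattern, the module name AddressDemo, and the helper SplitAddress are all assumptions made for illustration. It uses named groups, written (?<name>...), which can be read back by name from the Match.Groups collection.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module AddressDemo
    ' Assumed input format: "City, ST 12345" with an optional "-6789" extension.
    Private Const AddressPattern As String =
        "^(?<city>[^,]+),\s*(?<state>[A-Z]{2})\s+(?<zip>\d{5}(\-\d{4})?)$"

    ' Hypothetical helper: returns {city, state, zip}, or Nothing if the line does not match.
    Function SplitAddress(line As String) As String()
        Dim m As Match = Regex.Match(line, AddressPattern)
        If Not m.Success Then Return Nothing
        Return New String() {m.Groups("city").Value,
                             m.Groups("state").Value,
                             m.Groups("zip").Value}
    End Function

    Sub Main()
        Dim parts As String() = SplitAddress("Austin, TX 78756-1234")
        If parts IsNot Nothing Then
            Console.WriteLine("City:  " & parts(0))  ' Austin
            Console.WriteLine("State: " & parts(1))  ' TX
            Console.WriteLine("Zip:   " & parts(2))  ' 78756-1234
        End If
    End Sub
End Module
```

Real addresses vary far more than this pattern allows, so a production form would more likely collect the three values in separate fields.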

For example, the following code sample extracts the name from input by matching everything after 'Name: '. Note that the group '(.*$)' is used to match all characters until the end of the string. It (and any other groups) can then be referenced using the Match.Groups collection.

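' Requires: Imports System.Text.RegularExpressions
Dim input As String = "Name: Chris Ashton"
' The parentheses capture everything after "Name: " as group 1.
Dim m As Match = Regex.Match(input, "Name: (.*$)")
' Groups(0) is the whole match; Groups(1) is the first parenthesized group.
Console.WriteLine(m.Groups(1).Value)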

Running that console application displays the name 'Chris Ashton', successfully extracting the data from the string.
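When the input might not contain the expected text, it helps to check Match.Success before reading a group, since a failed match yields empty groups. The sketch below shows one way to do that; the module name NameExtractor and the helper ExtractName are hypothetical, and returning Nothing for a failed match is a design choice rather than a requirement.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NameExtractor
    ' Hypothetical helper: returns the text after "Name: ", or Nothing when the input has no such text.
    Function ExtractName(input As String) As String
        Dim m As Match = Regex.Match(input, "Name: (.*$)")
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        Console.WriteLine(ExtractName("Name: Chris Ashton"))  ' Chris Ashton
        Dim missing As String = ExtractName("No name here")
        Console.WriteLine(missing Is Nothing)                 ' True
    End Sub
End Module
```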
