Validating and searching for phone numbers is a common task when checking user entry or searching content. People enter phone numbers in a variety of formats, which makes the task slightly harder, but the important part is recognizing a number with the correct number of digits and an allowed prefix. For this task we’ll create a regular expression that can be used to validate a US phone number entry or to search for US phone numbers within a body of text.

observations

In the US, a phone number has 10 digits (a 3-digit area code followed by a 7-digit number) and may be prefixed by the international or long-distance code (+1 or 1). Breaking a US phone number into parts, we have an area code of 3 digits, followed by a central office code (CO code, also known as exchange code) of 3 digits, followed by four digits. Beginning an area code or CO code with the prefix digit ‘1’ would be confusing, so these cannot begin with 1. Zero is for the operator, so that’s out as the first digit of area or CO codes. Certain prefixes are reserved in North America, such as 911 and 411. What else is reserved? According to this Wikipedia entry, all N11 combinations are reserved, so if the second digit is a ‘1’ then the third digit cannot be a ‘1’, which makes things a bit easier for matching area and CO codes.

What about fictional characters with a “555” CO code? It turns out only a range is set aside for fiction (555-0100 to 555-0199), so “555” is acceptable.

Phone numbers commonly appear with separators other than hyphens, such as spaces or periods, or with no separators at all: (907) 555-0123, 1-907-555-0123, 907.555.0123, 907 555 0123, or 9075550123.

assumptions

Let’s allow fictional character phone numbers for simplicity. Besides, movie characters need phone numbers, too. We’ll allow dots, hyphens, and spaces as separators in any combination such that 907 555.0123 or 907.555-0123 are acceptable.

regular expression

Let’s start with the area and CO codes: three digits that begin with 2-9, followed by any two-digit combination that isn’t two 1s. “[2-9]” covers the first digit. If the second digit is a “1” then we must exclude “1” from the third, and vice versa. The simplest way is to OR explicit alternatives: “1[023456789]” covers a 1 in the second position and “[023456789]1” covers a 1 in the third. We must also allow no 1s at all, which adds “[023456789][023456789]” and gives us the result:

regular expression for area and CO code
[2-9](?:1[023456789]|[023456789]1|[023456789][023456789])

The “?:” marks this as a non-capturing group. You could write the digit ranges in other ways, such as “[02-9]” in place of “[023456789]”.
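To see the rule in action on its own, here is a minimal sketch that tests three-digit codes against the expression above. The class name CodeCheck and the helper IsValidCode are hypothetical names for illustration, and the ^ and $ anchors are an addition so that exactly three digits are tested.

```csharp
using System;
using System.Text.RegularExpressions;

static class CodeCheck
{
    // [2-9] for the first digit, then the two-digit group that rejects "11".
    // The ^ and $ anchors are an assumption added for this standalone test.
    const string CodePattern =
        @"^[2-9](?:1[023456789]|[023456789]1|[023456789][023456789])$";

    public static bool IsValidCode(string code)
    {
        return code != null && Regex.IsMatch(code, CodePattern);
    }

    static void Main()
    {
        // 907, 555 and 212 follow the rules; 911 and 411 are N11 codes;
        // 123 and 055 start with a disallowed first digit.
        foreach (string code in new[] { "907", "555", "212", "911", "411", "123", "055" })
        {
            Console.WriteLine("{0}: {1}", code, IsValidCode(code));
        }
    }
}
```

The first three codes should report True and the remaining four False.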

Now we add the optional international or long-distance prefix, the allowed separators, and the remaining four digits. I chose not to put a word boundary (\b) at the beginning because a number may start with “(” or “+”; neither is a word character, so \b would not match in front of them at the start of the text or after a space. Test your expression with some sample numbers in a regular expression tool such as Draco RegexTest as seen in the screenshot below.

regular expression for US phone number
(?:\+?1[\s.-]?)?(?:\(?[2-9](?:1[023456789]|[023456789]1|[023456789][023456789])\)?\s*?|[\s.-]?[2-9](?:1[023456789]|[023456789]1|[023456789][023456789])[\s.-]?)[2-9](?:1[023456789]|[023456789]1|[023456789][023456789])[\s.-]?\d{4}\b

[Screenshot: PhoneRegexScr]
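The sample formats listed earlier can also be checked in code rather than in a tool. This is a minimal sketch using the full expression with \s in each separator class. The class name PhoneSamples and the helper ContainsPhoneNumber are hypothetical names for illustration, and the rejected samples are my own picks rather than numbers from the text.

```csharp
using System;
using System.Text.RegularExpressions;

static class PhoneSamples
{
    const string Pattern =
        @"(?:\+?1[\s.-]?)?(?:\(?[2-9](?:1[023456789]|[023456789]1|[023456789][023456789])\)?\s*?|[\s.-]?[2-9](?:1[023456789]|[023456789]1|[023456789][023456789])[\s.-]?)[2-9](?:1[023456789]|[023456789]1|[023456789][023456789])[\s.-]?\d{4}\b";

    public static bool ContainsPhoneNumber(string text)
    {
        return text != null && Regex.IsMatch(text, Pattern);
    }

    static void Main()
    {
        // Formats the article says should be accepted.
        string[] shouldMatch =
        {
            "(907) 555-0123", "1-907-555-0123", "907.555.0123",
            "907 555 0123", "9075550123", "907 555.0123", "907.555-0123"
        };
        // Assumed counterexamples: N11 area code, N11 CO code, too few digits.
        string[] shouldFail = { "911-555-0123", "907-411-0123", "907-555-012" };

        foreach (string s in shouldMatch)
        {
            Console.WriteLine("{0,-16} expected match, got {1}", s, ContainsPhoneNumber(s));
        }
        foreach (string s in shouldFail)
        {
            Console.WriteLine("{0,-16} expected no match, got {1}", s, ContainsPhoneNumber(s));
        }
    }
}
```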

If you want to be stricter, so that the same separator must be used throughout whenever the area code isn’t wrapped in parentheses (only hyphens or only dots, for example), you could put the first separator in a capture group and replace the remaining separator with “\1”, which matches whatever character the group captured.
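One way the stricter variant might look is sketched below. It covers only the form without parentheses, leaves the separator after the optional “1” prefix unconstrained, and captures an optional separator so that a number with no separators at all still matches. The names StrictPhone and HasStrictPhoneNumber are hypothetical and used only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class StrictPhone
{
    // Three-digit area or CO code, same rule as before.
    const string Code = @"[2-9](?:1[023456789]|[023456789]1|[023456789][023456789])";

    // ([\s.-]?) is the only capturing group, so \1 refers to it.
    // If it captured nothing, \1 matches nothing as well.
    const string StrictPattern =
        @"(?:\+?1[\s.-]?)?" + Code + @"([\s.-]?)" + Code + @"\1\d{4}\b";

    public static bool HasStrictPhoneNumber(string text)
    {
        return text != null && Regex.IsMatch(text, StrictPattern);
    }

    static void Main()
    {
        string[] samples =
        {
            "907-555-0123", "907.555.0123", "907 555 0123", "9075550123",
            "907.555-0123", "907 555.0123"
        };
        foreach (string s in samples)
        {
            Console.WriteLine("{0}: {1}", s, HasStrictPhoneNumber(s));
        }
    }
}
```

The first four samples use one separator consistently (or none) and should match; the last two mix separators and should not.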

program

In C#, prefix a string literal with “@” to make it a verbatim string; backslashes then need no escaping and you can simply paste in your regular expression. Below is my C# test program for recognizing a US phone number anywhere within the given text. In Visual Studio 2013, choose “Start Without Debugging” (Ctrl+F5) to run it.

C#: valid phone number in text
using System;
using System.Text.RegularExpressions;

namespace NumberValidator
{
    class Program
    {
        static void Main(string[] args)
        {
            string testString = "";
            // Use the first command-line argument if given; otherwise prompt for input.
            if (args != null && args.Length > 0 && args[0].Length > 0)
            {
                testString = args[0];
            }
            else
            {
                Console.WriteLine("Enter a string to test for a phone number: ");
                testString = Console.ReadLine();
            }
            if (HasPhoneNumber(testString))
            {
                Console.WriteLine("Found a phone number in \"{0}\"", testString);
            }
            else
            {
                Console.WriteLine("Did not find a valid phone number in \"{0}\"", testString);
            }
        }

        /// <summary>
        /// Assumes a US phone number of 10 digits. Unused prefixes are allowed, but reserved prefixes should fail.
        /// </summary>
        /// <param name="str">Text to search; may be null.</param>
        /// <returns>True if a phone number is found anywhere in the text.</returns>
        static bool HasPhoneNumber(string str)
        {
            // A match needs at least 10 digits, so shorter strings are rejected early.
            if (str != null && str.Length > 9)
            {
                // use @ so we don't need to escape the backslashes
                string patternString = @"(?:\+?1[\s.-]?)?(?:\(?[2-9](?:1[023456789]|[023456789]1|[023456789][023456789])\)?\s*?|[\s.-]?[2-9](?:1[023456789]|[023456789]1|[023456789][023456789])[\s.-]?)[2-9](?:1[023456789]|[023456789]1|[023456789][023456789])[\s.-]?\d{4}\b";
                return Regex.IsMatch(str, patternString);
            }
            return false;
        }
    }
}
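The program above only reports whether a number appears somewhere in the text. For the two uses mentioned at the start, validating a single entry and pulling numbers out of a body of text, a sketch along the following lines might help. The names PhoneSearch, IsPhoneNumber and FindPhoneNumbers are hypothetical; the \A and \z anchors and the trimming of a leading separator are my additions, not part of the original expression.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class PhoneSearch
{
    const string Pattern =
        @"(?:\+?1[\s.-]?)?(?:\(?[2-9](?:1[023456789]|[023456789]1|[023456789][023456789])\)?\s*?|[\s.-]?[2-9](?:1[023456789]|[023456789]1|[023456789][023456789])[\s.-]?)[2-9](?:1[023456789]|[023456789]1|[023456789][023456789])[\s.-]?\d{4}\b";

    // Validation: anchor the pattern so the whole entry must be a phone number.
    const string WholeEntryPattern = @"\A(?:" + Pattern + @")\z";

    public static bool IsPhoneNumber(string entry)
    {
        return entry != null && Regex.IsMatch(entry, WholeEntryPattern);
    }

    // Search: collect every match found in the text.
    public static List<string> FindPhoneNumbers(string text)
    {
        var found = new List<string>();
        if (text == null)
        {
            return found;
        }
        foreach (Match m in Regex.Matches(text, Pattern))
        {
            // The pattern allows one separator before the area code, so a match
            // can begin with a space, dot, or hyphen; strip it from the result.
            found.Add(m.Value.Trim().TrimStart('.', '-'));
        }
        return found;
    }

    static void Main()
    {
        Console.WriteLine(IsPhoneNumber("(907) 555-0123"));
        Console.WriteLine(IsPhoneNumber("call 907-555-0123"));
        foreach (string number in FindPhoneNumbers("Call 907-555-0123 or (212) 555-0187 today."))
        {
            Console.WriteLine(number);
        }
    }
}
```

Keeping the unanchored pattern in one constant and wrapping it for validation means both uses share the same rules.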