Server-Side Validations Using Regular Expressions
By Prasanna Pattam

    Introduction

    Validating user input is the bane of every software developer's existence. Routing all user input through a single validation module greatly reduces the number of validation errors, but writing one routine by hand that covers every kind of input would be difficult and tedious. Regular expressions make this practical: they can be used to validate nearly any user input, and they can make your validation code more efficient and less error prone. Regular expressions are a powerful tool for searching and processing text, and even in their simplest form they can make many complex tasks easy. This article provides a brief tutorial on the basics of regular expressions and shows how a validation module can be extended to test boundary conditions.

    Download complete source code

    Regular Expressions: What are They?

    Regular expressions are a tiny, highly specialized programming language. They used to be familiar only to Unix users, where text editors like vi support regular-expression searches. Microsoft eventually decided to give the same power to us and implemented it in Interdev, although most people probably haven't noticed it when using Find in Interdev.

    When Microsoft started creating scripting languages for the Windows platform, only JScript contained regular expressions, leaving VBScript alone in the dark. That changed with version 5 of the VBScript engine. To ensure that Visual Basic (VB) developers can use regular expressions, the VBScript regular expressions engine has been implemented as a COM object. This makes them much more powerful, since they can be called from various sources outside of VBScript, such as Visual Basic or C.

    Regular expressions provide tools for developing complex pattern-matching and textual search-and-replace algorithms. Any Perl, egrep, awk, or sed developer will tell you that regular expressions are one of the most powerful utilities available for manipulating text and data. By creating patterns to match specific strings, a developer has total control over searching, extracting, or replacing data. In short, to master regular expressions is to master your data.

    A regular expression is a series of characters that define a pattern. The pattern is then compared to a target string to see whether there are any matches to the pattern in the target string.

    Patterns
    Regular expressions are almost a language in themselves. A pattern defines the criteria to search for within a string. Regular expressions can be as simple as plain text, or they can use a unique language consisting of special characters and modifiers to build these patterns.

    Alphabetic Data:
    In a regular expression, the period (.) represents exactly one occurrence of any character other than a new line. Thus, the regular expression m.n will be matched not only by man, but also by remind and mint, since each of them contains an m and an n separated by exactly one character.

            m.n --- matches man, men, mint, remind
    
    To narrow the range of acceptable characters in a regular expression, a character class can be used. This is simply a list of one or more characters surrounded by square brackets [ ], and it is matched by only the characters within the brackets.
    		m[ai]n --- matches man, mint, remind
    
    The above expression matches any string that has an a or an i between m and n. Inside the square brackets you can also specify a range of characters.
            m[a-z]n --- matches any lowercase letter between m and n
            m[a-z0-9]n --- matches any lowercase letter or digit between m and n
    
    If you want to further reduce the criteria, such as m should be the first character, then use the following:
    	
    		^m[ai]n --- matches man, mint
    
    A caret (^) at the beginning of the pattern requires the match to start at the beginning of the target string. If you put the caret inside the square brackets [^], however, it has an altogether different meaning. A caret as the first character inside the square brackets means "don't match the characters inside the brackets."
    		m[^ai]n --- matches men, mend, diamond
    
    Putting a caret at the beginning of the above expression narrows the search criteria.
            ^m[^ai]n --- matches men, mend
    
    Just as the caret forces the pattern to match at the start of the string, the dollar sign ($) forces the pattern to match at the end of the string.
    		m[ai]n$ ---	matches man,  min, cumin
    
    So putting a caret (^) at the beginning and a dollar ($) at the end forces the whole string to match the pattern, from beginning to end.
    		^m[ai]n$ --- matches	man, min
    		^m[^ai]n$ --- matches	men
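
    The anchors and character classes above can be tried out with a short script. This is a minimal sketch, assuming it is saved as a .vbs file and run under Windows Script Host (cscript.exe); the helper name IsMatch is hypothetical and is not part of the article's module. In an ASP page, Response.Write would take the place of WScript.Echo.

```vbscript
Option Explicit

' Hypothetical helper: returns True when strValue contains a match
' for strPattern (case-sensitive).
Function IsMatch(strPattern, strValue)
    Dim regEx
    Set regEx = New RegExp
    regEx.Pattern = strPattern
    regEx.IgnoreCase = False
    IsMatch = regEx.Test(strValue)
End Function

' Unanchored: "min" may occur anywhere in the target.
WScript.Echo "m[ai]n on remind:  " & IsMatch("m[ai]n", "remind")    ' True
' Caret: the match must begin at the start of the target.
WScript.Echo "^m[ai]n on remind: " & IsMatch("^m[ai]n", "remind")   ' False
WScript.Echo "^m[ai]n on mint:   " & IsMatch("^m[ai]n", "mint")     ' True
' Dollar: the match must finish at the end of the target.
WScript.Echo "m[ai]n$ on cumin:  " & IsMatch("m[ai]n$", "cumin")    ' True
WScript.Echo "m[ai]n$ on mint:   " & IsMatch("m[ai]n$", "mint")     ' False
' Both anchors: the whole target must match.
WScript.Echo "^m[ai]n$ on man:   " & IsMatch("^m[ai]n$", "man")     ' True
WScript.Echo "^m[^ai]n$ on men:  " & IsMatch("^m[^ai]n$", "men")    ' True
WScript.Echo "^m[^ai]n$ on man:  " & IsMatch("^m[^ai]n$", "man")    ' False
```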
    
    Finally, let's look at two more special characters, * and +. The asterisk (*) matches zero or more occurrences of the item that precedes it.
    		m[o]*n --- matches mn, mon, moon, moooon
    
    + is almost the same as *, except that the plus sign requires one or more occurrences of the preceding item.
    		m[o]+n --- matches mon, moon, moooon
    
    + and * match any number of repetitions. If you want to match only a certain number of repetitions, then specify the count inside curly brackets {}.
    		m[o]{3}n --- matches mooon
    
    You can even give a range of counts.
    	
    		m[o]{2,4}n --- matches moon, mooon, moooon	
    
    Here there can be 2 to 4 o's between m and n.
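
    The quantifiers can be checked the same way. The following is a minimal sketch, assuming Windows Script Host (cscript.exe); IsMatch is a hypothetical helper name. The last four patterns add ^ and $ so that the count applies to the whole word.

```vbscript
Option Explicit

' Hypothetical helper: True when strValue contains a match for strPattern.
Function IsMatch(strPattern, strValue)
    Dim regEx
    Set regEx = New RegExp
    regEx.Pattern = strPattern
    IsMatch = regEx.Test(strValue)
End Function

' * allows zero o's, + requires at least one.
WScript.Echo "m[o]*n on mn:   " & IsMatch("m[o]*n", "mn")      ' True
WScript.Echo "m[o]+n on mn:   " & IsMatch("m[o]+n", "mn")      ' False
WScript.Echo "m[o]+n on moon: " & IsMatch("m[o]+n", "moon")    ' True

' {3} requires exactly three o's.
WScript.Echo "^m[o]{3}n$ on mooon: " & IsMatch("^m[o]{3}n$", "mooon")   ' True
WScript.Echo "^m[o]{3}n$ on moon:  " & IsMatch("^m[o]{3}n$", "moon")    ' False

' {2,4} accepts two to four o's; five is too many.
WScript.Echo "^m[o]{2,4}n$ on moooon:  " & IsMatch("^m[o]{2,4}n$", "moooon")    ' True
WScript.Echo "^m[o]{2,4}n$ on mooooon: " & IsMatch("^m[o]{2,4}n$", "mooooon")   ' False
```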

    Numeric Data:
    So far we have seen patterns for alphabetic strings. Let's see how to match numeric data. To match digits, use \d, which matches exactly one numeric character.

    		\d --- matches 1, 23423,  abc1, abc1def
    
    You can use *, +, ^, $, [], and {} along with \d, and they have the same meaning as they do with alphabetic patterns.

    So:

    		\d+ --- matches 1, 123, abc1def
    		\d* --- matches 1, 123, abc1def, abc
    		^\d+$ --- matches 1, 123
    		^\d{3}$ ---	matches 123, 234
    		^\d{2,4}$ --- matches 12, 123, 1234
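
    The difference between the anchored and unanchored numeric patterns is easy to miss, so here is a minimal sketch that contrasts them. It assumes Windows Script Host (cscript.exe), and IsMatch is a hypothetical helper name.

```vbscript
Option Explicit

' Hypothetical helper: True when strValue contains a match for strPattern.
Function IsMatch(strPattern, strValue)
    Dim regEx
    Set regEx = New RegExp
    regEx.Pattern = strPattern
    IsMatch = regEx.Test(strValue)
End Function

' Unanchored: one digit anywhere in the target is enough.
WScript.Echo "\d+ on abc1def:   " & IsMatch("\d+", "abc1def")     ' True
' \d* also matches the empty string, so it succeeds on any target.
WScript.Echo "\d* on abc:       " & IsMatch("\d*", "abc")         ' True
' Anchored: every character must be a digit.
WScript.Echo "^\d+$ on abc1def: " & IsMatch("^\d+$", "abc1def")   ' False
WScript.Echo "^\d+$ on 123:     " & IsMatch("^\d+$", "123")       ' True
' Anchored with a count: the length is restricted as well.
WScript.Echo "^\d{3}$ on 1234:   " & IsMatch("^\d{3}$", "1234")     ' False
WScript.Echo "^\d{2,4}$ on 1234: " & IsMatch("^\d{2,4}$", "1234")   ' True
```

    For validation this means a pattern without ^ and $ is rarely what you want, since it accepts any input that merely contains a matching fragment.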
    
    So far we have seen what the patterns look like. There are many more types of patterns. Discussing them is beyond the scope of this article. For further reference about the special characters, visit http://msdn.microsoft.com/scripting/default.htm?/scripting/jscript/doc/jsgrpregexpsyntax.htm.

    RegExp Object

    Microsoft provides the RegExp object. In order to use the RegExp object in VB, select the References menu item from the VB Project menu, then check the "Microsoft VBScript Regular Expressions" entry. No reference or CreateObject is required if you want to use it in VBScript. Once the reference is set, the RegExp object is instantiated with the following code:

        Dim regEx As RegExp
        Set regEx = New RegExp
    
    The RegExp object has a property called IgnoreCase. When set to false, it makes the comparison case sensitive.
        regEx.IgnoreCase = False            ' Set case sensitivity.
    
    Once the Case option is set, we have to set the pattern and then invoke the Test method of the object with the target string to compare. The Test method returns a Boolean. If the pattern matches the target string, then the Test method returns true, otherwise false.
        regEx.Pattern = "^\d{2,4}$" 		' Set pattern.
        Validate = regEx.Test("123")    	 ' Execute the search test.
    
    In the above code, the pattern is set to match a number that is 2 to 4 digits long and the target string is "123", a number 3 digits long. Since the pattern matches the target string, the Test method returns true.
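
    As a complete script, the same steps look roughly like this. It is a minimal sketch in VBScript syntax (untyped Dim), assuming Windows Script Host (cscript.exe); it also shows the effect of the IgnoreCase property and that one RegExp object can be reused with a new pattern.

```vbscript
Option Explicit

Dim regEx
Set regEx = New RegExp

' The numeric pattern from the text above.
regEx.Pattern = "^\d{2,4}$"
WScript.Echo "123 is 2 to 4 digits:   " & regEx.Test("123")     ' True
WScript.Echo "12345 is 2 to 4 digits: " & regEx.Test("12345")   ' False

' The same object can be reused with a different pattern.
regEx.Pattern = "^m[ai]n$"

regEx.IgnoreCase = False
WScript.Echo "Case-sensitive, MAN: " & regEx.Test("MAN")        ' False

regEx.IgnoreCase = True
WScript.Echo "Ignoring case, MAN:  " & regEx.Test("MAN")        ' True
```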

    There is also an Execute method in the RegExp object that returns a Matches collection of Match objects, which we won't discuss here.

    Patterns to Match User Inputs

    Numeric Data Types:
    Below are some sample cases that use regular expressions and their patterns in VB to validate user inputs.

    The first case accepts only positive integers. The pattern is

    	Positive Integer --- ^\d+$
    
    The caret at the beginning and the dollar at the end ensure that the whole target string consists of numeric characters (\d), and the + requires at least one numeric character. If you want to test only for negative numbers, then the pattern is
    	Negative Integer --- ^-\d+$
    
    Observe the "-" after the caret (^) that forces the user to enter "-" before the numeric character. Finally, if you want to test an integer number that can be positive or negative, then the pattern is
    	Integer ---	^-{0,1}\d+$
    
    The {0,1} after the "-" tells the pattern to allow 0 or 1 "-" before the numeric characters.
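
    A few inputs show where the three integer patterns differ. This is a minimal sketch, assuming Windows Script Host (cscript.exe); IsMatch is a hypothetical helper name.

```vbscript
Option Explicit

' Hypothetical helper: True when strValue contains a match for strPattern.
Function IsMatch(strPattern, strValue)
    Dim regEx
    Set regEx = New RegExp
    regEx.Pattern = strPattern
    IsMatch = regEx.Test(strValue)
End Function

Const POSITIVE_INT = "^\d+$"
Const NEGATIVE_INT = "^-\d+$"
Const ANY_INT = "^-{0,1}\d+$"

WScript.Echo "Positive,  42: " & IsMatch(POSITIVE_INT, "42")    ' True
WScript.Echo "Positive, -42: " & IsMatch(POSITIVE_INT, "-42")   ' False
WScript.Echo "Negative, -42: " & IsMatch(NEGATIVE_INT, "-42")   ' True
WScript.Echo "Negative,  42: " & IsMatch(NEGATIVE_INT, "42")    ' False
WScript.Echo "Integer,  -42: " & IsMatch(ANY_INT, "-42")        ' True
WScript.Echo "Integer,   42: " & IsMatch(ANY_INT, "42")         ' True
' Only one leading minus sign is allowed, and only at the front.
WScript.Echo "Integer, --42: " & IsMatch(ANY_INT, "--42")       ' False
WScript.Echo "Integer,  4-2: " & IsMatch(ANY_INT, "4-2")        ' False
```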

    Extend the same logic to match the numbers. (A number can have a decimal; and if it has decimal, then it can have only one decimal and there should be at least one numeric character after the decimal.)

    The pattern to match a positive number is

    	Positive Number --- ^\d*\.{0,1}\d+$
    
    \d* means there can be 0 or more numeric characters before the decimal point. \. represents a period (the decimal point). As a period has special meaning in regular expressions, the \ (backslash) overrides its special meaning so that a literal period can be matched. The {0,1} next to the period allows either one decimal point or none. And the \d+ at the end means there should be at least one numeric character after the decimal point.

    The patterns for checking a negative number and a number that can be positive or negative are

    	Negative Number --- ^-\d*\.{0,1}\d+$
    	Positive Number or Negative Number --- ^-{0,1}\d*\.{0,1}\d+$
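
    The decimal rules are easiest to see on edge cases such as a leading or trailing decimal point. The following is a minimal sketch, assuming Windows Script Host (cscript.exe); IsNumberText is a hypothetical name for a wrapper around the signed-number pattern above.

```vbscript
Option Explicit

' Hypothetical wrapper: True when strValue is a positive or negative
' number according to the pattern from the article.
Function IsNumberText(strValue)
    Dim regEx
    Set regEx = New RegExp
    regEx.Pattern = "^-{0,1}\d*\.{0,1}\d+$"
    IsNumberText = regEx.Test(strValue)
End Function

WScript.Echo "10:    " & IsNumberText("10")      ' True: no decimal point needed
WScript.Echo "3.14:  " & IsNumberText("3.14")    ' True
WScript.Echo "-0.75: " & IsNumberText("-0.75")   ' True
WScript.Echo ".5:    " & IsNumberText(".5")      ' True: \d* allows no leading digits
WScript.Echo "5.:    " & IsNumberText("5.")      ' False: a digit must follow the point
WScript.Echo "1.2.3: " & IsNumberText("1.2.3")   ' False: only one decimal point
```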
    
    A zip code can have 5 digits (99999) or can have a 5-4 (99999-9999) pattern. The module handles the two cases with separate patterns: based on whether there is a "-" in the zip code, it uses the pattern that suits. (A single pattern with an optional group, such as ^\d{5}(-\d{4})?$, would also work.) This is illustrated by the following code. (Here strValue is the target string and strPattern is the pattern.)
                ' Zip Code
                ' It can be 5 digits (99999) or a 5-4 (99999-9999) pattern
                If InStr(strValue, "-") = 0 Then
                    strPattern = "^\d{5}$"
                Else
                    strPattern = "^\d{5}-\d{4}$"
                End If
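
    The fragment above only chooses the pattern. Wrapped in a function that also runs the test, it might look like the following minimal sketch. The function name IsZipCode is hypothetical, and the script assumes Windows Script Host (cscript.exe).

```vbscript
Option Explicit

' Hypothetical wrapper around the zip code logic shown above.
Function IsZipCode(strValue)
    Dim regEx, strPattern
    ' Pick the pattern based on whether the value contains a dash.
    If InStr(strValue, "-") = 0 Then
        strPattern = "^\d{5}$"
    Else
        strPattern = "^\d{5}-\d{4}$"
    End If
    Set regEx = New RegExp
    regEx.Pattern = strPattern
    IsZipCode = regEx.Test(strValue)
End Function

WScript.Echo "80202:      " & IsZipCode("80202")        ' True
WScript.Echo "80202-1234: " & IsZipCode("80202-1234")   ' True
WScript.Echo "8020:       " & IsZipCode("8020")         ' False: too short
WScript.Echo "80202-12:   " & IsZipCode("80202-12")     ' False: extension needs 4 digits
WScript.Echo "802021234:  " & IsZipCode("802021234")    ' False: 9 digits without a dash
```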
    
    Validating a Social Security Number is very similar to validating the zip code. A Social Security Number can also be represented in two forms (999-99-9999 or 999999999). The code to match this pattern is
                ' Social Security Number
                ' It should be of the pattern 999-99-9999 or 999999999
                If InStr(strValue, "-") = 0 Then
                    strPattern = "^\d{9}$"
                Else
                    strPattern = "^\d{3}-\d{2}-\d{4}$"
                End If
    
    Alphabetic Data Types:
    Below are patterns for validating alphabetic data types. If you have a first-name field in your form, then that field should be validated for letters only. (In one of my applications, customer service received an email saying that a first name was not accepted by the form because it contained some numbers.) The pattern to match only letters follows:
    	Alphabets --- ^[a-zA-Z]+$
    
    If, instead of a first-name field, you have a name field that takes both the first name and the last name, then you should also allow a space between the two names. That pattern string is
    	Alphabets with space --- ^[a-zA-Z ]+$
    
    The pattern for an alphanumeric string (alphabets and numbers) is
    	AlphaNumeric --- ^[a-zA-Z0-9]+$
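
    The three alphabetic patterns can be compared on a few names. This is a minimal sketch, assuming Windows Script Host (cscript.exe); IsMatch is a hypothetical helper name. The last case shows that the letters-only pattern also rejects names containing an apostrophe.

```vbscript
Option Explicit

' Hypothetical helper: True when strValue contains a match for strPattern.
Function IsMatch(strPattern, strValue)
    Dim regEx
    Set regEx = New RegExp
    regEx.Pattern = strPattern
    IsMatch = regEx.Test(strValue)
End Function

Const LETTERS = "^[a-zA-Z]+$"
Const LETTERS_SPACE = "^[a-zA-Z ]+$"
Const ALPHANUMERIC = "^[a-zA-Z0-9]+$"

WScript.Echo "Letters, Prasanna:              " & IsMatch(LETTERS, "Prasanna")                ' True
WScript.Echo "Letters, Prasanna Pattam:       " & IsMatch(LETTERS, "Prasanna Pattam")         ' False
WScript.Echo "Letters+space, Prasanna Pattam: " & IsMatch(LETTERS_SPACE, "Prasanna Pattam")   ' True
WScript.Echo "Alphanumeric, Room101:          " & IsMatch(ALPHANUMERIC, "Room101")            ' True
WScript.Echo "Alphanumeric, Room 101:         " & IsMatch(ALPHANUMERIC, "Room 101")           ' False
WScript.Echo "Letters, O'Brien:               " & IsMatch(LETTERS, "O'Brien")                 ' False
```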
    
    The following is a tough pattern. The typical email pattern looks like name@domain.extension. The name must start with an alphabetic character and can contain numbers or special characters, such as period (.), underscore (_), and dash (-), but it should not end with a special character. So the pattern to match the name in the email address is "[a-zA-Z][\w\.-]*[a-zA-Z0-9]".

    The domain name follows the same rules.

    The extension cannot have special characters such as "-" and "_". Putting all of these together gives the pattern for an email address.

    Email -- 
    
    ^[a-zA-Z][\w\.-]*[a-zA-Z0-9]@[a-zA-Z][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$
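
    Because the pattern is long, it helps to run it against a few addresses. The following is a minimal sketch, assuming Windows Script Host (cscript.exe); the function name IsEmail is hypothetical and the example.com addresses are made up for illustration. Note that the pattern as written requires at least two characters in both the name and the domain.

```vbscript
Option Explicit

' Hypothetical wrapper around the email pattern from the article,
' split across lines into name, domain, and extension parts.
Function IsEmail(strValue)
    Dim regEx
    Set regEx = New RegExp
    regEx.Pattern = "^[a-zA-Z][\w\.-]*[a-zA-Z0-9]" & _
                    "@[a-zA-Z][\w\.-]*[a-zA-Z0-9]" & _
                    "\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$"
    IsEmail = regEx.Test(strValue)
End Function

WScript.Echo IsEmail("Prasanna.Pattam@qwest.com") & "  Prasanna.Pattam@qwest.com"   ' True
WScript.Echo IsEmail("1user@example.com") & "  1user@example.com"   ' False: name starts with a digit
WScript.Echo IsEmail("user_@example.com") & "  user_@example.com"   ' False: name ends with _
WScript.Echo IsEmail("user@example") & "  user@example"             ' False: no extension
WScript.Echo IsEmail("a@b.com") & "  a@b.com"                       ' False: one-character name and domain
```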
    
    Finally, there's a validation for the date. This is really tricky. If you are relying on the IsDate function alone to validate dates, then think again. To IsDate, 1/1/200 (a 3-digit year) is a valid date.

    The IsDate function returns true for both the American date (mm/dd/yy) and the European date (dd/mm/yy), so it isn't completely dependable. Once IsDate returns true, the code has to extract the month and year from the date and check that the month is less than or equal to 12.

    The year cannot be 3 characters. The regular expression \d{2,4} accepts a year of 2, 3, or 4 digits, so after the pattern check we also have to reject a 3-digit year. If the year is not 3 characters, then it is either 2 or 4 characters.

    The code that validates the date is

                ' Date
                ' It should be an American date, mm/dd/yy[yy], only and no time
                strPattern = "^\d{1,2}\/\d{1,2}\/\d{2,4}$"
                ' Checking for valid date
                If Not IsDate(strValue) Then
                    bInvalid = True
                Else
                    ' Split the date to check the month and the year
                    Dim arrDate
                    arrDate = Split(strValue, "/")
                    If UBound(arrDate) <> 2 Then
                        bInvalid = True
                    Else
                        ' Checking that month is not greater than 12
                        If CInt(arrDate(0)) > 12 Then bInvalid = True
                        ' Checking that year is not 3 digits
                        If Len(arrDate(2)) = 3 Then bInvalid = True
                    End If
                End If
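
    Combined into one function, the date rules described above might look like the following minimal sketch. The function name IsAmericanDate is hypothetical, the pattern test is run inside the function as an assumption about how the module uses strPattern, and the script assumes Windows Script Host (cscript.exe). Because IsDate depends on the system locale, only the results that follow from the pattern, month, and year checks are annotated as certain.

```vbscript
Option Explicit

' Hypothetical wrapper around the article's date rules
' (mm/dd/yy or mm/dd/yyyy, no time part).
Function IsAmericanDate(strValue)
    Dim regEx, arrDate
    IsAmericanDate = False

    ' Step 1: the overall shape must be digits/digits/digits.
    Set regEx = New RegExp
    regEx.Pattern = "^\d{1,2}\/\d{1,2}\/\d{2,4}$"
    If Not regEx.Test(strValue) Then Exit Function

    ' Step 2: IsDate is locale dependent, so it is only a first filter.
    If Not IsDate(strValue) Then Exit Function

    ' Step 3: the extra checks that IsDate does not make.
    arrDate = Split(strValue, "/")
    If CInt(arrDate(0)) > 12 Then Exit Function   ' month must not exceed 12
    If Len(arrDate(2)) = 3 Then Exit Function     ' reject 3-digit years

    IsAmericanDate = True
End Function

WScript.Echo "1/1/200:    " & IsAmericanDate("1/1/200")      ' False: 3-digit year
WScript.Echo "13/01/2001: " & IsAmericanDate("13/01/2001")   ' False: month is 13
WScript.Echo "1-1-2001:   " & IsAmericanDate("1-1-2001")     ' False: wrong separator
WScript.Echo "12/25/2001: " & IsAmericanDate("12/25/2001")   ' expected True on a US-locale system
```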
    
    

    Putting It Together

    So far we have seen patterns for different user inputs. I wrote one module that validates all of these user inputs. The module also validates boundary conditions for the numeric data types (for example, that a value should be less than 100). The attached file contains a working copy of this module, along with a technical document that explains how to use the function and describes regular expression syntax. The function also has patterns for other possible user inputs that are not discussed here, and it lets the caller supply a custom pattern for special cases.
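
    The attached module is not reproduced here, but the idea of combining a pattern check with a boundary check can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name IsIntegerInRange, its parameters, and the choice of CDbl are hypothetical and may differ from the author's module. The script assumes Windows Script Host (cscript.exe).

```vbscript
Option Explicit

' Hypothetical sketch: check the pattern first, then the numeric range.
Function IsIntegerInRange(strValue, dblMin, dblMax)
    Dim regEx, dblValue
    IsIntegerInRange = False

    Set regEx = New RegExp
    regEx.Pattern = "^-{0,1}\d+$"
    If Not regEx.Test(strValue) Then Exit Function

    ' CDbl is used because CInt and CLng raise an overflow error
    ' for values outside the Integer or Long range.
    dblValue = CDbl(strValue)
    If dblValue >= dblMin And dblValue <= dblMax Then IsIntegerInRange = True
End Function

WScript.Echo "99 within 0..100:   " & IsIntegerInRange("99", 0, 100)    ' True
WScript.Echo "100 within 0..99:   " & IsIntegerInRange("100", 0, 99)    ' False: above the maximum
WScript.Echo "-5 within -10..10:  " & IsIntegerInRange("-5", -10, 10)   ' True
WScript.Echo "abc within 0..100:  " & IsIntegerInRange("abc", 0, 100)   ' False: fails the pattern
```

    Running the pattern test before the conversion means the conversion function never sees non-numeric text, so no type-mismatch error has to be trapped.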

    Conclusion

    In this article you have seen what a regular expression is and how to use it in VB to do field validations of user input. For further information and discussion of pattern syntax in regular expressions, visit the Microsoft sites http://msdn.microsoft.com/library/dotnet/cpguide/cpconintroductiontoregularexpressions.htm or http://www.microsoft.com/windows2000/en/advanced/help/default.asp?url=/WINDOWS2000/en/advanced/help/ISMain-concepts_47.htm.

    About the Author

    Prasanna Pattam is a senior architect at Qwest Communications International, Inc., and has been working in the ASP technologies for the last four years. He can be reached at Prasanna.Pattam@qwest.com.
