|
download source code
Introduction
How would you like to enable searching of your SQL server data using Google syntax instead
of using Transact-SQL? Because of the commonality of the Google Search syntax, web users
find it simple to allow access to construct a Google's search syntax query. This article
will show how to implement a .NET class library to query into a SQL Server using Google
syntax for simple queries.
This article will walk you through installing and configuring of a full-text search on a
table then creating and using a web-page to search that table using Google-style syntax.
Microsoft Full-Text Search
Microsoft has added Full-Text Search to SQL server to help with complex queries. In order
for Full-Text Search to work, you need to have Full-Text Search installed from SQL Server.
Full-Text Search is an installation option of the Database Engine Install, when you install SQL Server.
Examples of Search Differences
One word queries are simple in both Google and T-SQL, however start to use multiple word
queries and negation and the languages differ greatly. Google search syntax makes writing
queries easier for the average user. If you want to find any information about dogs in
Alaska, just type in "dogs Alaska". Google assumes that you don't mean a literal
phrase "dogs Alaska" but instead assumes you mean information about dogs AND Alaska.
SQL Server doesn't make any assumptions for the average user but expects you to use Transact
SQL to say exactly what you want. The following table illustrates the differences with two
search queries.
| What to search for |
Google Syntax |
Transact SQL |
| All items with vacation and Hawaii. |
Vacation Hawaii |
Select * from table where contains (column,'vacation and hawaii')
Or
Select * from table where contains (column,'vacation') AND contains (column,'hawaii')
|
| All items with either vacation or Hawaii but not both. |
Vacation OR Hawaii |
Select * from table where contains (column,'vacation or hawaii') |
It's important to notice that the single quote used in the Transact SQL doesn't mean a
literal phrase but is used by the Full-text search engine. Literal searches in the
Full-Text Search engine would use double quotes to indicate a literal search.
For example: select * from table where contains (column, '"vacation and Hawaii"').
The Contains Keyword
The CONTAINS keyword is used in order to allow for precise as well as fuzzy matches on
a column with Full-Text Search. This is similar to the LIKE keyword but applies some
logic to find allow for prefixes, nearness, inflection, and synonyms. The LIKE
keyword doesn't allow as much ambiguous matches but is more a literal match with
wildcard characters to allow for patterns.
Example Database and Table
This article uses a T-SQL script to create all objects used in this article.
You may need to download this script or follow along with your own Database.
Most actions can be taken in the SQL Server Manager but the T-SQL code is
provider for quicker and scripted implementation.
-
First we create a database to play in and use:
CREATE DATABASE dbGoogleToSQL
USE dbGoogleToSQL
-
Then turn on full-text search (assuming it is installed)
and create a full-text search catalog
-- turn on full-text search
Sp_fulltext_database 'enable'
-- create default catalog named ftCatalogGoogleToSQL
CREATE FULLTEXT CATALOG ftCatalogGoogleToSQL AS DEFAULT
-
Create Table with 2 columns: first is id and second is
searchable text column. The searchable column must be not null.
CREATE TABLE tblFullSearch(
colId uniqueidentifier NOT NULL DEFAULT (newid()),
colText text COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
CONSTRAINT pk_tblFullSearch PRIMARY KEY CLUSTERED
)
-
Populate the table with data.
-- row 1
Insert into tblFullSearch (colText) values ('All good things come to those who code')
-- row 2
Insert into tblFullSearch (colText) values ('I would like to take a vacation')
-- row 3
Insert into tblFullSearch (colText) values ('I would like to take a "vacation and Hawaii"')
-- row 4
Insert into tblFullSearch (colText) values ('I would like to take a ''vacation and Hawaii')
-- row 5
Insert into tblFullSearch (colText) values ('I would like to take a vacation and go to the great, wonderful, interesting island of Hawaii')
-- row 6
Insert into tblFullSearch (colText) values ('Is Maui a nice island?')
-- row 7
Insert into tblFullSearch (colText) values ('Is Maui an island on Hawaii?')
-- row 8
Insert into tblFullSearch (colText) values ('Does ''Maui or Hawaii'' return any rows')
-- row 9
Insert into tblFullSearch (colText) values ('How much sand is in the great state of Hawaii?')
-
Create Full-text index on catalog and populate. You need to populate each data change.
--create index
CREATE FULLTEXT INDEX ON tblFullSearch (colText) KEY INDEX PK_tblFullSearch ON ftCatalogGoogleToSql
--populate index
sp_fulltext_table 'tblFullSearch','start_full'
-
Test index for completion. This table has very few rows so it should be
done indexing quickly but larger tables may take some time. Result set
should be total row count in this example. The result should be 9.
SELECT FULLTEXTCATALOGPROPERTY('ftCatalogGoogleToSql','ItemCount')
-
Test with CONTAINS query. Notice that 3 rows are returned: 1,2,3. Row 3
has both vacation and Hawaii but not in the literal form of 'vacation and hawaii'.
Select * from table where contains (colText,'vacation and hawaii')
Now that the database is ready to use, we need an interface to test out the GoogleToSql
class library. This library takes a Google-style query and column name and returns the
CONTAINS clause for that query.
Google has a great cheat sheet describing their query syntax found at
http://www.google.com/help/cheatsheet.html.
GoldParser and Calitha Parsing
In order to parse the Google syntax and turn it into SQL syntax, the GoogleToSql assembly
uses the GoldParser engine with the Calitha C# engine. The GoldParser lets you define a
language via a Backus-Naur Form (BNF), and compiles that language into a file (.cgt) that
the Calitha engine can use. I have already written the abbreviated version of the Google
Syntax in the BNF and compiled it into a file to use. The GoogleToSql.cgt file is the BNF
for Google (as found at
http://code.google.com/apis/base/query-lang-spec.html)
in a compiled form.
You need to download The Calitha engine at: http://www.calitha.com.
Make sure the GoldParserEngine.dll and the CalithaLib.dll are in the bin directory for the web site
(which we will create later in the article). The GoogleToSql.cgt file (provided from the article download)
needs to be in the web.config specified location called "CompiledGoldParserFileLocation".
The GoogleToSql class library reads the GoogleToSql.cgt file, and feeds it to the Calthita engine.
The GoogleToSql class library takes in the Google-style query and uses the parsing engine and the
compiled .cgt to understand the parts of the query. For each item in a query (as defined by the BNF),
the Calitha engine throws events that the GoogleToSql class library is listening for and translates
into T-SQL for the CONTAINS phrase. This is all done behind the scenes, all you need to do is
include the right files, and call the methods on GoogleToSql assembly.
The web site code needs to pass in the Google-style query and finish building the Transact SQL
query once it gets the CONTAINS phrase back from the GoogleToSql class library. Then execute the
Transact SQL statement, displaying the results on the web page.
Create a Web Site and Add The 3 DLLs
Create a web site in Visual Studio. Add the three DLLs (GoogleToSql.dll, GoldParserEngine.dll, CalithaLib.dll)
in the web site BIN directory. On the web page, add a textbox for the query, a submit button, a grid view
(or whatever you want to display the results in) as well as a SQLDataSource (or however you want to do
the data access to your SQL server).
The code for my default.aspx.cs file looks like:
using System;
using System.Data;
using System.Configuration;
using System.Collections;
using System.Web;
using System.Web.Security;
using System.Web.UI;
using System.Web.UI.WebControls;
using System.Web.UI.WebControls.WebParts;
using System.Web.UI.HtmlControls;
using GoogleToSql;
namespace GoogleToSqlTestWeb
{
public partial class _Default : System.Web.UI.Page
{
String _searchTerm;
String _containsClause;
String _sqlStatement;
protected void Page_Load(object sender, EventArgs e)
{
}
protected void Button1_Click(object sender, EventArgs e)
{
// get Google style query
_searchTerm = this.TextBox1.Text;
if (_searchTerm.Length == 0)
return;
_searchTerm = this.TextBox1.Text.ToLower();
// create new GoogleToSql object
GoogleToSql.ContainsClause _myGoogleToWebConverter = new GoogleToSql.ContainsClause();
if (_myGoogleToWebConverter == null)
return;
// get contains clause from object
_containsClause = _myGoogleToWebConverter.Convert(_searchTerm);
if (_containsClause.Length == 0)
return;
_containsClause = _containsClause.Replace("ccc", "colText");
_sqlStatement = "select * from tblFullSearch where " + _containsClause;
this.SqlDataSource1.SelectCommand = _sqlStatement;
this.GridView1.DataBind();
}
}
}
The .Convert method is the call into the GoogleToSql class library. The phrase
"vacation Hawaii" is parsed using Calitha events and transformed into
"vacation and Hawaii". Use the phrase "vacation hawaii" and
notice that you get three results. The bottom row has vacation and Hawaii but not literally together.
Deleting Database Objects
The T-SQL code for deleting objects created in this article follows:
-- drop index
drop FULLTEXT INDEX ON tblFullSearch
-- drop FULLTEXT CATALOG ftCatalogGoogleToSql
drop FULLTEXT CATALOG ftCatalogGoogleToSql
-- drop table
drop table tblFullSearch
-- switch databases
use model
-- drop database
drop database dbGoogleToSql
Summary
Once you have added the 2 supporting class libraries of GoldParser and Calitha, you can use the
GoogleToSql library to use Google sytle search terms for your SQL Server database. In my next
article I will look at the internals to the GoogleToSql class library, discussing how the
language parsing works, and giving you the code to expand the google syntax or customize
it to your needs.
Required Files
Required 3rd Party Files:
Required Files Provided By Author:
- GoogleToSql.dll
- GoogleToSql.cgt
Web Site Files Provided By Author:
- Default.aspx
- Default.aspx.cs
- Default.aspx.designer.cs
- TestWeb.csproj
- Web.config
- CreateDB.sql
Zip File Containing Files Listed Above: 070719.zip (2 MB)
|