Creating and Using the DataTableReader Class
One of the really great new features in
version 2.0 of ADO.NET is the new class named DataTableReader. You can use it to stream
data out of a DataSet or a stand-alone DataTable instance, just like you stream data from a relational database using a DataReader instance such as SqlDataReader or OleDbDataReader.
The DataTableReader exposes much the same set of methods and properties as the OleDbDataReader class (it does not expose the set of GetSqlxxx methods that are
available in SqlDataReader because it cannot used against a SQL Server database).
The methods available for DataTableReader include Read, NextResult, Close, GetName, GetOrdinal, and
the full set of methods to retrieve values as specific types - such as GetInt32, GetString, GetByte etc. Properties
include Depth, FieldCount and HasRows.
And you can read the values in each row using the Item collection, or by indexing the row
on the column name or the ordinal position - just as you do with a SqlDataReader.
The DataTableReader is a "live" reader, in that it reflects the values of the rows in the
table at the time the row is read, and not at the time the reader is opened (in
other words it is not operating over a snapshot of the table, but is accessing
the rows in the table directly when you call the Read method).
You can get an instance of a DataTableReader by calling the GetDataReader method (in Beta 1) or the CreateDataReader method (in Beta 2) of a DataSet or a single DataTable.
For a DataSet,
in Beta 1, the options are:
myDataTableReader = myDataSet.GetDataReader()
myDataTableReader = myDataSet.GetDataReader(array-of-tables)
In Beta 2 you use:
myDataTableReader =
myDataSet.CreateDataReader()
myDataTableReader =
myDataSet.CreateDataReader(array-of-tables)
If you don't provide a parameter to the GetDataReader/CreateDataReadermethod, it returns a reader over all the tables in this DataSet. Instead, if you only want to read some of the tables, or require
them to be read in a specific order, you can provide an array of DataTable references to get a DataTableReader that
will return a rowset for each of the tables in the array.
For a DataTable,
you just call the GetDataReader/CreateDataReader method with no parameters, to get a DataTableReader that is open over just
the DataTable that this method is called on:
myDataTableReader = myDataTable.GetDataReader()
{Beta 1}
myDataTableReader = myDataTable.CreateDataReader()
{Beta 2}
An alternative approach is to use the constructors of the DataTableReader class to create a reader over a table. You can create a DataTableReader over an existing DataTable using:
DataTableReader myReader = new DataTableReader(myDataTable)
Or you can create a DataTableReader over more than one DataTable by
specifying references to the tables in an array that you pass to the
constructor:
DataTableReader myReader = new
DataTableReader(array-of-tables)
You can open multiple DataTableReader instances over a table at
the same time, and read from them in any order (though they are, like the DataReader,
forward-only readers).
Using the DataTableReader Class to Copy a Table
Another great feature of the DataTableReader class is that it implements
the IDataReader interface. This means that you can use it as the data source to populate a DataSet or DataTable though
the Load method. For example, in Beta 1, you can create a DataTableReader on a DataTable, and
then use it to fill another DataTable like this:
myDataTableReader = myFirstDataTable.GetDataReader()
mySecondDataTable.Load(myDataTableReader)
The code shown next demonstrates this by loading a DataSet with a
table named Customers that contains five rows, and then creating a new DataTable named CustomerCopy within the DataSet. It then
uses the constructor of the DataTableReader class to create a new DataTableReader instance over the Customers table and load these rows into the new CustomerCopy table:
builder = new StringBuilder();
DataSet ds = new DataSet();
// create DataAdapter and get a DataSet
of rows
SqlDataAdapter da = new
SqlDataAdapter(SQL, connect);
da.Fill(ds, "Customers");
// display details of table in DataSet
builder.Append(String.Format("DataSet
contains table '{0}' "
+ "with {1} rows<p
/>",
ds.Tables[0].TableName,
ds.Tables[0].Rows.Count.ToString()));
// add new table to DataSet named
"CustomerCopy"
DataTable dt = new DataTable();
dt.TableName = "CustomerCopy";
ds.Tables.Add(dt);
// create DataTableReader over original
"Customers" table
DataTableReader reader = new
DataTableReader(ds.Tables["Customers"]);
// load new table with rows from original
table
dt.Load(reader);
// display rows in the GridView control
grid1.DataSource = dt;
grid1.DataBind();
Figure 7 shows the results of the code listed above. You can
see that five rows were read from the database into the DataSet, and then these five rows were
copied into the new table named CustomerCopy using a DataTableReader (you can use the [view
source] link at the foot of the page to see
the source code).

Figure 7 - Reading a Table with a DataTableReader and
Loading another DataTable
Using a DataTableReader to Copy Tables between DataSets
You can also use a DataTableReader to copy table rows from
one DataSet to another. For example, you can create a DataTableReader on a table in one DataSet, and then
use it to fill a table in another DataSet, like this:
// create array of source table references
DataTable[] tablearray = new DataTable[3];
tablearray[0] =
myFirstDataSet.Tables["Table"];
tablearray[1] = myFirstDataSet.Tables[1];
tablearray[2] =
myFirstDataSet.Tables["SomeOtherName"];
// get a DataTableReader over these tables
myDataTableReader = myFirstDataSet.GetDataReader(tablearray);
// create array of destination table names
String[] tablenames = new String[] {"Orders",
"Order Details", "Customers"};
// load data into destination tables
mySecondDataSet.Load(myDataTableReader,
LoadOption.OverwriteRow, tablenames);
As an example of this technique, the page shown in Figure 8
uses code similar to that listed above to copy three tables from one DataSet to
another. It fills a DataSet with three tables using the Fill method of a DataAdapter,
and - because the table names are not specified - they are automatically
assigned the names Table, Table1 and Table2. Then the
code creates a DataTableReader over these three tables using the GetDataReader method, and uses this to
load three tables in a new DataSet.

Figure 8 - Copying Rows between DataSets using a
DataTableReader
You can even use this technique to copy rows that are not all
in the same DataSet.
In other words, you can create a DataTableReader that will stream data from tables in
different DataSet instances. For example:
DataTable[] tablearray = new DataTable[3];
tablearray[0] =
myFirstDataSet.Tables["SomeTable"];
tablearray[1] =
mySecondDataSet.Tables["AnotherTable"];
tablearray[2] = myThirdDataSet.Tables[0];
// get a DataTableReader over these tables
myDataTableReader =
myFirstDataSet.GetDataReader(tablearray);
...
Batched Updates from a DataSet or DataTable
When you call the Update method of the version 1.xDataAdapter to
push the changes in a DataSet into the database, it iterates the rows in the DataSet looking for any that have their RowState property
set to Added, Deleted or Modified. For each
one it finds, it executes the command specified for the appropriate InsertCommand, DeleteCommand or UpdateCommand property of the DataAdapter.
This means that each update requires a round-trip to the database.
In version 2.0, you can instruct the DataAdapter to perform the updates in
batches, where the database you are using supports this feature (most do,
including SQL Server 7.0 and above, and Oracle when using the OracleClient classes). All that's required is to set the UpdateBatchSize property of the DataAdapter to an
integer value greater than 1:
myDataAdapter.UpdateBatchSize = batch_size
The default value for this property, if you
do not set it, is 1; and so the Update behaviour defaults to that of
version 1.x. Using the value zero forces all rows in the DataSet to be
processed as a single batch.
The batched update feature works inside any connection-based
transactions that you might have implemented, and dramatically reduces the
number of network round-trips required to perform the updates from the DataSet. And the
great thing is that it is virtually transparent in that, if the database you
are using cannot support batched updates, the DataAdapter automatically reverts to
using individual updates for each changed row.
Handling the RowUpdated Event for Batched Updates
As in version 1.x, the DataAdapter in version 2.0 exposes an event named RowUpdated, which you may find useful when performing batched updates. It is
fired when each batch is completed, and exposes properties that you can use to
get information about the process. Note that you don’t have to handle
this event when performing batched updates - but can be used to implement
process that require access to each batch update if you require this
functionality.
To use the RowUpdated event, you just declare a SqlRowUpdatedEventHandler routine, which receives a reference to a SqlRowUpdatedEventArgs instance that
contains information about the current batch. You can get a count of the number
of rows in the current batch from the RowCount property, and the number of rows
updated from the RecordsAffected property. There is also a property named StatementType that indicates the type of
statement being executed (Update, Insert or Delete), and a
write-only property named Status that you can set to indicate if the process should continue or halt (using a
value from the UpdateStatus enumeration).
The SqlRowUpdatedEventArgs class also exposes references to the current Command, any exceptions that have
occurred (the Errors property), and the table mapping in use by the DataAdapter (the TableMapping property). Finally, the Row property
returns a reference to the last row that was updated. However, one point to
watch out for is that the Row property returns null if there are fewer rows in the batch than you specified for the batch size
(which is likely to be the case for the last update batch).
The following listing shows a simple event handler for the RowUpdated event,
and the way that it is attached to the RowUpdated property of the DataAdapter:
// add event handler for RowUpdated event
da.RowUpdated += new
SqlRowUpdatedEventHandler(RowUpdatedHandler);
...
...
private void RowUpdatedHandler(Object
sender, SqlRowUpdatedEventArgs args)
{
builder.Append(String.Format("RowUpdated event, StatementType =
'{0}'<br />",
args.StatementType.ToString()));
builder.Append(String.Format("Updated {0} row(s)<br />",
args.RecordsAffected.ToString()));
try
{
// need try..catch as row may be null
if this is the last batch
builder.Append(String.Format("OrderID:
{0}, CustomerName: {1}, "
+ " City:
{2}",
args.Row["OrderID"],
args.Row["ShipName"], args.Row["ShipCity"]));
}
catch { }
builder.Append("<p />");
}
Viewing the Effects of Batched Updates
We provide an example page that
demonstrates the effects of setting the UpdateBatchSize property
of the DataAdapter when executing the Update method. Figure 9
shows the page, with the value zero selected for the UpdateBatchSize property. This causes the DataAdapter to attempt
all the updates within a single batch. The RowUpdated event handler
displays the statement type ("Batch"), the number
of rows that were updated within the current batch, and some values from the
last row that was updated. As you can see from Figure 9, all 77 rows stored in the DataSet were updated
within the database as a single batch.

Figure 9 - The Batched Updates Example When the Batch
Size is 1
If you select the value 1 for the UpdateBatchSize property (this is the
default is not specified), the DataAdapter reverts to its version 1.x behavior and
updates each row individually. Figure 10 shows the results, and you can see
that the statement type is now "Update" (rather than "Batch"), and
only one row is updated each time.

Figure 10 - The Result When the Batch Size is 1
Selecting the value 20 for the UpdateBatchSize causes the DataAdapter to
perform the updates in batches of 20, and the RowUpdated event is therefore raised four
times for the 77 rows in our DataSet.
Each time except for the last batch, the values in the last row within the
batch are displayed. However, for the last batch, there is no "current
row" when it completes and so no row values are shown.

Figure 11 - The Result When the Batch Size is 20
Finally, selecting a value greater than the total number of
rows in the DataSet table gives the same behavior as you get when selecting the value zero for the UpdateBatchSize property. The DataSet performs all the updates as a single batch, as you can see in Figure 12. Just
bear in mind the limitations on batch sizes, and so using a specific value is
probably a good idea in Beta 1; rather than using zero and relying on the DataAdapter to
manage the batch size. This may change, however, for the release version of
ADO.NET 2.0.

Figure 12 - The Result When the Batch Size is 100
Performance and Scalability Improvements
The DataSet acts rather like a relational database in the way that it stores and manages
rowsets. OK, so you can't execute SQL queries against the data, but the DataSet does
contain a lot of functionality that comes into play when you add, delete and
edit the data rows it holds. For example, if you have specified a primary key
for a table, the DataSet will not allow you to add new rows that match existing rows on the primary key,
or change the value of the primary key in one row to match that in any other
existing row.
Likewise, if you have a DataRelation defined between two tables
in the DataSet,
you can't (by default) violate referential integrity by adding new child rows
where there is no matching parent row. And (again, by default) if you delete or
edit the primary key value of a parent row, the matching child rows will be
deleted or updated automatically.
All of this means that the DataSet has to manage the rows it
contains. To maximize performance, and avoid operations such as a table scan to
check for existing values in each row, the DataSet maintains a set of internal indexes
on the rows. However, it does seem that Microsoft initially though the number
of rows developers would store in a DataSet would be modest, and so the
internal indexing algorithms are not as efficient as they could be. And, as it
turns out, developers seem to regularly use a DataSet to store huge numbers of rows -
10,000 to 100,000 being quite common. At this point, the process of inserting,
deleting and updating rows can slow down quite noticeably, and cause issues
with application response times and overall throughput.
The Version 2.0 DataSet and DataTable Load API
To solve this in version 2.0, Microsoft has optimized the
internal indexing of the DataSet through a complete re-write of the code that takes a new approach (called
"red/black trees"). The result is that inserting and deleting rows
gives a performance that drops by only around log-n as the number of
rows (n) increases. Meanwhile, performance for updates is virtually
constant irrespective of the number of rows.
OK, for less than around 30,000 rows, the difference in
performance (based on figures produced using the Beta release) is not really
noticeable. However, as the number of stored rows increases, the performance
gains are a lot more obvious. Preliminary figures show a reduction in loading
time of 25% at 30,000 rows and 50% at 50,000 rows.
This gain in performance is implemented through the internal
interfaces of the DataSet and DataTable classes, in what's knows as the "Load API". This API is used by many
methods of the DataSet and DataTable classes, including the Fill and the new Load methods we discussed earlier, and so the performance gains within the API are reflected
in improved performance of these methods as well.
The Version 2.0 DataSet Persistence Formats
It's worth looking briefly at some other changes to the DataSet that
improve performance and usability, but this time when saving or persisting
data. A common scenario for the DataSet is to persist it
across post-backs in an ASP.NET application, and deliver it across the tiers of
an application - or through a Web Service - to another location where it acts
as a data repository. The data it contains must therefore be serializable so
that it can be streamed across a network, stored in a disk file, or simply
persisted in memory.
The DataSet already exposes
methods to serialize the contents. It provides methods that can be used to
extract the data as XML (WriteXml and GetXml), and - because it implements the ISerializable interface - you can use the Serialize method to serialize a DataSet as an object to a Stream.
This allows you to do things like store it in the ASP.NET Session object, or
pass it back as a return value from a Web Service.
However, even if you use a BinaryFormatter and write you own code to
call the Serialize method of a DataSet in version 1.x, the content is always persisted as XML. Now, XML is a
great format for data interchange, but is less than ideal for applications that
simply want to pass data across a network. The XML format is "wordy"
and produces streams and files that are larger than most other data
representational formats.
As a result, binary serialization is now an option in version
2.0 of the DataSet class. The format is different from that used in pre-.NET version of the ADO Recordset, and is
optimized purely for performance with the version 2.0 DataSet. It is a fast to create and read,
and is especially compact, providing huge gains in performance as row counts within
the DataSet increase.
For example, the overall serialization, transmission and reloading sequence of
operations can be 80 times faster for a Dataset containing 100,000 rows. And even
with just a few hundred rows, there is a notable difference - as demonstrated
by the next example.
Activating Binary Persistence Format in a DataSet
To force a DataSet to persist the data in binary format, you just set
the new RemotingFormat property to the appropriate value from the SerializationFormat enumeration. The two
values in the enumeration are Binary and Xml. The following code uses a BinaryFormatter to create a disk file
containing the persisted content of a DataSet - first in XML format and then in
binary format:
// serialize the DataSet to a disk file as
XML
SerializeDataSet(ds,
"xml-from-dataset.xml", SerializationFormat.Xml);
// serialize the DataSet to a disk file in
binary format
SerializeDataSet(ds, "binary-from-dataset.dat",
SerializationFormat.Binary);
It calls a routine named SerializeDataSet twice, once with the value SerializationFormat.Xml and once with the value SerializationFormat.Binary for the third parameter. The first parameter is a reference to a populated DataSet (containing, in this example, around 800 rows) and the second parameter is the
name of a file to persist the DataSet contents to. The SerializeDataSet routine, with the
error-handling code removed for clarity, is shown below:
static void SerializeDataSet(DataSet ds,
String filename,
SerializationFormat format)
{
String path = @"C:\Temp\" +
filename;
// specify the remoting format for the
DataSet
ds.RemotingFormat = format;
// create a BinaryFormatter to serialize
the contents
IFormatter formatter = new
BinaryFormatter();
using (Stream output = new
FileStream(path, FileMode.Create,
FileAccess.Write, FileShare.None))
{
// serialize DataSet, tracking the time
taken
DateTime start = DateTime.Now;
formatter.Serialize(output, ds);
TimeSpan span =
DateTime.Now.Subtract(start);
builder.Append(String.Format("Serialization took {0} milliseconds, ",
span.Milliseconds.ToString()));
// display resulting file size
FileInfo fi = new FileInfo(path);
builder.Append(String.Format("File
size is {0} KBytes.<p />",
(fi.Length
% 1024).ToString()));
output.Close();
}
}
Figure 13 shows the result of running this example. You can
see that serialization in binary format was almost six times faster. And,
examining the resulting files, it's clear that the binary file is also much smaller
- around one quarter the size of the XML-formatted file. When combined with the
reduced network transmission time and the decreased reloading time for the data
when re-creating the DataSet,
you can see that dramatic performance improvements are available when you don't
need the data interchange capabilities of XML.

Figure 13 - Persisting the Content of a DataSet in Binary
Format
Summary
In this, the first of two articles that discuss the
enhancements to the DataSet class in version 2.0 of the .NET Framework, we looked at some new capabilities
when filling a DataSet,
including the new LoadOption enumeration that provides better control over how the incoming values are
handled when the DataSet tables contain rows that match on the primary key - a common scenario when you
refresh and update a DataSet through a DataAdapter.
The DataSet and the new stand-alone DataTable classes both support a new method named Load that can be used to fill tables with
data from a DataReader.
This also uses the new LoadOption enumeration to control the way that matching rows are updated.
The surfacing of the DataTable class as a stand-alone object
that can be used to store and manipulate data without requiring you to
explicitly create a DataSet is also a useful new feature. The DataTable now supports most of the
methods that were previously only available on the DataSet itself, such as reading and
writing XML, merging tables, and more.
Another useful new feature in version 2.0 is the provision
of a new class named DataTableReader,
which can be used to stream data out of a DataTable. As the DataTableReader implements the IDataReader interface, this means that it can also be used - like the other DataReader classes
such as SqlDataReader and OleDbDataReader - as the source for the new Load method.
The DataSet also has several features to improve performance and scalability. Updates to
the data store through the Update method of a DataAdapter can now be performed in batches, instead of individual commands which can eat
up network bandwidth. Plus, there have been huge improvements to the internal
indexing techniques used by the DataSet, and it now also allows a new binary format to be
specified when persisting the DataSet contents to a file or stream. Both of these
features can dramatically improve performance when you use a DataSet that
contains a large number of rows.
In the following article, we continue to examine the new
features of the DataSet,
but this time looking at two fundamental changes that support new features
available in SQL Server 2005. We'll look at how you can use the new XML data
type that SQL Server 2005 supports in a DataSet, and how the DataSet can now
read and manipulate user-defined data types (UDTs), which can also be stored in
and exposed by SQL Server 2005.