|
Introduction
The adoption of XML as the native file format for Excel 2007 has opened up a
whole host of new possibilities. Now for the first time we can create native
Excel files on the server using standard tools. No longer is there a need to
pollute your beautiful server room with a headless PC running Excel and an old
VB6 app that uses OLE Automation to create reports. Such things can be consigned
to the trash heap of IT history.
However, the new
Office Open XML
standard is so new there are precious few good code samples. This
article aims to correct the situation by providing a flexible set of open source
class libraries that you can use to create Excel spreadsheets on the server.
Why Create Excel Spreadsheets on the Server?
Excel has long been recognized as the de facto standard when it
comes to presenting management reports. The unique combination of great
calculation engine, excellent charting facilities and the possibility to perform
"what if" analysis, make it the "must have" business intelligence tool.
So when we came to replace our aging management reporting infrastructure, we
set one key requirement: the new system must be Web-based and provide a
"download in Excel" option. For our business intelligence project we built the
data warehouse using SQL Server 2005 populated from our PeopleSoft and Novient
implementations using SQL Server 2005 Integration Services (SSIS). The OLAP cube
was built on SQL Server 2005 Analysis Services (SSAS). SQL Server 2005 Reporting
Services (SSRS) provides the Web-based access to management reports and the all
important "download in Excel" option. So why do we need to create Excel on the
server?
The problem with SQL Server 2005 Reporting Services is that it the Excel
spreadsheets it generates are "dumb". They contain no formula - just the raw
data. So the recipient cannot perform a "what if" analysis by changing a few
values and expecting the spreadsheet to recalculate.
We considered a number of ways to overcome this issue, but by far the most
attractive is to create the Excel spreadsheet on the server, straight from the
OLAP cube data. So we created a web-part for SharePoint Server 2007 so the user
could enter their criteria and view the report on-line via Excel Services. Of
course users can download the file for off-line viewing in Excel 2007 or even
Excel 2003 file format. This SharePoint web-part and its associated web service
that does the file format conversion will be the topic of another article.
The Open Source ExcelPackage Assembly
The ExcelPackage assembly is a set of classes and wrappers around the .NET
3.0 System.IO.Packaging API and the new SpreadsheetML file format.
It extracts away the complexity of dealing with the individual XML components
that make up the new Excel 2007 file format. The classes are published as an
assembly called ExcelPackage which you can install in the GAC and use as the
basis of your own applications. In the sprit of open source projects, if you
wish to help extend the functionality offered by the ExcelPackage assembly then
join the team over at the ExcelPackage Open XML project.
Creating an Excel Spreadsheet from Scratch
Sample 1 shows how to create a new Excel spreadsheet containing some basic
data and calculations. So let's see how this is achieved.
using OfficeOpenXml;
...
FileInfo newFile = new FileInfo(@"C:\mynewfile.xlsx");
using (ExcelPackage xlPackage = new ExcelPackage(newFile)) { ... }
This creates a new instance of the all important ExcelPackage
class which gives you access to the Excel workbook and worksheets.
If mynewfile.xlsx already exists, then ExcelPackage will open the
existing file. Otherwise mynewfile.xlsx will be created from
scratch.
Let's start by adding a new worksheet called "Tinned Goods" and adding some
basic data and a simple calculation:
ExcelWorksheet worksheet = xlPackage.Workbook.Worksheets.Add("Tinned Goods");
worksheet.Cell(1, 1).Value = "Product";
...
worksheet.Cell(4, 1).Value = "Peas";
worksheet.Cell(5, 1).Value = "Total";
worksheet.Cell(1, 2).Value = "Tins Sold";
ExcelCell cell = worksheet.Cell(2, 2);
cell.Value = "15";
string calcStartAddress = cell.CellAddress;
worksheet.Cell(3, 2).Value = "32";
...
worksheet.Cell(5, 2).Formula = string.Format("SUM({0}:{1})", calcStartAddress, calcEndAddress);
If all this seems a bit too easy - well yes it is! The
ExcelPackage assembly does all the hard work of creating the XML
elements that are needed to represent an Excel worksheet, the Excel rows, the
Excel cells etc. All you need to do is connect in the data!
The ExcelWorksheet class has all the properties and methods needed
to create and manipulate worksheets. A number of supporting classes (such
as ExcelCell, ExcelRow, ExcelColumn,
ExcelHeaderFooter etc.) provide properties and methods of each
worksheet component. They also provide helper functions that make it
easy to manipulate Excel data. For example, the
ExcelCell.GetCellAddress(iRow, iColumn) method turns your row and
column integers into Excel-style cell addresses.
Ok, so in our sample some of the data is too wide for the column, so let's
change the column size:
worksheet.Column(1).Width = 15;
Next, add some headers and footers to the spreadsheet. Note how we use the
PageNumber and NumberOfPages constants to insert codes into the footer text.
This causes Excel to insert the page number and the number of pages in the
document footer. worksheet.HeaderFooter.oddHeader.CenteredText = "Tinned Goods
Sales";
worksheet.HeaderFooter.oddFooter.RightAlignedText =
string.Format("Page {0} of {1}", ExcelHeaderFooter.PageNumber,
ExcelHeaderFooter.NumberOfPages);
OK, so let's write some real hard code. Let's insert a line into the
worksheet so we can add some more data. This will screw up our formula as it
will be referencing the wrong set of rows (i.e. the new row will not be included
in the total). worksheet.InsertRow(3);
Well hell no, the formula is correct. The InsertRow method not
only updates all the row and cell references in the underlying XML, but also
updates all the formulas in the spreadsheet! Ok, we now have our report, but we
want to ensure our corporate search engine can find the file later. So let's add
some standard and custom document properties.
xlPackage.Workbook.Properties.Title = "Sample 1";
xlPackage.Workbook.Properties.Author = "John Tunnicliffe";
xlPackage.Workbook.Properties.SetCustomPropertyValue("EmployeeID", "1147");
Now save the file and all its components.
xlPackage.Save();
Below is a screenshot of the final output showing the header and the document
properties.
Reading Data from an Excel Spreadsheet
Sample 2 shows how to read data from an existing Excel spreadsheet. We
will use the spreadsheet generated by Sample 1 as the source document. To
output the contents of column 2 to the console, this is all we need:
using (ExcelPackage xlPackage = new ExcelPackage(existingFile))
{
ExcelWorksheet worksheet = xlPackage.Workbook.Worksheets[1];
int iCol = 2;
for (int iRow = 1; iRow < 6; iRow++)
Console.WriteLine("Cell({0},{1}).Value={2}", iRow, iCol, worksheet.Cell(iRow, iCol).Value);
Console.WriteLine("Cell({0},{1}).Formula={2}", 6, iCol,
worksheet.Cell(6, iCol).Formula);
}
Using a Template to Create an Excel Spreadsheet
Sample 3 shows how to create a new Excel spreadsheet based on an existing
file and populate it with data from a database. This is a much better
approach as you can quickly create a spreadsheet with the right formula and the
correct corporate 'look and feel' using Excel 2007. You can then have the
calculations in your template validated by the business before starting to write
any code. This whole approach saves a lot of coding time!
Before running the code sample, open the template and take a look at its
content. You will see it already has the desired layout and all the
formula and formatting required for the title and total lines.
However, it only has room for three "data rows" (i.e. rows 5, 6 & 7).
You will see how we cope with this later.
So let's start by creating a new Excel spreadsheet based on a template.
using OfficeOpenXml;
assembly
...
FileInfo newFile = new
FileInfo(@"C:\sample3.xlsx");
FileInfo template = new
FileInfo(@"C:\sample3template.xlsx");
using (ExcelPackage xlPackage = new
ExcelPackage(newFile, template)) {...}
Behind the scenes, the ExcelPackage constructor simply copies the template
and opens the new package. Now obtain a reference to the existing
worksheet and initialize some variables: ExcelWorksheet worksheet = xlPackage.Workbook.Worksheets["Sales"];
ExcelCell cell;
const int startRow = 5;
int row = startRow;
Next open a connection to the database and run the query. This example uses
data from the AdventureWorks sample database, so you will need this installed if
you want to run the sample.
while (sqlReader.Read())
{
int col = 1;
if (row > startRow) worksheet.InsertRow(row);
for (int i = 0; i < sqlReader.FieldCount; i++)
{
if (sqlReader.GetName(i) == "EmailAddress")
{
string hyperlink = "mailto:" + sqlReader.GetValue(i).ToString();
worksheet.Cell(row, 1).Hyperlink = new Uri(hyperlink, UriKind.Absolute);
}
else
{
if (sqlReader.GetValue(i) != null)
worksheet.Cell(row, col).Value = sqlReader.GetValue(i).ToString();
col++;
}
}
row++;
}
So now we have filled our worksheet with the entire dataset. Note how
we use the email address as a hyperlink. Using hyperlinks is useful when
you want to link one report up with another.
The purist among you will notice that all the data is written into the cell
as a string. However, the Cell(row, col).Value = "xxx"
property assignment code checks if the value is a number or a string and
sets the cell's data type accordingly.
As mentioned earlier, the template only has room for three data rows.
We cope with this by simply inserting rows into the template - thereby pushing
the "Total" row down the sheet. The InsertRow method
automatically updates the formula in the "Total" row so that they take into
account the extra rows.
As we have inserted a whole set of rows into the spreadsheet, they will not
have the correct style. We correct this by simply iterating through the
new rows and copying the style from the first row to all the other rows.
for (int iCol = 1; iCol <= 7; iCol++)
{
cell = worksheet.Cell(startRow, iCol);
for (int iRow = startRow; iRow <= row; iRow++)
{
worksheet.Cell(iRow, iCol).StyleID = cell.StyleID;
}
}
The Power of Named Styles
Anyone familiar with styling HTML with CSS will understand the power and
flexibility of using named styles rather than updating the style of every
individual element. With named styles, the look and feel of the whole
spreadsheet can be altered by changing one style definition. This
capability was introduced in Excel 2003, but Excel 2007 goes one step further
and makes it a dream to create a template using named styles.
We apply two built-in named styles to highlight the top achieving and the
worst performing sales reps.
worksheet.Cell(startRow, 6).Style = "Good";
worksheet.Cell(row, 6).Style = "Bad";
The biggest problem with named styles in Excel 2007 is that if they are not
used in your template, then Excel strips out the definition when the file is
saved. This is a real headache. There are two ways to cope with this
(1) add extra rows that have styles applied and delete them later (which is the
technique used in this sample) or (2) load your own style.xml file which
contains all the definitions you want to use.
Shared Formula
Excel 2007 has a neat feature which saves a lot of coding when it comes to
applying the same formula to a range of cells. A formula in one cell can
be marked as "shared" and all cells referenced by the shared formula obtain
their own version of the formula. So if cell E5 has the
formula D5*12, then cell E6 would have the formula
D6*12 etc. etc. To set up a shared formula simply call the
CreateSharedFormula method. In the following example, the formula
in cell E5 is marked as "shared" and all the other cells in the
range E5:E21 are assigned their own variation of the formula.
worksheet.CreateSharedFormula(worksheet.Cell(5, 5), worksheet.Cell(21, 5));
Ensuring Formula are Recalculated on File-Open
One problem we came across with Excel 2007 is that it does not automatically
re-calculate the spreadsheet when it is re-opened - even when the Calculate
option set to automatic! This is because the existing cells in the
template have both a formula and a value in the cell. So Excel
just assumes the value is correct and does not attempt to re-compute the
formula. Of course, we have just added twenty rows of data and updated the
formula references in the XML - but Excel has no why of knowing this, so assumes
the values must be right!
The only way to force the recalculation is to ensure the cell has no value -
just a formula. So the RemoveValue() method becomes very useful for
all formula in the worksheet. Hence:
worksheet.Cell(22, 5).RemoveValue();
Because of this phenomenon, we changed the ExcelCell.Formula property
assignment code so that it removes the cell's value when you assign the cell a
formula.
The final output of Sample 3 code should look something like this - much more
professional than anything that can be achieved starting from scratch.
Integrity Issues
As soon as you start deleting rows or even worksheets from the package, you
have potential for integrity issues. Your formulas will reference cells
(or worksheets) that no longer exist. The ExcelPackage assembly does a
good job of tidying up after you - but cannot cope with complex
situations. You will soon know if you have an integrity problem - Excel
will complain bitterly when opening the newly created file.
A classic problem is the calcChain.xml file. This tells
Excel in what order the calculations should be processed. So if you delete
a row that is referenced by the calcChain, Excel will complain. However,
if you simply remove the calcChain.xml from the package, Excel
re-creates it when the file is opened - and does not complain! So this is
an easy fix. The ExcelPackage assembly does exactly that - deletes the
calcChain.xml file from the template so that Excel simply
re-creates it when the file is opened.
The ExcelPackage assembly also provides you with direct access to each of the
XML documents that make up the package. So you can write your own code to
manipulate the XML directly. However, if you choose to do this, be careful
to ensure the XML conforms to the new Office Open XML standard. Otherwise,
Excel will simply strip out your all hard work as "badly formed".
Debugging your Application
If you want to understand exactly what is been written into each of the
component XML files by the ExcelPackage assembly, then simply add the following
line of code:
xlPackage.DebugMode = true;
This will cause the assembly to output the raw XML files in the same location
as the output file. You will see a sub-folder called 'xl' and another
callled 'docProps'.
Summary
This article has demonstrated just how easy it is to create Excel-based
reports on the server using the open source ExcelPackage assembly. We hope
you will be able to join us in extending the functionality of the assembly over
at the ExcelPackage Open XML project.
There is plenty still to do; charting, conditional formatting, inserting comments, to name just a few!
Good luck with your project!
Dr. John Tunnicliffe
|