Merry Christmas! 🎄🎅🎁
When working with large data sets, exporting data to Excel files often becomes challenging due to memory constraints. Recently, I undertook the task of optimizing memory consumption for this operation and successfully reduced the memory usage from over 50,000 MB to less than 500 KB. This article outlines the approach, the key improvements I implemented, benchmark results to support these claims, and the solution's limitations.
Libraries Used Initially
OpenXML SDK
OpenXml is a popular cross-platform library for creating and editing Excel files programmatically.
My initial code looked like this.
First of all, let's install the OpenXML NuGet package:
The usage of OpenXML SDK for exporting data to Excel is shown below:
While powerful and flexible, it allocates memory for each cell into memory, leading to high memory consumption for large datasets.
Other XLSX Libraries
Additional libraries such as
MiniExcel
andClosedXML
were tested to save models into XLSX format. Still, similar memory inefficiencies were observed, especially for operations requiring extensive formatting or handling large datasets.
My Solution: A Custom Library
To address this issue, I decided to develop a custom library tailored to the specific needs of the operation. The primary goals were:
Efficient Memory Management: Avoid unnecessary object creation and keep memory overhead minimal.
Optimized Data Streaming: This method handles data in small chunks rather than loading the entire dataset into memory.
Direct File Manipulation: Minimize intermediate processing by directly interacting with the file structure.
Excel File Structure
Before looking at the code changes, let's understand the Excel file structure. An Excel file is essentially a ZIP archive containing various XML files organized into a specific structure. Key components include:
Workbook: The container for all sheets in the file.
Worksheets: Individual sheets containing rows and columns of data.
Shared Strings: A table for storing reusable text strings to reduce duplication.
Styles: Definitions for formatting, including fonts, colors, and borders.
Metadata Files: Files like
[Content_Types].xml
andapp.xml
that store information about the workbook and its structure.
CustomExcelService
So the custom Excel service has the following code:
Key Improvements
1. Streaming Data Directly to Disk
The main bottleneck with existing libraries was their approach of loading the entire dataset into memory before writing to disk. In my custom implementation:
Data rows are written incrementally to the file, ensuring that memory usage stays consistent regardless of the dataset size.
This approach is particularly effective for large files as it avoids keeping a full in-memory representation of the file.
2. Avoiding Unnecessary Formatting
Many third-party libraries include extensive formatting capabilities by default. While this is convenient, it adds significant memory overhead. My library:
Strips down formatting operations to only what’s necessary.
Uses lightweight mechanisms for applying styles, avoiding duplication and excessive data structures.
3. Efficient Object Handling
Object creation is expensive in terms of memory. To reduce this overhead:
I reused objects where possible, such as cell definitions and styles.
Leveraged pooled memory for repetitive tasks, minimizing garbage collection pressure.
4. Simplified API
Focusing on simplicity allowed me to remove unneeded abstractions:
The API directly maps dataset rows to Excel rows, avoiding complex intermediate transformations.
This reduced not only memory usage but also the overall execution time.
Benchmarks
To validate the improvements and ensure accuracy, I used the Benchmark.NET library. Below are the results for exporting a dataset:
The benchmark for exporting IEnumerable<IEnumerable<string>>
(N represents number of columns and rows):
Method | N | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|---|---|
RunOpenXmlService | 10 | 1.736 ms | 0.0647 ms | 0.1898 ms | 1.747 ms | 1.39 | 0.20 | 23.4375 | 3.9063 | - | 144.66 KB | 3.20 |
RunClosedXmlService | 10 | 2.862 ms | 0.0559 ms | 0.0903 ms | 2.868 ms | 2.29 | 0.22 | 78.1250 | 23.4375 | - | 487.52 KB | 10.78 |
RunCustomExcelService | 10 | 1.260 ms | 0.0387 ms | 0.1142 ms | 1.260 ms | 1.01 | 0.13 | 5.8594 | 3.9063 | - | 45.24 KB | 1.00 |
RunMiniExcelService | 10 | 8.661 ms | 0.3292 ms | 0.9706 ms | 8.702 ms | 6.93 | 1.00 | 2046.8750 | 2015.6250 | 2000.0000 | 18216.76 KB | 402.69 |
RunOpenXmlService | 1000 | 2,375.323 ms | 75.0723 ms | 221.3524 ms | 2,347.398 ms | 6.44 | 1.29 | 73000.0000 | 38000.0000 | 3000.0000 | 528389.66 KB | 6,262.69 |
RunClosedXmlService | 1000 | 3,229.506 ms | 119.1534 ms | 347.5761 ms | 3,215.195 ms | 8.75 | 1.82 | 157000.0000 | 7000.0000 | 3000.0000 | 1299735.07 KB | 15,404.98 |
RunCustomExcelService | 1000 | 382.250 ms | 26.5685 ms | 77.0800 ms | 360.150 ms | 1.04 | 0.28 | - | - | - | 84.37 KB | 1.00 |
RunMiniExcelService | 1000 | 1,155.879 ms | 71.9980 ms | 201.8902 ms | 1,129.137 ms | 3.13 | 0.78 | 81000.0000 | 5000.0000 | 1000.0000 | 509714.45 KB | 6,041.34 |
The benchmark for exporting IEnumerable<CustomType>
(N represents number of rows, Columns=10):
Method | N | Mean | Error | StdDev | Median | Ratio | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
---|---|---|---|---|---|---|---|---|---|---|---|---|
RunOpenXmlService | 100 | 4.520 ms | 0.2621 ms | 0.7727 ms | 4.738 ms | 1.71 | 0.31 | 93.7500 | 46.8750 | - | 639.57 KB | 14.26 |
RunClosedXmlService | 100 | 11.297 ms | 0.2786 ms | 0.7721 ms | 11.450 ms | 4.28 | 0.41 | 250.0000 | 125.0000 | - | 1779.98 KB | 39.69 |
RunCustomExcelService | 100 | 2.651 ms | 0.0640 ms | 0.1866 ms | 2.610 ms | 1.00 | 0.10 | - | - | - | 44.85 KB | 1.00 |
RunMiniExcelService | 100 | 8.700 ms | 0.3912 ms | 1.1535 ms | 8.881 ms | 3.30 | 0.49 | 1859.3750 | 1796.8750 | 1796.8750 | 18451.16 KB | 411.39 |
RunOpenXmlService | 1000 | 38.818 ms | 0.5216 ms | 0.4879 ms | 38.665 ms | 4.05 | 0.95 | 923.0769 | 615.3846 | 384.6154 | 6172.39 KB | 137.61 |
RunClosedXmlService | 1000 | 60.844 ms | 4.1363 ms | 12.1961 ms | 60.364 ms | 6.35 | 1.97 | 2333.3333 | 1333.3333 | 666.6667 | 14444.3 KB | 322.03 |
RunCustomExcelService | 1000 | 9.922 ms | 0.5068 ms | 1.4458 ms | 10.389 ms | 1.03 | 0.29 | - | - | - | 44.85 KB | 1.00 |
RunMiniExcelService | 1000 | 15.503 ms | 1.6068 ms | 4.7378 ms | 13.453 ms | 1.62 | 0.63 | 2312.5000 | 1765.6250 | 1703.1250 | 21805.59 KB | 486.15 |
RunOpenXmlService | 10000 | 373.718 ms | 38.0834 ms | 112.2899 ms | 335.195 ms | 4.56 | 1.37 | 8666.6667 | 5000.0000 | 1333.3333 | 57846.68 KB | 1,255.46 |
RunClosedXmlService | 10000 | 672.539 ms | 23.8740 ms | 70.3930 ms | 700.251 ms | 8.21 | 0.89 | 18000.0000 | 5000.0000 | 3000.0000 | 140001.6 KB | 3,038.48 |
RunCustomExcelService | 10000 | 82.005 ms | 1.6256 ms | 2.3828 ms | 82.340 ms | 1.00 | 0.04 | - | - | - | 46.08 KB | 1.00 |
RunMiniExcelService | 10000 | 128.592 ms | 2.4406 ms | 3.1735 ms | 129.289 ms | 1.57 | 0.06 | 7666.6667 | 1666.6667 | 1666.6667 | 55419.23 KB | 1,202.77 |
The benchmark for exporting IEnumerable<IEnumerable<string>>
(IterationCount=5, 10_000 columns and 10_000 rows):
Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|
RunCustomExcelService | 44.770 s | 40.2934 s | 10.4641 s | - | - | - | 444.42 KB |
RunMiniExcelService | 151.494 s | 55.1570 s | 14.3241 s | 8545000.0000 | 183000.0000 | 1000.0000 | 52371165.3 KB |
The benchmark for exporting IEnumerable<CustomType>
(IterationCount=5, 10 columns, 1_000_000 rows):
Method | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
---|---|---|---|---|---|---|---|
RunCustomExcelServiceType | 3.464 s | 0.7515 s | 0.1163 s | - | - | - | 53.14 KB |
RunMiniExcelServiceType | 4.385 s | 0.8925 s | 0.2318 s | 648000.0000 | - | - | 3991855.36 KB |
Library code and benchmarks can be found on GitHub.
Limitations of the Custom Library
While the custom library is optimized for memory usage and performance, it has some limitations:
Limited Formatting Support:
- Advanced styles, such as conditional formatting and rich text, are not fully supported.
- Style definitions are simplified to reduce memory overhead.No Formula Evaluation:
- Formulas can be written to the file, but the library does not calculate or validate them.
Conclusion
Focusing on targeted optimizations like data streaming, reduced formatting, and efficient object handling, I successfully developed a solution that significantly reduces memory usage for exporting large datasets to Excel. This approach benefits applications running in constrained environments or handling large data volumes.
The benchmark results demonstrate the effectiveness of the custom library. However, it is important to consider its limitations, especially if advanced Excel features or extensive formatting is required. For most use cases, the trade-off between simplicity and performance makes this solution an excellent choice.
If you’re dealing with similar challenges, consider analyzing your current library’s memory footprint and evaluating whether a custom implementation might better suit your needs. The investment in time and effort can pay off significantly in terms of performance and resource efficiency.
Happy holidays!