One of the areas I have spoken about extensively at conferences and cover in our training classes is the unique issues associated with backing up and managing your ADCS Certificate Authority. There are several items I would like to address in this two-part series:
- CA Database and log file structure
- Unique issues with VM Snapshots with ADCS
- CA Private Key backups (and when they aren’t happening)
- Using PowerShell to Backup
April 1, 2020 – The issue described here applies to Windows Server 2016 and older. Windows Server 2019 changed the behavior of log truncation: when a backup is performed, all unused logs are purged and deleted. As a result, the stop/start issue associated with log truncation does not apply to Server 2019.
CA Database Structure
Microsoft ADCS CAs use the Microsoft JET Database to provide the transaction-level processing and storage for the CA. Jet uses log files to provide transaction-level processing and rollback protection. While Jet has been around for a long time, it’s still used in many Microsoft products such as Active Directory and Exchange. Over the years many customers have asked the product team why they haven’t switched to SQL Server. The answer is inevitably the same: “Why? What business need do you have to have us rewrite the entire database interface?” In the end, there has never been a compelling reason to make the change. Jet works fine, but it has some unique quirks you may not be aware of.
You can see in Figure 1 what the database file structure looks like. The default location is %windir%\system32\certlog, but the database is often placed on a separate partition to isolate it from the system volume. The database file is named after the CA with an extension of “.edb”. In my example below, my database file is named “Contoso Issuing CA 02.edb” and is 133MB in size. There is the current log file, “edb.log”, and a series of sequentially named (hexadecimal) log files; mine start at edb0011A below. The naming starts at edb00001.log and extends to edbfffff.log. Each log file is 1MB in size, and when that space is consumed, the next log file is created. With this structure, you can have approximately 1TB of log space. More than enough, right?
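The arithmetic behind that 1TB figure can be checked with a short sketch. It assumes only the naming scheme and 1MB log size described above; the helper name `log_name` is made up for illustration, and it prints the hex digits in upper case purely as a display choice (Windows file names are not case sensitive).

```python
# Log names run from edb00001.log to edbfffff.log: five hexadecimal digits.
MAX_LOG_INDEX = 0xFFFFF   # highest sequence number, 1,048,575 in decimal
LOG_FILE_SIZE_MB = 1      # each log file is 1MB, per the description above


def log_name(index: int) -> str:
    """Return the log file name for a sequence number (hypothetical helper)."""
    if not 1 <= index <= MAX_LOG_INDEX:
        raise ValueError("log index out of range")
    return f"edb{index:05X}.log"


# Total log space if every possible log file existed at once.
total_mb = MAX_LOG_INDEX * LOG_FILE_SIZE_MB
total_tb = total_mb / (1024 * 1024)

print(log_name(0x11A))              # the first log file in my example
print(MAX_LOG_INDEX, "files,", round(total_tb, 2), "TB")
```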
The unique issue with the way the log files are handled is that they are not purged and deleted when a transaction is completed. In addition, the log space storage is pretty inefficient, as it is designed for quick writes. In my lab, a new log file is created roughly every 75 certificate requests with a 2K key size. You can see there is a mismatch between space consumption and enrollments. In addition, revoking certificates, deleting items from the database, key archival and other CA operations all use working space, so your log files will be consumed by more than just enrollments.
The log files are cleaned up (truncated and deleted) only when two conditions are met. First, the CA must perform a database backup via the GUI, the command line, or the Volume Shadow Copy Service (VSS). Second, the CA must no longer be holding the log file open: the CA keeps a file handle on any log file that was used or created since the CA service was started. So even if you do a backup, only the log files from before the CA service was last started will be truncated. At some point, the CA must release its handle on an unused log file before that file can be deleted. In the example in Figure 2, I performed a CA backup after stopping and restarting the CA service; the backup completed and you can see that it deleted the files.
I do not believe it is worth stopping and starting your CA service just to perform a backup, as most CAs will reboot occasionally from updates, patches and operational needs. But you should be aware of this requirement.
ADCS and VM Snapshots
So, what is the issue? More and more organizations rely on virtual machine based CAs. This is a concept that wasn’t mainstream early in my days at Microsoft. Even after VMs became common, the problem with CA backups and the log file naming syntax wasn’t seen for many years at Microsoft.
When a virtual machine is backed up (specifically with SAN-level snapshots), the OS and associated applications are not aware a backup is being performed. Some newer products, however, use OS-based agents to facilitate backups. What does yours use?
If you rely on a backup process other than the ADCS GUI, the command line (certutil -backupdb) or a VSS-based solution, a problem may be lurking on your CA. As the log space is consumed and new log files are created, your backup process isn’t causing the database “housekeeping” function to occur in ADCS. Eventually your log files will exhaust the available index space of FFFFF (1,048,575) files and your CA will cease to function. How long will that take? It depends on the environment and the volume of requests and other transactions on your CA. It could be as little as a year or two, or it may never happen in practice, but if left unchecked it will become an issue eventually.
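To get a feel for the timescale, here is a rough back-of-the-envelope sketch. It assumes the lab observation of roughly 75 certificate requests per log file, which will differ on your CA, and it counts enrollments only, so revocations, deletions and key archival would shorten the result. The daily request rates and the function name are hypothetical values chosen for illustration.

```python
MAX_LOG_FILES = 0xFFFFF     # 1,048,575 possible log file names
REQUESTS_PER_LOG = 75       # lab observation with 2K keys; an assumption for your CA


def years_until_exhaustion(requests_per_day: float,
                           requests_per_log: float = REQUESTS_PER_LOG) -> float:
    """Estimate years until the log index is used up if logs are never truncated."""
    if requests_per_day <= 0 or requests_per_log <= 0:
        raise ValueError("rates must be positive")
    logs_per_day = requests_per_day / requests_per_log
    return MAX_LOG_FILES / logs_per_day / 365


# Hypothetical daily enrollment volumes, from a small CA to a very busy one.
for rate in (1_000, 10_000, 100_000):
    print(f"{rate:>7} requests/day: about {years_until_exhaustion(rate):.1f} years")
```

Under these assumptions, only a very busy CA reaches the limit within a couple of years, which is why the problem can sit unnoticed for a long time.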
To prevent this issue, you MUST ensure your backup is actually triggering the CA backup “housekeeping”. Check what log files are present before and after your backup to verify. You can also check the Application event log for the database engine (ESENT) being called for a VSS snapshot. The event source will be ESENT, Event ID 2005 as shown in Figure 3.
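The before-and-after check can be sketched as a small script. This is only an illustration: the function names are made up, the folder you pass in is whatever location your CA database actually uses, and it assumes the edbXXXXX.log naming described earlier (edb.log itself is skipped because it carries no sequence number).

```python
import re
from pathlib import Path

# Matches numbered log files such as edb0011A.log, but not edb.log.
LOG_NAME = re.compile(r"^edb[0-9A-Fa-f]{5}\.log$")


def list_log_files(folder) -> set:
    """Return the numbered log file names in a folder, lower-cased for comparison."""
    return {
        entry.name.lower()
        for entry in Path(folder).iterdir()
        if entry.is_file() and LOG_NAME.match(entry.name)
    }


def truncated_logs(before: set, after: set) -> list:
    """Return the log files that were present before the backup but are gone after it."""
    return sorted(before - after)
```

Call `list_log_files` on the database folder before the backup runs and again afterwards, then pass both results to `truncated_logs`. An empty list after a backup suggests the housekeeping did not happen, or that every log file is still held open by the CA service.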
If the VSS service isn’t being called and your log files are persisting, you can implement a local script that runs on a semi-regular schedule and backs up the CA database to a local folder. Your existing backup solution can still be used; consider this occasional backup script to be purely for “housekeeping” purposes. In the second part of this blog I will cover how you can use a PowerShell script to perform this backup.
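As a rough illustration of the idea only, and not the PowerShell script from part two, a scheduled housekeeping job could wrap the certutil -backupdb command mentioned above. The function names, the dated subfolder layout and the backup root are all assumptions; the sketch writes each run to a fresh dated folder on the assumption that this avoids clashing with an earlier backup.

```python
import subprocess
import sys
from datetime import date
from pathlib import Path


def build_backup_command(backup_root, today=None) -> list:
    """Build the certutil command line for a backup into a dated subfolder."""
    today = today or date.today()
    target = Path(backup_root) / today.strftime("%Y-%m-%d")
    return ["certutil", "-backupdb", str(target)]


def run_housekeeping_backup(backup_root) -> int:
    """Run the backup on the CA itself and return certutil's exit code."""
    if sys.platform != "win32":
        raise RuntimeError("certutil is only available on Windows")
    completed = subprocess.run(
        build_backup_command(backup_root), capture_output=True, text=True
    )
    # Print certutil's own output so a scheduled run leaves something to review.
    print(completed.stdout)
    return completed.returncode
```

The job would need to run on the CA under an account that is allowed to back up the CA database, and old dated folders would need pruning separately.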
CA Private Key Backup – Server 2008 & Server 2008 R2
Lastly, one other item about backups. If you are still running your CAs on Server 2008 or Server 2008 R2 and are NOT using a hardware crypto device (Hardware Security Module) for your CA key, you should be aware of a bug in those two OSes that prevents VSS-based backups from backing up your CA private key. Without this private key, you will NOT be able to recover your CA from a failure involving loss of the key. There are hotfixes available for both OSes; check out hotfix 2603469, listed on the PKI Solutions ADCS Hotfix Digest. Without this fix, you may be incorrectly assuming your backup of the CA is sufficient to recover from a failure.