Digital preservation is a form of data backup focused on ensuring continued future access to the digital information preserved by the backup process. Whereas a typical backup system is designed simply to duplicate the data so that it may be restored if the original data set or host computer is damaged, groups engaged in digital preservation are concerned not just with creating redundant copies of data but also with ensuring that future generations will be able to access that data.
A differential backup system backs up the changes made to a system between the initial backup and the current backup. Unlike an incremental backup system, wherein only the newest changes from the previous increment are recorded, the differential system backs up everything that has changed from the initial backup on each subsequent backup. One way to visualize this is to think of incremental backup data sets, where A is the initial backup, as A + B + C + D, whereas differential backups are A + B + BC + BCD.
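The A + B + BC + BCD picture can be sketched in a few lines of Python. This is only an illustration of what each backup set would contain, not a real backup tool; the function names and the change labels "B", "C", and "D" are assumptions made for the example.

```python
def incremental_sets(changes):
    """Each incremental backup stores only the changes since the previous backup."""
    return [set(period) for period in changes]


def differential_sets(changes):
    """Each differential backup stores every change since the initial full backup."""
    backups = []
    accumulated = set()
    for period in changes:
        accumulated |= set(period)
        backups.append(set(accumulated))
    return backups


# Hypothetical changes made in the three periods after the initial backup A.
changes = [{"B"}, {"C"}, {"D"}]

incremental = incremental_sets(changes)    # B, then C, then D
differential = differential_sets(changes)  # B, then BC, then BCD
```

One practical consequence of the two layouts: restoring from incremental backups requires the initial backup plus every increment in order, while restoring from differential backups requires only the initial backup plus the most recent differential.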
Incremental backups make it feasible to store a higher number of backups in a limited amount of space by building on an initial complete backup. The data set is first completely backed up to the backup medium and then, at the scheduled backup times, the incremental changes to the data set are recorded. Under this backup model, the total backup size is the size of the entire data set plus the size of whatever data changed during each interval between backups.
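The space saving can be made concrete with some rough arithmetic. The sketch below uses made-up figures (a 500 GB data set and three small increments) and assumes, for simplicity, that the data set does not grow between backups; the function names are hypothetical.

```python
def incremental_storage_gb(full_size_gb, changed_gb_per_backup):
    """Initial full backup plus the data changed in each later interval."""
    return full_size_gb + sum(changed_gb_per_backup)


def repeated_full_storage_gb(full_size_gb, later_backup_count):
    """Storage needed if every scheduled backup were a complete copy instead."""
    return full_size_gb * (1 + later_backup_count)


# Hypothetical example: 500 GB data set, then 2 GB, 5 GB, and 3 GB of changes.
incremental_total = incremental_storage_gb(500, [2, 5, 3])  # 510 GB
full_copies_total = repeated_full_storage_gb(500, 3)        # 2000 GB
```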
A full backup, or system image, is a backup of a computer system that includes not only user data stored on the machine (such as documents and photographs), but the entire operating system and installed applications. A full backup allows for a complete restoration of the system in the event of catastrophic failure.
An offline backup is a backup that is not accessible to the computer that initiated the backup process after the process is completed. If, for example, you backed up your data to an external hard drive and then removed the drive and stored it, the backup would be considered offline: a catastrophe that could destroy the original data set (such as a power surge or malicious software) would not be able to touch the offline backup.
An online backup is a backup that is still accessible to the computer which initiated the backup process. For example, if you back up all your data from your personal computer to an external hard drive that remains attached to your personal computer at all times, the data is backed up in the sense that it exists on two physical media, but the backup remains online: a catastrophe that reaches the computer, such as a power surge or malicious software, can also reach the attached drive.
A backup is a duplicate copy of a data set or entire hard drive that is stored on a separate storage medium from the source file (or disk). If the original data (say, family photos) is stored on the primary drive of a computer and those photos are copied onto a secondary drive, the data set on the secondary drive is the backup.
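As a minimal sketch of that idea, the following Python copies a file to a second location and confirms the copy matches the original. Two temporary directories stand in for the primary and secondary drives, and the names `back_up_file`, `sha256_of`, and `family_photo.jpg` are assumptions made for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_file(source, backup_dir):
    """Copy source into backup_dir and verify that the copy is identical."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    destination = backup_dir / Path(source).name
    shutil.copy2(source, destination)  # copies contents and file metadata
    if sha256_of(source) != sha256_of(destination):
        raise IOError("backup copy does not match the source")
    return destination


# Two temporary directories stand in for the primary and secondary drives.
with tempfile.TemporaryDirectory() as primary, tempfile.TemporaryDirectory() as secondary:
    photo = Path(primary) / "family_photo.jpg"
    photo.write_bytes(b"placeholder bytes, not a real JPEG")
    copy = back_up_file(photo, secondary)
    backup_matches = copy.read_bytes() == photo.read_bytes()
```

In real use the backup directory would sit on a physically separate medium; two folders on the same disk would not meet the definition above.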
Inkjet printing is a process by which text and graphics are applied to a medium, typically paper, via a controlled and cold application of ink through micro-nozzles. Inkjet technology is used in a wide range of applications, including basic consumer home printing of small documents and the production of large banners; while the scale of the printer changes, the fundamental operation remains the same.
Electrostatic digital printing is the process by which laser printers apply images to paper. Inside the laser printer, a laser beam passes over an electrostatically charged drum; this laser beam changes the static charge on the drum to represent a portion of the image (be it text or graphics) sent to the printer. The drum then collects toner from the toner cartridge; the toner clings to the image area of the drum and is then transferred to the paper. The final step fuses the toner to the paper (if you’ve ever cleared a jam out of a laser printer and the toner dust wiped right off the page, the paper jammed before the fusing process).
A dye-sublimation printer is a printer that uses heat to transfer dye onto the print medium, as opposed to inkjet printing, which sprays micro-droplets of ink onto the surface of the medium. The name is derived from the process itself: during the heating and application process, the dye sublimates from a solid to a gas without transitioning through a liquid state.
Hybrid drives are a hybridization of traditional mechanical hard drive technology and more modern solid-state drive technology. The drives combine the large storage capacity of a traditional magnetic hard drive with the speed of a solid-state drive. The drives typically contain a 4-24GB solid-state storage area mated with a 500-1000GB magnetic storage area.
S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology), typically written as simply SMART, is a monitoring system built into modern hard drives. It is designed to detect and report a set of indicators that allow the end user to assess the stability of the drive.
Data transfer rate is the measurement that encompasses both the internal transfer speed of a hard drive (movement of data from the disk surface to the disk controller in the drive) and the external transfer speed (data movement between the disk controller and the host operating system). The data transfer rate is typically benchmarked and recorded as the slower of these two numbers in order to represent the real-world conditions under which the device transfers data.
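The "limited by the slower stage" idea amounts to taking a minimum. The sketch below uses made-up rates in MB/s, and the function names are hypothetical.

```python
def effective_transfer_rate(internal_mb_s, external_mb_s):
    """The overall rate is limited by whichever stage is slower."""
    return min(internal_mb_s, external_mb_s)


def transfer_seconds(size_mb, internal_mb_s, external_mb_s):
    """Rough time to move size_mb of data at the effective rate."""
    return size_mb / effective_transfer_rate(internal_mb_s, external_mb_s)


# Hypothetical drive: 150 MB/s off the platters, 600 MB/s across the interface.
rate = effective_transfer_rate(150, 600)  # 150 MB/s
seconds = transfer_seconds(3000, 150, 600)  # 20.0 seconds for 3000 MB
```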
Seek time refers to the amount of time it takes a hard drive to respond to a request for a particular piece of data. In traditional magnetic hard drives, the seek time includes both the electronic communication between the operating system, motherboard, and hard drive itself, as well as the physical movement of the components within the hard drive (such as the actuator arm that moves the read/write head). Typical seek times for mechanical hard drives range from 4ms (for high speed server drives) to 15ms (for slower mobile or low-end consumer drives).
The Host Protected Area is a section of a hard disk that has been specially formatted and flagged so that it does not appear to the host operating system. This portion of the hard disk can be used for a variety of purposes including storing hidden data, security software to track stolen laptops, and vendor-specific utilities, but it is most typically used to house recovery software. Many desktop and laptop computers no longer ship with operating system reinstallation/recovery discs, for example, but instead include a large Host Protected Area that houses a recovery program that is accessible from the computer’s BIOS menu.
The last step in preparing a hard disk for use, high-level formatting is the process of setting up an empty file system in a new partition for use by an operating system. For example, when you install a new hard disk into your desktop computer, the final step after creating the partition is to instruct your formatting tool to create a file system (such as NTFS) in that partition so that your operating system can access the drive and use it for storage.
Partitioning is the process of dividing a hard drive into a single logical storage unit or a series thereof. Partitioning allows a single physical disk to be divided into multiple logical disks for various purposes, including the separation of the operating system disk from the data storage disk, the installation of multiple operating systems, or other applications dependent on the division of data.
Low-level formatting is a hardware level process that marks the surface of the disk with a marker indicating the start of a recording block. This block is typically referred to as a sector marker and is referenced by the disk controller in order to read and write data to the disk.
Disk formatting is the process of preparing a disk or other storage medium for use by a particular operating system. The formatting process is typically divided into three distinct operations. First, a low-level format prepares the media for use. Second, the media is divided into one or more partitions. Third, a high-level format applies a file system (such as FAT32 or NTFS) to the newly created partition(s).
Zero-filling is the process of overwriting data with a series of zeros. You partially or completely zero-fill a given storage container to either overwrite the space where a recently deleted file was on the hard disk or to completely wipe the hard disk and all the files, folders, and other data structures contained therein.
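At the level of a single file, zero-filling can be sketched as follows. This is a simplified illustration, not a secure-erase tool: it assumes the file system overwrites data in place, which is not guaranteed on solid-state drives or on journaling and copy-on-write file systems. The function name `zero_fill` and the demo file contents are made up for the example.

```python
import os
import tempfile


def zero_fill(path, chunk_size=1024 * 1024):
    """Overwrite every byte of an existing file with zeros, keeping its length."""
    size = os.path.getsize(path)
    zeros = bytes(chunk_size)  # a block of chunk_size zero bytes
    with open(path, "r+b") as f:
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the zeros out to the device
    return size


# Demo on a throwaway temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"secret data")
    demo_path = tmp.name

bytes_overwritten = zero_fill(demo_path)
with open(demo_path, "rb") as f:
    wiped = f.read()
os.remove(demo_path)
```

Wiping an entire disk works on the same principle but targets the raw device rather than a file, which is the job of dedicated tools.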
The Gutmann Method is an algorithm for securely erasing the contents of a computer hard drive. Introduced by Peter Gutmann in 1996, it utilizes a series of 35 patterns to completely and redundantly overwrite the contents of a hard disk. The method, and the white paper in which Gutmann outlined its use, was widely misapplied and misinterpreted: although many people used the full 35-pass technique, Gutmann never intended for the method to be used from start to finish in such a fashion.
DBAN is an acronym for Darik’s Boot and Nuke, an open source project. DBAN is designed to facilitate simple and secure erasure of hard drives so that data is no longer recoverable. It uses random number overwrites and includes scripts for the Gutmann Method, Quick Erase, and Department of Defense approved overwrites (3 and 7 pass).
In cryptography, both analog and digital, a cipher is an algorithm for transforming plaintext to ciphertext (unencrypted to encrypted) and reversing the process. A cipher could be as simple as shifting the letters of the alphabet forward one (a shift cipher) so that A becomes B and B becomes C, all the way around until Z becomes A. Modern encryption relies on radically more sophisticated ciphers that use advanced computations, split keys, and other cryptographic tricks only feasible with the aid of computers.
In a public-key cryptographic scheme, there are two distinct keys used to encrypt and decrypt data. The public key is used to encrypt data for the recipient and then the recipient’s private key is used to decrypt it. Thus if someone wanted to send a secure email message to Bill Gates via the widely used Pretty Good Privacy (PGP) email encryption program, they could look up Mr. Gates’ public key on a public key server and use that key to encrypt the message; only Mr. Gates could then use his private key to decrypt it.
In a private-key cryptographic scheme, the same key is used to encrypt the data as is used to decrypt the data. Although there is an inherent security risk with private-key encryption schemes, as all parties must share the same key in order for the system to function, there are several widely adopted private-key encryption schemes, including Twofish and AES.
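The shift-by-one cipher described above can be sketched in a few lines of Python. The function name `shift_cipher` is an assumption for the example, and the choice to leave non-letter characters untouched is a simplification.

```python
import string


def shift_cipher(text, shift):
    """Shift each letter forward by `shift` places, wrapping around after Z."""
    result = []
    for ch in text:
        if ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        elif ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            result.append(ch)  # leave spaces, digits, and punctuation alone
    return "".join(result)


ciphertext = shift_cipher("AZ backup", 1)  # A becomes B, Z wraps around to A
plaintext = shift_cipher(ciphertext, -1)   # shifting back reverses the process
```

With only 25 possible non-trivial shifts, such a cipher can be broken by simply trying them all, which is why it serves as a teaching example rather than real protection.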
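The two-key idea can be illustrated with textbook RSA and deliberately tiny primes. This is an assumption-laden toy and not a description of how PGP is implemented: real systems use very large keys, padding, and usually a hybrid scheme. The primes, the exponent, and the function names are chosen purely for the example, and `pow(e, -1, phi)` requires Python 3.8 or later.

```python
# Toy key generation with tiny primes (insecure, for illustration only).
p, q = 61, 53
n = p * q                 # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                    # public exponent, chosen coprime to phi
d = pow(e, -1, phi)       # private exponent: modular inverse of e (Python 3.8+)

public_key = (e, n)       # published, e.g. on a key server
private_key = (d, n)      # kept secret by the recipient


def encrypt(message, key):
    """Anyone holding the public key can encrypt a number smaller than n."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    """Only the holder of the private key can reverse the encryption."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)  # equals the original message
```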
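The defining property, one shared key for both directions, can be shown with a toy repeating-key XOR. This is emphatically not AES or Twofish and offers no real security; the function name, the key, and the message are made up for the example.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the repeating key; applying it twice undoes it."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


shared_key = b"k3y"  # both sender and recipient must hold this same key
plaintext = b"family photos"

ciphertext = xor_with_key(plaintext, shared_key)   # sender encrypts
recovered = xor_with_key(ciphertext, shared_key)   # recipient decrypts with the same key
```

The sketch also makes the risk noted above visible: anyone who obtains `shared_key` can both read and forge messages, so the key itself has to be distributed securely.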