By Noah Stegman Rechtin
Why digitize?
Why digitize documents in the first place? To some this may seem like an obvious question, but to others used to paper documents, it may seem like a downgrade. Flipping through pages in a digital reader can be clunky at times. Also, isn’t it easy for digital files to be deleted or destroyed?
First, redundancy or, to put it more simply, “lots of copies keep stuff safe”. Many museums have suffered from major disasters that resulted in the loss of large portions of their collections. While not exactly a formal “backup”, a digital surrogate constitutes a secondary copy that will continue to exist if the physical one is destroyed.
Second, having a duplicate that the general public can access aids in the preservation of the original. Every time a document is handled, it suffers wear and tear. When a document is properly digitized, the vast majority of research requests can be satisfied with the virtual copy. A digital copy can also be used in situations where an original never would be. For instance, a student can be handed a tablet computer with a digital facsimile on it for use in an education program.
For anyone who has ever ordered blueprints of a World War II aircraft from the National Air and Space Museum, this is nothing new. NASM does not supply the original drawings, but instead reels of microfilm. Ironically, this microfilm also demonstrates the pitfalls of not creating access copies. Much of it is illegible or difficult to read. This is a symptom of repeated use. (For a deeper explanation of this with examples, see an article on the subject by Ester Aube of AirCorps Library.)
Thanks to the advent of optical character recognition (OCR) technology, a computer program can automatically detect and identify the words in a text to generate a machine-readable copy. Most directly, this makes it possible to perform a Ctrl+F search of the document to quickly find a subject. Those who have had to search through several hundred-page-long parts manuals to find a single component can appreciate the usefulness of such a feature.
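As a rough illustration of what that search amounts to once OCR has produced text, the sketch below looks for a term across the pages of a document. It assumes the OCR step has already been run and that its output is available as one string per page. The function name `find_pages` and the three sample pages are made up for this example and do not come from any real manual.

```python
def find_pages(pages, term):
    """Return the 1-based page numbers whose text contains the term.

    `pages` is assumed to be a list of strings, one per scanned page,
    as produced by some earlier OCR step (not shown here).
    The match is a simple case-insensitive substring test.
    """
    needle = term.lower()
    return [
        number
        for number, text in enumerate(pages, start=1)
        if needle in text.lower()
    ]


# Hypothetical OCR output for a three-page parts manual.
sample_pages = [
    "Section 1. General description of the airplane.",
    "Section 2. Fuel system. Fuel pump, part of the engine accessory group.",
    "Section 3. Landing gear. Hydraulic pump and actuating cylinders.",
]

print(find_pages(sample_pages, "fuel pump"))  # [2]
print(find_pages(sample_pages, "PUMP"))       # [2, 3]
```

Real OCR output usually contains misread characters, so an exact substring match like this one will miss some hits. That is one reason scan quality matters for searchability.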
More broadly, when indexed by search engines, machine-readable text makes the content far more discoverable. The ability to search large bodies of information quickly enables new kinds of research that wouldn’t have been possible before. For instance, a significant portion of the material Roger Connor, a curator at the NASM, used for his paper on aerial bootlegging came from searching for articles that mentioned the phenomenon in various online newspaper databases.
To take it one step further, documents on the Internet can of course be accessed anywhere around the world. Not everyone has the ability to visit an archive and do the research in person, and the job of a historian is to make more knowledge available to more people. Imagine the young kid halfway around the world who wants to learn more about aviation history. There may be no aviation museums for many miles around him and, even if there are, their coverage of U.S. aircraft is likely limited. However, with nothing more than access to a computer and the Internet, he can read all about how the autopilot on a B-17 worked or what training was like at an airfield in the southwest.
To return to the initial question of this section, the answer is that a digital copy should not be seen as a replacement for a physical copy, but as a supplement to it.
Difficulties
So, with all of these benefits, why don’t aviation museums just slap all of their materials in a scanner?
The biggest difficulty with digitization boils down to the same problem that any aircraft restoration project has: limited resources. The three chief factors are limited time, limited money and limited staff. Proper digitization is tedious, requires expensive equipment and needs dedicated and knowledgeable workers.
Just because you own a copy of something doesn’t mean you have the right to make more of it. This principle is called copyright and it can get very complicated, very quickly. It is not possible to cover every detail here, but it is worth going over the broad rules. (For those who want to read about the subject in more detail, check out this post in a Warbird Information Exchange thread.)
Luckily, in the United States, any work produced by an employee of the federal government in the course of their duties is public domain. This means anything published by the military or the Federal Aviation Administration can be shared without restriction.
In addition, any document published in the United States before 1978 with a defective copyright notice – that is, a notice lacking at least one of the following features: a copyright symbol or the word copyright, the name of the copyright holder or a date – is also public domain. Many aircraft manufacturers, for example, did not bother to include a proper notice on their materials, therefore making them public domain.
Furthermore, courts have ruled that the simple act of digitization alone does not produce a new copyright.
When it comes to the actual scanning itself, there are certain technical standards that need to be met. Again, these can get involved, but the three biggest concerns are resolution (measured in dpi), color depth (measured in bits) and file format.
It might seem like higher resolution is always better because it means more detail, but that’s not the case. For many documents, such as books with nothing more than text, a higher resolution is unnecessary because there is no additional detail to capture. What it does produce is bigger, more unwieldy files.
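To put rough numbers on that trade-off, the sketch below estimates the uncompressed size of a single scanned page from its dimensions, resolution and color depth. The letter-size page (8.5 by 11 inches) and the function name `uncompressed_size_bytes` are assumptions chosen for illustration. Real files will usually be smaller once the file format applies compression.

```python
def uncompressed_size_bytes(width_in, height_in, dpi, bit_depth):
    """Estimate the raw image size of one scanned page, in bytes.

    width_in, height_in: page dimensions in inches
    dpi: scan resolution in dots (pixels) per inch
    bit_depth: bits stored per pixel (e.g. 1 for black and white,
               8 for grayscale, 24 for full color)
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bit_depth
    return total_bits // 8


# Assumed example: one letter-size page in 24-bit color.
size_300 = uncompressed_size_bytes(8.5, 11, 300, 24)
size_600 = uncompressed_size_bytes(8.5, 11, 600, 24)

print(size_300)             # 25245000 bytes, about 25 MB
print(size_600)             # 100980000 bytes, about 101 MB
print(size_600 / size_300)  # 4.0
```

Doubling the resolution quadruples the pixel count, so a 600 dpi scan of a text-only page takes about four times the storage of a 300 dpi scan without necessarily capturing anything more useful.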
File format matters because, just like paper, digital files can degrade. The most common example is that of JPEG images. The file format uses “lossy compression”, so every time an image is opened and re-saved as a JPEG it loses a little more detail. (Simply copying the file to a new location does not cause this loss.) This is part of the reason that the recommended format among professionals is TIFF.
No matter whether you’re using a flatbed or cradle scanner, the actual process is time-intensive. With the former, the entire process of placing a page on the platen, or scanner bed, scanning it and replacing it with the next page can average 2-3 minutes. This may not seem like a lot, but for a 100-page document, it adds up to roughly 3-5 hours.
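The arithmetic behind that estimate is simple enough to sketch. The function name `scan_time_hours` is made up for this example, and the calculation assumes a constant time per page with no breaks or rescans.

```python
def scan_time_hours(pages, minutes_per_page):
    """Total scanning time in hours, assuming a constant time per page."""
    return pages * minutes_per_page / 60


# The 100-page document at 2 and 3 minutes per page.
low = scan_time_hours(100, 2)
high = scan_time_hours(100, 3)

print(f"{low:.1f} to {high:.1f} hours")  # 3.3 to 5.0 hours
```

The same function can be used to budget a larger project. At the same pace, a 2,000-page collection would take roughly 67 to 100 hours of scanning alone.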
A quicker process, such as this cradle scanner at the Internet Archive, can be used for bound volumes in good condition, but loose papers and documents in poor condition will still require a slower, more manual process. [Internet Archive via YouTube]
Okay, so now you have all of your documents digitized. Where are you going to put them? The web hosting alone can be expensive, not to mention the cost of the software that runs on said host. One alternative to a proprietary system is the Internet Archive. While well known for attempting to archive the Internet, it also offers free hosting of digitized materials.
Even after all of that, metadata still has to be added. Metadata – usually defined as “data about data” – is all of the extra information that categorizes a digital file. It comes in multiple types that include, but are not limited to, technical (such as the date a file was created or its file size) and descriptive (such as the title of a document or the name of the author). This information is important because it enables the discoverability mentioned above.
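As a minimal sketch of what such a record might look like, the example below groups a few technical and descriptive fields for one scanned file and shows how the descriptive fields support a simple search. The field names, the sample values and the `matches` helper are all hypothetical. Real projects typically follow an established metadata schema instead of inventing their own.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TechnicalMetadata:
    file_name: str
    file_size_bytes: int
    date_created: str  # ISO 8601 date, e.g. "2024-01-15"


@dataclass
class DescriptiveMetadata:
    title: str
    author: str


@dataclass
class ScanRecord:
    technical: TechnicalMetadata
    descriptive: DescriptiveMetadata


def matches(record, term):
    """Case-insensitive search of the descriptive fields only."""
    needle = term.lower()
    fields = (record.descriptive.title, record.descriptive.author)
    return any(needle in field.lower() for field in fields)


# Made-up values for illustration only.
record = ScanRecord(
    technical=TechnicalMetadata(
        file_name="manual_0001.tif",
        file_size_bytes=25245000,
        date_created="2024-01-15",
    ),
    descriptive=DescriptiveMetadata(
        title="Example Pilot Training Manual",
        author="Example Flying School",
    ),
)

print(matches(record, "manual"))      # True
print(matches(record, "helicopter"))  # False
print(json.dumps(asdict(record), indent=2))
```

Without the descriptive fields, the only thing a search could match against is the file name, which is why a scan with no metadata is effectively invisible.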
What to digitize?
One of the biggest decisions that museums have to make is settling on what to digitize, as limited resources mean that they do not have the ability to scan everything.
First, rarity. If you have something that doesn’t exist in many other locations, then there is a greater value in digitizing it – both to ensure redundancy and preservation. For instance, a manual produced only by a specific air force base or distributed only to a certain command will have a higher priority than one published nationwide.
Closely related to rarity is duplication. While it may seem like the same consideration, it is not. The former deals with how many extant copies there are in the real world, whereas the latter refers to how many copies there are online. There is not much point to making a second copy of the same document available if there are other materials in the museum’s collection that are not.
While, as explained above, there is a benefit to having multiple copies available, the more there are, the less attractive a document is for digitization. For example, there are many, many copies of the P-51 pilot’s manual out there. (However, just to make things more complicated, manuals were periodically updated with multiple revisions. So, sometimes, the opposite is true. For more details, see this article by Ester Aube of AirCorps Library.)
Third, fragility. Documents that are at high risk of loss or in poor condition are generally prioritized for the reasons explained above. As storage media – be it paper, film, tape, or other materials – begin to break down, it becomes necessary to transfer the content to another format. (Periscope Film, a company specializing in recovering and digitizing old film, touched on a number of these issues, such as vinegar syndrome, in a video.)
Lastly, utility and public interest. In short, will anyone want to read what you have digitized? If the subject is very niche, then the audience will likely be small.
Uses
When it comes to aviation, the motivating factors in determining what gets digitized are often very practical concerns, such as a blueprint that is needed for a restoration or a manual that is necessary for pilot instruction.
This brings up one of the most distinctive aspects of digitization in aviation museums: liability for aircraft production drawings. In the 1980s, a series of lawsuits from the families of pilots killed in accidents nearly drove general aviation manufacturers in the United States out of business. While the General Aviation Revitalization Act of 1994 addressed many of these concerns, aviation museums took note of the risk and began requiring all requests for reproductions of production drawings to be accompanied by a general disclaimer stating that they are not to be used on airworthy aircraft.
On the flip side, family genealogy is quite popular and documents like training school classbooks (the largest digital collection of which is probably the Army Air Forces Collection website) can be a very valuable resource as to what a parent or grandparent did “during the war”.
A new group of virtual historians, such as Keith of the YouTube channel WWII US Bombers, have incorporated digitized documents directly into their videos. Others, such as Witold Jaworski of the blog Airplanes in 3D, have employed them to recreate entire airplanes in computer aided design programs. Still others have used them to great extent to increase the historical accuracy of video games they play, such as War Thunder or DCS World.
These factors can of course be at odds with each other. For example, an airman’s handwritten notes from training are certainly rare and can offer unique insights not found in official documentation – such as confirmation that certain nicknames for aircraft were actually in common use. However, outside of certain limited use cases, they may have limited utility. After all, how many people really want to read someone’s school notebook? In addition, since they are handwritten, OCR has a much harder time working on them.
Results
So, where can you find the results of these digitization programs? There are a number of museums and other organizations that lead the way in terms of digitization. They include:
- AirCorps Aviation, which operates AirCorps Library, a subscription based service offering access to original manuals and production drawings
- Air University Library, which has a digital archives page
- Museum of Flight, which has a digital collections page
- National Air and Space Museum, which uploads its scans to the larger Smithsonian Online Virtual Archives
- San Diego Air and Space Museum, which has both a Flickr account and an Internet Archive collection
- Tri-State Warbird Museum, which has an Internet Archive collection
- University of Texas at Dallas, which has some of its aviation history collection on the Portal to Texas History
- Wright State University, which has a special collections and archives page
Conclusion
There are many more related issues – whether or not to charge for access, how to handle government classification and what to do with unpublished documents like production drawings. None of this is to mention video, audio or three-dimensional digitization either. (For example, the National Air and Space Museum has 3D scanned some of its most famous craft, such as the Space Shuttle Discovery.)
While digitization cannot completely replace onsite research, it can, if done right, bring a significant portion of the vast array of information that aviation museums have in their collections to the greater aviation-loving public.