Digital Humanities Archiving

These notes were written to help a particular group of digital humanities researchers and archivists cover their bases. Pick the ones that make sense for your own project in the long term. Technologies change often, and so will at least some parts of these notes.

For brevity, I've omitted the reasons for many proposals here. Please discuss these proposals with your archival team.

Workflow and Data Management

Backup

  • Follow LOCKSS (Lots Of Copies Keep Stuff Safe): keep copies in different locations.
  • Back up milestones in the workflow, not just the final output.
  • For master files, avoid using any hard disk (older than 1 year) with moving parts and without its own power supply. Deposit master files in a central project repository for redundant offsite backups.

  • To deposit materials to the central project repository (for a project I’m involved in), share the files using this instruction. Send the share code to its administrator. To sync your files with your research group (whose copy will serve as secondary backup), also share the code with them. 
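
Keeping several copies only helps if the copies are actually identical. As a rough sketch (the checksum approach and the function names below are my own assumptions, not part of the project repository's tooling), one could compare a master folder against a backup folder using SHA-256 checksums from Python's standard library:

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_copies(master: Path, backup: Path) -> list[str]:
    """List relative paths that are missing from, or differ in, the backup."""
    master_manifest = build_manifest(master)
    backup_manifest = build_manifest(backup)
    return [
        name
        for name, checksum in master_manifest.items()
        if backup_manifest.get(name) != checksum
    ]
```

An empty list from compare_copies(Path("master"), Path("backup")) (both folder names are hypothetical) would suggest that every master file has an identical twin in the backup.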

Searchability
Research could well be about “re-search” insofar as we’re concerned with how far, and how easily, our collected data or information can be searched for further analysis and presentation.

All media

  • Filename length. Take advantage of the maximum number of characters (about 255 in most modern operating systems) for a filename. Note, however, that a filename is different from the title of the work.

  • File naming convention. Decide on a file naming convention. Put the most important name or word at the left and the least important at the right. Filenames are different from entry titles. Avoid using special characters like "á" and "ñ".

  • Metadata. The whole point of having metadata is machine-readability and analyzability. Metadata are also determined in part by your research questions. Decide early on what metadata or information types would be included in the collection of media.

    There are three types of metadata: descriptive, structural, and content. Descriptive metadata are facts about a digital item, including title, creator, creation, publisher, subject or genre, and media type (image, audio, video). Structural metadata are about the files that make up a digitized item, such as .mp3, .mp4, .mov, and the like. Content metadata could be a table of contents, track listing, or chapter list that identifies the contents of the digitized files. Overlaps are unavoidable (Rehbein 2014).

    These are initial metadata considerations (for a project I’m involved in: the Philippine Performance Archive Project):

    • Context. Make your metadata as rich as humanly possible by describing the context of the photo, video, audio recording, notes taken. The past is a foreign land, and the materials you’re taking now would soon be past, which could be alien to future researchers. Committing information simply to human memory is bad research policy.

    • Date and Time Stamp, and Location. Date every digital artifact. The date in the filename is not necessarily the same as that of the image or recording itself. Use a geolocation app (e.g., “Geolocation”) on your GPS-enabled smartphone to get the Decimal Degrees (DD) reading of where you’re taking the photograph or video or conducting the interview.

    • Length, size. Note the length of the master video or audio (in minutes) or the word count of text.

    • Key participants. Note the names of key participants of the event.

    • Route, Length of Time. In the documentation of a parade or similar event, also note the route on the map and the time it takes to complete the event.

    • Crowd estimate. If available, get the official estimate. If not, get it from old-timers.

    • Files (especially media files) may already have their own metadata. These can be extracted using certain tools (a compilation is available here). Find out which ones might work for you.

  • Post-processing. Media files produced using processing software (Photoshop, Maya, etc.) should be archived with their source files.
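
The naming and metadata points above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the separators (underscore between parts, hyphen within a part), the field names, and the sample values are all hypothetical, not the project's actual convention. The sketch builds a filename with the most important part first, replaces characters such as "ñ" with plain ASCII, and stores the metadata as a JSON record that could sit next to the media file.

```python
import json
import re
import unicodedata
from dataclasses import asdict, dataclass, field
from typing import List, Optional


def to_ascii(text: str) -> str:
    """Replace accented letters with plain ones, e.g. "ñ" becomes "n"."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def build_filename(parts: List[str], extension: str, max_length: int = 255) -> str:
    """Join name parts, most important first, into an ASCII-only filename."""
    cleaned = []
    for part in parts:
        slug = re.sub(r"[^A-Za-z0-9]+", "-", to_ascii(part)).strip("-")
        if slug:
            cleaned.append(slug)
    suffix = "." + extension.lstrip(".").lower()
    # Trim the stem so the full name stays within the length limit.
    stem = "_".join(cleaned)[: max_length - len(suffix)]
    return stem + suffix


@dataclass
class MediaRecord:
    """Metadata for one item; the field names are illustrative only."""
    title: str
    media_type: str                  # image, audio, or video
    context: str                     # what was happening, and why it matters
    captured_on: str                 # date of the recording itself (ISO 8601)
    latitude_dd: float               # Decimal Degrees
    longitude_dd: float              # Decimal Degrees
    duration_minutes: Optional[float] = None
    key_participants: List[str] = field(default_factory=list)
    crowd_estimate: Optional[int] = None

    def __post_init__(self) -> None:
        if not -90 <= self.latitude_dd <= 90:
            raise ValueError("latitude must be between -90 and 90")
        if not -180 <= self.longitude_dd <= 180:
            raise ValueError("longitude must be between -180 and 180")

    def to_json(self) -> str:
        """Serialize the record, keeping non-ASCII text readable."""
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


# Hypothetical example values.
name = build_filename(
    ["Fiesta", "Town Plaza", "2019-01-20", "Señor Niño procession"], "JPG"
)
print(name)  # Fiesta_Town-Plaza_2019-01-20_Senor-Nino-procession.jpg
```

Note that the special characters are removed only from the filename; the metadata record itself can keep the original spelling of names and titles.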

Text

Content in plain text is more searchable than content in rich-format files (e.g., docx, pptx). Rich-format files should be archived with notes on the software used (at least the name and version of the app).

Media Formats, Resolutions, Bitrates
Video, Image, and Audio. A master copy should be captured and kept in a lossless format (examples: photo - RAW; audio - FLAC; video - FFV1, a lossless codec that FFmpeg can encode), maintaining the media quality of the original source. Media files have to be normalized for preservation upon ingest. Do not convert a lossy file to a lossless format, since the quality already lost cannot be recovered; the recommended conversion is the other way around. Please check whether your equipment can capture video, audio, or photos in a lossless format.

Video that you take yourself should be at least 4K resolution. Master copies of photos should also be high resolution (recommended: between 2560px x 1440px and 3930px x 6000px, or better).

Audio: use a bitrate of at least 128 kbps (256 kbps at most; anything beyond that is likely wasteful).
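
To get a feel for what these bitrates mean for storage, here is a small sketch that estimates file size from bitrate and duration. It assumes a constant bitrate and ignores container overhead, so treat the figures as approximations; the function name is my own.

```python
def audio_size_megabytes(bitrate_kbps: float, duration_minutes: float) -> float:
    """Estimate audio file size in megabytes (1 MB = 1,000,000 bytes)."""
    bits = bitrate_kbps * 1000 * duration_minutes * 60
    return bits / 8 / 1_000_000


for kbps in (128, 256):
    size = audio_size_megabytes(kbps, 60)
    print(f"{kbps} kbps, 60 minutes: about {size:.1f} MB")
# 128 kbps, 60 minutes: about 57.6 MB
# 256 kbps, 60 minutes: about 115.2 MB
```

Doubling the bitrate doubles the storage needed, which is one way to weigh the trade-off for long interviews.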

Choosing one format over another, however, always involves a trade-off. For what you might lose or gain, see this link to a table, a rough guide to photo and video formats (Sudhakaran 2013).

For important media files culled from various sources other than your team, just try to get the highest resolution possible.



Photo Scans
For those who will have to deal with old pictures from the "baul" (trunk) or old records, use a good scanner set to at least 300 dpi (for a 4"x6" print, for example). The bigger the picture or the finer its details, the higher the dpi value should be. A good scanner, however, cannot magically improve a bad picture. Still, a bad-quality picture is sometimes better than no picture at all.
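
The relation between print size, dpi, and the pixel size of the scan is simple arithmetic: pixels = inches x dpi. A minimal sketch (the function names are my own) for checking what a given setting yields, or what setting a target size requires:

```python
import math
from typing import Tuple


def scan_pixels(width_in: float, height_in: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a scan of a print measured in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def dpi_for_target(long_edge_in: float, target_long_edge_px: int) -> int:
    """Smallest dpi giving at least the target pixels on the long edge."""
    return math.ceil(target_long_edge_px / long_edge_in)


print(scan_pixels(4, 6, 300))   # (1200, 1800)
print(dpi_for_target(6, 3600))  # 600
```

So a 4"x6" print scanned at 300 dpi yields a 1200 x 1800 pixel image, and reaching 3600 pixels on the long edge would take 600 dpi.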

Media Preservation Plan
Institute a media-type preservation plan. Part of such a plan is to have at least two types of formats for your media files: "preservation" and "access". Access formats yield small files that your audience can readily download and play. Preservation formats are those that tend to keep the most information and media quality over time. Here’s a practice at Archivematica, for example.



Photo and Video Release Form
In compliance with the country’s privacy laws, a release form is expected for any photo or video involving any identifiable person or data subject. The essential text of such a form is included in this template: privacyph.org/releaseform.

This has to be signed and dated by a data subject of legal age; for minors, by their guardians.

Reference Management
References for your digital resources could be managed and annotated using Zotero (which in turn should work with a management and archival system for large collections of digital audio and video files). For geolocation info, use the "Extra" field on Zotero. See further info and instruction here as well as these user tips.

Checklists
To meet the technical requirements of your project (with your own choices), maintain checklists for field work, post-processing, and other important stages in your project workflow.

See Also