File Format Standards
While you may wish to work with your data in proprietary software such as Word or SPSS, you should archive it in formats most likely to survive the test of time. File formats most likely to remain usable in the future share the following characteristics:
- complete and open documentation
- platform-independence
- non-proprietary (vendor-independent)
- no "lossy" or proprietary compression
- no embedded files, programs or scripts
- no full or partial encryption
- no password protection
Finally, consider using data formats that allow for re-use (e.g., .txt or .csv rather than .pdf). If you do store data in proprietary formats, be sure to document the software needed to view the data. If data will be stored in one format during collection and analysis and then converted to another format for preservation, document any features that may be lost in conversion, such as system-specific labels.
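One way to document features lost in conversion is to write them to a plain-text sidecar file next to the converted data. The following is a minimal sketch, not a prescribed workflow: the function name `export_with_codebook`, its parameters, and the `_codebook.json` naming are all hypothetical and chosen only for illustration.

```python
import csv
import json
from pathlib import Path


def export_with_codebook(rows, variable_labels, value_labels,
                         source_software, out_dir, stem):
    """Write rows to <stem>.csv and the labels a CSV cannot hold
    to <stem>_codebook.json (hypothetical naming scheme)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    csv_path = out_dir / f"{stem}.csv"
    codebook_path = out_dir / f"{stem}_codebook.json"

    # The CSV keeps only column names and raw values.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(variable_labels))
        writer.writeheader()
        writer.writerows(rows)

    # Everything the conversion would drop goes into a plain-text sidecar,
    # together with a note on the software the data came from.
    codebook = {
        "source_software": source_software,
        "variable_labels": variable_labels,
        "value_labels": value_labels,
    }
    codebook_path.write_text(json.dumps(codebook, indent=2), encoding="utf-8")
    return csv_path, codebook_path
```

Called with a list of row dictionaries, a mapping of column names to descriptive labels, and a mapping of coded values to their meanings, this produces two open-format files that can be archived together.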
File Format Recommendations
Audio
Chemistry (spectra)
Documentation and scripts
Geospatial
Images
Qualitative (text)
Quantitative tabular data with extensive metadata
Quantitative tabular data with minimal metadata
Video
Digital audio data
Preferred Formats
- AIFF (96kHz 16bit PCM) (*.aif, *.aiff)
- WAV (96kHz 24bit PCM) (*.wav)
Other Acceptable Formats
- MPEG-1 Audio Layer 3 (.mp3)
- Audio Interchange File Format (AIFF) (.aif)
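Before depositing audio, it can help to confirm that a file actually has the sample rate and bit depth listed above. The sketch below uses Python's standard `wave` module; the function names `describe_wav` and `meets_preferred_wav` are hypothetical. Note that `wave` reads only uncompressed PCM WAV files and may raise `wave.Error` on other encodings.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, bits_per_sample, channels) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def meets_preferred_wav(path, rate_hz=96_000, bits=24):
    """True if the file matches the preferred 96 kHz / 24-bit WAV profile."""
    sample_rate, bit_depth, _channels = describe_wav(path)
    return sample_rate == rate_hz and bit_depth == bits
```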
Chemistry data
spectroscopy data and other plots that require the capability of representing contours as well as peak position and intensity
Preferred Formats
Convert NMR, IR, Raman, UV, and mass spectrometry files to JCAMP format for ease of sharing.
JCAMP file viewers: JSpecView, ChemDoodle
Documentation and scripts
Preferred Formats
- plain text (.txt, encoding: US-ASCII, UTF-8, UTF-16 with BOM)
- PDF/A-1 (ISO 19005-1)
- XML (includes XSD/XSL/XHTML, etc.; with included or accessible schema)
Other Acceptable Formats
- markdown (.md) or R markdown (.Rmd)
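To check whether a plain-text file uses one of the encodings listed for documentation (US-ASCII, UTF-8, or UTF-16 with BOM), a short script can try each in turn. This is a rough sketch under simple assumptions: the function name `detect_preferred_encoding` is hypothetical, and the check only confirms that the bytes decode cleanly, not that the text is meaningful.

```python
import codecs


def detect_preferred_encoding(data: bytes):
    """Return 'UTF-16 with BOM', 'US-ASCII', 'UTF-8', or None if none fit."""
    # UTF-16 is only accepted when a byte order mark is present.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        try:
            data.decode("utf-16")
        except UnicodeDecodeError:
            return None
        return "UTF-16 with BOM"
    # ASCII is a strict subset of UTF-8, so test it first.
    try:
        data.decode("ascii")
        return "US-ASCII"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        return None
```

A file would typically be read in binary mode (for example with `Path(name).read_bytes()`) and passed to the function; a result of `None` suggests the file should be re-encoded before archiving.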
Geospatial data
vector and raster data
Preferred Formats
- ESRI Shapefile (essential: .shp, .shx, .dbf; optional: .prj, .sbx, .sbn)
- geo-referenced TIFF (.tif, .tfw)
- CAD data (.dwg)
- tabular GIS attribute data
- Keyhole Mark-up Language (KML) (.kml)
Other Acceptable Formats
- ESRI Geodatabase format (.mdb)
- MapInfo Interchange Format (.mif) for vector data
Digital image data
Preferred Formats
- TIFF version 6 uncompressed (.tif)
Other Acceptable Formats
- JPEG (.jpeg, .jpg)
- TIFF (other versions) (.tif, .tiff)
- JPEG 2000 (.jp2)
- Adobe Portable Document Format (PDF/A, PDF) (.pdf)
Viewers: OMERO for conversion, viewing and metadata for biological microscope slides and other TIFF files.
Qualitative data
textual
Preferred Formats
- eXtensible Mark-up Language (XML) text according to an appropriate Document Type Definition (DTD) or schema (.xml)
- Rich Text Format (.rtf)
- plain text data, UTF-8 (Unicode) (.txt)
Other Acceptable Formats
- plain text data, ASCII (.txt)
- Hypertext Mark-up Language (HTML) (.html)
- widely-used proprietary formats, e.g. MS Word (.doc/.docx)
- LaTeX (.tex)
Quantitative tabular data with extensive metadata
a dataset with variable labels, code labels, and defined missing values, in addition to the matrix of data
Preferred Formats
- Character delimited text (ASCII or Unicode preferred):
  - Comma Separated Values (*.csv)
  - Delimited Text (*.txt)
- SQL Data Definition Language
- Structured text or mark-up file containing metadata information, e.g. DDI XML or JSON
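As an illustration of pairing delimited text with SQL Data Definition Language, the sketch below keeps column types and code meanings in a DDL statement and checks that a CSV loads against it, using Python's built-in `sqlite3` module. The table name, columns, codes, sample data, and the function name `load_survey` are all invented for this example.

```python
import csv
import io
import sqlite3

# Hypothetical DDL describing a small survey file. The SQL comments carry
# the variable labels, code labels, and missing-value convention.
DDL = """
CREATE TABLE survey (
    respondent_id INTEGER PRIMARY KEY,  -- unique respondent number
    age           INTEGER,              -- age in years, NULL means missing
    smoker        INTEGER CHECK (smoker IN (0, 1))  -- 0 = no, 1 = yes
);
"""

# Illustrative data: the second respondent has a missing age.
CSV_TEXT = "respondent_id,age,smoker\n1,34,0\n2,,1\n"


def load_survey(ddl, csv_text):
    """Create the table from the DDL and load the CSV rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append((
            int(record["respondent_id"]),
            int(record["age"]) if record["age"] else None,  # empty cell -> NULL
            int(record["smoker"]),
        ))
    conn.executemany("INSERT INTO survey VALUES (?, ?, ?)", rows)
    return conn
```

Archiving the DDL text file alongside the CSV lets a future user rebuild the table, with its types and constraints, in any SQL database, though dialect differences may require small adjustments.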
Quantitative tabular data with minimal metadata
a matrix of data with or without column headings or variable names, but no other metadata or labelling
Preferred Formats
- comma-separated values (CSV) file (.csv)
- tab-delimited file (.tab) including delimited text of given character set with SQL data definition statements where appropriate
Other Acceptable Formats
- delimited text of given character set -- only characters not present in the data should be used as delimiters (.txt)
- widely-used formats, e.g. MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), dBase (.dbf) and OpenDocument Spreadsheet (.ods)
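The advice to use only delimiters that do not occur in the data can be checked mechanically. A minimal sketch follows, assuming the table is already in memory as a list of rows; the candidate list and the function names `pick_unused_delimiter` and `write_delimited` are hypothetical.

```python
import csv
import io

# Candidate delimiters, tried in order (an illustrative choice).
CANDIDATE_DELIMITERS = [",", "\t", ";", "|"]


def pick_unused_delimiter(rows, candidates=CANDIDATE_DELIMITERS):
    """Return the first candidate that appears in no cell, or None."""
    used = set()
    for row in rows:
        for cell in row:
            used.update(str(cell))  # collect every character in the data
    for delimiter in candidates:
        if delimiter not in used:
            return delimiter
    return None


def write_delimited(rows, delimiter):
    """Serialize rows as delimited text using the chosen delimiter."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()
```

If every candidate occurs in the data, the function returns `None`, in which case a quoted CSV is likely the safer choice.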
Digital video data
Preferred Formats
- MPEG-4 High Profile (.mp4)
Adapted from the UK Data Archive recommendations for file formats: Managing and Sharing Data and from the Cornell eCommons Recommended File Formats.
See also: Library of Congress Digital Format A-Z Directory