University of Oregon
UO Libraries

Research Data Management

File Formats

File Format Standards

While you may wish to work with your data in proprietary software such as Word or SPSS, you should archive your data in formats most likely to survive the test of time. File formats most likely to remain usable in the future share the following characteristics:

  • complete and open documentation
  • platform-independence
  • non-proprietary (vendor-independent)
  • no "lossy" or proprietary compression
  • no embedded files, programs or scripts
  • no full or partial encryption
  • no password protection

Finally, consider using data formats that allow for re-use (e.g., .txt or .csv rather than .pdf). If you do store data in proprietary formats, be sure to document the software needed to view the data. If data will be stored in one format during collection and analysis and then transferred to another format for preservation, consider documenting features that may be lost in the conversion, such as system-specific labels.
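One lightweight way to document a conversion is a plain-text note saved alongside the preserved file. The sketch below assumes a hypothetical helper, `conversion_readme`, and illustrative field names; it is not a formal metadata standard, only an example of recording the original software and the features lost in conversion.

```python
from datetime import date

def conversion_readme(source_software, source_format, target_format,
                      lost_features, when=None):
    """Build a plain-text note describing one format conversion.

    Field names are illustrative (hypothetical), not a metadata standard.
    """
    when = when or date.today()
    lines = [
        f"Converted on: {when.isoformat()}",
        f"Original software: {source_software}",
        f"Original format: {source_format}",
        f"Preservation format: {target_format}",
        "Features not carried over in conversion:",
    ]
    # One indented line per feature that the target format cannot hold
    lines += [f"  - {feature}" for feature in lost_features]
    return "\n".join(lines) + "\n"

# Hypothetical example: an SPSS .sav file exported to CSV
note = conversion_readme(
    "SPSS", ".sav", ".csv",
    ["variable labels", "value labels", "user-defined missing values"],
    when=date(2024, 1, 15),
)
```

Saving `note` as a .txt file next to the data keeps the documentation readable without any special software.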

File Format Recommendations

Audio
Chemistry (spectra)
Documentation and scripts
Geospatial
Images
Qualitative (text)
Quantitative tabular data with extensive metadata
Quantitative tabular data with minimal metadata
Video

Digital audio data
Preferred Formats
  • AIFF (96 kHz, 16-bit PCM) (*.aif, *.aiff)
  • WAV (96 kHz, 24-bit PCM) (*.wav)
Other Acceptable Formats
  • MPEG-1 Audio Layer 3 (.mp3)
  • Audio Interchange File Format (AIFF) (.aif)
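For WAV files, Python's standard-library `wave` module can report the sample rate and bit depth, which is one way to check a recording against the 96 kHz, 24-bit PCM target above. This is a sketch: the function name `describe_wav` is hypothetical, the file is a tiny silent one built in memory so the example runs as-is, and `wave` does not read AIFF.

```python
import io
import wave

def describe_wav(fileobj):
    """Return sample rate (Hz), bit depth and channel count of a PCM WAV."""
    with wave.open(fileobj, "rb") as w:
        return {
            "sample_rate_hz": w.getframerate(),
            "bit_depth": w.getsampwidth() * 8,  # sample width is in bytes
            "channels": w.getnchannels(),
        }

# Build a tiny in-memory 96 kHz, 24-bit mono WAV (silence) for illustration;
# in practice you would pass a path such as "recording.wav" (hypothetical).
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(3)        # 3 bytes = 24-bit samples
    w.setframerate(96000)
    w.writeframes(b"\x00\x00\x00" * 96)  # 96 frames of silence
buf.seek(0)

info = describe_wav(buf)
meets_target = info["sample_rate_hz"] == 96000 and info["bit_depth"] == 24
```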
Chemistry data
spectroscopy data and other plots that require the capability of representing contours as well as peak position and intensity
Preferred Formats
Convert NMR, IR, Raman, UV, and mass spectrometry files to JCAMP (JCAMP-DX) format for ease of sharing.
JCAMP file viewers: JSpecView, ChemDoodle
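JCAMP-DX is a plain-text format built from labelled `##LABEL=value` records followed by a data table. The sketch below writes a minimal file for a simple (x, y) spectrum using the uncompressed `(XY..XY)` table form. The function name, the example values, and the exact set of header labels are assumptions; open the output in a viewer such as JSpecView before relying on it.

```python
def to_jcamp(title, data_type, xunits, yunits, points):
    """Return a minimal JCAMP-DX 4.24 text block for a list of (x, y) pairs.

    Uses the uncompressed (XY..XY) table form, one pair per line.
    """
    xs = [x for x, _ in points]
    lines = [
        f"##TITLE={title}",
        "##JCAMP-DX=4.24",
        f"##DATA TYPE={data_type}",
        f"##XUNITS={xunits}",
        f"##YUNITS={yunits}",
        "##XFACTOR=1",
        "##YFACTOR=1",
        f"##FIRSTX={xs[0]}",
        f"##LASTX={xs[-1]}",
        f"##FIRSTY={points[0][1]}",
        f"##NPOINTS={len(points)}",
        "##XYPOINTS=(XY..XY)",
    ]
    lines += [f"{x}, {y}" for x, y in points]
    lines.append("##END=")
    return "\n".join(lines) + "\n"

# Hypothetical three-point IR spectrum, for illustration only
spectrum = [(4000.0, 0.95), (3000.0, 0.40), (2000.0, 0.88)]
jdx = to_jcamp("Sample IR", "INFRARED SPECTRUM", "1/CM", "TRANSMITTANCE", spectrum)
```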
Documentation and scripts
Preferred Formats Other Acceptable Formats
  • plain text (.txt; encoding: US-ASCII, UTF-8, or UTF-16 with BOM)
  • PDF/A-1 (ISO 19005-1)
  • XML (includes XSD/XSL/XHTML, etc.; with included or accessible schema)
  • markdown (.md) or R markdown (.Rmd)
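To confirm that a plain-text file really is in one of the encodings listed above, you can try to decode it strictly. The sketch below (hypothetical helper names) honours a UTF-16 or UTF-8 byte-order mark, otherwise requires valid UTF-8 (which also accepts pure US-ASCII), and can re-encode the result as UTF-8 if you want a single encoding across a collection.

```python
import codecs

def decode_text(raw: bytes) -> str:
    """Decode bytes, honouring a UTF-16 or UTF-8 byte-order mark if present.

    Falls back to strict UTF-8 (pure US-ASCII is valid UTF-8) and raises
    UnicodeDecodeError if the bytes are in none of these encodings.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")  # the utf-16 codec consumes the BOM
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")
    return raw.decode("utf-8")

def to_utf8(raw: bytes) -> bytes:
    """Re-encode text as UTF-8 without a byte-order mark."""
    return decode_text(raw).encode("utf-8")
```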
Geospatial data
vector and raster data
Preferred Formats Other Acceptable Formats
  • ESRI Shapefile (essential: .shp, .shx, .dbf; optional: .prj, .sbx, .sbn)
  • geo-referenced TIFF (.tif, .tfw)
  • CAD data (.dwg)
  • tabular GIS attribute data
  • Keyhole Mark-up Language (KML) (.kml)
  • ESRI Geodatabase format (.mdb)
  • MapInfo Interchange Format (.mif) for vector data
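Because a shapefile is only usable when its essential component files travel together, it can help to check a directory listing before archiving. This sketch uses a hypothetical helper and file list; it groups files by base name and reports which of .shp, .shx, and .dbf are missing. Note that a standalone dBase table (.dbf) would also be flagged.

```python
from collections import defaultdict
from pathlib import PurePath

ESSENTIAL = {".shp", ".shx", ".dbf"}
OPTIONAL = {".prj", ".sbx", ".sbn"}

def missing_shapefile_parts(filenames):
    """Return {base name: sorted missing essential extensions}, skipping complete sets."""
    groups = defaultdict(set)
    for name in filenames:
        p = PurePath(name)
        ext = p.suffix.lower()
        if ext in ESSENTIAL | OPTIONAL:
            groups[str(p.with_suffix(""))].add(ext)
    # A lone .dbf (e.g., a plain dBase table) will also be reported here
    return {
        base: sorted(ESSENTIAL - exts)
        for base, exts in groups.items()
        if ESSENTIAL - exts
    }

# Hypothetical directory listing
files = ["roads.shp", "roads.shx", "roads.dbf", "roads.prj",
         "parcels.shp", "parcels.dbf"]
problems = missing_shapefile_parts(files)
```

On a real folder you could pass `[p.name for p in Path("data").iterdir()]` (with `Path` from `pathlib`; the folder name is hypothetical).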
Digital image data
Preferred Formats Other Acceptable Formats
  • TIFF version 6 uncompressed (.tif)
  • JPEG (.jpeg, .jpg)
  • TIFF (other versions) (.tif, .tiff)
  • JPEG 2000 (.jp2)
  • Adobe Portable Document Format (PDF/A, PDF) (.pdf)
Viewers: OMERO, for conversion, viewing, and metadata management of biological microscope slides and other TIFF files.
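Whether a TIFF is uncompressed is recorded in its Compression tag (tag 259, where 1 means no compression). The sketch below reads that tag from the first image directory using only the standard library. The helper names are hypothetical, it handles classic TIFF only (not BigTIFF), and it cannot confirm the TIFF version, which files do not record. A header-only TIFF is built in memory so the example runs as-is.

```python
import struct

COMPRESSION_TAG = 259  # TIFF 6.0 Compression field

def tiff_compression(data: bytes) -> int:
    """Return the Compression value of the first image (IFD) in a classic TIFF.

    1 means uncompressed; TIFF 6.0 treats a missing tag as 1.
    """
    order = {b"II": "<", b"MM": ">"}.get(data[:2])  # byte order marker
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled)")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        (tag,) = struct.unpack(order + "H", data[entry:entry + 2])
        if tag == COMPRESSION_TAG:
            # A single SHORT is stored left-justified in the 4-byte value field
            (value,) = struct.unpack(order + "H", data[entry + 8:entry + 10])
            return value
    return 1

def make_minimal_tiff(compression: int, order: str = "<") -> bytes:
    """Build a header-only TIFF with one IFD entry (demonstration only)."""
    marker = b"II" if order == "<" else b"MM"
    header = marker + struct.pack(order + "HI", 42, 8)
    ifd = struct.pack(order + "H", 1)
    ifd += struct.pack(order + "HHIHH", COMPRESSION_TAG, 3, 1, compression, 0)
    ifd += struct.pack(order + "I", 0)  # no next IFD
    return header + ifd
```

For a real file, `tiff_compression(open("scan.tif", "rb").read())` (hypothetical filename) would return 1 for an uncompressed image.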
Qualitative data
textual
Preferred Formats Other Acceptable Formats
  • eXtensible Mark-up Language (XML) text according to an appropriate Document Type Definition (DTD) or schema (.xml)
  • Rich Text Format (.rtf)
  • plain text data, UTF-8 (Unicode) (.txt)
  • plain text data, ASCII (.txt)
  • Hypertext Mark-up Language (HTML) (.html)
  • widely-used proprietary formats, e.g. MS Word (.doc/.docx)
  • LaTeX (.tex)
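Python's standard library can confirm that an XML transcript is at least well-formed, as in the sketch below (hypothetical helper and sample text). It does not validate against a DTD or schema; that step needs a validating tool such as lxml or xmllint.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML.

    This checks well-formedness only, NOT validity against a DTD or schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# Hypothetical interview fragments
good = "<interview><turn speaker='A'>Hello</turn></interview>"
bad = "<interview><turn>Hello</interview>"  # <turn> is never closed
```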
Quantitative tabular data with extensive metadata
a dataset with variable labels, code labels, and defined missing values, in addition to the matrix of data
Preferred Formats Other Acceptable Formats
  • Character delimited text (ASCII or Unicode preferred):
    • Comma Separated Values (*.csv)
    • Delimited Text (*.txt)
  • SQL Data Definition Language
  • Structured text or mark-up file containing metadata information, e.g. DDI XML or JSON
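A bare CSV cannot carry variable labels, code labels, or defined missing values, so one option is to store them in a structured sidecar file. The sketch below writes the data matrix as CSV and the labels as an ad-hoc JSON codebook; the variable names, labels, and the -9 missing code are hypothetical, and the JSON layout is illustrative, not the DDI schema.

```python
import csv
import io
import json

# Hypothetical survey extract: coded responses, with -9 meaning "refused"
rows = [
    {"resp_id": 101, "q1_satisfaction": 4},
    {"resp_id": 102, "q1_satisfaction": -9},
]

# Ad-hoc codebook (NOT DDI): the metadata a bare CSV cannot carry
codebook = {
    "resp_id": {"label": "Respondent identifier"},
    "q1_satisfaction": {
        "label": "Overall satisfaction with service",
        "codes": {"1": "Very dissatisfied", "2": "Dissatisfied",
                  "3": "Neutral", "4": "Satisfied", "5": "Very satisfied"},
        "missing": [-9],
    },
}

# Data matrix as CSV with a header row
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["resp_id", "q1_satisfaction"],
                        lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
data_csv = buf.getvalue()

codebook_json = json.dumps(codebook, indent=2)
```

Both strings are plain text, so they can be saved as .csv and .json files and read without the original statistics package.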
 
Quantitative tabular data with minimal metadata
a matrix of data with or without column headings or variable names, but no other metadata or labelling
Preferred Formats Other Acceptable Formats
  • comma-separated values (CSV) file (.csv)
  • tab-delimited file (.tab), including delimited text of a given character set with SQL data definition statements where appropriate
  • delimited text of a given character set (.txt); use as delimiters only characters that do not appear in the data
  • widely-used formats, e.g. MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), dBase (.dbf) and OpenDocument Spreadsheet (.ods)
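The rule that delimiters must not appear in the data can be checked mechanically. In this sketch the helper names and candidate list are hypothetical: it picks the first candidate character found in no cell and writes the rows with it. Cells containing line breaks are rejected, since unquoted delimited text cannot represent them; Python's csv module, which quotes such fields, is the usual alternative.

```python
CANDIDATE_DELIMITERS = [",", "\t", "|", ";", "^"]

def pick_delimiter(rows, candidates=CANDIDATE_DELIMITERS):
    """Return the first candidate that occurs in no cell, or None if all occur."""
    cells = [str(cell) for row in rows for cell in row]
    if any("\n" in c or "\r" in c for c in cells):
        raise ValueError("cells contain line breaks; use quoted CSV instead")
    for delim in candidates:
        if not any(delim in cell for cell in cells):
            return delim
    return None

def to_delimited(rows, delim):
    """Join each row with the chosen delimiter, one row per line."""
    return "".join(delim.join(str(cell) for cell in row) + "\n" for row in rows)

# Hypothetical field notes: the comma in "wet, muddy" rules out ","
rows = [["site", "note"], ["A1", "wet, muddy"], ["B2", "dry"]]
delim = pick_delimiter(rows)
text = to_delimited(rows, delim)
```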
Digital video data
Preferred Formats Other Acceptable Formats
  • MPEG-4 High Profile (.mp4)
  • Motion JPEG 2000 (.mj2)

Adapted from the UK Data Archive recommendations for file formats: Managing and Sharing Data and from the Cornell eCommons Recommended File Formats.

See also: Library of Congress Digital Format A-Z Directory