File Formats in DSP-API
Currently, only a limited number of file formats is accepted to be uploaded onto DSP. Some metadata is extracted from the files during the ingest but the file formats are not validated. Only image file formats are currently migrated into another format. Both, the migrated version of the file and the original are kept.
The following table shows the accepted file formats:
Category | Accepted format | Converted during ingest? |
---|---|---|
Text, XML *) | JSON, ODD, RNG, TXT, XML, XSD, XSL | No |
Tables | CSV, XLS, XLSX | No |
2D Images | JPG, JPEG, JP2, PNG, TIF, TIFF | Yes, converted to JPEG 2000 by Sipi |
Audio | MPEG (MP3), WAV | No |
Video | MP4 | No |
Office | PDF, DOC, DOCX, PPT, PPTX | No |
Archives | ZIP, TAR, GZ, Z, TAR.GZ, TGZ, GZIP, 7Z | No |
*) If your XML files represent text with markup (e.g. TEI/XML), it is possible to store it as Standoff/RDF, as described here.