Stanislav Pejša @ on
Identification, validation, and characterization of objects in the repository
The versions of formats stored in the NEES Data repository are largely unknown. The file extension and MIME type provide only approximate information in this regard. Format identification and validation provide accurate information about the current state of formats in the repository and possible risks due to format obsolescence.
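To illustrate why the extension and MIME type are only approximate, here is a minimal Python sketch that compares an extension-based guess with a look at the first bytes of the file. The signature table and the function names are assumptions made up for this example; it is a tiny stand-in for what tools backed by a real signature registry do, not a description of how DROID works internally.

```python
import mimetypes

# Illustrative signature table: a tiny, hand-picked subset for this sketch.
# A real identification tool relies on a full signature registry instead.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/zip"),
]


def guess_from_extension(filename):
    """Guess the MIME type from the file name alone (the approximate route)."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def sniff_from_bytes(header):
    """Guess the MIME type from the leading bytes of the file content."""
    for magic, mime in SIGNATURES:
        if header.startswith(magic):
            return mime
    return None


def pdf_version(header):
    """Return the version declared in a PDF header such as b'%PDF-1.4', else None."""
    if not header.startswith(b"%PDF-"):
        return None
    return header[5:8].decode("ascii", errors="replace")


if __name__ == "__main__":
    # A file named like a PDF whose content is actually a ZIP container.
    name, header = "report.pdf", b"PK\x03\x04rest-of-file"
    print(guess_from_extension(name), "vs", sniff_from_bytes(header))
```

The extension says nothing about the format version either, whereas the content often does: in this sketch `pdf_version` reads it straight from the header.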
There are software packages that identify and validate stored formats. The identified formats can then be related to their potential preservation risks, as one can see from the implementation of DROID in EPrints, which is one of the most popular open source institutional repositories.
For identification of formats, the stand-alone application DROID can be used (http://droid.sourceforge.net/), or the PRONOM bundle.
Once the results are in and the state of the NEES Data repository is known:
a) a registry can be built up
b) sets of supported formats can be identified
c) policies can be written on what to do with unsupported formats.
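The three steps above could be sketched roughly as follows in Python. Everything here is hypothetical: the sample results, the format labels, the supported set, and the policy actions are placeholders for illustration, not actual findings or decisions for the NEES Data repository.

```python
from collections import Counter


def build_registry(results):
    """a) Count how many files were identified as each format."""
    return Counter(fmt or "unidentified" for _path, fmt in results)


def unsupported_formats(registry, supported):
    """b) List the formats found in the repository that are not supported."""
    return sorted(fmt for fmt in registry if fmt not in supported)


def action_for(fmt, supported, policy, default="review manually"):
    """c) Look up what the policy says to do with a given format."""
    if fmt in supported:
        return "preserve as-is"
    return policy.get(fmt, default)


# Hypothetical output of an identification run: (path, format label or None).
results = [
    ("run01/report.pdf", "PDF 1.4"),
    ("run01/photo.tif", "TIFF 6.0"),
    ("run02/notes.doc", "Word 97-2003"),
    ("run02/sensor.bin", None),
    ("run03/summary.pdf", "PDF 1.4"),
]
SUPPORTED = {"PDF 1.4", "TIFF 6.0"}           # assumed supported set
POLICY = {"Word 97-2003": "migrate to PDF/A"}  # assumed policy entry

registry = build_registry(results)

if __name__ == "__main__":
    for fmt, count in registry.most_common():
        print(f"{count:3d}  {fmt}: {action_for(fmt, SUPPORTED, POLICY)}")
```

Keeping the registry, the supported set, and the policy as separate pieces of data means the policy can be revised later without re-running identification.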
The installation of the package itself is relatively easy.
Characterisation and validation of files should be part of the “ingest” procedure and we should accept only validated formats, because those will work properly and can be preserved. I estimate it won’t take more than a day to set it up, but implementation will take some planning.
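A check at ingest might look something like the sketch below. The `Characterization` record, its field names, and the accepted-format labels are assumptions for illustration; in practice the record would be filled in from the output of whichever identification and validation tool is chosen.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Characterization:
    """Hypothetical summary of what a characterisation tool reported for one file."""
    path: str
    fmt: Optional[str]   # identified format label, or None if unidentified
    well_formed: bool    # file follows the syntactic rules of its format
    valid: bool          # file also meets the format's semantic requirements


def ingest_decision(record: Characterization, accepted: Set[str]) -> Tuple[bool, str]:
    """Accept a file only if it is identified, on the accepted list, and validated."""
    if record.fmt is None:
        return (False, "format could not be identified")
    if record.fmt not in accepted:
        return (False, f"format {record.fmt!r} is not on the accepted list")
    if not (record.well_formed and record.valid):
        return (False, "file failed validation")
    return (True, "accepted")


if __name__ == "__main__":
    accepted = {"PDF 1.4", "TIFF 6.0"}  # assumed list of accepted formats
    candidate = Characterization("upload/report.pdf", "PDF 1.4", True, False)
    print(ingest_decision(candidate, accepted))
```

Returning a reason along with the decision lets the ingest procedure tell the depositor why a file was turned away.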
FITS looks like a viable option: it encapsulates several of the above-mentioned tools (http://code.google.com/p/fits/).
Would it work on NFS?