Fresh off the press: the latest ICEDIG deliverable on updating data standards for transcription.



An important ICEDIG deliverable has recently been published as a paper in the journal Database.

This new deliverable summarizes the findings and conclusions of previous deliverables on data capture. Through various digitization projects, the authors have experimented with transcription by volunteers, expert technicians, scientists, commercial transcription services and automated systems. The paper contains recommendations for improving both transcription data and the standards that apply to them, and covers issues related to verbatim transcription, missing or unknown data, and problems of language and script. It also addresses specific aspects of the core data fields of a specimen: “what”, “where”, “when”, “which” and “who”. These recommendations are directed at standards organizations, transcribers, curators and software developers.

The paper is split into two sections. The authors first address issues related to database implementation with relevance to data transcription, namely versioning, annotation, unknown and incomplete data and issues related to language. They then focus on particular data types that are relevant to biological collection specimens, namely nomenclature, dates, geography, collector numbers and uniquely identifying people.

The ultimate goal is to improve the overall quality of transcription data and facilitate interoperability between collection management systems.

Citation: Quentin Groom, Mathias Dillen, Helen Hardy, Sarah Phillips, Luc Willemse, Zhengzhe Wu, Improved standardization of transcribed digital specimen data, Database, Volume 2019, 2019, baz129,

Interested in more ICEDIG outcomes? Check them out on our website or on Zenodo.


