Ancestry is doing some excellent work with data stored in the exported GEDCOM files from their website but they appear to be deviating from the GEDCOM standard.
One interesting issue I came across is that the maximum record size varies: 255 characters when the record has a 4-character tag, but 256 characters with a custom tag like _META.
I noticed this issue because Retrogen (which uses a Berkeley ISAM database) typically has variable-length files up to a maximum size. I had set this maximum to 255 characters, and I noticed that characters were truncated in metadata records (_META), which are used for cemetery descriptive details (name of cemetery, person name, date and place of birth and death), and in description records (_DSCR), which are used for transcriptions.
The fix for Retrogen is easy enough: simply change the maximum within the file definition to 256 characters (excluding the terminator). That being said, the GEDCOM 5.5.1 (2 October 1999) standard states “The total length of a GEDCOM line, including level number, cross-reference number, tag, value, delimiters, and terminator, must not exceed 255 (wide) characters.” (page 11). The terminator is one of carriage_return | line_feed | carriage_return + line_feed | line_feed + carriage_return (page 14). So I was not fully compliant either, since my sizing ignored the terminator.
My guess is that Ancestry has defined a size for their data export without taking into account the larger custom tag size.
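To make the 255-character rule concrete, here is a minimal Python sketch (not Retrogen code, which is written in COBOL) that flags lines whose length, terminator included, exceeds the limit. The function name `overlong_lines` and the constant are my own, and `len()` counts Unicode code points, which is only an approximation of the standard's "wide characters".

```python
import re

MAX_LINE_CHARS = 255  # limit quoted above from GEDCOM 5.5.1, terminator included

# One line = any run of non-terminator characters plus one of the four
# terminators listed in the standard, or a final line with no terminator.
_LINE = re.compile(r"[^\r\n]*(?:\r\n|\n\r|\r|\n)|[^\r\n]+")


def overlong_lines(raw_text, limit=MAX_LINE_CHARS):
    """Return (line_number, length) for every line longer than `limit`.

    The length includes the terminator, as the standard's wording requires.
    """
    problems = []
    for number, line in enumerate(_LINE.findall(raw_text), start=1):
        if len(line) > limit:
            problems.append((number, len(line)))
    return problems
```

When reading a real export, the file would need to be opened with `newline=""` so that Python does not translate the terminators before they are counted.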
Media Field Sizes
The GEDCOM 5.5.1 standard specifies GEDCOM field sizes in detail, which is why I was surprised to read the data sizes allowed when storing media data on Ancestry. Ancestry allows 128 characters for the picture title or name, 256 characters for the location (which is split over 2 exported records, with a CONC tag for the residual data) and 128 characters for the date.
The last one surprised me: why would you need 128 characters to store the media date? The GEDCOM standard gives DATE a length of between 4 and 35 characters (page 45). I assume 4 allows for the year, but I am not sure why the maximum is 35 characters. I tried some combinations using “between”: although it is possible to exceed 35 characters, rewording with abbreviations for “between” and the month name can work, e.g. “Bet. 10 September-31 September 1900” or “Bet. 10 Sept. 1900 to 31 Sept. 1901”, both of which fit the maximum exactly.
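The CONC mechanism mentioned above can be sketched in Python as follows. This is only an illustration of the idea: I do not know how Ancestry chooses its split point, the `PLAC` tag and level numbers in the usage note are assumptions, and the function name `split_with_conc` is hypothetical. It also makes no attempt to avoid splitting at a space, which a careful exporter would probably want.

```python
def split_with_conc(level, tag, value, max_line=255, terminator_len=2):
    """Split `value` over a tag line plus CONC continuation lines.

    Each returned line (without its terminator) leaves room for a
    terminator of `terminator_len` characters within `max_line`.
    """
    lines = []
    prefix = f"{level} {tag} "            # first line carries the real tag
    continuation = f"{level + 1} CONC "   # later lines sit one level deeper
    remaining = value
    while True:
        room = max_line - terminator_len - len(prefix)
        lines.append(prefix + remaining[:room])
        remaining = remaining[room:]
        if not remaining:
            break
        prefix = continuation
    return lines
```

For a 256-character location written at level 1, `split_with_conc(1, "PLAC", location)` returns two lines, the second starting with `2 CONC `, which matches the two exported records described above.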
The standard states that the place name or location has up to 120 characters (page 58) against Ancestry’s 256 characters, and that the descriptive title allows for 248 characters (page 48) against Ancestry’s picture title of 128 characters.
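The sizes quoted so far can be collected into a small lookup and checked mechanically. The Python sketch below does this; the dictionary keys and the function name `fits_standard` are hypothetical, and the minimum of 0 for place name and title is a placeholder because I have only quoted their maximums here.

```python
# (minimum, maximum) lengths as quoted in this post from GEDCOM 5.5.1.
# A minimum of 0 is a placeholder where only the maximum was quoted.
FIELD_LIMITS = {
    "date": (4, 35),
    "place_name": (0, 120),
    "descriptive_title": (0, 248),
}


def fits_standard(field, value):
    """True if `value` is within the quoted size range for `field`."""
    low, high = FIELD_LIMITS[field]
    return low <= len(value) <= high
```

Both abbreviated date examples above pass this check, while a 256-character location or a 128-character date does not.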
It seems that Ancestry is either deviating from the standard, or perhaps a new standard with size changes is being proposed or prepared. Nevertheless, to avoid data loss, the maximum length of a record or field needs to be observed, whether the maximum comes from the GEDCOM standard or from Ancestry themselves. Legacy programming languages work well with standards that prescribe widths, but many newer languages are more lenient about size variations. Perhaps the sizes defined in the standard will one day be deemed archaic or obsolete, but I personally find them invaluable for defining data in the COBOL programming language, regardless of Ancestry’s deviations.
I was examining some truncated text relating to an event description. The GEDCOM standard gives this a length of 90 characters. By copying and pasting some text into a custom event description within Ancestry, I found that they appear to allow a maximum of 256 characters for the description, which, with the level and tag (EVEN), gives a total record length of 263 characters (excluding terminators). In this case, the standard provides no CONC tag, since the field length is supposed to fit within the record length. I now have data truncation, which is frustrating.