Well, it has been a few days since my last post on the exciting new “RetroGen” project. In that time, I have been exploring the header context within a GEDCOM file, an area that I have never really examined in any detail.
The GEDCOM 5.5.1 standard defines the HEADer context as follows, with only some of the items being required;
0 HEAD
  1 SOUR <APPROVED_SYSTEM_ID>
    2 VERS <VERSION_NUMBER>
    2 NAME <NAME_OF_PRODUCT>
    2 CORP <NAME_OF_BUSINESS>
      3 <<ADDRESS_STRUCTURE>>
    2 DATA <NAME_OF_SOURCE_DATA>
      3 DATE <PUBLICATION_DATE>
      3 COPR <COPYRIGHT_SOURCE_DATA>
        4 [CONT|CONC] <COPYRIGHT_SOURCE_DATA>
  1 DEST <RECEIVING_SYSTEM_NAME>
  1 DATE <TRANSMISSION_DATE>
    2 TIME <TIME_VALUE>
  1 SUBN @XREF:SUBN@
  1 SUBM @XREF:SUBM@
  1 FILE <FILE_NAME>
  1 COPR <COPYRIGHT_GEDCOM_FILE>
  1 GEDC
    2 VERS <VERSION_NUMBER>
    2 FORM <GEDCOM_FORM>
  1 CHAR <CHARACTER_SET>
    2 VERS <VERSION_NUMBER>
  1 LANG <LANGUAGE_OF_TEXT>
  1 PLAC
    2 FORM <PLACE_HIERARCHY>
  1 NOTE <GEDCOM_CONTENT_DESCRIPTION>
    2 [CONT|CONC] <GEDCOM_CONTENT_DESCRIPTION>
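Each entry in this structure is a GEDCOM line: a level number, an optional cross-reference such as @SUB1@, a tag, and an optional value. The level number is what places a tag inside its parent's context (2 VERS under 1 GEDC, for example). As a minimal Python sketch of that idea, assuming well-formed lines with no byte order mark, the hypothetical helpers `parse_line` and `tag_paths` below (not RetroGen's own code) split a line into its parts and turn the levels into dotted context paths:

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple


class GedLine(NamedTuple):
    """One GEDCOM line: level, optional @XREF@, tag and optional value."""
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]


# level, optional cross-reference (e.g. @SUB1@), tag, then the rest of the line as the value
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")


def parse_line(text: str) -> GedLine:
    """Split a single GEDCOM line into its parts (hypothetical helper)."""
    m = LINE_RE.match(text)
    if m is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    level, xref, tag, value = m.groups()
    return GedLine(int(level), xref, tag, value)


def tag_paths(lines: Iterable[str]) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (dotted context path, value) pairs, e.g. ('HEAD.GEDC.VERS', '5.5.1')."""
    stack = []
    for raw in lines:
        if not raw.strip():
            continue
        line = parse_line(raw)
        del stack[line.level:]  # leave any contexts at this level or deeper
        stack.append(line.tag)
        yield ".".join(stack), line.value


header = ["0 HEAD", "1 GEDC", "2 VERS 5.5.1", "2 FORM LINEAGE-LINKED", "1 CHAR UTF-8"]
for path, value in tag_paths(header):
    print(path, value)
```

The stack approach assumes levels never jump by more than one; a validator would want to flag such jumps rather than silently accept them.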
When I compared the standard with a GEDCOM file exported from Ancestry, I noted that only some of these tags were actually used (indentation added for clarity), and that some required tags, for the submitter and destination, were missing;
0 HEAD
  1 CHAR UTF-8
  1 SOUR Ancestry.com Family Trees
    2 VERS (2010.3)
    2 NAME Ancestry.com Family Trees
    2 CORP Ancestry.com
  1 GEDC
    2 VERS 5.5
    2 FORM LINEAGE-LINKED
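The gap in the Ancestry header can be spotted mechanically by collecting the dotted tag paths inside HEAD and comparing them with a list of expected paths. A minimal Python sketch, assuming well-formed lines (indentation allowed, no byte order mark); `REQUIRED_HEAD_PATHS` is an assumed list built from the tags discussed in this post rather than a complete reading of the 5.5.1 standard, and the function names are hypothetical:

```python
from typing import List, Set

# Assumption: the header paths treated as required in this post; check against the 5.5.1 spec.
REQUIRED_HEAD_PATHS = [
    "HEAD.SOUR", "HEAD.DEST", "HEAD.SUBM",
    "HEAD.GEDC", "HEAD.GEDC.VERS", "HEAD.GEDC.FORM",
    "HEAD.CHAR",
]


def header_paths(gedcom_text: str) -> Set[str]:
    """Collect dotted tag paths (e.g. 'HEAD.GEDC.VERS') found inside the HEAD record."""
    paths: Set[str] = set()
    stack: List[str] = []
    for raw in gedcom_text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        level = int(parts[0])
        if level == 0 and stack and stack[0] == "HEAD":
            break  # the next level-0 record means the header has ended
        # The tag follows an optional cross-reference such as @SUB1@
        tag = parts[2] if parts[1].startswith("@") and len(parts) > 2 else parts[1]
        del stack[level:]
        stack.append(tag)
        if stack[0] == "HEAD":
            paths.add(".".join(stack))
    return paths


def missing_header_tags(gedcom_text: str) -> List[str]:
    """Return the expected header paths that do not appear in the file."""
    present = header_paths(gedcom_text)
    return [p for p in REQUIRED_HEAD_PATHS if p not in present]


ANCESTRY_HEAD = """\
0 HEAD
  1 CHAR UTF-8
  1 SOUR Ancestry.com Family Trees
    2 VERS (2010.3)
    2 NAME Ancestry.com Family Trees
    2 CORP Ancestry.com
  1 GEDC
    2 VERS 5.5
    2 FORM LINEAGE-LINKED
"""

print(missing_header_tags(ANCESTRY_HEAD))  # ['HEAD.DEST', 'HEAD.SUBM']
```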
It seems that Ancestry could have supplied the missing DEST and SUBM simply by including a “1 DEST Ancestry.com Family Trees” line and using the account username for the submitter;
  1 SUBM @SUB1@
0 @SUB1@ SUBM
  1 NAME Ancestry username
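Adding those lines is a small mechanical change: two level-1 lines at the end of the HEAD record and a new level-0 SUBM record after it. A minimal Python sketch, assuming the file has already been split into unindented lines beginning with "0 HEAD"; the function name, the @SUB1@ default and the username value are placeholders for illustration:

```python
from typing import List


def add_dest_and_subm(lines: List[str], dest: str, username: str,
                      xref: str = "@SUB1@") -> List[str]:
    """Return a copy of `lines` with DEST and SUBM added to HEAD and a SUBM record after it."""
    # The HEAD record ends at the first level-0 line after line 0 (or at the end of the file)
    end = next((i for i in range(1, len(lines)) if lines[i].startswith("0 ")), len(lines))
    header_additions = [f"1 DEST {dest}", f"1 SUBM {xref}"]
    submitter_record = [f"0 {xref} SUBM", f"1 NAME {username}"]
    return lines[:end] + header_additions + submitter_record + lines[end:]


ancestry = [
    "0 HEAD",
    "1 CHAR UTF-8",
    "1 SOUR Ancestry.com Family Trees",
    "1 GEDC",
    "2 VERS 5.5",
    "2 FORM LINEAGE-LINKED",
    "0 TRLR",
]
for line in add_dest_and_subm(ancestry, "Ancestry.com Family Trees", "Ancestry username"):
    print(line)
```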
When I compared the above with Reunion’s GEDCOM export, I noted some similarities, errors and additions;
0 HEAD
  1 SOUR Reunion
    2 VERS V9.0
    2 CORP Leister Productions
    2 FORM LINEAGE-LINKED
  1 DEST Reunion
  1 DATE 17 MAY 2020
  1 SUBM @SUB1@
  1 FILE My Family Tree
  1 GEDC
    2 VERS 5.5
  1 CHAR UTF-8
Firstly, Reunion includes the required submitter and destination system, as well as the date of export and the GEDCOM filename. However, for some reason (in error, or at least in the older Reunion version 9), it places the FORMat of “LINEAGE-LINKED” under the system source (SOUR) rather than under the GEDCOM (GEDC) section.
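A check for this particular slip needs only the dotted tag paths within HEAD: a FORM found under HEAD.SOUR rather than under HEAD.GEDC can be reported as a warning. A minimal Python sketch, assuming well-formed lines without a byte order mark; the function names and warning texts are hypothetical, not RetroGen's output:

```python
from typing import List, Set


def header_paths(gedcom_text: str) -> Set[str]:
    """Collect dotted tag paths (e.g. 'HEAD.SOUR.FORM') inside the HEAD record."""
    paths: Set[str] = set()
    stack: List[str] = []
    for raw in gedcom_text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        level = int(parts[0])
        if level == 0 and stack and stack[0] == "HEAD":
            break  # the header has finished
        # The tag follows an optional cross-reference such as @SUB1@
        tag = parts[2] if parts[1].startswith("@") and len(parts) > 2 else parts[1]
        del stack[level:]
        stack.append(tag)
        if stack[0] == "HEAD":
            paths.add(".".join(stack))
    return paths


def form_warnings(gedcom_text: str) -> List[str]:
    """Warn when the GEDCOM FORM sits under SOUR instead of GEDC."""
    paths = header_paths(gedcom_text)
    warnings = []
    if "HEAD.SOUR.FORM" in paths:
        warnings.append("FORM found under HEAD.SOUR; the standard places it under HEAD.GEDC")
    if "HEAD.GEDC.FORM" not in paths:
        warnings.append("HEAD.GEDC.FORM is missing")
    return warnings


REUNION_HEAD = """\
0 HEAD
1 SOUR Reunion
2 VERS V9.0
2 CORP Leister Productions
2 FORM LINEAGE-LINKED
1 DEST Reunion
1 DATE 17 MAY 2020
1 SUBM @SUB1@
1 FILE My Family Tree
1 GEDC
2 VERS 5.5
1 CHAR UTF-8
"""

for warning in form_warnings(REUNION_HEAD):
    print("Warning:", warning)
```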
I used a minimal GEDCOM 5.5.1 file from the excellent Tamura Jones genealogy site to see how RetroGen fared in interpreting the HEAD context. The file contained the following detail (a Software MacKiev FTM 2019 (184.108.40.2060) GEDCOM 5.5.1 UTF-8 file, beginning with the UTF-8 byte order mark EF BB BF), with no indentation added;
EF BB BF 0 HEAD
1 SOUR FTM
2 VERS 220.127.116.110
2 NAME Family Tree Maker for Windows
2 CORP The Software MacKiev Company
3 ADDR 30 Union Wharf
4 CONT Boston, MA 02109
3 PHON (617) 227-6681
1 DEST FTM
1 DATE 28 Sep 2019
1 CHAR UTF-8
1 FILE FTM2019.ged
1 SUBM @SUBM@
1 GEDC
2 VERS 5.5.1
2 FORM LINEAGE-LINKED
0 @SUBM@ SUBM
1 NAME Not Given
0 @I1@ INDI
1 NAME /Test/
1 SEX U
0 TRLR
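Those first three bytes, EF BB BF, are the UTF-8 byte order mark. If they are not removed when the file is read, the first line starts with an invisible U+FEFF character instead of "0 HEAD". One way to handle this in Python, shown as a minimal sketch with a hypothetical function name, is to note whether the bytes start with `codecs.BOM_UTF8` and then decode with the standard "utf-8-sig" codec, which drops a leading BOM if one is present:

```python
import codecs
from typing import List, Tuple


def decode_gedcom(data: bytes) -> Tuple[bool, List[str]]:
    """Return (had_bom, lines) for a UTF-8 GEDCOM file, with any leading BOM removed."""
    had_bom = data.startswith(codecs.BOM_UTF8)  # b'\xef\xbb\xbf'
    # 'utf-8-sig' strips a leading BOM and otherwise decodes exactly like 'utf-8'
    text = data.decode("utf-8-sig")
    return had_bom, text.splitlines()


sample = codecs.BOM_UTF8 + b"0 HEAD\r\n1 SOUR FTM\r\n1 CHAR UTF-8\r\n0 TRLR\r\n"
had_bom, lines = decode_gedcom(sample)
print(had_bom, lines[0])  # True 0 HEAD
```

In a real reader, `data` would come from opening the .ged file in binary mode before any line parsing takes place.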
The output from RetroGen was surprisingly good, with only the submitter record details flagged as warnings, since submitter records have not yet been implemented.
So, the next step will be to implement the required submitter context, which Ancestry does not currently use but which is part of the GEDCOM 5.5.1 standard.
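Implementing the submitter context comes down to two steps: read the "1 SUBM @XREF@" pointer in the header, then find the level-0 SUBM record with that cross-reference. A missing or dangling pointer is what a validator would warn about. A minimal Python sketch, assuming unindented, well-formed lines with the byte order mark already removed; the function names are hypothetical, and the sample is a trimmed copy of the FTM 2019 test file above:

```python
from typing import Dict, List, Optional, Tuple


def submitter_records(lines: List[str]) -> Dict[str, Dict[str, str]]:
    """Map each SUBM record's cross-reference (e.g. '@SUBM@') to its level-1 tags and values."""
    records: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in lines:
        parts = raw.split(maxsplit=2)
        if not parts:
            continue
        if parts[0] == "0":
            # '0 @X@ SUBM' opens a submitter record; any other level-0 line closes it
            is_subm = len(parts) == 3 and parts[1].startswith("@") and parts[2] == "SUBM"
            current = records.setdefault(parts[1], {}) if is_subm else None
        elif parts[0] == "1" and current is not None and len(parts) > 1:
            # A repeated tag keeps its last value in this sketch
            current[parts[1]] = parts[2] if len(parts) > 2 else ""
    return records


def header_submitter(lines: List[str]) -> Tuple[Optional[str], Optional[Dict[str, str]]]:
    """Follow HEAD's '1 SUBM @X@' pointer; the record is None if the pointer is dangling."""
    in_head, xref = False, None
    for raw in lines:
        parts = raw.split(maxsplit=2)
        if not parts:
            continue
        if parts[0] == "0":
            in_head = len(parts) > 1 and parts[1] == "HEAD"
        elif in_head and parts[0] == "1" and len(parts) > 2 and parts[1] == "SUBM":
            xref = parts[2]
    if xref is None:
        return None, None
    return xref, submitter_records(lines).get(xref)


ftm = [
    "0 HEAD", "1 SOUR FTM", "1 CHAR UTF-8", "1 SUBM @SUBM@",
    "1 GEDC", "2 VERS 5.5.1", "2 FORM LINEAGE-LINKED",
    "0 @SUBM@ SUBM", "1 NAME Not Given",
    "0 @I1@ INDI", "1 NAME /Test/", "1 SEX U",
    "0 TRLR",
]
xref, record = header_submitter(ftm)
print(xref, record)  # @SUBM@ {'NAME': 'Not Given'}
```

Returning `(xref, None)` for a dangling pointer and `(None, None)` for a missing one keeps the two warning cases distinct.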
So after some coding effort on a lazy Sunday afternoon;