GAG relies on a MySQL database with 8 tables, which gather annotation data from several sources together with computed cross-references.
GenBank and Ensembl data are split into 6 tables (3 per institution, prefixed ncbi_* and embl_* respectively). These store the RNA transcript data (embl_seq_gene and ncbi_seq_gene), the links between RNA transcripts and gene IDs (embl_map_gene and ncbi_map_gene), and the gene annotation (embl_annot_gene and ncbi_annot_gene, including outgoing references to UniProt/Swiss-Prot).
HGNC data are stored in a separate table (hgnc_data).
Finally, the main table storing computed cross-references is xref_gene, which includes:
|Taxa_ID||The numerical taxon ID of the species the gene belongs to|
|NCBI_Gene_ID||GenBank Gene ID|
|Embl_Gene_ID||Ensembl Gene ID|
|Status||A descriptive field indicating whether the proposed cross-reference is currently accepted by GenBank/Ensembl (Known/Validated and Known/Corrected) or predicted by the GAG process (Predicted).|
|Annotation_Score||Number of words shared by the GenBank and Ensembl descriptive annotations.|
|Common_Symbol||Whether the Ensembl and GenBank genes share the same symbol|
|Common_Human_Homologs||Whether the Ensembl and GenBank genes share at least one human homolog gene|
|Common_Mouse_Homologs||Whether the Ensembl and GenBank genes share at least one mouse homolog gene|
|Common_Chicken_Homologs||Whether the Ensembl and GenBank genes share at least one chicken homolog gene|
|Common_UniProt_ID||Whether the Ensembl and GenBank genes share the same UniProt ID|
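
As an illustration of how the xref_gene fields fit together, the following is a minimal sketch and not the GAG implementation. It uses an in-memory SQLite database as a stand-in for MySQL. The column types, the word-splitting rule in `annotation_score`, the helper names, and all gene and taxon identifiers are assumptions made for the example; only the table name and field names come from the description above.

```python
import re
import sqlite3


def annotation_score(ncbi_description, embl_description):
    """Count distinct words shared by two descriptions.

    Assumption: words are lowercase alphanumeric runs; the real GAG
    tokenisation rule is not specified in this document.
    """
    def tokenize(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    return len(tokenize(ncbi_description) & tokenize(embl_description))


def build_demo_db():
    """Create an in-memory stand-in for xref_gene with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    # Column types are guesses; boolean fields are stored as 0/1.
    conn.execute(
        """
        CREATE TABLE xref_gene (
            Taxa_ID                 INTEGER,
            NCBI_Gene_ID            INTEGER,
            Embl_Gene_ID            TEXT,
            Status                  TEXT,
            Annotation_Score        INTEGER,
            Common_Symbol           INTEGER,
            Common_Human_Homologs   INTEGER,
            Common_Mouse_Homologs   INTEGER,
            Common_Chicken_Homologs INTEGER,
            Common_UniProt_ID       INTEGER
        )
        """
    )
    # Placeholder identifiers, not real genes or taxa.
    rows = [
        (12345, 1001, "EMBL_GENE_A", "Known/Validated", 4, 1, 1, 1, 0, 1),
        (12345, 1002, "EMBL_GENE_B", "Predicted", 1, 0, 1, 0, 0, 0),
        (12345, 1003, "EMBL_GENE_C", "Predicted", 3, 1, 1, 1, 1, 0),
    ]
    conn.executemany(
        "INSERT INTO xref_gene VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    return conn


def xrefs_by_status(conn, status):
    """Return (NCBI_Gene_ID, Embl_Gene_ID, Annotation_Score) for one status,
    best annotation score first."""
    cursor = conn.execute(
        """
        SELECT NCBI_Gene_ID, Embl_Gene_ID, Annotation_Score
        FROM xref_gene
        WHERE Status = ?
        ORDER BY Annotation_Score DESC
        """,
        (status,),
    )
    return cursor.fetchall()
```

Calling `xrefs_by_status(build_demo_db(), "Predicted")` lists the cross-references predicted by GAG, ordered by how many annotation words the two sources share. The same SELECT statement should work unchanged against a MySQL table with these column names, apart from the driver's parameter placeholder syntax.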