GAG

You can download the whole GAG database as an SQL dump here.

GAG relies on a MySQL database with 8 tables, which gathers annotation data from several sources together with computed cross-references.

GenBank and Ensembl data are split into 6 tables, 3 per institution, prefixed ncbi_* and embl_* respectively. They store the RNA transcripts (ncbi_seq_gene and embl_seq_gene), the links between RNA transcripts and gene IDs (ncbi_map_gene and embl_map_gene), and the gene annotation (ncbi_annot_gene and embl_annot_gene; including outgoing references to UniProt/Swiss-Prot).

HGNC data are stored in a separate table (hgnc_data).

Finally, the main table storing computed cross-references is xref_gene, which includes the following fields:

Taxa_ID	Numerical taxon ID of the gene considered
NCBI_Gene_ID	GenBank Gene ID
Embl_Gene_ID	Ensembl Gene ID
Status	A descriptive field indicating whether the proposed cross-reference is currently accepted by GenBank/Ensembl (Known/Validated and Known/Corrected), or predicted by the GAG process (Predicted).
Annotation_Score	Number of words shared by the GenBank and Ensembl descriptive annotations.
Common_Symbol	Whether the Ensembl and GenBank genes share the same symbol
Common_Human_Homologs	Whether the Ensembl and GenBank genes share at least one human homolog gene
Common_Mouse_Homologs	Whether the Ensembl and GenBank genes share at least one mouse homolog gene
Common_Chicken_Homologs	Whether the Ensembl and GenBank genes share at least one chicken homolog gene
Common_UniProt_ID	Whether the Ensembl and GenBank genes share the same UniProt ID
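
As an illustration of how the xref_gene fields might be used together, here is a minimal Python sketch. It is not the GAG implementation. It uses an in-memory SQLite table as a stand-in for the MySQL one. The column names are taken from the list above, but the column types, the 0/1 encoding of the Common_* fields, the exact Status strings, the word-splitting rule in annotation_score, and all sample rows and IDs are assumptions made for the example. The helper names annotation_score and predicted_with_support are hypothetical.

```python
import re
import sqlite3


def annotation_score(ncbi_text, embl_text):
    """Count distinct words shared by two annotation strings.

    Assumption: words are lowercased runs of letters and digits.
    The real GAG tokenization is not described in this document.
    """
    ncbi_words = set(re.findall(r"[a-z0-9]+", ncbi_text.lower()))
    embl_words = set(re.findall(r"[a-z0-9]+", embl_text.lower()))
    return len(ncbi_words & embl_words)


# In-memory stand-in for xref_gene; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE xref_gene (
        Taxa_ID INTEGER,
        NCBI_Gene_ID TEXT,
        Embl_Gene_ID TEXT,
        Status TEXT,
        Annotation_Score INTEGER,
        Common_Symbol INTEGER,
        Common_Human_Homologs INTEGER,
        Common_Mouse_Homologs INTEGER,
        Common_Chicken_Homologs INTEGER,
        Common_UniProt_ID INTEGER
    )
""")

# Placeholder rows: the taxon ID, gene IDs and descriptions are made up.
sample_rows = [
    (9999, "G1", "E1", "Known/Validated", 4, 1, 1, 1, 0, 1),
    (9999, "G2", "E2", "Predicted",
     annotation_score("zinc finger protein 1", "Zinc finger protein ZFP1"),
     1, 1, 0, 0, 0),
    (9999, "G3", "E3", "Predicted",
     annotation_score("hypothetical protein", "novel gene"),
     0, 0, 0, 0, 0),
    (9999, "G4", "E4", "Predicted",
     annotation_score("solute carrier family 2", "solute carrier member"),
     0, 0, 1, 0, 1),
]
conn.executemany(
    "INSERT INTO xref_gene VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", sample_rows
)


def predicted_with_support(connection, min_score=1):
    """Return Predicted pairs that share a symbol or a UniProt ID."""
    query = """
        SELECT NCBI_Gene_ID, Embl_Gene_ID, Annotation_Score
        FROM xref_gene
        WHERE Status = 'Predicted'
          AND Annotation_Score >= ?
          AND (Common_Symbol = 1 OR Common_UniProt_ID = 1)
        ORDER BY Annotation_Score DESC
    """
    return connection.execute(query, (min_score,)).fetchall()


for ncbi_id, embl_id, score in predicted_with_support(conn):
    print(ncbi_id, embl_id, score)
```

Against the real database, the same SELECT statement should carry over to MySQL with little change, provided the Common_* fields are stored as 0/1 values. If they are stored differently (for example as yes/no strings), the comparison in the WHERE clause would need adjusting.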