
Data archive strategy

Data for the International Mouse Phenotyping Consortium (IMPC) is generated at geographically distributed phenotyping centers. This data is then sent to the IMPC Data Coordination Center (DCC) at MRC Harwell for aggregation and quality control. The IMPC DCC archives this data by:

[INSERT DCC data archive strategy here]

Periodically, the data is released from the IMPC DCC to the IMPC Central Data Archive (CDA), which stores the data and maintains a web portal and APIs for data access. The released IMPC data can be accessed through the web portal or through several APIs, including the Raw data API, the Genotype-phenotype API, and others. The portal and the APIs always serve the current data release of the IMPC production process. As new data and new analysis methods become available, a new IMPC data release is deployed, and the previous release is then available only from the archive.
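As a rough illustration of programmatic access, the sketch below assumes the APIs are exposed as Solr `select` endpoints (the archive ships the Solr cores the portal runs on). The base URL, the core name `genotype-phenotype` and the field `marker_symbol` are assumptions for illustration only; check the current IMPC API documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# ASSUMPTION: base URL of the Solr-backed IMPC APIs; verify against the
# current IMPC API documentation before use.
SOLR_BASE = "https://www.ebi.ac.uk/mi/impc/solr"


def build_query_url(core, query, rows=10, start=0):
    """Build a Solr /select URL for one core that asks for JSON results.

    `rows` and `start` let a client page through large result sets.
    """
    params = {"q": query, "rows": rows, "start": start, "wt": "json"}
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"


def fetch_docs(url, timeout=30):
    """Fetch a Solr JSON response and return its list of documents."""
    with urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["response"]["docs"]


# Hypothetical core and field names, used for illustration only.
url = build_query_url("genotype-phenotype", "marker_symbol:Akt2", rows=5)
```

Calling `fetch_docs(url)` performs the network request; keeping URL construction separate makes the query easy to inspect or log before anything is downloaded.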

For long-term storage, the IMPC data archive is hosted on an FTP server, with one directory per release.

See the latest release archive directory
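Because each release gets its own directory, a script can locate the newest one by listing the archive and comparing release numbers. This is a minimal sketch; the FTP host, the path and the `release-<major>.<minor>` directory naming are assumptions for illustration, so use the archive link on this page for the real location.

```python
import ftplib
import re

# ASSUMPTIONS: host, path and the "release-<major>.<minor>" naming are
# illustrative; use the archive link on this page for the real location.
FTP_HOST = "ftp.ebi.ac.uk"
RELEASES_PATH = "/pub/databases/impc/all-data-releases"
RELEASE_RE = re.compile(r"release-(\d+)\.(\d+)")


def latest_release(names):
    """Return the release directory with the highest (major, minor) number.

    Entries that do not match the naming pattern are ignored; returns None
    if nothing matches.
    """
    best = None
    for name in names:
        base = name.rstrip("/").rsplit("/", 1)[-1]
        match = RELEASE_RE.fullmatch(base)
        if match:
            key = (int(match.group(1)), int(match.group(2)))
            if best is None or key > best[0]:
                best = (key, base)
    return best[1] if best else None


def list_release_dirs(host=FTP_HOST, path=RELEASES_PATH):
    """List the entries under the releases path over anonymous FTP."""
    with ftplib.FTP(host) as ftp:
        ftp.login()  # no arguments: anonymous login
        return ftp.nlst(path)
```

A typical call is `latest_release(list_release_dirs())`. Release numbers are compared as integer tuples rather than strings, so `release-10.0` correctly ranks above `release-9.2`.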

Depending on the release, the archive includes:

  • A set of reports focusing on different aspects of the data
  • A set of genotype-phenotype files in CSV format with the full results, useful for end users and for programmatic access to the data
  • A full copy of the MySQL database
  • A full copy of the Solr cores required to run the web portal
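The genotype-phenotype CSV files can be processed with standard tooling. The sketch below groups phenotype terms by gene using Python's `csv` module; the column names `marker_symbol` and `mp_term_name` are assumptions, so check the header of the actual file, and the sample rows are placeholders rather than real IMPC results.

```python
import csv
import io
from collections import defaultdict


def phenotypes_by_gene(lines, gene_col="marker_symbol", term_col="mp_term_name"):
    """Group phenotype terms by gene from genotype-phenotype CSV rows.

    `lines` is any iterable of CSV text lines, such as an open file.
    Column names are ASSUMPTIONS; check the header of the real file.
    """
    grouped = defaultdict(list)
    for row in csv.DictReader(lines):
        gene, term = row.get(gene_col), row.get(term_col)
        if gene and term:
            grouped[gene].append(term)
    return dict(grouped)


# Placeholder rows, not real IMPC results; extra columns are ignored.
SAMPLE = """marker_symbol,mp_term_name,p_value
GeneA,term 1,0.001
GeneA,term 2,0.0001
GeneB,term 3,0.01
"""

result = phenotypes_by_gene(io.StringIO(SAMPLE))
```

For a downloaded file, pass the open file object instead, for example `with open(path, newline="") as fh: phenotypes_by_gene(fh)`, where `path` is a hypothetical local path.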
