Overview

This page will guide you through the process of submitting your analysis data and meeting the standard required by FAANG. If you have any questions about this process please contact FAANG Data Coordination Centre for help.

Please contact faang-dcc@ebi.ac.uk to discuss submission of analysis data files. We are currently preparing the instructions for the different types of analysis submission for FAANG.

Prerequisites:

You should have already submitted your FAANG sample data and raw data files that this analysis is based on to the public archives.

  1. Please familiarise yourself with the latest FAANG analysis ruleset and the FAANG data sharing principles.
  2. You must have already submitted your sample information and obtained your BioSamples accessions ahead of submitting analysis data. You must then use these FAANG BioSamples accessions in your analysis submissions to ENA. The BioSamples accessions start with SAMEA followed by a unique number.
  3. Read the FAANG analysis metadata guidelines and gather the required information to meet the standards. The template below can be useful for gathering this information prior to starting the submission process.
  4. The relevant sequencing data should already have been submitted to ENA. Take note of the study, experiment and run accessions, which are required in the template.
  5. Please refer to the ENA documentation on how to submit analyses. If the documentation does not answer your query, or you need further help or clarification, please contact FAANG DCC.
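Before filling in the template, it can help to check that the sample accessions you have gathered look like BioSamples accessions. The sketch below is only an illustration: it assumes the format is exactly "SAMEA" followed by one or more digits, as described in the prerequisites, and the function names and example accessions are hypothetical. It does not confirm that an accession actually exists in BioSamples.

```python
import re

# Assumption: a FAANG BioSamples accession is "SAMEA" followed by digits only.
SAMEA_PATTERN = re.compile(r"SAMEA\d+")


def is_biosample_accession(value: str) -> bool:
    """Return True if the value has the SAMEA<number> shape."""
    return SAMEA_PATTERN.fullmatch(value.strip()) is not None


def find_invalid_accessions(accessions):
    """Return the accessions that do not have the expected shape."""
    return [a for a in accessions if not is_biosample_accession(a)]


if __name__ == "__main__":
    # Hypothetical accessions, for illustration only.
    candidates = ["SAMEA0000001", "samea0000002", "SAMEA"]
    print(find_invalid_accessions(candidates))
```

A check like this only catches formatting mistakes such as lower-case prefixes or truncated values; the archive remains the authority on whether an accession is valid.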

Steps required to submit analysis data:

  1. Download the Excel template
  2. Complete the template

1. Download the Excel template

Please refer to ENA guidance on the requirements for submission and to the latest analysis ruleset specification. The rules for each attribute define if it is mandatory or optional and what sort of data is expected (numeric, date, text, etc.).

2. Complete the template

General submission guidance

The template is generated according to the FAANG analysis ruleset. The ruleset contains three rule groups: FAANG, ENA and EVA. The first two rule groups are mandatory and apply to all analysis records. The EVA rule group applies only when the analysis type is sequence_variation and the submitted file is a VCF file. Correspondingly, the template file contains three separate sheets. The complete analysis record is generated by combining the rows that share the same analysis alias across the three sheets, so it is extremely important to make sure that the alias matches in every sheet.
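Because records are joined on the alias, a quick consistency check before submission can save a round trip. The following is a minimal sketch, not the validation performed by FAANG or ENA. It assumes each sheet has already been read into a list of dictionaries, and the keys "alias" and "analysis type" are hypothetical column names chosen for illustration. For simplicity it treats every sequence_variation record as needing an EVA row, without checking the file type.

```python
def check_aliases(faang_rows, ena_rows, eva_rows):
    """Return a list of human-readable alias mismatches across the sheets."""
    faang_aliases = {row["alias"] for row in faang_rows}
    ena_by_alias = {row["alias"]: row for row in ena_rows}
    ena_aliases = set(ena_by_alias)
    eva_aliases = {row["alias"] for row in eva_rows}

    problems = []
    # The FAANG and ENA rule groups apply to every analysis record.
    for alias in sorted(faang_aliases - ena_aliases):
        problems.append(f"{alias}: in 'faang' sheet but missing from 'ena' sheet")
    for alias in sorted(ena_aliases - faang_aliases):
        problems.append(f"{alias}: in 'ena' sheet but missing from 'faang' sheet")
    # The EVA rule group applies only to sequence_variation analyses.
    for alias in sorted(ena_aliases):
        analysis_type = ena_by_alias[alias].get("analysis type")
        if analysis_type == "sequence_variation" and alias not in eva_aliases:
            problems.append(f"{alias}: sequence_variation but missing from 'eva' sheet")
    # An EVA row with no matching ENA row cannot be combined into a record.
    for alias in sorted(eva_aliases - ena_aliases):
        problems.append(f"{alias}: in 'eva' sheet but missing from 'ena' sheet")
    return problems
```

An empty list means every alias lines up across the sheets. Aliases are compared exactly, so stray whitespace or differences in case are reported as mismatches.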

'faang' tab

The faang tab contains the information specific to the FAANG ruleset that is not required by ENA.

Alias: Provide a unique alias for the analysis record. The same alias must be used in the other sheets to identify the same record.

Project: Always use "FAANG" as the project

Secondary Project: makes it possible to load the data into the corresponding project page, e.g. AQUA-FAANG. If the acronym is not in the list, please contact FAANG DCC to add the required acronym.

Assay Type: Should use the same value as the one used in the related experiment(s)

Analysis Protocol: The link to the protocol used to carry out the analysis process. The protocol is required to be publicly available and follow the naming convention INSTITUTE_SOP_PROTOCOLNAME_YYYYMMDD.pdf, e.g. ROSLIN_SOP_Alignment-based RNA-Seq-Processing_20170917.pdf. Please check the organisation/group abbreviation page to find or add your organisation/group abbreviation for consistency across institutes. It is highly recommended to contact FAANG DCC to host your protocol.
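The naming convention can be checked mechanically before you upload a protocol. The sketch below is an illustration under stated assumptions rather than an official validator: it assumes the institute abbreviation contains no underscore, that the protocol name may contain spaces and hyphens (as in the example above), and that the final eight digits form a real calendar date. The function name and pattern are hypothetical.

```python
import re
from datetime import datetime

# Assumed shape: INSTITUTE_SOP_PROTOCOLNAME_YYYYMMDD.pdf
PROTOCOL_PATTERN = re.compile(
    r"(?P<institute>[^_]+)_SOP_(?P<name>.+)_(?P<date>\d{8})\.pdf"
)


def parse_protocol_filename(filename: str):
    """Return the parts of a protocol file name, or None if it does not conform."""
    match = PROTOCOL_PATTERN.fullmatch(filename)
    if match is None:
        return None
    try:
        # Reject impossible dates such as month 13.
        date = datetime.strptime(match.group("date"), "%Y%m%d").date()
    except ValueError:
        return None
    return {
        "institute": match.group("institute"),
        "name": match.group("name"),
        "date": date,
    }


if __name__ == "__main__":
    print(parse_protocol_filename(
        "ROSLIN_SOP_Alignment-based RNA-Seq-Processing_20170917.pdf"
    ))
```

The check covers only the file name. Whether the institute abbreviation is registered still needs to be confirmed on the organisation/group abbreviation page.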

Analysis Code: The link to the public code repository that hosts the code used in the analysis. Instructions for installing and running the code should be clearly presented in the repository so that others can easily reproduce the analysis.

Reference Genome: The reference genome used in the analysis. Please use one of the values listed in the analysis ruleset specification. Use 'not applicable' if a reference genome was not required for this analysis type. Contact FAANG DCC to add a new reference assembly.

'ena' tab

The analysis data is submitted to ENA as analysis objects, so it must meet the ENA analysis requirements. This tab contains all mandatory fields required by ENA, but is not limited to them.

Alias: Provide a unique alias for the analysis record. The same alias must be used in the other sheets to identify the same record.

Title: The title of the analysis record.

Analysis type: Indicates the type of the analysis. Depending on the chosen value, additional type-specific rule groups may apply.

Description: A free text field. It is highly recommended to include as much information as possible in the description.

Reference genomes

The metadata rules (FAANG analysis ruleset) require you to reference a genome used in your analysis, if applicable. Please contact faang-dcc@ebi.ac.uk if the reference genome you have used is not listed.