Organize Your Data Upload

When organizing your data for upload, use the preferred flattened data layout, or the alternative hierarchical data layout if your DCC requires it.

NOTE: If you and your contributing site decide to use a hierarchical file structure within your cloud storage location, please remember that each top-level folder and all of its subfolders must contain data of the same type (see details below).

Top-Level Folder Names

Top-level folders correspond to the datasets being submitted, as shown in the examples below. Name your datasets in a way that is descriptive for your contributing site. The DCC may create empty, common top-level folders to get you started.

You can use either the hierarchical or the flattened data layout, as shown in the examples below. In the hierarchical layout, fill in one manifest for each experiment or batch subfolder, covering all files in that subfolder; in the flattened layout, fill in one manifest for each top-level folder.

Flattened Data Layout Example

This is the preferred dataset organization for use with the Data Curator App. All files in a dataset folder share the same data type, and no dataset folder is nested inside another folder.

.
├── biospecimen_experiment_1
│   └── manifest1.csv
├── biospecimen_experiment_2
│   └── manifestA.csv
├── single_cell_RNAseq_batch_1
│   ├── manifestX.csv
│   ├── fileA.txt
│   ├── fileB.txt
│   ├── fileC.txt
│   └── fileD.txt
└── single_cell_RNAseq_batch_2
    ├── manifestY.csv
    └── file1.txt
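The flattened layout rules (no nested folders inside a dataset folder, one manifest per top-level folder) can be checked on a local copy of your data before upload. This is a minimal sketch under stated assumptions: the data sits in a local directory, and a manifest is any CSV file whose name starts with "manifest", as in the example tree. check_flattened_layout is a hypothetical helper, not part of the Data Curator App.

```python
from pathlib import Path


def check_flattened_layout(root: Path) -> list[str]:
    """Return problems found in a local copy of a flattened upload.

    Hypothetical pre-upload check. Assumes a manifest is any file whose
    name starts with "manifest" and ends in ".csv", as in the example tree.
    """
    problems = []
    dataset_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    if not dataset_dirs:
        problems.append("no top-level dataset folders found")
    for dataset in dataset_dirs:
        # Flattened layout: a dataset folder must not contain nested folders.
        nested = sorted(p.name for p in dataset.iterdir() if p.is_dir())
        if nested:
            problems.append(f"{dataset.name}: nested folders {nested}")
        # One manifest per top-level (dataset) folder.
        manifests = [
            p.name
            for p in dataset.iterdir()
            if p.is_file() and p.name.startswith("manifest") and p.suffix == ".csv"
        ]
        if len(manifests) != 1:
            problems.append(
                f"{dataset.name}: expected 1 manifest, found {len(manifests)}"
            )
    return problems
```

For example, running check_flattened_layout(Path("my_upload")) on a hypothetical local folder named my_upload returns an empty list when the folder matches the example layout; each string in a non-empty result names the dataset folder that needs attention.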

Hierarchy Data Layout Example

In this layout, every subfolder must hold data of the same type and level as the top-level folder that contains it. For example, you cannot put a biospecimen subfolder and a clinical demographics subfolder in the same folder. File names should clearly state the assay type and data level, and should consistently begin with the assay type.

The following rules apply to dataset folders:

Each dataset folder must have the Synapse annotation contentType:dataset.
A dataset folder cannot be inside another dataset folder.
Dataset folders must have unique names.
The folder hierarchy may also contain non-dataset folders (for example, folders storing reports or other kinds of entities).

.
├── clinical_diagnosis
├── clinical_demographics
├── biospecimen
│   ├── experiment_1
│   │   └── manifest1.csv
│   └── experiment_2
│       └── manifestA.csv
└── single_cell
    ├── batch_1
    │   ├── manifestX.csv
    │   ├── fileA.txt
    │   ├── fileB.txt
    │   ├── fileC.txt
    │   └── fileD.txt
    └── batch_2
        ├── manifestY.csv
        └── file1.txt
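The dataset-folder rules for the hierarchical layout can be checked once you know which folders carry the contentType:dataset annotation. This is a minimal sketch: check_dataset_folders and its input shape (folder paths relative to the project root, mapped to their annotations) are hypothetical, and in practice you might export the paths and annotations from a Synapse file view. It also assumes "unique names" means a folder's own name must not repeat anywhere in the project.

```python
from pathlib import PurePosixPath


def check_dataset_folders(folders: dict[str, dict[str, str]]) -> list[str]:
    """Check the dataset-folder rules for a hierarchical layout.

    `folders` maps each folder path (relative to the project root) to its
    annotations. This input shape is hypothetical; folders without
    contentType == "dataset" are treated as allowed non-dataset folders.
    """
    datasets = sorted(
        PurePosixPath(path)
        for path, annotations in folders.items()
        if annotations.get("contentType") == "dataset"
    )
    problems = []
    # Rule: a dataset folder cannot be inside another dataset folder.
    for inner in datasets:
        for outer in datasets:
            if outer != inner and outer in inner.parents:
                problems.append(f"{inner} is nested inside dataset folder {outer}")
    # Rule: dataset folders must have unique names (checked project-wide).
    seen: dict[str, PurePosixPath] = {}
    for path in datasets:
        if path.name in seen:
            problems.append(
                f"duplicate dataset folder name {path.name!r}: {seen[path.name]} and {path}"
            )
        else:
            seen[path.name] = path
    return problems
```

Non-dataset grouping folders such as biospecimen and single_cell in the example are simply skipped, which matches the rule that the hierarchy may contain non-dataset folders.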

To create a DCC file view with its scope set to the DCC project:

Make sure the file view includes both file and folder entities within its scope.
Add a contentType column to the file view schema (the default column parameters are sufficient).
Give every team Download-level access to this file view.

Note: Synapse will not allow you to create a file view with an empty scope, so you cannot create this file view until files or folders exist in the center-specific projects.
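The file view steps and note above can be collected into a pre-flight checklist to run before creating the view. This is a hypothetical sketch: preflight_file_view, its parameters, and the project and team identifiers in any example input are illustrative, and the function does not call Synapse. The view itself is still created in the Synapse web UI or, for example, with the Synapse Python client's EntityViewSchema.

```python
def preflight_file_view(
    scope_contents: dict[str, list[str]],
    include_types: set[str],
    columns: list[str],
    teams_with_download: set[str],
    all_teams: set[str],
) -> list[str]:
    """Return unmet requirements for the planned DCC file view.

    Hypothetical checklist. scope_contents maps each project in the scope
    to the entity types it currently contains (e.g. "file", "folder").
    """
    problems = []
    # Synapse will not create a file view whose scope has no files or folders.
    has_entities = any(
        entity_type in ("file", "folder")
        for types in scope_contents.values()
        for entity_type in types
    )
    if not has_entities:
        problems.append("scope is empty: add files or folders to the projects first")
    # The view must include both file and folder entities.
    missing_types = {"file", "folder"} - include_types
    if missing_types:
        problems.append(f"view must include entity types: {sorted(missing_types)}")
    # The schema needs a contentType column.
    if "contentType" not in columns:
        problems.append("schema is missing the contentType column")
    # Every team needs Download-level access.
    missing_teams = all_teams - teams_with_download
    if missing_teams:
        problems.append(f"teams without Download access: {sorted(missing_teams)}")
    return problems
```

An empty result means every item on the checklist is satisfied; each string otherwise names the step that still needs attention.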