Upload Data to Google Cloud Storage (recommended for large data)
The DCC will create a bucket for each center so that data can be uploaded directly with gsutil. The requirements and instructions below, including object permissions and other flags, must be followed for files to be synced automatically to Synapse.
REQUIREMENTS
Send the DCC the Google identities of users who will require upload access to the bucket. (A Google identity can be either a Gmail address or any institutional email that uses Google Workspace, formerly known as G Suite.)
(IMPORTANT) When uploading data:
Dataset names may NOT begin with a number
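The naming rule above can be checked locally before an upload. The following is a minimal sketch, assuming the dataset name is the name of the folder being uploaded; is_valid_dataset_name is a hypothetical helper, not part of gsutil or Synapse.

```shell
#!/usr/bin/env bash
# Sketch only: is_valid_dataset_name is a hypothetical helper name.
# Returns 0 (success) unless the name is empty or begins with a digit.
is_valid_dataset_name() {
  case "$1" in
    ""|[0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}
```

For example, is_valid_dataset_name "$(basename "$dir")" || echo "rename $dir before uploading" would flag a folder such as 2021_rnaseq while accepting rnaseq_2021.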
Example Commands:
Using gsutil cp:
gsutil cp <file> gs://<MyBucket>/<MyFolder>/
Using gsutil rsync:
gsutil rsync -r <dir> gs://<MyBucket>/<MyFolder>/<dir>
For large files, parallel composite uploads may be enabled for faster upload speeds. Composite objects carry only a CRC32C checksum in Cloud Storage and no MD5 hash, so if you enable this you must provide the base64-encoded MD5 of each file as the metadata tag content-md5 at upload time (see example below). In addition, users who download files uploaded as composite objects must have a compiled crcmod installed.
gsutil -h x-goog-meta-content-md5:<md5> cp <file> gs://<MyBucket>/<MyFolder>/
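The <md5> value in the command above is the base64 encoding of the raw MD5 digest, not of the hex string printed by md5sum. The following is a minimal sketch of one way to compute it and attach it on upload, assuming openssl and base64 are available; md5_base64 and upload_with_md5 are hypothetical helper names, and the 150M threshold is an assumed value, not a DCC requirement.

```shell
#!/usr/bin/env bash
# Sketch only: md5_base64 and upload_with_md5 are hypothetical helper names.

# Print the base64 encoding of the raw (binary) MD5 digest of a file.
md5_base64() {
  openssl dgst -md5 -binary "$1" | base64
}

# Upload one file with the content-md5 metadata tag attached.
# Usage: upload_with_md5 <file> gs://<MyBucket>/<MyFolder>/
upload_with_md5() {
  local file="$1" dest="$2" md5
  md5="$(md5_base64 "$file")" || return 1
  # Assumed threshold: files larger than 150M are uploaded as composite objects.
  gsutil -o "GSUtil:parallel_composite_upload_threshold=150M" \
         -h "x-goog-meta-content-md5:${md5}" \
         cp "$file" "$dest"
}
```

As a quick sanity check, md5_base64 on an empty file prints 1B2M2Y8AsgTpgAmY7PhCfg==. The tag stored on an uploaded object can be inspected afterwards with gsutil ls -L.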
Once your data is in the bucket, it will automatically be synced with your center’s Synapse project. This process can take from a few minutes to a day, depending on the size of your data. Once the data is present on Synapse, you can proceed to annotate your metadata.
Note: If you would like to make changes to your data, please make them directly in the Google Cloud Storage bucket and not through the Synapse web or programmatic clients. Changes made to the bucket will automatically be reflected in the Synapse project.