12  Specific naming conventions

As mentioned in the general naming conventions section, naming should be both human as well as machine readable. Most of this can be solved by using BIDS as explained in the previous section, however there are some GUTS-specific naming conventions that need to be adhered to.

Measures

To comply with the machine-readable requirement, processed measure file names should always include "_task-[short name]_" *. The short name of a measure, as specified in the codebook, should be used to ensure each collection site uses the same measure name. For example, the Interpersonal Reactivity Index (IRI) has been given the short name “iri”. Its file name should therefore always include "_task-iri_". It is crucial that the short names match the ones in the codebook, as our scripts specifically search for these names.

*An exception to the rule is non-tabular structural mri data.

Note that only letters and numbers are allowed in short names.

Sessions will be named according to the BIDS standard, meaning that they start with 'ses-'.

Cohorts A, B, D are scheduled to collect data in year 2, 5 and 8 of the ten-year GUTS project. Their respective sessions will be named 'ses-01', 'ses-02', 'ses-03'.

Cohort D, in addition to the regular sessions, will collect data between session 1 and 2, and between session 2 and 3: `ses-01a', 'ses-02a'.

Cohort C plans to collect data twice a year in year 2 and 3. The first sessions of a year will be named 'ses-01', 'ses-02', while the second measurements, taking place six months after the first measurement, will be labeled `ses-01a', 'ses-02a'.

Pilot

The pilot sessions will be named 'ses-pilot'

The subject ID naming convention varies slightly for each data storage location. To prevent accidental overlap between collected data from different cohorts, an abbreviation of the location will be added to each subject ID, as illustrated below.

Given that approximately 400-1000 subjects will participate at each location, we advise to use a number between 0 and 1000. For parents of subjects, the subjectid should start with ‘5’. The subject id number should always consist of four digits, e.g., subject 1 will be assigned the number 0001, subject 15 will receive the number 0015. A parent would get the subjectid 5001, 5002, 5003, etc.

Note that all data from questionnaires that are answered by the parents about the child should fall under the subject id of the child. All data from questionnaires answered by parents about themselves should remain under their own subject id.

Family ID

To be able to identify any sibling and parent-child relationships within our data, subjects get a familyid. For example, if subjects gutslei0001, gutslei0002, and gutslei50001 are part of one and the same family, all of them will be appointed family id 0001.

Data storage location Subject ID naming convention participant_id family_id
Erasmus University Rotterdam (EUR) sub-gutseur# sub-gutseur0001 0001
Leiden University (LEI) sub-gutslei# sub-gutslei0001 0001
Vrije Universiteit Amsterdam (VU) sub-gutsvu# sub-gutsvu0001 0001
Amsterdam UMC (AUMC) sub-gutsaumc# sub-gutsaumc0001 0001

Subject IDs pilot

During the pilot, 'pilot' will be added to the subject ID: 'sub-gutspilot[location]#'.

To prevent duplicate file names, it is essential that for each data storage location to add their abbreviation to the file name. For participant-level files, such as brain imaging data, the location will be integrated into the subject ID in the file name. For group-level files, the location must be added separately. Additionally, file names should consistently include the session and task, as outlined earlier.

Participant-level (individual) files: sub-[subjectid]_ses-[session]_task-[shortname].

Group-level files (all participants merged): guts[location]_ses-[session]_task-[shortname].

Demographics and participation

The demographics file will be called:

guts[location]_demographics.tsv

The participation file will be called:

guts[location]_participation.tsv

Raw vs processed files

Ultimately, it is crucial that the processed files strictly follow the correct naming conventions to ensure our scripts can be executed properly. However, future you will be grateful if the raw output also adheres to a clear and consistent naming convention, as this will facilitate data processing.

Therefore, when setting up a task (e.g., (f)MRI, EEG, E-Prime, Dynamometer, etc.) and if possible, please ensure adherence to the naming conventions by filling in subject names/session/task names. Note that for some measures (e.g., MRI), the option to fully decide on output file naming might not be available beforehand.

Below you can find some examples.

Type Raw Raw naming conventions Processed Processed naming conventions
Qualtrics - GUTS wide .sav 2024-02-26_gutslei_ses-02_quests-guts-wide_raw.sav .tsv

gutslei_ses-02_task-iri.tsv

gutslei_ses-02_task-ypi.tsv

Qualtrics - Cohort specific .sav 2024-02-26_gutslei_ses-02_quests-guts-specific_raw.sav .tsv

gutslei_ses-02_task-iri.tsv

gutslei_ses-02_task-ypi.tsv

ESM .csv 2024-02-26_gutseur_ses-01_esm_raw.csv/sub-gutseur0001_ses-01_task-e sm_raw.csv .tsv gutseur_ses-01_task-esm-pressure.tsv
(f)MRI .nii sub-gutseur0014_ses-03_task-fmrisddt_run-01_raw.nii .nii sub-gutseur0014_ses-03_task-fmrisddt_run-01.nii
.par sub-gutseur0014_ses-03_task-fmrisddt_run-01_raw.par -
.rec sub-gutseur0014_ses-03_task-fmrisddt_run-01_raw.rec -
EEG .bdf sub-gutseur0027_ses-01_task-eegflanker_raw.bdf .bdf sub-gutseur0027_ses-01_task-eegflanker.bdf
Dynanometer .csv? sub-gutseur0028_ses-03_task-dynosocialeffort_raw.csv .tsv gutseur_ses-03_task-dynosocialeffort.tsv
Behavioral .edat3 sub-gutsvu0002_ses-02_task-behsddt_raw.edat3 .tsv gutsvu_ses-02_task-behsddt.tsv
.txt sub-gutsvu0002_ses-02_task-behsddt_raw.txt - -
Physiological labels

sub-gutsaumc0005_ses-01_task-salivatesto_t0

sub-gutsaumc0007_ses-01_task-salivatesto_t30

.tsv gutsaumc_ses-01_task-salivatesto.tsv

For measures, involving tabular data, collected across multiple cohorts, the variable names must be harmonized. This harmonization allows us to easily merge the files belonging to a specific measure from various data storage locations into one file. To facilitate this, we propose the naming convention below. Please collaborate with representatives from overlapping cohorts to ensure uniformity in variable names. For measures with no overlap, the use of specific naming conventions is less crucial. However, we still recommend using a consistent naming pattern, such as the examples provided below.

s[session]_[shortname]_[subpart-task]_q/t0[question/trial #]

  • Names: lowercase letters with all distinct information separated by an_underscore

  • Labels: Full sentences starting with a capital letter.

  • Value labels: lowercase letters

Demographics

The shortname can be omitted in the demographics file. Questions that are asked only once do not need a session prefix (e.g., sex), questions that are asked multiple sessions, do need a session prefix (e.g., s01_age_years, s01_gender_q01, s01_health_etc).

The variable and value labels can be added to an SPSS file and later converted to a .tsv file + .json file, or manually incorporated into a .json file directly. For more information on creating .json files, see Chapter 15. Below are examples of variable names:

*Note that hyphens/dashes are not allowed in SPSS and should therefore not be used in variable names.

Variable name Variable label Value labels
s02_iri_pt_q03 Interpersonal Reactivity Index - Perspective taking scale Q3: I sometimes find it difficult to see things from the other guy’s point of view. 0 = does not describe me very well, 4 = described me very well
s02_dailyhassles_freq_q04 Parenting Daily Hassles scale - Frequency Q4: The kids won’t listen or do what they are asked without being nagged. 1 = never, 2 = rarely, 3 = sometimes, 4 = often, 5 = constantly
s03_pcg_exb1_perc_to2 Prosocial Cyberball Task - Exclusion Block 1: Percentage of throws to player 2.
s03_ddmoney_ind_day180 Delay Discounting Money: Indifference point day 180: Prefer to receive this amount of money now than 10 euros in 180 days.
s03_salivacort_d1_m1 Saliva Samples - Cortisol: Mean cortisol in nmol/l day 1, measurement 1.