12 Specific naming conventions

As mentioned in the general naming conventions section, naming should be both human as well as machine readable. Most of this can be solved by using BIDS as explained in the previous section, however there are some GUTS-specific naming conventions that need to be adhered to.

Measures

To comply with the machine-readable requirement, processed measure file names should always include the key ‘task’: "_task-[short name]_" *. The short name of a measure, as specified in the codebook, should be used to ensure each collection site uses the same measure name. For example, the Interpersonal Reactivity Index (IRI) has been given the short name “iri”. Its file name should therefore always include "_task-iri_". It is crucial that the short names match the ones in the codebook, as our scripts specifically search for these names.

*An exception to the rule is non-tabular structural mri data.

Note that only letters and numbers are allowed in short names.

Sessions

Sessions will be named according to the BIDS standard, meaning that they start with the key 'ses'.

Cohorts A, B, D are scheduled to collect data in year 3, 6 and 9 of the ten-year GUTS project. Their respective sessions will be named 'ses-01', 'ses-02', 'ses-03'.

Cohort D, in addition to the regular sessions, will collect data between session 1 and 2, and between session 2 and 3: `ses-01a', 'ses-02a'.

Cohort C plans to collect data several times in year 2, 3, and 4. The behavioral sessions will be named 'ses-01', 'ses-02', 'ses-03', and 'ses-04' while the MRI sessions will be labeled `ses-01a' and 'ses-03a'.

Pilot

The pilot sessions will be named 'ses-pilot'

Subject IDs

The subject ID naming convention varies slightly for each data storage location. To prevent accidental overlap between collected data from different cohorts, an abbreviation of the location will be added to each subject ID, as illustrated below. In file names, subject IDs get the BIDS key ‘sub’.

Given that approximately 400-1000 subjects will participate at each location, we advise to use a number between 0 and 1000. For parents of subjects, the subjectid should start with ‘5’. The subject id number should always consist of four digits, e.g., subject 1 will be assigned the number 0001, subject 15 will receive the number 0015. A parent would get the subjectid 5001, 5002, 5003, etc.

Note that all data from questionnaires that are answered by the parents about the child should fall under the subject id of the child (advised to add an identifier in the variable names for these cases, e.g., _p suffix). All data from questionnaires answered by parents about themselves should remain under their own subject id.

Family ID

To be able to identify any sibling and parent-child relationships within our data, subjects get a familyid. For example, if subjects gutslei0001, gutslei0002, and gutslei5001 are part of one and the same family, all of them will be appointed family id 0001.

Data storage location	Subject ID naming convention	participant_id	family_id
Erasmus University Rotterdam (EUR)	sub-gutseur#	sub-gutseur0001	fam-gutseur0001
Leiden University (LEI)	sub-gutslei#	sub-gutslei0001	fam-gutslei0001
Vrije Universiteit Amsterdam (VU)	sub-gutsvu#	sub-gutsvu0001	fam-gutsvu0001
Amsterdam UMC (AUMC)	sub-gutsaumc#	sub-gutsaumc0001	fam-gutsaumc0001

Subject IDs pilot

During the pilot, 'pilot' will be added to the subject ID: 'sub-gutspilot[location]#'.

File names

To prevent duplicate file names, it is essential that for each data storage location to add their abbreviation to the file name. For participant-level files, such as brain imaging data, the location will be integrated into the subject ID in the file name. For group-level files, the location must be added separately. Additionally, file names should consistently include the session and task, as outlined earlier.

Participant-level (individual) files: sub-[subjectid]_ses-[session]_task-[shortname].

Group-level files (all participants merged): guts[location]_ses-[session]_task-[shortname].

Demographics and participation

The demographics file will be called:

guts[location]_demographics.tsv

The participation file will be called:

guts[location]_participation.tsv

Raw vs processed files

Ultimately, it is crucial that the processed files strictly follow the correct naming conventions to ensure our scripts can be executed properly. We have not set up fixed naming conventions for raw data, however, future you will be grateful if the raw output also adheres to a clear and consistent naming convention, as this will facilitate data processing.

Therefore, when setting up a task (e.g., (f)MRI, EEG, E-Prime, Dynamometer, etc.) and if possible, please include subject names/session/task names in the filename. Note that for some measures (e.g., MRI), the option to fully decide on output file naming might not be available beforehand.

Below you can find some examples.

Type	Raw	Example raw naming conventions	Processed	Processed naming conventions
Qualtrics - GUTS wide	.sav	2024-02-26_gutslei_ses-02_quests-guts-wide_raw.sav	.tsv	gutslei_ses-02_task-iri.tsv gutslei_ses-02_task-ypi.tsv
Qualtrics - Cohort specific	.sav	2024-02-26_gutslei_ses-02_quests-guts-specific_raw.sav	.tsv	gutslei_ses-02_task-iri.tsv gutslei_ses-02_task-ypi.tsv
ESM	.csv	2024-02-26_gutseur_ses-01_esm_raw.csv/sub-gutseur0001_ses-01_task-e sm_raw.csv	.tsv	gutseur_ses-01_task-esm-pressure.tsv
(f)MRI	.nii.gz	sub-gutseur0014_ses-03_task-fmrisddt_run-01_bold.nii.gz	.nii.gz	sub-gutseur0014_ses-03_task-fmrisddt_run-01_bold.nii.gz
	.rec	sub-gutseur0014_ses-03_task-fmrisddt_run-01_bold.rec	-
EEG	.bdf	sub-gutseur0027_ses-01_task-eegflanker_eeg.bdf	.bdf	sub-gutseur0027_ses-01_task-eegflanker_eeg.bdf
Dynamometer	.acq	sub-gutseur0028_ses-03_task-dynosocialeffort_physio.acq	.acq	gutseur_ses-03_task-dynosocialeffort_physio.acq
Behavioral	.edat3	sub-gutsvu0002_ses-02_task-behsddt_raw.edat3	.tsv	gutsvu_ses-02_task-behsddt.tsv
	.txt	sub-gutsvu0002_ses-02_task-behsddt_raw.txt	-	-
Physiological	labels	sub-gutsaumc0005_ses-01_task-salivatesto_t0 sub-gutsaumc0007_ses-01_task-salivatesto_t30	.tsv	gutsaumc_ses-01_task-salivatesto.tsv

Variables

For measures, involving tabular data, collected across multiple cohorts, the variable names must be harmonized. This harmonization allows us to easily merge the files belonging to a specific measure from various data storage locations into one file. To facilitate this, we propose the naming convention below. Please collaborate with representatives from overlapping cohorts to ensure uniformity in variable names. For measures with no overlap, the use of specific naming conventions is less crucial. However, we still recommend using a consistent naming pattern, such as the examples provided below.

s[session]_[shortname]_[subpart-task]_q/t0[question/trial #]

Names: lowercase letters with all distinct information separated by an_underscore
Labels: Full sentences starting with a capital letter.
Value labels: lowercase letters

Demographics

The shortname can be omitted in the demographics file. Questions that are asked only once do not need a session prefix (e.g., sex), questions that are asked multiple sessions, do need a session prefix (e.g., s01_age_years, s01_gender_q01, s01_health_etc).

The variable and value labels can be added to an SPSS file and later converted to a .tsv file + .json file, or manually incorporated into a .json file directly. For more information on creating .json files, see Chapter 15. Below are examples of variable names:

*Note that hyphens/dashes are not allowed in SPSS and should therefore not be used in variable names.

Variable name	Variable label	Value labels
s02_iri_pt_q03	Interpersonal Reactivity Index - Perspective taking scale Q3: I sometimes find it difficult to see things from the other guy’s point of view.	0 = does not describe me very well, 4 = describes me very well
s02_dailyhassles_freq_q04	Parenting Daily Hassles scale - Frequency Q4: The kids won’t listen or do what they are asked without being nagged.	1 = never, 2 = rarely, 3 = sometimes, 4 = often, 5 = constantly
s03_pcg_exb1_perc_to2	Prosocial Cyberball Task - Exclusion Block 1: Percentage of throws to player 2.
s03_ddmoney_ind_day180	Delay Discounting Money: Indifference point day 180: Prefer to receive this amount of money now than 10 euros in 180 days.
s03_salivacort_d1_m1	Saliva Samples - Cortisol: Mean cortisol in nmol/l day 1, measurement 1.
s01_obvl_q01_p	Opvoedingsbelasting vragenlijst Q1: Raising my child is a difficult task.	1 = does not apply, 2 = applies a little, 3 = applies quite a bit, 4 = fully applies