DataLad extension module for neuroimaging studies¶

Change log¶
This is a high-level and terse summary of the changes between releases. For more details, we recommend consulting the log of the Git repository.
0.0.8 (February 2, 2021)¶
- compatibility with datalad 0.14 series
- switch CI to AppVeyor
0.0.7 (October 7, 2020)¶
- adapt for datalad 0.13 series and raise dependency to 0.13.4
- raised dependency on datalad-neuroimaging
- several fixes for hirni’s tests and CI setup
0.0.6 (April 6, 2020)¶
- update dependencies so everything works with current datalad 0.12.5
0.0.5 (January 8, 2020)¶
- lots of bugfixes
- enhance documentation
- work with datalad >= 0.12.0rc6
0.0.3 (May 24, 2019) – We’re getting there¶
- rework rule system for easier customization and configurability
- config routine setup_hirni_dataset renamed to cfg_hirni
- work with datalad 0.12rc series
- use datalad-metalad for metadata handling
- fix issues with dependencies
0.0.2 (May 2, 2019)¶
- Provide configuration procedure for hirni datasets
- Provide webapp for editing specification files
- Reworked specification structure
- Enhance CLI to create specification snippets for any file
0.0.1 (May 24, 2018) – The Release¶
- Minimal functionality to import DICOMs and convert them into a BIDS dataset using an automatically pre-populated, editable, non-code specification.
Acknowledgments¶
DataLad development is being performed as part of a US-German collaboration in computational neuroscience (CRCNS) project “DataGit: converging catalogues, warehouses, and deployment logistics into a federated ‘data distribution’” (Halchenko/Hanke), co-funded by the US National Science Foundation (NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1411). Additional support is provided by the German federal state of Saxony-Anhalt and the European Regional Development Fund (ERDF), Project: Center for Behavioral Brain Sciences, Imaging Platform
Overview¶
Introduction¶
What is datalad-hirni?¶
Datalad-hirni aims to provide the means for automated provenance capture of a (neuroimaging) study as well as automated, metadata-driven conversion. In a way, datalad-hirni is two things: a conceptual idea of how to bind and structure all (raw) data, metadata, code and computational environments of a study, and a software package to support the workflow that follows from that idea. On the software side, datalad-hirni is a Python package and an extension to DataLad.
Note
Technically, datalad-hirni (and its approach in general) isn't limited to neuroimaging. In a different context there is simply less convenience provided by the default routines at the moment.
Where do I get it?¶
As a Python package, you can install datalad-hirni via pip:
pip install datalad-hirni
What now?¶
If you want to get a grasp on how it works, start with the concepts section below. For diving right into usage, have a look at the examples, especially the study dataset demo for a start.
Concepts¶
One thing to bind them all¶
The central piece of the envisioned structure is a DataLad dataset containing (i.e. referencing) all the raw data of a study alongside code, containers, protocols, metadata, … you name it. Everything that makes up your study and can be considered raw, in the sense that it cannot be derived from other raw information that is already included. Furthermore, this dataset is supposed to have a subdirectory for each acquisition, which in turn should contain a subdataset for the DICOM files. All other data acquired alongside (like physiological data, stimulation log files, etc.) as well as any other information specific to that acquisition should go beneath the respective acquisition subdirectory. Apart from that, it's entirely up to you to decide about the structure within those acquisition directories.
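Schematically, such a study dataset could look like the following sketch (the acquisition names, the events file and the toolbox location are merely illustrative and taken from the demos later in this document):
study_dataset/
├── studyspec.json            (study-level specification)
├── code/
│   └── hirni-toolbox/        (toolbox subdataset, installed by default)
├── acq1/
│   ├── studyspec.json        (specification for this acquisition)
│   └── dicoms/               (DICOM subdataset)
└── acq2/
    ├── studyspec.json
    ├── events.tsv            (arbitrary additional data)
    └── dicoms/               (DICOM subdataset)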

This raw dataset will then be referenced by the converted (i.e. BIDS) dataset, the same way it itself references the DICOM datasets. It's recommended to follow this principle for further processing, analyses, etc.: let the results of a process be captured in a dataset and reference the input data as a subdataset. In that regard, think of datasets as software packages and subdatasets as dependencies. In other words: learn from YODA!
The second piece is the study specification. This is an additional metadata layer that the conversion of the dataset will be based on. It consists of JSON files within the dataset and describes the contained data (files, directories or “logical entities”) in terms of the conversion target. For example, with BIDS as the targeted layout, it should assign to each image series contained in the DICOM files the BIDS terms that determine file names and locations. In addition, the specification allows defining (custom) conversion routines for each data entity it describes. Furthermore, it can carry any additional metadata you might have no other place to record (like a comment about an aborted scan) or that you might need to be available to your conversion routine for a particular type of file.
Finally, by default a toolbox dataset will be part of the study dataset, providing a collection of conversion routines and container images to run them in, which can be referenced from within the study specification. This isn’t mandatory, though, but rather a default for convenience. You can have your own toolbox or none at all.
Trust automation¶
Binding everything in that one dataset allows us to use automatically generated references to particular versions of each file for provenance capture. This mostly relies on DataLad's run command, which runs arbitrary executables and produces a (machine-readable) record of the exact version of the inputs, the executed call, possibly the container it ran in, and its results. Thereby you can trace back the provenance of all derived data files throughout the entire history of the dataset (and its subdatasets) and reproduce the results from scratch. Another aspect of automation is the creation of the above-mentioned study specification. Datalad-hirni comes with a routine to import DICOM files, extract the header data as DataLad metadata and derive a specification for the contained image series for conversion to BIDS.
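For illustration, a generic datalad run call (the paths and the script name here are placeholders) records the command, its inputs and its outputs in the dataset's history:
% datalad run -m "describe this processing step" \
    --input raw/input_file.tsv \
    --output derived/output_file.tsv \
    bash code/my_script.sh raw/input_file.tsv derived/output_file.tsv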
Conversion¶
The conversion of a study dataset was built with BIDS in mind, but technically the mechanism can target any other layout specification (or another version of the same standard). The idea here is to create a new dataset, which is to become the converted dataset. The original study dataset is then linked into it as a subdataset. So, again, the result of the conversion process references the exact version of the data it is based upon. The conversion executes whatever is specified in the study dataset's specification and is run per specification file, allowing you to convert a single acquisition or even just a single file, for example. In addition, an anonymization step can be included in the conversion.
Don’t trust automation¶
While automation can help a lot, in reality there is no “one size fits all”. Rules to derive BIDS terms from DICOM headers won't capture all cases, information no one included at the scanner cannot be made up by an automation routine, and there is a lot of potential for human error. This is why many adjustments can be made. First off, that is the reason for the specification being added as a separate layer. If the automatically derived information is wrong, incomplete or insufficient, you can review that specification and edit or enhance it before even trying to convert anything. Since the specification is stored in JSON format, it can easily be edited programmatically as well as manually. To ease manual review and editing, datalad-hirni comes with a browser-based GUI for this purpose. Secondly, the rules to derive the specification from DICOM metadata are customizable. You can have your own rules or configure a set of such rules to be applied in a particular order. The specification of conversion routines is also quite flexible. You can plug in whatever routine you like, the conversion can be an entire list of routines to execute, those routines have access to all the fields in the specification they are running on, and their execution can depend on whether or not the anonymization switch was used for the conversion command. That is, for example, how the default defacing - which can of course also be replaced by another routine - is implemented.
For a more detailed explanation of how it works and how to approach customization, you might be interested in watching this talk.
Basic workflow¶
Datalad-hirni comes with a set of commands aiming to support the following workflow to curate a study dataset and to convert it. This workflow to a degree reflects the envisioned structure of a study dataset as described in the concepts section.
This is a somewhat abstract description of what it is supposed to do. It might be more convenient for you to first see what you have to do by looking at the examples, which show this exact workflow: first the creation of a study dataset and afterwards the conversion.
Build your study dataset¶
In order to build such a dataset that binds all your raw data, the first thing to do is to create a new dataset. To set it up as a hirni dataset, you can use a built-in routine called cfg_hirni, which is implemented as a DataLad procedure. Ideally you create your dataset right at the moment you start (planning) your study. Even without any actual data, there is basic metadata you might already be able to capture by (partially) filling dataset_description.json, README, CHANGELOG, etc. Hirni's webUI might be of help here.
The idea is then to add all data to the dataset as it comes into existence. That is, for each acquisition, import the
DICOMs, import all additional data, possibly edit the specification. It’s like writing documentation for your code: If
you don’t do it at the beginning, chances are you’ll never properly do it at all.
You can always edit and add things later, of course.
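In command form, creating and configuring such a dataset boils down to the following calls (shown again in the demo later in this document; the dataset name is just an example):
% datalad create my_raw_dataset
% cd my_raw_dataset
% datalad run-procedure cfg_hirni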

Import DICOM files¶
Importing the DICOMs consists of several steps. The hirni-import-dcm command will help you, given you can provide it with a tarball containing all DICOMs of an acquisition (the internal structure of the tarball doesn't matter). Of course you can achieve the same result differently.
The first step is to retrieve the tarball, extract its content and create a dataset from it. If you passed an acquisition directory to the command, it will create this dataset in dicoms/ underneath that directory. Otherwise it is created at a temporary location.
Then the DICOM metadata is extracted. If the acquisition directory wasn't given, a name for the acquisition is derived from that metadata (how exactly this is done is configurable), the respective directory is created and the dataset is moved into it from its temporary location.
Either way, there is now a new subdataset beneath the respective acquisition directory and it provides extracted DICOM metadata. Note that the metadata doesn't technically describe DICOM files, but rather the image series found in those files. The final step is to use that metadata to derive a specification. This is done by hirni-dicom2spec, which is called automatically by hirni-import-dcm. However, if you need to skip hirni-import-dcm for whatever reason (say you already have a DICOM dataset you want to use instead of creating a new one from such a tarball), you can run hirni-dicom2spec directly. How the rule system is used to derive the specification deserves its own chapter (at least if you wish to adjust those rules). This should now result in a studyspec.json within the respective acquisition directory. You can now review the autogenerated entries and correct or enhance them.
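As a concrete illustration, the import of one acquisition in the demo later in this document looks like this (the URL and the acquisition name acq1 are specific to that demo):
% datalad hirni-import-dcm --anon-subject 001 \
    https://github.com/datalad/example-dicom-structural/archive/master.tar.gz acq1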
Add arbitrary data¶
Once an acquisition is established within a study dataset, you may add arbitrary additional files to that acquisition.
Protocols, stimulation log files, other data modalities … whatever else belongs to that acquisition. There are no
requirements on how to structure those additional files within the acquisition directory.
A specification for arbitrary data can be added as well, of course. It works exactly the same way as for the DICOM data, with the only exception that there is no automated attempt to derive a specification from the data itself. There is, however, the command hirni-spec4anything to help with the creation of such a specification. It fills the specification not based on the data, but based on what is already specified (for the DICOMs, for example). That is, hirni-spec4anything assumes that specification values that are unambiguous throughout the existing specification of an acquisition are valid for additional data as well. For example, if all existing specifications of an acquisition agree on a subject identifier, this will be the guess for additional files.
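For illustration, the demo later in this document adds a specification for an events file like this (the copy-converter procedure and the target path template come from hirni's toolbox and the demo's specification):
% datalad hirni-spec4anything acq2/events.tsv --properties '{"procedures": {"procedure-name": "copy-converter", "procedure-call": "bash {script} {{location}} {ds}/sub-{{bids-subject}}/func/sub-{{bids-subject}}_task-{{bids-task}}_run-{{bids-run}}_events.tsv"}, "type": "events_file"}'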
This is how to create such a study dataset including its specification. See also this example.
Convert your dataset¶
The conversion of such datasets is meant to target a new dataset. That is, you create a new, empty dataset which is the target of the conversion and make the study dataset a subdataset of that new one. Thereby the converted dataset keeps a reference to the data it was created from. From within the target dataset you can then call hirni-spec2bids to execute the actual conversion as specified in the specification files.
Note that it is not required to convert the entire dataset at once. Instead, the conversion is called on particular specification files and can be further limited to convert only a particular type of data as listed in the respective specification file.
Furthermore, hirni-spec2bids comes with an --anonymize switch. This will do several things: It chooses which subject identifier to use in the converted dataset. For that, a specification has a subject and an anon_subject field to choose from. So, usually subject will contain the identifier as it comes from the DICOMs (likely pseudonymized), while anon_subject allows you to specify an anonymized identifier in addition.
Secondly, --anonymize will cause the conversion to store the generated run records in hidden sidecar files rather than commit messages, in order to disguise possibly revealing paths. Finally, conversion procedures listed in specifications can declare that they are to be executed only if the --anonymize switch was used. This mechanism allows triggering things like a defacing after the conversion from DICOM to NIfTI.
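Such a conversion call, taken from the demo later in this document, looks like this (run from within the target dataset, with the study dataset installed under sourcedata):
% datalad hirni-spec2bids --anonymize sourcedata/studyspec.json sourcedata/*/studyspec.json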
An example of such a conversion is to be found here.

Basic usage examples¶
An example study¶
This is a simple example showing how to create a study dataset with datalad-hirni and how to import data into such a dataset. The raw data we use for this demo is publicly available from two example repositories on GitHub. For reference on what this data is about, simply visit https://github.com/datalad/example-dicom-functional/ and https://github.com/datalad/example-dicom-structural/ in your browser. For all commands to work in the exact form shown here, first create a directory for this demo and switch into it:
% mkdir demo
% cd demo
Creating a raw dataset¶
First off, we need a study raw dataset to bundle all raw data in a structured way:
% datalad create my_raw_dataset
% cd my_raw_dataset
% datalad run-procedure cfg_hirni
The first command creates a DataLad dataset with nothing special about it. The last one, however, runs a hirni procedure that does several things to turn this into a study dataset.
Apart from setting some configurations, like enabling the extraction of DICOM metadata, it creates a default README file, a dataset_description.json template file and an initial study specification file, and it installs hirni's toolbox dataset as a subdataset of my_raw_dataset.
Note that by default the toolbox is installed from GitHub. If you need to install it from elsewhere, you can set the datalad.hirni.toolbox.url config to point to another URL prior to running cfg_hirni.
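For example, if the toolbox needed to come from a different source, one way to set that configuration in the freshly created dataset before the run-procedure call above would be (an assumption on the mechanism: DataLad reads its configuration from the usual Git config files; the URL below is a placeholder):
% git config --local datalad.hirni.toolbox.url "https://example.org/my-toolbox-mirror"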
It now should look like this:
% tree -L 2
.
├── code
│ └── hirni-toolbox
├── dataset_description.json
├── README
└── studyspec.json
And from a datalad perspective like this:
% datalad ls -r
. [annex] master ✗ 2019-02-28/12:37:01 ✓
code/hirni-toolbox [annex] master ✗ 2019-02-27/21:23:59 ✓
We now have an initial study dataset and should start by editing the study metadata, which is stored in dataset_description.json. For convenience when doing this manually we can use hirni’s web UI:
% datalad webapp --dataset . hirni
The output following this command should end reading Running on http://127.0.0.1:5000/ (Press CTRL+C to quit). Now this address can be opened in a browser and should look like this:

Choose “Edit study metadata” (we have no acquisition yet) to get to this form:

It's not required to fill this in at this point (technically it's not required to be filled in at any point), but it is generally recommended to record whatever information you have into that dataset as soon as possible. Its recorded history is just as useful as you allow it to be.
Acquiring data¶
Now we want the actual data. To import a DICOM tarball into the study dataset, there is a dedicated hirni command: hirni-import-dcm. It adds the DICOMs to our dataset, extracts metadata from their headers and derives a specification for each series it finds in those DICOM files. A hirni study dataset is supposed to put all data of each acquisition into a dedicated subdirectory, which also contains a specification file for that acquisition. We can give the command a name for such an acquisition or let it try to derive one from what it finds in the DICOM headers. Everything that is automatically concluded from the metadata can be overwritten by options to that command, of course. Something that can't automatically be derived, of course, are anonymized subject identifiers. This association will be needed for anonymized conversion. You can add those IDs later, but we can also do it right from the start via the option --anon-subject. datalad hirni-import-dcm can import such tarballs either from a local path or a URL. For this demo we use the above-mentioned example data available from GitHub:
% datalad hirni-import-dcm --anon-subject 001 https://github.com/datalad/example-dicom-structural/archive/master.tar.gz acq1
This should create a new acquisition directory acq1, containing a studyspec.json and a subdataset dicoms. Note that this subdataset contains the original tarball itself (in a hidden way) as well as the extracted DICOMs. However, as long as we don't need to operate on the DICOM files, we don't really need their content to be present in extracted form. This is why hirni-import-dcm results in the DICOM files having no local content. We can get it back at any time via datalad get acq1/dicoms/*.
Import the second acquisition the same way:
% datalad hirni-import-dcm --anon-subject 001 https://github.com/datalad/example-dicom-functional/archive/master.tar.gz acq2
Note that this imports and extracts metadata from about 6000 DICOM files; it will take a few minutes. This time we have something else to import for that acquisition: the events file. Generally, you can add arbitrary files to the dataset. Protocols, logfiles, physiological data, code - the dataset is meant to bundle all raw data of a study. The functional data already provides an events.tsv file and therefore we can find it in the dicoms subdataset we just created. Since such a file is usually not included in a DICOM tarball you'd start with, let's pretend it's not actually in that archive and import it separately again. We use git annex addurl to retrieve that file and then save the new state of our dataset by calling datalad save:
% git annex addurl https://github.com/datalad/example-dicom-functional/raw/master/events.tsv --file acq2/events.tsv
% datalad save --message "Added stimulation protocol for acquisition 2"
Note
The calls to git annex addurl and datalad save currently replace a single call to datalad download-url due to a bug in that command.
Please note that the choice of where exactly to put such a file within an acquisition directory is entirely up to you. datalad-hirni doesn't expect any particular structure within an acquisition. As long as the specification files correctly reference the locations of the data, everything is fine. Now, for a later conversion there is no general conversion rule for tsv files. We need to tell the system what it is supposed to do with that file (if anything) on conversion. For that, we add a specification for the file using hirni-spec4anything. This command allows adding (or replacing) a specification for arbitrary things. By default it generates a specification that already “inherits” everything that is unambiguously uniform across the existing specifications of that acquisition. That means, if our automatically created specification for the functional DICOMs managed to derive all required BIDS terms (in this case “subject”, “task” and “run”) and their values for the DICOM series, spec4anything will use them for the new specification as well (unless we overrule this). So, all we need to do here is specify a conversion routine. For correct BIDS conversion we only need to copy that file to its correct location. Such a “copy converter” is provided by the toolbox we installed at the beginning. Editing or adding such a specification is again possible via the webUI. For the purpose of this demo, however, we will this time use the command line to show what that looks like:
% datalad hirni-spec4anything acq2/events.tsv --properties '{"procedures": {"procedure-name": "copy-converter", "procedure-call": "bash {script} {{location}} {ds}/sub-{{bids-subject}}/func/sub-{{bids-subject}}_task-{{bids-task}}_run-{{bids-run}}_events.tsv"}, "type": "events_file"}'
What we pass into the properties option here is a JSON string. This is the underlying structure of what you can see in the webUI. The necessary quoting/escaping at the command line is admittedly not always easy for manual editing. Note that instead of such a string you can also pass a path to a JSON file. (More generally: all of DataLad and the datalad-hirni extension are accessible via a Python API as well.) For a more extensive description of the specification (and therefore those properties) see the specification page.
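As a rough sketch of what the Python API route could look like (hedged: the function and parameter names below are assumed from the command line interface and from the way DataLad extensions expose their commands in datalad.api; consult the module reference at the end of this document before relying on them):
# Hypothetical sketch: adding the same specification snippet via the Python API.
import json
import datalad.api as dl  # datalad-hirni commands are assumed to be bound here

properties = json.dumps({
    "procedures": {
        "procedure-name": "copy-converter",
        "procedure-call": "bash {script} {{location}} "
                          "{ds}/sub-{{bids-subject}}/func/"
                          "sub-{{bids-subject}}_task-{{bids-task}}_run-{{bids-run}}_events.tsv",
    },
    "type": "events_file",
})

# assumed signature, mirroring the CLI options of hirni-spec4anything
dl.hirni_spec4anything("acq2/events.tsv", dataset=".", properties=properties)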
If you ran all the commands in this demo exactly as posted, your dataset should now look exactly like this: https://github.com/psychoinformatics-de/hirni-demo. For comparison you can examine it on GitHub or install it locally to have a closer look via:
% cd ..
% datalad install -s https://github.com/psychoinformatics-de/hirni-demo --recursive
We have now bound all information on that study and its acquisitions, in its native, absolutely unmodified form, together in a dataset that can serve as a starting point for any kind of processing. This dataset is much less likely to suffer from software bugs than a ready-to-analyze dataset with NIfTIs etc., because the software stack that actually touched the data files is minimal.
DICOM datasets that have been imported into a study raw dataset can (additionally) be collected in scanner (or institution or lab) specific superdatasets. This allows for convenient record keeping of all relevant MR data acquisitions ever made in a given context. The example script at the bottom of this page shows how to bootstrap such a database.
Such superdatasets are lightweight, as they do not contain actual imaging data, and can be queried using a flexible language. In the DICOM context it is often desired to limit the amount of metadata to whole datasets and their image series. This can be achieved using the following configuration, which only needs to be put into the top-most dataset, not every DICOM dataset:
% cat .datalad/config
[datalad "dataset"]
id = 349bb81a-1afe-11e8-959f-a0369f7c647e
[datalad "search"]
index-autofield-documenttype = datasets
default-mode = autofield
With this setup the DataLad search command will automatically discover metadata for any contained image series, and build a search index that can be queried for values in one or more individual DICOM fields. This allows for a variety of useful queries.
Example queries¶
Report scans made on any male patients in a given time span:
% datalad search dicom.Series.AcquisitionDate:'[20130410 TO 20140101]' dicom.Series.PatientSex:'M'
search(ok): lin/7t/xx99_2022/dicoms (dataset)
Report any scans for a particular subject ID:
% datalad search 'xx99*'
[INFO ] Query completed in 0.019682836998981657 sec. Reporting up to 20 top matches.
search(ok): lin/7t/xx99_2022/dicoms (dataset)
search(ok): lin/7t/xx99_2014/dicoms (dataset)
search(ok): lin/7t/xx99_2015/dicoms (dataset)
search(ok): lin/3t/xx99_0138/dicoms (dataset)
search(ok): lin/3t/xx99_0139/dicoms (dataset)
search(ok): lin/3t/xx99_0140/dicoms (dataset)
action summary:
search (ok: 6)
For each search hit ALL available metadata is returned. This allows for sophisticated output formatting. Here is an example that reports all studies a particular subject has participated in:
% datalad -f '{metadata[dicom][Series][0][StudyDescription]}' search -f 'xx99*' | uniq
[INFO] Query completed in 0.02244874399912078 sec. Reporting up to 20 top matches.
Studies^Michael_Hanke
transrep2
Transrep2
Demo script to bootstrap a DICOM database from scan tarballs¶
The following script shows how a bunch of DICOM tarballs from two different scanners can be imported into a DataLad superdataset for each scanner. Those two scanner datasets are then assembled into a joint superdataset for the acquisition hardware of the institution. Metadata from any acquisition session can then be aggregated into this dataset, to track all acquisitions made on those devices, as well as to be able to query for individual scan sessions, DICOM series, or individual DICOM images (see above for query examples).
# create a super dataset that will have all acquisitions the 7T ever made
datalad create 7t
cd 7t
datalad run-procedure cfg_hirni
# import a bunch of DICOM tarballs (simulates daily routine)
datalad hirni-import-dcm \
/home/data/psyinf/forrest_gump/pandora/data/xx99/raw/dicom/xx99_2022.20130410.103515.930000.tar.gz
datalad hirni-import-dcm \
/home/data/psyinf/forrest_gump/7T_ad/data/xx99/raw/dicom/xx99_2014.20130408.123933.087500.tar.gz
datalad hirni-import-dcm \
/home/data/psyinf/forrest_gump/7T_ad/data/xx99/raw/dicom/xx99_2015.20130408.140515.147500.tar.gz
# done for now
cd ..
# now the same for 3t
datalad create 3t
cd 3t
datalad run-procedure cfg_hirni
# import a bunch of DICOM tarballs
datalad hirni-import-dcm \
/home/data/psyinf/forrest_gump/3T_av_et/mri/xx99_0138.20140425.121603.06.tar.gz
datalad hirni-import-dcm \
/home/data/psyinf/forrest_gump/3T_av_et/mri/xx99_0139.20140425.142752.07.tar.gz
datalad hirni-import-dcm \
/home/data/psyinf/forrest_gump/3T_visloc/mri/xx99_0140.20140425.155736.23.tar.gz
# done
cd ..
# one dataset for the entire institute's scans (could in turn be part of one that also
# includes other modalities/machines)
# this first part only needs to be done once
datalad create lin
cd lin
datalad install -d . -s ../7t
datalad install -d . -s ../3t
# this second part needs to be done every time the metadata DB shall be updated
# get the latest state of the scanner datasets (no heavy stuff is moved around)
datalad update --merge -r
# aggregate from the aggregated metadata
datalad meta-aggregate -r
# ready to search
Demo: Conversion to BIDS¶
This demo shows how to convert a hirni study dataset into a BIDS compliant dataset. The study dataset we use is the one created by the study dataset demo. We will use a published version of that dataset available from GitHub, but you can also build it yourself by following said demo and use that one.
BIDS Dataset¶
The idea is to create a new dataset that will become the BIDS dataset and reference our study dataset - which bundles all the raw data - by making it a subdataset of the derived one. Please note that this does NOT mean that the new BIDS dataset contains the raw data. It just references it and thereby creates a fully reproducible history record of how it came to be. The study dataset does NOT need to be shared if you want to share the BIDS dataset. Rather, everyone who has the BIDS dataset can trace everything back to the original raw data IF they also have access/permission to get that subdataset.
In order to get our to-be BIDS dataset from the raw dataset, we create a new dataset and run the cfg_bids procedure to configure it:
% datalad create demo_bids
% cd demo_bids
% datalad run-procedure cfg_bids
Now we install our study dataset as a subdataset into our new dataset at its subdirectory sourcedata. By that, we reference the exact state of our study dataset at the moment of installation. While this may create some data duplication, please note several things: First, the new subdataset doesn’t need to hold all of the actual content of the study dataset’s files (although it can retrieve it, it doesn’t by default during installation). Rather it’s about referencing the input data (including the code and environments in hirni’s toolbox) at their exact version to achieve full reproducibility. We can thereby track the converted data back to the raw data and the exact conversion routine that brought it into existence. Second, this subdataset can later be removed by datalad uninstall, freeing the space on the filesystem while keeping the reference:
% datalad install --dataset . --source https://github.com/psychoinformatics-de/hirni-demo sourcedata --recursive
Note, that if you want to use a local study dataset (i.e. created yourself via the study dataset demo) you can simply replace that URL with the path to your local one.
The actual conversion is based on the specification files in the study dataset. You can convert a single one of them (meaning: everything such a file specifies) or an arbitrary number, including everything at once, of course. Let's first convert the study-level specification and then all the acquisitions with the following call:
% datalad hirni-spec2bids --anonymize sourcedata/studyspec.json sourcedata/*/studyspec.json
The anonymize switch causes the command to use the anonymized subject identifiers and to encode all records of where exactly the data came from into hidden sidecar files, which can then be excluded from publishing/sharing this dataset.
datalad hirni-spec2bids runs DataLad procedures on the raw data as specified in the specification files (remember, for example, that we set a “copy-converter” procedure for our events.tsv file). Those procedures are customizable. The defaults we use here come from hirni's toolbox dataset. The default procedure to convert the DICOM files uses a containerized converter. It will NOT use whatever converter you happen to have installed locally, but this defined environment referenced in the dataset. This requires a download of that container (which happens automatically) and enables the reproducibility of this routine, since the exact environment the conversion ran in is recorded in the dataset's history. In addition, this will cause datalad to retrieve the actual data of the study subdataset in sourcedata. Remember that you can datalad uninstall that subdataset after conversion or use datalad drop to throw away its copy of particular files. If you use the BIDS-Validator (https://bids-standard.github.io/bids-validator/) to check the resulting dataset, there will be an error message, though. This is because our events.tsv file references stimuli files that we don't actually have available to add to the dataset. For the purpose of this demo, this is fine.
Other than that, we now have a valid BIDS dataset that can be used with BIDS-Apps or any kind of software able to deal with this standard. Since we have the raw data in a subdataset, we can aggregate DICOM metadata from it into the BIDS dataset, where it remains available even when the study subdataset is uninstalled. If we keep using datalad run / datalad containers-run for any processing that follows (as hirni does internally), we are able to trace back the genesis and evolution of each file to the raw data, the exact code and the environments that were used to alter this file or bring it into existence.
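In command form, that aggregation is a single call from the top of the BIDS dataset, analogous to the database script shown earlier in this document (the exact behaviour depends on the installed datalad-metalad version):
% datalad meta-aggregate -r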
Demo: A Reproducible GLM Analysis¶
This demo shows how to use datalad and datalad-hirni datasets to perform and record reproducible analyses. We will again use the study dataset as created by the respective demo to provide the raw data.
Prepare the Data for Analysis¶
Before analyzing imaging data, we typically have to convert them from their original DICOM format into NIfTI files. We gain a lot here by adopting the BIDS standard. Up front, it saves us the effort of creating an ad-hoc directory structure. But more importantly, by structuring our data in a standard way (and an increasingly common one), it opens up possibilities for us to easily feed our dataset into existing analysis pipelines and tools.
For the purpose of this demo, we will simply list the commands needed to get a BIDS dataset from the study dataset. For reference: this is the exact same thing we do in the conversion demo.
% datalad create localizer_scans
% cd localizer_scans
% datalad run-procedure cfg_bids
% datalad install --dataset . --source https://github.com/psychoinformatics-de/hirni-demo sourcedata --recursive
% datalad hirni-spec2bids --anonymize sourcedata/studyspec.json sourcedata/*/studyspec.json
We should now have a BIDS dataset looking like this:
% tree -L 2
.
├── dataset_description.json
├── participants.tsv -> .git/annex/objects/KF/5x/MD5E-s50--12d3834067b61899d74aad5d48fd5520.tsv/MD5E-s50--12d3834067b61899d74aad5d48fd5520.tsv
├── README
├── sourcedata
│ ├── acq1
│ ├── acq2
│ ├── code
│ ├── dataset_description.json
│ ├── README
│ └── studyspec.json
├── sub-001
│ ├── anat
│ ├── func
│ └── sub-001_scans.tsv -> ../.git/annex/objects/8v/Gj/MD5E-s179--2e78ce543c5bcc8f0b462b7c9b334ad2.tsv/MD5E-s179--2e78ce543c5bcc8f0b462b7c9b334ad2.tsv
└── task-oneback_bold.json -> .git/annex/objects/3J/JW/MD5E-s1452--62951cfb0b855bbcc3fce91598cbb40b.json/MD5E-s1452--62951cfb0b855bbcc3fce91598cbb40b.json
% datalad subdatasets -r
subdataset(ok): sourcedata (dataset)
subdataset(ok): sourcedata/acq1/dicoms (dataset)
subdataset(ok): sourcedata/acq2/dicoms (dataset)
subdataset(ok): sourcedata/code/hirni-toolbox (dataset)
action summary:
subdataset (ok: 4)
We are now done with the data preparation. We have the skeleton of a BIDS-compliant dataset that contains all data in the right format and using the correct file names. In addition, the computational environment used to perform the DICOM conversion is tracked, as well as a separate dataset with the input DICOM data. This means we can trace every single file in this dataset back to its origin, including the commands and inputs used to create it.
This dataset is now ready. It can be archived and used as input for one or more analyses of any kind. Let’s leave the dataset directory now:
% cd ..
A Reproducible GLM Demo Analysis¶
With our raw data prepared in BIDS format, we can now conduct an analysis. We will implement a very basic first-level GLM analysis using FSL that runs in just a few minutes. We will follow the same principles that we already applied when we prepared the localizer_scans dataset: the complete capture of all inputs, computational environments, code, and outputs.
Importantly, we will conduct our analysis in a new dataset. The raw localizer_scans dataset is suitable for many different analyses that can all use that dataset as input. In order to avoid wasteful duplication and to improve the modularity of our data structures, we will merely use the localizer_scans dataset as an input, but we will not modify it in any way. We will simply link it into a new analysis dataset, the same way we linked the raw data during conversion:
% datalad create glm_analysis
% cd glm_analysis
Following the same logic and commands as before, we will add the localizer_scans dataset as a subdataset of the new glm_analysis dataset to enable comprehensive tracking of all input data within the analysis dataset:
% datalad install --dataset . --source ../localizer_scans inputs/rawdata
Regarding the layout of this analysis dataset, we unfortunately cannot yet rely on automatic tools and a comprehensive standard (but such guidelines are actively being worked on). However, DataLad nevertheless aids efforts to bring order to the chaos. Anyone can develop their own ideas on how a dataset should be structured and implement these concepts in dataset procedures that can be executed using the datalad run-procedure command. Here we are going to adopt the YODA principles: a set of simple rules on how to structure an analysis dataset. For now, the only relevant aspect is that we want to keep all analysis scripts in the code/ subdirectory of this dataset. We can get a readily configured dataset by running the YODA setup procedure:
% datalad run-procedure cfg_yoda
Before we can fire up FSL for our GLM analysis, we need two pieces of custom code:
- a small script that can convert BIDS events.tsv files into the EV3 format that FSL can understand, available at https://raw.githubusercontent.com/myyoda/ohbm2018-training/master/section23/scripts/events2ev3.sh
- an FSL analysis configuration template script available at https://raw.githubusercontent.com/myyoda/ohbm2018-training/master/section23/scripts/ffa_design.fsf
Any custom code needs to be tracked if we want to achieve a complete record of how an analysis was conducted. Hence we will store those scripts in our analysis dataset. We use the datalad download-url command to download the scripts and include them in the analysis dataset:
% datalad download-url --path code \
https://raw.githubusercontent.com/myyoda/ohbm2018-training/master/section23/scripts/events2ev3.sh \
https://raw.githubusercontent.com/myyoda/ohbm2018-training/master/section23/scripts/ffa_design.fsf
Note, that the commit message shows the URL where each script has been downloaded from:
% git log
At this point, our analysis dataset contains all of the required inputs. We only have to run our custom code to produce the inputs in the format that FSL expects. First, let's convert the events.tsv file into EV3 format files. We use the datalad run command to execute the script at code/events2ev3.sh. It requires the name of the output directory (use sub-001) and the location of the BIDS events.tsv file to be converted. The --input and --output options let DataLad automatically manage these files for you. Important: the subdataset does not actually have the content for the events.tsv file yet. If you use --input correctly, DataLad will obtain the file content for you automatically. Check the output carefully; the script is written in a sloppy way that will produce some output even when things go wrong. Each generated file must have three numbers per line:
% datalad run -m 'Build FSL EV3 design files' \
--input inputs/rawdata/sub-001/func/sub-001_task-oneback_run-01_events.tsv \
--output 'sub-001/onsets' \
bash code/events2ev3.sh sub-001 {inputs}
Now we're ready for FSL! And since FSL is certainly not a simple system program, we will use it in a container and add that container to this analysis dataset. A ready-made container with FSL (~260 MB) is available from shub://ReproNim/ohbm2018-training:fsln. Use the datalad containers-add command to add this container under the name fsl. Then use the datalad containers-list command to verify that everything worked:
% datalad containers-add fsl --url shub://ReproNim/ohbm2018-training:fsln
% datalad containers-list
With this we have completed the analysis setup. At such a milestone it can be useful to label the state of the dataset so that it can be referred to later on. Let's add the label ready4analysis here:
% datalad save --version-tag ready4analysis
All we have left is to configure the desired first-level GLM analysis with FSL. The following command will create a working configuration from the template we stored in code/. It uses the arcane, yet powerful sed editor. We will again use datalad run to invoke our command so that we store in the history how this template was generated (so that we may audit, alter, or regenerate this file in the future — fearlessly):
% datalad run \
-m "FSL FEAT analysis config script" \
--output sub-001/1stlvl_design.fsf \
bash -c 'sed -e "s,##BASEPATH##,{pwd},g" -e "s,##SUB##,sub-001,g" \
code/ffa_design.fsf > {outputs}'
The command that we will run now in order to compute the analysis results is a simple feat sub-001/1stlvl_design.fsf. However, in order to achieve the most reproducible and most portable execution, we should tell the datalad containers-run command what the inputs and outputs are. DataLad will then be able to obtain the required NIfTI time series file from the localizer_scans raw subdataset. The following command takes around 5 minutes to complete on an average system:
% datalad containers-run --container-name fsl -m "sub-001 1st-level GLM" \
--input sub-001/1stlvl_design.fsf \
--input sub-001/onsets \
--input inputs/rawdata/sub-001/func/sub-001_task-oneback_run-01_bold.nii.gz \
--output sub-001/1stlvl_glm.feat \
fsl5.0-feat '{inputs[0]}'
Once this command finishes, DataLad will have captured the entire FSL output, and the dataset will contain a complete record all the way from the input BIDS dataset to the GLM results (which, by the way, performed an FFA localization on a real BOLD imaging dataset, take a look!). The BIDS subdataset in turn has a complete record of all processing down from the raw DICOMs onwards.
Get Ready for the Afterlife¶
Once a study is complete and published, it is important to archive data and results, for example, to be able to respond to inquiries from readers of an associated publication. The modularity of the study units makes this straightforward and avoids needless duplication. We know that the raw data for this GLM analysis is tracked in its own dataset (localizer_scans) that only needs to be archived once, regardless of how many analyses use it as input. This means that we can “throw away” this subdataset copy within this analysis dataset. DataLad can re-obtain the correct version at any point in the future, as long as the recorded location remains accessible. We can use the datalad diff command and git log to verify that the subdataset is in the same state as when it was initially added. Then we use datalad uninstall to remove it:
% datalad diff -- inputs/rawdata
% git log -- inputs/rawdata
% datalad uninstall --dataset . inputs/rawdata --recursive
Before we archive these analysis results, we can go one step further and verify their computational reproducibility. DataLad provides a rerun command that is capable of “replaying” any recorded command. With the following command we re-execute the FSL analysis (every command recorded since we tagged the dataset as “ready4analysis”). It records the recomputed results in a separate Git branch of the dataset named “verify”. We can then automatically compare these new results to the original ones in the “master” branch. We will see that all outputs can be reproduced in bit-identical form. The only changes are observed in log files that contain volatile information, such as time stamps:
# rerun FSL analysis from scratch (~5 min)
% datalad rerun --branch verify --onto ready4analysis --since ready4analysis
% # check that we are now on the new `verify` branch
% git branch
% # compare which files have changes with respect to the original results
% git diff master --stat
% # switch back to the master branch and remove the `verify` branch
% git checkout master
% git branch -D verify
So, hopefully we’ve shown that:
- we can implement a complete imaging study using DataLad datasets to represent units of data processing
- each unit comprehensively captures all inputs and data processing leading up to it
- this comprehensive capture facilitates re-use of units, and enables computational reproducibility
- carefully validated intermediate results (captured as a DataLad dataset) are a candidate for publication with minimal additional effort
Details¶
Specification¶
Overview¶
The specification of a study dataset captures metadata on data entities within that dataset and in particular is meant to specify how exactly to convert them to a targeted data/layout standard (to BIDS). Technically this mechanism is by no means limited to the BIDS standard, but it was built with BIDS in mind.
By default this specification is distributed over several files with the default name studyspec.json, in the dataset's root directory as well as in each acquisition directory. However, not only can you change the default name (via the configuration variable datalad.hirni.studyspec.filename), but you can also add an arbitrary number of such files at arbitrary locations within the dataset. This is possible because any command that would touch those files can be given a path to the file(s) to consider. Furthermore, the conversion can be restricted to particular specification files and, within them, to a particular type of entities to convert. It might therefore be desirable to distribute the specification in a more fine-grained fashion. Last but not least, this might also ease the generation and editing of those files.
As the name suggests, these files are basically JSON files, except that they are what is called a JSON stream. This means that each line in the file is a valid JSON dictionary, which is what we call a specification snippet. Each of those dictionaries is the specification of a single data entity. Because of that, the files can be processed on a per-line basis, which for example allows ignoring an error in one snippet and proceeding with the next one, instead of needing to parse the entire content before processing can even start, which would be the case if the file was plain JSON (that is: one single structure, in that case a list of dictionaries). However, this comes at a cost: some standard tools/libraries for handling JSON might not be able to deal with those files out of the box.
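For example, such a file can be read snippet by snippet with a few lines of standard Python (the path below is hypothetical; the field names match the example snippets shown later in this section):
# Minimal sketch: read a JSON-stream specification file one snippet per line.
import json

snippets = []
with open("acq1/studyspec.json") as spec_file:
    for line in spec_file:
        if line.strip():                 # skip empty lines
            snippets.append(json.loads(line))

for snippet in snippets:
    # every snippet describes one data entity
    print(snippet["type"], snippet.get("location"))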
- TODO
- point out, how exactly those specifications are used and link to respective sections on conversion and customization
- add a few words regarding the term “data entities”
Structure¶
- TODO
- required fields
- approve flag
- (semi-)autogeneration
- tags, procedures
- arbitrary additions
Customization¶
There are a lot of ways to customize datalad-hirni. Some things are just a matter of configuration settings, while others involve a few lines of (Python) code.
Configuration¶
As a DataLad extension, datalad-hirni uses DataLad's config mechanism. It just adds some additional variables. If you are looking for a possible configuration to change some specific behaviour of the commands, refer also to the help pages for those commands. Please don't hesitate to file an issue on GitHub if there is something else you would like to become configurable.
- datalad.hirni.toolbox.url
  This can be used to overwrite the default URL the toolbox is retrieved from. The URL is respected by the cfg_hirni procedure. Please note that it therefore has no effect if the toolbox was already installed into your dataset. This configuration may be used to refer to an offline version of hirni's toolbox or to switch to another toolbox dataset altogether.
- datalad.hirni.studyspec.filename
  Use this configuration to change the default name for specification files (studyspec.json).
- datalad.hirni.dicom2spec.rules
  Set this to point to a Python file defining rules for how to derive a specification from DICOM metadata (see below for more on implementing such rules). This configuration can be set multiple times, which results in those rules overwriting each other. Therefore the order in which they are specified matters, with later rules overwriting earlier ones. As with any DataLad configuration in general, the order of sources is system, global, local, dataset. This could be used for having institution-wide rules via the system level, a scanner-based rule at the global level (of a specific computer at the scanner site), and user-based and study-specific rules, each of which could either go with what the previous level decided or overwrite it.
- datalad.hirni.import.acquisition-format
  This setting allows specifying a Python format string that will be used by datalad hirni-import-dcm if no acquisition name was given. It defines the name to be used for an acquisition (the directory name) based on DICOM metadata. The default value is {PatientID}. Anything enclosed in curly brackets is replaced by the value of a variable with that name; everything else is taken literally. Every field of the DICOM headers is available as such a variable. You could also combine several, like {PatientID}_{PatientName} (see the example below).
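For example, one way to set the acquisition naming format in a study dataset (an assumption on the mechanism: DataLad reads its configuration from the usual Git config files; the combined format string is just the illustration from the item above):
% git config --local datalad.hirni.import.acquisition-format "{PatientID}_{PatientName}"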
Procedures¶
DataLad procedures are used in different places within datalad-hirni. Wherever this is the case, you can use your own procedure instead (or in addition). Most notably, procedures are the drivers of the conversion and therefore the pieces used to plug in arbitrary conversion routines (in fact, the purpose of a procedure is up to you - for example, one can use the conversion specification and those procedures for preprocessing as well). The following outlines how this works.
A specification snippet defines a list of procedures and how exactly they are called. Any DataLad procedure can be referenced therein; it is, however, strongly recommended to include them in the dataset they are supposed to run on, or possibly in a subdataset thereof (as is the case with the toolbox). For full reproducibility you want to avoid referencing a procedure that is not tracked by the dataset or any of its subdatasets. Sooner or later such a reference would be doomed to become a reference to nowhere.
Those procedures are then executed by datalad hirni-spec2bids in the order in which they appear in that list. A single entry in that list is a dictionary specifying the name of the procedure and, optionally, a format string to use for calling it and, also optionally, a flag indicating whether it should be executed only if datalad hirni-spec2bids was called with --anonymize.
For example, here are (parts of) two snippets taken from acquisition 2 of the demo dataset: the specification for the DICOM image series and another one specifying the use of the “copy converter” for an events.tsv file (see the study dataset demo for context):
{"location":"dicoms",
"dataset-id":"7cef7b58-400d-11e9-a522-e8b1fc668b5e",
"dataset-refcommit":"2f98e53c171d410c4b54851f86966934b78fc870",
"type":"dicomseries:all"
"id":{"approved":false,
"value":401},
"description":{"approved":false,
"value":"func_task-oneback_run-1"},
"anon-subject":{"approved":false,
"value":"001"},
"subject":{"approved":false,
"value":"02"},
"bids-modality":{"approved":false,
"value":"bold"},
"bids-run":{"approved":false,
"value":"01"},
"bids-session":{"approved":false,
"value":null},
"bids-task":{"approved":false,
"value":"oneback"},
"procedures":[
{"procedure-name":{"approved":false,
"value":"hirni-dicom-converter"}
"procedure-call":{"approved":false,
"value":null},}
"on-anonymize":{"approved":false,
"value":false}
}
]
}
{"location":"events.tsv",
"dataset-id":"3f27c348-400d-11e9-a522-e8b1fc668b5e",
"dataset-refcommit":"4cde2dc1595a1f3ba694f447dbb0a1b1ec99d69c",
"type":"events_file",
"id":{"approved":false,"value":401},
"description":{"approved":false,"value":"func_task-oneback_run-1"},
"anon-subject":{"approved":false,"value":"001"},
"subject":{"approved":false,"value":"02"}
"bids-modality":{"approved":false,"value":"bold"},
"bids-run":{"approved":false,"value":"01"},
"bids-session":{"approved":false,"value":null},
"bids-task":{"approved":false,"value":"oneback"},
"procedures":[
{"procedure-name":{"approved":true,
"value":"copy-converter"},
"procedure-call":{"approved":true,
"value":"bash {script} {{location}} {ds}/sub-{{bids-subject}}/func/sub-{{bids-subject}}_task-{{bids-task}}_run-{{bids-run}}_events.tsv"}
}
]
}
Such format strings defining the call can use replacements (TODO: refer to datalad-run/datalad-run-procedure) by enclosing valid variables in curly brackets, which are then replaced by the values of those variables at execution time. For procedures referenced in specification snippets and executed by datalad hirni-spec2bids, all fields of the currently processed snippet are available for passing to the procedure in that way. Thus any conversion routine you might want to turn into (or, more likely, wrap in) such a procedure can be made aware of all the metadata recorded in the respective snippet.
The format string defining how exactly a particular procedure should be called can also be provided by the procedure itself, if that procedure is registered in a dataset. This is treated as a default and can be overwritten by the specification. If the default is sufficiently generic, the call-format field in the specification can remain empty. The only specification field actually mandatory for a procedure is procedure-name, of course.
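To give a rough idea, here is a hedged sketch of what a minimal copy-converter-style procedure script could look like, assuming it is invoked according to the call format shown in the snippet above, i.e. bash <script> <source location> <target path> (the actual copy converter shipped with hirni's toolbox may differ):
#!/bin/bash
# Hypothetical sketch of a minimal copy-converter procedure.
# Expected invocation (per the call format above): bash this_script.sh <source file> <target path>
set -e -u
src="$1"     # e.g. the snippet's {location}, such as events.tsv
dest="$2"    # e.g. the BIDS path assembled from the snippet's bids-* fields
# create the target directory within the to-be BIDS dataset and copy the file
mkdir -p "$(dirname "$dest")"
cp "$src" "$dest"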
- TODO
- have an actual step-by-step example implementation of a (conversion) procedure
Rules¶
The rule system for deriving a specification for DICOM image series from the DICOM metadata consists of two parts. One is the configuration determining which existing rule(s) to use; the other is the implementation of such rules, which can then be configured to be the ones in use.
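The configuration side works via the datalad.hirni.dicom2spec.rules variable described above; for example (hedged: the path is a placeholder, and git config is just one way to set a DataLad configuration at the local repository level):
% git config --local datalad.hirni.dicom2spec.rules code/my_dicom2spec_rules.py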
- TODO
- config vs. implementation
- TODO
- Say a thing or two about those:
- https://github.com/psychoinformatics-de/datalad-hirni/blob/master/datalad_hirni/resources/rules/custom_rules_template.py
- https://github.com/psychoinformatics-de/datalad-hirni/blob/master/datalad_hirni/resources/rules/test_rules.py
- likely walk through a reasonably small example implementation
Command manuals¶
Hirni commands¶
datalad-hirni-import-dcm¶
Synopsis¶
datalad-hirni-import-dcm [-h] [-d PATH] [--subject SUBJECT] [--anon-subject ANON_SUBJECT] [--properties PATH or JSON string] PATH [ACQUISITION ID]
Description¶
Import a DICOM archive into a study raw dataset.
This creates a subdataset containing the extracted DICOM files under ACQUISITION ID/dicoms. Metadata is extracted from the DICOM headers and a study specification will automatically be prefilled based on that metadata. The specification is written to ACQUISITION ID/studyspec.json by default. To this end, after the creation of the subdataset and the extraction of DICOM metadata, hirni-dicom2spec is called internally. Therefore whatever you configure regarding dicom2spec applies here as well. Please refer to hirni-dicom2spec's documentation on how to configure the deduction of a study specification from DICOM metadata.
Options¶
PATH¶
path or URL of the DICOM archive to be imported. Constraints: value must be a string
ACQUISITION ID¶
acquisition identifier for the imported DICOM files. This is used as the name of the directory that is supposed to contain all data related to that acquisition. If not specified, an attempt will be made to derive ACQUISITION_ID from DICOM metadata. You can specify how to deduce that identifier from the DICOM header fields by configuring DATALAD.HIRNI.IMPORT.ACQUISITION-FORMAT with a Python format string referencing DICOM header field names as variables. For example, the current default value for that configuration is “{PatientID}”. Constraints: value must be a string
-h, --help, --help-np¶
show this help message. --help-np forcefully disables the use of a pager for displaying the help message
-d PATH, --dataset PATH¶
specify the dataset to import the DICOM archive into. If no dataset is given, an attempt is made to identify the dataset based on the current working directory and/or the given PATH. Constraints: Value must be a Dataset or a valid identifier of a Dataset (e.g. a path)
--subject SUBJECT¶
subject identifier. If not specified, an attempt will be made to derive SUBJECT from DICOM headers. See hirni-dicom2spec for details. Constraints: value must be a string
--anon-subject ANON_SUBJECT¶
an anonymized subject identifier. This is needed for anonymized conversion via spec2bids --anonymize and will be stored in the specification snippet for the imported DICOMs. Hence it can be added later and isn't mandatory for the import. Constraints: value must be a string
--properties PATH or JSON string¶
a JSON string or a path to a JSON file providing overrides/additions to the specification snippets created for this acquisition. Constraints: value must be a string
Authors¶
datalad is developed by DataLad developers <team@datalad.org>.
datalad-hirni-dicom2spec¶
Synopsis¶
datalad-hirni-dicom2spec [-h] [-s SPEC] [-d DATASET] [--subject SUBJECT] [--anon-subject ANON_SUBJECT] [--acquisition ACQUISITION] [--properties PATH or JSON string] PATH [PATH ...]
Description¶
Derives a specification snippet from DICOM metadata and stores it in a JSON file.
The derivation is based on a rule system. You can implement your own rules as a Python class. See the documentation page on customization for details. If you have such rules in dedicated files, their use and priority is configured via the datalad.hirni.dicom2spec.rules config variable. It takes a path to a Python file containing such a rule definition. This configuration can be specified multiple times and at different levels (system-wide, user, dataset, local repository). If there are indeed several occurrences of that configuration, the respective rules will be applied in order. Hence “later” appearances will overwrite “earlier” ones. Thereby you can, for example, have institution-wide rules and still apply additional rules tailored to your needs or a particular study.
Options¶
PATH¶
path to DICOM files. Constraints: value must be a string
-h, --help, --help-np¶
show this help message. --help-np forcefully disables the use of a pager for displaying the help message
-s SPEC, --spec SPEC¶
file to store the specification in. Constraints: value must be a string
-d DATASET, --dataset DATASET¶
specify a dataset containing the DICOM metadata to be used. If no dataset is given, an attempt is made to identify the dataset based on the current working directory. Constraints: Value must be a Dataset or a valid identifier of a Dataset (e.g. a path)
--subject SUBJECT¶
subject identifier. If not specified, an attempt will be made to derive SUBJECT from DICOM headers. Constraints: value must be a string
--anon-subject ANON_SUBJECT¶
TODO. Constraints: value must be a string
--acquisition ACQUISITION¶
acquisition identifier. If not specified, an attempt will be made to derive an identifier from DICOM headers. Constraints: value must be a string
--properties PATH or JSON string¶
Constraints: value must be a string
Authors¶
datalad is developed by DataLad developers <team@datalad.org>.
datalad-hirni-spec2bids¶
Synopsis¶
datalad-hirni-spec2bids [-h] [-d DATASET] [--anonymize] [--only-type TYPE] [SPEC_FILE [SPEC_FILE ...]]
Description¶
Convert to BIDS based on study specification
Options¶
SPEC_FILE¶
path(s) to the specification file(s) to use for conversion. If a directory at the first level beneath the dataset's root is given instead of a file, it is assumed to be an acquisition directory that contains a specification file. By default this is a file named 'studyspec.json' in the acquisition directory. This default name can be configured via the 'datalad.hirni.studyspec.filename' config variable. Constraints: value must be a string
-h, --help, --help-np¶
show this help message. --help-np forcefully disables the use of a pager for displaying the help message
-d DATASET, --dataset DATASET¶
bids dataset. Constraints: Value must be a Dataset or a valid identifier of a Dataset (e.g. a path)
--anonymize¶
whether or not to anonymize for conversion. Currently this means to use 'anon_subject' instead of 'subject' from the spec and to use datalad run with a sidecar file, so as not to leak potentially identifying information into its record.
--only-type TYPE¶
specify the snippet type to convert. If given, only this type of specification snippet is considered for conversion. Constraints: value must be a string
Authors¶
datalad is developed by DataLad developers <team@datalad.org>.
datalad-hirni-spec4anything¶
Synopsis¶
datalad-hirni-spec4anything [-h] [-d PATH] [--spec-file SPEC_FILE] [--properties PATH or JSON string] [--replace] [PATH [PATH ...]]
Options¶
PATH¶
path(s) of the data to create a specification for. Each path given will be treated as a data entity getting its own specification snippet. Constraints: value must be a string
-h, --help, --help-np¶
show this help message. --help-np forcefully disables the use of a pager for displaying the help message
-d PATH, --dataset PATH¶
specify the dataset. If no dataset is given, an attempt is made to identify the dataset based on the current working directory and/or the PATH given. Constraints: Value must be a Dataset or a valid identifier of a Dataset (e.g. a path)
--spec-file SPEC_FILE¶
path to the specification file to modify. By default this is a file named 'studyspec.json' in the acquisition directory. This default name can be configured via the 'datalad.hirni.studyspec.filename' config variable. Constraints: value must be a string
--properties PATH or JSON string¶
Constraints: value must be a string
--replace¶
if set, replace an existing spec if the values of 'type', 'location' and 'id' match. Note that only the first match will be replaced.
Authors¶
datalad is developed by DataLad developers <team@datalad.org>.
Python API¶
Python module reference¶
This module reference extends the manual with a comprehensive overview of the available functionality. Each module in the package is documented by a general summary of its purpose and the list of classes and functions it provides.
- import_dicoms: Import a DICOM tarball into a study dataset
- dicom2spec: Derive a study specification snippet describing a DICOM series based on the DICOM metadata as provided by datalad
- spec2bids: Convert DICOM data to BIDS based on the respective study specification
- spec4anything: Create specification snippets for arbitrary paths