Cell Genom. 2023 Aug 18;3(9):100381. doi: 10.1016/j.xgen.2023.100381. eCollection 2023 Sep 13.


It is widely accepted that large-scale genomic data (e.g., whole-genome sequencing, whole-exome sequencing, and genome-wide association study data) should be shared through a controlled-access mechanism. Controlled access protects the privacy of research participants and ensures that downstream uses of the data align with participants’ informed consent regarding future sharing of their data. In 2019, the Global Alliance for Genomics and Health (GA4GH) approved the Data Use Ontology (DUO) standard, which defines data use terms as machine-readable representations of how a dataset can be used. We assessed the parity between existing data use restrictions (“Data Use Limitations” [DULs]) for datasets registered in the National Institutes of Health database of Genotypes and Phenotypes (dbGaP) and the DUO standard. We found substantial (93%) parity between the dbGaP DULs (n = 3,575) and DUO. This study demonstrates the comprehensiveness of the DUO standard and encourages data stewards to standardize data use restrictions in machine-readable formats to facilitate data sharing.
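The sketch below illustrates what a "machine-readable" data use restriction can look like and how a parity figure could be computed. It is not the procedure used in the study. It assumes dbGaP-style hyphenated consent codes (e.g. `DS-CA-IRB`) and maps only a small, illustrative subset of abbreviations to DUO term IDs. The tables `PRIMARY` and `MODIFIERS`, the helpers `map_consent_code` and `parity`, and the sample codes are all hypothetical. The term IDs should be checked against the current DUO release. A real mapping would also need to resolve disease abbreviations to an ontology term (e.g., MONDO) and handle free-text DULs.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, illustrative subset of consent abbreviations -> DUO term IDs.
# Assumption: verify these IDs against the current DUO release before reuse.
PRIMARY = {
    "GRU": "DUO:0000042",  # general research use
    "HMB": "DUO:0000006",  # health or medical or biomedical research
    "DS": "DUO:0000007",   # disease specific research (needs a disease qualifier)
}
MODIFIERS = {
    "IRB": "DUO:0000021",  # ethics approval required
    "PUB": "DUO:0000019",  # publication required
    "COL": "DUO:0000020",  # collaboration required
}


@dataclass
class DuoMapping:
    primary: str
    disease: Optional[str] = None  # raw abbreviation; not resolved to an ontology term
    modifiers: List[str] = field(default_factory=list)


def map_consent_code(code: str) -> Optional[DuoMapping]:
    """Map a hyphenated consent code (e.g. 'DS-CA-IRB') to DUO term IDs.

    Returns None if any token has no entry in the illustrative tables.
    """
    tokens = code.strip().upper().split("-")
    head, rest = tokens[0], tokens[1:]
    if head not in PRIMARY:
        return None
    mapping = DuoMapping(primary=PRIMARY[head])
    if head == "DS":
        if not rest:
            return None  # disease-specific use without a disease qualifier
        mapping.disease, rest = rest[0], rest[1:]
    for token in rest:
        if token not in MODIFIERS:
            return None  # unmapped modifier: the code has no full DUO equivalent
        mapping.modifiers.append(MODIFIERS[token])
    return mapping


def parity(codes: List[str]) -> float:
    """Fraction of consent codes that map completely onto DUO terms."""
    if not codes:
        return 0.0
    mapped = sum(map_consent_code(c) is not None for c in codes)
    return mapped / len(codes)


if __name__ == "__main__":
    # Hypothetical input for illustration only; not data from the study.
    sample = ["GRU", "HMB-IRB", "DS-CA-PUB", "HMB-XYZ"]
    for c in sample:
        print(c, map_consent_code(c))
    print(f"parity: {parity(sample):.0%}")  # 3 of 4 codes map -> 75%
```

Returning `None` for any unmapped token treats a DUL as having parity only when every component has a DUO equivalent. This is a deliberately strict, assumed criterion. A looser criterion, such as counting partial matches, would yield a higher fraction.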

