OSTI to/from xBRC Conversion
This page describes how this project converts metadata between OSTI E-Link records and the xBRC schema used for BRC data on bioenergy.org.
In this repository, xBRC refers to the cross-BRC metadata model represented
by brc_schema.yaml. It captures the fields needed by the Bioenergy Research
Centers and the bioenergy.org data portal, including BRC affiliation, dataset
landing pages, BRC-specific topics and themes, organism metadata, repository
metadata, and portal-oriented status fields.
The conversion logic lives in:
- src/brc_schema/transform/osti_to_brc.yaml
- src/brc_schema/transform/brc_to_osti.yaml
- src/brc_schema/transform.py
When to Use Each Direction
Use osti_to_brc when records have been retrieved from OSTI E-Link and need to
be loaded, validated, or reviewed as xBRC DatasetCollection data:
uv run brcschema transform -T osti_to_brc -o brc_datasets.yaml osti_records.json
Use brc_to_osti when xBRC metadata needs to be prepared for OSTI E-Link
submission or update workflows:
uv run brcschema transform -T brc_to_osti -o osti_records.yaml brc_datasets.yaml
Input files may be JSON or YAML. Output format is selected from the output file extension.
OSTI to xBRC
The osti_to_brc transform expects an OSTI-style wrapper object with a
records array and produces an xBRC DatasetCollection with datasets.
Key mappings include:
| OSTI source | xBRC target | Notes |
|---|---|---|
| records | datasets | Each OSTI Record becomes one xBRC Dataset. |
| title | title | Direct mapping. |
| description | description, abstract | OSTI description is copied into both xBRC fields. |
| publication_date or entry_date | date | Publication date is preferred. |
| osti_id | identifier | Rendered as https://www.osti.gov/biblio/<osti_id>. |
| doi | bibliographicCitation | Rendered as https://doi.org/<doi>. |
| keywords or subjects | keywords | Comma-separated strings are split into individual keywords. |
| persons, authors, organizations | creator, contributors | Structured persons are preferred; legacy author strings are fallback input. |
| sponsor organizations or sponsor_orgs | funding | Award identifiers are copied when available. |
| related_identifiers | relatedItem | OSTI related identifiers become xBRC related items. |
| identifiers, related_identifiers, and legacy identifier fields | has_related_ids | Contract numbers are excluded; BioProject URLs and IDs are normalized where possible. |
| media | media | OSTI media package metadata is preserved. |
| site_url or first links entry | dataset_url | site_url is preferred. |
| workflow_status | active | R maps to true; missing status maps to false. |
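
To make a few of these mappings concrete, here is a minimal Python sketch. It is illustrative only and is not the project's implementation, which lives in the transform files listed earlier. The function names `split_keywords`, `osti_record_to_dataset`, and `osti_to_brc_sketch` are hypothetical, the sample record is made up, and treating every non-R status as inactive is an assumption (the table only specifies R and missing status).

```python
def split_keywords(value):
    """Split a comma-separated string, or a list of such strings, into keywords."""
    items = [value] if isinstance(value, str) else list(value or [])
    keywords = []
    for item in items:
        keywords.extend(part.strip() for part in item.split(",") if part.strip())
    return keywords


def osti_record_to_dataset(record):
    """Map a subset of OSTI fields onto xBRC slots, following the table above."""
    dataset = {
        "title": record.get("title"),
        # The OSTI description feeds both xBRC text fields.
        "description": record.get("description"),
        "abstract": record.get("description"),
        # Publication date is preferred over entry date.
        "date": record.get("publication_date") or record.get("entry_date"),
        "keywords": split_keywords(record.get("keywords") or record.get("subjects")),
        # Assumption: any status other than "R" is treated as inactive.
        "active": record.get("workflow_status") == "R",
    }
    if record.get("osti_id"):
        dataset["identifier"] = f"https://www.osti.gov/biblio/{record['osti_id']}"
    if record.get("doi"):
        dataset["bibliographicCitation"] = f"https://doi.org/{record['doi']}"
    if record.get("site_url"):
        dataset["dataset_url"] = record["site_url"]
    return dataset


def osti_to_brc_sketch(wrapper):
    """Turn an OSTI-style wrapper with `records` into a collection with `datasets`."""
    return {"datasets": [osti_record_to_dataset(r) for r in wrapper.get("records", [])]}


# Made-up sample input in the wrapper shape the transform expects.
sample = {
    "records": [
        {
            "osti_id": 1234567,
            "title": "Example dataset",
            "description": "An example description.",
            "doi": "10.0000/example",
            "keywords": "switchgrass, lignin",
            "workflow_status": "R",
        }
    ]
}
collection = osti_to_brc_sketch(sample)
```

The sketch skips creators, funding, related identifiers, and media, which need the structured handling described in the table.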
xBRC Affiliation
The xBRC brc field is derived from OSTI metadata in this order:
- Use site_ownership_code directly when present.
- Otherwise inspect contract fields and CN_DOE identifiers.
- Map known BRC contract numbers:
| Contract number | xBRC brc |
|---|---|
| SC0018420 | CABBI |
| SC0018409 | GLBRC |
| AC36-08GO28308 | CBI |
| AC02-05CH11231 | JBEI |
If neither site_ownership_code nor a known contract number is present, the
transform cannot infer the BRC affiliation.
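
The lookup order above can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the transform's actual code: the function name `infer_brc`, the flat `doe_contract_number` field, and the `type`/`value` shape of identifier entries are all assumed for the example. Substring matching is used so that prefixed forms of a contract number still match, which is also an assumption.

```python
# Contract numbers and BRC codes taken from the table above.
KNOWN_CONTRACTS = {
    "SC0018420": "CABBI",
    "SC0018409": "GLBRC",
    "AC36-08GO28308": "CBI",
    "AC02-05CH11231": "JBEI",
}


def infer_brc(record):
    """Return a BRC code for an OSTI record, or None when it cannot be inferred."""
    # 1. An explicit site ownership code wins.
    code = record.get("site_ownership_code")
    if code:
        return code

    # 2. Collect candidate contract strings.
    candidates = []
    contract_field = record.get("doe_contract_number")  # hypothetical field name
    if contract_field:
        candidates.append(str(contract_field))
    for ident in record.get("identifiers") or []:
        # Assumed identifier shape: {"type": "CN_DOE", "value": "..."}
        if ident.get("type") == "CN_DOE" and ident.get("value"):
            candidates.append(str(ident["value"]))

    # 3. Map the first known contract number found.
    for candidate in candidates:
        for number, brc in KNOWN_CONTRACTS.items():
            if number in candidate:
                return brc
    return None
```

For example, `infer_brc({"identifiers": [{"type": "CN_DOE", "value": "SC0018420"}]})` returns `"CABBI"`, and a record with neither a site ownership code nor a known contract number returns `None`.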
Current xBRC-Specific Gaps
Some xBRC fields are more specific than the current OSTI E-Link record model or are intended for bioenergy.org display and filtering. These fields are not currently populated from OSTI records unless the transform explicitly maps them:
- species
- plasmid_features
- analysisType
- datasetType
- topic
- theme
- category
- ontology_annotations
For species and taxon identifiers specifically, OSTI does not currently provide
a dedicated species slot in this transform. If an OSTI record includes a clear
NCBI Taxonomy identifier in related identifiers, additional transform logic
would be needed before it can populate the xBRC species slot.
xBRC to OSTI
The brc_to_osti transform expects an xBRC DatasetCollection and produces an
OSTI-style object with a records array.
Key mappings include:
| xBRC source | OSTI target | Notes |
|---|---|---|
| datasets | records | Each xBRC Dataset becomes one OSTI Record. |
| title | title | Direct mapping. |
| description | description | Direct mapping. |
| date | publication_date, released_to_osti_date | The same xBRC date is used for both OSTI fields. |
| identifier | osti_id | Only extracted when the identifier contains osti.gov/biblio/. |
| bibliographicCitation | doi | Supports https://doi.org/... and doi:... forms. |
| brc | site_ownership_code | Direct BRC code passthrough. |
| keywords | keywords | Direct mapping. |
| dataset_url or non-OSTI identifier | site_url, links | The dataset landing page is preferred. |
| active | workflow_status | true maps to R; falsey values map to SR. |
| creator, contributors | authors, persons | Names are split into OSTI first/middle/last fields when possible. |
| funding, brc | organizations, sponsor_orgs, research_orgs | xBRC funding and BRC affiliation are converted into OSTI organization structures. |
| has_related_ids | identifiers, related_identifiers | BioProject IDs and DOI references get special handling. |
| media | media | Media package metadata is preserved. |
The transform also sets these OSTI defaults:
- access_limitations: ["UNL"]
- product_type: DA
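
A minimal Python sketch of the simpler reverse mappings and the defaults is shown below. It is an approximation for orientation only, not the code in the repository: the function name `dataset_to_osti_record` is hypothetical, the regular expressions are one possible reading of the rules in the table, and returning `osti_id` as a string of digits is an assumption.

```python
import re


def dataset_to_osti_record(dataset):
    """Map a subset of xBRC slots onto OSTI fields, including the defaults."""
    record = {
        "title": dataset.get("title"),
        "description": dataset.get("description"),
        "keywords": dataset.get("keywords"),
        # true maps to R; falsey values (including a missing slot) map to SR.
        "workflow_status": "R" if dataset.get("active") else "SR",
        # Defaults set by the transform.
        "access_limitations": ["UNL"],
        "product_type": "DA",
    }

    # The same xBRC date fills both OSTI date fields.
    if dataset.get("date"):
        record["publication_date"] = dataset["date"]
        record["released_to_osti_date"] = dataset["date"]

    # osti_id is only extracted from identifiers that point at osti.gov/biblio/.
    id_match = re.search(r"osti\.gov/biblio/(\d+)", dataset.get("identifier") or "")
    if id_match:
        record["osti_id"] = id_match.group(1)

    # Accept both https://doi.org/... and doi:... forms.
    citation = (dataset.get("bibliographicCitation") or "").strip()
    doi_match = re.match(r"(?:https?://doi\.org/|doi:)(.+)", citation, re.IGNORECASE)
    if doi_match:
        record["doi"] = doi_match.group(1).strip()

    # The BRC code passes straight through.
    if dataset.get("brc"):
        record["site_ownership_code"] = dataset["brc"]
    return record
```

Author, organization, related-identifier, and media handling are left out because they depend on the structured logic in the transform files.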
BRC-Derived OSTI Fields
When xBRC brc is present, the transform emits BRC-specific OSTI metadata:
- site_ownership_code
- DOE contract identifiers
- default sponsor organization information when no xBRC funding is provided
- BRC research organization names for OSTI research_orgs
This behavior is specific to xBRC. Generic metadata conversion tools should not assume that all BRC-derived OSTI fields can be recovered from arbitrary OSTI records.
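
As a rough illustration of the first two items, the sketch below derives a site ownership code and a DOE contract identifier from a BRC code. It assumes that the contract numbers are the same ones listed under xBRC Affiliation and that contract identifiers use a `type`/`value` shape with type `CN_DOE`; both are assumptions, and `brc_derived_fields` is a hypothetical name. Sponsor and research organization names are omitted because this page does not list them.

```python
# Assumed inverse of the contract table in the xBRC Affiliation section.
BRC_CONTRACTS = {
    "CABBI": "SC0018420",
    "GLBRC": "SC0018409",
    "CBI": "AC36-08GO28308",
    "JBEI": "AC02-05CH11231",
}


def brc_derived_fields(brc):
    """Return OSTI fields that can be derived from an xBRC brc value alone."""
    fields = {"site_ownership_code": brc}
    contract = BRC_CONTRACTS.get(brc)
    if contract:
        # Assumed identifier shape for DOE contract numbers.
        fields["identifiers"] = [{"type": "CN_DOE", "value": contract}]
    return fields
```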
Round-Trip Expectations
The transforms are designed for operational metadata exchange, not a lossless round trip. Several fields are normalized, inferred, defaulted, or omitted.
Expected non-lossless behavior includes:
- OSTI description maps to both xBRC description and abstract; converting back uses only xBRC description.
- xBRC active is represented as OSTI workflow_status, not as a native xBRC boolean in OSTI.
- xBRC-specific display and filtering fields, such as topic, theme, category, and ontology_annotations, are not currently emitted to OSTI.
- OSTI fields without xBRC equivalents may be omitted unless they have an explicit mapping.
- BRC affiliation may be inferred from contract numbers in osti_to_brc, while brc_to_osti emits contract numbers from the xBRC brc value.
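
When reviewing a round trip by hand, a small comparison helper can show which slots changed or disappeared. The sketch below is a hypothetical utility, not part of the repository, and the two dictionaries are hand-written examples rather than real transform output.

```python
def round_trip_differences(original, round_tripped):
    """Return top-level slots whose values changed or disappeared, as (before, after)."""
    differences = {}
    for key in sorted(set(original) | set(round_tripped)):
        before, after = original.get(key), round_tripped.get(key)
        if before != after:
            differences[key] = (before, after)
    return differences


# Hand-written example of an xBRC dataset before and after xBRC -> OSTI -> xBRC.
original = {
    "title": "Example dataset",
    "description": "Short summary.",
    "abstract": "Longer abstract text.",
    "topic": ["feedstock"],
}
round_tripped = {
    "title": "Example dataset",
    "description": "Short summary.",
    "abstract": "Short summary.",  # rebuilt from the OSTI description
}
differences = round_trip_differences(original, round_tripped)
```

Here `differences` reports `abstract` and `topic`, matching the first and third items in the list above.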
For conversion validation, use the repository test suite:
make test