Writing FaceBase Data Management and Sharing Plans
This page is a work in progress.
Almost every section of a project’s Data Management and Sharing (DMS) Plan requires information specific to that project, which only the data contributor can provide. Some sections also require information about the repository; this guide provides FaceBase-specific information and text snippets that can help you complete those sections.
This guide is based on the NIH optional DMS plan format document.
We refer to section titles from that document in bold; our comments appear in plain text, and our suggestions for text to include in your plan appear in italics.
Element 1: Data Type
A. Types and amount of scientific data expected to be generated in the project:
The Key Concepts for FaceBase Data Contributors guide provides an overview of the types and formats of data we accept. A more exhaustive list of supported data types can be found here. If you plan to submit a data type that isn’t listed, please contact the FaceBase Hub to discuss it.
C. Metadata, other relevant data, and associated documentation:
The basic structure of FaceBase data is described in the Key Concepts for FaceBase Data Contributors guide. FaceBase also has minimum metadata requirements for each type of data. The protocol and metadata requirements for each type of data are described in the Quality Control Rules section of the Key Concepts document. An example of a possible starting point for your metadata description is provided below:
The dataset will include information about the types of experiments performed, the species, developmental stages, and anatomy of biosamples collected, and protocols for each experiment.
FaceBase also requires additional metadata for each type of experiment. For RNA-seq and scRNA-seq assays:
The metadata for each experiment must include strandedness and read number.
For ChIP-seq assays:
The metadata for each experiment must include strandedness, read number, target of assay, and (for non-control assays) a link to the record for the associated control assay.
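If you want to check your own records against the assay-specific requirements above before submission, the rules can be sketched roughly as follows. This is only an illustration: the field names (`assay_type`, `strandedness`, `read_number`, `target_of_assay`, `control_assay`, `is_control`) and the function `missing_metadata` are hypothetical and are not FaceBase's actual schema or API.

```python
# Hypothetical field names; FaceBase's actual metadata schema may differ.
REQUIRED_FIELDS = {
    "RNA-seq": {"strandedness", "read_number"},
    "scRNA-seq": {"strandedness", "read_number"},
    "ChIP-seq": {"strandedness", "read_number", "target_of_assay"},
}


def missing_metadata(experiment: dict) -> set:
    """Return the required fields that are absent or empty in one experiment record."""
    assay = experiment.get("assay_type")
    required = set(REQUIRED_FIELDS.get(assay, set()))
    # Non-control ChIP-seq assays must also link to the associated control assay.
    if assay == "ChIP-seq" and not experiment.get("is_control", False):
        required.add("control_assay")
    return {field for field in required if not experiment.get(field)}
```

An empty result means the record carries every field this sketch knows about; it does not replace the quality control rules described in the Key Concepts document.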
Additional standard metadata elements that we collect, and which may or may not be relevant to your project, include:
At the experiment level:
- Molecule Type
- Strandedness
- RNA-seq Selection
- Chromatin Modifier
- Transcription Factor
- Histone Modification
At the biosample level:
- Specimen
- Gene
- Genotype
- Strain
- Mutation
- Stage
- Anatomy
- Origin
- Phenotype
- Treatment
- Sex
- Litter
- Collection Date
Please see the Key Concepts for FaceBase Data Contributors guide for more details.
Element 2: Related Tools, Software and/or Code:
If relevant to your data types, you could include:
FaceBase provides web-based tools on its website, www.facebase.org, for visualizing and annotating 2-dimensional imaging data and for visualizing 3-dimensional imaging data, scRNA-seq data, and (through the UCSC Genome Browser) track data. All software developed for FaceBase is open source and hosted on GitHub.
Element 3: Standards
Ontologies used in FaceBase include the following; please list any that apply:
- Anatomy: Uberon
- Chromatin modifier: ZFIN, NGI, HGNC, Ensembl, MGI
- Data type: OBI, SNOMEDCT, CHMO
- Experiment type: MMO, ERO, CHMO, SCTID, OBI, STATO
- Gene: NCBI
- Phenotype: Chemical Methods (CHMO), Foundational Model of Anatomy (FMA), Craniofacial Mouse Malformation Ontology (CMMO), MP, HP, DOID
- Sex: Uberon
- Species: NCBI Taxon
- Strain: MGI
- Syndrome: MONDO
- Transcription factor: MGI, ZFIN, Gene_ORFName, Ensembl, HGNC
FaceBase uses these reference genomes: human (hg38, hg19), mouse (mm39, mm10, mm9), and chimpanzee (panTro4).
Common data formats accepted by FaceBase include:
- Sequencing Data: “raw” sequencing data (fastq files).
- Processed Data: fastqc reports (.fastqc.tgz or .fastqc.zip), count files (.count, .tpm, .fpkm), measures in tab-separated format (.tsv), and alignment mapping files (.bam) and indexes (.bam.bai).
- Track Data: BED (.bed), bigBed (.bb), and bigWig (.bw) files.
- Array Data: “raw” microarray data (CEL files).
- Imaging Data: high-resolution 3D or 2D imaging data, such as micro-CT in gzipped NIfTI format (.nii.gz), confocal or other microscopy data in TIFF or OME-TIFF (.tiff or .ome.tiff), and data from other sources in JPEG (.jpg or .jpeg). Other formats may be considered on an as-needed basis.
- Surface Model / Mesh Data: Wavefront OBJ format.
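When estimating the types and amount of data for your plan, it can help to sort your project's files into the categories listed above. The following is a minimal sketch, not a FaceBase tool: the category labels and the `classify` function are hypothetical, and the `.fastq`, `.cel`, and `.obj` extensions are assumed from common usage because the list above names those formats without giving extensions.

```python
# Extensions follow the format list above; category labels are illustrative only.
# ".fastq", ".cel", and ".obj" are assumed conventional extensions.
FORMAT_SUFFIXES = {
    "sequencing": (".fastq",),
    "processed": (".fastqc.tgz", ".fastqc.zip", ".count", ".tpm", ".fpkm",
                  ".tsv", ".bam", ".bam.bai"),
    "track": (".bed", ".bb", ".bw"),
    "array": (".cel",),
    "imaging": (".nii.gz", ".tiff", ".ome.tiff", ".jpg", ".jpeg"),
    "mesh": (".obj",),
}


def classify(filename: str):
    """Return the category whose suffix matches the file name, or None if none matches."""
    name = filename.lower()  # compare case-insensitively
    for category, suffixes in FORMAT_SUFFIXES.items():
        if name.endswith(suffixes):  # str.endswith accepts a tuple of suffixes
            return category
    return None
```

A file that returns `None` here is not necessarily unsupported; as noted above, other formats may be considered, so contact the FaceBase Hub in that case.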
Element 4: Data Preservation, Access, and Associated Timelines
B. How scientific data will be findable and identifiable:
Each data item in FaceBase is provided with a persistent unique record identifier (RID) on creation. In addition, each dataset can be assigned a DataCite DOI.
C. When and how long the scientific data will be made available:
FaceBase will make all data available for the life of the FaceBase project. In addition, FaceBase will make its best effort to preserve a read-only archive of all data other than protected human subjects data for at least five years after the last data deposit to FaceBase.
Element 5: Access, Distribution, or Reuse Considerations
B. Whether access to scientific data will be controlled:
All data other than protected human subjects data will be shared publicly after the associated studies are published. Protected human subjects data will be stored in a separate, secure location and shared only after a review is carried out by the FaceBase Data Access Committee and a Data Use Agreement is signed. More information about this process can be found here.
C. Protections for privacy, rights, and confidentiality of human research participants:
The review and approval process described above protects data housed at FaceBase. The Data Use Agreement protects data that has been shared by FaceBase (following the approval process described above).
Element 6: Oversight of Data Management and Sharing
FaceBase will perform its normal quality assurance processes to ensure that metadata is complete and that industry best practices are followed for data housed at FaceBase.