To best coordinate FaceBase imaging data, the Hub is providing a list of metadata that must be included with each dataset submitted to the Hub. This annotation will improve our ability to search data and even compare data across species.
The following is a summary of the types of metadata required.
Imaging Metadata for Mouse, Zebrafish and Human
Imaging metadata captures important information about the subject and the imaging files. Some fields are required and, where possible, are selectable (as opposed to requiring manual entry). The following is a summary of the sections and fields of metadata. Fields in italics are selectable.
- Accession number, Project, Investigator, Funding
- Genotype, Strain, Species, Specimen, Litter ID (mouse only)
- Mutation, Stage, Anatomy, Phenotype, Collection Date
- The following are human only: Diagnosis from OMIM, OMIM Code, Age, Height, Weight, Head Circumference, Molecular/Cytogenetic diagnosis (Yes/No?), Supporting Diagnostic Evidence, and Notes
- Modality/Device, Model, File Type/Format, Intent, Filename, File Checksum, Time, File Size, Description, Caption, Dimensionality, VoxelX, VoxelY, VoxelZ, ImageSizeX, ImageSizeY, ImageSizeZ
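Several of the file-level fields above (Filename, File Size, File Checksum) can be derived from the file itself rather than typed by hand. The following is a minimal sketch, not the Hub's own tooling. The function name `file_metadata` is hypothetical, and SHA-256 is an assumption, since the required checksum algorithm is not stated here.

```python
import hashlib
from pathlib import Path


def file_metadata(path, chunk_size=1 << 20):
    """Return Filename, File Size, and File Checksum entries for one file."""
    p = Path(path)
    # Assumption: SHA-256. The checksum algorithm the Hub expects is not given above.
    digest = hashlib.sha256()
    with p.open("rb") as fh:
        # Read in chunks so large image volumes are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "Filename": p.name,
        "File Size": p.stat().st_size,  # size in bytes
        "File Checksum": digest.hexdigest(),
    }
```

Calling `file_metadata("scan.tif")` on a hypothetical image file returns a dictionary with those three keys, which can then be merged with the manually entered fields such as Modality/Device or VoxelX.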
Bioinformatics data, including ChIP-Seq, RNA-Seq and microarray, require an even greater level of detail:
- Same as above
- Same as above
- Same as above plus Origin
- Replicate and Biosample:
- Observation Accession number, Biological Replicate ID, Sample Accession Number, Sample Composition, Sample Purification, Markers, Isolation Protocol, Cell Count, Protocol, Pretreatment, Fragmentation Method, Reagent, Reagent Source, Reagent Catalog Number, Reagent Batch Number, Selection
- Library Preparation:
- Sample Accession Number, Library Adapters, PCR Cycles, Library Yield, Size Selection
- Sequencing Stats:
- Platform, Sequencing Quality, Abundant Sequence Contamination Statistics, Non-source Contamination, Were Quality Scores Altered
- Aligner, Aligner Version, Aligner Flags, Reference Genome, Transcriptome Model, Sequence Trimming, Trimmed Sequences, Trimming Method, Duplicate Removal, Pre-alignment Sequence Removal
- Visualization Tracks:
- Format, Software Used to Generate, Version, Settings
- Data File:
- Sample Accession Number, Data File Accession Number, Equipment Model, File Type (Format), Filename, File Checksum, Time, File Size, Description, Caption, Dimensionality
- Mice-specific Section:
- Sample Accession Number, Estimate of enrichment or homogeneity of sample from associated biological elements, RNA information, Targeted size range/Results, Type of RNA targeted (poly A+ or A-), Type of RNA capping (5' capped or uncapped), Protocol to isolate RNA, Amount of ribosomal RNA present in Poly A+, Does experiment generate strand-specific information? (y/n), Amplification strategy, Method and estimated fold of amplification, Performance of RNA-Seq experiment, Method cDNA sequencing (short <200nt or long >200nt), Optimal levels of sample (cluster numbers for Illumina), Were spike-ins used? (y/n), Amount & source of spike-ins (if yes to above), cDNA sequencing method: pooled, cDNA sequencing method: bar coded RNA targets, Length of reads, Type of reads (single or paired-end), Thresholds for mapping: allowable mismatches, Thresholds for mapping: minimal score, Thresholds for mapping: Other, Plan for "split reads" & Junctions, File of gene/transcript coordinates and normalized # of assigned reads (RPKM/FPKM)
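Some fields in the list above depend on others: the amount and source of spike-ins only applies when the answer to "Were spike-ins used?" is yes. A check for that kind of dependency could be sketched as follows. The function name and the exact dictionary keys are hypothetical and do not reflect a published Hub schema.

```python
def missing_spike_in_details(record):
    """Return the spike-in fields that are required but left empty."""
    # Hypothetical keys, modeled on the field names listed above.
    used = str(record.get("Were spike-ins used? (y/n)", "")).strip().lower()
    if used != "y":
        # Spike-ins not used (or not answered): no follow-up field is needed.
        return []
    details = str(record.get("Amount & source of spike-ins", "")).strip()
    return [] if details else ["Amount & source of spike-ins"]
```

A record that answers "y" without giving an amount and source yields `["Amount & source of spike-ins"]`; a record that answers "n" yields an empty list.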
The OCDM project has been working with each spoke to collect the terms relevant to their data and to correlate them with appropriate ontologies where available. We will post the spreadsheet when it is finalized.