Common Coordinate Frameworks for the BICCN

Reference atlases are of primary importance in neuroscience research. To effectively compare anatomic and cell-specific data from different modalities and experimental protocols, it is important to have all data in the same coordinate framework. The creation of such atlases is a long-standing problem, originally addressed by drawing on 2D histology plates. Modern coordinate frameworks enable the use of common landmarks for the integration of reference maps at differing scales in one common framework. A common coordinate framework (CCF) for the mouse or human brain must uniquely and reproducibly define any location in the brain. The CCF is a projected coordinate system in space that is defined relative to one or more origins. The challenge in developing any CCF is how to define a robust set of origin points that are practical over different anatomical scales and across the natural variability of the organism.
The Allen Mouse Common Coordinate Framework (CCF) (http://mouse.brain-map.org/) is the main anatomic data browser and coordinate environment for mouse data within the BICCN. An essential tool for understanding the structure and function of the mouse brain at molecular, cellular, system and behavioral levels, it has been successfully used for large-scale data mapping, quantification, presentation, and analysis, and has evolved through multiple versions. The first version of the CCF (col. 1), released in 2005, was created to support the product goals of the Allen Mouse Brain Atlas (http://mouse.brain-map.org/). The framework was based upon the Allen Reference Atlas (ARA) specimen, in which a 3-D volume was reconstructed from 528 Nissl sections of a near-complete brain, with assistance from the PICSL group at UPenn. Approximately 200 structures were extracted from the 2-D atlas drawings to create 3-D annotations.


A second version (col. 2) was constructed in 2011 to support the objectives of the Allen Mouse Brain Connectivity Atlas, where a double-sided and more deeply annotated framework was needed. During development, flaws in the 3-D reconstructions were corrected and the volume was mirrored across the midline to create a symmetric space. Eight hundred and sixty structures were extracted and interpolated to create symmetric 3-D annotations. In 2012, with the launch of the Allen Institute cell type program, a next-generation CCF was needed to support integration of data generated at the cellular level in cortex. Version 3 (v3) (col. 3) of the CCF is based on a 3-D, 10 μm isotropic, highly detailed population average of 1675 specimens imaged with 2-photon tomography. Currently, CCF v3 consists of 207 newly drawn structures in 3-D: 123 subcortical structures, 41 fiber tracts (plus ventricular systems), and 43 cortical regions, including primary visual and higher visual areas. Still under active annotation, the final CCF will ultimately consist of more than 500 gray matter structures, cortical layers, ~80 fiber tracts, and ventricle structures in 3-D. Distinct individual 2-photon serial tomographs are shown below. CCFv3 is now in use as the spatial framework to integrate and ultimately compare data collected across multiple large-scale projects and will be used in the BICCN.
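As a rough illustration of what an isotropic voxel framework with a mirrored midline implies in practice, the sketch below converts between voxel indices and physical positions and reflects an index across the midline. The grid shape, the axis order, and all function names are assumptions made for this example; they are not taken from the text above and should be checked against the reference volume actually in use.

```python
import numpy as np

# Assumptions for illustration only: 10 um isotropic voxels (from the text)
# and a hypothetical grid of 1320 x 800 x 1140 voxels ordered as
# (anterior-posterior, dorsal-ventral, left-right).
VOXEL_SIZE_UM = 10.0
GRID_SHAPE = (1320, 800, 1140)


def voxel_to_um(index):
    """Convert a voxel index to micrometers measured from the volume origin."""
    return np.asarray(index, dtype=float) * VOXEL_SIZE_UM


def um_to_voxel(position_um):
    """Return the index of the voxel that contains a position in micrometers."""
    idx = np.floor(np.asarray(position_um, dtype=float) / VOXEL_SIZE_UM).astype(int)
    if np.any(idx < 0) or np.any(idx >= np.array(GRID_SHAPE)):
        raise ValueError("position lies outside the reference volume")
    return idx


def mirror_across_midline(index):
    """Reflect a voxel index across the midline (the last axis in this sketch)."""
    mirrored = np.array(index, dtype=int)
    mirrored[2] = GRID_SHAPE[2] - 1 - mirrored[2]
    return mirrored
```

Because the space is symmetric, reflecting an index twice returns the original index, which is a convenient sanity check when pooling data from both hemispheres.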


Spatial Data Management

Management of the diverse data types generated through the BICCN and their workflows will require creative approaches to include the broadest set of contributors. Fundamentally, all data produced within the BICCN will have spatial attributes; however, the various modalities are approached in distinct ways. In addition to single cell transcriptome data, the BICCN will produce complex spatially resolved data types such as multiplex FISH, electrophysiology including multi-electrode recording, calcium imaging and potentially others. These data require flexibility and innovation in management, allowing a workspace and methodology from which basic computational results can be prepared for deployment in the Brain Cell Data Center (BCDC). In this sub-aim we focus primarily on the computation and deployment of spatially resolved data, including imaging data and other sources for which registration other than tagged location may be necessary. Spatially resolved imaging data within the BCDC falls essentially into two categories:

A) Cell centric data whose positioning within the common coordinate framework (CCF) must rely on coordinate based, visual, or ontological tagging.
B) Image centric data for which an accurate registration to CCF using image processing tools is possible. Given the challenge of determining anatomic context for any reference atlas even within the mouse, precise image registration cannot be accomplished in general unless a whole brain image series (or reasonable fraction of the brain) is present with sufficient distinctive anatomical landmarks.
While the Allen Institute CCF is reasonably mature as a product and mapping environment, it will require some work to make this atlas framework extensible for more general community use. Cross linking this environment with new emerging standards will further expand the mapping potential by allowing new image registration tools, other ontologies, and other rodent strains and species into the framework. There is also a strong opportunity for innovation in the development of primate brain coordinate systems, including marmoset, rhesus macaque and human. While fully resolving the problem of standardization in data mapping in these species is beyond our scope, we can offer innovative approaches to ontologically identifying cell location in the human and providing new tools for the mapping workflow, allowing these data to be searchable and spatially localized. As new standards become more accepted they will be integrated with the BCDC.
For more information about Common Coordinate Frameworks within the BICCN please contact info@biccn.org

BICCN Standards, Best Practices, and Recommendations

As per FAIR, the BICCN seeks to utilize existing standards where they are available, while developing BICCN standards as necessary to fill in the gaps. The BCDC works closely with the BRAIN archives through the Infrastructure Working Group to ensure that standards are harmonized across the consortium where possible. Below are lists of standards and best practices in use in the BICCN.

Metadata and File Formats

High level descriptive metadata to describe BICCN data collections. Instructions and templates for providing information about projects and data collections, and their descriptive metadata which must be submitted for each data collection milestone.  In use by: BCDC for the BICCN Data Catalog.

Basic descriptive metadata for samples and specimens profiled in BICCN. Instructions and templates for data submission are provided. An inventory of samples and associated cells or brains profiled for each sample for each dataset, reported on a quarterly basis. This is suitable for identifying the transgenic line and anatomic region sampled in a manageable spreadsheet, but is not intended to include more detailed provenance. In use by: BCDC for the BICCN Data Catalog.

Metadata standard for a 3D microscopy dataset to support reuse by scientists who did not generate the data. 3D microscopy data submitted to BIL must adhere to this metadata standard.  In use by: BIL Data Archive.

Data standard for neurophysiology, providing neuroscientists with a common standard to share, archive, use, and build common analysis tools for neurophysiology data. In use by: DANDI Archive. Endorsed by: INCF.

Data standard for neuroimaging data (e.g., MRI).   In use by: DANDI Archive.

A proposal extending Brain Imaging Data Structure (BIDS) to 3D Microscopy. 3D microscopy submitted specifically to DANDI archive will adhere to this standard.  In use by: DANDI Archive.

Open data format for encoding neuro morphologies.  Required for morphological data submitted to BIL and NeuroMorpho.org.  NeuroMorpho.org uses the tool L-Measure to convert common formats into SWC. In use by: BIL Data Archive, Neuromorpho.org.

NeuroMorpho documentation on use of SWC

Discussion on SWC at the NEURON forum
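For readers unfamiliar with SWC, the sketch below reads the common seven-column layout (node id, type code, x, y, z, radius, parent id, with `#` comment lines and a parent id of -1 for a root) and sums the cable length. It is a minimal illustration, not the parser used by BIL or NeuroMorpho.org; the class name, function names, and the toy neuron are invented for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class SwcNode:
    node_id: int
    type_code: int   # by common convention 1 = soma, 2 = axon, 3 = basal dendrite
    x: float
    y: float
    z: float
    radius: float
    parent_id: int   # -1 marks a root node


def parse_swc(text):
    """Parse SWC text into a dict keyed by node id, skipping comments and blanks."""
    nodes = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 7:
            raise ValueError(f"line {line_number}: expected 7 columns, got {len(fields)}")
        node = SwcNode(int(fields[0]), int(fields[1]), float(fields[2]),
                       float(fields[3]), float(fields[4]), float(fields[5]),
                       int(fields[6]))
        nodes[node.node_id] = node
    return nodes


def cable_length(nodes):
    """Sum the Euclidean distance from every node to its parent."""
    total = 0.0
    for node in nodes.values():
        parent = nodes.get(node.parent_id)
        if parent is None:  # root node, or parent missing from the file
            continue
        total += math.dist((node.x, node.y, node.z), (parent.x, parent.y, parent.z))
    return total


# Hypothetical three-node neuron used only to exercise the functions.
EXAMPLE_SWC = """\
# toy neuron
1 1 0.0 0.0 0.0 5.0 -1
2 3 3.0 4.0 0.0 1.0 1
3 3 3.0 4.0 12.0 1.0 2
"""

if __name__ == "__main__":
    print(cable_length(parse_swc(EXAMPLE_SWC)))
```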

Spatial and Semantic Standards

The Allen Mouse Common Coordinate Framework (CCF) is the main anatomic data browser and coordinate environment for mouse data within the BICCN. If you are acquiring spatial data in the mouse, it should be registered to the CCF v3. Registered data is available through the BIL data archive. Anatomic terms associated with this CCF are used as annotations of samples and specimens. In use by: BCDC for the BICCN Data Catalog.

Description

List of tools for registering images.  

Templates for registering different data types.

Link to atlas viewer

The Allen Human Atlas - 3D is a parcellation of the non-linear average of the MNI152 using the same anatomical ontology as the section-based Allen Human Reference Atlas. If you are acquiring samples in the human, the approximate location of the sample should be registered to the Allen Human Atlas - 3D. In use by: BCDC for the BICCN Data Catalog.

Ontology for describing electrophysiology stimulation parameters, developed by an INCF cross project working group. This ontology is suitable for use in NWB formats.

Ontology for multi-species anatomical terms. All anatomical metadata should be mapped to UBERON identifiers or a particular atlas. If the latter, the full reference for the atlas should be supplied. A mapping between the Allen Brain Atlas and UBERON has been completed.

Covers major neuroscience techniques.  Based on NIFSTD methods ontology, OBI and EFO.   Use this ontology to map techniques, modalities, assays, devices and tools to a controlled vocabulary. In use by: BCDC for the BICCN Data Catalog, BIL, NeMO

Instructions on working with and adding a term can be found here.

Ontology for representing data-defined cell types (omics) from the BICCN flagship paper. Used by BRAIN Data Standards group for -omics defined cell types. In use by: BICCN Mouse Primary Cortex Mini-Atlas – Cell Types Explorer.

Identifiers and Protocols

ORCIDs are unique identifiers for researchers.  All investigators should provide ORCIDs for contributors to datasets to facilitate cross-dataset linkage and to receive credit for publishing datasets.

Orcid.org website

See documentation at individual archives for adding ORCIDs.
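When collecting ORCIDs in submission spreadsheets, a quick format check can catch transcription errors before data reaches an archive. The sketch below verifies the final check character of an ORCID iD using the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers. The function names are illustrative, and a passing check only shows that the identifier is well formed, not that it is registered to a particular person.

```python
def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid):
    """Return True if the 16-character ORCID iD has a matching check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]


if __name__ == "__main__":
    # 0000-0002-1825-0097 is the fictitious example iD used in ORCID's own documentation.
    print(is_well_formed_orcid("0000-0002-1825-0097"))
```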

Unique identifier for antibodies, digital tools, cell lines, organisms, plasmids and biosamples. If you create a new tool for BICCN, please obtain an RRID.  In use by: BICCN Data Catalog, BICCN.org, journals and publications.

BICCN investigators are required to share experimental protocols. Protocols.io is one option for this that provides a DOI and version history for protocols.  In use by: BICCN members, BICCN Data Catalog

BICCN Working Groups

Working groups comprise members of the BICCN consortium working on advanced protocols and standards for uncovering the cellular make-up of the brain. BICCN Working Groups span the core set of activities that require accurate documentation in protocol, metadata, and feature description. Activities include write up of all protocols across laboratories for comparison and future reference for analysis discrepancies. Another key charge is in the determination of appropriate features and data resolution relevant to cell type and to be reported to BCDC.

BICCN 2.0 Whole Mouse Brain WG.   Chairs: Eran Mukamel, UCSD, Hongkui Zeng, Allen Institute

Charter
  • Data integration for transcriptomic & epigenomic data across entire mouse brain and spinal cord
  • Pipeline processing
  • Collaborative analysis to support the creation of a cell census and consortium publication(s)
  • Support FAIR data standards
  • Create high impact outputs of the scientific work
Deliverables
  • Omics standards
  • Sample level QC
  • Sequence data mapping
  • Cell level standards
  • Cluster level standards
  • Uniform pipelines and BICCN Whole Mouse Brain Workspace
  • Terra workspace

BICCN 2.0 Cell-Type Specific Enhancers.  Chairs: Bosiljka Tasic, Allen Institute, Bing Ren, UCSD

Charter
  • Develop methods to identify cell-type specific enhancers that can drive systemic delivery of reporter genes to select subclasses or types of brain cells in mice and primates.
  • Obtain experimental data to support the validity of these methods.
Deliverables
  • Single-cell multiomic atlases of the primary motor cortex in mice and primates
  • Computational tools for predicting brain cell-type specific enhancers
  • Experimentally validated enhancers active in select cortical neuronal sub-types

BICCN Infrastructure & Metadata WG. Chairs: Carol Thompson, Allen Institute, Mike Hawrylycz, Allen Institute

Charter
  • Data standards
  • Data collection processes and reporting
  • BICCN infrastructure ecosystem improvements
  • Support FAIR data standards
Deliverables

BICCN Morphology WG. Chairs: Giorgio Ascoli, GMU, Hanchuan Peng, SEU-Allen Joint Center

Charter
  • Morphology metadata
  • Data formats
  • Standards
  • CCF mapping
  • Reconstructions
  • Community usage
Deliverables
  • Standardized on NIFTI and SWC data formats
  • Standard metadata file (metainfo.txt)
  • fMOST QC checks via Data Core.
    • Data quality
    • Reconstruction difficulty assessment
  • Reference annotation and reference terminologies
    • CCF 3.0 visualization
    • JSON format for soma locations and projection targets
  • Standardized inventory of BICCN cell morphologies

BICCN 2.0 Molecular Wiring Diagrams WG. Chair: Hongwei Dong

Charter
  • Begin planning on BICCN 2.0 project on Molecularly Annotated Wiring Diagrams of the Mammalian Brain
  • Finalize the mini-atlas joint anatomy paper
Deliverables
  • Mapping of internal metadata to BIL standard metadata
  • QC checks via Data Core.
    • Completeness of files and datasets
  • Registration software input/output metadata standards
    • NRRD
    • Sidecar JSON file
    • VTK for low res, JP2000, TIFF for high res compressed/uncompressed images

BICCN Mini-Atlas Flagship Publication WG. Chairs: Joe Ecker, Salk Institute, Hongkui Zeng, Allen Institute, John Ngai, NIH

Charter
  • Planning, coordinating and writing the mini-atlas flagship paper.
  • Supporting FAIR data standards in the mini-atlas paper and across companion papers.

BICCN Human & NHP WG. Chairs: Ed Lein, Allen Institute, Arnold Kriegstein, UCSF

Charter
  • Coordination on regional sampling plans
  • Methods comparisons
  • Scope out future big human and NHP atlasing
  • Joint comparative analysis
Deliverables

Active

BICCN 2.0 Developing Brain Atlas WG. Chairs: Paola Arlotta, Harvard University, Tomasz Nowakowski, UCSF

Charter
  • Mouse, human, and NHP development
Deliverables

Active

BICCN 2.0 Proteomic Brain Atlas WG. Chairs: Kwanghun Chung, MIT and Elizabeth Hillman, Columbia University

Charter
  • Technology platform development
  • Proteomic data acquisition and integration
  • Collaborative analysis
  • Technology, reagents, protocols, and data dissemination
Deliverables

Toward an Integrated BICCN Data Set (IDS)


Data, Tools, and Knowledge

A central organizing principle of the Brain Cell Data Center (BCDC) is the data, tools, and knowledge structure. This three-tiered approach forms a natural conceptual organization that serves to manage information architecture, utility, and the flow of data from raw numerical observational data, through mapping standards and analysis methods, to summary knowledge and education. A federated architecture based on a public-facing Cell Registry and Portal indexes all data, provides lower resolution data summaries, and has appropriate pointers to raw data sources, allowing for a flexible and more easily maintainable database. At the Data level within the registry, QC/QA data meeting standards of quantification and mapping are populated in the BCDC. Robust and consistent data models across providers, driven by the Data Standards Committee, ensure that data is available for access within the network and ready for effective mapping in the CCF at the Tools level, where it can be processed with various analysis and clustering tools or visualized through helpful applications. At the highest Knowledge level, the Collaborative Analysis Working Group will facilitate analysis between BICCN partners to reconcile and highlight corroborated results, as well as provide input into which results should be presented or referenced.


The BCDC Portal is the main external-facing entry to the BICCN, its members, data modalities, resources and results. The portal is intended to be a living resource that can be frequently updated with news and relevant cell type related content. It provides a description of the BICCN Consortium and its goals, an overview of cell type classification and its value for neuroscience, individual project pages, the BCDC strategy and timeline, and current data availability. Some initial possibilities for content are described below.

Data Levels

A convenient way to track data from primary raw data through information-rich structured data is through data levels. Each data modality produced by the BICCN starts as Level 0 raw data and migrates through higher levels: data QC/QA (Level 1), spatial localization within the Common Coordinate Framework (CCF) (Level 2), computational feature extraction (Level 3), and finally integrated or knowledge-level content (Level 4). The general definition of data levels is described below and will be described for each U01/U19 as these definitions become available.
  • Level 0: (Raw) Raw data including mapped sequence reads and raw non- or minimally-compressed image series. Multivariate data sets are generated where applicable. These data are center specific and not meant to be publicly distributed or distributed to the BCDC.
  • Level 1: (QA/QC) Preprocessed data sets that have passed an initial round of QA/QC or computation. This is data intended for sharing within the BICCN network and storage in R24 archive centers, and it uses the appropriate format and compression level for derived computation and analysis.
  • Level 2: (Linked) These data are mapped to a spatially determined region or stereotaxic site. Ultimately this data is accessible and displayable within the Common Coordinate Framework (CCF).
  • Level 3: (Featured) A derived data set summarizing relevant features and properties for a modality of interest. This level enables shared and novel algorithmic approaches across or within centers. Examples include differential expression, clustering analyses, and cross-modality analysis.
  • Level 4: (Integrated) Informative data sets that meet standards for public deployment and have value for the BICCN network and larger community in advancing our understanding of cell types in the brain. Appropriate results will be released on the BICCN portal.
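
The level definitions above can be captured as an ordered enumeration so that pipeline code can compare levels directly. This is a minimal sketch rather than a BCDC schema: the member and function names are invented, and treating each property as cumulative (a higher level retains the properties of the levels below it) is an assumption of the example.

```python
from enum import IntEnum


class DataLevel(IntEnum):
    """BICCN data levels as listed above; names are illustrative."""
    RAW = 0         # center specific, not distributed
    QA_QC = 1       # shared within the network, stored in R24 archives
    LINKED = 2      # mapped to a spatial region or stereotaxic site
    FEATURED = 3    # derived features for a modality of interest
    INTEGRATED = 4  # meets standards for public deployment


def shareable_within_network(level):
    """Level 1 and above is intended for sharing within the BICCN network."""
    return level >= DataLevel.QA_QC


def displayable_in_ccf(level):
    """Level 2 and above is spatially mapped (assumed cumulative)."""
    return level >= DataLevel.LINKED


def eligible_for_portal(level):
    """Only Level 4 data is described as meeting public deployment standards."""
    return level == DataLevel.INTEGRATED


if __name__ == "__main__":
    for level in DataLevel:
        print(level.name, shareable_within_network(level),
              displayable_in_ccf(level), eligible_for_portal(level))
```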

The BCDC Cell Registry and BICCN Integrated Data Set

As data moves through levels from Level 0 raw data to Level 4 integrated data, it meets increasingly rigorous and systematic standards for mapping and presentation. All experimental cell data meeting Level 1 quality assurance is archived and will be made available through an indexing and search engine, the BCDC Cell Registry. This application, presently under development, is accessible from the portal and will provide a detailed survey of all BICCN data available at the time. As data is spatially resolved in location or ontology, it can be presented within the CCF (see Standards, BICCN Standards for CCF). When common sets of features and standard data summaries are computed in higher levels, the corresponding higher resolution data becomes available in the IDS viewer, which can use Allen Institute tools and products.


The BCDC Cell Registry (presently under development) provides an index of all Level 1 data (or Level 0 for transcriptomic data), with pointers to raw data access and an indication of data modality, provenance and associated metadata. The Cell Registry does not offer any explicit analysis or computational resources, but supports search based on associated metadata. Raw Level 0/1 data of the BICCN is intended to be stored in the R24 archives at the University of Maryland (NeMO) for transcriptomic/epigenomic data, and at the Pittsburgh Supercomputing Center (MBIC). From a practical perspective, the BCDC will prefer to only be aware of data that has been deposited with the R24 archives. As we have argued, a global repository for all high-resolution data across all modalities stored at a single facility, cloud-based or otherwise, is at present not feasible.
The BICCN Integrated Data Set (IDS) is the full realization of Level 3 mapped and feature-rich data that can be integrated with Allen Institute internal data. It comprises extracted quantitative features of each cell, together with access to the raw data, tools to manipulate and analyze the data, and a description of its relationship to the current state of our ontology. Data achieving Level 3 will be both listed in the Cell Registry and viewable in any of the Allen Institute online resources.
All properly quality controlled and archivally deposited Level 1 data will be indexed in the BCDC Cell Registry. To enable cross comparison and study, data must reach Level 3, such that it is anatomically localized and described by a compact feature attribute list. Data achieving this level of structure is potentially comparable with Allen Institute internally generated modalities and forms part of a larger Integrated Data Set (IDS).
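
The indexing and IDS rules described above can be sketched as two predicates and a metadata search over registry records. Everything here is hypothetical: the record fields, the function names, and the exact-match search are assumptions made for illustration, since the Cell Registry is described as under development and its actual data model is not given.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CellRecord:
    """Hypothetical registry entry: a pointer to archived data plus metadata."""
    cell_id: str
    modality: str
    level: int
    archive_uri: str
    metadata: dict = field(default_factory=dict)
    ccf_region: Optional[str] = None
    features: dict = field(default_factory=dict)


def is_indexed(record):
    """Level 1 and above is indexed; transcriptomic data is indexed from Level 0."""
    return record.level >= 1 or (record.modality == "transcriptomic" and record.level == 0)


def in_integrated_data_set(record):
    """IDS data is at Level 3 or above, anatomically localized, and has features."""
    return record.level >= 3 and record.ccf_region is not None and bool(record.features)


def search(records, **criteria):
    """Return indexed records whose metadata matches every given key/value pair."""
    return [r for r in records
            if is_indexed(r)
            and all(r.metadata.get(key) == value for key, value in criteria.items())]
```

The search only inspects metadata and returns pointers, which mirrors the statement that the registry offers no analysis or computational resources of its own.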

For more information about the BICCN Integrated data set and upcoming plans please contact info@biccn.org