Other Tools

Cumulus is a cloud-based framework for analyzing single-cell genomics and spatial transcriptomics data. It scales to massive datasets, is cost-effective, and handles a variety of data types. It was developed natively for the cloud, builds on the Pegasus library and other open-source software (e.g. cellranger/starsolo), and is publicly available on GitHub as well as pre-installed in Terra (the BCDC cloud computing environment). It supports analysis of single-cell RNA-seq, CITE-seq, Perturb-seq, single-cell ATAC-seq, single-cell immune repertoire, and spatial transcriptomics data.

RRID: SCR_021644

Generic Web URL: https://cumulus.readthedocs.io/en/stable

Terra URL: https://app.terra.bio/#workspaces/kco-tech/Cumulus

Contact: cumulus-support@googlegroups.com

Loom is an efficient file format for very large omics datasets, consisting of a main matrix, optional additional layers, a variable number of row and column annotations, and sparse graph objects. Loom files are portable, self-contained and ensure that metadata travels with the data. Under the hood, Loom files are HDF5 and can be opened from many programming languages, including Python, R, C, C++, Java, MATLAB, Mathematica, and Julia. The Loom file format is natively supported by popular scRNA-seq packages including Scanpy, Seurat, SCope and scVI. (LoomPy, RRID:SCR_016666)
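The layout described above can be pictured with a small in-memory stand-in. This is only a sketch of the data model, not the loompy API: the class name `LoomLayout`, its field names, and the toy values are hypothetical, and the "rows are genes, columns are cells" orientation is assumed as the usual convention.

```python
from dataclasses import dataclass, field


@dataclass
class LoomLayout:
    """In-memory stand-in for the Loom layout (hypothetical, NOT the loompy API)."""
    matrix: list                                    # main matrix as a list of rows
    layers: dict = field(default_factory=dict)      # name -> matrix shaped like the main one
    row_attrs: dict = field(default_factory=dict)   # name -> one value per row
    col_attrs: dict = field(default_factory=dict)   # name -> one value per column
    col_graphs: dict = field(default_factory=dict)  # name -> sparse edges (a, b, weight)

    @property
    def shape(self):
        return (len(self.matrix), len(self.matrix[0]) if self.matrix else 0)

    def problems(self):
        """Return a list of layout violations (empty when everything lines up)."""
        n_rows, n_cols = self.shape
        found = []
        for name, layer in self.layers.items():
            # Compares the row count and the length of the first row only.
            if (len(layer), len(layer[0]) if layer else 0) != (n_rows, n_cols):
                found.append(f"layer {name!r} does not match the main matrix shape")
        for name, values in self.row_attrs.items():
            if len(values) != n_rows:
                found.append(f"row attribute {name!r} needs {n_rows} values")
        for name, values in self.col_attrs.items():
            if len(values) != n_cols:
                found.append(f"column attribute {name!r} needs {n_cols} values")
        for name, edges in self.col_graphs.items():
            if any(not (0 <= a < n_cols and 0 <= b < n_cols) for a, b, _ in edges):
                found.append(f"column graph {name!r} points outside the columns")
        return found


# Toy dataset (made-up values): 2 genes (rows) x 3 cells (columns).
toy = LoomLayout(
    matrix=[[0, 3, 1], [2, 0, 0]],
    layers={"spliced": [[0, 2, 1], [1, 0, 0]]},
    row_attrs={"Gene": ["Actb", "Gad1"]},
    col_attrs={"CellID": ["c1", "c2", "c3"]},
    col_graphs={"KNN": [(0, 1, 0.8), (1, 2, 0.5)]},
)
print(toy.shape, toy.problems())
```

Because every annotation is tied to a matrix axis, the metadata can be checked against the data it describes, which is what lets it travel with the data in a single file.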

Access Loom here:
https://loompy.org

S. Linnarsson Lab (http://linnarssonlab.org/), Karolinska Institute

The Blue Brain Cell Atlas is a comprehensive online resource that describes the number, types, and positions of cells in all areas of the mouse brain. The atlas provides the densities and positions of all excitatory, inhibitory and neuromodulatory neurons, as well as astrocytes, oligodendrocytes and microglia in each of the brain regions defined in the Allen Mouse Brain Atlas. Users can download cell numbers for statistical analysis, cell positions and types for modeling and visualizing brain areas. The underlying workflow uses imaging data from the Allen Institute Common Coordinate Framework to generate cell positions and assign their type using the API for data access. (Blue Brain Project, RRID:SCR_002994)

Access the Blue Brain Cell Atlas here:
https://bbp.epfl.ch/nexus/cell-atlas/

Blue Brain Project (https://bluebrain.epfl.ch/), EPFL

ModelDB, Biophysical Models - The advanced cognitive capabilities of the human brain are often attributed to our recently evolved neocortex. However, it is not known whether the basic building blocks of human neocortex, the pyramidal neurons, possess unique biophysical properties that might affect cortical computations. The Segev group has shown that layer 2/3 pyramidal neurons from human temporal cortex (HL2/3 PCs) have a specific membrane capacitance (Cm) of ~0.5 µF/cm², half of the commonly accepted “universal” value (~1 µF/cm²) for biological membranes. This finding was predicted by fitting in vitro voltage transients to theoretical transients and then validated by direct measurement of Cm in nucleated patch experiments. This is the first demonstration that human cortical neurons have distinctive membrane properties, suggesting important implications for signal processing in human neocortex. The group has also developed detailed models of pyramidal cells from human neocortex, including models of their excitatory synapses, dendritic spines, and dendritic NMDA and somatic/axonal Na+ spikes, which provided new insights into the signal processing and computational capabilities of these principal cells. (ModelDB, RRID:SCR_007271)

I. Segev Lab (https://elsc.huji.ac.il/faculty-staff/idan-segev), Hebrew University

MetaNeighbor quantifies the degree to which cell types replicate across datasets and enables rapid identification of clusters with high similarity. The method was first applied to measure the replicability of neuronal identity, comparing results across eight technically and biologically diverse datasets to define best practices for more complex assessments. By taking the correlations between all pairs of cells, a network is built in which every node is a cell and each edge represents how similar two cells are. This network can be extended to include data from multiple experiments (multiple datasets). To assess cell-type identity across experiments, neighbor voting is used in cross-validation, systematically hiding the labels from one dataset at a time for testing. Cells within the test set are scored for similarity to the cell types in the other (training) datasets using a neighbor-voting formalism. Performance is determined by whether these scores rank cells of the correct type highest within the test dataset, and is expressed as the AUROC. Comparative assessment of cells occurs only within a dataset but is based only on training information from outside that dataset. (MetaNeighbor, RRID:SCR_016727)
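The neighbor-voting procedure described above can be sketched in a few lines. This is a simplified illustration, not the MetaNeighbor implementation: it uses Pearson correlation rescaled to [0, 1] as the edge weight and normalizes votes by the training-cell weights only. All names and the toy expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two expression vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def auroc(scores, is_positive):
    """Probability that a positive outranks a negative (ties count as 0.5)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def neighbor_vote_auroc(cells, test_dataset, cell_type):
    """Hide the labels of one dataset, score its cells by votes from the others."""
    train = [c for c in cells if c["dataset"] != test_dataset]
    test = [c for c in cells if c["dataset"] == test_dataset]
    scores = []
    for t in test:
        # Edge weights to every training cell (assumption: r rescaled to [0, 1]).
        weights = [(pearson(t["expr"], c["expr"]) + 1) / 2 for c in train]
        votes = sum(w for w, c in zip(weights, train) if c["type"] == cell_type)
        scores.append(votes / sum(weights))
    # True labels of the test set are used only to evaluate the ranking.
    truth = [t["type"] == cell_type for t in test]
    return auroc(scores, truth)


# Toy data (made up): two datasets, two cell types, four genes.
cells = [
    {"dataset": "d1", "type": "A", "expr": [9, 8, 1, 0]},
    {"dataset": "d1", "type": "A", "expr": [8, 9, 0, 1]},
    {"dataset": "d1", "type": "B", "expr": [1, 0, 9, 8]},
    {"dataset": "d1", "type": "B", "expr": [0, 1, 8, 9]},
    {"dataset": "d2", "type": "A", "expr": [7, 9, 1, 1]},
    {"dataset": "d2", "type": "A", "expr": [9, 7, 0, 2]},
    {"dataset": "d2", "type": "B", "expr": [1, 1, 7, 9]},
    {"dataset": "d2", "type": "B", "expr": [2, 0, 9, 7]},
]
print(neighbor_vote_auroc(cells, test_dataset="d2", cell_type="A"))
```

An AUROC near 1 means that cells of the queried type in the held-out dataset received the highest votes from matching cells in the other datasets, while a value near 0.5 indicates no replicable signal.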

J. Gillis Lab (http://gillislab.labsites.cshl.edu/), Cold Spring Harbor Laboratory

TreePersVec - Feature quantification of morphology data is an important part of determining cell type identity. TreePersVec is a topology-based analysis tool used to generate 1D persistence feature vectors of neuron trees. TreePersVec first applies a descriptor function that maps nodes in the neuron tree to positive real values. The default descriptor function is the geodesic distance from a tree node to the root. With function values assigned, TreePersVec then calculates the corresponding persistence diagram and writes it to a file as a set of 2D points. The coordinates of each point represent the birth and death times of a feature, and the difference between a feature’s birth and death times indicates the importance of the feature. Finally, TreePersVec converts the persistence diagram summary into a 1D persistence feature vector and outputs persistence vector files. The function for calculating descriptor function values is written in Java; the other functions are written in C++. Source code and instructions are available on GitHub.
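The first two steps above (descriptor values, then birth/death pairs) can be illustrated with a minimal sketch. It is not the TreePersVec code: it assumes one common construction in which each leaf gives birth to a feature at its descriptor value and, at every branch point, all but the farthest-reaching branch die. The node ids, edge lengths, and function names are hypothetical, and edge lengths are assumed to be strictly positive.

```python
from collections import defaultdict


def geodesic_to_root(parent, edge_len):
    """Descriptor function: path length from each node to the root."""
    f = {}
    for node in parent:
        path, n = [], node
        while n not in f and parent[n] is not None:
            path.append(n)
            n = parent[n]
        if n not in f:          # reached the root for the first time
            f[n] = 0.0
        dist = f[n]
        for m in reversed(path):
            dist += edge_len[m]
            f[m] = dist
    return f


def persistence_diagram(parent, edge_len):
    """Return sorted (birth, death) pairs for a rooted tree."""
    f = geodesic_to_root(parent, edge_len)
    children = defaultdict(list)
    for n, p in parent.items():
        if p is not None:
            children[p].append(n)
    best = {}    # node -> largest leaf value found in its subtree
    pairs = []
    # Descending f visits children before parents (needs positive edge lengths).
    for n in sorted(parent, key=lambda x: f[x], reverse=True):
        kids = children.get(n, [])
        if not kids:
            best[n] = f[n]                      # a leaf starts a feature
            continue
        births = sorted((best[k] for k in kids), reverse=True)
        best[n] = births[0]                     # farthest-reaching branch survives
        pairs.extend((b, f[n]) for b in births[1:])
        if parent[n] is None:
            pairs.append((births[0], f[n]))     # the last survivor ends at the root
    return sorted(pairs)


# Toy neuron tree (made up): node -> parent, and the length of the edge to the parent.
parent = {0: None, 1: 0, 2: 1, 3: 1, 4: 0}
edge_len = {1: 2.0, 2: 3.0, 3: 1.0, 4: 1.0}
diagram = persistence_diagram(parent, edge_len)
for birth, death in diagram:
    print(birth, death, "importance:", abs(birth - death))
```

In this convention a long branch produces a point far from the diagonal, so the absolute difference between birth and death plays the role of the feature importance mentioned above. Turning the diagram into a 1D vector is a separate step that this sketch does not attempt.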

Y. Wang Lab (https://web.cse.ohio-state.edu/~wang.1016/), Ohio State University

P. Mitra (https://www.cshl.edu/research/faculty-staff/partha-mitra/), Cold Spring Harbor Lab

L-Measure is a software tool designed to extract a wide variety of quantitative morphological measurements from neuromorphological reconstructions. Both local parameters (e.g. bifurcation angles) and global descriptors (e.g. total arbor length) can be extracted and combined in many useful analyses, including the popular distributions of surface area as a function of path distance from the soma. Users can specify the target of the analysis by structural domains (e.g. axons vs dendrites) or by morphological features (e.g. terminal branches). The tool has built-in capability to search for neurons with specific morphological characteristics from a large collection or to compare two neuronal populations with parametric and non-parametric statistical tests. The user-friendly graphical user interface is written in Java and can run remotely through a web browser or locally on Linux, Windows, or Mac. The number-crunching engine is written in C++ and can be called from batch scripts for faster execution of large-scale computations. (L-Measure, RRID:SCR_003487)

Access L-Measure here:
http://cng.gmu.edu:8080/Lm/

Related Nature Protocols publication:
https://www.nature.com/articles/nprot.2008.51

G. Ascoli Lab (http://krasnow1.gmu.edu/cn3/ascoli/), George Mason University

BICCN Pipelines


Overview

The Broad Institute’s Data Sciences Platform (DSP) develops production-level data processing pipelines in collaboration with multiple consortia including BICCN. Thank you to everyone who has worked with us to create and improve these pipelines. For more details please see our BICCN collaborators and how to cite the pipelines.
Pipelines are available for multiple data types hosted in the BICCN network, including single-cell and single-nucleus transcriptomics, methylomics, and ATAC-seq data.

| Pipeline | WARP WDL Code | Input Data | Overview | Terra Workspace | Publication (if applicable) |
| --- | --- | --- | --- | --- | --- |
| MethylC-Seq (CEMBA; RRID:SCR_021219) | CEMBA | Multiplexed single-nucleus bisulfite sequencing data | CEMBA Overview | CEMBA | Luo et al. 2017 |
| Optimus (RRID:SCR_018908) | Optimus | 10x Genomics V2 and V3 3' single-cell and single-nucleus data | Optimus Overview | Optimus | |
| Single-Cell ATAC (scATAC; RRID:SCR_018919) | scATAC | Single-cell ATAC-seq data from nuclear isolates | scATAC Overview | scATAC | Fang et al. (2021) |
| Smart-seq2 Single Nucleus Multi-Sample (RRID:SCR_021312) | Smart-seq2 Single Nucleus Multi-Sample | Single-cell data generated with Smart-seq2 assays | Smart-seq2 Single Nucleus Multi-Sample Overview | Smart-seq2 Single Nucleus Multi-Sample | |

Pipeline Standards, Maintenance, and Availability

Pipelines are cloud-optimized and developed to ensure portability as well as data reproducibility and interoperability. To this aim, the pipelines are:
  • Open-access and developed with GA4GH standards.
  • Written in the Workflow Description Language (WDL), a community-maintained, human-readable workflow language that can run on Cromwell, a portable execution engine that can be launched anywhere, locally or in the cloud.
  • Containerized using public Docker instances, allowing researchers to exactly reproduce the workflow software.
The pipeline code is available from multiple sources, including GitHub, Dockstore, and the BCDC cloud computing environment (Terra).
  • Code is developed and maintained in the WDL Analysis Research Pipelines (WARP) repository in GitHub. Overviews and workflow code for BICCN-related pipelines are linked in the table above; additionally, relevant pipelines can be identified by typing the keyword “BICCN” in the WARP Documentation search bar. The WARP Overview details navigating the repository, pipeline development, and running the workflows.
  • Workflows are available for export from Dockstore, a GA4GH-compliant platform for sharing Docker-based tools. Search “warp” on Dockstore to find all WARP pipelines, including those used in the BICCN.
  • Workflows are also available to test on Terra, the cloud-based bioinformatics platform used for BCDC data processing. To get started, register for Terra using the registration guide. To try a pipeline, navigate to the pipeline’s workspace linked in the table above or search for the “BICCN” tag in the workspaces tag search bar. Each workspace contains downsampled data, detailed instructions for using the workflows, and cost guidelines. Learn more about Terra with the Getting Started guides.
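As a rough illustration of running a WDL workflow locally, the sketch below assembles the command line for Cromwell's `run` mode together with an inputs JSON. The file names `MyWorkflow.wdl`, `inputs.json`, and `cromwell.jar` and the input key `MyWorkflow.input_file` are placeholders, not names taken from any WARP pipeline; the real input names are defined in each pipeline's WDL.

```python
import json
import shlex


def cromwell_run_command(wdl_path, inputs_path, cromwell_jar="cromwell.jar"):
    """Build the argument list for a one-off local Cromwell run."""
    return ["java", "-jar", cromwell_jar, "run", wdl_path, "--inputs", inputs_path]


# Placeholder inputs: keys follow the "<workflow name>.<input name>" pattern.
inputs = {"MyWorkflow.input_file": "example.fastq.gz"}
inputs_json = json.dumps(inputs, indent=2)

command = cromwell_run_command("MyWorkflow.wdl", "inputs.json")
print(inputs_json)
print(shlex.join(command))
```

The printed command can be pasted into a shell once the inputs JSON has been saved next to the WDL file. The same WDL and inputs can also be submitted through Terra or exported from Dockstore, as described above.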

Citing the Pipelines

Each BICCN pipeline (see table above) has a SciCrunch resource identifier (RRID) that can be cited in publications. Follow the SciCrunch citation guidelines.
Example: (Optimus Pipeline, RRID:SCR_018908)
Additionally, please refer to the table above to cite any publications associated with the pipeline.

Additional Single-cell Transcriptomic Pipeline Resources

These pipelines produce outputs and quality control metrics that can be further analyzed and visualized with downstream community resources. Tutorials for combining single-cell transcriptomic data and pipelines with common community tools are available in the following resources:

BICCN Omics Workshop Workspace

This tutorial Terra workspace is a step-by-step guide to analyzing BICCN 10x Genomics single-cell data. Using this workspace, researchers learn how to:
  • Import an example 10x dataset (FASTQs) from NeMO
  • Align example 10x FASTQs and produce a raw count matrix with quality metrics using the Optimus workflow
  • Filter, normalize, and cluster the raw count matrix with the Cumulus workflow
  • Explore single-cell data in a Seurat Jupyter Notebook

BICCN Omics Workshop Webinar Recording

This BRAIN Initiative Cell Census Network (BICCN) virtual workshop guides you through finding data in the Neuroscience Multi-Omic (NeMO) Archive, analyzing that data in Terra through workflows (pipelines) and interactive analysis, then publishing the results to a study in the Single Cell Portal (SCP).

BICCN Omics Workshop Blog

This blog post gives a high-level overview of the BICCN Omics Workshop content and provides a link to the webinar demonstration.

Acknowledgments

We thank the following BICCN collaborators and Broad Pipelines Team members for their work on these pipelines:
MethylC-Seq (CEMBA)
Our gratitude to the Joseph Ecker Lab and special thanks to Joseph Ecker, Chongyuan Luo, Eran Mukamel, Hanqing Liu, Benjamin Carlin, Dan Moran, and Jeff Korte.
Single-Cell ATAC (scATAC)
Our gratitude to the Bing Ren Lab and special thanks to Bing Ren, Rongxin Fang, Yang Li, Sebastian Preissl, Nick Barkas, and Kylee Degatano.
Smart-seq2 Single Nucleus
Our gratitude to the Allen Institute, the Eran Mukamel Lab, and the NeMO team. Special thanks to Eran Mukamel, Fangming Xie, Zizhen Yao, Changkyu Lee, Jeff Goldy, Brian Herb, Cindy van Velthoven, Carrie McCracken, Kishori Konwar, Farzaneh Khajouei, Jessica Way, and Kylee Degatano.
Optimus
Our gratitude to Alex Dobin and the Eran Mukamel Lab. Special thanks to Kishori Konwar, Farzaneh Khajouei, Jessica Way, Ambrose Carr, Jishu Xu, Jose Soto, and Nick Barkas. This pipeline is currently being updated for the BICCN; more acknowledgments will be added as the work progresses.