Unravelling Cellular Complexity: Exploring 3D Genome Structure and DNA Methylation with the snM3C Pipeline

Three-dimensional (3D) genome architecture is an important consideration when analysing gene regulation, particularly in the context of development and disease.

The Single Nucleus Methyl-Seq and Chromatin Capture (snM3C) pipeline is designed for the simultaneous profiling of 3D genome structure and DNA methylation in single cells. The underlying sn-m3C-seq protocol combines single-nucleus chromosome conformation capture (3C) with bisulfite conversion during library preparation [1].
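
The bisulfite step is what lets sequencing reads report methylation: unmethylated cytosines are converted to uracil and read as thymine, while methylated cytosines are protected and still read as cytosine. The Python sketch below illustrates that logic for a single read under simplifying assumptions stated in its comments (forward-strand alignment, no indels). The function names are hypothetical, and this is not the snM3C pipeline's methylation extraction code. It also labels each cytosine by sequence context (CpG, CHG or CHH), since methylation levels are commonly reported per context.

```python
from collections import Counter

# Simplified sketch, not the snM3C pipeline's implementation.
# Assumptions: ref and read are uppercase A/C/G/T strings, the read is aligned
# to the forward strand without indels, and read[i] pairs with ref[start + i].


def cytosine_context(ref: str, pos: int) -> str:
    """Classify the cytosine at ref[pos] as CpG, CHG or CHH (H = A, C or T)."""
    if pos + 1 < len(ref) and ref[pos + 1] == "G":
        return "CpG"
    if pos + 2 >= len(ref):
        return "unknown"  # not enough downstream sequence to decide
    return "CHG" if ref[pos + 2] == "G" else "CHH"


def call_methylation(ref: str, read: str, start: int) -> Counter:
    """Count methylated and unmethylated reference cytosines per context."""
    counts: Counter = Counter()
    for i, base in enumerate(read):
        pos = start + i
        if pos >= len(ref):
            break  # read runs past the end of the reference
        if ref[pos] != "C":
            continue
        context = cytosine_context(ref, pos)
        if context == "unknown":
            continue
        if base == "C":
            # Methylated C is protected from bisulfite conversion and still reads as C.
            counts[(context, "methylated")] += 1
        elif base == "T":
            # Unmethylated C is converted to U and reads as T after amplification.
            counts[(context, "unmethylated")] += 1
    return counts


def methylation_level(counts: Counter, context: str) -> float:
    """Fraction of methylated calls in a context (NaN if there is no coverage)."""
    mc = counts[(context, "methylated")]
    total = mc + counts[(context, "unmethylated")]
    return mc / total if total else float("nan")


if __name__ == "__main__":
    ref = "TTCGACAGCTCA"
    read = "TTCGATAGTTTA"
    counts = call_methylation(ref, read, start=0)
    print(methylation_level(counts, "CpG"), methylation_level(counts, "CHG"))
```

Running the example prints `1.0 0.0`: the CpG cytosine was read as C (methylated) and the CHG cytosine as T (unmethylated). Reads from the reverse strand would need the complementary G-to-A logic, which this sketch omits.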

The snM3C pipeline performs secondary analysis of data generated with the sn-m3C-seq protocol and was developed in collaboration with Joseph Ecker’s laboratory for the Brain Research Through Advancing Innovative Neurotechnologies (BRAIN) Initiative and the Human Cell Atlas Project. The snM3C method was applied to complex human tissue samples (e.g., prefrontal cortex) to determine cell-type-specific DNA methylation and 3D genome profiles in the human brain [1,2]. These studies confirmed an association between cell-type-specific chromatin conformation and DNA methylation in the human brain, suggesting substantial crosstalk between these epigenomic features and highlighting the advantage of this multi-omics approach.

Although the snM3C method has been used on brain tissue in the BRAIN Initiative, it can also be applied to other complex tissues to help define cell-type-specific epigenetic features. The method can help researchers better understand cellular identity and gene regulation during development, disease progression, and beyond. The snM3C pipeline is now available on all Velsera Seven Bridges platforms, enabling researchers to use petabytes of publicly available and private data to understand the connections between epigenomics and disease.

snM3C pipeline on the Seven Bridges Platforms

The snM3C pipeline can be used for comprehensive analysis of cell-type-specific chromatin contact (3C) and cytosine methylation (mC) patterns, building on the strong cell-type-specific relationship between mC and 3D genome structure.

The snM3C pipeline is implemented on the Velsera Seven Bridges platform as a user-friendly application with a detailed description and instructions on how to run it. Users can set the desired parameters and provide different inputs without needing to read the code or upload configuration files for each run. The output files are saved in the working project, where they can be analysed further.

Figure 1. Overview of the snM3C pipeline available on the Velsera Seven Bridges platforms. The pipeline demultiplexes multi-cell reads, maps single-cell reads (including read sorting, trimming, alignment, alignment file processing, chromatin contact calling, and methylation extraction), and generates a summary report.
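
The chromatin contact calling step can be illustrated with a minimal Python sketch that sorts contacts into categories for a per-cell summary. It assumes contacts have already been extracted from chimeric alignments as pairs of genomic positions. The `Contact` class, the category names and the 2,500 bp cutoff are hypothetical choices for illustration, not documented snM3C parameters. Short-range cis pairs are often reported separately because pairs of nearby fragments are more likely to reflect undigested or self-ligated DNA than genuine long-range interactions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

# Hypothetical threshold for this sketch; not a documented snM3C default.
MIN_CIS_DISTANCE = 2500


@dataclass(frozen=True)
class Contact:
    """One chromatin contact: two genomic anchors joined by a ligation junction."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def classify_contact(contact: Contact, min_cis_distance: int = MIN_CIS_DISTANCE) -> str:
    """Label a contact as trans, long-range cis or short-range cis."""
    if contact.chrom1 != contact.chrom2:
        return "trans"
    if abs(contact.pos2 - contact.pos1) >= min_cis_distance:
        return "cis_long"
    return "cis_short"


def summarize_contacts(
    contacts: Iterable[Contact], min_cis_distance: int = MIN_CIS_DISTANCE
) -> Counter:
    """Count contacts per category, e.g. for a per-cell summary table."""
    return Counter(classify_contact(c, min_cis_distance) for c in contacts)


if __name__ == "__main__":
    cell_contacts = [
        Contact("chr1", 10_000, "chr1", 10_800),     # 800 bp apart: short-range cis
        Contact("chr1", 10_000, "chr1", 250_000),    # long-range cis
        Contact("chr1", 10_000, "chr7", 5_000_000),  # different chromosomes: trans
    ]
    print(summarize_contacts(cell_contacts))
```

Making the cutoff a parameter rather than a constant keeps the sketch honest about the fact that the appropriate value depends on the protocol and analysis goals.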

This workflow is part of the CEMBA project, developed by the Ecker lab (the original code is available on the CEMBA GitHub). Recently, the pipeline code was added to the WARP repository, which provides robust, standardized data analysis pipelines for large consortia such as the Human Cell Atlas and the BRAIN Initiative.

WARP pipelines are written in the Workflow Description Language (WDL), while Velsera uses the Common Workflow Language (CWL). CWL offers high portability and reproducibility, allowing workflows to run on various platforms, including laptops, high-performance computing clusters, and cloud infrastructure. Validation confirmed that the CWL and WDL implementations yield identical results from the same inputs.
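
A simple way to check that two workflow implementations agree is to compare the files they produce for the same inputs. The Python sketch below is an illustration only, not the validation procedure used for snM3C, and the helper names `compare_outputs`, `index_files` and `file_md5` are hypothetical. It reports files that exist in only one output directory and shared files whose MD5 checksums differ.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large outputs need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_files(root: Path) -> Dict[Path, Path]:
    """Map each file's path relative to root to its full path."""
    return {p.relative_to(root): p for p in root.rglob("*") if p.is_file()}


def compare_outputs(dir_a: str, dir_b: str) -> Dict[str, List[Path]]:
    """Report files present in only one directory and shared files whose bytes differ."""
    files_a = index_files(Path(dir_a))
    files_b = index_files(Path(dir_b))
    shared = files_a.keys() & files_b.keys()
    return {
        "only_in_a": sorted(files_a.keys() - files_b.keys()),
        "only_in_b": sorted(files_b.keys() - files_a.keys()),
        "differing": sorted(
            p for p in shared if file_md5(files_a[p]) != file_md5(files_b[p])
        ),
    }
```

Byte-level comparison is strict: gzip headers can store a modification time, and alignment headers often record the command line used, so some files may differ in bytes while their scientific content is identical. In that case, comparing decompressed records or excluding header lines is a more meaningful check.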

The snM3C pipeline was benchmarked for cost on AWS on-demand instances with different inputs and parameter settings; the results are available in the app description. The runtime was about one hour, with a cost of up to $1.50 for 8 primers. Because the execution cost of this workflow depends mostly on the number of primers used for demultiplexing, preparing random primer indices that match the use case and enabling the parameter that removes empty files during demultiplexing will reduce the task’s duration and cost.
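
To illustrate why the number of primer indices and empty outputs matter, here is a simplified demultiplexing sketch in Python. It assumes each read is a `(read_id, sequence)` pair that starts with an exact 8-base random primer index. The function `demultiplex`, its parameters and the index names are hypothetical and do not reflect the pipeline's actual implementation, which may tolerate sequencing errors. Every index gets its own output bin, and `drop_empty` mirrors the idea of discarding bins that received no reads so that later per-cell steps are not run on them.

```python
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, str]  # (read_id, sequence)


def demultiplex(
    reads: Iterable[Read],
    indices: Dict[str, str],
    index_len: int = 8,
    drop_empty: bool = True,
) -> Tuple[Dict[str, List[Read]], List[Read]]:
    """Split pooled reads into per-index bins by exact match of the read's 5' prefix.

    Hypothetical helper for illustration: real demultiplexers usually allow
    mismatches, whereas this sketch requires an exact index match.
    """
    if any(len(seq) != index_len for seq in indices.values()):
        raise ValueError("all indices must have length index_len")
    lookup = {seq: name for name, seq in indices.items()}
    bins: Dict[str, List[Read]] = {name: [] for name in indices}
    unassigned: List[Read] = []
    for read_id, seq in reads:
        name = lookup.get(seq[:index_len])
        if name is None:
            unassigned.append((read_id, seq))
        else:
            # Trim the index so downstream steps see only the insert sequence.
            bins[name].append((read_id, seq[index_len:]))
    if drop_empty:
        # Drop indices with no reads so later per-cell steps are not run on empty input.
        bins = {name: rs for name, rs in bins.items() if rs}
    return bins, unassigned


if __name__ == "__main__":
    primer_indices = {"A1": "ACGTACGT", "A2": "TTTTCCCC", "A3": "GGGGAAAA"}
    pooled = [
        ("r1", "ACGTACGTGATTACA"),
        ("r2", "TTTTCCCCAGGT"),
        ("r3", "CCCCCCCCAAAT"),  # matches no index
    ]
    bins, unassigned = demultiplex(pooled, primer_indices)
    print(sorted(bins), len(unassigned))
```

In this example index `A3` receives no reads, so with `drop_empty=True` it produces no output bin and no downstream work, which is the same reasoning behind supplying only the primer indices a given experiment actually uses.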

Explore the snM3C pipeline

To explore the snM3C pipeline, we recommend starting with the BRAIN datasets describing these epigenetic features, hosted under NCBI BioProject PRJNA971896 [2]. To bring this data into your project, you can use one of our SRA tools for immediate access.

You can also upload your own data to the Velsera Seven Bridges environment and run subsequent analyses. Additional information on how to get started is available in the Seven Bridges QuickStart, the CGC knowledge center, and the BDC and CAVATICA documentation. Please contact us if you have any questions or need support.

For more details on the pipeline, its specific inputs and outputs, and detailed instructions on how to run the workflow, please see its description page in the Velsera Seven Bridges Public Apps Gallery.

Citations

[1] Simultaneous profiling of 3D genome structure and DNA methylation in single human cells

[2] Single Cell DNA Methylation and 3D Genome Architecture in the Human Brain