Making the Most of Multi-Modal Tissue Atlases with Cloud Compute
Why Tissue Atlases?
The omics revolution has fundamentally altered our understanding of cellular heterogeneity, allowing us to classify cells with ever greater accuracy and to understand subtle nuances in cellular states. This is largely underpinned by technological advances in single-cell sequencing, which enable remarkable resolution across multiple omics layers, from the chromatin landscape through to the transcriptome, and allow increasingly detailed dissection of cellular states, types, and roles in both healthy and diseased tissue. However, these techniques rely on the dissociation of cells from their native tissues, thereby missing crucial context (Figure 1).
In complex environments such as solid tumours, the “where” is as important as the “what.” The spatial location of a cell within the tissue is paramount to our understanding of its function and the consequences of dysregulation. This information is best captured by spatial omics techniques such as highly multiplexed tissue imaging and mass cytometry. Tissue atlases comprise data from multiple omics layers, allowing us to investigate the micro and the macro, the molecular and the spatial, to better understand the key drivers of pathology and identify novel opportunities for clinical intervention.
The Human Tumor Atlas Network (HTAN) is dedicated to producing such atlases, combining the above strategies to create three-dimensional atlases of cancer transitions. These atlases track a diverse set of tumour types through time, with the aim of identifying novel predictive biomarkers, therapeutically relevant cell types, and targeting strategies.
This blog will focus on HTAN data and how the Seven Bridges Platform provides the infrastructure, tools, and curated metadata to make such datasets accessible and actionable.
Making Tissue Atlases FAIR
To date, HTAN has generated 14 atlases comprising 1,703 cases and more than 6,000 samples, covering 66 different organs. The raw data (over 27,000 files!) are available on the Seven Bridges Cancer Genomics Cloud (CGC), linked by structured metadata that allow users to easily explore the data via faceted search. The HTAN data consist largely of imaging data, capturing tissue architecture and protein expression (H&E and cyclic immunofluorescence images), and sequencing experiments to investigate the underlying events driving tumour progression (bulk DNA-seq, bulk RNA-seq, scRNA-seq, and scATAC-seq) (Figure 2).
Figure 2 - HTAN Top 10 Assays & Organs by Sample No. (NCI Human Tumor Atlas Network [2])
Crucially, to make use of such diverse and wide-ranging data, structured, uniform, and connected metadata are required to successfully traverse the various omic planes, whilst keeping track of samples and linking cause and effect. As studies grow and metadata ontologies expand, managing such large meta-datasets can become a burden in and of itself. Fortunately, the Seven Bridges Platform is designed with FAIR (Findable, Accessible, Interoperable, Reusable) data in mind and provides intuitive interfaces to explore big data via their metadata [3-4].
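To make the idea of faceted search over connected metadata concrete, here is a minimal sketch in plain Python. It is not how the Seven Bridges Platform or the HTAN Data Portal is implemented: the records, the field names (`case_id`, `organ`, `assay`), and the helper functions are all hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical metadata records; field names and values are illustrative only
# and do not reflect the actual HTAN or CGC metadata schema.
records = [
    {"file": "a.fastq", "case_id": "C1", "organ": "Breast", "assay": "scRNA-seq"},
    {"file": "b.tiff", "case_id": "C1", "organ": "Breast", "assay": "H&E"},
    {"file": "c.fastq", "case_id": "C2", "organ": "Colon", "assay": "scRNA-seq"},
    {"file": "d.bam", "case_id": "C2", "organ": "Colon", "assay": "scATAC-seq"},
]


def facet_filter(records, **facets):
    """Keep only the records whose fields match every requested facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in facets.items())]


def facet_counts(records, field):
    """Count how many records fall under each value of a single facet."""
    return Counter(r[field] for r in records)


def linked_files(records, case_id):
    """Group the files of one case by assay, so modalities can be analysed together."""
    grouped = {}
    for r in facet_filter(records, case_id=case_id):
        grouped.setdefault(r["assay"], []).append(r["file"])
    return grouped


if __name__ == "__main__":
    print(facet_filter(records, organ="Breast", assay="H&E"))
    print(facet_counts(records, "assay"))
    print(linked_files(records, "C2"))
```

The shared `case_id` field is what does the linking here: because every file carries it, an image and a sequencing run from the same case can be found together, which is the property that uniform, connected metadata are meant to guarantee.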
Finding and accessing data are, unfortunately, not the only barriers holding researchers back from making the most of big data. Large, complex datasets spanning multiple techniques, tissues, and tumours require intensive computational analysis and powerful infrastructure. Researchers are faced with the problem of creating complex pipelines to process the raw data, wrangling command line tools to perform peak calling, image segmentation, variant analyses, and differential expression, and then mapping the results to a common framework to ensure interoperability and ease of downstream analysis. This is no mean feat, even for the bioinformatically aware. Add to this the need to maintain data integrity and security whilst keeping analyses version controlled and in line with best practices, and it becomes clear why the Seven Bridges Platform is the tool of choice for multi-modal data.
The Seven Bridges Platform hosts over 900 ready-to-use pipelines, maintained and curated by our expert bioinformatics team. Our workflows offer end-to-end analysis, from raw data to report, with dedicated suites of tools for single-cell and imaging analyses, making popular tools such as Seurat and MCMICRO easy to use through a GUI. Once configured, our cost- and performance-optimised workflows are run in the cloud, offering vast scale and speed and negating the need for dedicated compute. Furthermore, our use of Common Workflow Language (CWL), an open-source, community-driven standard, makes those tools and workflows reusable in any computational environment for full reproducibility.
Computational analysis of imaging data is a rapidly growing field in which AI/ML-based techniques are making big strides. For a rundown of the tools and features enabling such analysis, see our previous blog post: Machine Learning and Image Processing: Tools For Success - Seven Bridges.
But how does one select the images required for training datasets? Or verify the results of algorithmic processing? Sometimes seeing is believing, which is why we have an integrated image viewer available on the Seven Bridges CGC. It allows users to view standardized images in the Digital Imaging and Communications in Medicine (DICOM) format, which is used widely throughout research science, being the preferred data format for HTAN and The Cancer Genome Atlas (TCGA) [5], and powering the Imaging Data Commons [6]. Additionally, DICOM is the predominant data format for clinical imaging (PET, CT, MR, etc.), providing users with the opportunity to incorporate clinical imaging data into their analyses and observations (Figure 3).
Figure 3. OHIF viewer displaying medical (A.) and histological (B.) DICOM images
Using The Seven Bridges CGC for HTAN Atlas Analyses
In December 2023, Velsera hosted the inaugural HTAN Data Jamboree, an event dedicated to enabling researchers (of all computational abilities) to make use of the single-cell and spatial omics data constituting the HTAN tissue atlases.
For this event, researchers used the Seven Bridges Cancer Genomics Cloud as a learning and exploration tool to access connected data from the HTAN Data Portal and across the Cancer Research Data Commons (CRDC). The CGC provided the perfect playground for scientists from all backgrounds, allowing them to quickly process data from multiple modalities using Seven Bridges workflows to create actionable data that could be readily explored via interactive analyses in JupyterLab and RStudio, running on dedicated Data Studio instances. HTAN Data Jamboree participants formed teams of mixed disciplines and focused on cutting-edge projects such as integrating multi-modal data to power new visualizations, AI/ML algorithm development, and incorporating other public datasets for analysis alongside HTAN atlases.
“I loved learning about HTAN, CGC, and the exposure to new/other datasets and tools” – HTAN data jamboree attendee.
Participants rated the CGC highly as a collaborative tool, enabling remote teamwork and making accessing and analysing multimodal data straightforward, whilst providing the tools and resources to perform bioinformatic analyses in addition to more complex tasks, such as using Large Language Models for cell type annotations!
“I was new to both scRNA-seq data and Visium so I got to be hands on with scRNA-seq data and learn about both.”
By hosting HTAN tissue atlases and providing the context, tools, and infrastructure to work with them, all through a no-code interface, we hope to better equip researchers to make the most of this valuable resource, so that it may be used to further enrich our understanding of cancer biology.
Want to get your hands on the data?
Our platforms use the latest global interoperability standards, allowing data on the CGC to be easily accessed on the Seven Bridges Platform.
Learn more about the Seven Bridges Platform here: The Seven Bridges Platform - Seven Bridges
All HTAN images available on the CGC are open access, and accounts can be created free of charge, with $300 of Pilot Credits available to new users for cloud costs.
Sign up to the CGC here: Cancer Genomics Cloud
Bibliography
[1] Heumos, L., Schaar, A. C., Lance, C., Litinetskaya, A., Drost, F., Zappia, L., … Single-cell Best Practices Consortium. (2023). Best practices for single-cell analysis across modalities. Nature Reviews Genetics, 24(8), 550–572. https://doi.org/10.1038/s41576-023-00586-w
[2] Human Tumor Atlas Network. HTAN Data Portal. (2024-02-09). https://data.humantumoratlas.org/
[3] Hudson, A., Fournier, M., Coulombe, J., Daee, D. (2023). Using existing pediatric cancer data from the Gabriella Miller Kids First Data Resource Program. JNCI Cancer Spectrum, Volume 7, Issue 6. https://doi.org/10.1093/jncics/pkad079
[4] Malhotra, R., Seth, I., Lehnert, E., Zhao, J., Kaushik, G., Williams, E. H., Sethi, A., & Davis-Dusenbery, B. N. (2017). Using the Seven Bridges Cancer Genomics Cloud to access and analyze petabytes of cancer data. Current Protocols in Bioinformatics, 60, 11.16.1–11.16.32. https://doi.org/10.1002/cpbi.39
[5] Gorman, C., Punzo, D., Octaviano, I. et al. (2023). Interoperable slide microscopy viewer and annotation tool for imaging data science and computational pathology. Nat Commun 14, 1572. https://doi.org/10.1038/s41467-023-37224-2
[6] Fedorov, A., Longabaugh, W., Pot, D., Clunie, D., Pieper, S., Lewis, R., … Kikinis, R. (2021). NCI Imaging Data Commons. International Journal of Radiation Oncology, Biology, Physics, 111(3), e101. https://doi.org/10.1016/j.ijrobp.2021.07.495