Large Language Models (LLMs) are increasingly used across Precision Medicine R&D: processing clinical documentation, analyzing scientific literature, identifying research tools, and interpreting genomic data. These Artificial Intelligence (AI) systems are valuable for managing large amounts of unstructured medical text and data, reducing the need for manual review. LLMs can also search a very high-dimensional space and surface connections that have not yet been explored. They will be just as valuable in SaaS and cloud platforms, providing real-time insights, enhancing user experiences, and supporting bioinformatics coding tasks. With cloud infrastructure, organizations can scale AI-driven processes to meet the growing demand for precise, context-aware analytics in healthcare and biomedical research. Just as Velsera pioneered bioinformatics in the cloud, we can “bring GenAI to the scientific data.”
Velsera’s Seven Bridges platform brings together the best of both worlds: cutting-edge AI capabilities and a secure, user-friendly research environment. For professionals who may not be experts in AI or cloud computing, the platform abstracts away the complexity. It provides ready-to-use GPU power, secure data management, and integrations with key datasets, so researchers can focus on science rather than IT. Even teams with limited AI experience can start training and fine-tuning models and applying generative AI to their projects, confidently and safely. Many other options for using GenAI in the cloud exist today; they are extremely performant, include foundation models that cost millions of dollars to develop, and have excellent UIs. What Velsera uniquely provides is a secure, compliant environment where your multi-omic and clinical data, along with about 50 PB of research datasets, are already available. These data can be used to fine-tune foundation models and ask scientific questions, with assurance that no sensitive or controlled data is unintentionally egressed to third-party model developers for training future models.
Advantages of the Seven Bridges Platform
The Seven Bridges platform combines robust file and data security, GPU-enabled compute nodes for high-performance computing and LLM inference, real-time interaction through web-based apps, and easy-to-use pipelines for streamlined execution. Together, these have the power to accelerate precision medicine. Here’s how Velsera’s platform can do this:
- Secure, Cloud-Based GPU Computing: The scalable, cloud-based environment with access to GPU hardware for heavy AI computations (Seven Bridges Bioinformatics Platform | Velsera) enables scientists to train fit-for-purpose models and fine-tune larger or more complex models (which often require powerful GPUs) without investing in on-premises infrastructure.
- Network Isolation for Data Security: Users can disable external network access during model training and analysis tasks, which ensures no sensitive data can be transmitted outside the environment. It creates an “air-gapped” analysis setup, where high-value research data stays within the platform. This level of control is crucial for meeting compliance standards, respecting patient consent, and safeguarding proprietary or patient-related data.
- Secure Access to Numerous Relevant Datasets: The Seven Bridges platform enables researchers to work with access-controlled datasets in place, without needing to download them. Seven Bridges is integrated with NIH data authorization systems (including dbGaP) for seamless access to population-scale databases (Seven Bridges Bioinformatics Platform | Velsera).
- User-Friendly Interface and Ready-to-Use Pipelines: Researchers who are not experts in bioinformatics or programming can easily run workflows and obtain insights from their data using the hundreds of cloud-optimized tools and reference files available on the platform.
The data remains in a secure environment, and the platform’s compliance certifications (HIPAA, etc.) mean it meets government and industry standards for data security. The environment allows external network access to be disabled, enabling researchers to securely handle sensitive patient and research subject data without any worry about data exfiltration to outside entities.
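As an illustration of the network-isolation idea, a task could run a small pre-flight check before touching sensitive data and refuse to continue if any outside endpoint is reachable. The sketch below is a minimal example using only the Python standard library; the function names and probe targets are hypothetical and not part of the Seven Bridges platform. A failed probe is evidence of isolation rather than proof, so the platform's own network setting remains the actual control.

```python
import socket
from typing import Callable, List, Sequence, Tuple

Endpoint = Tuple[str, int]

# Hypothetical probe targets chosen for illustration only.
DEFAULT_TARGETS: Tuple[Endpoint, ...] = (("example.com", 443), ("1.1.1.1", 53))


def tcp_probe(endpoint: Endpoint, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, unreachable networks, and timeouts.
        return False


def reachable_endpoints(
    targets: Sequence[Endpoint],
    probe: Callable[[Endpoint], bool] = tcp_probe,
) -> List[Endpoint]:
    """Return the subset of targets that the probe could reach."""
    return [target for target in targets if probe(target)]


def assert_isolated(
    targets: Sequence[Endpoint] = DEFAULT_TARGETS,
    probe: Callable[[Endpoint], bool] = tcp_probe,
) -> None:
    """Raise RuntimeError if any external endpoint is reachable."""
    leaks = reachable_endpoints(targets, probe)
    if leaks:
        raise RuntimeError(f"External network is reachable: {leaks}")
```

Calling `assert_isolated()` at the start of a training or inference script would stop the run early if external access had been left enabled by mistake. The `probe` argument is injectable so the check can be exercised without real network traffic.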
How to securely implement LLMs on Velsera’s Seven Bridges Platform
Open-source LLMs such as Llama and DeepSeek continue to advance in capability while reducing training costs and requiring fewer resources for inference (the process of querying or processing input via a chatbot or LLM).
The Velsera team has developed a proof-of-concept implementation focused on remaining inside the security boundaries of the platform, as a first step toward bringing AI to the data and keeping it there. Future work will focus on rapidly improving performance and the user interface. We used two Common Workflow Language (CWL) applications, the Model Downloader and the LLM Runner, that broaden AI accessibility for biomedical researchers.
By emphasizing reproducibility, security, scalability, and operational efficiency, these tools significantly enhance how AI is integrated into biomedical workflows.
Model Downloader
The Model Downloader addresses the challenges of managing large language models by downloading them once and storing them in Velsera’s secure environment. This approach eliminates redundant downloads, ensures no data is shared with external providers, and accelerates research by providing faster model initialization and a standardized model instance. In testing, downloading the 70-billion-parameter version of Llama 3.1 took slightly over an hour and cost $0.98. In the cloud environment, this operation only needs to be done once; the model can then be copied to other projects as needed. Seven Bridges’ cloud storage is a further advantage, as it is well equipped to handle the size of these large foundation models, roughly 300 GB or more (3x the size of a typical whole-genome BAM file).
Figure 1: Downloading Llama 3.1, with task time and cost.
Figure 2: Model directory, broken up into several files, linked back to the downloading task.
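One way to keep a "standardized model instance" verifiable as it is copied between projects is to record a manifest of the model directory (relative file paths, sizes, and SHA-256 checksums) and compare it against each copy. The following is a minimal sketch under that assumption, using only the Python standard library; the function names are hypothetical, and this is not how the Model Downloader itself is implemented.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte shards never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(model_dir: Path) -> Dict[str, Dict[str, object]]:
    """Map each file's path (relative to model_dir) to its size and checksum."""
    manifest: Dict[str, Dict[str, object]] = {}
    files = sorted(p for p in model_dir.rglob("*") if p.is_file())
    for path in files:
        relative = path.relative_to(model_dir).as_posix()
        manifest[relative] = {
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        }
    return manifest


def same_model(original: Path, copy: Path) -> bool:
    """Return True if both directories hold identical files at identical paths."""
    return build_manifest(original) == build_manifest(copy)
```

Hashing a model of several hundred gigabytes takes time, so in practice a check like this would likely run once after download and once after each copy, not before every inference task.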
LLM Runner
The LLM Runner enables researchers to pose complex queries to the downloaded 70B-parameter Llama 3.1 model. By leveraging GPU-enabled workflows, the application supports near-real-time inference, meeting modern AI’s intensive computational demands. Researchers can focus on deriving insights without the burden of infrastructure management. Its seamless integration with CWL workflows ensures easy adoption into existing pipelines, enhancing accessibility and operational fluency.
Figure 3: The Velsera workflow canvas shows the minimum required inputs to run the LLM Runner. Additional workflows, separate models, training datasets, and other inputs can be added.
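To make the idea of "minimum required inputs" concrete, the sketch below builds a CWL-style job inputs object in Python and writes it as JSON. CWL describes directory inputs as objects with `"class": "Directory"`, but the input names used here (`model_dir`, `prompt`, `max_new_tokens`) are hypothetical; the actual input IDs of the LLM Runner are not given in this article.

```python
import json
from pathlib import Path


def build_job_inputs(model_dir: str, prompt: str, max_new_tokens: int = 256) -> dict:
    """Assemble a CWL-style inputs object (input names are illustrative assumptions)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be positive")
    return {
        # CWL represents a directory input as an object with class "Directory".
        "model_dir": {"class": "Directory", "path": model_dir},
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
    }


def write_job_file(inputs: dict, destination: Path) -> Path:
    """Serialize the inputs object to a JSON job file and return its path."""
    destination.write_text(json.dumps(inputs, indent=2), encoding="utf-8")
    return destination
```

A job file like this could be handed to a CWL runner alongside the tool description. On the platform, the same values would more likely be set through the task page or workflow canvas.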
Seven Bridges’ user-friendly interface can be used to implement real-world biomedical research examples such as BioNeuralNet, a practical example of using AI to integrate multiple types of “omics” data (like genomics, proteomics, and metabolomics) and identify hidden relationships between genes, proteins, and clinical data, improving disease prediction and our understanding of biological systems. Beyond omics data, fine-tuned generative models are driving many innovations in biomedical research. For example, AI can be used to design new drug molecules by generating candidate chemical structures, or to predict protein structures and interactions (a task made famous by AI systems like AlphaFold). Generative models also create synthetic patient data (such as realistic genomic data or medical records) to help train other models when real data is limited. Each of these applications benefits from fine-tuning foundation models on domain-specific data, and all can be pursued on a secure platform like Velsera’s Seven Bridges.
The Seven Bridges platform is well suited to this work because it keeps the process secure (protecting patient data and IP), efficient, and close to valuable data sources. In summary, the platform enables biotech researchers to harness generative AI effectively, speeding up the pace of innovation in a way that is secure, compliant, and tailored to the unique challenges of biomedical data. To learn more, contact us at support@velsera.com, and stay tuned for future developments!