DeepVariant is a deep learning-based variant caller that takes aligned reads (in BAM or CRAM format), produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports the results in a standard VCF or gVCF file.
DeepVariant supports:
We recommend using our Docker solution. The command will look like this:
# Set --model_type to exactly one of: WGS, WES, PACBIO, HYBRID_PACBIO_ILLUMINA.
# --num_shards=$(nproc) uses all your cores to run make_examples. Feel free to change it.
BIN_VERSION="1.0.0"
docker run \
  -v "YOUR_INPUT_DIR":"/input" \
  -v "YOUR_OUTPUT_DIR:/output" \
  google/deepvariant:"${BIN_VERSION}" \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WGS \
  --ref=/input/YOUR_REF \
  --reads=/input/YOUR_BAM \
  --output_vcf=/output/YOUR_OUTPUT_VCF \
  --output_gvcf=/output/YOUR_OUTPUT_GVCF \
  --num_shards=$(nproc)
To see all flags you can use, run: docker run google/deepvariant:"${BIN_VERSION}" --help
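Because `--model_type` accepts exactly one of the four values listed in the command above, a wrapper script may want to reject anything else before starting a long run. The following is a minimal sketch of such a check; `is_valid_model_type` is a hypothetical helper name and is not part of DeepVariant.

```shell
# Hypothetical helper (not shipped with DeepVariant): succeeds only when
# the argument is one of the model types listed in the command above.
# The comparison is case-sensitive, so "wgs" is rejected.
is_valid_model_type() {
  case "$1" in
    WGS|WES|PACBIO|HYBRID_PACBIO_ILLUMINA) return 0 ;;
    *) return 1 ;;
  esac
}

# Example use in a wrapper script (MODEL_TYPE is an assumed variable name):
#   is_valid_model_type "${MODEL_TYPE}" || { echo "bad model type" >&2; exit 1; }
```

Validating the value up front avoids discovering a typo only after the container has started.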
If you're using GPUs, or want to use Singularity instead, see Quick Start for more details or see all the setup options available including solutions on external platforms.
For more information, also see:
If you're using DeepVariant in your work, please cite:
A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology 36, 983–987 (2018).
Ryan Poplin, Pi-Chuan Chang, David Alexander, Scott Schwartz, Thomas Colthurst, Alexander Ku, Dan Newburger, Jojo Dijamco, Nam Nguyen, Pegah T. Afshar, Sam S. Gross, Lizzie Dorfman, Cory Y. McLean, and Mark A. DePristo.
doi: https://doi.org/10.1038/nbt.4235
Additionally, if you are generating multi-sample calls using our DeepVariant and GLnexus Best Practices, please cite:
Accurate, scalable cohort variant calls using DeepVariant and GLnexus. bioRxiv 10.1101/2020.02.10.942086v1 (2020).
Taedong Yun, Helen Li, Pi-Chuan Chang, Michael F. Lin, Andrew Carroll, and Cory Y. McLean.
doi: https://doi.org/10.1101/2020.02.10.942086
(1): Time estimates do not include mapping.
For more information on the pileup images and how to read them, please see the "Looking through DeepVariant's Eyes" blog post.
DeepVariant relies on Nucleus, a library of Python and C++ code for reading and writing data in common genomics file formats (like SAM and VCF) designed for painless integration with the TensorFlow machine learning framework. Nucleus was built with DeepVariant in mind and open-sourced separately so it can be used by anyone in the genomics research community for other projects. See this blog post on Using Nucleus and TensorFlow for DNA Sequencing Error Correction.
Below are the official solutions provided by the Genomics team in Google Health.
Name | Description |
---|---|
Docker | This is the recommended method. |
Build from source | DeepVariant comes with scripts to build it on Ubuntu 14 and 16, with Ubuntu 16 recommended. To build and run on other Unix-based systems, you will need to modify these scripts. |
Prebuilt Binaries | Available at gs://deepvariant/ . These are compiled to use SSE4 and AVX instructions, so you will need a CPU (such as Intel Sandy Bridge) that supports them. You can check the /proc/cpuinfo file on your computer, which lists these features under "flags". |
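The CPU check mentioned for the prebuilt binaries can be scripted. The sketch below assumes a Linux x86 machine, where `/proc/cpuinfo` reports SSE4 as `sse4_1` and/or `sse4_2` and AVX as `avx`; treating either SSE4 flag as sufficient is an assumption, and `has_sse4_and_avx` is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds when the given "flags" line contains an
# SSE4 flag (sse4_1 or sse4_2; accepting either is an assumption) and the
# exact flag "avx" (so "avx2" alone does not count).
has_sse4_and_avx() {
  # Pad with spaces so every flag can be matched as a whole word.
  flags=" $1 "
  case "$flags" in
    *" sse4_1 "*|*" sse4_2 "*) ;;
    *) return 1 ;;
  esac
  case "$flags" in
    *" avx "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on Linux:
#   has_sse4_and_avx "$(grep -m1 '^flags' /proc/cpuinfo)" \
#     && echo "CPU supports SSE4 and AVX" \
#     || echo "CPU lacks SSE4 or AVX"
```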
The following pipelines are not created or maintained by the Genomics team in Google Health. Please contact the relevant teams if you have any questions or concerns.
Name | Description |
---|---|
Running DeepVariant on Google Cloud Platform | Docker-based pipelines optimized for cost and speed. Code can be found here. |
DeepVariant-on-spark from ATGENOMIX | A germline short variant calling pipeline that runs DeepVariant on Apache Spark at scale with support for multi-GPU clusters (e.g. NVIDIA DGX-1). |
NVIDIA Clara Parabricks | An accelerated DeepVariant pipeline with multi-GPU support that runs our WGS pipeline in just 40 minutes, at a cost of $2-$3 per sample. This provides a 7.5x speedup over a 64-core CPU-only machine at lower cost. |
DNAnexus DeepVariant App | Offers parallelized execution with a GUI interface (requires platform account). |
Nextflow Pipeline | Offers parallel processing of multiple BAMs and Docker support. |
DNAstack Pipeline | Cost-optimized DeepVariant pipeline (requires platform account). |
Please open a pull request if you wish to contribute to DeepVariant. Note, we have not set up the infrastructure to merge pull requests externally. If you agree, we will test and submit the changes internally and mention your contributions in our release notes. We apologize for any inconvenience.
If you have any difficulty using DeepVariant, feel free to open an issue. If you have general questions not specific to DeepVariant, we recommend that you post on a community discussion forum such as BioStars.
DeepVariant happily makes use of many open source packages. We would like to specifically call out a few key ones:
We thank all of the developers and contributors to these packages for their work.
This is not an official Google product.