DeepVariant

DeepVariant is a deep learning-based variant caller that takes aligned reads (in BAM or CRAM format), produces pileup image tensors from them, classifies each tensor using a convolutional neural network, and finally reports the results in a standard VCF or gVCF file.

DeepVariant supports:

  • Germline variant-calling in diploid organisms.
    • For somatic data or any other samples where the genotypes go beyond two copies of DNA, DeepVariant will not work out of the box because the only genotypes supported are hom-alt, het, and hom-ref.
    • The models included with DeepVariant are only trained on human data. For other organisms, see the blog post on non-human variant-calling for some possible pitfalls and how to handle them.
  • Calling from NGS and long-read sequencing data.
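Here is a minimal sketch of how three class probabilities (hom-ref, het, hom-alt) could become a VCF-style genotype. It makes the diploid restriction above concrete. This is not DeepVariant's code. The class order, the `call_genotype` name, and the phred-scaled GQ convention are all assumptions chosen for illustration.

```python
import math
from typing import Sequence, Tuple

# Hypothetical fixed class order: 0 = hom-ref, 1 = het, 2 = hom-alt.
GENOTYPES = ("0/0", "0/1", "1/1")


def call_genotype(probs: Sequence[float], max_gq: int = 99) -> Tuple[str, int]:
    """Pick the most likely diploid genotype from three class probabilities.

    GQ here is the phred-scaled probability that the chosen genotype is wrong,
    capped at max_gq (an illustrative convention, not DeepVariant's code).
    """
    if len(probs) != 3:
        raise ValueError("expected exactly three genotype probabilities")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    normalized = [p / total for p in probs]
    best = max(range(3), key=lambda i: normalized[i])
    p_wrong = 1.0 - normalized[best]
    if p_wrong <= 0.0:
        gq = max_gq  # a certain call gets the cap instead of infinity
    else:
        gq = min(max_gq, int(round(-10.0 * math.log10(p_wrong))))
    return GENOTYPES[best], gq
```

No fourth class exists, so a sample with more than two copies of DNA cannot be represented. This is why somatic or polyploid data needs extra handling.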

How to run

We recommend using our Docker solution. The command will look like this:

BIN_VERSION="1.0.0"
# --model_type: replace WGS with exactly one of WGS, WES, PACBIO, HYBRID_PACBIO_ILLUMINA.
# --num_shards: $(nproc) uses all your cores to run make_examples. Feel free to change it.
docker run \
  -v "YOUR_INPUT_DIR":"/input" \
  -v "YOUR_OUTPUT_DIR":"/output" \
  google/deepvariant:"${BIN_VERSION}" \
  /opt/deepvariant/bin/run_deepvariant \
  --model_type=WGS \
  --ref=/input/YOUR_REF \
  --reads=/input/YOUR_BAM \
  --output_vcf=/output/YOUR_OUTPUT_VCF \
  --output_gvcf=/output/YOUR_OUTPUT_GVCF \
  --num_shards=$(nproc)

To see all flags you can use, run: docker run google/deepvariant:"${BIN_VERSION}" --help
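You may prefer to drive DeepVariant from a Python script rather than a shell. A sketch like the following assembles the same `docker run` invocation as an argument list and rejects any model type other than the four named above. The helper names `build_deepvariant_command` and `default_shards` are hypothetical. The image name, binary path, and flags are copied from the command above.

```python
import os
from typing import List, Optional

# The four model types accepted by run_deepvariant, as listed in this README.
MODEL_TYPES = ("WGS", "WES", "PACBIO", "HYBRID_PACBIO_ILLUMINA")


def default_shards() -> int:
    """Approximate `nproc`: CPUs usable by this process, else all CPUs."""
    if hasattr(os, "sched_getaffinity"):  # Linux: respects CPU affinity like nproc
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def build_deepvariant_command(
    input_dir: str,
    output_dir: str,
    ref: str,
    reads: str,
    output_vcf: str,
    output_gvcf: str,
    model_type: str = "WGS",
    bin_version: str = "1.0.0",
    num_shards: Optional[int] = None,
) -> List[str]:
    """Return the docker argv for run_deepvariant (hypothetical helper)."""
    if model_type not in MODEL_TYPES:
        raise ValueError(f"model_type must be one of {MODEL_TYPES}, got {model_type!r}")
    shards = num_shards if num_shards is not None else default_shards()
    return [
        "docker", "run",
        "-v", f"{input_dir}:/input",
        "-v", f"{output_dir}:/output",
        f"google/deepvariant:{bin_version}",
        "/opt/deepvariant/bin/run_deepvariant",
        f"--model_type={model_type}",
        f"--ref=/input/{ref}",
        f"--reads=/input/{reads}",
        f"--output_vcf=/output/{output_vcf}",
        f"--output_gvcf=/output/{output_gvcf}",
        f"--num_shards={shards}",
    ]
```

For example, `subprocess.run(build_deepvariant_command(...), check=True)` runs the command without a shell, so paths with spaces need no extra quoting. `shlex.join(...)` (Python 3.8+) renders it as a copy-pasteable command line.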

If you're using GPUs or want to use Singularity instead, see the Quick Start for more details, or see all the available setup options, including solutions on external platforms.

For more information, also see:

How to cite

If you're using DeepVariant in your work, please cite:

A universal SNP and small-indel variant caller using deep neural networks. Nature Biotechnology 36, 983–987 (2018).
Ryan Poplin, Pi-Chuan Chang, David Alexander, Scott Schwartz, Thomas Colthurst, Alexander Ku, Dan Newburger, Jojo Dijamco, Nam Nguyen, Pegah T. Afshar, Sam S. Gross, Lizzie Dorfman, Cory Y. McLean, and Mark A. DePristo.
doi: https://doi.org/10.1038/nbt.4235

Additionally, if you are generating multi-sample calls using our DeepVariant and GLnexus Best Practices, please cite:

Accurate, scalable cohort variant calls using DeepVariant and GLnexus. bioRxiv 10.1101/2020.02.10.942086v1 (2020).
Taedong Yun, Helen Li, Pi-Chuan Chang, Michael F. Lin, Andrew Carroll, and Cory Y. McLean.
doi: https://doi.org/10.1101/2020.02.10.942086

Why Use DeepVariant?

  • High accuracy - In 2016 DeepVariant won the PrecisionFDA Truth Challenge for best SNP performance. DeepVariant maintains high accuracy across data from different sequencing technologies, prep methods, and species. Its advantage is especially large at lower coverage. See metrics for the latest accuracy numbers on each of the sequencing types.
  • Flexibility - Out-of-the-box use for PCR-positive samples and low quality sequencing runs, and easy adjustments for different sequencing technologies and non-human species.
  • Ease of use - No filtering is needed beyond setting your preferred minimum quality threshold.
  • Cost effectiveness - With a single non-preemptible n1-standard-16 machine on Google Cloud, it costs ~$9.11 to call a 30x whole genome and ~$0.39 to call an exome. With preemptible pricing, the cost is ~$2.19 for a 30x whole genome and ~$0.09 for a whole exome (not accounting for preemptions).
  • Speed - On a 64-core CPU-only machine, DeepVariant completes a 50x WGS in 5 hours and an exome in 16 minutes (1). Multiple acceleration options exist, bringing the WGS pipeline down to as little as 40 minutes (see external solutions).
  • Usage options - DeepVariant can be run via Docker or binaries, on on-premises hardware or in the cloud, with support for hardware accelerators like GPUs and TPUs.

(1): Time estimates do not include mapping.
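The "Ease of use" point means post-processing can be a single QUAL cut-off. Below is a minimal standard-library sketch over an uncompressed VCF. The `filter_by_qual` name, the choice to drop records with a missing QUAL, and any particular threshold are illustrative assumptions. The README does not prescribe a value.

```python
from typing import Iterable, Iterator


def filter_by_qual(vcf_lines: Iterable[str], min_qual: float) -> Iterator[str]:
    """Yield VCF header lines unchanged and data lines whose QUAL >= min_qual.

    QUAL is the sixth tab-separated column. Records with a missing QUAL ('.')
    are dropped here, which is an assumption; adjust to taste.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # keep meta-information and the #CHROM header
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == ".":
            continue
        if float(fields[5]) >= min_qual:
            yield line
```

For a bgzipped output such as `*.vcf.gz`, open the file with `gzip.open(path, "rt")` and pass the handle in directly, since file objects iterate line by line.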

How DeepVariant works

diagram of stages in DeepVariant

For more information on the pileup images and how to read them, please see the "Looking through DeepVariant's Eyes" blog post.
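The toy sketch below gives a rough intuition for what a pileup image is. It stacks reads over a reference window and records, for each cell, which base the read shows and whether that base differs from the reference. It is a deliberate simplification, not DeepVariant's encoding. The real tensors carry additional channels (for example base quality, mapping quality, and strand). The `toy_pileup` name and its two-value cells are hypothetical, and reads are assumed to have no indels.

```python
from typing import List, Sequence, Tuple

BASES = "ACGT"
EMPTY = (0, 0)  # cell not covered by the read


def toy_pileup(ref: str, reads: Sequence[Tuple[int, str]]) -> List[List[Tuple[int, int]]]:
    """Build one row per read over a reference window.

    Each cell is (base_code, differs_from_ref): base_code is 1-4 for A/C/G/T
    and 0 for N or no coverage. Reads are (offset, sequence) pairs relative
    to the start of the window.
    """
    image = []
    for offset, seq in reads:
        row = [EMPTY] * len(ref)  # tuples are immutable, so sharing is safe
        for i, base in enumerate(seq.upper()):
            col = offset + i
            if 0 <= col < len(ref):  # clip bases that fall outside the window
                row[col] = (BASES.find(base) + 1, int(base != ref[col].upper()))
        image.append(row)
    return image
```

A candidate variant shows up as a column where many rows have the "differs" flag set. The real network learns that pattern, along with the quality signals omitted here.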

DeepVariant relies on Nucleus, a library of Python and C++ code for reading and writing data in common genomics file formats (like SAM and VCF) designed for painless integration with the TensorFlow machine learning framework. Nucleus was built with DeepVariant in mind and open-sourced separately so it can be used by anyone in the genomics research community for other projects. See this blog post on Using Nucleus and TensorFlow for DNA Sequencing Error Correction.

DeepVariant Setup

Prerequisites

  • Unix-like operating system (cannot run on Windows)
  • Python 2.7

Official Solutions

Below are the official solutions provided by the Genomics team in Google Health.

| Name | Description |
|---|---|
| Docker | This is the recommended method. |
| Build from source | DeepVariant comes with scripts to build it on Ubuntu 14 and 16, with Ubuntu 16 recommended. To build and run on other Unix-based systems, you will need to modify these scripts. |
| Prebuilt Binaries | Available at gs://deepvariant/. These are compiled to use SSE4 and AVX instructions, so you will need a CPU (such as Intel Sandy Bridge) that supports them. You can check the /proc/cpuinfo file on your computer, which lists these features under "flags". |
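The /proc/cpuinfo check for the prebuilt binaries can be scripted. This is a minimal sketch, assuming "SSE4" means both SSE4.1 and SSE4.2. The flag spellings `sse4_1`, `sse4_2` and `avx` are the Linux x86 names. The function name `missing_cpu_flags` is hypothetical.

```python
from typing import Iterable, Set

# Assumption: "SSE4" covers both SSE4.1 and SSE4.2 (Linux x86 flag names).
REQUIRED_FLAGS = frozenset({"sse4_1", "sse4_2", "avx"})


def missing_cpu_flags(cpuinfo_text: str, required: Iterable[str] = REQUIRED_FLAGS) -> Set[str]:
    """Return the required flags absent from the first 'flags' line of cpuinfo text."""
    present: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            present.update(value.split())
            break  # assume every core reports the same flags
    return set(required) - present
```

On a Linux machine, pass it the file contents, e.g. `missing_cpu_flags(open("/proc/cpuinfo").read())`. An empty set means all three features were found. Non-x86 CPUs report a different field, so they will show every flag as missing.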

External Solutions

The following pipelines are not created or maintained by the Genomics team in Google Health. Please contact the relevant teams if you have any questions or concerns.

| Name | Description |
|---|---|
| Running DeepVariant on Google Cloud Platform | Docker-based pipelines optimized for cost and speed. Code can be found here. |
| DeepVariant-on-spark from ATGENOMIX | A germline short variant calling pipeline that runs DeepVariant on Apache Spark at scale with support for multi-GPU clusters (e.g. NVIDIA DGX-1). |
| NVIDIA Clara Parabricks | An accelerated DeepVariant pipeline with multi-GPU support that runs our WGS pipeline in just 40 minutes, at a cost of $2-$3 per sample. This provides a 7.5x speedup over a 64-core CPU-only machine at lower cost. |
| DNAnexus DeepVariant App | Offers parallelized execution with a GUI (requires platform account). |
| Nextflow Pipeline | Offers parallel processing of multiple BAMs and Docker support. |
| DNAstack Pipeline | Cost-optimized DeepVariant pipeline (requires platform account). |
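The Clara Parabricks speedup is consistent with the CPU figure under "Why Use DeepVariant?": 5 hours is 300 minutes, and 300 / 40 = 7.5. The short sketch below redoes that arithmetic, along with the preemptible savings implied by the quoted Google Cloud costs. It reuses only numbers stated in this README. Note that the runtime figure is for a 50x WGS, while the cost figures are for a 30x WGS.

```python
def speedup(baseline_minutes: float, accelerated_minutes: float) -> float:
    """How many times faster the accelerated run is."""
    return baseline_minutes / accelerated_minutes


def savings_fraction(on_demand_cost: float, preemptible_cost: float) -> float:
    """Fraction of the on-demand cost saved by running preemptible."""
    return 1.0 - preemptible_cost / on_demand_cost


# Figures quoted in this README.
CPU_WGS_MINUTES = 5 * 60      # 50x WGS on a 64-core CPU-only machine
PARABRICKS_WGS_MINUTES = 40   # NVIDIA Clara Parabricks WGS pipeline

wgs_speedup = speedup(CPU_WGS_MINUTES, PARABRICKS_WGS_MINUTES)
wgs_savings = savings_fraction(9.11, 2.19)    # 30x whole genome, n1-standard-16
exome_savings = savings_fraction(0.39, 0.09)  # whole exome, n1-standard-16

print(f"speedup: {wgs_speedup:.1f}x")                     # prints "speedup: 7.5x"
print(f"preemptible WGS savings: {wgs_savings:.0%}")      # prints "preemptible WGS savings: 76%"
print(f"preemptible exome savings: {exome_savings:.0%}")  # prints "preemptible exome savings: 77%"
```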

Contribution Guidelines

Please open a pull request if you wish to contribute to DeepVariant. Note that we have not set up the infrastructure to merge pull requests externally. If you agree, we will test and submit the changes internally and mention your contributions in our release notes. We apologize for any inconvenience.

If you have any difficulty using DeepVariant, feel free to open an issue. If you have general questions not specific to DeepVariant, we recommend that you post on a community discussion forum such as BioStars.

License

BSD-3-Clause license

Acknowledgements

DeepVariant happily makes use of many open source packages. We would like to specifically call out a few key ones:

We thank all of the developers and contributors to these packages for their work.

Disclaimer

This is not an official Google product.

Copyright 2020 Google LLC.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
