# Shasta long read assembler
___

**The complete user documentation is available [here](https://chanzuckerberg.github.io/shasta/).**

**For quick start information see [here](https://chanzuckerberg.github.io/shasta/QuickStart.html).**

See [Shafin et al., Nature Biotechnology 2020](https://www.nature.com/articles/s41587-020-0503-6)
for an error analysis of the Shasta assembler and more.
Reads from this paper are available 
[here](https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html).
The assembly results are
[here](https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=assemblies).

[Here](https://github.com/human-pangenomics/assembly-analysis) is a QUAST analysis of a Shasta assembly of CHM13
and a comparison with other assemblers.

**Requests for help:** please file GitHub issues to report problems, request help or ask questions. **Please keep each issue on a single topic when possible.** 
___

The goal of the Shasta long read assembler is to rapidly
produce accurate assembled sequence from DNA reads
generated by [Oxford Nanopore](https://nanoporetech.com) flow cells.

Computational methods used by the Shasta assembler include:

* Using a
[run-length](https://en.wikipedia.org/wiki/Run-length_encoding)
representation of the read sequence.
This makes the assembly process more resilient to errors in
homopolymer repeat counts, which are the most common type
of error in Oxford Nanopore reads.

* Using, in some phases of the computation, a representation
of the read sequence based on *markers*, a fixed
subset of short k-mers (k ≈ 10).
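The two representations above can be illustrated with a minimal Python sketch. It is not Shasta's implementation (Shasta is written in C++): the function names `run_length_encode`, `choose_markers` and `find_markers`, the `fraction` parameter, and the toy marker length k = 3 (instead of k ≈ 10) are hypothetical choices made only for this example. The sketch looks for markers in the run-length sequence, which shows why a homopolymer count error in a raw read need not disturb the marker representation.

```python
import itertools
import random


def run_length_encode(read: str) -> tuple[str, list[int]]:
    """Collapse each homopolymer run into one base plus its repeat count."""
    bases: list[str] = []
    counts: list[int] = []
    for base in read:
        if bases and bases[-1] == base:
            counts[-1] += 1  # same base as before: extend the current run
        else:
            bases.append(base)  # new base: start a new run of length 1
            counts.append(1)
    return "".join(bases), counts


def choose_markers(k: int, fraction: float, seed: int = 0) -> set[str]:
    """Hypothetical helper: pick a fixed pseudo-random subset of all 4**k k-mers."""
    rng = random.Random(seed)  # fixed seed, so every read sees the same marker set
    all_kmers = ("".join(p) for p in itertools.product("ACGT", repeat=k))
    return {kmer for kmer in all_kmers if rng.random() < fraction}


def find_markers(sequence: str, markers: set[str], k: int) -> list[tuple[int, str]]:
    """Return (position, k-mer) for every marker occurrence in the sequence."""
    return [
        (i, sequence[i:i + k])
        for i in range(len(sequence) - k + 1)
        if sequence[i:i + k] in markers
    ]


# Two reads that differ only in homopolymer lengths (AAA vs AAAA, TT vs T).
true_read = "AAACGGTTACGT"
noisy_read = "AAAACGGTACGT"
true_rle, true_counts = run_length_encode(true_read)
noisy_rle, noisy_counts = run_length_encode(noisy_read)
print(true_rle, true_counts)    # ACGTACGT [3, 1, 2, 2, 1, 1, 1, 1]
print(noisy_rle, noisy_counts)  # ACGTACGT [4, 1, 2, 1, 1, 1, 1, 1]

# A hand-picked toy marker set keeps the output easy to check.
toy_markers = {"ACG", "GTA"}
print(find_markers(true_rle, toy_markers, 3))   # [(0, 'ACG'), (2, 'GTA'), (4, 'ACG')]
print(find_markers(noisy_rle, toy_markers, 3))  # same list as above
```

Both reads collapse to the same run-length sequence `ACGTACGT`, so they produce identical marker lists even though their raw sequences differ in two homopolymer counts; only the repeat counts carry the disagreement. In a more realistic setting the marker set would come from something like `choose_markers(10, 0.1)` rather than being picked by hand.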

As currently implemented, Shasta can run an assembly 
of a human genome at coverage around 60x
in about 3 hours using a single, large machine (AWS instance type
`x1.32xlarge`, with 128 virtual processors and 1952 GB of memory).
The compute cost of such an assembly is around $20 at AWS spot market or reserved prices.

Shasta assembly quality is comparable to or better
than assembly quality achieved by other long read assemblers;
see [this paper](https://www.biorxiv.org/content/10.1101/715722v1)
for an extensive analysis.
However,
**adjustments of assembly parameters are generally necessary** to 
achieve optimal assembly results. 
A set of sample configuration files is provided (in the `conf` directory)
to assist with this process.




#### Acknowledgments

The Shasta software uses various external software packages.
See [here](https://chanzuckerberg.github.io/shasta/Acknowledgments.html) for more information.

#### Reporting Security Issues
Please note: If you believe you have found a security issue, please responsibly disclose by contacting security@chanzuckerberg.com.