GenProSeq:Generating Protein Sequences with Deep Generative Models
Generative modeling for protein engineering is key to solving fundamental problems in synthetic biology, medicine,
and material science. Machine learning has enabled us to
generate useful protein sequences on a variety of scales.
Generative models are machine learning methods which seek to
model the distribution underlying the data, allowing for the
generation of novel samples with similar properties to those on
which the model was trained. Generative models of proteins can
learn biologically meaningful representations helpful for a
variety of downstream tasks. Furthermore, they can learn to
generate protein sequences that have not been observed before
and to assign higher probability to protein sequences that
satisfy desired criteria. In this package, common deep
generative models for protein sequences, such as variational
autoencoder (VAE), generative adversarial networks (GAN), and
autoregressive models are available. In the VAE and GAN, the
Word2vec is used for embedding. The transformer encoder is
applied to protein sequences for the autoregressive model.