3 Star 1 Fork 1

热门极速下载 / min-dalle

加入 Gitee
与超过 1200万 开发者一起发现、参与优秀开源项目,私有仓库也完全免费 :)
免费加入
克隆/下载
贡献代码
同步代码
取消
提示: 由于 Git 不支持空文件夾,创建文件夹后会生成空的 .keep 文件
Loading...
README
MIT

min(DALL·E)

Colab   Hugging Face Spaces   Replicate   Discord

YouTube Walk-through by The AI Epiphany

This is a fast, minimal port of Boris Dayma's DALL·E Mini (with mega weights). It has been stripped down for inference and converted to PyTorch. The only third party dependencies are numpy, requests, pillow and torch.

To generate a 3x3 grid of DALL·E Mega images it takes:

  • 55 sec with a T4 in Colab
  • 33 sec with a P100 in Colab
  • 15 sec with an A10G on Hugging Face

Here's a more detailed breakdown of performance on an A100. Credit to @technobird22 and his NeoGen discord bot for the graph.
min-dalle

The flax model and code for converting it to torch can be found here.

Install

$ pip install min-dalle

Usage

Load the model parameters once and reuse the model to generate multiple images.

from min_dalle import MinDalle

model = MinDalle(
    models_root='./pretrained',
    dtype=torch.float32,
    device='cuda',
    is_mega=True, 
    is_reusable=True
)

The required models will be downloaded to models_root if they are not already there. Set the dtype to torch.float16 to save GPU memory. If you have an Ampere architecture GPU you can use torch.bfloat16. Set the device to either "cuda" or "cpu". Once everything has finished initializing, call generate_image with some text as many times as you want. Use a positive seed for reproducible results. Higher values for supercondition_factor result in better agreement with the text but a narrower variety of generated images. Every image token is sampled from the top_k most probable tokens. The largest logit is subtracted from the logits to avoid infs. The logits are then divided by the temperature. If is_seamless is true, the image grid will be tiled in token space not pixel space.

image = model.generate_image(
    text='Nuclear explosion broccoli',
    seed=-1,
    grid_size=4,
    is_seamless=False,
    temperature=1,
    top_k=256,
    supercondition_factor=32,
    is_verbose=False
)

display(image)
min-dalle

Credit to @hardmaru for the example

Saving Individual Images

The images can also be generated as a FloatTensor in case you want to process them manually.

images = model.generate_images(
    text='Nuclear explosion broccoli',
    seed=-1,
    grid_size=3,
    is_seamless=False,
    temperature=1,
    top_k=256,
    supercondition_factor=16,
    is_verbose=False
)

To get an image into PIL format you will have to first move the images to the CPU and convert the tensor to a numpy array.

images = images.to('cpu').numpy()

Then image $i$ can be coverted to a PIL.Image and saved

image = Image.fromarray(images[i])
image.save('image_{}.png'.format(i))

Progressive Outputs

If the model is being used interactively (e.g. in a notebook) generate_image_stream can be used to generate a stream of images as the model is decoding. The detokenizer adds a slight delay for each image. Set progressive_outputs to True to enable this. An example is implemented in the colab.

image_stream = model.generate_image_stream(
    text='Dali painting of WALL·E',
    seed=-1,
    grid_size=3,
    progressive_outputs=True,
    is_seamless=False,
    temperature=1,
    top_k=256,
    supercondition_factor=16,
    is_verbose=False
)

for image in image_stream:
    display(image)
min-dalle

Command Line

Use image_from_text.py to generate images from the command line.

$ python image_from_text.py --text='artificial intelligence' --no-mega
min-dalle
Copyright 2022 Brett Kuprel Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

简介

暂无描述 展开 收起
Jupyter Notebook 等 3 种语言
MIT
取消

发行版

暂无发行版

贡献者

全部

近期动态

加载更多
不能加载更多了
1
https://gitee.com/mirrors_trending/min-dalle.git
git@gitee.com:mirrors_trending/min-dalle.git
mirrors_trending
min-dalle
min-dalle
main

搜索帮助