Nvidia shrinks AI image generation method to size of a WhatsApp message


Nvidia researchers have developed a new AI image generation technique that could allow highly customized text-to-image models with a fraction of the storage requirements.

According to a paper published on arXiv, the proposed method called “Perfusion” enables adding new visual concepts to an existing model using only 100KB of parameters per concept.

Perfusion AI
Source: Nvidia Research

As the paper’s authors describe, Perfusion works by “making small updates to the internal representations of a text-to-image model.”

More specifically, it makes carefully calculated changes to the parts of the model that connect the text descriptions to the generated visual features. Applying minor, parameterized edits to the cross-attention layers allows Perfusion to modify how text inputs get translated into images.

Perfusion therefore does not retrain a text-to-image model from scratch. Instead, it slightly adjusts the mathematical transformations that turn words into pictures. This lets it customize the model to produce new visual concepts with far less compute than full retraining would require.
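To make the idea of a small, targeted edit concrete, here is a minimal sketch of a generic rank-one update to a single projection matrix. It is an illustration only and is not Perfusion's actual update rule; the matrix, the "concept key", and the target output are all hypothetical toy values. The edit changes how one input direction is mapped while leaving inputs orthogonal to that direction untouched.

```python
# Illustrative only: a generic rank-one edit to a projection matrix.
# This is NOT the update rule from the Perfusion paper; all names and
# values here are hypothetical.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rank_one_edit(W, key, target):
    """Return W' = W + (target - W @ key) key^T / (key . key).

    W' maps `key` exactly to `target` and leaves W @ x unchanged
    for any x orthogonal to `key`.
    """
    norm_sq = sum(k * k for k in key)
    if norm_sq == 0:
        raise ValueError("key must be non-zero")
    residual = [t - o for t, o in zip(target, matvec(W, key))]
    return [
        [w + r * k / norm_sq for w, k in zip(row, key)]
        for row, r in zip(W, residual)
    ]

# Toy 2x3 projection standing in for one cross-attention weight matrix.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
concept_key = [1.0, 0.0, 0.0]   # hypothetical text-side direction for a new concept
concept_target = [0.5, -1.0]    # hypothetical desired output for that concept
other_input = [0.0, 1.0, 1.0]   # orthogonal to concept_key

W_edited = rank_one_edit(W, concept_key, concept_target)
print(matvec(W_edited, concept_key))   # now equals concept_target
print(matvec(W_edited, other_input))   # same as matvec(W, other_input)
```

An edit of this kind can be stored as two vectors (the key and the residual) rather than a full copy of the matrix, which is one general reason low-rank edits are compact.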

The Perfusion method needs only 100KB

Perfusion achieved these results with two to five orders of magnitude fewer parameters than competing techniques.

While other methods may require hundreds of megabytes to gigabytes of storage per concept, Perfusion needs only 100KB – comparable to a small image, text, or WhatsApp message.
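As a rough check on the storage comparison, the short calculation below converts size ratios into orders of magnitude. The comparison sizes are illustrative picks from the "hundreds of megabytes to gigabytes" range mentioned above, not figures taken from the paper, and decimal units (1KB = 1,000 bytes) are assumed.

```python
import math

# Decimal units are an assumption; the article does not say which it uses.
KB, MB, GB = 10**3, 10**6, 10**9

perfusion_bytes = 100 * KB  # per-concept size quoted in the article

# Hypothetical comparison sizes, chosen only to illustrate the range.
baselines = {"100 MB": 100 * MB, "1 GB": 1 * GB, "10 GB": 10 * GB}

def orders_of_magnitude(larger, smaller):
    """Base-10 logarithm of the size ratio."""
    return math.log10(larger / smaller)

for label, size in baselines.items():
    ratio = orders_of_magnitude(size, perfusion_bytes)
    print(f"{label}: about {ratio:.0f} orders of magnitude larger")

# How many 100KB concepts fit in 1 GB of storage.
concepts_per_gb = GB // perfusion_bytes
print(f"Concepts per GB: {concepts_per_gb}")
```

Under these assumptions the three baselines come out three, four, and five orders of magnitude larger than a 100KB concept, and 10,000 such concepts fit in a single gigabyte.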

This dramatic reduction could make deploying highly customized AI art models more feasible.

According to co-author Gal Chechik,

“Perfusion not only leads to more accurate personalization at a fraction of the model size, but it also enables the use of more complex prompts and the combination of individually-learned concepts at inference time.”

The method allowed creative image generation, like a “teddy bear sailing in a teapot,” using personalized concepts of “teddy bear” and “teapot” learned separately.

Perfusion AI
Source: Nvidia Research

Possibilities of Efficient Personalization

Perfusion’s ability to personalize AI models using just 100KB per concept opens up several potential applications:

This method paves the way for individuals to easily tailor text-to-image models with new objects, scenes, or styles, eliminating the need for expensive retraining. The efficiency of Perfusion’s 100KB parameter update per concept allows models that are customized with this technique to be implemented on consumer devices, enabling on-device image creation.

One of the most striking aspects of this technique is the potential it offers for sharing and collaboration around AI models. Users could share their personalized concepts as small add-on files, circumventing the need to share cumbersome model checkpoints.
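To give a sense of what a shareable add-on file of this size could look like, here is a sketch of a made-up container format: a small JSON header followed by raw 32-bit floats. Neither the format nor the function names come from Nvidia or the paper; they are assumptions for illustration. At 4 bytes per value, a 100KB budget holds roughly 25,000 parameters.

```python
import array
import json
import struct

# Hypothetical file layout (not from the paper):
#   4-byte little-endian header length | JSON header | float32 payload

def pack_concept(name, values):
    """Serialize a concept name and its parameters into one bytes blob."""
    payload = array.array("f", values).tobytes()  # native-endian float32
    header = json.dumps({"name": name, "count": len(values)}).encode("utf-8")
    return struct.pack("<I", len(header)) + header + payload

def unpack_concept(blob):
    """Inverse of pack_concept: return (name, list of floats)."""
    (header_len,) = struct.unpack_from("<I", blob, 0)
    header = json.loads(blob[4:4 + header_len].decode("utf-8"))
    values = array.array("f")
    values.frombytes(blob[4 + header_len:])
    return header["name"], list(values)

# 25,000 placeholder parameters; 0.5 is exactly representable in float32.
blob = pack_concept("teddy", [0.5] * 25_000)
name, values = unpack_concept(blob)
print(name, len(values), len(blob))
```

A file like this is small enough to attach to a message, whereas a full model checkpoint typically is not.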

In terms of distribution, models that are tailored to particular organizations could be more easily disseminated or deployed at the edge. As the practice of text-to-image generation continues to become more mainstream, the ability to achieve such significant size reductions without sacrificing functionality will be paramount.

It’s important to note, however, that Perfusion primarily provides model personalization rather than full generative capability itself.

Limitations and Release

While promising, the technique does have some limitations. The authors note that certain choices made during training can sometimes cause a concept to over-generalize. More research is also needed to seamlessly combine multiple personalized concepts within a single image.

The authors note that code for Perfusion will be made available on their project page, indicating an intention to release the method publicly in the future, likely pending peer review and an official research publication. However, specifics on public availability remain unclear, since the work is currently published only on arXiv, a platform where researchers can upload papers before formal peer review and publication in journals or conferences.

While Perfusion’s code is not yet accessible, the authors’ stated plan implies that this efficient, personalized AI system could find its way into the hands of developers, industries, and creators in due course.

As AI art platforms like Midjourney, DALL-E 2, and Stable Diffusion gain steam, techniques that allow greater user control could prove critical for real-world deployment. With clever efficiency improvements like Perfusion, Nvidia appears determined to retain its edge in a rapidly evolving landscape.


