These startups are building cutting-edge AI models without the need for a data center

By: blockbeats|2025/05/01 18:05:30

Researchers have trained a new kind of large language model (LLM) using GPUs scattered around the world and a mix of private and public data. The effort suggests that the dominant approach to building artificial intelligence could be disrupted.

Flower AI and Vana, two startups pursuing unconventional approaches to building AI, worked together to create the new model, called Collective-1.

Flower AI developed techniques that allow training to be spread across hundreds of computers connected over the internet. Some firms already use the company's technology to train AI models without centralized computing resources or data. Vana, for its part, provided data sources including private messages from X, Reddit, and Telegram.

Collective-1 is small by modern standards, with 7 billion parameters (the values that together give the model its abilities), compared with the hundreds of billions of parameters in today's most advanced models, such as those powering ChatGPT, Claude, and Gemini.

Nic Lane, a computer scientist at the University of Cambridge and co-founder of Flower AI, said the distributed approach is expected to scale well beyond Collective-1. Lane added that Flower AI is currently training a 300-billion-parameter model with conventional data and plans to train a 1-trillion-parameter model later this year, approaching the scale offered by industry leaders. "This could fundamentally change people's perception of AI, so we are going all-in," Lane said. He also said the startup is incorporating images and audio into training to create multimodal models.

Distributed model building may also shake up the power dynamics shaping the AI industry.

Today, AI companies build models by combining massive amounts of training data with large-scale computing resources concentrated in data centers. These data centers are packed with cutting-edge GPUs interconnected by ultra-high-speed fiber-optic cables. The companies also rely heavily on datasets created by scraping public (though sometimes copyrighted) material such as websites and books.

This approach means that only the wealthiest companies, and nations with access to large numbers of powerful chips, can feasibly develop the most powerful and valuable models. Even open-source models such as Meta's Llama and DeepSeek's R1 are built by companies with large data centers. A distributed approach could allow smaller companies and universities to build advanced AI by pooling disparate resources. It could also enable countries that lack conventional infrastructure to build stronger models by networking several data centers together.

Lane believes that the AI industry will increasingly move towards allowing training in novel ways that break out of a single data center. The distributed approach "allows you to scale computation in a more elegant way than a data center model," he said.

Helen Toner, an AI governance expert at the Center for Security and Emerging Technology, said that Flower AI's approach is "interesting and potentially quite relevant" to AI competition and governance. "It may be hard to keep up at the cutting edge, but it may be an interesting fast-follower approach," Toner said.

Divide and Conquer

Distributed AI training involves rethinking how the computation used to build powerful AI systems is divided up. Creating an LLM involves feeding a model huge amounts of text and adjusting its parameters so that it produces useful responses to prompts. Inside a data center, the training process is split so that parts of it run on different GPUs, and the results are periodically consolidated into a single master model.

The new approach allows work typically done in large data centers to be performed on hardware potentially miles apart and connected by relatively slow or unreliable internet connections.
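The idea of training separately and merging only occasionally can be illustrated with a toy example. The sketch below is not Flower AI's or Photon's actual method; it is a minimal illustration of periodic weight averaging (often called local SGD or federated averaging) on a one-parameter model, and every name and setting in it (`local_steps`, `train`, `make_shards`, the learning rate, the number of workers) is an assumption chosen for illustration. Each simulated worker takes several gradient steps on its own private shard of data, and only the resulting weights are averaged, so workers exchange information once per round rather than once per step.

```python
import random

def local_steps(w, shard, lr, steps):
    """Run a few plain gradient-descent steps on one worker's private shard."""
    for _ in range(steps):
        # Gradient of the mean squared error for the toy model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in shard) / len(shard)
        w -= lr * grad
    return w

def train(shards, rounds=20, local=5, lr=0.5):
    """Each round: every worker trains independently, then weights are averaged."""
    w = 0.0  # shared "master" weight
    for _ in range(rounds):
        # Workers start from the same weight and never see each other's data.
        local_ws = [local_steps(w, shard, lr, local) for shard in shards]
        # The only communication step: average the workers' weights.
        # (A plain average is used because all shards are the same size.)
        w = sum(local_ws) / len(local_ws)
    return w

def make_shards(num_workers=4, per_worker=50, true_w=3.0, seed=0):
    """Build one synthetic, slightly noisy data shard per simulated worker."""
    rng = random.Random(seed)
    shards = []
    for _ in range(num_workers):
        xs = [rng.uniform(-1, 1) for _ in range(per_worker)]
        shards.append([(x, true_w * x + rng.gauss(0, 0.01)) for x in xs])
    return shards

shards = make_shards()
w = train(shards)
print(f"learned weight: {w:.3f}")
```

With these settings each worker performs 100 gradient steps in total but synchronizes only 20 times. Raising `local` reduces communication further, which is the kind of trade-off that matters when machines are linked by slow or unreliable connections.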

Some major companies are also exploring distributed learning. Last year, Google researchers demonstrated a new scheme called DIstributed PAth COmposition (DiPaCo) for segmenting and integrating computation to make distributed learning more efficient.

To build Collective-1 and other LLMs, Lane worked with academic collaborators in the UK and China to develop a new tool called Photon that makes distributed training more efficient. Lane said Photon improves on Google's approach with a more efficient data representation and a scheme for sharing and consolidating training. The process is slower than conventional training but more flexible, because new hardware can be added to speed training up, Lane said.

Photon was developed in collaboration with researchers at Beijing University of Posts and Telecommunications and Zhejiang University. The team released the tool under an open-source license last month, allowing anyone to use the approach.

Vana, Flower AI's partner in building Collective-1, is developing new ways for users to share their personal data with AI builders. Vana's software lets users contribute private data from platforms such as X and Reddit to the training of a large language model, specify which end uses are permitted, and even benefit financially from their contributions.

Anna Kazlauskas, co-founder of Vana, stated that the idea is to make unused data available for AI training while giving users more control over how their information is used in AI. "This data is usually unable to be included in AI models because it's not public," Kazlauskas said. "This is the first time that data contributed directly by users is being used to train foundational models, with users owning the AI model created from their data."

University College London computer scientist Mirco Musolesi has suggested that a key benefit of distributed AI training approaches may be unlocking novel data. "Extending this to cutting-edge models will allow the AI industry to leverage vast amounts of distributed and privacy-sensitive data, such as in healthcare and finance, for training without the risks of centralization," he said.
