Now with BIELIK support  |  100+ models total

Top-tier private execution
for all AI needs

Stay ahead of the market with ARK's OpenAI-compatible,
distributed infrastructure and low-cost inference.

Scale. Guard. Align.

  • At least 50% cheaper than industry giants

    Spend and scale wisely. Proprietary algorithms utilize all grades of consumer GPUs, maximizing efficiency at every step of the process.

  • Unlike elsewhere, your data stays truly yours

    Don’t give away your know-how. Our deployments keep proprietary information shielded from third-party access.

  • Tailor-made solutions, not off-the-shelf stock

    Configure and fine-tune your own LLMs. Infrastructure and deployments can be aligned with your company’s specific needs and standards.

Leverage technology & expertise

  • Stateful architecture

    Intelligent token management that remembers context and reduces costs (see the Chat stateful example further down)

    • Reduced token consumption: cost-effective conversations
    • Context preservation: memory maintenance across multiple interactions
    • Cost optimization: dynamic token allocation based on context
  • Cutting-edge features

    Enterprise-grade capabilities for all kinds of needs

    • 8k-128k context windows: flexible, model-dependent, optimized for performance
    • Load balancing: redirecting traffic to different providers when needed
    • Custom API distribution: e.g., 75% ARK API, 25% OpenAI API (see the routing sketch after this list)
  • AI Engineering Consultancy

    Expert advice for optimal AI implementation and customization

    • Open-source model selection: practical solutions for optimal AI inference
    • New model integration: delivered rapidly on client request
    • Responsive support from experienced AI specialists
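
The load balancing and custom API distribution above can also be approximated client-side. The sketch below shows weighted routing between two OpenAI-compatible endpoints; the base URLs, keys, and weights are hypothetical placeholders, not ARK's server-side routing implementation.

    import random

    import openai

    # Hypothetical backends and weights: 75% ARK, 25% OpenAI.
    BACKENDS = [
        (openai.OpenAI(api_key="ARK_API_KEY", base_url="http://10.10.10.4:8000/api/v1"), 0.75),
        (openai.OpenAI(api_key="OPENAI_API_KEY"), 0.25),
    ]

    def weighted_client():
        # Pick a backend at random, proportionally to its weight.
        clients, weights = zip(*BACKENDS)
        return random.choices(clients, weights=weights, k=1)[0]

    response = weighted_client().chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(response.choices[0].message.content)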

Plug and play

Over a hundred supported models, integrations, frameworks and libraries.

Request a demo
Meta
Mistral
DeepSeek
Bielik
Falcon
Qwen
Jina
HuggingFace

Deploy LLMs flexibly

  • On-premises Private Cloud

    Deployed on your own hardware with
    consumer-grade GPUs

    • Maximum data privacy and control
    • Reduced hardware costs
    • Full infrastructure ownership
    • Comprehensive, in-person technical support
  • Hybrid Private Cloud

    Cloud solution based on dedicated consumer-grade GPU infrastructure

    • Enterprise-grade security
    • Managed maintenance
    • Scalable resources
    • Remote technical support

Explore API & pricing

OpenAI-compatible API with flexible pricing and advanced features for optimal cost efficiency

  • Dynamic Pricing

    Save with our time-based (day/night) rates and flexible response limits: optimize costs by choosing when and how fast you need AI responses (illustrated in the sketch after this list)

  • Pay As You Go

    Only pay for what you use with transparent token-based pricing and detailed usage analytics

  • Fixed Price Plans

    Predict monthly costs for enterprise needs with graceful performance scaling when reaching usage limits
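
To make these options concrete, here is a small sketch of how time-based rates and token-based billing could combine. The rates and night-window hours below are invented for illustration and are not ARK's published prices.

    from datetime import datetime

    # Hypothetical rates and night window, NOT ARK's actual pricing.
    DAY_RATE, NIGHT_RATE = 0.50, 0.30   # USD per 1M tokens
    NIGHT_START, NIGHT_END = 22, 6      # night window: 22:00-06:00

    def cost_usd(total_tokens: int, when: datetime) -> float:
        # Pay-as-you-go cost of one request under time-based rates.
        is_night = when.hour >= NIGHT_START or when.hour < NIGHT_END
        rate = NIGHT_RATE if is_night else DAY_RATE
        return total_tokens / 1_000_000 * rate

    print(cost_usd(120_000, datetime(2025, 1, 1, 23, 30)))  # 0.036 (night rate)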

  • Chat
  • Chat streaming
  • Embedding
  • Embedding visualisation
  • Chat stateful
  • import openai

    ark_api_key = "API_KEY"
    ark_base_url = "http://10.10.10.4:8000/api/v1"

    # Point the official OpenAI client at ARK's OpenAI-compatible endpoint.
    client = openai.OpenAI(api_key=ark_api_key, base_url=ark_base_url)

    print("Waiting for response...")

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Tell me a story about a brave knight traversing space in a small rocket who's lost because GPS only works on Earth. 200 words."},
        ],
    )
    print("Response:")
    print(response.choices[0].message.content)
  • import openai

    ark_api_key = "API_KEY"
    ark_base_url = "http://10.10.10.4:8000/api/v1"

    client = openai.OpenAI(api_key=ark_api_key, base_url=ark_base_url)

    print("Waiting for response to start streaming...")

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Tell me a story about a brave knight traversing space in a small rocket who's lost because GPS only works on Earth. 200 words."},
        ],
        stream=True,
    )

    print("Response:")
    # Print each token as it arrives instead of waiting for the full response.
    for chunk in response:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
  • import openai

    ark_api_key = "API_KEY"
    ark_base_url = "http://10.10.10.4:8000/api/v1"

    client = openai.OpenAI(api_key=ark_api_key, base_url=ark_base_url)

    print("Waiting for response...")

    response = client.embeddings.create(
        model="text-embedding-ada-002",
        input="A shy knight traversing space in a small rocket, lost because GPS only works on Earth.",
    )

    embedding = response.data[0].embedding

    print("Embedding:")
    print(embedding)
  • import numpy as np
    import openai

    def rgb_color(value):
        # Map a normalized value in [0, 1] to a grayscale ANSI block character.
        gray = int(255 * value)
        return f'\033[38;2;{gray};{gray};{gray}m█\033[0m'

    def print_emb(embedding):
        # Rescale the embedding to [0, 1] so values map onto the gray range.
        min_val, max_val = min(embedding), max(embedding)
        normalized_embedding = [(x - min_val) / (max_val - min_val) for x in embedding]

        # Compress the vector to bar_width values by averaging equal chunks.
        compressed_embedding = np.array_split(normalized_embedding, bar_width)
        compressed_embedding = [np.mean(chunk) for chunk in compressed_embedding]

        # Render one grayscale block per averaged chunk.
        bar = ''.join([rgb_color(x) for x in compressed_embedding])

        print(bar)

    ark_api_key = "API_KEY"
    ark_base_url = "http://10.10.10.4:8000/api/v1"
    bar_width = 80

    client = openai.OpenAI(api_key=ark_api_key, base_url=ark_base_url)

    print("Waiting for response 1/3...")
    response = client.embeddings.create(
        model="text-embedding-ada-002",
        input="A brave knight traversing space in a small rocket, lost because GPS only works on Earth." )

    print("Waiting for response 2/3...")
    response2 = client.embeddings.create(
        model="text-embedding-ada-002",
        input="A shy knight traversing space in a small rocket, lost because GPS only works on Earth." )

    print("Waiting for response 3/3...")
    response3 = client.embeddings.create(
        model="text-embedding-ada-002",
        input="A red fox jumped over a sleeping hedgehog." )

    print("Embeddings visual comparison:")
    print_emb(response.data[0].embedding)
    print_emb(response2.data[0].embedding)
    print_emb(response3.data[0].embedding)
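
    # For a quantitative follow-up to the bars above, cosine similarity over the
    # same three embeddings is the usual next step. This continuation is a
    # sketch, not an ARK-specific API: it reuses the NumPy import and the
    # response objects from the example above.
    def cosine(a, b):
        # Cosine similarity: dot product divided by the product of L2 norms.
        a, b = np.array(a), np.array(b)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # The two knight sentences should score near 1.0; the fox sentence lower.
    print("brave vs. shy knight:", cosine(response.data[0].embedding, response2.data[0].embedding))
    print("brave knight vs. fox:", cosine(response.data[0].embedding, response3.data[0].embedding))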
  • import requests

    ark_api_key = "API_KEY"
    ark_base_url = "http://10.10.10.4:8000/api/v1/chat/completions"

    # A Session persists cookies across requests; here the cookie is what lets
    # the server keep the conversation state between calls.
    session = requests.Session()

    headers = {
        "Authorization": f"Bearer {ark_api_key}",
        "Content-Type": "application/json",
    }

    print("Waiting for the first response...")

    response = session.post(
        ark_base_url,
        json={
            "model": "gpt-4o",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Tell me a story about a brave knight traversing space in a small rocket who's lost because GPS only works on Earth.
                200 words."
    }
            ],
        },
        headers=headers, )

    if response.status_code == 200:
        print("Cookies received:", session.cookies.get_dict())
        print()
        print("First response:")
        data = response.json()
        print(data["choices"][0]["message"]["content"])
    else:
        print("Failed to get response:", response.text)
        exit()

    print()
    print("Waiting for the second response...")

    response = session.post(
        ark_base_url,
        json={
            "model": "gpt-4o",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Translate the story to German, please."}
            ],
        },
        headers=headers,
    )

    print()
    print("Second response:")
    data = response.json()
    print(data["choices"][0]["message"]["content"])
Request a demo

Find answers

  • What security measures are in place for GPU lenders and renters?

    We use containerized environments, workload isolation, and automated monitoring to prevent misuse. Payments are processed securely, and lenders have full control over how their GPUs are used.

  • How do you ensure GPU renters get reliable performance?

    We benchmark all GPUs on our platform and use automated performance monitoring to ensure they meet advertised specifications. If a rented GPU underperforms, our system dynamically reassigns workloads to ensure consistent performance.

  • How does your payment system ensure fair payouts for GPU owners?

    Our platform tracks GPU usage in real time and calculates payouts based on actual compute time and performance benchmarks. Payments are automated and transparent, with lenders receiving detailed usage reports.

  • Can I rent entire GPU clusters for large-scale workloads?

    Yes, we offer access to GPU clusters for distributed AI training, HPC, and large-scale simulations. Users can configure multi-GPU and multi-node setups as needed.

  • How does your platform compare to traditional cloud GPU providers?

    We offer lower costs, flexible pricing, and a decentralized network of GPUs, allowing users to access computing power on demand without long-term commitments.