FlexServ(Transformers Backend)

OpenAPI spec of FlexServ with Transformers Backend

Version 1.4.6 OpenAPI not loaded Loaded models: -

Overview

Compact system summary, quick metrics, and shortcuts into the main test workspaces.

Metrics Snapshot

Fast glance view. Open the Metrics page for the detailed charts.

Quick Actions

Loaded Models

Backend Capabilities

Gateway Tools

Models

Model pools, fetch operations, archive unpacking, and loaded-state management.

Backends

Backend management view scaffold. Wiring can land here once the server-side controls are ready.

Backend management is intentionally left empty for now.

Tests

One endpoint workspace at a time, with dedicated panels for text, media, pipeline, and YOLO testing.

Metrics

Compact cards for the latest snapshot, with the detailed resource reporter docked below.

FlexServ(Transformers Backend)

OpenAPI spec of FlexServ with Transformers Backend

Version: 1.4.6

This UI is created for easily testing FlexServ APIs
Visual Smoke Tests

Interactive tests for each API with parameterized inputs and structured output rendering.

This sends Authorization: Bearer <FLEXSERV_TOKEN>. Model fetch requests also send x-hf-token when set.

1) Gateway & Platform APIs

Gateway readiness

OpenAPI summary

Resource reporter (auto refresh every 30s)

3) Inference APIs

Depending on the model, some UI features may not be fully supported. If one model does not work properly, try a different model.

Chat Completions (/v1/chat/completions, stream/non-stream)

Prompt text (Markdown)
Preview
Paste text, image URLs, or actual image clipboard data. Multiple images are supported for VLM chat payloads.
Temperature
0.50
deterministiccreative
Top-p
1.00
focusedbroad
Compact controls: temperature/top-p/seed/frequency penalty in one responsive block.
Tools
Lets tool-capable models call functions. Most models can leave this empty.
Generation config
Power-user override for Transformers generation settings. Empty is safest.
Logit bias
Advanced: push specific token IDs up or down. Usually leave empty.

Responses (/v1/responses, stream/non-stream)

Input (Markdown)
Preview
Responses input is text-focused here; stored history and long prompt attachments are composed into the outgoing request.
Only matters when tools are used; otherwise it has no effect.
Metadata
Attach your own tags/notes for tracking requests.
Generation config
Advanced override for Transformers generation settings. Empty is safest.

Completions (/v1/completions, stream/non-stream)

Embeddings (/v1/embeddings, batch)

Audio transcription (/v1/audio/transcriptions)

2) Model Management APIs

Model pools (drag model cards between pools)

Checking owner permission...
Public Pool
Private Pool

Fetch model (batch)

Fetch result

Unpack archives