Overview

Compact system summary, quick metrics, and shortcuts into the main test workspaces.

Metrics Snapshot

Fast glance view. Open the Metrics page for the detailed charts.

Quick Actions

Chat playground Jump straight into conversational inference testing.

Responses playground Inspect structured reasoning and streamed output.

Pipeline APIs Task and loaded-pipeline checks stay behind capability detection.

YOLO APIs Image inference testing for the YOLO APIs.

Loaded Models

Backend Capabilities

Gateway Tools

Models

Model pools, fetch operations, archive unpacking, and loaded-state management.

Backends

Backend management view scaffold. Wiring can land here once the server-side controls are ready.

Backend management is intentionally left empty for now.

Tests

One endpoint workspace at a time, with dedicated panels for text, media, pipeline, and YOLO testing.

Metrics

Compact cards for the latest snapshot, with the detailed resource reporter docked below.

FlexServ(Transformers Backend)

OpenAPI spec of FlexServ with Transformers Backend

Version: 1.4.6

This UI is created for easily testing FlexServ APIs

Visual Smoke Tests

Interactive tests for each API with parameterized inputs and structured output rendering.

FlexServ token (Bearer auth for gateway requests and generated cURL) HuggingFace token (optional, for gated model fetch) Show outgoing request payload

This sends Authorization: Bearer <FLEXSERV_TOKEN>. Model fetch requests also send x-hf-token when set.

1) Gateway & Platform APIs

Gateway readiness

OpenAPI summary

Resource reporter (auto refresh every 30s)

3) Inference APIs

Depending on the model, some UI features may not be fully supported. If one model does not work properly, try a different model.

Chat Completions (/v1/chat/completions, stream/non-stream)

Prompt text (Markdown)

Preview

Paste text, image URLs, or actual image clipboard data. Multiple images are supported for VLM chat payloads.

Model System message Max tokens Caps the response length. Smaller is faster; larger allows longer answers.

Stream response Multi-turn conversation

Temperature

0.50

deterministiccreative

Top-p

1.00

focusedbroad

Seed

Clear to omit. Set an integer for reproducible sampling paths.

Frequency penalty

Leave empty to omit. Positive values reduce repetition.

Compact controls: temperature/top-p/seed/frequency penalty in one responsive block.

Tools

Lets tool-capable models call functions. Most models can leave this empty.

Generation config

Power-user override for Transformers generation settings. Empty is safest.

Logit bias

Advanced: push specific token IDs up or down. Usually leave empty.

Responses (/v1/responses, stream/non-stream)

Input (Markdown)

Preview

Responses input is text-focused here; stored history and long prompt attachments are composed into the outgoing request.

Model Instructions (optional) Sets the model's behavior style, like "be concise" or "explain simply". Max output tokens (optional) Caps response length. Lower means shorter answers and quicker responses.

Stream response Multi-turn conversation Parallel tool calls

Only matters when tools are used; otherwise it has no effect.

Temperature

0.50

deterministiccreative

Lower is steadier and safer; higher is more varied and creative. Top-p

1.00

focusedbroad

Controls how many candidate words are considered each step.

Seed

Clear to omit. Set an integer for reproducible sampling paths.

Metadata

Attach your own tags/notes for tracking requests.

Generation config

Advanced override for Transformers generation settings. Empty is safest.

Completions (/v1/completions, stream/non-stream)

Model Prompt Max tokens Stream response

Embeddings (/v1/embeddings, batch)

Model Input texts (one per line)

Audio transcription (/v1/audio/transcriptions)

Model Audio file Local file path for cURL (optional)

YOLO APIs

Image inference testing for the five YOLO tasks in one shared workspace.

YOLO Inference (/v1/yolo/*)

Image inference

Upload an image and run a single YOLO request.

Boxes

Model Name Image Size

Upload image

Confidence threshold 0.25 IoU threshold 0.70

Show Labels Show Confidence Include annotated output

Advanced options

Max detections Box format Classification top-k

Preview

The current input preview and the latest response appear here.

Task: detect Mode: image Model: -

Choose an image to begin.

2) Model Management APIs

Model pools (drag model cards between pools)

Checking owner permission...

Public Pool

Private Pool

Fetch model (batch)

Fetch result

Request ID

FlexServ(Transformers Backend)

Overview

Metrics Snapshot

Quick Actions

Loaded Models

Backend Capabilities

Gateway Tools

Models

Backends

Tests

Metrics

FlexServ(Transformers Backend)

1) Gateway & Platform APIs

Gateway readiness

OpenAPI summary

Resource reporter (auto refresh every 30s)

3) Inference APIs

Chat Completions (/v1/chat/completions, stream/non-stream)

Responses (/v1/responses, stream/non-stream)

Completions (/v1/completions, stream/non-stream)

Embeddings (/v1/embeddings, batch)

Audio transcription (/v1/audio/transcriptions)

Pipeline APIs (/v1/pipelines)

YOLO APIs

YOLO Inference (/v1/yolo/*)

Image inference

Preview

2) Model Management APIs

Model pools (drag model cards between pools)

Public Pool

Private Pool

Fetch model (batch)

Fetch result

Unpack archives