feat(transcription): explicit STT settings with provider, model, prompt (#5926)

4 weeks ago · 238f27dea1
parent ef55013418
commit 238f27dea1
20 changed files with 2962 additions and 389 deletions
--- a/docs/superpowers/plans/2026-05-02-transcription-settings.md
+++ b/docs/superpowers/plans/2026-05-02-transcription-settings.md
--- a/docs/superpowers/specs/2026-05-02-transcription-settings-design.md
+++ b/docs/superpowers/specs/2026-05-02-transcription-settings-design.md
@ -0,0 +1,155 @@
+# Transcription (STT) settings — design
+
+**Date:** 2026-05-02
+**Scope:** Backend + frontend. Schema-additive (no migration required).
+
+## Problem
+
+Memos has one AI feature today: audio transcription (speech-to-text). The current design has three concrete problems:
+
+1. **Model is hard-coded per provider type.** `internal/ai/models.go` pins OpenAI to `gpt-4o-transcribe` and Gemini to `gemini-2.5-flash`. Users who want `whisper-1` (cheaper, often more accurate for non-English) or third-party Whisper-compatible endpoints (Groq's `whisper-large-v3-turbo`, self-hosted whisper.cpp / Speaches via OpenAI-compatible URL) cannot configure them at all.
+2. **No explicit transcription configuration.** `InstanceAISetting.providers` is a generic credentials list. The frontend (`MemoEditor/index.tsx:65`) implicitly picks "the first provider with an API key whose type is in TRANSCRIPTION_PROVIDER_TYPES." Users cannot:
+   - Choose which provider runs transcription when they have multiple.
+   - Set a default language (Whisper API supports it but it is never sent).
+   - Set a `prompt` hint to bias spelling of proper nouns / jargon (a documented Whisper feature, surfaced by every other STT product).
+3. **Gemini fails for browser-recorded audio.** `internal/ai/gemini.go:23` does not list `audio/webm` in `geminiSupportedContentTypes`, but `MediaRecorder` in browsers defaults to `audio/webm`. So selecting a Gemini provider for in-editor recording produces a content-type error every time.
+
+## Goal
+
+Let the operator configure transcription explicitly: which provider, which model, default language, and a spelling-hint prompt. Make the OpenAI provider work as a universal "OpenAI-compatible" engine so Groq / self-hosted Whisper / Speaches are reachable through endpoint override.
+
+## Non-goals
+
+- Adding STT engines beyond OpenAI and Gemini (Azure, Deepgram, AWS Transcribe). These are out of scope; the schema admits them later via the `AIProviderType` enum.
+- Other AI features (summarization, embeddings, tag suggestion). The schema is shaped so they fit later, but none are designed here.
+- Per-call provider override at recording time. Every product surveyed (OpenWebUI, LibreChat, Whisper Memos, Superwhisper, etc.) treats the STT engine as a global preference, not an action-time choice. We follow the same pattern.
+- Server-side audio transcoding (e.g., webm → wav for Gemini). See "Gemini webm" below for the chosen mitigation.
+- Multi-user or per-user override of admin defaults. Memos' STT setting is instance-scoped, like every other instance setting.
+
+## Naming
+
+Field and message names follow cross-platform STT conventions, not Memos-internal shorthand:
+
+| Concept | Chosen name | Rationale |
+|---|---|---|
+| Config message | `TranscriptionConfig` | AssemblyAI uses this exact identifier; matches OpenAI's `CreateTranscription*` verb family and Memos' existing `Transcribe` RPC. The `STT` acronym is not used as a type name in any major STT API. |
+| Provider reference | `provider_id` (string) | Plain protobuf convention for a string-ID reference (`field_id`, `user_id` style). `engine` was rejected as an OpenWebUI-only term; typed message refs are not needed since providers are addressed by string ID. |
+| Model | `model` | Unanimous across OpenAI, Google v2, Deepgram, OpenWebUI, LibreChat. Not `model_id`. |
+| Default language | `language` | Bare `language` is the modern convention (OpenAI, Whisper family, Deepgram, Wyoming). `language_code` is the older Google/AWS form; we accept ISO 639-1 short codes the same way OpenAI does. |
+| Spelling hint | `prompt` | OpenAI's public API field name and AssemblyAI's. Whisper's internal name is `initial_prompt`, but `prompt` is what users of `audio.transcriptions.create` recognize. |
+
+A note on the message name collision: `proto/api/v1/ai_service.proto` already declares a `TranscriptionConfig` for **per-call** prompt/language overrides. The new store-level `TranscriptionConfig` lives in package `memos.store`, so the two compile cleanly. Memos already uses parallel `api.v1.X` / `store.X` message pairs (e.g. `User`, `Memo`); this matches that pattern.
+
+## Architecture
+
+### Schema (additive)
+
+`proto/store/instance_setting.proto`:
+
+```proto
+message InstanceAISetting {
+  repeated AIProviderConfig providers = 1;   // unchanged — credential pool
+  TranscriptionConfig transcription = 2;     // NEW — feature config
+}
+
+message TranscriptionConfig {
+  // References an entry in providers[].id. Empty string = transcription disabled.
+  string provider_id = 1;
+  // Free text. Empty string = engine default (whisper-1 for OPENAI, gemini-2.5-flash for GEMINI).
+  string model = 2;
+  // ISO 639-1 short code. Empty string = auto-detect.
+  string language = 3;
+  // Up to ~200 tokens. Used as the OpenAI Whisper `prompt` parameter and as
+  // a "Context and spelling hints:" block in the Gemini prompt.
+  string prompt = 4;
+}
+```
+
+`proto/api/v1/ai_service.proto`:
+
+- `TranscribeRequest.provider_id` becomes optional. When omitted, the server resolves the provider from `InstanceAISetting.transcription.provider_id`.
+- `TranscribeRequest.config` (per-call `TranscriptionConfig` with `prompt` / `language`) is kept for advanced overrides but its fields, when empty, fall back to the persisted defaults from `InstanceAISetting.transcription`.
+
+### Backend changes
+
+1. **`internal/ai/models.go`** — `DefaultTranscriptionModel` already exists; reuse it as the fallback when `TranscriptionConfig.model` is empty. No new code, just used from a new call site.
+2. **`server/router/api/v1/ai_service.go`**:
+   - Read `InstanceAISetting.transcription` at the start of `Transcribe`.
+   - Resolve `provider_id` from request → fall back to `transcription.provider_id`. If both empty, return `FailedPrecondition` with a clear "transcription not configured" message.
+   - Resolve `model` similarly: request override → `transcription.model` → engine default via `DefaultTranscriptionModel`.
+   - Merge `language` and `prompt`: per-call overrides win; otherwise fall through to persisted defaults.
+3. **`internal/ai/gemini.go`**: fixing the webm content-type list is out of scope here. See "Gemini webm mitigation" below.
+
+### Frontend changes
+
+`web/src/components/Settings/AISection.tsx` is restructured into two settings groups inside the existing `SettingSection`:
+
+1. **AI Integrations** (renamed from "Providers" — current behavior): list of credential entries (id, title, type, endpoint, api key). No functional changes; the rename communicates that this section is just credentials.
+2. **Transcription** (new): four-field form
+   - **Provider** — Select dropdown listing entries from group 1 by `title`. First option is "None — transcription disabled". Disabled with a hint "Add an AI integration first ↑" when group 1 is empty.
+   - **Model** — text input. Placeholder updates dynamically based on the selected provider's type (`whisper-1` for OPENAI, `gemini-2.5-flash` for GEMINI). Help text below: "Free text. Use the provider's model identifier — e.g., whisper-1, gpt-4o-transcribe, whisper-large-v3-turbo."
+   - **Default language** — text input, ISO 639-1 placeholder, empty = auto.
+   - **Prompt hints** — textarea, ~200 token soft limit, help text "Improves spelling of proper nouns and jargon. Whisper limit is ~224 tokens."
+
+`web/src/components/MemoEditor/index.tsx:65` changes:
+
+- Replace the "first provider with apiKey in TRANSCRIPTION_PROVIDER_TYPES" lookup with this enable rule: transcribe button shows iff `aiSetting.transcription.providerId` is non-empty AND the referenced provider exists in `aiSetting.providers` AND that provider has `apiKeySet === true`.
+- The editor no longer needs to know the provider object itself for the call — see service change below.
+
+`web/src/components/MemoEditor/services/transcriptionService.ts` is simplified: it stops accepting a `provider` argument and simply omits `provider_id` from the request. The server resolves the provider, model, language, and prompt from `InstanceAISetting.transcription`. (No override path is exposed at the editor layer; advanced callers can still pass `provider_id` directly via the proto if needed in the future.)
+
+### How "OpenAI-compatible" backends work
+
+To use Groq, Speaches, or self-hosted whisper.cpp:
+
+1. In **AI Integrations**, add a provider with type `OPENAI`, set `endpoint` to e.g. `https://api.groq.com/openai/v1` or `http://speaches:8000/v1`, set the API key, give it a recognizable title ("Groq", "Self-hosted Whisper").
+2. In **Transcription**, select that provider and set `model` to the backend's model identifier (`whisper-large-v3-turbo`, `Systran/faster-distil-whisper-large-v3`, etc.).
+
+This is the universal escape hatch confirmed across OpenWebUI, LibreChat, and the Whisper Obsidian plugin: rather than enumerating every backend, let the OpenAI engine be a transport, not a brand.
+
+## Gemini webm mitigation
+
+The Gemini `audio/webm` failure is a real user-blocking bug but separate from the settings redesign. Three options were considered:
+
+- **(a) Server-side transcode** with ffmpeg. Adds a heavy runtime dep; rejected as YAGNI.
+- **(b) Switch MediaRecorder format** when STT engine is Gemini. Browser support for `audio/mp4` and `audio/wav` in `MediaRecorder` is patchy across Firefox / Safari / Chrome; rejected as fragile.
+- **(c) Inline hint + accept the limitation.** Selected. The Transcription section shows a small warning under the model field when the chosen provider type is `GEMINI`: "Gemini does not accept browser-recorded `audio/webm`. For in-editor recording, use an OpenAI-compatible provider."
+
+Server-side transcoding can be revisited later as a self-contained change if Gemini demand grows.
+
+## Validation
+
+Server validation (`server/router/api/v1/ai_service.go`):
+
+- `transcription.provider_id`, when set, must reference an existing entry in `providers[]`. On `UpdateInstanceSetting` for the AI key, reject with `InvalidArgument` if it doesn't.
+- `transcription.model` length cap: 256 chars (covers `Systran/faster-distil-whisper-large-v3`-style names with margin).
+- `transcription.language` length cap: 32 chars (existing constant `maxTranscriptionLanguageLength`).
+- `transcription.prompt` length cap: 4096 chars (existing constant `maxTranscriptionPromptLength`).
+
+Frontend validation in `AISection.tsx`:
+
+- "Save" disabled if `transcription.providerId` is set but the referenced provider was just deleted from the integrations list (in the same unsaved edit).
+- Inline warning shown (but Save still allowed) if the referenced provider exists but has `apiKeySet === false` — surfacing the broken state so the operator can fix it without blocking unrelated edits to other settings.
+
+## Backwards compatibility
+
+The schema change is purely additive. Existing instances with `providers` configured but no `transcription` field default to `provider_id = ""`, which means transcription is disabled until the operator visits the new Transcription section and selects a provider.
+
+This is a small UX regression for instances that were relying on the implicit "first provider wins" behavior: operators must now make a one-click selection. The trade-off is acceptable because:
+
+- It makes the choice explicit (the implicit pick was the source of confusion when users had multiple providers).
+- A one-time migration that auto-fills `transcription.provider_id` with the first STT-capable provider is feasible but adds complexity to save the operator a single click. Skip the migration; document the change in the release notes.
+
+## Testing
+
+- `internal/ai/transcription_test.go` (existing) covers the transcribe RPC. Add cases for: empty `provider_id` falls back to setting; empty `model` falls back to `DefaultTranscriptionModel`; per-call overrides win over settings.
+- `server/router/api/v1/test/ai_service_test.go` (existing) covers the API service. Add cases for the validation rules above (unknown provider_id, oversized model/language/prompt).
+- Frontend: manual verification via the dev server (`pnpm dev` in `web/`) — load Settings, add a provider, configure transcription, verify the home editor's record button enables/disables based on `provider_id`. No new component tests required (existing AISection has none).
+
+## Out of scope, explicitly
+
+- Multiple transcription configurations / per-tag or per-user routing.
+- Per-call provider override exposed in the editor UI.
+- Test-transcription button in settings (worth doing later; deferred to keep this scope tight).
+- Glossary / vocabulary list as a separate field — folded into `prompt` for now (Joplin/Superwhisper split this; we can add later if users ask).
+- TTS settings. Memos has none today and none planned.
--- a/internal/ai/models.go
+++ b/internal/ai/models.go
@ -4,7 +4,7 @@ import "github.com/pkg/errors"

 const (
 	// DefaultOpenAITranscriptionModel is the built-in OpenAI transcription model.
-	DefaultOpenAITranscriptionModel = "gpt-4o-transcribe"
+	DefaultOpenAITranscriptionModel = "whisper-1"
 	// DefaultGeminiTranscriptionModel is the built-in Gemini transcription model.
 	DefaultGeminiTranscriptionModel = "gemini-2.5-flash"
 )
--- a/proto/api/v1/ai_service.proto
+++ b/proto/api/v1/ai_service.proto
@ -15,27 +15,13 @@ service AIService {
      post: "/api/v1/ai:transcribe"
      body: "*"
    };
-    option (google.api.method_signature) = "provider_id,config,audio";
+    option (google.api.method_signature) = "audio";
  }
 }

 message TranscribeRequest {
-  // Required. The instance AI provider ID to use.
-  string provider_id = 1 [(google.api.field_behavior) = REQUIRED];
-
-  // Required. Transcription options.
-  TranscriptionConfig config = 2 [(google.api.field_behavior) = REQUIRED];
-
  // Required. Audio input.
-  TranscriptionAudio audio = 3 [(google.api.field_behavior) = REQUIRED];
-}
-
-message TranscriptionConfig {
-  // Optional. A prompt to improve transcription quality.
-  string prompt = 1 [(google.api.field_behavior) = OPTIONAL];
-
-  // Optional. The language of the input audio.
-  string language = 2 [(google.api.field_behavior) = OPTIONAL];
+  TranscriptionAudio audio = 1 [(google.api.field_behavior) = REQUIRED];
 }

 message TranscriptionAudio {
--- a/proto/api/v1/instance_service.proto
+++ b/proto/api/v1/instance_service.proto
@ -227,6 +227,10 @@ message InstanceSetting {
  message AISetting {
    // providers is the list of AI provider configurations available instance-wide.
    repeated AIProviderConfig providers = 1;
+
+    // transcription is the speech-to-text feature configuration.
+    // When unset or transcription.provider_id is empty, transcription is disabled.
+    TranscriptionConfig transcription = 2;
  }

  // AIProviderConfig represents one callable AI provider connection.
@ -249,6 +253,25 @@ message InstanceSetting {
    OPENAI = 1;
    GEMINI = 2;
  }
+
+  // TranscriptionConfig configures the speech-to-text feature.
+  message TranscriptionConfig {
+    // provider_id references an entry in AISetting.providers[].id.
+    // Empty string means transcription is disabled.
+    string provider_id = 1;
+
+    // model is the provider-specific model identifier.
+    // Empty string falls back to the engine default
+    // (whisper-1 for OPENAI providers, gemini-2.5-flash for GEMINI providers).
+    string model = 2;
+
+    // language is the default ISO 639-1 language hint sent to the provider.
+    // Empty string lets the provider auto-detect.
+    string language = 3;
+
+    // prompt is a default spelling/vocabulary hint passed to the provider.
+    string prompt = 4;
+  }
 }

 // Request message for GetInstanceSetting method.
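
Review note: the validation rules listed in the design doc for the new `TranscriptionConfig` message could look roughly like this. It is a sketch under assumptions, not the validator from this commit. `validateTranscription` and `maxTranscriptionModelLength` are hypothetical names, the struct is a stand-in for the generated type, and counting runes rather than bytes is an assumption about what the design doc means by "chars". The other two constant names and all three limits are taken from the design doc.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

const (
	maxTranscriptionModelLength    = 256  // hypothetical name; limit from the design doc
	maxTranscriptionLanguageLength = 32   // name and limit from the design doc
	maxTranscriptionPromptLength   = 4096 // name and limit from the design doc
)

// TranscriptionConfig is a stand-in for the generated message type.
type TranscriptionConfig struct {
	ProviderID, Model, Language, Prompt string
}

// validateTranscription returns an error that a real handler would
// presumably translate into an InvalidArgument status.
func validateTranscription(cfg TranscriptionConfig, providerIDs []string) error {
	// An empty provider_id means "transcription disabled" and is always valid.
	if cfg.ProviderID != "" {
		found := false
		for _, id := range providerIDs {
			if id == cfg.ProviderID {
				found = true
				break
			}
		}
		if !found {
			return fmt.Errorf("transcription.provider_id %q does not reference a configured provider", cfg.ProviderID)
		}
	}

	checks := []struct {
		name  string
		value string
		max   int
	}{
		{"model", cfg.Model, maxTranscriptionModelLength},
		{"language", cfg.Language, maxTranscriptionLanguageLength},
		{"prompt", cfg.Prompt, maxTranscriptionPromptLength},
	}
	for _, c := range checks {
		if n := utf8.RuneCountInString(c.value); n > c.max {
			return fmt.Errorf("transcription.%s is too long: %d > %d", c.name, n, c.max)
		}
	}
	return nil
}

func main() {
	ids := []string{"openai-main"}
	fmt.Println(validateTranscription(TranscriptionConfig{ProviderID: "openai-main", Model: "whisper-1"}, ids))
	fmt.Println(validateTranscription(TranscriptionConfig{ProviderID: "deleted-provider"}, ids))
}
```

Running the provider check before the length checks means a dangling `provider_id` is reported first, which matches the order in which the design doc lists the rules.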
--- a/proto/gen/api/v1/ai_service.pb.go
+++ b/proto/gen/api/v1/ai_service.pb.go
@ -24,12 +24,8 @@ const (

 type TranscribeRequest struct {
 	state protoimpl.MessageState `protogen:"open.v1"`
-	// Required. The instance AI provider ID to use.
-	ProviderId string `protobuf:"bytes,1,opt,name=provider_id,json=providerId,proto3" json:"provider_id,omitempty"`
-	// Required. Transcription options.
-	Config *TranscriptionConfig `protobuf:"bytes,2,opt,name=config,proto3" json:"config,omitempty"`
 	// Required. Audio input.
-	Audio         *TranscriptionAudio `protobuf:"bytes,3,opt,name=audio,proto3" json:"audio,omitempty"`
+	Audio         *TranscriptionAudio `protobuf:"bytes,1,opt,name=audio,proto3" json:"audio,omitempty"`
 	unknownFields protoimpl.UnknownFields
 	sizeCache     protoimpl.SizeCache
 }
@ -64,20 +60,6 @@ func (*TranscribeRequest) Descriptor() ([]byte, []int) {
 	return file_api_v1_ai_service_proto_rawDescGZIP(), []int{0}
 }

-func (x *TranscribeRequest) GetProviderId() string {
-	if x != nil {
-		return x.ProviderId
-	}
-	return ""
-}
-
-func (x *TranscribeRequest) GetConfig() *TranscriptionConfig {
-	if x != nil {
-		return x.Config
-	}
-	return nil
-}
-
 func (x *TranscribeRequest) GetAudio() *TranscriptionAudio {
 	if x != nil {
 		return x.Audio
@ -85,60 +67,6 @@ func (x *TranscribeRequest) GetAudio() *TranscriptionAudio {
 	return nil
 }

-type TranscriptionConfig struct {
-	state protoimpl.MessageState `protogen:"open.v1"`
-	// Optional. A prompt to improve transcription quality.
-	Prompt string `protobuf:"bytes,1,opt,name=prompt,proto3" json:"prompt,omitempty"`
-	// Optional. The language of the input audio.
-	Language      string `protobuf:"bytes,2,opt,name=language,proto3" json:"language,omitempty"`
-	unknownFields protoimpl.UnknownFields
-	sizeCache     protoimpl.SizeCache
-}
-
-func (x *TranscriptionConfig) Reset() {
-	*x = TranscriptionConfig{}
-	mi := &file_api_v1_ai_service_proto_msgTypes[1]
-	ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
-	ms.StoreMessageInfo(mi)
-}
-
-func (x *TranscriptionConfig) String() string {
-	return protoimpl.X.MessageStringOf(x)
-}
-
-func (*TranscriptionConfig) ProtoMessage() {}
-
-func (x *TranscriptionConfig) ProtoReflect() protoreflect.Message {
-	mi := &file_api_v1_ai_service_proto_msgTypes[1]
-	if x != nil {
-		ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
-		if ms.LoadMessageInfo() == nil {
-			ms.StoreMessageInfo(mi)
-		}
-		return ms
-	}
-	return mi.MessageOf(x)
-}
-
-// Deprecated: Use TranscriptionConfig.ProtoReflect.Descriptor instead.
-func (*TranscriptionConfig) Descriptor() ([]byte, []int) {
-	return file_api_v1_ai_service_proto_rawDescGZIP(), []int{1}
-}
-
-func (x *TranscriptionConfig) GetPrompt() string {
-	if x != nil {
-		return x.Prompt
-	}
-	return ""
-}
-
-func (x *TranscriptionConfig) GetLanguage() string {
-	if x != nil {
-		return x.Language
-	}
-	return ""
-}
-
 type TranscriptionAudio struct {
 	state protoimpl.MessageState `protogen:"open.v1"`
 	// Types that are valid to be assigned to Source:
@ -156,7 +84,7 @@ type TranscriptionAudio struct {

 func (x *TranscriptionAudio) Reset() {
 	*x = TranscriptionAudio{}
-	mi := &file_api_v1_ai_service_proto_msgTypes[2]
+	mi := &file_api_v1_ai_service_proto_msgTypes[1]
 	ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
 	ms.StoreMessageInfo(mi)
 }
@ -168,7 +96,7 @@ func (x *TranscriptionAudio) String() string {
 func (*TranscriptionAudio) ProtoMessage() {}

 func (x *TranscriptionAudio) ProtoReflect() protoreflect.Message {
-	mi := &file_api_v1_ai_service_proto_msgTypes[2]
+	mi := &file_api_v1_ai_service_proto_msgTypes[1]
 	if x != nil {
 		ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
 		if ms.LoadMessageInfo() == nil {
@ -181,7 +109,7 @@ func (x *TranscriptionAudio) ProtoReflect() protoreflect.Message {

 // Deprecated: Use TranscriptionAudio.ProtoReflect.Descriptor instead.
 func (*TranscriptionAudio) Descriptor() ([]byte, []int) {
-	return file_api_v1_ai_service_proto_rawDescGZIP(), []int{2}
+	return file_api_v1_ai_service_proto_rawDescGZIP(), []int{1}
 }

 func (x *TranscriptionAudio) GetSource() isTranscriptionAudio_Source {
@ -251,7 +179,7 @@ type TranscribeResponse struct {

 func (x *TranscribeResponse) Reset() {
 	*x = TranscribeResponse{}
-	mi := &file_api_v1_ai_service_proto_msgTypes[3]
+	mi := &file_api_v1_ai_service_proto_msgTypes[2]
 	ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
 	ms.StoreMessageInfo(mi)
 }
@ -263,7 +191,7 @@ func (x *TranscribeResponse) String() string {
 func (*TranscribeResponse) ProtoMessage() {}

 func (x *TranscribeResponse) ProtoReflect() protoreflect.Message {
-	mi := &file_api_v1_ai_service_proto_msgTypes[3]
+	mi := &file_api_v1_ai_service_proto_msgTypes[2]
 	if x != nil {
 		ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
 		if ms.LoadMessageInfo() == nil {
@ -276,7 +204,7 @@ func (x *TranscribeResponse) ProtoReflect() protoreflect.Message {

 // Deprecated: Use TranscribeResponse.ProtoReflect.Descriptor instead.
 func (*TranscribeResponse) Descriptor() ([]byte, []int) {
-	return file_api_v1_ai_service_proto_rawDescGZIP(), []int{3}
+	return file_api_v1_ai_service_proto_rawDescGZIP(), []int{2}
 }

 func (x *TranscribeResponse) GetText() string {
@ -290,15 +218,9 @@ var File_api_v1_ai_service_proto protoreflect.FileDescriptor

 const file_api_v1_ai_service_proto_rawDesc = "" +
 	"\n" +
-	"\x17api/v1/ai_service.proto\x12\fmemos.api.v1\x1a\x1cgoogle/api/annotations.proto\x1a\x17google/api/client.proto\x1a\x1fgoogle/api/field_behavior.proto\"\xb6\x01\n" +
-	"\x11TranscribeRequest\x12$\n" +
-	"\vprovider_id\x18\x01 \x01(\tB\x03\xe0A\x02R\n" +
-	"providerId\x12>\n" +
-	"\x06config\x18\x02 \x01(\v2!.memos.api.v1.TranscriptionConfigB\x03\xe0A\x02R\x06config\x12;\n" +
-	"\x05audio\x18\x03 \x01(\v2 .memos.api.v1.TranscriptionAudioB\x03\xe0A\x02R\x05audio\"S\n" +
-	"\x13TranscriptionConfig\x12\x1b\n" +
-	"\x06prompt\x18\x01 \x01(\tB\x03\xe0A\x01R\x06prompt\x12\x1f\n" +
-	"\blanguage\x18\x02 \x01(\tB\x03\xe0A\x01R\blanguage\"\x9c\x01\n" +
+	"\x17api/v1/ai_service.proto\x12\fmemos.api.v1\x1a\x1cgoogle/api/annotations.proto\x1a\x17google/api/client.proto\x1a\x1fgoogle/api/field_behavior.proto\"P\n" +
+	"\x11TranscribeRequest\x12;\n" +
+	"\x05audio\x18\x01 \x01(\v2 .memos.api.v1.TranscriptionAudioB\x03\xe0A\x02R\x05audio\"\x9c\x01\n" +
 	"\x12TranscriptionAudio\x12\x1f\n" +
 	"\acontent\x18\x01 \x01(\fB\x03\xe0A\x04H\x00R\acontent\x12\x12\n" +
 	"\x03uri\x18\x02 \x01(\tH\x00R\x03uri\x12\x1f\n" +
@ -306,10 +228,10 @@ const file_api_v1_ai_service_proto_rawDesc = "" +
 	"\fcontent_type\x18\x04 \x01(\tB\x03\xe0A\x01R\vcontentTypeB\b\n" +
 	"\x06source\"(\n" +
 	"\x12TranscribeResponse\x12\x12\n" +
-	"\x04text\x18\x01 \x01(\tR\x04text2\x9a\x01\n" +
-	"\tAIService\x12\x8c\x01\n" +
+	"\x04text\x18\x01 \x01(\tR\x04text2\x86\x01\n" +
+	"\tAIService\x12y\n" +
 	"\n" +
-	"Transcribe\x12\x1f.memos.api.v1.TranscribeRequest\x1a .memos.api.v1.TranscribeResponse\";\xdaA\x18provider_id,config,audio\x82\xd3\xe4\x93\x02\x1a:\x01*\"\x15/api/v1/ai:transcribeB\xa6\x01\n" +
+	"Transcribe\x12\x1f.memos.api.v1.TranscribeRequest\x1a .memos.api.v1.TranscribeResponse\"(\xdaA\x05audio\x82\xd3\xe4\x93\x02\x1a:\x01*\"\x15/api/v1/ai:transcribeB\xa6\x01\n" +
 	"\x10com.memos.api.v1B\x0eAiServiceProtoP\x01Z0github.com/usememos/memos/proto/gen/api/v1;apiv1\xa2\x02\x03MAX\xaa\x02\fMemos.Api.V1\xca\x02\fMemos\\Api\\V1\xe2\x02\x18Memos\\Api\\V1\\GPBMetadata\xea\x02\x0eMemos::Api::V1b\x06proto3"

 var (
@ -324,23 +246,21 @@ func file_api_v1_ai_service_proto_rawDescGZIP() []byte {
 	return file_api_v1_ai_service_proto_rawDescData
 }

-var file_api_v1_ai_service_proto_msgTypes = make([]protoimpl.MessageInfo, 4)
+var file_api_v1_ai_service_proto_msgTypes = make([]protoimpl.MessageInfo, 3)
 var file_api_v1_ai_service_proto_goTypes = []any{
-	(*TranscribeRequest)(nil),   // 0: memos.api.v1.TranscribeRequest
-	(*TranscriptionConfig)(nil), // 1: memos.api.v1.TranscriptionConfig
-	(*TranscriptionAudio)(nil),  // 2: memos.api.v1.TranscriptionAudio
-	(*TranscribeResponse)(nil),  // 3: memos.api.v1.TranscribeResponse
+	(*TranscribeRequest)(nil),  // 0: memos.api.v1.TranscribeRequest
+	(*TranscriptionAudio)(nil), // 1: memos.api.v1.TranscriptionAudio
+	(*TranscribeResponse)(nil), // 2: memos.api.v1.TranscribeResponse
 }
 var file_api_v1_ai_service_proto_depIdxs = []int32{
-	1, // 0: memos.api.v1.TranscribeRequest.config:type_name -> memos.api.v1.TranscriptionConfig
-	2, // 1: memos.api.v1.TranscribeRequest.audio:type_name -> memos.api.v1.TranscriptionAudio
-	0, // 2: memos.api.v1.AIService.Transcribe:input_type -> memos.api.v1.TranscribeRequest
-	3, // 3: memos.api.v1.AIService.Transcribe:output_type -> memos.api.v1.TranscribeResponse
-	3, // [3:4] is the sub-list for method output_type
-	2, // [2:3] is the sub-list for method input_type
-	2, // [2:2] is the sub-list for extension type_name
-	2, // [2:2] is the sub-list for extension extendee
-	0, // [0:2] is the sub-list for field type_name
+	1, // 0: memos.api.v1.TranscribeRequest.audio:type_name -> memos.api.v1.TranscriptionAudio
+	0, // 1: memos.api.v1.AIService.Transcribe:input_type -> memos.api.v1.TranscribeRequest
+	2, // 2: memos.api.v1.AIService.Transcribe:output_type -> memos.api.v1.TranscribeResponse
+	2, // [2:3] is the sub-list for method output_type
+	1, // [1:2] is the sub-list for method input_type
+	1, // [1:1] is the sub-list for extension type_name
+	1, // [1:1] is the sub-list for extension extendee
+	0, // [0:1] is the sub-list for field type_name
 }

 func init() { file_api_v1_ai_service_proto_init() }
@ -348,7 +268,7 @@ func file_api_v1_ai_service_proto_init() {
 	if File_api_v1_ai_service_proto != nil {
 		return
 	}
-	file_api_v1_ai_service_proto_msgTypes[2].OneofWrappers = []any{
+	file_api_v1_ai_service_proto_msgTypes[1].OneofWrappers = []any{
 		(*TranscriptionAudio_Content)(nil),
 		(*TranscriptionAudio_Uri)(nil),
 	}
@ -358,7 +278,7 @@ func file_api_v1_ai_service_proto_init() {
 			GoPackagePath: reflect.TypeOf(x{}).PkgPath(),
 			RawDescriptor: unsafe.Slice(unsafe.StringData(file_api_v1_ai_service_proto_rawDesc), len(file_api_v1_ai_service_proto_rawDesc)),
 			NumEnums:      0,
-			NumMessages:   4,
+			NumMessages:   3,
 			NumExtensions: 0,
 			NumServices:   1,
 		},
--- a/proto/gen/api/v1/instance_service.pb.go
+++ b/proto/gen/api/v1/instance_service.pb.go
@ -1137,7 +1137,10 @@ func (x *InstanceSetting_NotificationSetting) GetEmail() *InstanceSetting_Notifi
 type InstanceSetting_AISetting struct {
 	state protoimpl.MessageState `protogen:"open.v1"`
 	// providers is the list of AI provider configurations available instance-wide.
-	Providers     []*InstanceSetting_AIProviderConfig `protobuf:"bytes,1,rep,name=providers,proto3" json:"providers,omitempty"`
+	Providers []*InstanceSetting_AIProviderConfig `protobuf:"bytes,1,rep,name=providers,proto3" json:"providers,omitempty"`
+	// transcription is the speech-to-text feature configuration.
+	// When unset or transcription.provider_id is empty, transcription is disabled.
+	Transcription *InstanceSetting_TranscriptionConfig `protobuf:"bytes,2,opt,name=transcription,proto3" json:"transcription,omitempty"`
 	unknownFields protoimpl.UnknownFields
 	sizeCache     protoimpl.SizeCache
 }
@ -1179,6 +1182,13 @@ func (x *InstanceSetting_AISetting) GetProviders() []*InstanceSetting_AIProvider
 	return nil
 }

+func (x *InstanceSetting_AISetting) GetTranscription() *InstanceSetting_TranscriptionConfig {
+	if x != nil {
+		return x.Transcription
+	}
+	return nil
+}
+
 // AIProviderConfig represents one callable AI provider connection.
 type InstanceSetting_AIProviderConfig struct {
 	state    protoimpl.MessageState         `protogen:"open.v1"`
@ -1275,6 +1285,83 @@ func (x *InstanceSetting_AIProviderConfig) GetApiKeyHint() string {
 	return ""
 }

+// TranscriptionConfig configures the speech-to-text feature.
+type InstanceSetting_TranscriptionConfig struct {
+	state protoimpl.MessageState `protogen:"open.v1"`
+	// provider_id references an entry in AISetting.providers[].id.
+	// Empty string means transcription is disabled.
+	ProviderId string `protobuf:"bytes,1,opt,name=provider_id,json=providerId,proto3" json:"provider_id,omitempty"`
+	// model is the provider-specific model identifier.
+	// Empty string falls back to the engine default
+	// (whisper-1 for OPENAI providers, gemini-2.5-flash for GEMINI providers).
+	Model string `protobuf:"bytes,2,opt,name=model,proto3" json:"model,omitempty"`
+	// language is the default ISO 639-1 language hint sent to the provider.
+	// Empty string lets the provider auto-detect.
+	Language string `protobuf:"bytes,3,opt,name=language,proto3" json:"language,omitempty"`
+	// prompt is a default spelling/vocabulary hint passed to the provider.
+	Prompt        string `protobuf:"bytes,4,opt,name=prompt,proto3" json:"prompt,omitempty"`
+	unknownFields protoimpl.UnknownFields
+	sizeCache     protoimpl.SizeCache
+}
+
+func (x *InstanceSetting_TranscriptionConfig) Reset() {
+	*x = InstanceSetting_TranscriptionConfig{}
+	mi := &file_api_v1_instance_service_proto_msgTypes[16]
+	ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
+	ms.StoreMessageInfo(mi)
+}
+
+func (x *InstanceSetting_TranscriptionConfig) String() string {
+	return protoimpl.X.MessageStringOf(x)
+}
+
+func (*InstanceSetting_TranscriptionConfig) ProtoMessage() {}
+
+func (x *InstanceSetting_TranscriptionConfig) ProtoReflect() protoreflect.Message {
+	mi := &file_api_v1_instance_service_proto_msgTypes[16]
+	if x != nil {
+		ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
+		if ms.LoadMessageInfo() == nil {
+			ms.StoreMessageInfo(mi)
+		}
+		return ms
+	}
+	return mi.MessageOf(x)
+}
+
+// Deprecated: Use InstanceSetting_TranscriptionConfig.ProtoReflect.Descriptor instead.
+func (*InstanceSetting_TranscriptionConfig) Descriptor() ([]byte, []int) {
+	return file_api_v1_instance_service_proto_rawDescGZIP(), []int{2, 8}
+}
+
+func (x *InstanceSetting_TranscriptionConfig) GetProviderId() string {
+	if x != nil {
+		return x.ProviderId
+	}
+	return ""
+}
+
+func (x *InstanceSetting_TranscriptionConfig) GetModel() string {
+	if x != nil {
+		return x.Model
+	}
+	return ""
+}
+
+func (x *InstanceSetting_TranscriptionConfig) GetLanguage() string {
+	if x != nil {
+		return x.Language
+	}
+	return ""
+}
+
+func (x *InstanceSetting_TranscriptionConfig) GetPrompt() string {
+	if x != nil {
+		return x.Prompt
+	}
+	return ""
+}
+
 // Custom profile configuration for instance branding.
 type InstanceSetting_GeneralSetting_CustomProfile struct {
 	state         protoimpl.MessageState `protogen:"open.v1"`
@ -1287,7 +1374,7 @@ type InstanceSetting_GeneralSetting_CustomProfile struct {

 func (x *InstanceSetting_GeneralSetting_CustomProfile) Reset() {
 	*x = InstanceSetting_GeneralSetting_CustomProfile{}
-	mi := &file_api_v1_instance_service_proto_msgTypes[16]
+	mi := &file_api_v1_instance_service_proto_msgTypes[17]
 	ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
 	ms.StoreMessageInfo(mi)
 }
@ -1299,7 +1386,7 @@ func (x *InstanceSetting_GeneralSetting_CustomProfile) String() string {
 func (*InstanceSetting_GeneralSetting_CustomProfile) ProtoMessage() {}

 func (x *InstanceSetting_GeneralSetting_CustomProfile) ProtoReflect() protoreflect.Message {
-	mi := &file_api_v1_instance_service_proto_msgTypes[16]
+	mi := &file_api_v1_instance_service_proto_msgTypes[17]
 	if x != nil {
 		ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
 		if ms.LoadMessageInfo() == nil {
@ -1352,7 +1439,7 @@ type InstanceSetting_StorageSetting_S3Config struct {

 func (x *InstanceSetting_StorageSetting_S3Config) Reset() {
 	*x = InstanceSetting_StorageSetting_S3Config{}
-	mi := &file_api_v1_instance_service_proto_msgTypes[17]
+	mi := &file_api_v1_instance_service_proto_msgTypes[18]
 	ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
 	ms.StoreMessageInfo(mi)
 }
@ -1364,7 +1451,7 @@ func (x *InstanceSetting_StorageSetting_S3Config) String() string {
 func (*InstanceSetting_StorageSetting_S3Config) ProtoMessage() {}

 func (x *InstanceSetting_StorageSetting_S3Config) ProtoReflect() protoreflect.Message {
-	mi := &file_api_v1_instance_service_proto_msgTypes[17]
+	mi := &file_api_v1_instance_service_proto_msgTypes[18]
 	if x != nil {
 		ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
 		if ms.LoadMessageInfo() == nil {
@ -1441,7 +1528,7 @@ type InstanceSetting_NotificationSetting_EmailSetting struct {

 func (x *InstanceSetting_NotificationSetting_EmailSetting) Reset() {
 	*x = InstanceSetting_NotificationSetting_EmailSetting{}
-	mi := &file_api_v1_instance_service_proto_msgTypes[19]
+	mi := &file_api_v1_instance_service_proto_msgTypes[20]
 	ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
 	ms.StoreMessageInfo(mi)
 }
@ -1453,7 +1540,7 @@ func (x *InstanceSetting_NotificationSetting_EmailSetting) String() string {
 func (*InstanceSetting_NotificationSetting_EmailSetting) ProtoMessage() {}

 func (x *InstanceSetting_NotificationSetting_EmailSetting) ProtoReflect() protoreflect.Message {
-	mi := &file_api_v1_instance_service_proto_msgTypes[19]
+	mi := &file_api_v1_instance_service_proto_msgTypes[20]
 	if x != nil {
 		ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
 		if ms.LoadMessageInfo() == nil {
@ -1552,7 +1639,7 @@ type InstanceStats_DatabaseStats struct {

 func (x *InstanceStats_DatabaseStats) Reset() {
 	*x = InstanceStats_DatabaseStats{}
-	mi := &file_api_v1_instance_service_proto_msgTypes[20]
+	mi := &file_api_v1_instance_service_proto_msgTypes[21]
 	ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
 	ms.StoreMessageInfo(mi)
 }
@ -1564,7 +1651,7 @@ func (x *InstanceStats_DatabaseStats) String() string {
 func (*InstanceStats_DatabaseStats) ProtoMessage() {}

 func (x *InstanceStats_DatabaseStats) ProtoReflect() protoreflect.Message {
-	mi := &file_api_v1_instance_service_proto_msgTypes[20]
+	mi := &file_api_v1_instance_service_proto_msgTypes[21]
 	if x != nil {
 		ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
 		if ms.LoadMessageInfo() == nil {
@ -1605,7 +1692,7 @@ const file_api_v1_instance_service_proto_rawDesc = "" +
 	"\finstance_url\x18\x06 \x01(\tR\vinstanceUrl\x12(\n" +
 	"\x05admin\x18\a \x01(\v2\x12.memos.api.v1.UserR\x05admin\x12\x16\n" +
 	"\x06commit\x18\b \x01(\tR\x06commit\"\x1b\n" +
-	"\x19GetInstanceProfileRequest\"\xf0\x19\n" +
+	"\x19GetInstanceProfileRequest\"\xcd\x1b\n" +
 	"\x0fInstanceSetting\x12\x17\n" +
 	"\x04name\x18\x01 \x01(\tB\x03\xe0A\bR\x04name\x12W\n" +
 	"\x0fgeneral_setting\x18\x02 \x01(\v2,.memos.api.v1.InstanceSetting.GeneralSettingH\x00R\x0egeneralSetting\x12W\n" +
@ -1671,9 +1758,10 @@ const file_api_v1_instance_service_proto_rawDesc = "" +
 	"\breply_to\x18\b \x01(\tR\areplyTo\x12\x17\n" +
 	"\ause_tls\x18\t \x01(\bR\x06useTls\x12\x17\n" +
 	"\ause_ssl\x18\n" +
-	" \x01(\bR\x06useSsl\x1aY\n" +
+	" \x01(\bR\x06useSsl\x1a\xb2\x01\n" +
 	"\tAISetting\x12L\n" +
-	"\tproviders\x18\x01 \x03(\v2..memos.api.v1.InstanceSetting.AIProviderConfigR\tproviders\x1a\x80\x02\n" +
+	"\tproviders\x18\x01 \x03(\v2..memos.api.v1.InstanceSetting.AIProviderConfigR\tproviders\x12W\n" +
+	"\rtranscription\x18\x02 \x01(\v21.memos.api.v1.InstanceSetting.TranscriptionConfigR\rtranscription\x1a\x80\x02\n" +
 	"\x10AIProviderConfig\x12\x0e\n" +
 	"\x02id\x18\x01 \x01(\tR\x02id\x12\x14\n" +
 	"\x05title\x18\x02 \x01(\tR\x05title\x12@\n" +
@ -1682,7 +1770,13 @@ const file_api_v1_instance_service_proto_rawDesc = "" +
 	"\aapi_key\x18\x05 \x01(\tB\x03\xe0A\x04R\x06apiKey\x12#\n" +
 	"\vapi_key_set\x18\b \x01(\bB\x03\xe0A\x03R\tapiKeySet\x12%\n" +
 	"\fapi_key_hint\x18\t \x01(\tB\x03\xe0A\x03R\n" +
-	"apiKeyHint\"j\n" +
+	"apiKeyHint\x1a\x80\x01\n" +
+	"\x13TranscriptionConfig\x12\x1f\n" +
+	"\vprovider_id\x18\x01 \x01(\tR\n" +
+	"providerId\x12\x14\n" +
+	"\x05model\x18\x02 \x01(\tR\x05model\x12\x1a\n" +
+	"\blanguage\x18\x03 \x01(\tR\blanguage\x12\x16\n" +
+	"\x06prompt\x18\x04 \x01(\tR\x06prompt\"j\n" +
 	"\x03Key\x12\x13\n" +
 	"\x0fKEY_UNSPECIFIED\x10\x00\x12\v\n" +
 	"\aGENERAL\x10\x01\x12\v\n" +
@ -1739,7 +1833,7 @@ func file_api_v1_instance_service_proto_rawDescGZIP() []byte {
 }

 var file_api_v1_instance_service_proto_enumTypes = make([]protoimpl.EnumInfo, 3)
-var file_api_v1_instance_service_proto_msgTypes = make([]protoimpl.MessageInfo, 21)
+var file_api_v1_instance_service_proto_msgTypes = make([]protoimpl.MessageInfo, 22)
 var file_api_v1_instance_service_proto_goTypes = []any{
 	(InstanceSetting_Key)(0),                             // 0: memos.api.v1.InstanceSetting.Key
 	(InstanceSetting_AIProviderType)(0),                  // 1: memos.api.v1.InstanceSetting.AIProviderType
@ -1760,19 +1854,20 @@ var file_api_v1_instance_service_proto_goTypes = []any{
 	(*InstanceSetting_NotificationSetting)(nil),          // 16: memos.api.v1.InstanceSetting.NotificationSetting
 	(*InstanceSetting_AISetting)(nil),                    // 17: memos.api.v1.InstanceSetting.AISetting
 	(*InstanceSetting_AIProviderConfig)(nil),             // 18: memos.api.v1.InstanceSetting.AIProviderConfig
-	(*InstanceSetting_GeneralSetting_CustomProfile)(nil), // 19: memos.api.v1.InstanceSetting.GeneralSetting.CustomProfile
-	(*InstanceSetting_StorageSetting_S3Config)(nil),      // 20: memos.api.v1.InstanceSetting.StorageSetting.S3Config
-	nil, // 21: memos.api.v1.InstanceSetting.TagsSetting.TagsEntry
-	(*InstanceSetting_NotificationSetting_EmailSetting)(nil), // 22: memos.api.v1.InstanceSetting.NotificationSetting.EmailSetting
-	(*InstanceStats_DatabaseStats)(nil),                      // 23: memos.api.v1.InstanceStats.DatabaseStats
-	(*User)(nil),                                             // 24: memos.api.v1.User
-	(*fieldmaskpb.FieldMask)(nil),                            // 25: google.protobuf.FieldMask
-	(*timestamppb.Timestamp)(nil),                            // 26: google.protobuf.Timestamp
-	(*color.Color)(nil),                                      // 27: google.type.Color
-	(*emptypb.Empty)(nil),                                    // 28: google.protobuf.Empty
+	(*InstanceSetting_TranscriptionConfig)(nil),          // 19: memos.api.v1.InstanceSetting.TranscriptionConfig
+	(*InstanceSetting_GeneralSetting_CustomProfile)(nil), // 20: memos.api.v1.InstanceSetting.GeneralSetting.CustomProfile
+	(*InstanceSetting_StorageSetting_S3Config)(nil),      // 21: memos.api.v1.InstanceSetting.StorageSetting.S3Config
+	nil, // 22: memos.api.v1.InstanceSetting.TagsSetting.TagsEntry
+	(*InstanceSetting_NotificationSetting_EmailSetting)(nil), // 23: memos.api.v1.InstanceSetting.NotificationSetting.EmailSetting
+	(*InstanceStats_DatabaseStats)(nil),                      // 24: memos.api.v1.InstanceStats.DatabaseStats
+	(*User)(nil),                                             // 25: memos.api.v1.User
+	(*fieldmaskpb.FieldMask)(nil),                            // 26: google.protobuf.FieldMask
+	(*timestamppb.Timestamp)(nil),                            // 27: google.protobuf.Timestamp
+	(*color.Color)(nil),                                      // 28: google.type.Color
+	(*emptypb.Empty)(nil),                                    // 29: google.protobuf.Empty
 }
 var file_api_v1_instance_service_proto_depIdxs = []int32{
-	24, // 0: memos.api.v1.InstanceProfile.admin:type_name -> memos.api.v1.User
+	25, // 0: memos.api.v1.InstanceProfile.admin:type_name -> memos.api.v1.User
 	11, // 1: memos.api.v1.InstanceSetting.general_setting:type_name -> memos.api.v1.InstanceSetting.GeneralSetting
 	12, // 2: memos.api.v1.InstanceSetting.storage_setting:type_name -> memos.api.v1.InstanceSetting.StorageSetting
 	13, // 3: memos.api.v1.InstanceSetting.memo_related_setting:type_name -> memos.api.v1.InstanceSetting.MemoRelatedSetting
@ -1780,34 +1875,35 @@ var file_api_v1_instance_service_proto_depIdxs = []int32{
 	16, // 5: memos.api.v1.InstanceSetting.notification_setting:type_name -> memos.api.v1.InstanceSetting.NotificationSetting
 	17, // 6: memos.api.v1.InstanceSetting.ai_setting:type_name -> memos.api.v1.InstanceSetting.AISetting
 	5,  // 7: memos.api.v1.UpdateInstanceSettingRequest.setting:type_name -> memos.api.v1.InstanceSetting
-	25, // 8: memos.api.v1.UpdateInstanceSettingRequest.update_mask:type_name -> google.protobuf.FieldMask
-	22, // 9: memos.api.v1.TestInstanceEmailSettingRequest.email:type_name -> memos.api.v1.InstanceSetting.NotificationSetting.EmailSetting
-	23, // 10: memos.api.v1.InstanceStats.database:type_name -> memos.api.v1.InstanceStats.DatabaseStats
-	26, // 11: memos.api.v1.InstanceStats.generated_time:type_name -> google.protobuf.Timestamp
-	19, // 12: memos.api.v1.InstanceSetting.GeneralSetting.custom_profile:type_name -> memos.api.v1.InstanceSetting.GeneralSetting.CustomProfile
+	26, // 8: memos.api.v1.UpdateInstanceSettingRequest.update_mask:type_name -> google.protobuf.FieldMask
+	23, // 9: memos.api.v1.TestInstanceEmailSettingRequest.email:type_name -> memos.api.v1.InstanceSetting.NotificationSetting.EmailSetting
+	24, // 10: memos.api.v1.InstanceStats.database:type_name -> memos.api.v1.InstanceStats.DatabaseStats
+	27, // 11: memos.api.v1.InstanceStats.generated_time:type_name -> google.protobuf.Timestamp
+	20, // 12: memos.api.v1.InstanceSetting.GeneralSetting.custom_profile:type_name -> memos.api.v1.InstanceSetting.GeneralSetting.CustomProfile
 	2,  // 13: memos.api.v1.InstanceSetting.StorageSetting.storage_type:type_name -> memos.api.v1.InstanceSetting.StorageSetting.StorageType
-	20, // 14: memos.api.v1.InstanceSetting.StorageSetting.s3_config:type_name -> memos.api.v1.InstanceSetting.StorageSetting.S3Config
-	27, // 15: memos.api.v1.InstanceSetting.TagMetadata.background_color:type_name -> google.type.Color
-	21, // 16: memos.api.v1.InstanceSetting.TagsSetting.tags:type_name -> memos.api.v1.InstanceSetting.TagsSetting.TagsEntry
-	22, // 17: memos.api.v1.InstanceSetting.NotificationSetting.email:type_name -> memos.api.v1.InstanceSetting.NotificationSetting.EmailSetting
+	21, // 14: memos.api.v1.InstanceSetting.StorageSetting.s3_config:type_name -> memos.api.v1.InstanceSetting.StorageSetting.S3Config
+	28, // 15: memos.api.v1.InstanceSetting.TagMetadata.background_color:type_name -> google.type.Color
+	22, // 16: memos.api.v1.InstanceSetting.TagsSetting.tags:type_name -> memos.api.v1.InstanceSetting.TagsSetting.TagsEntry
+	23, // 17: memos.api.v1.InstanceSetting.NotificationSetting.email:type_name -> memos.api.v1.InstanceSetting.NotificationSetting.EmailSetting
 	18, // 18: memos.api.v1.InstanceSetting.AISetting.providers:type_name -> memos.api.v1.InstanceSetting.AIProviderConfig
-	1,  // 19: memos.api.v1.InstanceSetting.AIProviderConfig.type:type_name -> memos.api.v1.InstanceSetting.AIProviderType
-	14, // 20: memos.api.v1.InstanceSetting.TagsSetting.TagsEntry.value:type_name -> memos.api.v1.InstanceSetting.TagMetadata
-	4,  // 21: memos.api.v1.InstanceService.GetInstanceProfile:input_type -> memos.api.v1.GetInstanceProfileRequest
-	6,  // 22: memos.api.v1.InstanceService.GetInstanceSetting:input_type -> memos.api.v1.GetInstanceSettingRequest
-	7,  // 23: memos.api.v1.InstanceService.UpdateInstanceSetting:input_type -> memos.api.v1.UpdateInstanceSettingRequest
-	8,  // 24: memos.api.v1.InstanceService.TestInstanceEmailSetting:input_type -> memos.api.v1.TestInstanceEmailSettingRequest
-	9,  // 25: memos.api.v1.InstanceService.GetInstanceStats:input_type -> memos.api.v1.GetInstanceStatsRequest
-	3,  // 26: memos.api.v1.InstanceService.GetInstanceProfile:output_type -> memos.api.v1.InstanceProfile
-	5,  // 27: memos.api.v1.InstanceService.GetInstanceSetting:output_type -> memos.api.v1.InstanceSetting
-	5,  // 28: memos.api.v1.InstanceService.UpdateInstanceSetting:output_type -> memos.api.v1.InstanceSetting
-	28, // 29: memos.api.v1.InstanceService.TestInstanceEmailSetting:output_type -> google.protobuf.Empty
-	10, // 30: memos.api.v1.InstanceService.GetInstanceStats:output_type -> memos.api.v1.InstanceStats
-	26, // [26:31] is the sub-list for method output_type
-	21, // [21:26] is the sub-list for method input_type
-	21, // [21:21] is the sub-list for extension type_name
-	21, // [21:21] is the sub-list for extension extendee
-	0,  // [0:21] is the sub-list for field type_name
+	19, // 19: memos.api.v1.InstanceSetting.AISetting.transcription:type_name -> memos.api.v1.InstanceSetting.TranscriptionConfig
+	1,  // 20: memos.api.v1.InstanceSetting.AIProviderConfig.type:type_name -> memos.api.v1.InstanceSetting.AIProviderType
+	14, // 21: memos.api.v1.InstanceSetting.TagsSetting.TagsEntry.value:type_name -> memos.api.v1.InstanceSetting.TagMetadata
+	4,  // 22: memos.api.v1.InstanceService.GetInstanceProfile:input_type -> memos.api.v1.GetInstanceProfileRequest
+	6,  // 23: memos.api.v1.InstanceService.GetInstanceSetting:input_type -> memos.api.v1.GetInstanceSettingRequest
+	7,  // 24: memos.api.v1.InstanceService.UpdateInstanceSetting:input_type -> memos.api.v1.UpdateInstanceSettingRequest
+	8,  // 25: memos.api.v1.InstanceService.TestInstanceEmailSetting:input_type -> memos.api.v1.TestInstanceEmailSettingRequest
+	9,  // 26: memos.api.v1.InstanceService.GetInstanceStats:input_type -> memos.api.v1.GetInstanceStatsRequest
+	3,  // 27: memos.api.v1.InstanceService.GetInstanceProfile:output_type -> memos.api.v1.InstanceProfile
+	5,  // 28: memos.api.v1.InstanceService.GetInstanceSetting:output_type -> memos.api.v1.InstanceSetting
+	5,  // 29: memos.api.v1.InstanceService.UpdateInstanceSetting:output_type -> memos.api.v1.InstanceSetting
+	29, // 30: memos.api.v1.InstanceService.TestInstanceEmailSetting:output_type -> google.protobuf.Empty
+	10, // 31: memos.api.v1.InstanceService.GetInstanceStats:output_type -> memos.api.v1.InstanceStats
+	27, // [27:32] is the sub-list for method output_type
+	22, // [22:27] is the sub-list for method input_type
+	22, // [22:22] is the sub-list for extension type_name
+	22, // [22:22] is the sub-list for extension extendee
+	0,  // [0:22] is the sub-list for field type_name
 }

 func init() { file_api_v1_instance_service_proto_init() }
@ -1830,7 +1926,7 @@ func file_api_v1_instance_service_proto_init() {
 			GoPackagePath: reflect.TypeOf(x{}).PkgPath(),
 			RawDescriptor: unsafe.Slice(unsafe.StringData(file_api_v1_instance_service_proto_rawDesc), len(file_api_v1_instance_service_proto_rawDesc)),
 			NumEnums:      3,
-			NumMessages:   21,
+			NumMessages:   22,
 			NumExtensions: 0,
 			NumServices:   1,
 		},
--- a/proto/gen/openapi.yaml
+++ b/proto/gen/openapi.yaml
@ -2701,6 +2701,12 @@ components:
                    items:
                        $ref: '#/components/schemas/InstanceSetting_AIProviderConfig'
                    description: providers is the list of AI provider configurations available instance-wide.
+                transcription:
+                    allOf:
+                        - $ref: '#/components/schemas/InstanceSetting_TranscriptionConfig'
+                    description: |-
+                        transcription is the speech-to-text feature configuration.
+                         When unset or transcription.provider_id is empty, transcription is disabled.
            description: AI provider configuration settings.
        InstanceSetting_GeneralSetting:
            type: object
@ -2808,6 +2814,29 @@ components:
                         so a single entry like "project/.*" matches all tags under that prefix.
                         Exact tag names are also valid (they are trivially valid regex patterns).
            description: Tag metadata configuration.
+        InstanceSetting_TranscriptionConfig:
+            type: object
+            properties:
+                providerId:
+                    type: string
+                    description: |-
+                        provider_id references an entry in AISetting.providers[].id.
+                         Empty string means transcription is disabled.
+                model:
+                    type: string
+                    description: |-
+                        model is the provider-specific model identifier.
+                         Empty string falls back to the engine default
+                         (whisper-1 for OPENAI providers, gemini-2.5-flash for GEMINI providers).
+                language:
+                    type: string
+                    description: |-
+                        language is the default ISO 639-1 language hint sent to the provider.
+                         Empty string lets the provider auto-detect.
+                prompt:
+                    type: string
+                    description: prompt is a default spelling/vocabulary hint passed to the provider.
+            description: TranscriptionConfig configures the speech-to-text feature.
        InstanceStats:
            type: object
            properties:
@ -3538,18 +3567,9 @@ components:
            description: Request message for TestInstanceEmailSetting method.
        TranscribeRequest:
            required:
-                - providerId
-                - config
                - audio
            type: object
            properties:
-                providerId:
-                    type: string
-                    description: Required. The instance AI provider ID to use.
-                config:
-                    allOf:
-                        - $ref: '#/components/schemas/TranscriptionConfig'
-                    description: Required. Transcription options.
                audio:
                    allOf:
                        - $ref: '#/components/schemas/TranscriptionAudio'
@ -3577,15 +3597,6 @@ components:
                contentType:
                    type: string
                    description: Optional. The MIME type of the input audio.
-        TranscriptionConfig:
-            type: object
-            properties:
-                prompt:
-                    type: string
-                    description: Optional. A prompt to improve transcription quality.
-                language:
-                    type: string
-                    description: Optional. The language of the input audio.
        UpsertMemoReactionRequest:
            required:
                - name
--- a/proto/gen/store/instance_setting.pb.go
+++ b/proto/gen/store/instance_setting.pb.go
@ -962,7 +962,10 @@ func (x *InstanceNotificationSetting) GetEmail() *InstanceNotificationSetting_Em
 type InstanceAISetting struct {
 	state protoimpl.MessageState `protogen:"open.v1"`
 	// providers is the list of AI provider configurations available instance-wide.
-	Providers     []*AIProviderConfig `protobuf:"bytes,1,rep,name=providers,proto3" json:"providers,omitempty"`
+	Providers []*AIProviderConfig `protobuf:"bytes,1,rep,name=providers,proto3" json:"providers,omitempty"`
+	// transcription is the speech-to-text feature configuration.
+	// When unset or transcription.provider_id is empty, transcription is disabled.
+	Transcription *TranscriptionConfig `protobuf:"bytes,2,opt,name=transcription,proto3" json:"transcription,omitempty"`
 	unknownFields protoimpl.UnknownFields
 	sizeCache     protoimpl.SizeCache
 }
@ -1004,6 +1007,13 @@ func (x *InstanceAISetting) GetProviders() []*AIProviderConfig {
 	return nil
 }

+func (x *InstanceAISetting) GetTranscription() *TranscriptionConfig {
+	if x != nil {
+		return x.Transcription
+	}
+	return nil
+}
+
 type AIProviderConfig struct {
 	state    protoimpl.MessageState `protogen:"open.v1"`
 	Id       string                 `protobuf:"bytes,1,opt,name=id,proto3" json:"id,omitempty"`
@ -1081,6 +1091,85 @@ func (x *AIProviderConfig) GetApiKey() string {
 	return ""
 }

+// TranscriptionConfig configures the speech-to-text feature.
+type TranscriptionConfig struct {
+	state protoimpl.MessageState `protogen:"open.v1"`
+	// provider_id references an entry in InstanceAISetting.providers[].id.
+	// Empty string means transcription is disabled.
+	ProviderId string `protobuf:"bytes,1,opt,name=provider_id,json=providerId,proto3" json:"provider_id,omitempty"`
+	// model is the provider-specific model identifier.
+	// Empty string falls back to the engine default
+	// (whisper-1 for OPENAI providers, gemini-2.5-flash for GEMINI providers).
+	Model string `protobuf:"bytes,2,opt,name=model,proto3" json:"model,omitempty"`
+	// language is the default ISO 639-1 language hint sent to the provider.
+	// Empty string lets the provider auto-detect.
+	Language string `protobuf:"bytes,3,opt,name=language,proto3" json:"language,omitempty"`
+	// prompt is a default spelling/vocabulary hint passed to the provider.
+	// Used as the OpenAI Whisper "prompt" parameter and folded into the Gemini
+	// generation prompt as a "Context and spelling hints" block.
+	Prompt        string `protobuf:"bytes,4,opt,name=prompt,proto3" json:"prompt,omitempty"`
+	unknownFields protoimpl.UnknownFields
+	sizeCache     protoimpl.SizeCache
+}
+
+func (x *TranscriptionConfig) Reset() {
+	*x = TranscriptionConfig{}
+	mi := &file_store_instance_setting_proto_msgTypes[12]
+	ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
+	ms.StoreMessageInfo(mi)
+}
+
+func (x *TranscriptionConfig) String() string {
+	return protoimpl.X.MessageStringOf(x)
+}
+
+func (*TranscriptionConfig) ProtoMessage() {}
+
+func (x *TranscriptionConfig) ProtoReflect() protoreflect.Message {
+	mi := &file_store_instance_setting_proto_msgTypes[12]
+	if x != nil {
+		ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
+		if ms.LoadMessageInfo() == nil {
+			ms.StoreMessageInfo(mi)
+		}
+		return ms
+	}
+	return mi.MessageOf(x)
+}
+
+// Deprecated: Use TranscriptionConfig.ProtoReflect.Descriptor instead.
+func (*TranscriptionConfig) Descriptor() ([]byte, []int) {
+	return file_store_instance_setting_proto_rawDescGZIP(), []int{12}
+}
+
+func (x *TranscriptionConfig) GetProviderId() string {
+	if x != nil {
+		return x.ProviderId
+	}
+	return ""
+}
+
+func (x *TranscriptionConfig) GetModel() string {
+	if x != nil {
+		return x.Model
+	}
+	return ""
+}
+
+func (x *TranscriptionConfig) GetLanguage() string {
+	if x != nil {
+		return x.Language
+	}
+	return ""
+}
+
+func (x *TranscriptionConfig) GetPrompt() string {
+	if x != nil {
+		return x.Prompt
+	}
+	return ""
+}
+
 type InstanceNotificationSetting_EmailSetting struct {
 	state         protoimpl.MessageState `protogen:"open.v1"`
 	Enabled       bool                   `protobuf:"varint,1,opt,name=enabled,proto3" json:"enabled,omitempty"`
@ -1099,7 +1188,7 @@ type InstanceNotificationSetting_EmailSetting struct {

 func (x *InstanceNotificationSetting_EmailSetting) Reset() {
 	*x = InstanceNotificationSetting_EmailSetting{}
-	mi := &file_store_instance_setting_proto_msgTypes[13]
+	mi := &file_store_instance_setting_proto_msgTypes[14]
 	ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
 	ms.StoreMessageInfo(mi)
 }
@ -1111,7 +1200,7 @@ func (x *InstanceNotificationSetting_EmailSetting) String() string {
 func (*InstanceNotificationSetting_EmailSetting) ProtoMessage() {}

 func (x *InstanceNotificationSetting_EmailSetting) ProtoReflect() protoreflect.Message {
-	mi := &file_store_instance_setting_proto_msgTypes[13]
+	mi := &file_store_instance_setting_proto_msgTypes[14]
 	if x != nil {
 		ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
 		if ms.LoadMessageInfo() == nil {
@ -1273,15 +1362,22 @@ const file_store_instance_setting_proto_rawDesc = "" +
 	"\breply_to\x18\b \x01(\tR\areplyTo\x12\x17\n" +
 	"\ause_tls\x18\t \x01(\bR\x06useTls\x12\x17\n" +
 	"\ause_ssl\x18\n" +
-	" \x01(\bR\x06useSsl\"P\n" +
+	" \x01(\bR\x06useSsl\"\x98\x01\n" +
 	"\x11InstanceAISetting\x12;\n" +
-	"\tproviders\x18\x01 \x03(\v2\x1d.memos.store.AIProviderConfigR\tproviders\"\x9e\x01\n" +
+	"\tproviders\x18\x01 \x03(\v2\x1d.memos.store.AIProviderConfigR\tproviders\x12F\n" +
+	"\rtranscription\x18\x02 \x01(\v2 .memos.store.TranscriptionConfigR\rtranscription\"\x9e\x01\n" +
 	"\x10AIProviderConfig\x12\x0e\n" +
 	"\x02id\x18\x01 \x01(\tR\x02id\x12\x14\n" +
 	"\x05title\x18\x02 \x01(\tR\x05title\x12/\n" +
 	"\x04type\x18\x03 \x01(\x0e2\x1b.memos.store.AIProviderTypeR\x04type\x12\x1a\n" +
 	"\bendpoint\x18\x04 \x01(\tR\bendpoint\x12\x17\n" +
-	"\aapi_key\x18\x05 \x01(\tR\x06apiKey*\x95\x01\n" +
+	"\aapi_key\x18\x05 \x01(\tR\x06apiKey\"\x80\x01\n" +
+	"\x13TranscriptionConfig\x12\x1f\n" +
+	"\vprovider_id\x18\x01 \x01(\tR\n" +
+	"providerId\x12\x14\n" +
+	"\x05model\x18\x02 \x01(\tR\x05model\x12\x1a\n" +
+	"\blanguage\x18\x03 \x01(\tR\blanguage\x12\x16\n" +
+	"\x06prompt\x18\x04 \x01(\tR\x06prompt*\x95\x01\n" +
 	"\x12InstanceSettingKey\x12$\n" +
 	" INSTANCE_SETTING_KEY_UNSPECIFIED\x10\x00\x12\t\n" +
 	"\x05BASIC\x10\x01\x12\v\n" +
@ -1312,7 +1408,7 @@ func file_store_instance_setting_proto_rawDescGZIP() []byte {
 }

 var file_store_instance_setting_proto_enumTypes = make([]protoimpl.EnumInfo, 3)
-var file_store_instance_setting_proto_msgTypes = make([]protoimpl.MessageInfo, 14)
+var file_store_instance_setting_proto_msgTypes = make([]protoimpl.MessageInfo, 15)
 var file_store_instance_setting_proto_goTypes = []any{
 	(InstanceSettingKey)(0),                          // 0: memos.store.InstanceSettingKey
 	(AIProviderType)(0),                              // 1: memos.store.AIProviderType
@ -1329,9 +1425,10 @@ var file_store_instance_setting_proto_goTypes = []any{
 	(*InstanceNotificationSetting)(nil),              // 12: memos.store.InstanceNotificationSetting
 	(*InstanceAISetting)(nil),                        // 13: memos.store.InstanceAISetting
 	(*AIProviderConfig)(nil),                         // 14: memos.store.AIProviderConfig
-	nil,                                              // 15: memos.store.InstanceTagsSetting.TagsEntry
-	(*InstanceNotificationSetting_EmailSetting)(nil), // 16: memos.store.InstanceNotificationSetting.EmailSetting
-	(*color.Color)(nil),                              // 17: google.type.Color
+	(*TranscriptionConfig)(nil),                      // 15: memos.store.TranscriptionConfig
+	nil,                                              // 16: memos.store.InstanceTagsSetting.TagsEntry
+	(*InstanceNotificationSetting_EmailSetting)(nil), // 17: memos.store.InstanceNotificationSetting.EmailSetting
+	(*color.Color)(nil),                              // 18: google.type.Color
 }
 var file_store_instance_setting_proto_depIdxs = []int32{
 	0,  // 0: memos.store.InstanceSetting.key:type_name -> memos.store.InstanceSettingKey
@ -1345,17 +1442,18 @@ var file_store_instance_setting_proto_depIdxs = []int32{
 	6,  // 8: memos.store.InstanceGeneralSetting.custom_profile:type_name -> memos.store.InstanceCustomProfile
 	2,  // 9: memos.store.InstanceStorageSetting.storage_type:type_name -> memos.store.InstanceStorageSetting.StorageType
 	8,  // 10: memos.store.InstanceStorageSetting.s3_config:type_name -> memos.store.StorageS3Config
-	17, // 11: memos.store.InstanceTagMetadata.background_color:type_name -> google.type.Color
-	15, // 12: memos.store.InstanceTagsSetting.tags:type_name -> memos.store.InstanceTagsSetting.TagsEntry
-	16, // 13: memos.store.InstanceNotificationSetting.email:type_name -> memos.store.InstanceNotificationSetting.EmailSetting
+	18, // 11: memos.store.InstanceTagMetadata.background_color:type_name -> google.type.Color
+	16, // 12: memos.store.InstanceTagsSetting.tags:type_name -> memos.store.InstanceTagsSetting.TagsEntry
+	17, // 13: memos.store.InstanceNotificationSetting.email:type_name -> memos.store.InstanceNotificationSetting.EmailSetting
 	14, // 14: memos.store.InstanceAISetting.providers:type_name -> memos.store.AIProviderConfig
-	1,  // 15: memos.store.AIProviderConfig.type:type_name -> memos.store.AIProviderType
-	10, // 16: memos.store.InstanceTagsSetting.TagsEntry.value:type_name -> memos.store.InstanceTagMetadata
-	17, // [17:17] is the sub-list for method output_type
-	17, // [17:17] is the sub-list for method input_type
-	17, // [17:17] is the sub-list for extension type_name
-	17, // [17:17] is the sub-list for extension extendee
-	0,  // [0:17] is the sub-list for field type_name
+	15, // 15: memos.store.InstanceAISetting.transcription:type_name -> memos.store.TranscriptionConfig
+	1,  // 16: memos.store.AIProviderConfig.type:type_name -> memos.store.AIProviderType
+	10, // 17: memos.store.InstanceTagsSetting.TagsEntry.value:type_name -> memos.store.InstanceTagMetadata
+	18, // [18:18] is the sub-list for method output_type
+	18, // [18:18] is the sub-list for method input_type
+	18, // [18:18] is the sub-list for extension type_name
+	18, // [18:18] is the sub-list for extension extendee
+	0,  // [0:18] is the sub-list for field type_name
 }

 func init() { file_store_instance_setting_proto_init() }
@ -1378,7 +1476,7 @@ func file_store_instance_setting_proto_init() {
 			GoPackagePath: reflect.TypeOf(x{}).PkgPath(),
 			RawDescriptor: unsafe.Slice(unsafe.StringData(file_store_instance_setting_proto_rawDesc), len(file_store_instance_setting_proto_rawDesc)),
 			NumEnums:      3,
-			NumMessages:   14,
+			NumMessages:   15,
 			NumExtensions: 0,
 			NumServices:   0,
 		},
--- a/proto/store/instance_setting.proto
+++ b/proto/store/instance_setting.proto
@ -149,6 +149,10 @@ message InstanceNotificationSetting {
 message InstanceAISetting {
  // providers is the list of AI provider configurations available instance-wide.
  repeated AIProviderConfig providers = 1;
+
+  // transcription is the speech-to-text feature configuration.
+  // When unset or transcription.provider_id is empty, transcription is disabled.
+  TranscriptionConfig transcription = 2;
 }

 message AIProviderConfig {
@ -165,3 +169,24 @@ enum AIProviderType {
  OPENAI = 1;
  GEMINI = 2;
 }
+
+// TranscriptionConfig configures the speech-to-text feature.
+message TranscriptionConfig {
+  // provider_id references an entry in InstanceAISetting.providers[].id.
+  // Empty string means transcription is disabled.
+  string provider_id = 1;
+
+  // model is the provider-specific model identifier.
+  // Empty string falls back to the engine default
+  // (whisper-1 for OPENAI providers, gemini-2.5-flash for GEMINI providers).
+  string model = 2;
+
+  // language is the default ISO 639-1 language hint sent to the provider.
+  // Empty string lets the provider auto-detect.
+  string language = 3;
+
+  // prompt is a default spelling/vocabulary hint passed to the provider.
+  // Used as the OpenAI Whisper "prompt" parameter and folded into the Gemini
+  // generation prompt as a "Context and spelling hints" block.
+  string prompt = 4;
+}
--- a/server/router/api/v1/ai_service.go
+++ b/server/router/api/v1/ai_service.go
@ -17,8 +17,6 @@ import (

 const (
 	maxTranscriptionAudioSizeBytes = 25 * MebiByte
-	maxTranscriptionPromptLength   = 4096
-	maxTranscriptionLanguageLength = 32
 	maxTranscriptionFilenameLength = 255
 )

@ -51,20 +49,6 @@ func (s *APIV1Service) Transcribe(ctx context.Context, request *v1pb.TranscribeR
 		return nil, status.Errorf(codes.Unauthenticated, "user not authenticated")
 	}

-	if strings.TrimSpace(request.ProviderId) == "" {
-		return nil, status.Errorf(codes.InvalidArgument, "provider_id is required")
-	}
-	if request.Config == nil {
-		return nil, status.Errorf(codes.InvalidArgument, "config is required")
-	}
-	prompt := strings.TrimSpace(request.Config.GetPrompt())
-	if len(prompt) > maxTranscriptionPromptLength {
-		return nil, status.Errorf(codes.InvalidArgument, "prompt is too long; maximum length is %d characters", maxTranscriptionPromptLength)
-	}
-	language := strings.TrimSpace(request.Config.GetLanguage())
-	if len(language) > maxTranscriptionLanguageLength {
-		return nil, status.Errorf(codes.InvalidArgument, "language is too long; maximum length is %d characters", maxTranscriptionLanguageLength)
-	}
 	if request.Audio == nil {
 		return nil, status.Errorf(codes.InvalidArgument, "audio is required")
 	}
@ -90,10 +74,31 @@ func (s *APIV1Service) Transcribe(ctx context.Context, request *v1pb.TranscribeR
 		return nil, status.Errorf(codes.InvalidArgument, "audio content type %q is not supported", contentType)
 	}

-	provider, model, err := s.resolveAIProviderForTranscription(ctx, request.ProviderId)
+	aiSetting, err := s.Store.GetInstanceAISetting(ctx)
+	if err != nil {
+		return nil, status.Errorf(codes.Internal, "failed to get AI setting: %v", err)
+	}
+	persisted := aiSetting.GetTranscription()
+
+	providerID := persisted.GetProviderId()
+	if providerID == "" {
+		return nil, status.Errorf(codes.FailedPrecondition, "transcription is not configured")
+	}
+
+	provider, err := s.resolveAIProvider(aiSetting, providerID)
 	if err != nil {
 		return nil, err
 	}
+
+	model := persisted.GetModel()
+	if model == "" {
+		defaultModel, err := ai.DefaultTranscriptionModel(provider.Type)
+		if err != nil {
+			return nil, status.Errorf(codes.InvalidArgument, "%v", err)
+		}
+		model = defaultModel
+	}
+
 	transcriber, err := ai.NewTranscriber(provider)
 	if err != nil {
 		return nil, status.Errorf(codes.InvalidArgument, "failed to create AI transcriber: %v", err)
@ -105,8 +110,8 @@ func (s *APIV1Service) Transcribe(ctx context.Context, request *v1pb.TranscribeR
 		ContentType: contentType,
 		Audio:       bytes.NewReader(content),
 		Size:        int64(len(content)),
-		Prompt:      prompt,
-		Language:    language,
+		Prompt:      persisted.GetPrompt(),
+		Language:    persisted.GetLanguage(),
 	})
 	if err != nil {
 		return nil, status.Errorf(codes.Internal, "failed to transcribe audio: %v", err)
@ -116,12 +121,7 @@ func (s *APIV1Service) Transcribe(ctx context.Context, request *v1pb.TranscribeR
 	}, nil
 }

-func (s *APIV1Service) resolveAIProviderForTranscription(ctx context.Context, providerID string) (ai.ProviderConfig, string, error) {
-	setting, err := s.Store.GetInstanceAISetting(ctx)
-	if err != nil {
-		return ai.ProviderConfig{}, "", status.Errorf(codes.Internal, "failed to get AI setting: %v", err)
-	}
-
+func (*APIV1Service) resolveAIProvider(setting *storepb.InstanceAISetting, providerID string) (ai.ProviderConfig, error) {
 	providers := make([]ai.ProviderConfig, 0, len(setting.GetProviders()))
 	for _, provider := range setting.GetProviders() {
 		if provider == nil {
@ -132,13 +132,9 @@ func (s *APIV1Service) resolveAIProviderForTranscription(ctx context.Context, pr

 	provider, err := ai.FindProvider(providers, providerID)
 	if err != nil {
-		return ai.ProviderConfig{}, "", status.Errorf(codes.NotFound, "AI provider not found")
-	}
-	selectedModel, err := ai.DefaultTranscriptionModel(provider.Type)
-	if err != nil {
-		return ai.ProviderConfig{}, "", status.Errorf(codes.InvalidArgument, "%v", err)
+		return ai.ProviderConfig{}, status.Errorf(codes.FailedPrecondition, "transcription provider is not configured")
 	}
-	return *provider, selectedModel, nil
+	return *provider, nil
 }

 func convertAIProviderConfigFromStore(provider *storepb.AIProviderConfig) ai.ProviderConfig {
--- a/server/router/api/v1/instance_service.go
+++ b/server/router/api/v1/instance_service.go
@ -20,6 +20,12 @@ import (
 	"github.com/usememos/memos/store"
 )

+const (
+	maxTranscriptionConfigModelLength    = 256
+	maxTranscriptionConfigLanguageLength = 32
+	maxTranscriptionConfigPromptLength   = 4096
+)
+
 // GetInstanceProfile returns the instance profile.
 func (s *APIV1Service) GetInstanceProfile(ctx context.Context, _ *v1pb.GetInstanceProfileRequest) (*v1pb.InstanceProfile, error) {
 	admin, err := s.GetInstanceAdmin(ctx)
@ -91,6 +97,7 @@ func (s *APIV1Service) GetInstanceSetting(ctx context.Context, request *v1pb.Get
 			return nil, status.Errorf(codes.PermissionDenied, "permission denied")
 		}
 	}
+	isAdminCaller := false
 	if instanceSetting.Key == storepb.InstanceSettingKey_AI {
 		user, err := s.fetchCurrentUser(ctx)
 		if err != nil {
@ -99,9 +106,22 @@ func (s *APIV1Service) GetInstanceSetting(ctx context.Context, request *v1pb.Get
 		if user == nil {
 			return nil, status.Errorf(codes.Unauthenticated, "user not authenticated")
 		}
+		isAdminCaller = user.Role == store.RoleAdmin
 	}

-	return convertInstanceSettingFromStore(instanceSetting), nil
+	result := convertInstanceSettingFromStore(instanceSetting)
+	if instanceSetting.Key == storepb.InstanceSettingKey_AI && !isAdminCaller {
+		// Non-admin callers only need transcription.provider_id to gate the
+		// editor's Transcribe button. Model / language / prompt are
+		// admin-entered defaults that may contain proprietary glossary terms,
+		// so they are redacted from non-admin responses.
+		if ai := result.GetAiSetting(); ai != nil && ai.Transcription != nil {
+			ai.Transcription.Model = ""
+			ai.Transcription.Language = ""
+			ai.Transcription.Prompt = ""
+		}
+	}
+	return result, nil
 }

 func (s *APIV1Service) UpdateInstanceSetting(ctx context.Context, request *v1pb.UpdateInstanceSettingRequest) (*v1pb.InstanceSetting, error) {
@ -508,7 +528,8 @@ func convertInstanceAISettingFromStore(setting *storepb.InstanceAISetting) *v1pb
 	}

 	aiSetting := &v1pb.InstanceSetting_AISetting{
-		Providers: make([]*v1pb.InstanceSetting_AIProviderConfig, 0, len(setting.Providers)),
+		Providers:     make([]*v1pb.InstanceSetting_AIProviderConfig, 0, len(setting.Providers)),
+		Transcription: convertTranscriptionConfigFromStore(setting.GetTranscription()),
 	}
 	for _, provider := range setting.Providers {
 		if provider == nil {
@ -533,7 +554,8 @@ func convertInstanceAISettingToStore(setting *v1pb.InstanceSetting_AISetting) *s
 	}

 	aiSetting := &storepb.InstanceAISetting{
-		Providers: make([]*storepb.AIProviderConfig, 0, len(setting.Providers)),
+		Providers:     make([]*storepb.AIProviderConfig, 0, len(setting.Providers)),
+		Transcription: convertTranscriptionConfigToStore(setting.GetTranscription()),
 	}
 	for _, provider := range setting.Providers {
 		if provider == nil {
@ -550,6 +572,30 @@ func convertInstanceAISettingToStore(setting *v1pb.InstanceSetting_AISetting) *s
 	return aiSetting
 }

+func convertTranscriptionConfigFromStore(setting *storepb.TranscriptionConfig) *v1pb.InstanceSetting_TranscriptionConfig {
+	if setting == nil {
+		return nil
+	}
+	return &v1pb.InstanceSetting_TranscriptionConfig{
+		ProviderId: setting.GetProviderId(),
+		Model:      setting.GetModel(),
+		Language:   setting.GetLanguage(),
+		Prompt:     setting.GetPrompt(),
+	}
+}
+
+func convertTranscriptionConfigToStore(setting *v1pb.InstanceSetting_TranscriptionConfig) *storepb.TranscriptionConfig {
+	if setting == nil {
+		return nil
+	}
+	return &storepb.TranscriptionConfig{
+		ProviderId: setting.GetProviderId(),
+		Model:      setting.GetModel(),
+		Language:   setting.GetLanguage(),
+		Prompt:     setting.GetPrompt(),
+	}
+}
+
 func validateInstanceSetting(setting *v1pb.InstanceSetting) error {
 	key, err := ExtractInstanceSettingKeyFromName(setting.Name)
 	if err != nil {
@ -619,6 +665,53 @@ func (s *APIV1Service) prepareInstanceAISettingForUpdate(ctx context.Context, se
 			return errors.Errorf("provider %q API key is required", provider.Id)
 		}
 	}
+
+	if err := preparePersistedTranscriptionConfig(setting, existing); err != nil {
+		return err
+	}
+	return nil
+}
+
+func preparePersistedTranscriptionConfig(setting *storepb.InstanceAISetting, existing *storepb.InstanceAISetting) error {
+	// Preserve the previously stored transcription config when the request omits it,
+	// matching the "absence == keep" semantics used for API keys. The preserved
+	// config still falls through to validation below, so a stale provider_id is
+	// rejected if the same update removed or renamed its referenced provider.
+	if setting.Transcription == nil && existing != nil {
+		setting.Transcription = existing.GetTranscription()
+	}
+	if setting.Transcription == nil {
+		return nil
+	}
+
+	cfg := setting.Transcription
+	cfg.ProviderId = strings.TrimSpace(cfg.ProviderId)
+	cfg.Model = strings.TrimSpace(cfg.Model)
+	cfg.Language = strings.TrimSpace(cfg.Language)
+	cfg.Prompt = strings.TrimSpace(cfg.Prompt)
+
+	if cfg.ProviderId != "" {
+		referenced := false
+		for _, provider := range setting.Providers {
+			if provider != nil && provider.Id == cfg.ProviderId {
+				referenced = true
+				break
+			}
+		}
+		if !referenced {
+			return errors.Errorf("transcription provider_id %q does not reference any configured provider", cfg.ProviderId)
+		}
+	}
+
+	if len(cfg.Model) > maxTranscriptionConfigModelLength {
+		return errors.Errorf("transcription model is too long; maximum length is %d characters", maxTranscriptionConfigModelLength)
+	}
+	if len(cfg.Language) > maxTranscriptionConfigLanguageLength {
+		return errors.Errorf("transcription language is too long; maximum length is %d characters", maxTranscriptionConfigLanguageLength)
+	}
+	if len(cfg.Prompt) > maxTranscriptionConfigPromptLength {
+		return errors.Errorf("transcription prompt is too long; maximum length is %d characters", maxTranscriptionConfigPromptLength)
+	}
 	return nil
 }

--- a/server/router/api/v1/test/ai_service_test.go
+++ b/server/router/api/v1/test/ai_service_test.go
@ -21,8 +21,6 @@ func TestTranscribe(t *testing.T) {
 		defer ts.Cleanup()

 		_, err := ts.Service.Transcribe(ctx, &v1pb.TranscribeRequest{
-			ProviderId: "openai-main",
-			Config:     &v1pb.TranscriptionConfig{},
 			Audio: &v1pb.TranscriptionAudio{
 				Source:      &v1pb.TranscriptionAudio_Content{Content: []byte("RIFF")},
 				Filename:    "voice.wav",
@ -33,7 +31,7 @@ func TestTranscribe(t *testing.T) {
 		require.Contains(t, err.Error(), "user not authenticated")
 	})

-	t.Run("transcribes audio file with configured provider", func(t *testing.T) {
+	t.Run("transcribes audio file using persisted transcription setting", func(t *testing.T) {
 		ts := NewTestService(t)
 		defer ts.Cleanup()

@ -45,7 +43,8 @@ func TestTranscribe(t *testing.T) {
 			require.Equal(t, "/audio/transcriptions", r.URL.Path)
 			require.Equal(t, "Bearer sk-test", r.Header.Get("Authorization"))
 			require.NoError(t, r.ParseMultipartForm(10<<20))
-			require.Equal(t, "gpt-4o-transcribe", r.FormValue("model"))
+			require.Equal(t, "whisper-1", r.FormValue("model"))
+			require.Equal(t, "fr", r.FormValue("language"))
 			require.Equal(t, "names: Alice", r.FormValue("prompt"))

 			file, header, err := r.FormFile("file")
@ -73,16 +72,18 @@ func TestTranscribe(t *testing.T) {
 							ApiKey:   "sk-test",
 						},
 					},
+					Transcription: &storepb.TranscriptionConfig{
+						ProviderId: "openai-main",
+						Model:      "whisper-1",
+						Language:   "fr",
+						Prompt:     "names: Alice",
+					},
 				},
 			},
 		})
 		require.NoError(t, err)

 		resp, err := ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
-			ProviderId: "openai-main",
-			Config: &v1pb.TranscriptionConfig{
-				Prompt: "names: Alice",
-			},
 			Audio: &v1pb.TranscriptionAudio{
 				Source:      &v1pb.TranscriptionAudio_Content{Content: []byte("RIFF")},
 				Filename:    "voice.wav",
@ -117,14 +118,15 @@ func TestTranscribe(t *testing.T) {
 							ApiKey:   "sk-test",
 						},
 					},
+					Transcription: &storepb.TranscriptionConfig{
+						ProviderId: "openai-main",
+					},
 				},
 			},
 		})
 		require.NoError(t, err)

 		_, err = ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
-			ProviderId: "openai-main",
-			Config:     &v1pb.TranscriptionConfig{},
 			Audio: &v1pb.TranscriptionAudio{
 				Source:      &v1pb.TranscriptionAudio_Content{Content: []byte("RIFF")},
 				Filename:    "voice.wav",
@ -172,14 +174,15 @@ func TestTranscribe(t *testing.T) {
 							ApiKey:   "gemini-key",
 						},
 					},
+					Transcription: &storepb.TranscriptionConfig{
+						ProviderId: "gemini-main",
+					},
 				},
 			},
 		})
 		require.NoError(t, err)

 		resp, err := ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
-			ProviderId: "gemini-main",
-			Config:     &v1pb.TranscriptionConfig{},
 			Audio: &v1pb.TranscriptionAudio{
 				Source:      &v1pb.TranscriptionAudio_Content{Content: []byte("mp3 bytes")},
 				Filename:    "voice.mp3",
@ -190,7 +193,7 @@ func TestTranscribe(t *testing.T) {
 		require.Equal(t, "gemini transcript", resp.Text)
 	})

-	t.Run("uses built-in transcription model", func(t *testing.T) {
+	t.Run("falls back to engine default model when transcription model is empty", func(t *testing.T) {
 		ts := NewTestService(t)
 		defer ts.Cleanup()

@ -200,7 +203,7 @@ func TestTranscribe(t *testing.T) {

 		openAIServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
 			require.NoError(t, r.ParseMultipartForm(10<<20))
-			require.Equal(t, "gpt-4o-transcribe", r.FormValue("model"))
+			require.Equal(t, "whisper-1", r.FormValue("model"))
 			w.Header().Set("Content-Type", "application/json")
 			require.NoError(t, json.NewEncoder(w).Encode(map[string]string{
 				"text": "built-in model",
@ -221,14 +224,15 @@ func TestTranscribe(t *testing.T) {
 							ApiKey:   "sk-test",
 						},
 					},
+					Transcription: &storepb.TranscriptionConfig{
+						ProviderId: "openai-main",
+					},
 				},
 			},
 		})
 		require.NoError(t, err)

 		resp, err := ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
-			ProviderId: "openai-main",
-			Config:     &v1pb.TranscriptionConfig{},
 			Audio: &v1pb.TranscriptionAudio{
 				Source:      &v1pb.TranscriptionAudio_Content{Content: []byte("RIFF")},
 				Filename:    "voice.wav",
@ -247,27 +251,7 @@ func TestTranscribe(t *testing.T) {
 		require.NoError(t, err)
 		userCtx := ts.CreateUserContext(ctx, user.ID)

-		_, err = ts.Store.UpsertInstanceSetting(ctx, &storepb.InstanceSetting{
-			Key: storepb.InstanceSettingKey_AI,
-			Value: &storepb.InstanceSetting_AiSetting{
-				AiSetting: &storepb.InstanceAISetting{
-					Providers: []*storepb.AIProviderConfig{
-						{
-							Id:       "openai-main",
-							Title:    "OpenAI",
-							Type:     storepb.AIProviderType_OPENAI,
-							Endpoint: "https://example.com/v1",
-							ApiKey:   "sk-test",
-						},
-					},
-				},
-			},
-		})
-		require.NoError(t, err)
-
 		_, err = ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
-			ProviderId: "openai-main",
-			Config:     &v1pb.TranscriptionConfig{},
 			Audio: &v1pb.TranscriptionAudio{
 				Source:      &v1pb.TranscriptionAudio_Content{Content: []byte("not audio")},
 				Filename:    "notes.txt",
@ -277,4 +261,23 @@ func TestTranscribe(t *testing.T) {
 		require.Error(t, err)
 		require.Contains(t, err.Error(), "not supported")
 	})
+
+	t.Run("returns FailedPrecondition when transcription is not configured", func(t *testing.T) {
+		ts := NewTestService(t)
+		defer ts.Cleanup()
+
+		user, err := ts.CreateRegularUser(ctx, "alice-empty")
+		require.NoError(t, err)
+		userCtx := ts.CreateUserContext(ctx, user.ID)
+
+		_, err = ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
+			Audio: &v1pb.TranscriptionAudio{
+				Source:      &v1pb.TranscriptionAudio_Content{Content: []byte("RIFF")},
+				Filename:    "voice.wav",
+				ContentType: "audio/wav",
+			},
+		})
+		require.Error(t, err)
+		require.Contains(t, err.Error(), "transcription is not configured")
+	})
 }
--- a/server/router/api/v1/test/instance_service_test.go
+++ b/server/router/api/v1/test/instance_service_test.go
@ -2,6 +2,7 @@ package test

 import (
 	"context"
+	"strings"
 	"testing"

 	"github.com/stretchr/testify/require"
@ -731,4 +732,149 @@ func TestUpdateInstanceSetting(t *testing.T) {
 			"existing AI provider API key must be preserved when an empty value is sent")
 		require.Equal(t, "OpenAI primary", stored.GetProviders()[0].GetTitle())
 	})
+
+	t.Run("UpdateInstanceSetting - transcription provider_id must reference an existing provider", func(t *testing.T) {
+		ts := NewTestService(t)
+		defer ts.Cleanup()
+
+		hostUser, err := ts.CreateHostUser(ctx, "admin")
+		require.NoError(t, err)
+		adminCtx := ts.CreateUserContext(ctx, hostUser.ID)
+
+		_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{
+			Setting: &v1pb.InstanceSetting{
+				Name: "instance/settings/AI",
+				Value: &v1pb.InstanceSetting_AiSetting{
+					AiSetting: &v1pb.InstanceSetting_AISetting{
+						Providers: []*v1pb.InstanceSetting_AIProviderConfig{
+							{
+								Id:     "openai-main",
+								Title:  "OpenAI",
+								Type:   v1pb.InstanceSetting_OPENAI,
+								ApiKey: "sk-test",
+							},
+						},
+						Transcription: &v1pb.InstanceSetting_TranscriptionConfig{
+							ProviderId: "does-not-exist",
+						},
+					},
+				},
+			},
+		})
+		require.Error(t, err)
+		require.Contains(t, err.Error(), "transcription provider_id")
+	})
+
+	t.Run("UpdateInstanceSetting - transcription strings are length-capped", func(t *testing.T) {
+		ts := NewTestService(t)
+		defer ts.Cleanup()
+
+		hostUser, err := ts.CreateHostUser(ctx, "admin")
+		require.NoError(t, err)
+		adminCtx := ts.CreateUserContext(ctx, hostUser.ID)
+
+		base := &v1pb.InstanceSetting{
+			Name: "instance/settings/AI",
+			Value: &v1pb.InstanceSetting_AiSetting{
+				AiSetting: &v1pb.InstanceSetting_AISetting{
+					Providers: []*v1pb.InstanceSetting_AIProviderConfig{
+						{
+							Id:     "openai-main",
+							Title:  "OpenAI",
+							Type:   v1pb.InstanceSetting_OPENAI,
+							ApiKey: "sk-test",
+						},
+					},
+				},
+			},
+		}
+
+		oversizedModel := strings.Repeat("a", 257)
+		base.GetAiSetting().Transcription = &v1pb.InstanceSetting_TranscriptionConfig{
+			ProviderId: "openai-main",
+			Model:      oversizedModel,
+		}
+		_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{Setting: base})
+		require.Error(t, err)
+		require.Contains(t, err.Error(), "transcription model")
+
+		oversizedLanguage := strings.Repeat("a", 33)
+		base.GetAiSetting().Transcription = &v1pb.InstanceSetting_TranscriptionConfig{
+			ProviderId: "openai-main",
+			Language:   oversizedLanguage,
+		}
+		_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{Setting: base})
+		require.Error(t, err)
+		require.Contains(t, err.Error(), "transcription language")
+
+		oversizedPrompt := strings.Repeat("a", 4097)
+		base.GetAiSetting().Transcription = &v1pb.InstanceSetting_TranscriptionConfig{
+			ProviderId: "openai-main",
+			Prompt:     oversizedPrompt,
+		}
+		_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{Setting: base})
+		require.Error(t, err)
+		require.Contains(t, err.Error(), "transcription prompt")
+	})
+
+	t.Run("UpdateInstanceSetting - transcription is preserved when omitted on update", func(t *testing.T) {
+		ts := NewTestService(t)
+		defer ts.Cleanup()
+
+		hostUser, err := ts.CreateHostUser(ctx, "admin")
+		require.NoError(t, err)
+		adminCtx := ts.CreateUserContext(ctx, hostUser.ID)
+
+		_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{
+			Setting: &v1pb.InstanceSetting{
+				Name: "instance/settings/AI",
+				Value: &v1pb.InstanceSetting_AiSetting{
+					AiSetting: &v1pb.InstanceSetting_AISetting{
+						Providers: []*v1pb.InstanceSetting_AIProviderConfig{
+							{
+								Id:     "openai-main",
+								Title:  "OpenAI",
+								Type:   v1pb.InstanceSetting_OPENAI,
+								ApiKey: "sk-test",
+							},
+						},
+						Transcription: &v1pb.InstanceSetting_TranscriptionConfig{
+							ProviderId: "openai-main",
+							Model:      "whisper-1",
+							Language:   "en",
+							Prompt:     "names: Alice",
+						},
+					},
+				},
+			},
+		})
+		require.NoError(t, err)
+
+		_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{
+			Setting: &v1pb.InstanceSetting{
+				Name: "instance/settings/AI",
+				Value: &v1pb.InstanceSetting_AiSetting{
+					AiSetting: &v1pb.InstanceSetting_AISetting{
+						Providers: []*v1pb.InstanceSetting_AIProviderConfig{
+							{
+								Id:     "openai-main",
+								Title:  "OpenAI",
+								Type:   v1pb.InstanceSetting_OPENAI,
+								ApiKey: "",
+							},
+						},
+					},
+				},
+			},
+		})
+		require.NoError(t, err)
+
+		stored, err := ts.Store.GetInstanceAISetting(ctx)
+		require.NoError(t, err)
+		require.NotNil(t, stored.GetTranscription())
+		require.Equal(t, "openai-main", stored.GetTranscription().GetProviderId())
+		require.Equal(t, "whisper-1", stored.GetTranscription().GetModel())
+		require.Equal(t, "en", stored.GetTranscription().GetLanguage())
+		require.Equal(t, "names: Alice", stored.GetTranscription().GetPrompt())
+	})
 }
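
Note on the tests above: they exercise two properties of the update path. An omitted transcription block keeps the stored one, and a `provider_id` that matches no provider in the same request is rejected. A reduced sketch of that combination follows. The `AISetting` and `Transcription` types and the `prepare` function are simplified, hypothetical analogues of the store types and of `preparePersistedTranscriptionConfig`, with trimming and length caps left out.

```go
package main

import "fmt"

// Transcription and AISetting are simplified, hypothetical stand-ins.
type Transcription struct {
	ProviderID string
	Model      string
}

type AISetting struct {
	ProviderIDs   []string
	Transcription *Transcription // nil means "omitted from the request"
}

// prepare applies "absence == keep" and then validates the provider
// reference, so a preserved config can still fail if its provider is gone.
func prepare(incoming, existing *AISetting) error {
	if incoming.Transcription == nil && existing != nil {
		incoming.Transcription = existing.Transcription
	}
	cfg := incoming.Transcription
	if cfg == nil || cfg.ProviderID == "" {
		return nil
	}
	for _, id := range incoming.ProviderIDs {
		if id == cfg.ProviderID {
			return nil
		}
	}
	return fmt.Errorf("transcription provider_id %q does not reference any configured provider", cfg.ProviderID)
}

func main() {
	existing := &AISetting{
		ProviderIDs:   []string{"openai-main"},
		Transcription: &Transcription{ProviderID: "openai-main", Model: "whisper-1"},
	}

	// Update that omits transcription: the stored config is kept.
	kept := &AISetting{ProviderIDs: []string{"openai-main"}}
	fmt.Println(prepare(kept, existing), kept.Transcription.Model)

	// Update that drops the referenced provider and omits transcription:
	// the preserved config is now stale and the update is rejected.
	stale := &AISetting{ProviderIDs: []string{"gemini-main"}}
	fmt.Println(prepare(stale, existing))
}
```

The second case is why the frontend's delete handler sends an explicitly cleared transcription config when the deleted provider is the one referenced.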
--- a/web/src/components/MemoEditor/index.tsx
+++ b/web/src/components/MemoEditor/index.tsx
@ -8,7 +8,7 @@ import { memoKeys } from "@/hooks/useMemoQueries";
 import { userKeys } from "@/hooks/useUserQueries";
 import { handleError } from "@/lib/error";
 import { cn } from "@/lib/utils";
-import { InstanceSetting_AIProviderType, InstanceSetting_Key } from "@/types/proto/api/v1/instance_service_pb";
+import { InstanceSetting_Key } from "@/types/proto/api/v1/instance_service_pb";
 import { useTranslate } from "@/utils/i18n";
 import { convertVisibilityFromString } from "@/utils/memo";
 import {
@ -28,11 +28,6 @@ import { EditorProvider, useEditorContext } from "./state";
 import type { MemoEditorProps } from "./types";
 import type { LocalFile } from "./types/attachment";

-const TRANSCRIPTION_PROVIDER_TYPES: InstanceSetting_AIProviderType[] = [
-  InstanceSetting_AIProviderType.OPENAI,
-  InstanceSetting_AIProviderType.GEMINI,
-];
-
 const MemoEditor = (props: MemoEditorProps) => (
  <EditorProvider>
    <MemoEditorImpl {...props} />
@ -61,10 +56,12 @@ const MemoEditorImpl: React.FC<MemoEditorProps> = ({
  const [isTranscribingAudio, setIsTranscribingAudio] = useState(false);

  const memoName = memo?.name;
-  const transcriptionProvider = useMemo(
-    () => aiSetting.providers.find((provider) => provider.apiKeySet && TRANSCRIPTION_PROVIDER_TYPES.includes(provider.type)),
-    [aiSetting.providers],
-  );
+  const canTranscribe = useMemo(() => {
+    const providerId = aiSetting.transcription?.providerId ?? "";
+    if (!providerId) return false;
+    const provider = aiSetting.providers.find((p) => p.id === providerId);
+    return Boolean(provider?.apiKeySet);
+  }, [aiSetting.providers, aiSetting.transcription?.providerId]);

  // Get default visibility from user settings
  const defaultVisibility = userGeneralSetting?.memoVisibility ? convertVisibilityFromString(userGeneralSetting.memoVisibility) : undefined;
@ -129,7 +126,7 @@ const MemoEditorImpl: React.FC<MemoEditorProps> = ({

  const handleTranscribeRecordedAudio = useCallback(
    async (localFile: LocalFile) => {
-      if (!transcriptionProvider) {
+      if (!canTranscribe) {
        dispatch(actions.addLocalFile(localFile));
        setIsTranscribingAudio(false);
        setIsAudioRecorderOpen(false);
@ -137,7 +134,7 @@ const MemoEditorImpl: React.FC<MemoEditorProps> = ({
      }

      try {
-        const text = (await transcriptionService.transcribeFile(localFile.file, transcriptionProvider)).trim();
+        const text = (await transcriptionService.transcribeFile(localFile.file)).trim();
        if (!text) {
          dispatch(actions.addLocalFile(localFile));
          toast.error(t("editor.audio-recorder.transcribe-empty"));
@ -155,7 +152,7 @@ const MemoEditorImpl: React.FC<MemoEditorProps> = ({
        setIsAudioRecorderOpen(false);
      }
    },
-    [actions, dispatch, insertTranscribedText, t, transcriptionProvider],
+    [actions, canTranscribe, dispatch, insertTranscribedText, t],
  );

  const audioRecorderActions = useMemo(
@ -223,7 +220,7 @@ const MemoEditorImpl: React.FC<MemoEditorProps> = ({
  };

  const handleTranscribeAudioRecording = () => {
-    if (!transcriptionProvider || isTranscribingAudio) {
+    if (!canTranscribe || isTranscribingAudio) {
      return;
    }

@ -340,7 +337,7 @@ const MemoEditorImpl: React.FC<MemoEditorProps> = ({
              onStop={audioRecorder.stopRecording}
              onCancel={handleCancelAudioRecording}
              onTranscribe={handleTranscribeAudioRecording}
-              canTranscribe={!!transcriptionProvider}
+              canTranscribe={canTranscribe}
              isTranscribing={isTranscribingAudio}
            />
          )}
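
Note on the editor gate above: `canTranscribe` reads only `transcription.providerId` and the provider's `apiKeySet`, which is consistent with what `GetInstanceSetting` leaves visible to non-admin callers. The server-side rule can be sketched as below. The `Transcription` type and `redactForNonAdmin` function are hypothetical, and unlike the handler, which clears fields on the converted response in place, this version returns a copy.

```go
package main

import "fmt"

// Transcription is a hypothetical stand-in for the API transcription config.
type Transcription struct {
	ProviderID string
	Model      string
	Language   string
	Prompt     string
}

// redactForNonAdmin keeps only ProviderID for non-admin callers, since
// model, language, and prompt are admin-entered defaults.
func redactForNonAdmin(t *Transcription, isAdmin bool) *Transcription {
	if t == nil || isAdmin {
		return t
	}
	return &Transcription{ProviderID: t.ProviderID}
}

func main() {
	stored := &Transcription{
		ProviderID: "openai-main",
		Model:      "whisper-1",
		Language:   "fr",
		Prompt:     "names: Alice",
	}
	fmt.Printf("%+v\n", *redactForNonAdmin(stored, false))
	fmt.Printf("%+v\n", *redactForNonAdmin(stored, true))
}
```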
--- a/web/src/components/MemoEditor/services/transcriptionService.ts
+++ b/web/src/components/MemoEditor/services/transcriptionService.ts
@ -1,15 +1,12 @@
 import { create } from "@bufbuild/protobuf";
 import { aiServiceClient } from "@/connect";
-import { TranscribeRequestSchema, TranscriptionAudioSchema, TranscriptionConfigSchema } from "@/types/proto/api/v1/ai_service_pb";
-import type { InstanceSetting_AIProviderConfig } from "@/types/proto/api/v1/instance_service_pb";
+import { TranscribeRequestSchema, TranscriptionAudioSchema } from "@/types/proto/api/v1/ai_service_pb";

 export const transcriptionService = {
-  async transcribeFile(file: File, provider: InstanceSetting_AIProviderConfig): Promise<string> {
+  async transcribeFile(file: File): Promise<string> {
    const content = new Uint8Array(await file.arrayBuffer());
    const response = await aiServiceClient.transcribe(
      create(TranscribeRequestSchema, {
-        providerId: provider.id,
-        config: create(TranscriptionConfigSchema, {}),
        audio: create(TranscriptionAudioSchema, {
          source: {
            case: "content",
--- a/web/src/components/Settings/AISection.tsx
+++ b/web/src/components/Settings/AISection.tsx
@ -1,7 +1,7 @@
 import { create } from "@bufbuild/protobuf";
 import { isEqual } from "lodash-es";
 import { MoreVerticalIcon, PlusIcon } from "lucide-react";
-import { useEffect, useMemo, useState } from "react";
+import { useEffect, useMemo, useRef, useState } from "react";
 import { toast } from "react-hot-toast";
 import ConfirmDialog from "@/components/ConfirmDialog";
 import { Button } from "@/components/ui/button";
@ -10,6 +10,7 @@ import { DropdownMenu, DropdownMenuContent, DropdownMenuItem, DropdownMenuTrigge
 import { Input } from "@/components/ui/input";
 import { Label } from "@/components/ui/label";
 import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from "@/components/ui/select";
+import { Textarea } from "@/components/ui/textarea";
 import { useInstance } from "@/contexts/InstanceContext";
 import {
  InstanceSetting_AIProviderConfig,
@ -17,6 +18,8 @@ import {
  InstanceSetting_AIProviderType,
  InstanceSetting_AISettingSchema,
  InstanceSetting_Key,
+  InstanceSetting_TranscriptionConfig,
+  InstanceSetting_TranscriptionConfigSchema,
  InstanceSettingSchema,
 } from "@/types/proto/api/v1/instance_service_pb";
 import { useTranslate } from "@/utils/i18n";
@ -36,6 +39,13 @@ type LocalAIProvider = {
  apiKeyHint: string;
 };

+type LocalTranscription = {
+  providerId: string;
+  model: string;
+  language: string;
+  prompt: string;
+};
+
 const providerTypeOptions = [InstanceSetting_AIProviderType.OPENAI, InstanceSetting_AIProviderType.GEMINI];

 const byokNotes = ["setting.ai.byok-key-note", "setting.ai.byok-storage-note", "setting.ai.byok-model-note"] as const;
@ -61,6 +71,13 @@ const toLocalProvider = (provider: InstanceSetting_AIProviderConfig): LocalAIPro
  apiKeyHint: provider.apiKeyHint,
 });

+const toLocalTranscription = (config: InstanceSetting_TranscriptionConfig | undefined): LocalTranscription => ({
+  providerId: config?.providerId ?? "",
+  model: config?.model ?? "",
+  language: config?.language ?? "",
+  prompt: config?.prompt ?? "",
+});
+
 const newProvider = (): LocalAIProvider => ({
  id: createProviderID(),
  title: "",
@ -80,11 +97,20 @@ const toProviderConfig = (provider: LocalAIProvider) =>
    apiKey: provider.apiKey,
  });

+const toTranscriptionConfig = (transcription: LocalTranscription) =>
+  create(InstanceSetting_TranscriptionConfigSchema, {
+    providerId: transcription.providerId,
+    model: transcription.model.trim(),
+    language: transcription.language.trim(),
+    prompt: transcription.prompt,
+  });
+
 const AISection = () => {
  const t = useTranslate();
  const saveInstanceSetting = useInstanceSettingUpdater();
  const { aiSetting: originalSetting } = useInstance();
  const [providers, setProviders] = useState<LocalAIProvider[]>(() => originalSetting.providers.map(toLocalProvider));
+  const [transcription, setTranscription] = useState<LocalTranscription>(() => toLocalTranscription(originalSetting.transcription));
  const [editingProvider, setEditingProvider] = useState<LocalAIProvider | undefined>();
  const [deleteTarget, setDeleteTarget] = useState<LocalAIProvider | undefined>();

@ -92,8 +118,50 @@ const AISection = () => {
    setProviders(originalSetting.providers.map(toLocalProvider));
  }, [originalSetting.providers]);

-  const originalProviders = useMemo(() => originalSetting.providers.map(toLocalProvider), [originalSetting.providers]);
-  const hasChanges = !isEqual(providers, originalProviders);
+  // Only re-sync the transcription draft when the server-side content actually
+  // changes — not on every originalSetting identity change. This prevents
+  // provider-side saves (which keep transcription unchanged on the server) from
+  // wiping an in-progress transcription draft.
+  const lastSyncedTranscription = useRef<LocalTranscription>(toLocalTranscription(originalSetting.transcription));
+  useEffect(() => {
+    const next = toLocalTranscription(originalSetting.transcription);
+    if (!isEqual(lastSyncedTranscription.current, next)) {
+      setTranscription(next);
+      lastSyncedTranscription.current = next;
+    }
+  }, [originalSetting.transcription]);
+
+  const originalTranscription = useMemo(() => toLocalTranscription(originalSetting.transcription), [originalSetting.transcription]);
+  const transcriptionHasChanges = !isEqual(transcription, originalTranscription);
+
+  const transcriptionProviderRef = useMemo(
+    () => providers.find((provider) => provider.id === transcription.providerId),
+    [providers, transcription.providerId],
+  );
+
+  // Persists the AI setting using a specific providers list and transcription
+  // value. Provider operations pass originalSetting.transcription so an
+  // in-progress transcription draft is never accidentally committed.
+  const persistAISetting = async (
+    nextProviders: LocalAIProvider[],
+    nextTranscription: InstanceSetting_TranscriptionConfig | undefined,
+    errorContext: string,
+  ) => {
+    return saveInstanceSetting({
+      key: InstanceSetting_Key.AI,
+      setting: create(InstanceSettingSchema, {
+        name: buildInstanceSettingName(InstanceSetting_Key.AI),
+        value: {
+          case: "aiSetting",
+          value: create(InstanceSetting_AISettingSchema, {
+            providers: nextProviders.map(toProviderConfig),
+            transcription: nextTranscription,
+          }),
+        },
+      }),
+      errorContext,
+    });
+  };

  const handleCreateProvider = () => {
    setEditingProvider(newProvider());
@ -103,7 +171,7 @@ const AISection = () => {
    setEditingProvider({ ...provider, apiKey: "" });
  };

-  const handleSaveProvider = (provider: LocalAIProvider) => {
+  const handleSaveProvider = async (provider: LocalAIProvider) => {
    const title = provider.title.trim();
    const endpoint = provider.endpoint.trim();

@ -116,41 +184,47 @@ const AISection = () => {
      return;
    }

-    const normalizedProvider = {
-      ...provider,
-      title,
-      endpoint,
-    };
-    setProviders((prev) => {
-      const exists = prev.some((item) => item.id === normalizedProvider.id);
-      if (!exists) {
-        return [...prev, normalizedProvider];
-      }
-      return prev.map((item) => (item.id === normalizedProvider.id ? normalizedProvider : item));
-    });
+    const normalizedProvider = { ...provider, title, endpoint };
+    const exists = providers.some((item) => item.id === normalizedProvider.id);
+    const nextProviders = exists
+      ? providers.map((item) => (item.id === normalizedProvider.id ? normalizedProvider : item))
+      : [...providers, normalizedProvider];
+
+    const ok = await persistAISetting(nextProviders, originalSetting.transcription, "Update AI provider");
+    if (!ok) return;
+    setProviders(nextProviders);
    setEditingProvider(undefined);
  };

-  const handleDeleteProvider = () => {
+  const handleDeleteProvider = async () => {
    if (!deleteTarget) return;
-    setProviders((prev) => prev.filter((provider) => provider.id !== deleteTarget.id));
+    const target = deleteTarget;
+    const nextProviders = providers.filter((provider) => provider.id !== target.id);
+
+    // If the persisted transcription references the deleted provider, the
+    // server would reject the save (provider_id must reference an existing
+    // provider). Send a cleared transcription in that case.
+    const persistedTranscription = originalSetting.transcription;
+    const nextTranscription =
+      persistedTranscription && persistedTranscription.providerId === target.id
+        ? create(InstanceSetting_TranscriptionConfigSchema, {})
+        : persistedTranscription;
+
+    const ok = await persistAISetting(nextProviders, nextTranscription, "Delete AI provider");
+    if (!ok) return;
+    setProviders(nextProviders);
+    if (transcription.providerId === target.id) {
+      setTranscription((prev) => ({ ...prev, providerId: "" }));
+    }
    setDeleteTarget(undefined);
  };

-  const handleSaveSetting = async () => {
-    await saveInstanceSetting({
-      key: InstanceSetting_Key.AI,
-      setting: create(InstanceSettingSchema, {
-        name: buildInstanceSettingName(InstanceSetting_Key.AI),
-        value: {
-          case: "aiSetting",
-          value: create(InstanceSetting_AISettingSchema, {
-            providers: providers.map(toProviderConfig),
-          }),
-        },
-      }),
-      errorContext: "Update AI providers",
-    });
+  const handleSaveTranscription = async () => {
+    if (transcription.providerId && !transcriptionProviderRef) {
+      toast.error(t("setting.ai.transcription-empty-providers"));
+      return;
+    }
+    await persistAISetting(providers, toTranscriptionConfig(transcription), "Update transcription");
  };

  return (
@ -183,7 +257,7 @@ const AISection = () => {
        </div>
      </SettingPanel>

-      <SettingGroup title={t("setting.ai.providers")} description={t("setting.ai.description")}>
+      <SettingGroup title={t("setting.ai.integrations-title")} description={t("setting.ai.integrations-description")}>
        <SettingTable
          columns={[
            {
@ -242,11 +316,23 @@ const AISection = () => {
        />
      </SettingGroup>

-      <div className="w-full flex justify-end">
-        <Button disabled={!hasChanges} onClick={handleSaveSetting}>
-          {t("common.save")}
-        </Button>
-      </div>
+      <SettingGroup
+        title={t("setting.ai.transcription-title")}
+        description={t("setting.ai.transcription-description")}
+        showSeparator
+        actions={
+          <Button disabled={!transcriptionHasChanges} onClick={handleSaveTranscription}>
+            {t("common.save")}
+          </Button>
+        }
+      >
+        <TranscriptionForm
+          providers={providers}
+          transcription={transcription}
+          onChange={setTranscription}
+          referencedProvider={transcriptionProviderRef}
+        />
+      </SettingGroup>

      <AIProviderDialog
        provider={editingProvider}
@ -267,6 +353,98 @@ const AISection = () => {
  );
 };

+interface TranscriptionFormProps {
+  providers: LocalAIProvider[];
+  transcription: LocalTranscription;
+  referencedProvider: LocalAIProvider | undefined;
+  onChange: (next: LocalTranscription) => void;
+}
+
+const TranscriptionForm = ({ providers, transcription, referencedProvider, onChange }: TranscriptionFormProps) => {
+  const t = useTranslate();
+  const noProviders = providers.length === 0;
+
+  const update = (partial: Partial<LocalTranscription>) => {
+    onChange({ ...transcription, ...partial });
+  };
+
+  const placeholderForProvider = (provider: LocalAIProvider | undefined) => {
+    if (!provider) return "";
+    return provider.type === InstanceSetting_AIProviderType.GEMINI
+      ? t("setting.ai.transcription-model-placeholder-gemini")
+      : t("setting.ai.transcription-model-placeholder-openai");
+  };
+
+  return (
+    <div className="grid grid-cols-1 sm:grid-cols-2 gap-3 max-w-3xl">
+      <div className="flex flex-col gap-1.5 sm:col-span-2">
+        <Label>{t("setting.ai.transcription-provider")}</Label>
+        <Select
+          value={transcription.providerId || "__none__"}
+          onValueChange={(value) => update({ providerId: value === "__none__" ? "" : value })}
+          disabled={noProviders}
+        >
+          <SelectTrigger className="w-full">
+            <SelectValue />
+          </SelectTrigger>
+          <SelectContent>
+            <SelectItem value="__none__">{t("setting.ai.transcription-no-provider")}</SelectItem>
+            {providers.map((provider) => (
+              <SelectItem key={provider.id} value={provider.id}>
+                {provider.title || provider.id}
+              </SelectItem>
+            ))}
+          </SelectContent>
+        </Select>
+        {noProviders && <p className="text-xs text-muted-foreground">{t("setting.ai.transcription-empty-providers")}</p>}
+        {referencedProvider && !referencedProvider.apiKeySet && (
+          <p className="text-xs text-destructive">{t("setting.ai.transcription-warning-no-key")}</p>
+        )}
+        {referencedProvider?.type === InstanceSetting_AIProviderType.GEMINI && (
+          <p className="text-xs text-muted-foreground">{t("setting.ai.transcription-warning-gemini-webm")}</p>
+        )}
+      </div>
+
+      <div className="flex flex-col gap-1.5 sm:col-span-2">
+        <Label>{t("setting.ai.transcription-model")}</Label>
+        <Input
+          value={transcription.model}
+          onChange={(e) => update({ model: e.target.value })}
+          placeholder={placeholderForProvider(referencedProvider)}
+          disabled={!transcription.providerId}
+          maxLength={256}
+        />
+        <p className="text-xs text-muted-foreground">{t("setting.ai.transcription-model-help")}</p>
+      </div>
+
+      <div className="flex flex-col gap-1.5">
+        <Label>{t("setting.ai.transcription-language")}</Label>
+        <Input
+          value={transcription.language}
+          onChange={(e) => update({ language: e.target.value })}
+          placeholder={t("setting.ai.transcription-language-placeholder")}
+          disabled={!transcription.providerId}
+          maxLength={32}
+        />
+        <p className="text-xs text-muted-foreground">{t("setting.ai.transcription-language-help")}</p>
+      </div>
+
+      <div className="flex flex-col gap-1.5 sm:col-span-2">
+        <Label>{t("setting.ai.transcription-prompt")}</Label>
+        <Textarea
+          value={transcription.prompt}
+          onChange={(e) => update({ prompt: e.target.value })}
+          placeholder={t("setting.ai.transcription-prompt-placeholder")}
+          rows={3}
+          disabled={!transcription.providerId}
+          maxLength={4096}
+        />
+        <p className="text-xs text-muted-foreground">{t("setting.ai.transcription-prompt-help")}</p>
+      </div>
+    </div>
+  );
+};
+
 interface AIProviderDialogProps {
  provider?: LocalAIProvider;
  onOpenChange: (open: boolean) => void;
--- a/web/src/locales/en.json
+++ b/web/src/locales/en.json
@ -427,13 +427,32 @@
      "edit-provider": "Edit provider",
      "endpoint": "Endpoint",
      "endpoint-hint": "Leave empty to use the official provider endpoint.",
+      "integrations-description": "Provider keys are supplied by the instance owner and used by server-side AI features.",
+      "integrations-title": "AI integrations",
      "keep-api-key": "Leave blank to keep the existing key",
      "label": "AI",
      "no-providers": "No AI providers configured.",
      "provider-title": "Provider name",
      "provider-title-required": "Provider name is required.",
      "provider-type": "Provider type",
-      "providers": "Providers"
+      "providers": "Providers",
+      "transcription-description": "Speech-to-text settings used when recording audio in the memo composer.",
+      "transcription-empty-providers": "Add an AI integration first to enable transcription.",
+      "transcription-language-help": "ISO 639-1 short code (e.g. en, de, zh). Leave empty to auto-detect.",
+      "transcription-language-placeholder": "auto-detect",
+      "transcription-language": "Default language",
+      "transcription-model-help": "Free text. Use the provider's model identifier — e.g. whisper-1, gpt-4o-transcribe, whisper-large-v3-turbo.",
+      "transcription-model-placeholder-gemini": "gemini-2.5-flash",
+      "transcription-model-placeholder-openai": "whisper-1",
+      "transcription-model": "Model",
+      "transcription-no-provider": "None — transcription disabled",
+      "transcription-prompt-help": "Improves spelling of proper nouns and jargon. Whisper limit is roughly 224 tokens.",
+      "transcription-prompt-placeholder": "Names: Alice, Bob. Glossary: kubernetes, OAuth.",
+      "transcription-prompt": "Prompt hints",
+      "transcription-provider": "Provider",
+      "transcription-title": "Transcription",
+      "transcription-warning-gemini-webm": "Gemini does not accept browser-recorded audio/webm. For in-editor recording, use an OpenAI-compatible provider.",
+      "transcription-warning-no-key": "The selected provider has no API key set. Edit the integration above to add one."
    },
    "instance": {
      "access-description": "Control sign-up, authentication, profile editing, and calendar defaults for this instance.",
--- a/web/src/types/proto/api/v1/ai_service_pb.ts
+++ b/web/src/types/proto/api/v1/ai_service_pb.ts
@ -13,30 +13,16 @@ import type { Message } from "@bufbuild/protobuf";
 * Describes the file api/v1/ai_service.proto.
 */
 export const file_api_v1_ai_service: GenFile = /*@__PURE__*/
-  fileDesc("ChdhcGkvdjEvYWlfc2VydmljZS5wcm90bxIMbWVtb3MuYXBpLnYxIpsBChFUcmFuc2NyaWJlUmVxdWVzdBIYCgtwcm92aWRlcl9pZBgBIAEoCUID4EECEjYKBmNvbmZpZxgCIAEoCzIhLm1lbW9zLmFwaS52MS5UcmFuc2NyaXB0aW9uQ29uZmlnQgPgQQISNAoFYXVkaW8YAyABKAsyIC5tZW1vcy5hcGkudjEuVHJhbnNjcmlwdGlvbkF1ZGlvQgPgQQIiQQoTVHJhbnNjcmlwdGlvbkNvbmZpZxITCgZwcm9tcHQYASABKAlCA+BBARIVCghsYW5ndWFnZRgCIAEoCUID4EEBIncKElRyYW5zY3JpcHRpb25BdWRpbxIWCgdjb250ZW50GAEgASgMQgPgQQRIABINCgN1cmkYAiABKAlIABIVCghmaWxlbmFtZRgDIAEoCUID4EEBEhkKDGNvbnRlbnRfdHlwZRgEIAEoCUID4EEBQggKBnNvdXJjZSIiChJUcmFuc2NyaWJlUmVzcG9uc2USDAoEdGV4dBgBIAEoCTKaAQoJQUlTZXJ2aWNlEowBCgpUcmFuc2NyaWJlEh8ubWVtb3MuYXBpLnYxLlRyYW5zY3JpYmVSZXF1ZXN0GiAubWVtb3MuYXBpLnYxLlRyYW5zY3JpYmVSZXNwb25zZSI72kEYcHJvdmlkZXJfaWQsY29uZmlnLGF1ZGlvgtPkkwIaOgEqIhUvYXBpL3YxL2FpOnRyYW5zY3JpYmVCpgEKEGNvbS5tZW1vcy5hcGkudjFCDkFpU2VydmljZVByb3RvUAFaMGdpdGh1Yi5jb20vdXNlbWVtb3MvbWVtb3MvcHJvdG8vZ2VuL2FwaS92MTthcGl2MaICA01BWKoCDE1lbW9zLkFwaS5WMcoCDE1lbW9zXEFwaVxWMeICGE1lbW9zXEFwaVxWMVxHUEJNZXRhZGF0YeoCDk1lbW9zOjpBcGk6OlYxYgZwcm90bzM", [file_google_api_annotations, file_google_api_client, file_google_api_field_behavior]);
+  fileDesc("ChdhcGkvdjEvYWlfc2VydmljZS5wcm90bxIMbWVtb3MuYXBpLnYxIkkKEVRyYW5zY3JpYmVSZXF1ZXN0EjQKBWF1ZGlvGAEgASgLMiAubWVtb3MuYXBpLnYxLlRyYW5zY3JpcHRpb25BdWRpb0ID4EECIncKElRyYW5zY3JpcHRpb25BdWRpbxIWCgdjb250ZW50GAEgASgMQgPgQQRIABINCgN1cmkYAiABKAlIABIVCghmaWxlbmFtZRgDIAEoCUID4EEBEhkKDGNvbnRlbnRfdHlwZRgEIAEoCUID4EEBQggKBnNvdXJjZSIiChJUcmFuc2NyaWJlUmVzcG9uc2USDAoEdGV4dBgBIAEoCTKGAQoJQUlTZXJ2aWNlEnkKClRyYW5zY3JpYmUSHy5tZW1vcy5hcGkudjEuVHJhbnNjcmliZVJlcXVlc3QaIC5tZW1vcy5hcGkudjEuVHJhbnNjcmliZVJlc3BvbnNlIijaQQVhdWRpb4LT5JMCGjoBKiIVL2FwaS92MS9haTp0cmFuc2NyaWJlQqYBChBjb20ubWVtb3MuYXBpLnYxQg5BaVNlcnZpY2VQcm90b1ABWjBnaXRodWIuY29tL3VzZW1lbW9zL21lbW9zL3Byb3RvL2dlbi9hcGkvdjE7YXBpdjGiAgNNQViqAgxNZW1vcy5BcGkuVjHKAgxNZW1vc1xBcGlcVjHiAhhNZW1vc1xBcGlcVjFcR1BCTWV0YWRhdGHqAg5NZW1vczo6QXBpOjpWMWIGcHJvdG8z", [file_google_api_annotations, file_google_api_client, file_google_api_field_behavior]);

 /**
 * @generated from message memos.api.v1.TranscribeRequest
 */
 export type TranscribeRequest = Message<"memos.api.v1.TranscribeRequest"> & {
-  /**
-   * Required. The instance AI provider ID to use.
-   *
-   * @generated from field: string provider_id = 1;
-   */
-  providerId: string;
-
-  /**
-   * Required. Transcription options.
-   *
-   * @generated from field: memos.api.v1.TranscriptionConfig config = 2;
-   */
-  config?: TranscriptionConfig | undefined;
-
  /**
   * Required. Audio input.
   *
-   * @generated from field: memos.api.v1.TranscriptionAudio audio = 3;
+   * @generated from field: memos.api.v1.TranscriptionAudio audio = 1;
   */
  audio?: TranscriptionAudio | undefined;
 };
@ -48,32 +34,6 @@ export type TranscribeRequest = Message<"memos.api.v1.TranscribeRequest"> & {
 export const TranscribeRequestSchema: GenMessage<TranscribeRequest> = /*@__PURE__*/
  messageDesc(file_api_v1_ai_service, 0);

-/**
- * @generated from message memos.api.v1.TranscriptionConfig
- */
-export type TranscriptionConfig = Message<"memos.api.v1.TranscriptionConfig"> & {
-  /**
-   * Optional. A prompt to improve transcription quality.
-   *
-   * @generated from field: string prompt = 1;
-   */
-  prompt: string;
-
-  /**
-   * Optional. The language of the input audio.
-   *
-   * @generated from field: string language = 2;
-   */
-  language: string;
-};
-
-/**
- * Describes the message memos.api.v1.TranscriptionConfig.
- * Use `create(TranscriptionConfigSchema)` to create a new message.
- */
-export const TranscriptionConfigSchema: GenMessage<TranscriptionConfig> = /*@__PURE__*/
-  messageDesc(file_api_v1_ai_service, 1);
-
 /**
 * @generated from message memos.api.v1.TranscriptionAudio
 */
@ -119,7 +79,7 @@ export type TranscriptionAudio = Message<"memos.api.v1.TranscriptionAudio"> & {
 * Use `create(TranscriptionAudioSchema)` to create a new message.
 */
 export const TranscriptionAudioSchema: GenMessage<TranscriptionAudio> = /*@__PURE__*/
-  messageDesc(file_api_v1_ai_service, 2);
+  messageDesc(file_api_v1_ai_service, 1);

 /**
 * @generated from message memos.api.v1.TranscribeResponse
@ -138,7 +98,7 @@ export type TranscribeResponse = Message<"memos.api.v1.TranscribeResponse"> & {
 * Use `create(TranscribeResponseSchema)` to create a new message.
 */
 export const TranscribeResponseSchema: GenMessage<TranscribeResponse> = /*@__PURE__*/
-  messageDesc(file_api_v1_ai_service, 3);
+  messageDesc(file_api_v1_ai_service, 2);

 /**
 * @generated from service memos.api.v1.AIService
--- a/web/src/types/proto/api/v1/instance_service_pb.ts
+++ b/web/src/types/proto/api/v1/instance_service_pb.ts