Transcription (STT) Settings — Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: Replace the implicit "first AI provider with an API key wins" transcription flow with an explicit, instance-level TranscriptionConfig that names a provider, model, default language, and prompt hint — enabling Whisper / Groq / self-hosted Whisper-compatible endpoints, restoring multi-provider flexibility, and exposing the Whisper API's prompt field for proper-noun spelling hints.
Architecture: Schema-additive. Add TranscriptionConfig (provider_id, model, language, prompt) to both proto/store/instance_setting.proto and proto/api/v1/instance_service.proto under InstanceAISetting / InstanceSetting.AISetting. Server-side Transcribe resolves provider/model/language/prompt from the persisted config when not overridden in the request, falling through to the existing DefaultTranscriptionModel for the model. UpdateInstanceSetting validates transcription.provider_id references an existing provider and that the persisted config's strings respect length caps. Frontend splits the existing AISection into two groups — "AI Integrations" (existing providers list, renamed in copy) and "Transcription" (new four-field form: Provider / Model / Language / Prompt) — and the home MemoEditor reads aiSetting.transcription instead of scanning providers.
Tech Stack: Backend Go 1.26, Connect RPC + protobuf via buf (remote plugins), github.com/pkg/errors, gRPC status.Errorf. Frontend React 18 + TypeScript 6, @bufbuild/protobuf v2, Connect-ES, Tailwind v4, Radix UI primitives via @/components/ui/*. Tests: Go testing + testify/require; frontend manual verification (no component tests in Settings/).
Spec: docs/superpowers/specs/2026-05-02-transcription-settings-design.md
Branch note: This work is intended for a fresh worktree off main (e.g. feat/transcription-settings). The spec was committed on feat/calendar-date-prefill because that branch was active at brainstorm time; before starting Task 1, create a new worktree:
git worktree add -b feat/transcription-settings ../memos-transcription main
cd ../memos-transcription
git cherry-pick <spec-commit-sha> # bring the spec doc onto the new branch
File map
Created
- (none — all changes are edits or generated)
Modified — protobuf source (changes regenerate Go + TS + OpenAPI via buf generate)
- proto/store/instance_setting.proto: add TranscriptionConfig message and transcription field on InstanceAISetting.
- proto/api/v1/instance_service.proto: add parallel TranscriptionConfig message and transcription field inside the nested AISetting message.
Modified — backend Go
- server/router/api/v1/instance_service.go: extend convertInstanceAISettingFromStore / convertInstanceAISettingToStore to round-trip transcription; extend prepareInstanceAISettingForUpdate to validate that transcription.provider_id exists in providers[] (when set) and to length-cap model / language / prompt; preserve unchanged transcription fields when an UpdateInstanceSetting request omits them.
- server/router/api/v1/ai_service.go: read InstanceAISetting.transcription at the start of Transcribe; resolve provider_id / model / language / prompt via "request override → persisted setting → engine default"; return FailedPrecondition when no provider can be resolved; remove the now-redundant provider_id REQUIRED gate (becomes optional in the proto).
- proto/api/v1/ai_service.proto: relax TranscribeRequest.provider_id from REQUIRED to OPTIONAL.
Modified — backend tests
- server/router/api/v1/test/ai_service_test.go: add cases: persisted transcription.provider_id resolves when request omits it; persisted transcription.model overrides default; per-call Config.prompt wins over persisted prompt; FailedPrecondition when neither request nor setting names a provider.
- server/router/api/v1/test/instance_service_test.go: add cases for the new validation: unknown transcription.provider_id rejected; oversized model / language / prompt rejected; existing transcription preserved when the field is omitted on update.
Modified — frontend
- web/src/components/Settings/AISection.tsx: restructure into two SettingGroup blocks: "AI Integrations" (existing provider table) and "Transcription" (new). Add a TranscriptionForm component co-located in the same file, or split it out if it grows past ~120 LOC. Wire local state, change tracking via lodash-es/isEqual, save to the same InstanceSetting_Key.AI setting.
- web/src/components/MemoEditor/index.tsx: replace the transcriptionProvider lookup with a canTranscribe boolean derived from aiSetting.transcription.providerId plus the referenced provider's existence and apiKeySet.
- web/src/components/MemoEditor/services/transcriptionService.ts: drop the provider parameter; call transcribe() with no providerId (server resolves from the setting).
- web/src/locales/en.json: add new strings for the Transcription form. Other locale files are left for the maintainer's translation pass (consistent with how byok-* strings were originally added).
Task 1: Add TranscriptionConfig to the store proto
Files:
- Modify: proto/store/instance_setting.proto
The store-level TranscriptionConfig is the persistent shape written to disk. Its field numbers are fresh (1–4); the new field on InstanceAISetting takes the next free slot (2).
- Step 1: Edit proto/store/instance_setting.proto
In the file, locate message InstanceAISetting { ... } (around lines 149–152) and replace it with the version below. Then append the new TranscriptionConfig message after the AIProviderType enum at the bottom of the file, so that it follows the existing AIProviderConfig definitions.
message InstanceAISetting {
// providers is the list of AI provider configurations available instance-wide.
repeated AIProviderConfig providers = 1;
// transcription is the speech-to-text feature configuration.
// When unset or transcription.provider_id is empty, transcription is disabled.
TranscriptionConfig transcription = 2;
}
After the existing enum AIProviderType { ... } block, append:
// TranscriptionConfig configures the speech-to-text feature.
message TranscriptionConfig {
// provider_id references an entry in InstanceAISetting.providers[].id.
// Empty string means transcription is disabled.
string provider_id = 1;
// model is the provider-specific model identifier.
// Empty string falls back to the engine default
// (whisper-1 for OPENAI providers, gemini-2.5-flash for GEMINI providers).
string model = 2;
// language is the default ISO 639-1 language hint sent to the provider.
// Empty string lets the provider auto-detect.
string language = 3;
// prompt is a default spelling/vocabulary hint passed to the provider.
// Used as the OpenAI Whisper "prompt" parameter and folded into the Gemini
// generation prompt as a "Context and spelling hints" block.
string prompt = 4;
}
- Step 2: Regenerate Go + TypeScript bindings
Run from the proto/ directory:
cd proto && buf format -w && buf generate
Expected: the command exits 0; proto/gen/store/instance_setting.pb.go and web/src/types/proto/store/instance_setting_pb.ts are updated to include TranscriptionConfig and the new Transcription field.
- Step 3: Verify Go compiles
Run from repo root:
go build ./...
Expected: PASS. (Backend code does not yet reference the new field, so this just confirms the generation is well-formed.)
- Step 4: Commit
git add proto/store/instance_setting.proto proto/gen/store/ web/src/types/proto/store/
git commit -m "feat(proto/store): add TranscriptionConfig to InstanceAISetting
Adds provider_id / model / language / prompt fields for the new
explicit transcription configuration. Schema-additive (field 2 on
InstanceAISetting); existing instances default to provider_id=\"\"
which means transcription is disabled until the operator selects
a provider in settings."
Task 2: Mirror TranscriptionConfig into the API proto
Files:
- Modify: proto/api/v1/instance_service.proto
The API-level message mirrors the store version. They live in different proto packages (memos.api.v1 vs memos.store), matching the existing parallel-message pattern (User, Memo, AIProviderConfig, etc.).
- Step 1: Edit proto/api/v1/instance_service.proto
Locate the nested message AISetting { ... } block (around lines 226–230) and replace it with:
// AI provider configuration settings.
message AISetting {
// providers is the list of AI provider configurations available instance-wide.
repeated AIProviderConfig providers = 1;
// transcription is the speech-to-text feature configuration.
// When unset or transcription.provider_id is empty, transcription is disabled.
TranscriptionConfig transcription = 2;
}
Immediately after the existing enum AIProviderType { ... } block (currently the last child of InstanceSetting, around lines 247–251), append the new nested message — keep the indentation: it lives inside message InstanceSetting { ... }:
// TranscriptionConfig configures the speech-to-text feature.
message TranscriptionConfig {
// provider_id references an entry in AISetting.providers[].id.
// Empty string means transcription is disabled.
string provider_id = 1;
// model is the provider-specific model identifier.
// Empty string falls back to the engine default
// (whisper-1 for OPENAI providers, gemini-2.5-flash for GEMINI providers).
string model = 2;
// language is the default ISO 639-1 language hint sent to the provider.
// Empty string lets the provider auto-detect.
string language = 3;
// prompt is a default spelling/vocabulary hint passed to the provider.
string prompt = 4;
}
- Step 2: Regenerate
cd proto && buf format -w && buf generate
Expected: PASS. Updates proto/gen/api/v1/instance_service.pb.go, proto/gen/openapi.yaml, and web/src/types/proto/api/v1/instance_service_pb.ts.
- Step 3: Verify Go still compiles
go build ./...
Expected: PASS. Existing convertInstanceAISetting* functions still compile because the new field defaults to nil/zero on round-trip.
- Step 4: Commit
git add proto/api/v1/instance_service.proto proto/gen/ web/src/types/proto/api/
git commit -m "feat(proto/api): add TranscriptionConfig to AISetting
Mirrors the store-level TranscriptionConfig. Both messages live in
their own packages (memos.api.v1 vs memos.store) following the
existing parallel-message pattern used for AIProviderConfig."
Task 3: Round-trip transcription through convertInstanceAISetting{From,To}Store
Files:
- Modify: server/router/api/v1/instance_service.go:505-551
The existing converters drop unknown fields silently because they only copy named fields. Without explicit handling, transcription would be lost on every round-trip. This task is purely plumbing — no validation yet.
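To make the failure mode concrete, here is a minimal sketch of how a field-by-field converter loses a field it does not name. The struct types are hypothetical stand-ins for the generated storepb / v1pb messages, not the real generated code; only the shape of the problem is the same.

```go
package main

import "fmt"

// Hypothetical stand-ins for the generated store and API messages.
type storeTranscription struct{ ProviderID string }

type storeAISetting struct {
	Providers     []string
	Transcription *storeTranscription
}

type apiAISetting struct {
	Providers     []string
	Transcription *storeTranscription
}

// convertNamedFieldsOnly copies only Providers, the way a converter written
// before the transcription field existed would.
func convertNamedFieldsOnly(s *storeAISetting) *apiAISetting {
	if s == nil {
		return nil
	}
	return &apiAISetting{Providers: append([]string(nil), s.Providers...)}
}

// convertWithTranscription additionally carries the new field across.
func convertWithTranscription(s *storeAISetting) *apiAISetting {
	out := convertNamedFieldsOnly(s)
	if out != nil {
		out.Transcription = s.Transcription
	}
	return out
}

func main() {
	in := &storeAISetting{
		Providers:     []string{"openai-main"},
		Transcription: &storeTranscription{ProviderID: "openai-main"},
	}
	fmt.Println(convertNamedFieldsOnly(in).Transcription == nil)   // true: the field was dropped
	fmt.Println(convertWithTranscription(in).Transcription != nil) // true: the field survived
}
```

Nothing fails to compile in the first variant, which is why the loss is silent and why this task adds the field to both converters explicitly.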
- Step 1: Edit convertInstanceAISettingFromStore
Replace the function body (currently lines 505–528) so the returned aiSetting carries the new field:
func convertInstanceAISettingFromStore(setting *storepb.InstanceAISetting) *v1pb.InstanceSetting_AISetting {
if setting == nil {
return nil
}
aiSetting := &v1pb.InstanceSetting_AISetting{
Providers: make([]*v1pb.InstanceSetting_AIProviderConfig, 0, len(setting.Providers)),
Transcription: convertTranscriptionConfigFromStore(setting.GetTranscription()),
}
for _, provider := range setting.Providers {
if provider == nil {
continue
}
apiKey := provider.GetApiKey()
aiSetting.Providers = append(aiSetting.Providers, &v1pb.InstanceSetting_AIProviderConfig{
Id: provider.GetId(),
Title: provider.GetTitle(),
Type: v1pb.InstanceSetting_AIProviderType(provider.GetType()),
Endpoint: provider.GetEndpoint(),
ApiKeySet: apiKey != "",
ApiKeyHint: maskAPIKey(apiKey),
})
}
return aiSetting
}
- Step 2: Edit convertInstanceAISettingToStore
Replace the function body (currently lines 530–551):
func convertInstanceAISettingToStore(setting *v1pb.InstanceSetting_AISetting) *storepb.InstanceAISetting {
if setting == nil {
return nil
}
aiSetting := &storepb.InstanceAISetting{
Providers: make([]*storepb.AIProviderConfig, 0, len(setting.Providers)),
Transcription: convertTranscriptionConfigToStore(setting.GetTranscription()),
}
for _, provider := range setting.Providers {
if provider == nil {
continue
}
aiSetting.Providers = append(aiSetting.Providers, &storepb.AIProviderConfig{
Id: provider.GetId(),
Title: provider.GetTitle(),
Type: storepb.AIProviderType(provider.GetType()),
Endpoint: provider.GetEndpoint(),
ApiKey: provider.GetApiKey(),
})
}
return aiSetting
}
- Step 3: Add the two new converter helpers
Append immediately after convertInstanceAISettingToStore:
func convertTranscriptionConfigFromStore(setting *storepb.TranscriptionConfig) *v1pb.InstanceSetting_TranscriptionConfig {
if setting == nil {
return nil
}
return &v1pb.InstanceSetting_TranscriptionConfig{
ProviderId: setting.GetProviderId(),
Model: setting.GetModel(),
Language: setting.GetLanguage(),
Prompt: setting.GetPrompt(),
}
}
func convertTranscriptionConfigToStore(setting *v1pb.InstanceSetting_TranscriptionConfig) *storepb.TranscriptionConfig {
if setting == nil {
return nil
}
return &storepb.TranscriptionConfig{
ProviderId: setting.GetProviderId(),
Model: setting.GetModel(),
Language: setting.GetLanguage(),
Prompt: setting.GetPrompt(),
}
}
- Step 4: Build
go build ./...
Expected: PASS.
- Step 5: Commit
git add server/router/api/v1/instance_service.go
git commit -m "feat(api/instance): round-trip transcription through AI setting converters"
Task 4: Validate transcription in prepareInstanceAISettingForUpdate
Files:
- Modify: server/router/api/v1/instance_service.go:564-623
The spec lists four validation rules: provider_id must reference an existing entry in providers[] (when set), and model, language, and prompt are length-capped at 256, 32, and 4096 respectively. In addition, a "preserve previous on omit" rule mirrors how API keys are preserved when a request omits them.
- Step 1: Write the failing tests for validation and preservation
Open server/router/api/v1/test/instance_service_test.go and add three new sub-tests inside the existing top-level TestUpdateInstanceSetting-equivalent function (the same one that currently contains "UpdateInstanceSetting - AI provider keys are write-only and preserved on empty" near line 670). Find the closing brace of that sub-test and insert the following immediately after it, so the new sub-tests are siblings rather than nested:
t.Run("UpdateInstanceSetting - transcription provider_id must reference an existing provider", func(t *testing.T) {
ts := NewTestService(t)
defer ts.Cleanup()
hostUser, err := ts.CreateHostUser(ctx, "admin")
require.NoError(t, err)
adminCtx := ts.CreateUserContext(ctx, hostUser.ID)
_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{
Setting: &v1pb.InstanceSetting{
Name: "instance/settings/AI",
Value: &v1pb.InstanceSetting_AiSetting{
AiSetting: &v1pb.InstanceSetting_AISetting{
Providers: []*v1pb.InstanceSetting_AIProviderConfig{
{
Id: "openai-main",
Title: "OpenAI",
Type: v1pb.InstanceSetting_OPENAI,
ApiKey: "sk-test",
},
},
Transcription: &v1pb.InstanceSetting_TranscriptionConfig{
ProviderId: "does-not-exist",
},
},
},
},
})
require.Error(t, err)
require.Contains(t, err.Error(), "transcription provider_id")
})
t.Run("UpdateInstanceSetting - transcription strings are length-capped", func(t *testing.T) {
ts := NewTestService(t)
defer ts.Cleanup()
hostUser, err := ts.CreateHostUser(ctx, "admin")
require.NoError(t, err)
adminCtx := ts.CreateUserContext(ctx, hostUser.ID)
base := &v1pb.InstanceSetting{
Name: "instance/settings/AI",
Value: &v1pb.InstanceSetting_AiSetting{
AiSetting: &v1pb.InstanceSetting_AISetting{
Providers: []*v1pb.InstanceSetting_AIProviderConfig{
{
Id: "openai-main",
Title: "OpenAI",
Type: v1pb.InstanceSetting_OPENAI,
ApiKey: "sk-test",
},
},
},
},
}
oversizedModel := strings.Repeat("a", 257)
base.GetAiSetting().Transcription = &v1pb.InstanceSetting_TranscriptionConfig{
ProviderId: "openai-main",
Model: oversizedModel,
}
_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{Setting: base})
require.Error(t, err)
require.Contains(t, err.Error(), "transcription model")
oversizedLanguage := strings.Repeat("a", 33)
base.GetAiSetting().Transcription = &v1pb.InstanceSetting_TranscriptionConfig{
ProviderId: "openai-main",
Language: oversizedLanguage,
}
_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{Setting: base})
require.Error(t, err)
require.Contains(t, err.Error(), "transcription language")
oversizedPrompt := strings.Repeat("a", 4097)
base.GetAiSetting().Transcription = &v1pb.InstanceSetting_TranscriptionConfig{
ProviderId: "openai-main",
Prompt: oversizedPrompt,
}
_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{Setting: base})
require.Error(t, err)
require.Contains(t, err.Error(), "transcription prompt")
})
t.Run("UpdateInstanceSetting - transcription is preserved when omitted on update", func(t *testing.T) {
ts := NewTestService(t)
defer ts.Cleanup()
hostUser, err := ts.CreateHostUser(ctx, "admin")
require.NoError(t, err)
adminCtx := ts.CreateUserContext(ctx, hostUser.ID)
_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{
Setting: &v1pb.InstanceSetting{
Name: "instance/settings/AI",
Value: &v1pb.InstanceSetting_AiSetting{
AiSetting: &v1pb.InstanceSetting_AISetting{
Providers: []*v1pb.InstanceSetting_AIProviderConfig{
{
Id: "openai-main",
Title: "OpenAI",
Type: v1pb.InstanceSetting_OPENAI,
ApiKey: "sk-test",
},
},
Transcription: &v1pb.InstanceSetting_TranscriptionConfig{
ProviderId: "openai-main",
Model: "whisper-1",
Language: "en",
Prompt: "names: Alice",
},
},
},
},
})
require.NoError(t, err)
_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{
Setting: &v1pb.InstanceSetting{
Name: "instance/settings/AI",
Value: &v1pb.InstanceSetting_AiSetting{
AiSetting: &v1pb.InstanceSetting_AISetting{
Providers: []*v1pb.InstanceSetting_AIProviderConfig{
{
Id: "openai-main",
Title: "OpenAI",
Type: v1pb.InstanceSetting_OPENAI,
ApiKey: "",
},
},
},
},
},
})
require.NoError(t, err)
stored, err := ts.Store.GetInstanceAISetting(ctx)
require.NoError(t, err)
require.NotNil(t, stored.GetTranscription())
require.Equal(t, "openai-main", stored.GetTranscription().GetProviderId())
require.Equal(t, "whisper-1", stored.GetTranscription().GetModel())
require.Equal(t, "en", stored.GetTranscription().GetLanguage())
require.Equal(t, "names: Alice", stored.GetTranscription().GetPrompt())
})
Confirm strings is already imported in this test file. If not, add "strings" to its import block.
- Step 2: Run the tests to verify they fail
go test -run TestUpdateInstanceSetting -v ./server/router/api/v1/test/... 2>&1 | tail -40
Expected: the three new sub-tests FAIL because prepareInstanceAISettingForUpdate does not yet validate or preserve transcription.
- Step 3: Add validation + preservation to prepareInstanceAISettingForUpdate
Open server/router/api/v1/instance_service.go. Add these constants near the top of the file (or next to existing instance setting constants — search for any existing length cap constants and place these alongside):
const (
maxTranscriptionConfigModelLength = 256
maxTranscriptionConfigLanguageLength = 32
maxTranscriptionConfigPromptLength = 4096
)
Then, at the very end of the existing prepareInstanceAISettingForUpdate function (immediately before its closing return nil), insert:
if err := preparePersistedTranscriptionConfig(setting, existing); err != nil {
return err
}
And add this new function next to prepareInstanceAISettingForUpdate:
func preparePersistedTranscriptionConfig(setting *storepb.InstanceAISetting, existing *storepb.InstanceAISetting) error {
// Preserve the previously stored transcription config when the request omits it,
// matching the same "absence == keep" semantics used for API keys.
if setting.Transcription == nil {
if existing != nil {
setting.Transcription = existing.GetTranscription()
}
return nil
}
cfg := setting.Transcription
cfg.ProviderId = strings.TrimSpace(cfg.ProviderId)
cfg.Model = strings.TrimSpace(cfg.Model)
cfg.Language = strings.TrimSpace(cfg.Language)
cfg.Prompt = strings.TrimSpace(cfg.Prompt)
if cfg.ProviderId != "" {
referenced := false
for _, provider := range setting.Providers {
if provider != nil && provider.Id == cfg.ProviderId {
referenced = true
break
}
}
if !referenced {
return errors.Errorf("transcription provider_id %q does not reference any configured provider", cfg.ProviderId)
}
}
if len(cfg.Model) > maxTranscriptionConfigModelLength {
return errors.Errorf("transcription model is too long; maximum length is %d characters", maxTranscriptionConfigModelLength)
}
if len(cfg.Language) > maxTranscriptionConfigLanguageLength {
return errors.Errorf("transcription language is too long; maximum length is %d characters", maxTranscriptionConfigLanguageLength)
}
if len(cfg.Prompt) > maxTranscriptionConfigPromptLength {
return errors.Errorf("transcription prompt is too long; maximum length is %d characters", maxTranscriptionConfigPromptLength)
}
return nil
}
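One detail about the length checks above: Go's len on a string counts bytes, not characters, so the caps are effectively byte caps even though the error messages say "characters". The sketch below contrasts the two ways of counting; exceedsRuneCap is a hypothetical alternative, not something the plan or the spec asks for.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// exceedsByteCap mirrors the len(...) > max comparison used in the plan.
func exceedsByteCap(s string, max int) bool { return len(s) > max }

// exceedsRuneCap is a hypothetical variant that counts Unicode code points.
func exceedsRuneCap(s string, max int) bool { return utf8.RuneCountInString(s) > max }

func main() {
	hint := "日本語" // 3 characters, 9 bytes in UTF-8

	fmt.Println(len(hint), utf8.RuneCountInString(hint))          // 9 3
	fmt.Println(exceedsByteCap(hint, 8), exceedsRuneCap(hint, 8)) // true false
}
```

For ASCII-only values such as model identifiers and ISO 639-1 codes the two counts agree; the difference only matters for prompts containing non-ASCII names.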
- Step 4: Run the tests to verify they pass
go test -run TestUpdateInstanceSetting -v ./server/router/api/v1/test/... 2>&1 | tail -40
Expected: PASS for all three new sub-tests plus all existing sub-tests.
- Step 5: Commit
git add server/router/api/v1/instance_service.go server/router/api/v1/test/instance_service_test.go
git commit -m "feat(api/instance): validate and preserve transcription config
Validates transcription.provider_id references an existing provider
and length-caps model (256), language (32), and prompt (4096). When
an update omits transcription, the previously stored config is
preserved — same semantics as the existing API-key preservation."
Task 5: Make TranscribeRequest.provider_id optional
Files:
- Modify: proto/api/v1/ai_service.proto:24
The persisted setting becomes the source of truth; the request field is now an override for advanced callers.
- Step 1: Edit proto/api/v1/ai_service.proto
Change line 24 from:
// Required. The instance AI provider ID to use.
string provider_id = 1 [(google.api.field_behavior) = REQUIRED];
to:
// Optional. The instance AI provider ID to use. When empty, the server
// resolves the provider from InstanceAISetting.transcription.provider_id.
string provider_id = 1 [(google.api.field_behavior) = OPTIONAL];
- Step 2: Regenerate
cd proto && buf format -w && buf generate
Expected: PASS.
- Step 3: Build
go build ./...
Expected: PASS. (The Connect/gRPC stub regenerates with the same Go field shape; field_behavior is metadata only.)
- Step 4: Commit
git add proto/api/v1/ai_service.proto proto/gen/
git commit -m "feat(proto/api): make TranscribeRequest.provider_id optional
When omitted, the server resolves the provider from the persisted
InstanceAISetting.transcription configuration."
Task 6: Resolve transcription config in the Transcribe RPC
Files:
- Modify: server/router/api/v1/ai_service.go
The current implementation requires provider_id and uses DefaultTranscriptionModel for the model. The new flow: per-call request → persisted transcription → engine default. Per-call Config.prompt and Config.language already exist; they should now fall through to the persisted defaults when empty.
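The precedence chain described above amounts to "take the first non-blank candidate". A minimal sketch follows, assuming a hypothetical firstNonEmpty helper; the plan itself inlines these checks instead of introducing a helper, and it trims only the request value because persisted values are already trimmed on save.

```go
package main

import (
	"fmt"
	"strings"
)

// firstNonEmpty returns the first candidate that is non-empty after trimming
// whitespace, or "" when every candidate is blank. Hypothetical helper.
func firstNonEmpty(candidates ...string) string {
	for _, c := range candidates {
		if trimmed := strings.TrimSpace(c); trimmed != "" {
			return trimmed
		}
	}
	return ""
}

func main() {
	// Argument order: per-call request, persisted setting, engine default.
	fmt.Println(firstNonEmpty("de", "fr", ""))      // request wins: de
	fmt.Println(firstNonEmpty("  ", "fr", ""))      // blank request falls through: fr
	fmt.Println(firstNonEmpty("", "", "whisper-1")) // only the default is left: whisper-1
	fmt.Printf("%q\n", firstNonEmpty("", "", ""))   // nothing resolved: ""
}
```

An empty result is meaningful in both directions: for language it lets the provider auto-detect, while for provider_id it is the case that should map to FailedPrecondition.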
- Step 1: Write a failing test — persisted provider resolves when request omits provider_id
Open server/router/api/v1/test/ai_service_test.go and append, inside the existing TestTranscribe function (before the closing brace of the function — currently line 280), a new sub-test:
t.Run("resolves provider from persisted transcription setting when request omits provider_id", func(t *testing.T) {
ts := NewTestService(t)
defer ts.Cleanup()
user, err := ts.CreateRegularUser(ctx, "alice-fallthrough")
require.NoError(t, err)
userCtx := ts.CreateUserContext(ctx, user.ID)
openAIServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
require.NoError(t, r.ParseMultipartForm(10<<20))
require.Equal(t, "whisper-1", r.FormValue("model"))
require.Equal(t, "fr", r.FormValue("language"))
require.Equal(t, "names: Alice", r.FormValue("prompt"))
w.Header().Set("Content-Type", "application/json")
require.NoError(t, json.NewEncoder(w).Encode(map[string]string{"text": "ok"}))
}))
defer openAIServer.Close()
_, err = ts.Store.UpsertInstanceSetting(ctx, &storepb.InstanceSetting{
Key: storepb.InstanceSettingKey_AI,
Value: &storepb.InstanceSetting_AiSetting{
AiSetting: &storepb.InstanceAISetting{
Providers: []*storepb.AIProviderConfig{
{
Id: "openai-main",
Title: "OpenAI",
Type: storepb.AIProviderType_OPENAI,
Endpoint: openAIServer.URL,
ApiKey: "sk-test",
},
},
Transcription: &storepb.TranscriptionConfig{
ProviderId: "openai-main",
Model: "whisper-1",
Language: "fr",
Prompt: "names: Alice",
},
},
},
})
require.NoError(t, err)
resp, err := ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
Config: &v1pb.TranscriptionConfig{},
Audio: &v1pb.TranscriptionAudio{
Source: &v1pb.TranscriptionAudio_Content{Content: []byte("RIFF")},
Filename: "voice.wav",
ContentType: "audio/wav",
},
})
require.NoError(t, err)
require.Equal(t, "ok", resp.Text)
})
t.Run("per-call config overrides persisted prompt and language", func(t *testing.T) {
ts := NewTestService(t)
defer ts.Cleanup()
user, err := ts.CreateRegularUser(ctx, "alice-override")
require.NoError(t, err)
userCtx := ts.CreateUserContext(ctx, user.ID)
openAIServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
require.NoError(t, r.ParseMultipartForm(10<<20))
require.Equal(t, "de", r.FormValue("language"))
require.Equal(t, "override prompt", r.FormValue("prompt"))
w.Header().Set("Content-Type", "application/json")
require.NoError(t, json.NewEncoder(w).Encode(map[string]string{"text": "ok"}))
}))
defer openAIServer.Close()
_, err = ts.Store.UpsertInstanceSetting(ctx, &storepb.InstanceSetting{
Key: storepb.InstanceSettingKey_AI,
Value: &storepb.InstanceSetting_AiSetting{
AiSetting: &storepb.InstanceAISetting{
Providers: []*storepb.AIProviderConfig{
{
Id: "openai-main",
Title: "OpenAI",
Type: storepb.AIProviderType_OPENAI,
Endpoint: openAIServer.URL,
ApiKey: "sk-test",
},
},
Transcription: &storepb.TranscriptionConfig{
ProviderId: "openai-main",
Language: "fr",
Prompt: "names: Alice",
},
},
},
})
require.NoError(t, err)
_, err = ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
Config: &v1pb.TranscriptionConfig{
Language: "de",
Prompt: "override prompt",
},
Audio: &v1pb.TranscriptionAudio{
Source: &v1pb.TranscriptionAudio_Content{Content: []byte("RIFF")},
Filename: "voice.wav",
ContentType: "audio/wav",
},
})
require.NoError(t, err)
})
t.Run("returns FailedPrecondition when no provider configured", func(t *testing.T) {
ts := NewTestService(t)
defer ts.Cleanup()
user, err := ts.CreateRegularUser(ctx, "alice-empty")
require.NoError(t, err)
userCtx := ts.CreateUserContext(ctx, user.ID)
_, err = ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
Config: &v1pb.TranscriptionConfig{},
Audio: &v1pb.TranscriptionAudio{
Source: &v1pb.TranscriptionAudio_Content{Content: []byte("RIFF")},
Filename: "voice.wav",
ContentType: "audio/wav",
},
})
require.Error(t, err)
require.Contains(t, err.Error(), "transcription is not configured")
})
- Step 2: Run the tests to verify they fail
go test -run TestTranscribe -v ./server/router/api/v1/test/... 2>&1 | tail -60
Expected: the three new sub-tests FAIL. All three omit provider_id from the request, and the current code rejects that with InvalidArgument: provider_id is required. As a result, "resolves provider from persisted transcription setting" never reaches the provider, "per-call config overrides" never gets as far as merging per-call values with the persisted prompt/language, and "returns FailedPrecondition" sees the wrong error message.
- Step 3: Refactor Transcribe to resolve from the persisted setting
In server/router/api/v1/ai_service.go, replace the block at the start of the Transcribe method that validates provider_id and resolves the provider (currently lines 54–101) with the version below. Keep the audio validation block (lines 68–91) intact: it runs BEFORE the provider resolution, so audio errors still surface as InvalidArgument regardless of transcription config.
Specifically, replace lines 54–101 with:
if request.Config == nil {
return nil, status.Errorf(codes.InvalidArgument, "config is required")
}
if request.Audio == nil {
return nil, status.Errorf(codes.InvalidArgument, "audio is required")
}
if request.Audio.GetUri() != "" {
return nil, status.Errorf(codes.InvalidArgument, "audio uri is not supported")
}
content := request.Audio.GetContent()
if len(content) == 0 {
return nil, status.Errorf(codes.InvalidArgument, "audio content is required")
}
if len(content) > maxTranscriptionAudioSizeBytes {
return nil, status.Errorf(codes.InvalidArgument, "audio file is too large; maximum size is 25 MiB")
}
filename := strings.TrimSpace(request.Audio.GetFilename())
if len(filename) > maxTranscriptionFilenameLength {
return nil, status.Errorf(codes.InvalidArgument, "filename is too long; maximum length is %d characters", maxTranscriptionFilenameLength)
}
contentType := strings.TrimSpace(request.Audio.GetContentType())
if contentType == "" {
contentType = http.DetectContentType(content)
}
if !isSupportedTranscriptionContentType(contentType) {
return nil, status.Errorf(codes.InvalidArgument, "audio content type %q is not supported", contentType)
}
aiSetting, err := s.Store.GetInstanceAISetting(ctx)
if err != nil {
return nil, status.Errorf(codes.Internal, "failed to get AI setting: %v", err)
}
persisted := aiSetting.GetTranscription()
providerID := strings.TrimSpace(request.GetProviderId())
if providerID == "" {
providerID = persisted.GetProviderId()
}
if providerID == "" {
return nil, status.Errorf(codes.FailedPrecondition, "transcription is not configured")
}
provider, err := s.resolveAIProvider(aiSetting, providerID)
if err != nil {
return nil, err
}
model := persisted.GetModel()
if model == "" {
defaultModel, err := ai.DefaultTranscriptionModel(provider.Type)
if err != nil {
return nil, status.Errorf(codes.InvalidArgument, "%v", err)
}
model = defaultModel
}
prompt := strings.TrimSpace(request.GetConfig().GetPrompt())
if prompt == "" {
prompt = persisted.GetPrompt()
}
if len(prompt) > maxTranscriptionPromptLength {
return nil, status.Errorf(codes.InvalidArgument, "prompt is too long; maximum length is %d characters", maxTranscriptionPromptLength)
}
language := strings.TrimSpace(request.GetConfig().GetLanguage())
if language == "" {
language = persisted.GetLanguage()
}
if len(language) > maxTranscriptionLanguageLength {
return nil, status.Errorf(codes.InvalidArgument, "language is too long; maximum length is %d characters", maxTranscriptionLanguageLength)
}
Note: the API-level TranscriptionConfig in proto/api/v1/ai_service.proto currently has only prompt and language fields, so a request.GetConfig().GetModel() accessor does not exist and this task does not add a per-call model override. The model precedence is therefore persisted setting → engine default; only provider_id, prompt, and language can be overridden per call.
Also delete the existing helper resolveAIProviderForTranscription (currently lines 119–142) and replace it with this slimmer one that takes a pre-fetched setting:
func (s *APIV1Service) resolveAIProvider(setting *storepb.InstanceAISetting, providerID string) (ai.ProviderConfig, error) {
providers := make([]ai.ProviderConfig, 0, len(setting.GetProviders()))
for _, provider := range setting.GetProviders() {
if provider == nil {
continue
}
providers = append(providers, convertAIProviderConfigFromStore(provider))
}
provider, err := ai.FindProvider(providers, providerID)
if err != nil {
return ai.ProviderConfig{}, status.Errorf(codes.NotFound, "AI provider not found")
}
return *provider, nil
}
The remainder of Transcribe (the call to ai.NewTranscriber, the transcriber.Transcribe(...) call, the response construction) is unchanged — prompt, language, model, provider are all already in scope.
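The resolution order in this step can be modelled in a few lines. The sketch below is TypeScript (to match the frontend code later in this plan), not the Go that lands in ai_service.go. The names resolveTranscription, RequestOverrides, and PersistedConfig, the cap values 2048 and 32, and the use of whisper-1 as the engine default are all hypothetical stand-ins. It also counts UTF-16 code units, whereas Go's len counts bytes.

```typescript
// Hypothetical caps; the real values live in the Go constants
// maxTranscriptionPromptLength and maxTranscriptionLanguageLength.
const MAX_PROMPT_LENGTH = 2048;
const MAX_LANGUAGE_LENGTH = 32;

interface RequestOverrides {
  prompt?: string;
  language?: string;
}

interface PersistedConfig {
  model: string;
  prompt: string;
  language: string;
}

interface ResolvedTranscription {
  model: string;
  prompt: string;
  language: string;
}

// Uses the per-call override when it is non-empty after trimming,
// otherwise falls back to the persisted value (which is not trimmed here).
const firstNonEmpty = (override: string | undefined, persisted: string): string => {
  const trimmed = (override ?? "").trim();
  return trimmed !== "" ? trimmed : persisted;
};

function resolveTranscription(
  request: RequestOverrides,
  persisted: PersistedConfig,
  engineDefaultModel: string,
): ResolvedTranscription {
  // Model has no per-call override: persisted setting, then engine default.
  const model = persisted.model !== "" ? persisted.model : engineDefaultModel;

  const prompt = firstNonEmpty(request.prompt, persisted.prompt);
  if (prompt.length > MAX_PROMPT_LENGTH) {
    throw new Error(`prompt is too long; maximum length is ${MAX_PROMPT_LENGTH} characters`);
  }

  const language = firstNonEmpty(request.language, persisted.language);
  if (language.length > MAX_LANGUAGE_LENGTH) {
    throw new Error(`language is too long; maximum length is ${MAX_LANGUAGE_LENGTH} characters`);
  }

  return { model, prompt, language };
}

// The request overrides only the language; model and prompt fall through.
const resolved = resolveTranscription(
  { language: " de " },
  { model: "", prompt: "Names: Alice, Bob.", language: "en" },
  "whisper-1",
);
console.log(resolved);
```

In this example the language comes from the request, the prompt from the persisted config, and the model from the engine default, because the persisted model is empty.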
- Step 4: Run the tests
go test -run TestTranscribe -v ./server/router/api/v1/test/... 2>&1 | tail -60
Expected: PASS for all old sub-tests plus the three new ones.
- Step 5: Run the full backend suite
go test -race ./server/... ./internal/...
Expected: PASS.
- Step 6: Commit
git add server/router/api/v1/ai_service.go server/router/api/v1/test/ai_service_test.go
git commit -m "feat(api/ai): resolve transcription from persisted setting
Transcribe now resolves provider, language, and prompt from the
per-call request first, then from the persisted transcription
config. The model has no per-call override: persisted config →
engine default. provider_id may be omitted from the request when
the operator has selected a provider in settings. Returns
FailedPrecondition when no provider can be resolved."
Task 7: Frontend — restructure AISection into Integrations + Transcription
Files:
- Modify: web/src/components/Settings/AISection.tsx
- Modify: web/src/locales/en.json
The existing provider list stays as-is in a renamed group. A new TranscriptionForm group is added below it. Both groups share a single Save action that writes the entire AISetting (this matches the existing pattern — the protobuf save is already whole-message).
- Step 1: Add new locale keys
Open web/src/locales/en.json. Locate the "ai": { ... } block (starting around line 411). Inside that block, add the following keys, keeping the file's alphabetical convention (the integrations-* keys sort before keep-api-key, and the transcription-* keys sort after label):
"integrations-description": "Provider keys are supplied by the instance owner and used by server-side AI features.",
"integrations-title": "AI integrations",
"transcription-description": "Speech-to-text settings used when recording audio in the memo composer.",
"transcription-empty-providers": "Add an AI integration first to enable transcription.",
"transcription-language-help": "ISO 639-1 short code (e.g. en, de, zh). Leave empty to auto-detect.",
"transcription-language-placeholder": "auto-detect",
"transcription-language": "Default language",
"transcription-model-help": "Free text. Use the provider's model identifier — e.g. whisper-1, gpt-4o-transcribe, whisper-large-v3-turbo.",
"transcription-model-placeholder-gemini": "gemini-2.5-flash",
"transcription-model-placeholder-openai": "whisper-1",
"transcription-model": "Model",
"transcription-no-provider": "None — transcription disabled",
"transcription-prompt-help": "Improves spelling of proper nouns and jargon. Whisper limit is roughly 224 tokens.",
"transcription-prompt-placeholder": "Names: Alice, Bob. Glossary: kubernetes, OAuth.",
"transcription-prompt": "Prompt hints",
"transcription-provider": "Provider",
"transcription-title": "Transcription",
"transcription-warning-gemini-webm": "Gemini does not accept browser-recorded audio/webm. For in-editor recording, use an OpenAI-compatible provider.",
"transcription-warning-no-key": "The selected provider has no API key set. Edit the integration above to add one.",
Also leave the existing "providers": "Providers" key — AISection.tsx no longer uses it, but other locale files reference it; we won't churn translations for an unused string.
- Step 2: Restructure AISection.tsx
Open web/src/components/Settings/AISection.tsx. The strategy: keep the existing provider table inside a new SettingGroup titled with setting.ai.integrations-title, and add a sibling SettingGroup for transcription. Reuse useState/isEqual change tracking, but for both providers and transcription combined.
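To make the combined change tracking concrete, here is a minimal, framework-free sketch. It assumes a simplified Provider shape and uses a JSON comparison as a stand-in for lodash's isEqual; the real component keeps isEqual and React state. The names EditorState, hasChanges, and deleteProvider are illustrative only.

```typescript
interface Provider {
  id: string;
  title: string;
}

interface Transcription {
  providerId: string;
  model: string;
  language: string;
  prompt: string;
}

interface EditorState {
  providers: Provider[];
  transcription: Transcription;
}

// Stand-in for lodash's isEqual. Adequate here because both sides are built
// with the same key order; the real component should keep using isEqual.
const sameValue = (a: unknown, b: unknown): boolean => JSON.stringify(a) === JSON.stringify(b);

// One dirty flag covers both groups, so a single Save button serves the page.
const hasChanges = (current: EditorState, original: EditorState): boolean =>
  !sameValue(current.providers, original.providers) || !sameValue(current.transcription, original.transcription);

// Deleting a provider also clears a transcription reference to it, so the
// form never holds a dangling providerId.
const deleteProvider = (state: EditorState, id: string): EditorState => ({
  providers: state.providers.filter((provider) => provider.id !== id),
  transcription:
    state.transcription.providerId === id ? { ...state.transcription, providerId: "" } : state.transcription,
});

const original: EditorState = {
  providers: [{ id: "p1", title: "OpenAI" }],
  transcription: { providerId: "p1", model: "whisper-1", language: "", prompt: "" },
};

const edited: EditorState = { ...original, transcription: { ...original.transcription, language: "de" } };
const afterDelete = deleteProvider(original, "p1");

console.log(hasChanges(original, original)); // false
console.log(hasChanges(edited, original)); // true
console.log(afterDelete.transcription.providerId === ""); // true
```

Because the save writes the whole AI setting, a change in either group enables the same Save button, and a delete that touches both groups is persisted in one request.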
Replace the file contents with the structure below. (This is a full rewrite of the file; the dialog component is unchanged from the existing implementation and is included verbatim at the bottom.)
import { create } from "@bufbuild/protobuf";
import { isEqual } from "lodash-es";
import { MoreVerticalIcon, PlusIcon } from "lucide-react";
import { useEffect, useMemo, useState } from "react";
import { toast } from "react-hot-toast";
import ConfirmDialog from "@/components/ConfirmDialog";
import { Button } from "@/components/ui/button";
import { Dialog, DialogContent, DialogDescription, DialogFooter, DialogHeader, DialogTitle } from "@/components/ui/dialog";
import { DropdownMenu, DropdownMenuContent, DropdownMenuItem, DropdownMenuTrigger } from "@/components/ui/dropdown-menu";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from "@/components/ui/select";
import { Textarea } from "@/components/ui/textarea";
import { useInstance } from "@/contexts/InstanceContext";
import {
InstanceSetting_AIProviderConfig,
InstanceSetting_AIProviderConfigSchema,
InstanceSetting_AIProviderType,
InstanceSetting_AISettingSchema,
InstanceSetting_Key,
InstanceSetting_TranscriptionConfig,
InstanceSetting_TranscriptionConfigSchema,
InstanceSettingSchema,
} from "@/types/proto/api/v1/instance_service_pb";
import { useTranslate } from "@/utils/i18n";
import SettingGroup from "./SettingGroup";
import { SettingPanel } from "./SettingList";
import SettingSection from "./SettingSection";
import SettingTable from "./SettingTable";
import useInstanceSettingUpdater, { buildInstanceSettingName } from "./useInstanceSettingUpdater";
type LocalAIProvider = {
id: string;
title: string;
type: InstanceSetting_AIProviderType;
endpoint: string;
apiKey: string;
apiKeySet: boolean;
apiKeyHint: string;
};
type LocalTranscription = {
providerId: string;
model: string;
language: string;
prompt: string;
};
const providerTypeOptions = [InstanceSetting_AIProviderType.OPENAI, InstanceSetting_AIProviderType.GEMINI];
const byokNotes = ["setting.ai.byok-key-note", "setting.ai.byok-storage-note", "setting.ai.byok-model-note"] as const;
const createProviderID = () => {
if (typeof crypto !== "undefined" && "randomUUID" in crypto) {
return crypto.randomUUID();
}
return `ai-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`;
};
const getProviderTypeLabel = (type: InstanceSetting_AIProviderType) => {
return InstanceSetting_AIProviderType[type] ?? "UNKNOWN";
};
const toLocalProvider = (provider: InstanceSetting_AIProviderConfig): LocalAIProvider => ({
id: provider.id,
title: provider.title,
type: provider.type,
endpoint: provider.endpoint,
apiKey: "",
apiKeySet: provider.apiKeySet,
apiKeyHint: provider.apiKeyHint,
});
const toLocalTranscription = (config: InstanceSetting_TranscriptionConfig | undefined): LocalTranscription => ({
providerId: config?.providerId ?? "",
model: config?.model ?? "",
language: config?.language ?? "",
prompt: config?.prompt ?? "",
});
const newProvider = (): LocalAIProvider => ({
id: createProviderID(),
title: "",
type: InstanceSetting_AIProviderType.OPENAI,
endpoint: "",
apiKey: "",
apiKeySet: false,
apiKeyHint: "",
});
const toProviderConfig = (provider: LocalAIProvider) =>
create(InstanceSetting_AIProviderConfigSchema, {
id: provider.id,
title: provider.title.trim(),
type: provider.type,
endpoint: provider.endpoint.trim(),
apiKey: provider.apiKey,
});
const toTranscriptionConfig = (transcription: LocalTranscription) =>
create(InstanceSetting_TranscriptionConfigSchema, {
providerId: transcription.providerId,
model: transcription.model.trim(),
language: transcription.language.trim(),
prompt: transcription.prompt,
});
const AISection = () => {
const t = useTranslate();
const saveInstanceSetting = useInstanceSettingUpdater();
const { aiSetting: originalSetting } = useInstance();
const [providers, setProviders] = useState<LocalAIProvider[]>(() => originalSetting.providers.map(toLocalProvider));
const [transcription, setTranscription] = useState<LocalTranscription>(() => toLocalTranscription(originalSetting.transcription));
const [editingProvider, setEditingProvider] = useState<LocalAIProvider | undefined>();
const [deleteTarget, setDeleteTarget] = useState<LocalAIProvider | undefined>();
useEffect(() => {
setProviders(originalSetting.providers.map(toLocalProvider));
setTranscription(toLocalTranscription(originalSetting.transcription));
}, [originalSetting.providers, originalSetting.transcription]);
const originalProviders = useMemo(() => originalSetting.providers.map(toLocalProvider), [originalSetting.providers]);
const originalTranscription = useMemo(() => toLocalTranscription(originalSetting.transcription), [originalSetting.transcription]);
const hasChanges = !isEqual(providers, originalProviders) || !isEqual(transcription, originalTranscription);
const transcriptionProviderRef = useMemo(
() => providers.find((provider) => provider.id === transcription.providerId),
[providers, transcription.providerId],
);
const handleCreateProvider = () => {
setEditingProvider(newProvider());
};
const handleEditProvider = (provider: LocalAIProvider) => {
setEditingProvider({ ...provider, apiKey: "" });
};
const handleSaveProvider = (provider: LocalAIProvider) => {
const title = provider.title.trim();
const endpoint = provider.endpoint.trim();
if (!title) {
toast.error(t("setting.ai.provider-title-required"));
return;
}
if (!provider.apiKeySet && !provider.apiKey.trim()) {
toast.error(t("setting.ai.api-key-required"));
return;
}
const normalizedProvider = { ...provider, title, endpoint };
setProviders((prev) => {
const exists = prev.some((item) => item.id === normalizedProvider.id);
if (!exists) {
return [...prev, normalizedProvider];
}
return prev.map((item) => (item.id === normalizedProvider.id ? normalizedProvider : item));
});
setEditingProvider(undefined);
};
const handleDeleteProvider = () => {
if (!deleteTarget) return;
setProviders((prev) => prev.filter((provider) => provider.id !== deleteTarget.id));
if (transcription.providerId === deleteTarget.id) {
setTranscription((prev) => ({ ...prev, providerId: "" }));
}
setDeleteTarget(undefined);
};
const handleSaveSetting = async () => {
if (transcription.providerId && !transcriptionProviderRef) {
toast.error(t("setting.ai.transcription-empty-providers"));
return;
}
await saveInstanceSetting({
key: InstanceSetting_Key.AI,
setting: create(InstanceSettingSchema, {
name: buildInstanceSettingName(InstanceSetting_Key.AI),
value: {
case: "aiSetting",
value: create(InstanceSetting_AISettingSchema, {
providers: providers.map(toProviderConfig),
transcription: toTranscriptionConfig(transcription),
}),
},
}),
errorContext: "Update AI setting",
});
};
return (
<SettingSection
title={t("setting.ai.label")}
actions={
<Button onClick={handleCreateProvider}>
<PlusIcon className="w-4 h-4 mr-2" />
{t("setting.ai.add-provider")}
</Button>
}
>
<SettingPanel className="bg-muted/30 px-4 py-3">
<div className="flex max-w-3xl flex-col gap-2">
<div className="flex flex-wrap items-center gap-2">
<span className="rounded-md border border-border bg-background px-2 py-0.5 text-xs font-medium text-foreground">
{t("setting.ai.byok-label")}
</span>
<h4 className="text-sm font-semibold text-foreground">{t("setting.ai.byok-title")}</h4>
</div>
<p className="text-sm text-muted-foreground">{t("setting.ai.byok-description")}</p>
<ul className="space-y-1 text-sm text-muted-foreground">
{byokNotes.map((note) => (
<li key={note} className="flex gap-2">
<span className="mt-2 size-1 rounded-full bg-muted-foreground/60" aria-hidden />
<span>{t(note)}</span>
</li>
))}
</ul>
</div>
</SettingPanel>
<SettingGroup title={t("setting.ai.integrations-title")} description={t("setting.ai.integrations-description")}>
<SettingTable
columns={[
{
key: "title",
header: t("common.name"),
render: (_, provider: LocalAIProvider) => (
<div className="flex flex-col gap-0.5">
<span className="text-foreground">{provider.title}</span>
<span className="font-mono text-xs text-muted-foreground">{provider.id}</span>
</div>
),
},
{
key: "type",
header: t("setting.ai.provider-type"),
render: (_, provider: LocalAIProvider) => <span>{getProviderTypeLabel(provider.type)}</span>,
},
{
key: "endpoint",
header: t("setting.ai.endpoint"),
render: (_, provider: LocalAIProvider) => (
<span className="font-mono text-xs">{provider.endpoint || t("setting.ai.default-endpoint")}</span>
),
},
{
key: "apiKeySet",
header: t("setting.ai.api-key"),
render: (_, provider: LocalAIProvider) => (
<span className="font-mono text-xs">{provider.apiKeySet ? provider.apiKeyHint || t("setting.ai.configured") : "-"}</span>
),
},
{
key: "actions",
header: "",
className: "text-right",
render: (_, provider: LocalAIProvider) => (
<DropdownMenu>
<DropdownMenuTrigger asChild>
<Button variant="outline" size="sm">
<MoreVerticalIcon className="w-4 h-auto" />
</Button>
</DropdownMenuTrigger>
<DropdownMenuContent align="end" sideOffset={2}>
<DropdownMenuItem onClick={() => handleEditProvider(provider)}>{t("common.edit")}</DropdownMenuItem>
<DropdownMenuItem onClick={() => setDeleteTarget(provider)} className="text-destructive focus:text-destructive">
{t("common.delete")}
</DropdownMenuItem>
</DropdownMenuContent>
</DropdownMenu>
),
},
]}
data={providers}
emptyMessage={t("setting.ai.no-providers")}
getRowKey={(provider) => provider.id}
/>
</SettingGroup>
<SettingGroup
title={t("setting.ai.transcription-title")}
description={t("setting.ai.transcription-description")}
showSeparator
>
<TranscriptionForm
providers={providers}
transcription={transcription}
onChange={setTranscription}
referencedProvider={transcriptionProviderRef}
/>
</SettingGroup>
<div className="w-full flex justify-end">
<Button disabled={!hasChanges} onClick={handleSaveSetting}>
{t("common.save")}
</Button>
</div>
<AIProviderDialog
provider={editingProvider}
onOpenChange={(open) => !open && setEditingProvider(undefined)}
onSave={handleSaveProvider}
/>
<ConfirmDialog
open={!!deleteTarget}
onOpenChange={(open) => !open && setDeleteTarget(undefined)}
title={deleteTarget ? t("setting.ai.delete-provider", { title: deleteTarget.title }) : ""}
confirmLabel={t("common.delete")}
cancelLabel={t("common.cancel")}
onConfirm={handleDeleteProvider}
confirmVariant="destructive"
/>
</SettingSection>
);
};
interface TranscriptionFormProps {
providers: LocalAIProvider[];
transcription: LocalTranscription;
referencedProvider: LocalAIProvider | undefined;
onChange: (next: LocalTranscription) => void;
}
const TranscriptionForm = ({ providers, transcription, referencedProvider, onChange }: TranscriptionFormProps) => {
const t = useTranslate();
const noProviders = providers.length === 0;
const update = (partial: Partial<LocalTranscription>) => {
onChange({ ...transcription, ...partial });
};
const placeholderForProvider = (provider: LocalAIProvider | undefined) => {
if (!provider) return "";
return provider.type === InstanceSetting_AIProviderType.GEMINI
? t("setting.ai.transcription-model-placeholder-gemini")
: t("setting.ai.transcription-model-placeholder-openai");
};
return (
<div className="grid grid-cols-1 sm:grid-cols-2 gap-3 max-w-3xl">
<div className="flex flex-col gap-1.5 sm:col-span-2">
<Label>{t("setting.ai.transcription-provider")}</Label>
<Select
value={transcription.providerId || "__none__"}
onValueChange={(value) => update({ providerId: value === "__none__" ? "" : value })}
disabled={noProviders}
>
<SelectTrigger className="w-full">
<SelectValue />
</SelectTrigger>
<SelectContent>
<SelectItem value="__none__">{t("setting.ai.transcription-no-provider")}</SelectItem>
{providers.map((provider) => (
<SelectItem key={provider.id} value={provider.id}>
{provider.title || provider.id}
</SelectItem>
))}
</SelectContent>
</Select>
{noProviders && <p className="text-xs text-muted-foreground">{t("setting.ai.transcription-empty-providers")}</p>}
{referencedProvider && !referencedProvider.apiKeySet && !referencedProvider.apiKey.trim() && (
<p className="text-xs text-destructive">{t("setting.ai.transcription-warning-no-key")}</p>
)}
{referencedProvider?.type === InstanceSetting_AIProviderType.GEMINI && (
<p className="text-xs text-muted-foreground">{t("setting.ai.transcription-warning-gemini-webm")}</p>
)}
</div>
<div className="flex flex-col gap-1.5 sm:col-span-2">
<Label>{t("setting.ai.transcription-model")}</Label>
<Input
value={transcription.model}
onChange={(e) => update({ model: e.target.value })}
placeholder={placeholderForProvider(referencedProvider)}
disabled={!transcription.providerId}
/>
<p className="text-xs text-muted-foreground">{t("setting.ai.transcription-model-help")}</p>
</div>
<div className="flex flex-col gap-1.5">
<Label>{t("setting.ai.transcription-language")}</Label>
<Input
value={transcription.language}
onChange={(e) => update({ language: e.target.value })}
placeholder={t("setting.ai.transcription-language-placeholder")}
disabled={!transcription.providerId}
/>
<p className="text-xs text-muted-foreground">{t("setting.ai.transcription-language-help")}</p>
</div>
<div className="flex flex-col gap-1.5 sm:col-span-2">
<Label>{t("setting.ai.transcription-prompt")}</Label>
<Textarea
value={transcription.prompt}
onChange={(e) => update({ prompt: e.target.value })}
placeholder={t("setting.ai.transcription-prompt-placeholder")}
rows={3}
disabled={!transcription.providerId}
/>
<p className="text-xs text-muted-foreground">{t("setting.ai.transcription-prompt-help")}</p>
</div>
</div>
);
};
interface AIProviderDialogProps {
provider?: LocalAIProvider;
onOpenChange: (open: boolean) => void;
onSave: (provider: LocalAIProvider) => void;
}
const AIProviderDialog = ({ provider, onOpenChange, onSave }: AIProviderDialogProps) => {
const t = useTranslate();
const [draft, setDraft] = useState<LocalAIProvider>(() => provider ?? newProvider());
useEffect(() => {
const next = provider ?? newProvider();
setDraft(next);
}, [provider]);
const updateDraft = (partial: Partial<LocalAIProvider>) => {
setDraft((prev) => ({ ...prev, ...partial }));
};
const handleSave = () => {
onSave(draft);
};
return (
<Dialog open={!!provider} onOpenChange={onOpenChange}>
<DialogContent size="2xl">
<DialogHeader>
<DialogTitle>{provider?.apiKeySet ? t("setting.ai.edit-provider") : t("setting.ai.add-provider")}</DialogTitle>
<DialogDescription>{t("setting.ai.dialog-description")}</DialogDescription>
</DialogHeader>
<div className="grid grid-cols-1 sm:grid-cols-2 gap-3">
<div className="flex flex-col gap-1.5">
<Label>{t("setting.ai.provider-title")}</Label>
<Input value={draft.title} onChange={(e) => updateDraft({ title: e.target.value })} placeholder="OpenAI" />
</div>
<div className="flex flex-col gap-1.5">
<Label>{t("setting.ai.provider-type")}</Label>
<Select
value={String(draft.type)}
onValueChange={(value) => updateDraft({ type: Number(value) as InstanceSetting_AIProviderType })}
>
<SelectTrigger className="w-full">
<SelectValue />
</SelectTrigger>
<SelectContent>
{providerTypeOptions.map((type) => (
<SelectItem key={type} value={String(type)}>
{getProviderTypeLabel(type)}
</SelectItem>
))}
</SelectContent>
</Select>
</div>
<div className="flex flex-col gap-1.5 sm:col-span-2">
<Label>{t("setting.ai.endpoint")}</Label>
<Input
value={draft.endpoint}
onChange={(e) => updateDraft({ endpoint: e.target.value })}
placeholder={getDefaultEndpointPlaceholder(draft.type)}
/>
<p className="text-xs text-muted-foreground">{t("setting.ai.endpoint-hint")}</p>
</div>
<div className="flex flex-col gap-1.5 sm:col-span-2">
<Label>{t("setting.ai.api-key")}</Label>
<Input
type="password"
value={draft.apiKey}
onChange={(e) => updateDraft({ apiKey: e.target.value })}
placeholder={draft.apiKeySet ? t("setting.ai.keep-api-key") : ""}
/>
{draft.apiKeySet && (
<p className="text-xs text-muted-foreground">{t("setting.ai.current-key", { key: draft.apiKeyHint || "-" })}</p>
)}
</div>
</div>
<DialogFooter>
<Button variant="ghost" onClick={() => onOpenChange(false)}>
{t("common.cancel")}
</Button>
<Button onClick={handleSave}>{t("common.save")}</Button>
</DialogFooter>
</DialogContent>
</Dialog>
);
};
const getDefaultEndpointPlaceholder = (type: InstanceSetting_AIProviderType) => {
switch (type) {
case InstanceSetting_AIProviderType.OPENAI:
return "https://api.openai.com/v1";
case InstanceSetting_AIProviderType.GEMINI:
return "https://generativelanguage.googleapis.com/v1beta";
default:
return "";
}
};
export default AISection;
Note: this references Textarea from @/components/ui/textarea. Verify that component exists by running:
ls web/src/components/ui/textarea.tsx
If the file is missing, the project doesn't have a Textarea primitive yet — fall back to the native <textarea> element with the same classes used by the project's Input for visual consistency. Keep the same props (value, onChange, placeholder, rows, disabled).
- Step 3: Type-check + lint
cd web && pnpm lint 2>&1 | tail -40
Expected: PASS. Common failures: missing import, mismatch between schema name (InstanceSetting_TranscriptionConfigSchema) and what buf generate produced — verify the exact name in web/src/types/proto/api/v1/instance_service_pb.ts (it may be InstanceSetting_TranscriptionConfig paired with InstanceSetting_TranscriptionConfigSchema, matching the AISetting pattern).
- Step 4: Manual smoke test
Start backend and frontend:
go run ./cmd/memos --port 8081 &
cd web && pnpm dev
Open http://localhost:3001/, sign in as the host user, navigate to Settings → AI:
- Verify the AI integrations group shows the existing provider table (or empty state).
- Verify the Transcription group renders with all four fields disabled while no provider exists. Once a provider exists, the Provider dropdown becomes selectable; Model, Language, and Prompt stay disabled until a provider is selected.
- Add a provider with type OPENAI and a key. The transcription section now lets you select it. Pick the provider; the model placeholder shows whisper-1. Type whisper-1 in the model field, leave language empty, leave prompt empty, save.
- Refresh the page. The transcription section retains the saved provider and model.
- Change the provider title → save. The transcription section still references the same provider by id (title in dropdown updates).
- Delete the provider → the transcription section's providerId is cleared (model field becomes disabled).
- Step 5: Commit
git add web/src/components/Settings/AISection.tsx web/src/locales/en.json
git commit -m "feat(settings): add Transcription configuration section
Splits the AI settings page into 'AI integrations' (existing
provider list) and 'Transcription' (new). The transcription form
chooses a provider, model, default language, and prompt hint. Save
writes the entire AI setting in one request."
Task 8: Wire MemoEditor to the persisted transcription config
Files:
- Modify: web/src/components/MemoEditor/index.tsx:31-67,130-159
- Modify: web/src/components/MemoEditor/services/transcriptionService.ts
The editor previously scanned aiSetting.providers for the first one with an API key. It now reads aiSetting.transcription.providerId, validates the reference, and calls the service without a provider argument.
- Step 1: Edit transcriptionService.ts
Replace the entire contents of web/src/components/MemoEditor/services/transcriptionService.ts with:
import { create } from "@bufbuild/protobuf";
import { aiServiceClient } from "@/connect";
import { TranscribeRequestSchema, TranscriptionAudioSchema, TranscriptionConfigSchema } from "@/types/proto/api/v1/ai_service_pb";
export const transcriptionService = {
async transcribeFile(file: File): Promise<string> {
const content = new Uint8Array(await file.arrayBuffer());
const response = await aiServiceClient.transcribe(
create(TranscribeRequestSchema, {
config: create(TranscriptionConfigSchema, {}),
audio: create(TranscriptionAudioSchema, {
source: {
case: "content",
value: content,
},
filename: file.name,
contentType: file.type,
}),
}),
);
return response.text;
},
};
Note: providerId is intentionally omitted — the server resolves it from InstanceAISetting.transcription.providerId.
- Step 2: Edit MemoEditor/index.tsx
Open the file. Two regions change:
Region A: lines 31–67, which hold the TRANSCRIPTION_PROVIDER_TYPES constant, the transcriptionProvider lookup, and the now-unused InstanceSetting_AIProviderType import. Delete the constant; nothing replaces it:
// (delete the TRANSCRIPTION_PROVIDER_TYPES constant entirely — no longer needed)
And inside MemoEditorImpl, replace:
const transcriptionProvider = useMemo(
() => aiSetting.providers.find((provider) => provider.apiKeySet && TRANSCRIPTION_PROVIDER_TYPES.includes(provider.type)),
[aiSetting.providers],
);
with:
const canTranscribe = useMemo(() => {
const providerId = aiSetting.transcription?.providerId ?? "";
if (!providerId) return false;
const provider = aiSetting.providers.find((p) => p.id === providerId);
return Boolean(provider?.apiKeySet);
}, [aiSetting.providers, aiSetting.transcription?.providerId]);
Then update the import line at the top of the file. Currently:
import { InstanceSetting_AIProviderType, InstanceSetting_Key } from "@/types/proto/api/v1/instance_service_pb";
Becomes (drop the InstanceSetting_AIProviderType import — it's no longer referenced in this file):
import { InstanceSetting_Key } from "@/types/proto/api/v1/instance_service_pb";
Region B — replace lines 130–159 (the handleTranscribeRecordedAudio callback and any guards) so it consults canTranscribe and calls the service without a provider arg:
const handleTranscribeRecordedAudio = useCallback(
async (localFile: LocalFile) => {
if (!canTranscribe) {
dispatch(actions.addLocalFile(localFile));
setIsTranscribingAudio(false);
setIsAudioRecorderOpen(false);
return;
}
try {
const text = (await transcriptionService.transcribeFile(localFile.file)).trim();
if (!text) {
dispatch(actions.addLocalFile(localFile));
toast.error(t("editor.audio-recorder.transcribe-empty"));
return;
}
insertTranscribedText(text);
toast.success(t("editor.audio-recorder.transcribe-success"));
} catch (error) {
console.error(error);
toast.error(errorService.getErrorMessage(error) || t("editor.audio-recorder.transcribe-error"));
dispatch(actions.addLocalFile(localFile));
} finally {
setIsTranscribingAudio(false);
setIsAudioRecorderOpen(false);
}
},
[actions, canTranscribe, dispatch, insertTranscribedText, t],
);
Then update handleTranscribeAudioRecording (currently around line 225) so its guard uses canTranscribe:
const handleTranscribeAudioRecording = () => {
if (!canTranscribe || isTranscribingAudio) {
return;
}
setIsTranscribingAudio(true);
const didStop = audioRecorder.stopRecording("transcribe");
if (!didStop) {
setIsTranscribingAudio(false);
}
};
Finally, search the file for any remaining references to transcriptionProvider and replace them with canTranscribe. Also update the prop passed to <AudioRecorderPanel canTranscribe={...}> if it currently uses transcriptionProvider — replace with canTranscribe.
- Step 3: Type-check
cd web && pnpm lint 2>&1 | tail -20
Expected: PASS.
- Step 4: Manual smoke test
With the dev server running:
- No transcription configured: Settings → AI shows providers but no transcription selection. In the home editor, open the audio recorder. The Transcribe button (waveform icon) should be hidden — only Cancel and Stop visible.
- Transcription configured with valid provider: Select a provider in Transcription, set model to whisper-1, save. Open the recorder; the Transcribe button is now visible. Record a short clip in English, click Transcribe; text appears in the editor.
- Provider deleted after transcription was configured: Configure transcription, save, then delete the referenced provider in AI Integrations and save. Reload the editor; the Transcribe button is hidden.
- API key cleared from referenced provider: Edit a referenced provider so it has no API key (this requires backend support — currently impossible since save requires apiKey to be set; verify by writing setting directly via API or skip this case).
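The cases above reduce to a pure check on the AI setting. The sketch below restates the canTranscribe logic outside React so that each case can be exercised directly. The ProviderLike and AISettingLike shapes and the computeCanTranscribe name are simplified assumptions, not the generated protobuf types.

```typescript
interface ProviderLike {
  id: string;
  apiKeySet: boolean;
}

interface AISettingLike {
  providers: ProviderLike[];
  transcription?: { providerId: string };
}

// True only when a provider is selected, still exists, and has an API key.
const computeCanTranscribe = (setting: AISettingLike): boolean => {
  const providerId = setting.transcription?.providerId ?? "";
  if (!providerId) return false;
  const provider = setting.providers.find((p) => p.id === providerId);
  return Boolean(provider?.apiKeySet);
};

// One entry per smoke-test case.
const cases: Record<string, AISettingLike> = {
  "no transcription configured": { providers: [{ id: "p1", apiKeySet: true }] },
  "valid provider selected": { providers: [{ id: "p1", apiKeySet: true }], transcription: { providerId: "p1" } },
  "referenced provider deleted": { providers: [], transcription: { providerId: "p1" } },
  "referenced provider has no key": { providers: [{ id: "p1", apiKeySet: false }], transcription: { providerId: "p1" } },
};

const results: Record<string, boolean> = {};
for (const [name, setting] of Object.entries(cases)) {
  results[name] = computeCanTranscribe(setting);
}
console.log(results);
```

Only the "valid provider selected" case yields true. That makes the fourth case checkable without a backend change, even though the settings UI cannot currently produce it.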
- Step 5: Commit
git add web/src/components/MemoEditor/index.tsx web/src/components/MemoEditor/services/transcriptionService.ts
git commit -m "feat(memo-editor): use persisted transcription config
The editor's transcribe button now reflects InstanceAISetting.
transcription.providerId rather than an implicit \"first provider
with apiKey\" pick. The transcribeFile service no longer takes a
provider argument — the server resolves it from settings."
Task 9: Strip the now-unused REQUIRED gate on provider_id in the server
Files:
- Modify: server/router/api/v1/ai_service.go (clean-up only)
Task 6 already deleted the "provider_id is required" InvalidArgument branch. This task confirms there's no orphaned helper or constant left behind.
- Step 1: Search for orphans
grep -n "resolveAIProviderForTranscription\|provider_id is required" server/router/api/v1/ai_service.go
Expected: no matches (Task 6 should have removed both). If anything matches, delete it.
- Step 2: Run linter
golangci-lint run ./server/router/api/v1/...
Expected: PASS.
- Step 3: Run the full backend suite once more
go test -race ./server/... ./internal/...
Expected: PASS.
- Step 4: Commit (only if Step 1 found anything to clean up)
If Step 1 returned matches and you removed code, commit:
git add server/router/api/v1/ai_service.go
git commit -m "chore(api/ai): remove orphaned helpers from old transcribe flow"
Otherwise, skip the commit — the task is a no-op verification.
Task 10: End-to-end verification
Files: none — verification only.
- Step 1: Full backend test suite
go test -race ./...
Expected: PASS.
- Step 2: Frontend lint + build
cd web && pnpm lint && pnpm build
Expected: PASS for both.
- Step 3: Manual end-to-end flow (single-instance smoke test)
With backend on :8081 and frontend on :3001:
- Sign in as host user. Settings → AI.
- Add a provider: type OPENAI, title "OpenAI", endpoint blank (defaults to https://api.openai.com/v1), api-key sk-... (real or fake).
- In Transcription, select the provider, set model to whisper-1, leave language and prompt empty. Save. Toast confirms.
- Refresh: the transcription section retains the values.
- Add a second provider: type OPENAI, title "Groq", endpoint https://api.groq.com/openai/v1, api-key gsk_.... Switch transcription's provider dropdown to Groq, set model to whisper-large-v3-turbo, save.
- Open the home editor. Open audio recorder. Transcribe button visible. (Don't actually call the network unless your test key works.)
- Switch transcription's provider to "None — transcription disabled". Save. Reopen the recorder — Transcribe button hidden.
- Set provider back to "OpenAI". Add a Gemini provider; switch transcription to Gemini. Verify the inline warning "Gemini does not accept browser-recorded audio/webm" appears under the provider dropdown.
- Step 4: Final commit (if anything was tweaked during verification)
If verification surfaced any minor fix (e.g., a string typo in the locale), commit it now:
git add -A
git commit -m "chore(settings): polish transcription section copy"
Self-review checklist (run before opening the PR)
- Spec coverage:
  - Schema additive TranscriptionConfig (store + api): Tasks 1, 2 ✓
  - Backend: read setting at Transcribe start, fall-through resolution, FailedPrecondition when unconfigured: Task 6 ✓
  - Validation: provider_id references existing provider; length caps on model/language/prompt: Task 4 ✓
  - TranscribeRequest.provider_id becomes optional: Task 5 ✓
  - Frontend: AI Integrations + Transcription split, Provider/Model/Language/Prompt fields, provider dropdown disabled when empty, "Add an AI integration first" hint, Gemini webm warning, no-key warning: Task 7 ✓
  - Frontend: editor reads aiSetting.transcription.providerId, service drops provider arg: Task 8 ✓
  - Backwards compat (no migration): covered; existing instances default to empty provider_id and fall into the "transcription disabled" branch ✓
  - Gemini webm: inline warning chosen, implemented as the transcription-warning-gemini-webm string in Task 7 ✓
- Placeholder scan: No "TBD", "TODO", or "implement later" markers in this plan. Each step contains exact file paths, exact code, exact commands, expected output.
- Type / name consistency:
  - Store proto: InstanceAISetting.transcription → TranscriptionConfig (Task 1).
  - Generated Go type: storepb.TranscriptionConfig (referenced in Tasks 3, 4, 6).
  - API proto: InstanceSetting.AISetting.transcription → InstanceSetting.TranscriptionConfig (Task 2). Generated Go type: v1pb.InstanceSetting_TranscriptionConfig (Tasks 3, 4). Generated TS type: InstanceSetting_TranscriptionConfig and InstanceSetting_TranscriptionConfigSchema (Task 7).
  - All uses match.
- Things to watch during execution:
  - buf generate produces TS type names that differ slightly across generators. If InstanceSetting_TranscriptionConfigSchema doesn't exist after Task 2's regeneration, check the actual export name in web/src/types/proto/api/v1/instance_service_pb.ts and adjust Task 7's imports accordingly.
  - The Textarea UI primitive existence is verified mid-Task 7. If absent, fall back to <textarea> with project styling.
  - prepareInstanceAISettingForUpdate runs only when the AI setting key is being updated; the existing lookup uses s.Store.GetInstanceAISetting(ctx), which already returns the current state with transcription populated (after Task 1).