feat(transcription): explicit STT settings with provider, model, prompt (#5926)

pull/5928/head
boojack 4 weeks ago committed by GitHub
parent ef55013418
commit 238f27dea1

File diff suppressed because it is too large.

@ -0,0 +1,155 @@
# Transcription (STT) settings — design
**Date:** 2026-05-02
**Scope:** Backend + frontend. Schema-additive (no migration required).
## Problem
Memos has one AI feature today: audio transcription (speech-to-text). The current design has three concrete problems:
1. **Model is hard-coded per provider type.** `internal/ai/models.go` pins OpenAI to `gpt-4o-transcribe` and Gemini to `gemini-2.5-flash`. Users who want `whisper-1` (cheaper, often more accurate for non-English audio) or third-party Whisper-compatible endpoints (Groq's `whisper-large-v3-turbo`, self-hosted whisper.cpp / Speaches via OpenAI-compatible URL) cannot configure them at all.
2. **No explicit transcription configuration.** `InstanceAISetting.providers` is a generic credentials list. The frontend (`MemoEditor/index.tsx:65`) implicitly picks "the first provider with an API key whose type is in TRANSCRIPTION_PROVIDER_TYPES." Users cannot:
   - Choose which provider runs transcription when they have multiple.
   - Set a default language (the Whisper API supports it, but Memos never sends it).
   - Set a `prompt` hint to bias spelling of proper nouns / jargon (a documented Whisper feature, exposed by the other STT products surveyed).
3. **Gemini fails for browser-recorded audio.** `internal/ai/gemini.go:23` does not list `audio/webm` in `geminiSupportedContentTypes`, but `MediaRecorder` in browsers defaults to `audio/webm`. So selecting a Gemini provider for in-editor recording produces a content-type error every time.
## Goal
Let the operator configure transcription explicitly: which provider, which model, default language, and a spelling-hint prompt. Make the OpenAI provider work as a universal "OpenAI-compatible" engine so Groq / self-hosted Whisper / Speaches are reachable through endpoint override.
## Non-goals
- Adding STT engines beyond OpenAI and Gemini (Azure, Deepgram, AWS Transcribe). These are out of scope; the `AIProviderType` enum leaves room to add them later.
- Other AI features (summarization, embeddings, tag suggestion). The schema is shaped so they fit later, but none are designed here.
- Per-call provider override at recording time. Research across all surveyed products (OpenWebUI, LibreChat, Whisper Memos, Superwhisper, etc.) confirms that the STT engine is a global preference, not an action-time choice. We follow the same pattern.
- Server-side audio transcoding (e.g., webm → wav for Gemini). See "Gemini webm" below for the chosen mitigation.
- Multi-user or per-user override of admin defaults. Memos' STT setting is instance-scoped, like every other instance setting.
## Naming
Field and message names follow cross-platform STT conventions, not Memos-internal shorthand:
| Concept | Chosen name | Rationale |
|---|---|---|
| Config message | `TranscriptionConfig` | AssemblyAI uses this exact identifier; matches OpenAI's `CreateTranscription*` verb family and Memos' existing `Transcribe` RPC. The `STT` acronym is not used as a type name in any major STT API. |
| Provider reference | `provider_id` (string) | Plain protobuf convention for a string-ID reference (`field_id`, `user_id` style). `engine` was rejected as an OpenWebUI-only term; typed message refs are not needed since providers are addressed by string ID. |
| Model | `model` | Unanimous across OpenAI, Google v2, Deepgram, OpenWebUI, LibreChat. Not `model_id`. |
| Default language | `language` | Bare `language` is the modern convention (OpenAI, Whisper family, Deepgram, Wyoming). `language_code` is the older Google/AWS form; we accept ISO 639-1 short codes the same way OpenAI does. |
| Spelling hint | `prompt` | OpenAI's public API field name and AssemblyAI's. Whisper's internal name is `initial_prompt`, but `prompt` is what users of `audio.transcriptions.create` recognize. |
A note on the message name collision: `proto/api/v1/ai_service.proto` already declares a `TranscriptionConfig` for **per-call** prompt/language overrides. The new store-level `TranscriptionConfig` lives in package `memos.store`, so the two compile cleanly. Memos already uses parallel `api.v1.X` / `store.X` message pairs (e.g. `User`, `Memo`); this matches that pattern.
## Architecture
### Schema (additive)
`proto/store/instance_setting.proto`:
```proto
message InstanceAISetting {
  repeated AIProviderConfig providers = 1; // unchanged: credential pool
  TranscriptionConfig transcription = 2;   // NEW: feature config
}

message TranscriptionConfig {
  // References an entry in providers[].id. Empty string = transcription disabled.
  string provider_id = 1;
  // Free text. Empty string = engine default (whisper-1 for OPENAI, gemini-2.5-flash for GEMINI).
  string model = 2;
  // ISO 639-1 short code. Empty string = auto-detect.
  string language = 3;
  // Up to ~200 tokens. Used as the OpenAI Whisper `prompt` parameter and as
  // a "Context and spelling hints:" block in the Gemini prompt.
  string prompt = 4;
}
```
`proto/api/v1/ai_service.proto`:
- `TranscribeRequest.provider_id` becomes optional. When omitted, the server resolves the provider from `InstanceAISetting.transcription.provider_id`.
- `TranscribeRequest.config` (per-call `TranscriptionConfig` with `prompt` / `language`) is kept for advanced overrides; any of its fields left empty falls back to the persisted default from `InstanceAISetting.transcription`.
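The fallback rule for `language` and `prompt` reduces to "first non-empty value wins". A minimal Go sketch, assuming both the per-call config and the persisted defaults are available as plain strings; `Options`, `firstNonEmpty`, and `mergeOptions` are hypothetical names, not Memos' actual code:

```go
package main

import "fmt"

// Options holds the per-call-overridable hints. Hypothetical type for
// illustration; it is not the generated TranscriptionConfig message.
type Options struct {
	Language string
	Prompt   string
}

// firstNonEmpty returns the first argument that is not "", or "" if none is.
func firstNonEmpty(values ...string) string {
	for _, v := range values {
		if v != "" {
			return v
		}
	}
	return ""
}

// mergeOptions takes each field from perCall when it is set and from the
// persisted instance defaults otherwise.
func mergeOptions(perCall, persisted Options) Options {
	return Options{
		Language: firstNonEmpty(perCall.Language, persisted.Language),
		Prompt:   firstNonEmpty(perCall.Prompt, persisted.Prompt),
	}
}

func main() {
	persisted := Options{Language: "de", Prompt: "Memos, Kubernetes"}
	perCall := Options{Language: "en"} // prompt left empty on purpose
	fmt.Printf("%+v\n", mergeOptions(perCall, persisted))
}
```

Running it prints `{Language:en Prompt:Memos, Kubernetes}`. Because an empty string doubles as "unset" in proto3, a caller cannot use a per-call value to force auto-detect when an instance default language is configured; that follows from the field semantics rather than from the merge itself.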
### Backend changes
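1. **`internal/ai/models.go`**: `DefaultTranscriptionModel` already exists; reuse it as the fallback when `TranscriptionConfig.model` is empty. No new logic is needed beyond calling it from a new call site; the only change is the built-in OpenAI default, which moves from `gpt-4o-transcribe` to `whisper-1` to match the schema comment above.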
2. **`server/router/api/v1/ai_service.go`**:
   - Read `InstanceAISetting.transcription` at the start of `Transcribe`.
   - Resolve `provider_id` from request → fall back to `transcription.provider_id`. If both are empty, return `FailedPrecondition` with a clear "transcription not configured" message.
   - Resolve `model` similarly: request override → `transcription.model` → engine default via `DefaultTranscriptionModel`.
   - Merge `language` and `prompt`: per-call overrides win; otherwise fall through to persisted defaults.
3. **`internal/ai/gemini.go`**: fixing the webm content-type list is out of scope here; see "Gemini webm mitigation" below.
### Frontend changes
`web/src/components/Settings/AISection.tsx` is restructured into two settings groups inside the existing `SettingSection`:
1. **AI Integrations** (renamed from "Providers" — current behavior): list of credential entries (id, title, type, endpoint, api key). No functional changes; the rename communicates that this section is just credentials.
2. **Transcription** (new): a four-field form
   - **Provider**: Select dropdown listing entries from group 1 by `title`. First option is "None — transcription disabled". Disabled with a hint "Add an AI integration first ↑" when group 1 is empty.
   - **Model**: text input. Placeholder updates dynamically based on the selected provider's type (`whisper-1` for OPENAI, `gemini-2.5-flash` for GEMINI). Help text below: "Free text. Use the provider's model identifier — e.g., whisper-1, gpt-4o-transcribe, whisper-large-v3-turbo."
   - **Default language**: text input, ISO 639-1 placeholder, empty = auto.
   - **Prompt hints**: textarea, ~200 token soft limit, help text "Improves spelling of proper nouns and jargon. Whisper limit is ~224 tokens."
`web/src/components/MemoEditor/index.tsx:65` changes:
- Replace the "first provider with apiKey in TRANSCRIPTION_PROVIDER_TYPES" lookup with this enable rule: transcribe button shows iff `aiSetting.transcription.providerId` is non-empty AND the referenced provider exists in `aiSetting.providers` AND that provider has `apiKeySet === true`.
- The editor no longer needs to know the provider object itself for the call — see service change below.
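The enable rule has three conditions, and all must hold. It will live in TypeScript in `MemoEditor/index.tsx`; for consistency with the backend sketches it is rendered here in Go, with `Provider` and `transcriptionEnabled` as hypothetical names and `APIKeySet` standing in for `apiKeySet`:

```go
package main

import "fmt"

// Provider mirrors the two fields the editor reads from aiSetting.providers.
type Provider struct {
	ID        string
	APIKeySet bool
}

// transcriptionEnabled returns true only if a provider is selected, that
// provider still exists in the list, and it has an API key set.
func transcriptionEnabled(selectedID string, providers []Provider) bool {
	if selectedID == "" {
		return false // transcription explicitly disabled
	}
	for _, p := range providers {
		if p.ID == selectedID {
			return p.APIKeySet
		}
	}
	return false // dangling reference: provider was removed
}

func main() {
	providers := []Provider{{ID: "groq", APIKeySet: true}}
	fmt.Println(transcriptionEnabled("groq", providers)) // true
}
```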
`web/src/components/MemoEditor/services/transcriptionService.ts` is simplified: it stops accepting a `provider` argument and simply omits `provider_id` from the request. The server resolves the provider, model, language, and prompt from `InstanceAISetting.transcription`. (No override path is exposed at the editor layer; advanced callers can still pass `provider_id` directly via the proto if needed in the future.)
### How "OpenAI-compatible" backends work
To use Groq, Speaches, or self-hosted whisper.cpp:
1. In **AI Integrations**, add a provider with type `OPENAI`, set `endpoint` to e.g. `https://api.groq.com/openai/v1` or `http://speaches:8000/v1`, set the API key, give it a recognizable title ("Groq", "Self-hosted Whisper").
2. In **Transcription**, select that provider and set `model` to the backend's model identifier (`whisper-large-v3-turbo`, `Systran/faster-distil-whisper-large-v3`, etc.).
This is the universal escape hatch used by OpenWebUI, LibreChat, and the Whisper Obsidian plugin: don't enumerate every backend; let the OpenAI engine be a transport, not a brand.
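The endpoint override works because these backends accept the same request shape. A minimal Go sketch, assuming the OpenAI client builds requests in the form of OpenAI's public `POST /audio/transcriptions` multipart API (`file`, `model`, `language`, `prompt` fields), which Groq and Speaches emulate; `newTranscriptionRequest` is a hypothetical helper, not the function in `internal/ai`:

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
	"strings"
)

// newTranscriptionRequest builds (but does not send) an OpenAI-style
// transcription request. Only the endpoint differs between backends.
func newTranscriptionRequest(endpoint, apiKey, model, language, prompt, filename string, audio []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)

	part, err := w.CreateFormFile("file", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, err
	}
	// Empty optional fields are omitted so the backend applies its own defaults.
	for _, f := range [][2]string{{"model", model}, {"language", language}, {"prompt", prompt}} {
		if f[1] == "" {
			continue
		}
		if err := w.WriteField(f[0], f[1]); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}

	target := strings.TrimRight(endpoint, "/") + "/audio/transcriptions"
	req, err := http.NewRequest(http.MethodPost, target, &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	req.Header.Set("Authorization", "Bearer "+apiKey)
	return req, nil
}

func main() {
	req, err := newTranscriptionRequest("http://speaches:8000/v1", "sk-test",
		"Systran/faster-distil-whisper-large-v3", "", "", "memo.webm", []byte("fake audio"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL) // POST http://speaches:8000/v1/audio/transcriptions
}
```

Trimming the trailing slash lets operators paste either `https://api.groq.com/openai/v1` or `https://api.groq.com/openai/v1/` into the endpoint field.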
## Gemini webm mitigation
The Gemini `audio/webm` failure is a real user-blocking bug but separate from the settings redesign. Three options were considered:
- **(a) Server-side transcode** with ffmpeg. Adds a heavy runtime dep; rejected as YAGNI.
- **(b) Switch MediaRecorder format** when STT engine is Gemini. Browser support for `audio/mp4` and `audio/wav` in `MediaRecorder` is patchy across Firefox / Safari / Chrome; rejected as fragile.
- **(c) Inline hint + accept the limitation.** Selected. The Transcription section shows a small warning under the model field when the chosen provider type is `GEMINI`: "Gemini does not accept browser-recorded `audio/webm`. For in-editor recording, use an OpenAI-compatible provider."
Server-side transcoding can be revisited later as a self-contained change if Gemini demand grows.
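The failure itself is a plain allow-list miss, and `MediaRecorder` blob types often carry codec parameters on top of the media type. A minimal Go sketch, assuming the check compares the bare media type; the `geminiAudioTypes` set below is an illustrative assumption modeled on Gemini's documented audio formats, not a copy of `geminiSupportedContentTypes`:

```go
package main

import (
	"fmt"
	"mime"
)

// geminiAudioTypes is an illustrative allow-list (assumption), not the
// actual geminiSupportedContentTypes from internal/ai/gemini.go.
var geminiAudioTypes = map[string]bool{
	"audio/wav": true, "audio/mp3": true, "audio/aiff": true,
	"audio/aac": true, "audio/ogg": true, "audio/flac": true,
}

// geminiAccepts reports whether a blob content type passes the allow-list.
// Parameters such as ";codecs=opus" are stripped and the type is lowercased
// by mime.ParseMediaType before the lookup.
func geminiAccepts(contentType string) bool {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}
	return geminiAudioTypes[mediaType]
}

func main() {
	for _, ct := range []string{"audio/webm;codecs=opus", "audio/wav"} {
		fmt.Println(ct, geminiAccepts(ct))
	}
}
```

Since `audio/webm` is absent however the parameters are handled, option (c) warns on provider type instead of trying to repair the content type.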
## Validation
Server validation (`server/router/api/v1/ai_service.go`):
- `transcription.provider_id`, when set, must reference an existing entry in `providers[]`. On `UpdateInstanceSetting` for the AI key, reject with `InvalidArgument` if it doesn't.
- `transcription.model` length cap: 256 chars (covers `Systran/faster-distil-whisper-large-v3`-style names with margin).
- `transcription.language` length cap: 32 chars (existing constant `maxTranscriptionLanguageLength`).
- `transcription.prompt` length cap: 4096 chars (existing constant `maxTranscriptionPromptLength`).
Frontend validation in `AISection.tsx`:
- "Save" disabled if `transcription.providerId` is set but the referenced provider was just deleted from the integrations list (in the same unsaved edit).
- Inline warning shown (but Save still allowed) if the referenced provider exists but has `apiKeySet === false` — surfacing the broken state so the operator can fix it without blocking unrelated edits to other settings.
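The two frontend rules differ in severity: a dangling reference blocks Save, while a missing key only warns. A Go transliteration of that decision, for illustration only; the real logic would live in `AISection.tsx`, and `DraftProvider`, `FormState`, and `checkDraft` are hypothetical names:

```go
package main

import "fmt"

// DraftProvider is the subset of an unsaved integration entry the form needs.
type DraftProvider struct {
	ID        string
	APIKeySet bool
}

// FormState is what the Transcription section derives from the unsaved draft.
type FormState struct {
	SaveDisabled bool   // blocking: selected provider was deleted in this edit
	Warning      string // non-blocking: provider exists but has no API key
}

// checkDraft evaluates the selected provider against the draft integrations list.
func checkDraft(selectedID string, draft []DraftProvider) FormState {
	if selectedID == "" {
		return FormState{} // transcription disabled: nothing to check
	}
	for _, p := range draft {
		if p.ID != selectedID {
			continue
		}
		if !p.APIKeySet {
			return FormState{Warning: "selected provider has no API key"}
		}
		return FormState{}
	}
	return FormState{SaveDisabled: true}
}

func main() {
	fmt.Printf("%+v\n", checkDraft("a", []DraftProvider{{ID: "a"}}))
}
```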
## Backwards compatibility
The schema change is purely additive. Existing instances with `providers` configured but no `transcription` field default to `provider_id = ""`, which means transcription is disabled until the operator visits the new Transcription section and selects a provider.
This is a small UX regression for instances that were relying on the implicit "first provider wins" behavior — they now must make a one-click selection. Acceptable trade-off because:
- It makes the choice explicit (the implicit pick was the source of confusion when users had multiple providers).
- A one-time migration that auto-fills `transcription.provider_id` with the first STT-capable provider is feasible but adds complexity for a one-line user action. Skip the migration; document the change in the release notes.
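No special-casing is needed for old rows because protoc-gen-go getters (such as the generated `GetTranscription()` and `GetProviderId()` in this commit) are safe on a nil receiver: a setting stored before this change decodes with a nil `transcription` message, and the chained getters yield `""`, which already means "disabled". A minimal sketch with hand-written stand-ins for the generated types (hypothetical, reduced to the fields used):

```go
package main

import "fmt"

// Hypothetical stand-ins for the generated types, reduced to what is used here.
type TranscriptionConfig struct{ ProviderId string }

type AISetting struct {
	Transcription *TranscriptionConfig
}

// GetTranscription follows the protoc-gen-go getter pattern: nil-receiver safe.
func (x *AISetting) GetTranscription() *TranscriptionConfig {
	if x != nil {
		return x.Transcription
	}
	return nil
}

// GetProviderId returns "" when the config message is absent.
func (x *TranscriptionConfig) GetProviderId() string {
	if x != nil {
		return x.ProviderId
	}
	return ""
}

func main() {
	// A setting stored before this change has no transcription field at all.
	legacy := &AISetting{}
	fmt.Println(legacy.GetTranscription().GetProviderId() == "") // true: disabled
}
```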
## Testing
- `internal/ai/transcription_test.go` (existing) covers the transcribe RPC. Add cases for: empty `provider_id` falls back to setting; empty `model` falls back to `DefaultTranscriptionModel`; per-call overrides win over settings.
- `server/router/api/v1/test/ai_service_test.go` (existing) covers the API service. Add cases for the validation rules above (unknown provider_id, oversized model/language/prompt).
- Frontend: manual verification via the dev server (`pnpm dev` in `web/`) — load Settings, add a provider, configure transcription, verify the home editor's record button enables/disables based on `provider_id`. No new component tests required (existing AISection has none).
## Out of scope, explicitly
- Multiple transcription configurations / per-tag or per-user routing.
- Per-call provider override exposed in the editor UI.
- Test-transcription button in settings (worth doing later; deferred to keep this scope tight).
- Glossary / vocabulary list as a separate field — folded into `prompt` for now (Joplin/Superwhisper split this; we can add later if users ask).
- TTS settings. Memos has none today and none planned.

@ -4,7 +4,7 @@ import "github.com/pkg/errors"
const (
// DefaultOpenAITranscriptionModel is the built-in OpenAI transcription model.
DefaultOpenAITranscriptionModel = "gpt-4o-transcribe"
DefaultOpenAITranscriptionModel = "whisper-1"
// DefaultGeminiTranscriptionModel is the built-in Gemini transcription model.
DefaultGeminiTranscriptionModel = "gemini-2.5-flash"
)

@ -15,27 +15,13 @@ service AIService {
post: "/api/v1/ai:transcribe"
body: "*"
};
option (google.api.method_signature) = "provider_id,config,audio";
option (google.api.method_signature) = "audio";
}
}
message TranscribeRequest {
// Required. The instance AI provider ID to use.
string provider_id = 1 [(google.api.field_behavior) = REQUIRED];
// Required. Transcription options.
TranscriptionConfig config = 2 [(google.api.field_behavior) = REQUIRED];
// Required. Audio input.
TranscriptionAudio audio = 3 [(google.api.field_behavior) = REQUIRED];
}
message TranscriptionConfig {
// Optional. A prompt to improve transcription quality.
string prompt = 1 [(google.api.field_behavior) = OPTIONAL];
// Optional. The language of the input audio.
string language = 2 [(google.api.field_behavior) = OPTIONAL];
TranscriptionAudio audio = 1 [(google.api.field_behavior) = REQUIRED];
}
message TranscriptionAudio {

@ -227,6 +227,10 @@ message InstanceSetting {
message AISetting {
// providers is the list of AI provider configurations available instance-wide.
repeated AIProviderConfig providers = 1;
// transcription is the speech-to-text feature configuration.
// When unset or transcription.provider_id is empty, transcription is disabled.
TranscriptionConfig transcription = 2;
}
// AIProviderConfig represents one callable AI provider connection.
@ -249,6 +253,25 @@ message InstanceSetting {
OPENAI = 1;
GEMINI = 2;
}
// TranscriptionConfig configures the speech-to-text feature.
message TranscriptionConfig {
// provider_id references an entry in AISetting.providers[].id.
// Empty string means transcription is disabled.
string provider_id = 1;
// model is the provider-specific model identifier.
// Empty string falls back to the engine default
// (whisper-1 for OPENAI providers, gemini-2.5-flash for GEMINI providers).
string model = 2;
// language is the default ISO 639-1 language hint sent to the provider.
// Empty string lets the provider auto-detect.
string language = 3;
// prompt is a default spelling/vocabulary hint passed to the provider.
string prompt = 4;
}
}
// Request message for GetInstanceSetting method.

@ -24,12 +24,8 @@ const (
type TranscribeRequest struct {
state protoimpl.MessageState `protogen:"open.v1"`
// Required. The instance AI provider ID to use.
ProviderId string `protobuf:"bytes,1,opt,name=provider_id,json=providerId,proto3" json:"provider_id,omitempty"`
// Required. Transcription options.
Config *TranscriptionConfig `protobuf:"bytes,2,opt,name=config,proto3" json:"config,omitempty"`
// Required. Audio input.
Audio *TranscriptionAudio `protobuf:"bytes,3,opt,name=audio,proto3" json:"audio,omitempty"`
Audio *TranscriptionAudio `protobuf:"bytes,1,opt,name=audio,proto3" json:"audio,omitempty"`
unknownFields protoimpl.UnknownFields
sizeCache protoimpl.SizeCache
}
@ -64,20 +60,6 @@ func (*TranscribeRequest) Descriptor() ([]byte, []int) {
return file_api_v1_ai_service_proto_rawDescGZIP(), []int{0}
}
func (x *TranscribeRequest) GetProviderId() string {
if x != nil {
return x.ProviderId
}
return ""
}
func (x *TranscribeRequest) GetConfig() *TranscriptionConfig {
if x != nil {
return x.Config
}
return nil
}
func (x *TranscribeRequest) GetAudio() *TranscriptionAudio {
if x != nil {
return x.Audio
@ -85,60 +67,6 @@ func (x *TranscribeRequest) GetAudio() *TranscriptionAudio {
return nil
}
type TranscriptionConfig struct {
state protoimpl.MessageState `protogen:"open.v1"`
// Optional. A prompt to improve transcription quality.
Prompt string `protobuf:"bytes,1,opt,name=prompt,proto3" json:"prompt,omitempty"`
// Optional. The language of the input audio.
Language string `protobuf:"bytes,2,opt,name=language,proto3" json:"language,omitempty"`
unknownFields protoimpl.UnknownFields
sizeCache protoimpl.SizeCache
}
func (x *TranscriptionConfig) Reset() {
*x = TranscriptionConfig{}
mi := &file_api_v1_ai_service_proto_msgTypes[1]
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
ms.StoreMessageInfo(mi)
}
func (x *TranscriptionConfig) String() string {
return protoimpl.X.MessageStringOf(x)
}
func (*TranscriptionConfig) ProtoMessage() {}
func (x *TranscriptionConfig) ProtoReflect() protoreflect.Message {
mi := &file_api_v1_ai_service_proto_msgTypes[1]
if x != nil {
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
if ms.LoadMessageInfo() == nil {
ms.StoreMessageInfo(mi)
}
return ms
}
return mi.MessageOf(x)
}
// Deprecated: Use TranscriptionConfig.ProtoReflect.Descriptor instead.
func (*TranscriptionConfig) Descriptor() ([]byte, []int) {
return file_api_v1_ai_service_proto_rawDescGZIP(), []int{1}
}
func (x *TranscriptionConfig) GetPrompt() string {
if x != nil {
return x.Prompt
}
return ""
}
func (x *TranscriptionConfig) GetLanguage() string {
if x != nil {
return x.Language
}
return ""
}
type TranscriptionAudio struct {
state protoimpl.MessageState `protogen:"open.v1"`
// Types that are valid to be assigned to Source:
@ -156,7 +84,7 @@ type TranscriptionAudio struct {
func (x *TranscriptionAudio) Reset() {
*x = TranscriptionAudio{}
mi := &file_api_v1_ai_service_proto_msgTypes[2]
mi := &file_api_v1_ai_service_proto_msgTypes[1]
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
ms.StoreMessageInfo(mi)
}
@ -168,7 +96,7 @@ func (x *TranscriptionAudio) String() string {
func (*TranscriptionAudio) ProtoMessage() {}
func (x *TranscriptionAudio) ProtoReflect() protoreflect.Message {
mi := &file_api_v1_ai_service_proto_msgTypes[2]
mi := &file_api_v1_ai_service_proto_msgTypes[1]
if x != nil {
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
if ms.LoadMessageInfo() == nil {
@ -181,7 +109,7 @@ func (x *TranscriptionAudio) ProtoReflect() protoreflect.Message {
// Deprecated: Use TranscriptionAudio.ProtoReflect.Descriptor instead.
func (*TranscriptionAudio) Descriptor() ([]byte, []int) {
return file_api_v1_ai_service_proto_rawDescGZIP(), []int{2}
return file_api_v1_ai_service_proto_rawDescGZIP(), []int{1}
}
func (x *TranscriptionAudio) GetSource() isTranscriptionAudio_Source {
@ -251,7 +179,7 @@ type TranscribeResponse struct {
func (x *TranscribeResponse) Reset() {
*x = TranscribeResponse{}
mi := &file_api_v1_ai_service_proto_msgTypes[3]
mi := &file_api_v1_ai_service_proto_msgTypes[2]
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
ms.StoreMessageInfo(mi)
}
@ -263,7 +191,7 @@ func (x *TranscribeResponse) String() string {
func (*TranscribeResponse) ProtoMessage() {}
func (x *TranscribeResponse) ProtoReflect() protoreflect.Message {
mi := &file_api_v1_ai_service_proto_msgTypes[3]
mi := &file_api_v1_ai_service_proto_msgTypes[2]
if x != nil {
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
if ms.LoadMessageInfo() == nil {
@ -276,7 +204,7 @@ func (x *TranscribeResponse) ProtoReflect() protoreflect.Message {
// Deprecated: Use TranscribeResponse.ProtoReflect.Descriptor instead.
func (*TranscribeResponse) Descriptor() ([]byte, []int) {
return file_api_v1_ai_service_proto_rawDescGZIP(), []int{3}
return file_api_v1_ai_service_proto_rawDescGZIP(), []int{2}
}
func (x *TranscribeResponse) GetText() string {
@ -290,15 +218,9 @@ var File_api_v1_ai_service_proto protoreflect.FileDescriptor
const file_api_v1_ai_service_proto_rawDesc = "" +
"\n" +
"\x17api/v1/ai_service.proto\x12\fmemos.api.v1\x1a\x1cgoogle/api/annotations.proto\x1a\x17google/api/client.proto\x1a\x1fgoogle/api/field_behavior.proto\"\xb6\x01\n" +
"\x11TranscribeRequest\x12$\n" +
"\vprovider_id\x18\x01 \x01(\tB\x03\xe0A\x02R\n" +
"providerId\x12>\n" +
"\x06config\x18\x02 \x01(\v2!.memos.api.v1.TranscriptionConfigB\x03\xe0A\x02R\x06config\x12;\n" +
"\x05audio\x18\x03 \x01(\v2 .memos.api.v1.TranscriptionAudioB\x03\xe0A\x02R\x05audio\"S\n" +
"\x13TranscriptionConfig\x12\x1b\n" +
"\x06prompt\x18\x01 \x01(\tB\x03\xe0A\x01R\x06prompt\x12\x1f\n" +
"\blanguage\x18\x02 \x01(\tB\x03\xe0A\x01R\blanguage\"\x9c\x01\n" +
"\x17api/v1/ai_service.proto\x12\fmemos.api.v1\x1a\x1cgoogle/api/annotations.proto\x1a\x17google/api/client.proto\x1a\x1fgoogle/api/field_behavior.proto\"P\n" +
"\x11TranscribeRequest\x12;\n" +
"\x05audio\x18\x01 \x01(\v2 .memos.api.v1.TranscriptionAudioB\x03\xe0A\x02R\x05audio\"\x9c\x01\n" +
"\x12TranscriptionAudio\x12\x1f\n" +
"\acontent\x18\x01 \x01(\fB\x03\xe0A\x04H\x00R\acontent\x12\x12\n" +
"\x03uri\x18\x02 \x01(\tH\x00R\x03uri\x12\x1f\n" +
@ -306,10 +228,10 @@ const file_api_v1_ai_service_proto_rawDesc = "" +
"\fcontent_type\x18\x04 \x01(\tB\x03\xe0A\x01R\vcontentTypeB\b\n" +
"\x06source\"(\n" +
"\x12TranscribeResponse\x12\x12\n" +
"\x04text\x18\x01 \x01(\tR\x04text2\x9a\x01\n" +
"\tAIService\x12\x8c\x01\n" +
"\x04text\x18\x01 \x01(\tR\x04text2\x86\x01\n" +
"\tAIService\x12y\n" +
"\n" +
"Transcribe\x12\x1f.memos.api.v1.TranscribeRequest\x1a .memos.api.v1.TranscribeResponse\";\xdaA\x18provider_id,config,audio\x82\xd3\xe4\x93\x02\x1a:\x01*\"\x15/api/v1/ai:transcribeB\xa6\x01\n" +
"Transcribe\x12\x1f.memos.api.v1.TranscribeRequest\x1a .memos.api.v1.TranscribeResponse\"(\xdaA\x05audio\x82\xd3\xe4\x93\x02\x1a:\x01*\"\x15/api/v1/ai:transcribeB\xa6\x01\n" +
"\x10com.memos.api.v1B\x0eAiServiceProtoP\x01Z0github.com/usememos/memos/proto/gen/api/v1;apiv1\xa2\x02\x03MAX\xaa\x02\fMemos.Api.V1\xca\x02\fMemos\\Api\\V1\xe2\x02\x18Memos\\Api\\V1\\GPBMetadata\xea\x02\x0eMemos::Api::V1b\x06proto3"
var (
@ -324,23 +246,21 @@ func file_api_v1_ai_service_proto_rawDescGZIP() []byte {
return file_api_v1_ai_service_proto_rawDescData
}
var file_api_v1_ai_service_proto_msgTypes = make([]protoimpl.MessageInfo, 4)
var file_api_v1_ai_service_proto_msgTypes = make([]protoimpl.MessageInfo, 3)
var file_api_v1_ai_service_proto_goTypes = []any{
(*TranscribeRequest)(nil), // 0: memos.api.v1.TranscribeRequest
(*TranscriptionConfig)(nil), // 1: memos.api.v1.TranscriptionConfig
(*TranscriptionAudio)(nil), // 2: memos.api.v1.TranscriptionAudio
(*TranscribeResponse)(nil), // 3: memos.api.v1.TranscribeResponse
(*TranscribeRequest)(nil), // 0: memos.api.v1.TranscribeRequest
(*TranscriptionAudio)(nil), // 1: memos.api.v1.TranscriptionAudio
(*TranscribeResponse)(nil), // 2: memos.api.v1.TranscribeResponse
}
var file_api_v1_ai_service_proto_depIdxs = []int32{
1, // 0: memos.api.v1.TranscribeRequest.config:type_name -> memos.api.v1.TranscriptionConfig
2, // 1: memos.api.v1.TranscribeRequest.audio:type_name -> memos.api.v1.TranscriptionAudio
0, // 2: memos.api.v1.AIService.Transcribe:input_type -> memos.api.v1.TranscribeRequest
3, // 3: memos.api.v1.AIService.Transcribe:output_type -> memos.api.v1.TranscribeResponse
3, // [3:4] is the sub-list for method output_type
2, // [2:3] is the sub-list for method input_type
2, // [2:2] is the sub-list for extension type_name
2, // [2:2] is the sub-list for extension extendee
0, // [0:2] is the sub-list for field type_name
1, // 0: memos.api.v1.TranscribeRequest.audio:type_name -> memos.api.v1.TranscriptionAudio
0, // 1: memos.api.v1.AIService.Transcribe:input_type -> memos.api.v1.TranscribeRequest
2, // 2: memos.api.v1.AIService.Transcribe:output_type -> memos.api.v1.TranscribeResponse
2, // [2:3] is the sub-list for method output_type
1, // [1:2] is the sub-list for method input_type
1, // [1:1] is the sub-list for extension type_name
1, // [1:1] is the sub-list for extension extendee
0, // [0:1] is the sub-list for field type_name
}
func init() { file_api_v1_ai_service_proto_init() }
@ -348,7 +268,7 @@ func file_api_v1_ai_service_proto_init() {
if File_api_v1_ai_service_proto != nil {
return
}
file_api_v1_ai_service_proto_msgTypes[2].OneofWrappers = []any{
file_api_v1_ai_service_proto_msgTypes[1].OneofWrappers = []any{
(*TranscriptionAudio_Content)(nil),
(*TranscriptionAudio_Uri)(nil),
}
@ -358,7 +278,7 @@ func file_api_v1_ai_service_proto_init() {
GoPackagePath: reflect.TypeOf(x{}).PkgPath(),
RawDescriptor: unsafe.Slice(unsafe.StringData(file_api_v1_ai_service_proto_rawDesc), len(file_api_v1_ai_service_proto_rawDesc)),
NumEnums: 0,
NumMessages: 4,
NumMessages: 3,
NumExtensions: 0,
NumServices: 1,
},

@ -1137,7 +1137,10 @@ func (x *InstanceSetting_NotificationSetting) GetEmail() *InstanceSetting_Notifi
type InstanceSetting_AISetting struct {
state protoimpl.MessageState `protogen:"open.v1"`
// providers is the list of AI provider configurations available instance-wide.
Providers []*InstanceSetting_AIProviderConfig `protobuf:"bytes,1,rep,name=providers,proto3" json:"providers,omitempty"`
Providers []*InstanceSetting_AIProviderConfig `protobuf:"bytes,1,rep,name=providers,proto3" json:"providers,omitempty"`
// transcription is the speech-to-text feature configuration.
// When unset or transcription.provider_id is empty, transcription is disabled.
Transcription *InstanceSetting_TranscriptionConfig `protobuf:"bytes,2,opt,name=transcription,proto3" json:"transcription,omitempty"`
unknownFields protoimpl.UnknownFields
sizeCache protoimpl.SizeCache
}
@ -1179,6 +1182,13 @@ func (x *InstanceSetting_AISetting) GetProviders() []*InstanceSetting_AIProvider
return nil
}
func (x *InstanceSetting_AISetting) GetTranscription() *InstanceSetting_TranscriptionConfig {
if x != nil {
return x.Transcription
}
return nil
}
// AIProviderConfig represents one callable AI provider connection.
type InstanceSetting_AIProviderConfig struct {
state protoimpl.MessageState `protogen:"open.v1"`
@ -1275,6 +1285,83 @@ func (x *InstanceSetting_AIProviderConfig) GetApiKeyHint() string {
return ""
}
// TranscriptionConfig configures the speech-to-text feature.
type InstanceSetting_TranscriptionConfig struct {
state protoimpl.MessageState `protogen:"open.v1"`
// provider_id references an entry in AISetting.providers[].id.
// Empty string means transcription is disabled.
ProviderId string `protobuf:"bytes,1,opt,name=provider_id,json=providerId,proto3" json:"provider_id,omitempty"`
// model is the provider-specific model identifier.
// Empty string falls back to the engine default
// (whisper-1 for OPENAI providers, gemini-2.5-flash for GEMINI providers).
Model string `protobuf:"bytes,2,opt,name=model,proto3" json:"model,omitempty"`
// language is the default ISO 639-1 language hint sent to the provider.
// Empty string lets the provider auto-detect.
Language string `protobuf:"bytes,3,opt,name=language,proto3" json:"language,omitempty"`
// prompt is a default spelling/vocabulary hint passed to the provider.
Prompt string `protobuf:"bytes,4,opt,name=prompt,proto3" json:"prompt,omitempty"`
unknownFields protoimpl.UnknownFields
sizeCache protoimpl.SizeCache
}
func (x *InstanceSetting_TranscriptionConfig) Reset() {
*x = InstanceSetting_TranscriptionConfig{}
mi := &file_api_v1_instance_service_proto_msgTypes[16]
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
ms.StoreMessageInfo(mi)
}
func (x *InstanceSetting_TranscriptionConfig) String() string {
return protoimpl.X.MessageStringOf(x)
}
func (*InstanceSetting_TranscriptionConfig) ProtoMessage() {}
func (x *InstanceSetting_TranscriptionConfig) ProtoReflect() protoreflect.Message {
mi := &file_api_v1_instance_service_proto_msgTypes[16]
if x != nil {
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
if ms.LoadMessageInfo() == nil {
ms.StoreMessageInfo(mi)
}
return ms
}
return mi.MessageOf(x)
}
// Deprecated: Use InstanceSetting_TranscriptionConfig.ProtoReflect.Descriptor instead.
func (*InstanceSetting_TranscriptionConfig) Descriptor() ([]byte, []int) {
return file_api_v1_instance_service_proto_rawDescGZIP(), []int{2, 8}
}
func (x *InstanceSetting_TranscriptionConfig) GetProviderId() string {
if x != nil {
return x.ProviderId
}
return ""
}
func (x *InstanceSetting_TranscriptionConfig) GetModel() string {
if x != nil {
return x.Model
}
return ""
}
func (x *InstanceSetting_TranscriptionConfig) GetLanguage() string {
if x != nil {
return x.Language
}
return ""
}
func (x *InstanceSetting_TranscriptionConfig) GetPrompt() string {
if x != nil {
return x.Prompt
}
return ""
}
// Custom profile configuration for instance branding.
type InstanceSetting_GeneralSetting_CustomProfile struct {
state protoimpl.MessageState `protogen:"open.v1"`
@ -1287,7 +1374,7 @@ type InstanceSetting_GeneralSetting_CustomProfile struct {
func (x *InstanceSetting_GeneralSetting_CustomProfile) Reset() {
*x = InstanceSetting_GeneralSetting_CustomProfile{}
mi := &file_api_v1_instance_service_proto_msgTypes[16]
mi := &file_api_v1_instance_service_proto_msgTypes[17]
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
ms.StoreMessageInfo(mi)
}
@ -1299,7 +1386,7 @@ func (x *InstanceSetting_GeneralSetting_CustomProfile) String() string {
func (*InstanceSetting_GeneralSetting_CustomProfile) ProtoMessage() {}
func (x *InstanceSetting_GeneralSetting_CustomProfile) ProtoReflect() protoreflect.Message {
mi := &file_api_v1_instance_service_proto_msgTypes[16]
mi := &file_api_v1_instance_service_proto_msgTypes[17]
if x != nil {
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
if ms.LoadMessageInfo() == nil {
@ -1352,7 +1439,7 @@ type InstanceSetting_StorageSetting_S3Config struct {
func (x *InstanceSetting_StorageSetting_S3Config) Reset() {
*x = InstanceSetting_StorageSetting_S3Config{}
mi := &file_api_v1_instance_service_proto_msgTypes[17]
mi := &file_api_v1_instance_service_proto_msgTypes[18]
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
ms.StoreMessageInfo(mi)
}
@ -1364,7 +1451,7 @@ func (x *InstanceSetting_StorageSetting_S3Config) String() string {
func (*InstanceSetting_StorageSetting_S3Config) ProtoMessage() {}
func (x *InstanceSetting_StorageSetting_S3Config) ProtoReflect() protoreflect.Message {
mi := &file_api_v1_instance_service_proto_msgTypes[17]
mi := &file_api_v1_instance_service_proto_msgTypes[18]
if x != nil {
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
if ms.LoadMessageInfo() == nil {
@ -1441,7 +1528,7 @@ type InstanceSetting_NotificationSetting_EmailSetting struct {
func (x *InstanceSetting_NotificationSetting_EmailSetting) Reset() {
*x = InstanceSetting_NotificationSetting_EmailSetting{}
mi := &file_api_v1_instance_service_proto_msgTypes[19]
mi := &file_api_v1_instance_service_proto_msgTypes[20]
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
ms.StoreMessageInfo(mi)
}
@ -1453,7 +1540,7 @@ func (x *InstanceSetting_NotificationSetting_EmailSetting) String() string {
func (*InstanceSetting_NotificationSetting_EmailSetting) ProtoMessage() {}
func (x *InstanceSetting_NotificationSetting_EmailSetting) ProtoReflect() protoreflect.Message {
mi := &file_api_v1_instance_service_proto_msgTypes[19]
mi := &file_api_v1_instance_service_proto_msgTypes[20]
if x != nil {
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
if ms.LoadMessageInfo() == nil {
@ -1552,7 +1639,7 @@ type InstanceStats_DatabaseStats struct {
func (x *InstanceStats_DatabaseStats) Reset() {
*x = InstanceStats_DatabaseStats{}
mi := &file_api_v1_instance_service_proto_msgTypes[20]
mi := &file_api_v1_instance_service_proto_msgTypes[21]
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
ms.StoreMessageInfo(mi)
}
@ -1564,7 +1651,7 @@ func (x *InstanceStats_DatabaseStats) String() string {
func (*InstanceStats_DatabaseStats) ProtoMessage() {}
func (x *InstanceStats_DatabaseStats) ProtoReflect() protoreflect.Message {
mi := &file_api_v1_instance_service_proto_msgTypes[20]
mi := &file_api_v1_instance_service_proto_msgTypes[21]
if x != nil {
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
if ms.LoadMessageInfo() == nil {
@ -1605,7 +1692,7 @@ const file_api_v1_instance_service_proto_rawDesc = "" +
"\finstance_url\x18\x06 \x01(\tR\vinstanceUrl\x12(\n" +
"\x05admin\x18\a \x01(\v2\x12.memos.api.v1.UserR\x05admin\x12\x16\n" +
"\x06commit\x18\b \x01(\tR\x06commit\"\x1b\n" +
"\x19GetInstanceProfileRequest\"\xf0\x19\n" +
"\x19GetInstanceProfileRequest\"\xcd\x1b\n" +
"\x0fInstanceSetting\x12\x17\n" +
"\x04name\x18\x01 \x01(\tB\x03\xe0A\bR\x04name\x12W\n" +
"\x0fgeneral_setting\x18\x02 \x01(\v2,.memos.api.v1.InstanceSetting.GeneralSettingH\x00R\x0egeneralSetting\x12W\n" +
@ -1671,9 +1758,10 @@ const file_api_v1_instance_service_proto_rawDesc = "" +
"\breply_to\x18\b \x01(\tR\areplyTo\x12\x17\n" +
"\ause_tls\x18\t \x01(\bR\x06useTls\x12\x17\n" +
"\ause_ssl\x18\n" +
" \x01(\bR\x06useSsl\x1aY\n" +
" \x01(\bR\x06useSsl\x1a\xb2\x01\n" +
"\tAISetting\x12L\n" +
"\tproviders\x18\x01 \x03(\v2..memos.api.v1.InstanceSetting.AIProviderConfigR\tproviders\x1a\x80\x02\n" +
"\tproviders\x18\x01 \x03(\v2..memos.api.v1.InstanceSetting.AIProviderConfigR\tproviders\x12W\n" +
"\rtranscription\x18\x02 \x01(\v21.memos.api.v1.InstanceSetting.TranscriptionConfigR\rtranscription\x1a\x80\x02\n" +
"\x10AIProviderConfig\x12\x0e\n" +
"\x02id\x18\x01 \x01(\tR\x02id\x12\x14\n" +
"\x05title\x18\x02 \x01(\tR\x05title\x12@\n" +
@ -1682,7 +1770,13 @@ const file_api_v1_instance_service_proto_rawDesc = "" +
"\aapi_key\x18\x05 \x01(\tB\x03\xe0A\x04R\x06apiKey\x12#\n" +
"\vapi_key_set\x18\b \x01(\bB\x03\xe0A\x03R\tapiKeySet\x12%\n" +
"\fapi_key_hint\x18\t \x01(\tB\x03\xe0A\x03R\n" +
"apiKeyHint\"j\n" +
"apiKeyHint\x1a\x80\x01\n" +
"\x13TranscriptionConfig\x12\x1f\n" +
"\vprovider_id\x18\x01 \x01(\tR\n" +
"providerId\x12\x14\n" +
"\x05model\x18\x02 \x01(\tR\x05model\x12\x1a\n" +
"\blanguage\x18\x03 \x01(\tR\blanguage\x12\x16\n" +
"\x06prompt\x18\x04 \x01(\tR\x06prompt\"j\n" +
"\x03Key\x12\x13\n" +
"\x0fKEY_UNSPECIFIED\x10\x00\x12\v\n" +
"\aGENERAL\x10\x01\x12\v\n" +
@ -1739,7 +1833,7 @@ func file_api_v1_instance_service_proto_rawDescGZIP() []byte {
}
var file_api_v1_instance_service_proto_enumTypes = make([]protoimpl.EnumInfo, 3)
var file_api_v1_instance_service_proto_msgTypes = make([]protoimpl.MessageInfo, 21)
var file_api_v1_instance_service_proto_msgTypes = make([]protoimpl.MessageInfo, 22)
var file_api_v1_instance_service_proto_goTypes = []any{
(InstanceSetting_Key)(0), // 0: memos.api.v1.InstanceSetting.Key
(InstanceSetting_AIProviderType)(0), // 1: memos.api.v1.InstanceSetting.AIProviderType
@ -1760,19 +1854,20 @@ var file_api_v1_instance_service_proto_goTypes = []any{
(*InstanceSetting_NotificationSetting)(nil), // 16: memos.api.v1.InstanceSetting.NotificationSetting
(*InstanceSetting_AISetting)(nil), // 17: memos.api.v1.InstanceSetting.AISetting
(*InstanceSetting_AIProviderConfig)(nil), // 18: memos.api.v1.InstanceSetting.AIProviderConfig
(*InstanceSetting_GeneralSetting_CustomProfile)(nil), // 19: memos.api.v1.InstanceSetting.GeneralSetting.CustomProfile
(*InstanceSetting_StorageSetting_S3Config)(nil), // 20: memos.api.v1.InstanceSetting.StorageSetting.S3Config
nil, // 21: memos.api.v1.InstanceSetting.TagsSetting.TagsEntry
(*InstanceSetting_NotificationSetting_EmailSetting)(nil), // 22: memos.api.v1.InstanceSetting.NotificationSetting.EmailSetting
(*InstanceStats_DatabaseStats)(nil), // 23: memos.api.v1.InstanceStats.DatabaseStats
(*User)(nil), // 24: memos.api.v1.User
(*fieldmaskpb.FieldMask)(nil), // 25: google.protobuf.FieldMask
(*timestamppb.Timestamp)(nil), // 26: google.protobuf.Timestamp
(*color.Color)(nil), // 27: google.type.Color
(*emptypb.Empty)(nil), // 28: google.protobuf.Empty
(*InstanceSetting_TranscriptionConfig)(nil), // 19: memos.api.v1.InstanceSetting.TranscriptionConfig
(*InstanceSetting_GeneralSetting_CustomProfile)(nil), // 20: memos.api.v1.InstanceSetting.GeneralSetting.CustomProfile
(*InstanceSetting_StorageSetting_S3Config)(nil), // 21: memos.api.v1.InstanceSetting.StorageSetting.S3Config
nil, // 22: memos.api.v1.InstanceSetting.TagsSetting.TagsEntry
(*InstanceSetting_NotificationSetting_EmailSetting)(nil), // 23: memos.api.v1.InstanceSetting.NotificationSetting.EmailSetting
(*InstanceStats_DatabaseStats)(nil), // 24: memos.api.v1.InstanceStats.DatabaseStats
(*User)(nil), // 25: memos.api.v1.User
(*fieldmaskpb.FieldMask)(nil), // 26: google.protobuf.FieldMask
(*timestamppb.Timestamp)(nil), // 27: google.protobuf.Timestamp
(*color.Color)(nil), // 28: google.type.Color
(*emptypb.Empty)(nil), // 29: google.protobuf.Empty
}
var file_api_v1_instance_service_proto_depIdxs = []int32{
24, // 0: memos.api.v1.InstanceProfile.admin:type_name -> memos.api.v1.User
25, // 0: memos.api.v1.InstanceProfile.admin:type_name -> memos.api.v1.User
11, // 1: memos.api.v1.InstanceSetting.general_setting:type_name -> memos.api.v1.InstanceSetting.GeneralSetting
12, // 2: memos.api.v1.InstanceSetting.storage_setting:type_name -> memos.api.v1.InstanceSetting.StorageSetting
13, // 3: memos.api.v1.InstanceSetting.memo_related_setting:type_name -> memos.api.v1.InstanceSetting.MemoRelatedSetting
@ -1780,34 +1875,35 @@ var file_api_v1_instance_service_proto_depIdxs = []int32{
16, // 5: memos.api.v1.InstanceSetting.notification_setting:type_name -> memos.api.v1.InstanceSetting.NotificationSetting
17, // 6: memos.api.v1.InstanceSetting.ai_setting:type_name -> memos.api.v1.InstanceSetting.AISetting
5, // 7: memos.api.v1.UpdateInstanceSettingRequest.setting:type_name -> memos.api.v1.InstanceSetting
25, // 8: memos.api.v1.UpdateInstanceSettingRequest.update_mask:type_name -> google.protobuf.FieldMask
22, // 9: memos.api.v1.TestInstanceEmailSettingRequest.email:type_name -> memos.api.v1.InstanceSetting.NotificationSetting.EmailSetting
23, // 10: memos.api.v1.InstanceStats.database:type_name -> memos.api.v1.InstanceStats.DatabaseStats
26, // 11: memos.api.v1.InstanceStats.generated_time:type_name -> google.protobuf.Timestamp
19, // 12: memos.api.v1.InstanceSetting.GeneralSetting.custom_profile:type_name -> memos.api.v1.InstanceSetting.GeneralSetting.CustomProfile
26, // 8: memos.api.v1.UpdateInstanceSettingRequest.update_mask:type_name -> google.protobuf.FieldMask
23, // 9: memos.api.v1.TestInstanceEmailSettingRequest.email:type_name -> memos.api.v1.InstanceSetting.NotificationSetting.EmailSetting
24, // 10: memos.api.v1.InstanceStats.database:type_name -> memos.api.v1.InstanceStats.DatabaseStats
27, // 11: memos.api.v1.InstanceStats.generated_time:type_name -> google.protobuf.Timestamp
20, // 12: memos.api.v1.InstanceSetting.GeneralSetting.custom_profile:type_name -> memos.api.v1.InstanceSetting.GeneralSetting.CustomProfile
2, // 13: memos.api.v1.InstanceSetting.StorageSetting.storage_type:type_name -> memos.api.v1.InstanceSetting.StorageSetting.StorageType
20, // 14: memos.api.v1.InstanceSetting.StorageSetting.s3_config:type_name -> memos.api.v1.InstanceSetting.StorageSetting.S3Config
27, // 15: memos.api.v1.InstanceSetting.TagMetadata.background_color:type_name -> google.type.Color
21, // 16: memos.api.v1.InstanceSetting.TagsSetting.tags:type_name -> memos.api.v1.InstanceSetting.TagsSetting.TagsEntry
22, // 17: memos.api.v1.InstanceSetting.NotificationSetting.email:type_name -> memos.api.v1.InstanceSetting.NotificationSetting.EmailSetting
21, // 14: memos.api.v1.InstanceSetting.StorageSetting.s3_config:type_name -> memos.api.v1.InstanceSetting.StorageSetting.S3Config
28, // 15: memos.api.v1.InstanceSetting.TagMetadata.background_color:type_name -> google.type.Color
22, // 16: memos.api.v1.InstanceSetting.TagsSetting.tags:type_name -> memos.api.v1.InstanceSetting.TagsSetting.TagsEntry
23, // 17: memos.api.v1.InstanceSetting.NotificationSetting.email:type_name -> memos.api.v1.InstanceSetting.NotificationSetting.EmailSetting
18, // 18: memos.api.v1.InstanceSetting.AISetting.providers:type_name -> memos.api.v1.InstanceSetting.AIProviderConfig
1, // 19: memos.api.v1.InstanceSetting.AIProviderConfig.type:type_name -> memos.api.v1.InstanceSetting.AIProviderType
14, // 20: memos.api.v1.InstanceSetting.TagsSetting.TagsEntry.value:type_name -> memos.api.v1.InstanceSetting.TagMetadata
4, // 21: memos.api.v1.InstanceService.GetInstanceProfile:input_type -> memos.api.v1.GetInstanceProfileRequest
6, // 22: memos.api.v1.InstanceService.GetInstanceSetting:input_type -> memos.api.v1.GetInstanceSettingRequest
7, // 23: memos.api.v1.InstanceService.UpdateInstanceSetting:input_type -> memos.api.v1.UpdateInstanceSettingRequest
8, // 24: memos.api.v1.InstanceService.TestInstanceEmailSetting:input_type -> memos.api.v1.TestInstanceEmailSettingRequest
9, // 25: memos.api.v1.InstanceService.GetInstanceStats:input_type -> memos.api.v1.GetInstanceStatsRequest
3, // 26: memos.api.v1.InstanceService.GetInstanceProfile:output_type -> memos.api.v1.InstanceProfile
5, // 27: memos.api.v1.InstanceService.GetInstanceSetting:output_type -> memos.api.v1.InstanceSetting
5, // 28: memos.api.v1.InstanceService.UpdateInstanceSetting:output_type -> memos.api.v1.InstanceSetting
28, // 29: memos.api.v1.InstanceService.TestInstanceEmailSetting:output_type -> google.protobuf.Empty
10, // 30: memos.api.v1.InstanceService.GetInstanceStats:output_type -> memos.api.v1.InstanceStats
26, // [26:31] is the sub-list for method output_type
21, // [21:26] is the sub-list for method input_type
21, // [21:21] is the sub-list for extension type_name
21, // [21:21] is the sub-list for extension extendee
0, // [0:21] is the sub-list for field type_name
19, // 19: memos.api.v1.InstanceSetting.AISetting.transcription:type_name -> memos.api.v1.InstanceSetting.TranscriptionConfig
1, // 20: memos.api.v1.InstanceSetting.AIProviderConfig.type:type_name -> memos.api.v1.InstanceSetting.AIProviderType
14, // 21: memos.api.v1.InstanceSetting.TagsSetting.TagsEntry.value:type_name -> memos.api.v1.InstanceSetting.TagMetadata
4, // 22: memos.api.v1.InstanceService.GetInstanceProfile:input_type -> memos.api.v1.GetInstanceProfileRequest
6, // 23: memos.api.v1.InstanceService.GetInstanceSetting:input_type -> memos.api.v1.GetInstanceSettingRequest
7, // 24: memos.api.v1.InstanceService.UpdateInstanceSetting:input_type -> memos.api.v1.UpdateInstanceSettingRequest
8, // 25: memos.api.v1.InstanceService.TestInstanceEmailSetting:input_type -> memos.api.v1.TestInstanceEmailSettingRequest
9, // 26: memos.api.v1.InstanceService.GetInstanceStats:input_type -> memos.api.v1.GetInstanceStatsRequest
3, // 27: memos.api.v1.InstanceService.GetInstanceProfile:output_type -> memos.api.v1.InstanceProfile
5, // 28: memos.api.v1.InstanceService.GetInstanceSetting:output_type -> memos.api.v1.InstanceSetting
5, // 29: memos.api.v1.InstanceService.UpdateInstanceSetting:output_type -> memos.api.v1.InstanceSetting
29, // 30: memos.api.v1.InstanceService.TestInstanceEmailSetting:output_type -> google.protobuf.Empty
10, // 31: memos.api.v1.InstanceService.GetInstanceStats:output_type -> memos.api.v1.InstanceStats
27, // [27:32] is the sub-list for method output_type
22, // [22:27] is the sub-list for method input_type
22, // [22:22] is the sub-list for extension type_name
22, // [22:22] is the sub-list for extension extendee
0, // [0:22] is the sub-list for field type_name
}
func init() { file_api_v1_instance_service_proto_init() }
@ -1830,7 +1926,7 @@ func file_api_v1_instance_service_proto_init() {
GoPackagePath: reflect.TypeOf(x{}).PkgPath(),
RawDescriptor: unsafe.Slice(unsafe.StringData(file_api_v1_instance_service_proto_rawDesc), len(file_api_v1_instance_service_proto_rawDesc)),
NumEnums: 3,
NumMessages: 21,
NumMessages: 22,
NumExtensions: 0,
NumServices: 1,
},

@ -2701,6 +2701,12 @@ components:
items:
$ref: '#/components/schemas/InstanceSetting_AIProviderConfig'
description: providers is the list of AI provider configurations available instance-wide.
transcription:
allOf:
- $ref: '#/components/schemas/InstanceSetting_TranscriptionConfig'
description: |-
transcription is the speech-to-text feature configuration.
When unset or transcription.provider_id is empty, transcription is disabled.
description: AI provider configuration settings.
InstanceSetting_GeneralSetting:
type: object
@ -2808,6 +2814,29 @@ components:
so a single entry like "project/.*" matches all tags under that prefix.
Exact tag names are also valid (they are trivially valid regex patterns).
description: Tag metadata configuration.
InstanceSetting_TranscriptionConfig:
type: object
properties:
providerId:
type: string
description: |-
provider_id references an entry in AISetting.providers[].id.
Empty string means transcription is disabled.
model:
type: string
description: |-
model is the provider-specific model identifier.
Empty string falls back to the engine default
(whisper-1 for OPENAI providers, gemini-2.5-flash for GEMINI providers).
language:
type: string
description: |-
language is the default ISO 639-1 language hint sent to the provider.
Empty string lets the provider auto-detect.
prompt:
type: string
description: prompt is a default spelling/vocabulary hint passed to the provider.
description: TranscriptionConfig configures the speech-to-text feature.
InstanceStats:
type: object
properties:
@ -3538,18 +3567,9 @@ components:
description: Request message for TestInstanceEmailSetting method.
TranscribeRequest:
required:
- providerId
- config
- audio
type: object
properties:
providerId:
type: string
description: Required. The instance AI provider ID to use.
config:
allOf:
- $ref: '#/components/schemas/TranscriptionConfig'
description: Required. Transcription options.
audio:
allOf:
- $ref: '#/components/schemas/TranscriptionAudio'
@ -3577,15 +3597,6 @@ components:
contentType:
type: string
description: Optional. The MIME type of the input audio.
TranscriptionConfig:
type: object
properties:
prompt:
type: string
description: Optional. A prompt to improve transcription quality.
language:
type: string
description: Optional. The language of the input audio.
UpsertMemoReactionRequest:
required:
- name
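This is a rough sketch of how the `InstanceSetting_TranscriptionConfig` schema above looks in a JSON request body. It is not the generated code. `transcriptionConfig` and `aiSetting` are hypothetical hand-written structs that mirror the camelCase property names in the OpenAPI spec. The provider ID `groq` is an assumed example and would have to match an entry in `providers[].id`. The real server decodes with the generated protobuf types, and their JSON encoder may order or space fields differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// transcriptionConfig is a hypothetical mirror of the
// InstanceSetting_TranscriptionConfig OpenAPI schema (not the generated type).
type transcriptionConfig struct {
	ProviderID string `json:"providerId,omitempty"` // must match an AISetting.providers[].id
	Model      string `json:"model,omitempty"`      // empty: engine default
	Language   string `json:"language,omitempty"`   // empty: provider auto-detects
	Prompt     string `json:"prompt,omitempty"`     // spelling/vocabulary hint
}

// aiSetting is a hypothetical mirror of InstanceSetting_AISetting. Provider
// entries stay opaque because only the transcription block matters here.
type aiSetting struct {
	Providers     []json.RawMessage    `json:"providers,omitempty"`
	Transcription *transcriptionConfig `json:"transcription,omitempty"`
}

// marshalAISetting encodes the setting with encoding/json, dropping empty fields.
func marshalAISetting(s aiSetting) (string, error) {
	out, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	body, err := marshalAISetting(aiSetting{
		Transcription: &transcriptionConfig{
			ProviderID: "groq", // assumed ID of an OpenAI-compatible provider entry
			Model:      "whisper-large-v3-turbo",
			Language:   "en",
			Prompt:     "Memos, protobuf, gRPC",
		},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Empty fields are omitted via `omitempty`. This matches the schema's rule that an empty `model` means "use the engine default" and an empty `language` means "auto-detect".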

@ -962,7 +962,10 @@ func (x *InstanceNotificationSetting) GetEmail() *InstanceNotificationSetting_Em
type InstanceAISetting struct {
state protoimpl.MessageState `protogen:"open.v1"`
// providers is the list of AI provider configurations available instance-wide.
Providers []*AIProviderConfig `protobuf:"bytes,1,rep,name=providers,proto3" json:"providers,omitempty"`
Providers []*AIProviderConfig `protobuf:"bytes,1,rep,name=providers,proto3" json:"providers,omitempty"`
// transcription is the speech-to-text feature configuration.
// When unset or transcription.provider_id is empty, transcription is disabled.
Transcription *TranscriptionConfig `protobuf:"bytes,2,opt,name=transcription,proto3" json:"transcription,omitempty"`
unknownFields protoimpl.UnknownFields
sizeCache protoimpl.SizeCache
}
@ -1004,6 +1007,13 @@ func (x *InstanceAISetting) GetProviders() []*AIProviderConfig {
return nil
}
func (x *InstanceAISetting) GetTranscription() *TranscriptionConfig {
if x != nil {
return x.Transcription
}
return nil
}
type AIProviderConfig struct {
state protoimpl.MessageState `protogen:"open.v1"`
Id string `protobuf:"bytes,1,opt,name=id,proto3" json:"id,omitempty"`
@ -1081,6 +1091,85 @@ func (x *AIProviderConfig) GetApiKey() string {
return ""
}
// TranscriptionConfig configures the speech-to-text feature.
type TranscriptionConfig struct {
state protoimpl.MessageState `protogen:"open.v1"`
// provider_id references an entry in InstanceAISetting.providers[].id.
// Empty string means transcription is disabled.
ProviderId string `protobuf:"bytes,1,opt,name=provider_id,json=providerId,proto3" json:"provider_id,omitempty"`
// model is the provider-specific model identifier.
// Empty string falls back to the engine default
// (whisper-1 for OPENAI providers, gemini-2.5-flash for GEMINI providers).
Model string `protobuf:"bytes,2,opt,name=model,proto3" json:"model,omitempty"`
// language is the default ISO 639-1 language hint sent to the provider.
// Empty string lets the provider auto-detect.
Language string `protobuf:"bytes,3,opt,name=language,proto3" json:"language,omitempty"`
// prompt is a default spelling/vocabulary hint passed to the provider.
// Used as the OpenAI Whisper "prompt" parameter and folded into the Gemini
// generation prompt as a "Context and spelling hints" block.
Prompt string `protobuf:"bytes,4,opt,name=prompt,proto3" json:"prompt,omitempty"`
unknownFields protoimpl.UnknownFields
sizeCache protoimpl.SizeCache
}
func (x *TranscriptionConfig) Reset() {
*x = TranscriptionConfig{}
mi := &file_store_instance_setting_proto_msgTypes[12]
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
ms.StoreMessageInfo(mi)
}
func (x *TranscriptionConfig) String() string {
return protoimpl.X.MessageStringOf(x)
}
func (*TranscriptionConfig) ProtoMessage() {}
func (x *TranscriptionConfig) ProtoReflect() protoreflect.Message {
mi := &file_store_instance_setting_proto_msgTypes[12]
if x != nil {
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
if ms.LoadMessageInfo() == nil {
ms.StoreMessageInfo(mi)
}
return ms
}
return mi.MessageOf(x)
}
// Deprecated: Use TranscriptionConfig.ProtoReflect.Descriptor instead.
func (*TranscriptionConfig) Descriptor() ([]byte, []int) {
return file_store_instance_setting_proto_rawDescGZIP(), []int{12}
}
func (x *TranscriptionConfig) GetProviderId() string {
if x != nil {
return x.ProviderId
}
return ""
}
func (x *TranscriptionConfig) GetModel() string {
if x != nil {
return x.Model
}
return ""
}
func (x *TranscriptionConfig) GetLanguage() string {
if x != nil {
return x.Language
}
return ""
}
func (x *TranscriptionConfig) GetPrompt() string {
if x != nil {
return x.Prompt
}
return ""
}
type InstanceNotificationSetting_EmailSetting struct {
state protoimpl.MessageState `protogen:"open.v1"`
Enabled bool `protobuf:"varint,1,opt,name=enabled,proto3" json:"enabled,omitempty"`
@ -1099,7 +1188,7 @@ type InstanceNotificationSetting_EmailSetting struct {
func (x *InstanceNotificationSetting_EmailSetting) Reset() {
*x = InstanceNotificationSetting_EmailSetting{}
mi := &file_store_instance_setting_proto_msgTypes[13]
mi := &file_store_instance_setting_proto_msgTypes[14]
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
ms.StoreMessageInfo(mi)
}
@ -1111,7 +1200,7 @@ func (x *InstanceNotificationSetting_EmailSetting) String() string {
func (*InstanceNotificationSetting_EmailSetting) ProtoMessage() {}
func (x *InstanceNotificationSetting_EmailSetting) ProtoReflect() protoreflect.Message {
mi := &file_store_instance_setting_proto_msgTypes[13]
mi := &file_store_instance_setting_proto_msgTypes[14]
if x != nil {
ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x))
if ms.LoadMessageInfo() == nil {
@ -1273,15 +1362,22 @@ const file_store_instance_setting_proto_rawDesc = "" +
"\breply_to\x18\b \x01(\tR\areplyTo\x12\x17\n" +
"\ause_tls\x18\t \x01(\bR\x06useTls\x12\x17\n" +
"\ause_ssl\x18\n" +
" \x01(\bR\x06useSsl\"P\n" +
" \x01(\bR\x06useSsl\"\x98\x01\n" +
"\x11InstanceAISetting\x12;\n" +
"\tproviders\x18\x01 \x03(\v2\x1d.memos.store.AIProviderConfigR\tproviders\"\x9e\x01\n" +
"\tproviders\x18\x01 \x03(\v2\x1d.memos.store.AIProviderConfigR\tproviders\x12F\n" +
"\rtranscription\x18\x02 \x01(\v2 .memos.store.TranscriptionConfigR\rtranscription\"\x9e\x01\n" +
"\x10AIProviderConfig\x12\x0e\n" +
"\x02id\x18\x01 \x01(\tR\x02id\x12\x14\n" +
"\x05title\x18\x02 \x01(\tR\x05title\x12/\n" +
"\x04type\x18\x03 \x01(\x0e2\x1b.memos.store.AIProviderTypeR\x04type\x12\x1a\n" +
"\bendpoint\x18\x04 \x01(\tR\bendpoint\x12\x17\n" +
"\aapi_key\x18\x05 \x01(\tR\x06apiKey*\x95\x01\n" +
"\aapi_key\x18\x05 \x01(\tR\x06apiKey\"\x80\x01\n" +
"\x13TranscriptionConfig\x12\x1f\n" +
"\vprovider_id\x18\x01 \x01(\tR\n" +
"providerId\x12\x14\n" +
"\x05model\x18\x02 \x01(\tR\x05model\x12\x1a\n" +
"\blanguage\x18\x03 \x01(\tR\blanguage\x12\x16\n" +
"\x06prompt\x18\x04 \x01(\tR\x06prompt*\x95\x01\n" +
"\x12InstanceSettingKey\x12$\n" +
" INSTANCE_SETTING_KEY_UNSPECIFIED\x10\x00\x12\t\n" +
"\x05BASIC\x10\x01\x12\v\n" +
@ -1312,7 +1408,7 @@ func file_store_instance_setting_proto_rawDescGZIP() []byte {
}
var file_store_instance_setting_proto_enumTypes = make([]protoimpl.EnumInfo, 3)
var file_store_instance_setting_proto_msgTypes = make([]protoimpl.MessageInfo, 14)
var file_store_instance_setting_proto_msgTypes = make([]protoimpl.MessageInfo, 15)
var file_store_instance_setting_proto_goTypes = []any{
(InstanceSettingKey)(0), // 0: memos.store.InstanceSettingKey
(AIProviderType)(0), // 1: memos.store.AIProviderType
@ -1329,9 +1425,10 @@ var file_store_instance_setting_proto_goTypes = []any{
(*InstanceNotificationSetting)(nil), // 12: memos.store.InstanceNotificationSetting
(*InstanceAISetting)(nil), // 13: memos.store.InstanceAISetting
(*AIProviderConfig)(nil), // 14: memos.store.AIProviderConfig
nil, // 15: memos.store.InstanceTagsSetting.TagsEntry
(*InstanceNotificationSetting_EmailSetting)(nil), // 16: memos.store.InstanceNotificationSetting.EmailSetting
(*color.Color)(nil), // 17: google.type.Color
(*TranscriptionConfig)(nil), // 15: memos.store.TranscriptionConfig
nil, // 16: memos.store.InstanceTagsSetting.TagsEntry
(*InstanceNotificationSetting_EmailSetting)(nil), // 17: memos.store.InstanceNotificationSetting.EmailSetting
(*color.Color)(nil), // 18: google.type.Color
}
var file_store_instance_setting_proto_depIdxs = []int32{
0, // 0: memos.store.InstanceSetting.key:type_name -> memos.store.InstanceSettingKey
@ -1345,17 +1442,18 @@ var file_store_instance_setting_proto_depIdxs = []int32{
6, // 8: memos.store.InstanceGeneralSetting.custom_profile:type_name -> memos.store.InstanceCustomProfile
2, // 9: memos.store.InstanceStorageSetting.storage_type:type_name -> memos.store.InstanceStorageSetting.StorageType
8, // 10: memos.store.InstanceStorageSetting.s3_config:type_name -> memos.store.StorageS3Config
17, // 11: memos.store.InstanceTagMetadata.background_color:type_name -> google.type.Color
15, // 12: memos.store.InstanceTagsSetting.tags:type_name -> memos.store.InstanceTagsSetting.TagsEntry
16, // 13: memos.store.InstanceNotificationSetting.email:type_name -> memos.store.InstanceNotificationSetting.EmailSetting
18, // 11: memos.store.InstanceTagMetadata.background_color:type_name -> google.type.Color
16, // 12: memos.store.InstanceTagsSetting.tags:type_name -> memos.store.InstanceTagsSetting.TagsEntry
17, // 13: memos.store.InstanceNotificationSetting.email:type_name -> memos.store.InstanceNotificationSetting.EmailSetting
14, // 14: memos.store.InstanceAISetting.providers:type_name -> memos.store.AIProviderConfig
1, // 15: memos.store.AIProviderConfig.type:type_name -> memos.store.AIProviderType
10, // 16: memos.store.InstanceTagsSetting.TagsEntry.value:type_name -> memos.store.InstanceTagMetadata
17, // [17:17] is the sub-list for method output_type
17, // [17:17] is the sub-list for method input_type
17, // [17:17] is the sub-list for extension type_name
17, // [17:17] is the sub-list for extension extendee
0, // [0:17] is the sub-list for field type_name
15, // 15: memos.store.InstanceAISetting.transcription:type_name -> memos.store.TranscriptionConfig
1, // 16: memos.store.AIProviderConfig.type:type_name -> memos.store.AIProviderType
10, // 17: memos.store.InstanceTagsSetting.TagsEntry.value:type_name -> memos.store.InstanceTagMetadata
18, // [18:18] is the sub-list for method output_type
18, // [18:18] is the sub-list for method input_type
18, // [18:18] is the sub-list for extension type_name
18, // [18:18] is the sub-list for extension extendee
0, // [0:18] is the sub-list for field type_name
}
func init() { file_store_instance_setting_proto_init() }
@ -1378,7 +1476,7 @@ func file_store_instance_setting_proto_init() {
GoPackagePath: reflect.TypeOf(x{}).PkgPath(),
RawDescriptor: unsafe.Slice(unsafe.StringData(file_store_instance_setting_proto_rawDesc), len(file_store_instance_setting_proto_rawDesc)),
NumEnums: 3,
NumMessages: 14,
NumMessages: 15,
NumExtensions: 0,
NumServices: 0,
},

@ -149,6 +149,10 @@ message InstanceNotificationSetting {
message InstanceAISetting {
// providers is the list of AI provider configurations available instance-wide.
repeated AIProviderConfig providers = 1;
// transcription is the speech-to-text feature configuration.
// When unset or transcription.provider_id is empty, transcription is disabled.
TranscriptionConfig transcription = 2;
}
message AIProviderConfig {
@ -165,3 +169,24 @@ enum AIProviderType {
OPENAI = 1;
GEMINI = 2;
}
// TranscriptionConfig configures the speech-to-text feature.
message TranscriptionConfig {
// provider_id references an entry in InstanceAISetting.providers[].id.
// Empty string means transcription is disabled.
string provider_id = 1;
// model is the provider-specific model identifier.
// Empty string falls back to the engine default
// (whisper-1 for OPENAI providers, gemini-2.5-flash for GEMINI providers).
string model = 2;
// language is the default ISO 639-1 language hint sent to the provider.
// Empty string lets the provider auto-detect.
string language = 3;
// prompt is a default spelling/vocabulary hint passed to the provider.
// Used as the OpenAI Whisper "prompt" parameter and folded into the Gemini
// generation prompt as a "Context and spelling hints" block.
string prompt = 4;
}
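The fallback rules in the `provider_id` and `model` comments can be sketched as two small pure functions. The sketch uses a hypothetical `providerType` enum that mirrors `AIProviderType`. It is not the repository's `ai.DefaultTranscriptionModel`. It only encodes the two defaults named in the comments above: `whisper-1` for OPENAI and `gemini-2.5-flash` for GEMINI.

```go
package main

import "fmt"

// providerType is a hypothetical stand-in for memos.store.AIProviderType.
type providerType int

const (
	providerUnspecified providerType = 0
	providerOpenAI      providerType = 1
	providerGemini      providerType = 2
)

// transcriptionEnabled encodes "empty provider_id means transcription is disabled".
func transcriptionEnabled(providerID string) bool {
	return providerID != ""
}

// effectiveModel returns the configured model if set, otherwise the
// per-engine default documented on TranscriptionConfig.model.
func effectiveModel(t providerType, configured string) (string, error) {
	if configured != "" {
		return configured, nil
	}
	switch t {
	case providerOpenAI:
		return "whisper-1", nil
	case providerGemini:
		return "gemini-2.5-flash", nil
	default:
		return "", fmt.Errorf("no default transcription model for provider type %d", t)
	}
}

func main() {
	for _, t := range []providerType{providerOpenAI, providerGemini, providerUnspecified} {
		model, err := effectiveModel(t, "")
		fmt.Println(t, model, err)
	}
}
```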

@ -17,8 +17,6 @@ import (
const (
maxTranscriptionAudioSizeBytes = 25 * MebiByte
maxTranscriptionPromptLength = 4096
maxTranscriptionLanguageLength = 32
maxTranscriptionFilenameLength = 255
)
@ -51,20 +49,6 @@ func (s *APIV1Service) Transcribe(ctx context.Context, request *v1pb.TranscribeR
return nil, status.Errorf(codes.Unauthenticated, "user not authenticated")
}
if strings.TrimSpace(request.ProviderId) == "" {
return nil, status.Errorf(codes.InvalidArgument, "provider_id is required")
}
if request.Config == nil {
return nil, status.Errorf(codes.InvalidArgument, "config is required")
}
prompt := strings.TrimSpace(request.Config.GetPrompt())
if len(prompt) > maxTranscriptionPromptLength {
return nil, status.Errorf(codes.InvalidArgument, "prompt is too long; maximum length is %d characters", maxTranscriptionPromptLength)
}
language := strings.TrimSpace(request.Config.GetLanguage())
if len(language) > maxTranscriptionLanguageLength {
return nil, status.Errorf(codes.InvalidArgument, "language is too long; maximum length is %d characters", maxTranscriptionLanguageLength)
}
if request.Audio == nil {
return nil, status.Errorf(codes.InvalidArgument, "audio is required")
}
@ -90,10 +74,31 @@ func (s *APIV1Service) Transcribe(ctx context.Context, request *v1pb.TranscribeR
return nil, status.Errorf(codes.InvalidArgument, "audio content type %q is not supported", contentType)
}
provider, model, err := s.resolveAIProviderForTranscription(ctx, request.ProviderId)
aiSetting, err := s.Store.GetInstanceAISetting(ctx)
if err != nil {
return nil, status.Errorf(codes.Internal, "failed to get AI setting: %v", err)
}
persisted := aiSetting.GetTranscription()
providerID := persisted.GetProviderId()
if providerID == "" {
return nil, status.Errorf(codes.FailedPrecondition, "transcription is not configured")
}
provider, err := s.resolveAIProvider(aiSetting, providerID)
if err != nil {
return nil, err
}
model := persisted.GetModel()
if model == "" {
defaultModel, err := ai.DefaultTranscriptionModel(provider.Type)
if err != nil {
return nil, status.Errorf(codes.InvalidArgument, "%v", err)
}
model = defaultModel
}
transcriber, err := ai.NewTranscriber(provider)
if err != nil {
return nil, status.Errorf(codes.InvalidArgument, "failed to create AI transcriber: %v", err)
@ -105,8 +110,8 @@ func (s *APIV1Service) Transcribe(ctx context.Context, request *v1pb.TranscribeR
ContentType: contentType,
Audio: bytes.NewReader(content),
Size: int64(len(content)),
Prompt: prompt,
Language: language,
Prompt: persisted.GetPrompt(),
Language: persisted.GetLanguage(),
})
if err != nil {
return nil, status.Errorf(codes.Internal, "failed to transcribe audio: %v", err)
@ -116,12 +121,7 @@ func (s *APIV1Service) Transcribe(ctx context.Context, request *v1pb.TranscribeR
}, nil
}
func (s *APIV1Service) resolveAIProviderForTranscription(ctx context.Context, providerID string) (ai.ProviderConfig, string, error) {
setting, err := s.Store.GetInstanceAISetting(ctx)
if err != nil {
return ai.ProviderConfig{}, "", status.Errorf(codes.Internal, "failed to get AI setting: %v", err)
}
func (*APIV1Service) resolveAIProvider(setting *storepb.InstanceAISetting, providerID string) (ai.ProviderConfig, error) {
providers := make([]ai.ProviderConfig, 0, len(setting.GetProviders()))
for _, provider := range setting.GetProviders() {
if provider == nil {
@ -132,13 +132,9 @@ func (s *APIV1Service) resolveAIProviderForTranscription(ctx context.Context, pr
provider, err := ai.FindProvider(providers, providerID)
if err != nil {
return ai.ProviderConfig{}, "", status.Errorf(codes.NotFound, "AI provider not found")
}
selectedModel, err := ai.DefaultTranscriptionModel(provider.Type)
if err != nil {
return ai.ProviderConfig{}, "", status.Errorf(codes.InvalidArgument, "%v", err)
return ai.ProviderConfig{}, status.Errorf(codes.FailedPrecondition, "transcription provider is not configured")
}
return *provider, selectedModel, nil
return *provider, nil
}
func convertAIProviderConfigFromStore(provider *storepb.AIProviderConfig) ai.ProviderConfig {

@ -20,6 +20,12 @@ import (
"github.com/usememos/memos/store"
)
const (
maxTranscriptionConfigModelLength = 256
maxTranscriptionConfigLanguageLength = 32
maxTranscriptionConfigPromptLength = 4096
)
// GetInstanceProfile returns the instance profile.
func (s *APIV1Service) GetInstanceProfile(ctx context.Context, _ *v1pb.GetInstanceProfileRequest) (*v1pb.InstanceProfile, error) {
admin, err := s.GetInstanceAdmin(ctx)
@ -91,6 +97,7 @@ func (s *APIV1Service) GetInstanceSetting(ctx context.Context, request *v1pb.Get
return nil, status.Errorf(codes.PermissionDenied, "permission denied")
}
}
isAdminCaller := false
if instanceSetting.Key == storepb.InstanceSettingKey_AI {
user, err := s.fetchCurrentUser(ctx)
if err != nil {
@ -99,9 +106,22 @@ func (s *APIV1Service) GetInstanceSetting(ctx context.Context, request *v1pb.Get
if user == nil {
return nil, status.Errorf(codes.Unauthenticated, "user not authenticated")
}
isAdminCaller = user.Role == store.RoleAdmin
}
return convertInstanceSettingFromStore(instanceSetting), nil
result := convertInstanceSettingFromStore(instanceSetting)
if instanceSetting.Key == storepb.InstanceSettingKey_AI && !isAdminCaller {
// Non-admin callers only need transcription.provider_id to gate the
// editor's Transcribe button. Model / language / prompt are
// admin-entered defaults that may contain proprietary glossary terms,
// so they are redacted from non-admin responses.
if ai := result.GetAiSetting(); ai != nil && ai.Transcription != nil {
ai.Transcription.Model = ""
ai.Transcription.Language = ""
ai.Transcription.Prompt = ""
}
}
return result, nil
}
func (s *APIV1Service) UpdateInstanceSetting(ctx context.Context, request *v1pb.UpdateInstanceSettingRequest) (*v1pb.InstanceSetting, error) {
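The non-admin redaction in `GetInstanceSetting` can be illustrated on its own. The sketch below uses a hypothetical flat `transcriptionConfig` struct instead of `v1pb.InstanceSetting_TranscriptionConfig`. It also returns a redacted copy rather than clearing fields in place.

```go
package main

import "fmt"

// transcriptionConfig is a hypothetical stand-in for
// v1pb.InstanceSetting_TranscriptionConfig.
type transcriptionConfig struct {
	ProviderID, Model, Language, Prompt string
}

// redactForNonAdmin keeps only ProviderID, which is all a non-admin caller
// needs to decide whether to show the Transcribe button. Admins and nil
// configs are returned unchanged; otherwise a cleared copy is returned.
func redactForNonAdmin(cfg *transcriptionConfig, isAdmin bool) *transcriptionConfig {
	if cfg == nil || isAdmin {
		return cfg
	}
	out := *cfg
	out.Model, out.Language, out.Prompt = "", "", ""
	return &out
}

func main() {
	cfg := &transcriptionConfig{ProviderID: "openai-main", Model: "whisper-1", Language: "en", Prompt: "Memos"}
	fmt.Printf("%+v\n", *redactForNonAdmin(cfg, false))
}
```

The handler can safely clear fields in place because `convertTranscriptionConfigFromStore` allocates a fresh message on every call. Returning a copy, as the sketch does, is the more defensive choice when the input might be shared.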
@ -508,7 +528,8 @@ func convertInstanceAISettingFromStore(setting *storepb.InstanceAISetting) *v1pb
}
aiSetting := &v1pb.InstanceSetting_AISetting{
Providers: make([]*v1pb.InstanceSetting_AIProviderConfig, 0, len(setting.Providers)),
Providers: make([]*v1pb.InstanceSetting_AIProviderConfig, 0, len(setting.Providers)),
Transcription: convertTranscriptionConfigFromStore(setting.GetTranscription()),
}
for _, provider := range setting.Providers {
if provider == nil {
@ -533,7 +554,8 @@ func convertInstanceAISettingToStore(setting *v1pb.InstanceSetting_AISetting) *s
}
aiSetting := &storepb.InstanceAISetting{
Providers: make([]*storepb.AIProviderConfig, 0, len(setting.Providers)),
Providers: make([]*storepb.AIProviderConfig, 0, len(setting.Providers)),
Transcription: convertTranscriptionConfigToStore(setting.GetTranscription()),
}
for _, provider := range setting.Providers {
if provider == nil {
@ -550,6 +572,30 @@ func convertInstanceAISettingToStore(setting *v1pb.InstanceSetting_AISetting) *s
return aiSetting
}
func convertTranscriptionConfigFromStore(setting *storepb.TranscriptionConfig) *v1pb.InstanceSetting_TranscriptionConfig {
if setting == nil {
return nil
}
return &v1pb.InstanceSetting_TranscriptionConfig{
ProviderId: setting.GetProviderId(),
Model: setting.GetModel(),
Language: setting.GetLanguage(),
Prompt: setting.GetPrompt(),
}
}
func convertTranscriptionConfigToStore(setting *v1pb.InstanceSetting_TranscriptionConfig) *storepb.TranscriptionConfig {
if setting == nil {
return nil
}
return &storepb.TranscriptionConfig{
ProviderId: setting.GetProviderId(),
Model: setting.GetModel(),
Language: setting.GetLanguage(),
Prompt: setting.GetPrompt(),
}
}
func validateInstanceSetting(setting *v1pb.InstanceSetting) error {
key, err := ExtractInstanceSettingKeyFromName(setting.Name)
if err != nil {
@ -619,6 +665,53 @@ func (s *APIV1Service) prepareInstanceAISettingForUpdate(ctx context.Context, se
return errors.Errorf("provider %q API key is required", provider.Id)
}
}
if err := preparePersistedTranscriptionConfig(setting, existing); err != nil {
return err
}
return nil
}
func preparePersistedTranscriptionConfig(setting *storepb.InstanceAISetting, existing *storepb.InstanceAISetting) error {
// Preserve the previously stored transcription config when the request omits it,
// matching the same "absence == keep" semantics used for API keys. The preserved
// config still falls through to validation below, so a stale provider_id is
// rejected if the same update removed or renamed its referenced provider.
if setting.Transcription == nil && existing != nil {
setting.Transcription = existing.GetTranscription()
}
if setting.Transcription == nil {
return nil
}
cfg := setting.Transcription
cfg.ProviderId = strings.TrimSpace(cfg.ProviderId)
cfg.Model = strings.TrimSpace(cfg.Model)
cfg.Language = strings.TrimSpace(cfg.Language)
cfg.Prompt = strings.TrimSpace(cfg.Prompt)
if cfg.ProviderId != "" {
referenced := false
for _, provider := range setting.Providers {
if provider != nil && provider.Id == cfg.ProviderId {
referenced = true
break
}
}
if !referenced {
return errors.Errorf("transcription provider_id %q does not reference any configured provider", cfg.ProviderId)
}
}
if len(cfg.Model) > maxTranscriptionConfigModelLength {
return errors.Errorf("transcription model is too long; maximum length is %d characters", maxTranscriptionConfigModelLength)
}
if len(cfg.Language) > maxTranscriptionConfigLanguageLength {
return errors.Errorf("transcription language is too long; maximum length is %d characters", maxTranscriptionConfigLanguageLength)
}
if len(cfg.Prompt) > maxTranscriptionConfigPromptLength {
return errors.Errorf("transcription prompt is too long; maximum length is %d characters", maxTranscriptionConfigPromptLength)
}
return nil
}

@ -21,8 +21,6 @@ func TestTranscribe(t *testing.T) {
defer ts.Cleanup()
_, err := ts.Service.Transcribe(ctx, &v1pb.TranscribeRequest{
ProviderId: "openai-main",
Config: &v1pb.TranscriptionConfig{},
Audio: &v1pb.TranscriptionAudio{
Source: &v1pb.TranscriptionAudio_Content{Content: []byte("RIFF")},
Filename: "voice.wav",
@ -33,7 +31,7 @@ func TestTranscribe(t *testing.T) {
require.Contains(t, err.Error(), "user not authenticated")
})
t.Run("transcribes audio file with configured provider", func(t *testing.T) {
t.Run("transcribes audio file using persisted transcription setting", func(t *testing.T) {
ts := NewTestService(t)
defer ts.Cleanup()
@ -45,7 +43,8 @@ func TestTranscribe(t *testing.T) {
require.Equal(t, "/audio/transcriptions", r.URL.Path)
require.Equal(t, "Bearer sk-test", r.Header.Get("Authorization"))
require.NoError(t, r.ParseMultipartForm(10<<20))
require.Equal(t, "gpt-4o-transcribe", r.FormValue("model"))
require.Equal(t, "whisper-1", r.FormValue("model"))
require.Equal(t, "fr", r.FormValue("language"))
require.Equal(t, "names: Alice", r.FormValue("prompt"))
file, header, err := r.FormFile("file")
@ -73,16 +72,18 @@ func TestTranscribe(t *testing.T) {
ApiKey: "sk-test",
},
},
Transcription: &storepb.TranscriptionConfig{
ProviderId: "openai-main",
Model: "whisper-1",
Language: "fr",
Prompt: "names: Alice",
},
},
},
})
require.NoError(t, err)
resp, err := ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
ProviderId: "openai-main",
Config: &v1pb.TranscriptionConfig{
Prompt: "names: Alice",
},
Audio: &v1pb.TranscriptionAudio{
Source: &v1pb.TranscriptionAudio_Content{Content: []byte("RIFF")},
Filename: "voice.wav",
@ -117,14 +118,15 @@ func TestTranscribe(t *testing.T) {
ApiKey: "sk-test",
},
},
Transcription: &storepb.TranscriptionConfig{
ProviderId: "openai-main",
},
},
},
})
require.NoError(t, err)
_, err = ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
ProviderId: "openai-main",
Config: &v1pb.TranscriptionConfig{},
Audio: &v1pb.TranscriptionAudio{
Source: &v1pb.TranscriptionAudio_Content{Content: []byte("RIFF")},
Filename: "voice.wav",
@ -172,14 +174,15 @@ func TestTranscribe(t *testing.T) {
ApiKey: "gemini-key",
},
},
Transcription: &storepb.TranscriptionConfig{
ProviderId: "gemini-main",
},
},
},
})
require.NoError(t, err)
resp, err := ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
ProviderId: "gemini-main",
Config: &v1pb.TranscriptionConfig{},
Audio: &v1pb.TranscriptionAudio{
Source: &v1pb.TranscriptionAudio_Content{Content: []byte("mp3 bytes")},
Filename: "voice.mp3",
@ -190,7 +193,7 @@ func TestTranscribe(t *testing.T) {
require.Equal(t, "gemini transcript", resp.Text)
})
t.Run("uses built-in transcription model", func(t *testing.T) {
t.Run("falls back to engine default model when transcription model is empty", func(t *testing.T) {
ts := NewTestService(t)
defer ts.Cleanup()
@ -200,7 +203,7 @@ func TestTranscribe(t *testing.T) {
openAIServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
require.NoError(t, r.ParseMultipartForm(10<<20))
require.Equal(t, "gpt-4o-transcribe", r.FormValue("model"))
require.Equal(t, "whisper-1", r.FormValue("model"))
w.Header().Set("Content-Type", "application/json")
require.NoError(t, json.NewEncoder(w).Encode(map[string]string{
"text": "built-in model",
@ -221,14 +224,15 @@ func TestTranscribe(t *testing.T) {
ApiKey: "sk-test",
},
},
Transcription: &storepb.TranscriptionConfig{
ProviderId: "openai-main",
},
},
},
})
require.NoError(t, err)
resp, err := ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
ProviderId: "openai-main",
Config: &v1pb.TranscriptionConfig{},
Audio: &v1pb.TranscriptionAudio{
Source: &v1pb.TranscriptionAudio_Content{Content: []byte("RIFF")},
Filename: "voice.wav",
@ -247,27 +251,7 @@ func TestTranscribe(t *testing.T) {
require.NoError(t, err)
userCtx := ts.CreateUserContext(ctx, user.ID)
_, err = ts.Store.UpsertInstanceSetting(ctx, &storepb.InstanceSetting{
Key: storepb.InstanceSettingKey_AI,
Value: &storepb.InstanceSetting_AiSetting{
AiSetting: &storepb.InstanceAISetting{
Providers: []*storepb.AIProviderConfig{
{
Id: "openai-main",
Title: "OpenAI",
Type: storepb.AIProviderType_OPENAI,
Endpoint: "https://example.com/v1",
ApiKey: "sk-test",
},
},
},
},
})
require.NoError(t, err)
_, err = ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
ProviderId: "openai-main",
Config: &v1pb.TranscriptionConfig{},
Audio: &v1pb.TranscriptionAudio{
Source: &v1pb.TranscriptionAudio_Content{Content: []byte("not audio")},
Filename: "notes.txt",
@ -277,4 +261,23 @@ func TestTranscribe(t *testing.T) {
require.Error(t, err)
require.Contains(t, err.Error(), "not supported")
})
t.Run("returns FailedPrecondition when transcription is not configured", func(t *testing.T) {
ts := NewTestService(t)
defer ts.Cleanup()
user, err := ts.CreateRegularUser(ctx, "alice-empty")
require.NoError(t, err)
userCtx := ts.CreateUserContext(ctx, user.ID)
_, err = ts.Service.Transcribe(userCtx, &v1pb.TranscribeRequest{
Audio: &v1pb.TranscriptionAudio{
Source: &v1pb.TranscriptionAudio_Content{Content: []byte("RIFF")},
Filename: "voice.wav",
ContentType: "audio/wav",
},
})
require.Error(t, err)
require.Contains(t, err.Error(), "transcription is not configured")
})
}

@ -2,6 +2,7 @@ package test
import (
"context"
"strings"
"testing"
"github.com/stretchr/testify/require"
@ -731,4 +732,149 @@ func TestUpdateInstanceSetting(t *testing.T) {
"existing AI provider API key must be preserved when an empty value is sent")
require.Equal(t, "OpenAI primary", stored.GetProviders()[0].GetTitle())
})
t.Run("UpdateInstanceSetting - transcription provider_id must reference an existing provider", func(t *testing.T) {
ts := NewTestService(t)
defer ts.Cleanup()
hostUser, err := ts.CreateHostUser(ctx, "admin")
require.NoError(t, err)
adminCtx := ts.CreateUserContext(ctx, hostUser.ID)
_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{
Setting: &v1pb.InstanceSetting{
Name: "instance/settings/AI",
Value: &v1pb.InstanceSetting_AiSetting{
AiSetting: &v1pb.InstanceSetting_AISetting{
Providers: []*v1pb.InstanceSetting_AIProviderConfig{
{
Id: "openai-main",
Title: "OpenAI",
Type: v1pb.InstanceSetting_OPENAI,
ApiKey: "sk-test",
},
},
Transcription: &v1pb.InstanceSetting_TranscriptionConfig{
ProviderId: "does-not-exist",
},
},
},
},
})
require.Error(t, err)
require.Contains(t, err.Error(), "transcription provider_id")
})
t.Run("UpdateInstanceSetting - transcription strings are length-capped", func(t *testing.T) {
ts := NewTestService(t)
defer ts.Cleanup()
hostUser, err := ts.CreateHostUser(ctx, "admin")
require.NoError(t, err)
adminCtx := ts.CreateUserContext(ctx, hostUser.ID)
base := &v1pb.InstanceSetting{
Name: "instance/settings/AI",
Value: &v1pb.InstanceSetting_AiSetting{
AiSetting: &v1pb.InstanceSetting_AISetting{
Providers: []*v1pb.InstanceSetting_AIProviderConfig{
{
Id: "openai-main",
Title: "OpenAI",
Type: v1pb.InstanceSetting_OPENAI,
ApiKey: "sk-test",
},
},
},
},
}
oversizedModel := strings.Repeat("a", 257)
base.GetAiSetting().Transcription = &v1pb.InstanceSetting_TranscriptionConfig{
ProviderId: "openai-main",
Model: oversizedModel,
}
_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{Setting: base})
require.Error(t, err)
require.Contains(t, err.Error(), "transcription model")
oversizedLanguage := strings.Repeat("a", 33)
base.GetAiSetting().Transcription = &v1pb.InstanceSetting_TranscriptionConfig{
ProviderId: "openai-main",
Language: oversizedLanguage,
}
_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{Setting: base})
require.Error(t, err)
require.Contains(t, err.Error(), "transcription language")
oversizedPrompt := strings.Repeat("a", 4097)
base.GetAiSetting().Transcription = &v1pb.InstanceSetting_TranscriptionConfig{
ProviderId: "openai-main",
Prompt: oversizedPrompt,
}
_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{Setting: base})
require.Error(t, err)
require.Contains(t, err.Error(), "transcription prompt")
})
t.Run("UpdateInstanceSetting - transcription is preserved when omitted on update", func(t *testing.T) {
ts := NewTestService(t)
defer ts.Cleanup()
hostUser, err := ts.CreateHostUser(ctx, "admin")
require.NoError(t, err)
adminCtx := ts.CreateUserContext(ctx, hostUser.ID)
_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{
Setting: &v1pb.InstanceSetting{
Name: "instance/settings/AI",
Value: &v1pb.InstanceSetting_AiSetting{
AiSetting: &v1pb.InstanceSetting_AISetting{
Providers: []*v1pb.InstanceSetting_AIProviderConfig{
{
Id: "openai-main",
Title: "OpenAI",
Type: v1pb.InstanceSetting_OPENAI,
ApiKey: "sk-test",
},
},
Transcription: &v1pb.InstanceSetting_TranscriptionConfig{
ProviderId: "openai-main",
Model: "whisper-1",
Language: "en",
Prompt: "names: Alice",
},
},
},
},
})
require.NoError(t, err)
_, err = ts.Service.UpdateInstanceSetting(adminCtx, &v1pb.UpdateInstanceSettingRequest{
Setting: &v1pb.InstanceSetting{
Name: "instance/settings/AI",
Value: &v1pb.InstanceSetting_AiSetting{
AiSetting: &v1pb.InstanceSetting_AISetting{
Providers: []*v1pb.InstanceSetting_AIProviderConfig{
{
Id: "openai-main",
Title: "OpenAI",
Type: v1pb.InstanceSetting_OPENAI,
ApiKey: "",
},
},
},
},
},
})
require.NoError(t, err)
stored, err := ts.Store.GetInstanceAISetting(ctx)
require.NoError(t, err)
require.NotNil(t, stored.GetTranscription())
require.Equal(t, "openai-main", stored.GetTranscription().GetProviderId())
require.Equal(t, "whisper-1", stored.GetTranscription().GetModel())
require.Equal(t, "en", stored.GetTranscription().GetLanguage())
require.Equal(t, "names: Alice", stored.GetTranscription().GetPrompt())
})
}

@ -8,7 +8,7 @@ import { memoKeys } from "@/hooks/useMemoQueries";
import { userKeys } from "@/hooks/useUserQueries";
import { handleError } from "@/lib/error";
import { cn } from "@/lib/utils";
import { InstanceSetting_AIProviderType, InstanceSetting_Key } from "@/types/proto/api/v1/instance_service_pb";
import { InstanceSetting_Key } from "@/types/proto/api/v1/instance_service_pb";
import { useTranslate } from "@/utils/i18n";
import { convertVisibilityFromString } from "@/utils/memo";
import {
@ -28,11 +28,6 @@ import { EditorProvider, useEditorContext } from "./state";
import type { MemoEditorProps } from "./types";
import type { LocalFile } from "./types/attachment";
const TRANSCRIPTION_PROVIDER_TYPES: InstanceSetting_AIProviderType[] = [
InstanceSetting_AIProviderType.OPENAI,
InstanceSetting_AIProviderType.GEMINI,
];
const MemoEditor = (props: MemoEditorProps) => (
<EditorProvider>
<MemoEditorImpl {...props} />
@ -61,10 +56,12 @@ const MemoEditorImpl: React.FC<MemoEditorProps> = ({
const [isTranscribingAudio, setIsTranscribingAudio] = useState(false);
const memoName = memo?.name;
const transcriptionProvider = useMemo(
() => aiSetting.providers.find((provider) => provider.apiKeySet && TRANSCRIPTION_PROVIDER_TYPES.includes(provider.type)),
[aiSetting.providers],
);
const canTranscribe = useMemo(() => {
const providerId = aiSetting.transcription?.providerId ?? "";
if (!providerId) return false;
const provider = aiSetting.providers.find((p) => p.id === providerId);
return Boolean(provider?.apiKeySet);
}, [aiSetting.providers, aiSetting.transcription?.providerId]);
// Get default visibility from user settings
const defaultVisibility = userGeneralSetting?.memoVisibility ? convertVisibilityFromString(userGeneralSetting.memoVisibility) : undefined;
@ -129,7 +126,7 @@ const MemoEditorImpl: React.FC<MemoEditorProps> = ({
const handleTranscribeRecordedAudio = useCallback(
async (localFile: LocalFile) => {
if (!transcriptionProvider) {
if (!canTranscribe) {
dispatch(actions.addLocalFile(localFile));
setIsTranscribingAudio(false);
setIsAudioRecorderOpen(false);
@ -137,7 +134,7 @@ const MemoEditorImpl: React.FC<MemoEditorProps> = ({
}
try {
const text = (await transcriptionService.transcribeFile(localFile.file, transcriptionProvider)).trim();
const text = (await transcriptionService.transcribeFile(localFile.file)).trim();
if (!text) {
dispatch(actions.addLocalFile(localFile));
toast.error(t("editor.audio-recorder.transcribe-empty"));
@ -155,7 +152,7 @@ const MemoEditorImpl: React.FC<MemoEditorProps> = ({
setIsAudioRecorderOpen(false);
}
},
[actions, dispatch, insertTranscribedText, t, transcriptionProvider],
[actions, canTranscribe, dispatch, insertTranscribedText, t],
);
const audioRecorderActions = useMemo(
@ -223,7 +220,7 @@ const MemoEditorImpl: React.FC<MemoEditorProps> = ({
};
const handleTranscribeAudioRecording = () => {
if (!transcriptionProvider || isTranscribingAudio) {
if (!canTranscribe || isTranscribingAudio) {
return;
}
@ -340,7 +337,7 @@ const MemoEditorImpl: React.FC<MemoEditorProps> = ({
onStop={audioRecorder.stopRecording}
onCancel={handleCancelAudioRecording}
onTranscribe={handleTranscribeAudioRecording}
canTranscribe={!!transcriptionProvider}
canTranscribe={canTranscribe}
isTranscribing={isTranscribingAudio}
/>
)}
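The editor no longer guesses a provider. It enables Transcribe only when the persisted `transcription.providerId` points at a provider whose key is set. Because `GetInstanceSetting` blanks model, language and prompt for non-admin callers but keeps `provider_id`, that check still works for regular users. The following minimal sketch expresses both rules in Go, using hypothetical view types (`AISettingView`, `ProviderView`, `TranscriptionView`) in place of the generated protobuf messages and the TypeScript hook.

```go
package main

import "fmt"

// Hypothetical view of what a caller receives: providers expose only whether
// a key is set, never the key itself.
type ProviderView struct {
	ID        string
	APIKeySet bool
}

type TranscriptionView struct {
	ProviderID              string
	Model, Language, Prompt string // admin-entered defaults
}

type AISettingView struct {
	Providers     []ProviderView
	Transcription *TranscriptionView
}

// redactForNonAdmin keeps provider_id (needed by the editor gate) and clears
// the admin-entered defaults, without mutating the input.
func redactForNonAdmin(s AISettingView, isAdmin bool) AISettingView {
	if isAdmin || s.Transcription == nil {
		return s
	}
	t := TranscriptionView{ProviderID: s.Transcription.ProviderID}
	s.Transcription = &t
	return s
}

// canTranscribe mirrors the editor gate: a provider must be explicitly selected,
// it must still exist, and it must have an API key.
func canTranscribe(s AISettingView) bool {
	if s.Transcription == nil || s.Transcription.ProviderID == "" {
		return false
	}
	for _, p := range s.Providers {
		if p.ID == s.Transcription.ProviderID {
			return p.APIKeySet
		}
	}
	return false
}

func main() {
	stored := AISettingView{
		Providers:     []ProviderView{{ID: "openai-main", APIKeySet: true}},
		Transcription: &TranscriptionView{ProviderID: "openai-main", Model: "whisper-1", Prompt: "names: Alice"},
	}
	view := redactForNonAdmin(stored, false)
	// prints: true true names: Alice
	fmt.Println(canTranscribe(view), view.Transcription.Prompt == "", stored.Transcription.Prompt)
}
```

Unlike the old "first provider with a key" heuristic, an instance with several keyed providers and no explicit selection keeps the button disabled.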

@ -1,15 +1,12 @@
import { create } from "@bufbuild/protobuf";
import { aiServiceClient } from "@/connect";
import { TranscribeRequestSchema, TranscriptionAudioSchema, TranscriptionConfigSchema } from "@/types/proto/api/v1/ai_service_pb";
import type { InstanceSetting_AIProviderConfig } from "@/types/proto/api/v1/instance_service_pb";
import { TranscribeRequestSchema, TranscriptionAudioSchema } from "@/types/proto/api/v1/ai_service_pb";
export const transcriptionService = {
async transcribeFile(file: File, provider: InstanceSetting_AIProviderConfig): Promise<string> {
async transcribeFile(file: File): Promise<string> {
const content = new Uint8Array(await file.arrayBuffer());
const response = await aiServiceClient.transcribe(
create(TranscribeRequestSchema, {
providerId: provider.id,
config: create(TranscriptionConfigSchema, {}),
audio: create(TranscriptionAudioSchema, {
source: {
case: "content",

@ -1,7 +1,7 @@
import { create } from "@bufbuild/protobuf";
import { isEqual } from "lodash-es";
import { MoreVerticalIcon, PlusIcon } from "lucide-react";
import { useEffect, useMemo, useState } from "react";
import { useEffect, useMemo, useRef, useState } from "react";
import { toast } from "react-hot-toast";
import ConfirmDialog from "@/components/ConfirmDialog";
import { Button } from "@/components/ui/button";
@ -10,6 +10,7 @@ import { DropdownMenu, DropdownMenuContent, DropdownMenuItem, DropdownMenuTrigge
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from "@/components/ui/select";
import { Textarea } from "@/components/ui/textarea";
import { useInstance } from "@/contexts/InstanceContext";
import {
InstanceSetting_AIProviderConfig,
@ -17,6 +18,8 @@ import {
InstanceSetting_AIProviderType,
InstanceSetting_AISettingSchema,
InstanceSetting_Key,
InstanceSetting_TranscriptionConfig,
InstanceSetting_TranscriptionConfigSchema,
InstanceSettingSchema,
} from "@/types/proto/api/v1/instance_service_pb";
import { useTranslate } from "@/utils/i18n";
@ -36,6 +39,13 @@ type LocalAIProvider = {
apiKeyHint: string;
};
type LocalTranscription = {
providerId: string;
model: string;
language: string;
prompt: string;
};
const providerTypeOptions = [InstanceSetting_AIProviderType.OPENAI, InstanceSetting_AIProviderType.GEMINI];
const byokNotes = ["setting.ai.byok-key-note", "setting.ai.byok-storage-note", "setting.ai.byok-model-note"] as const;
@ -61,6 +71,13 @@ const toLocalProvider = (provider: InstanceSetting_AIProviderConfig): LocalAIPro
apiKeyHint: provider.apiKeyHint,
});
const toLocalTranscription = (config: InstanceSetting_TranscriptionConfig | undefined): LocalTranscription => ({
providerId: config?.providerId ?? "",
model: config?.model ?? "",
language: config?.language ?? "",
prompt: config?.prompt ?? "",
});
const newProvider = (): LocalAIProvider => ({
id: createProviderID(),
title: "",
@ -80,11 +97,20 @@ const toProviderConfig = (provider: LocalAIProvider) =>
apiKey: provider.apiKey,
});
const toTranscriptionConfig = (transcription: LocalTranscription) =>
create(InstanceSetting_TranscriptionConfigSchema, {
providerId: transcription.providerId,
model: transcription.model.trim(),
language: transcription.language.trim(),
prompt: transcription.prompt,
});
const AISection = () => {
const t = useTranslate();
const saveInstanceSetting = useInstanceSettingUpdater();
const { aiSetting: originalSetting } = useInstance();
const [providers, setProviders] = useState<LocalAIProvider[]>(() => originalSetting.providers.map(toLocalProvider));
const [transcription, setTranscription] = useState<LocalTranscription>(() => toLocalTranscription(originalSetting.transcription));
const [editingProvider, setEditingProvider] = useState<LocalAIProvider | undefined>();
const [deleteTarget, setDeleteTarget] = useState<LocalAIProvider | undefined>();
@ -92,8 +118,50 @@ const AISection = () => {
setProviders(originalSetting.providers.map(toLocalProvider));
}, [originalSetting.providers]);
const originalProviders = useMemo(() => originalSetting.providers.map(toLocalProvider), [originalSetting.providers]);
const hasChanges = !isEqual(providers, originalProviders);
// Only re-sync the transcription draft when the server-side content actually
// changes — not on every originalSetting identity change. This prevents
// provider-side saves (which keep transcription unchanged on the server) from
// wiping an in-progress transcription draft.
const lastSyncedTranscription = useRef<LocalTranscription>(toLocalTranscription(originalSetting.transcription));
useEffect(() => {
const next = toLocalTranscription(originalSetting.transcription);
if (!isEqual(lastSyncedTranscription.current, next)) {
setTranscription(next);
lastSyncedTranscription.current = next;
}
}, [originalSetting.transcription]);
const originalTranscription = useMemo(() => toLocalTranscription(originalSetting.transcription), [originalSetting.transcription]);
const transcriptionHasChanges = !isEqual(transcription, originalTranscription);
const transcriptionProviderRef = useMemo(
() => providers.find((provider) => provider.id === transcription.providerId),
[providers, transcription.providerId],
);
// Persists the AI setting using a specific providers list and transcription
// value. Provider operations pass originalSetting.transcription so an
// in-progress transcription draft is never accidentally committed.
const persistAISetting = async (
nextProviders: LocalAIProvider[],
nextTranscription: InstanceSetting_TranscriptionConfig | undefined,
errorContext: string,
) => {
return saveInstanceSetting({
key: InstanceSetting_Key.AI,
setting: create(InstanceSettingSchema, {
name: buildInstanceSettingName(InstanceSetting_Key.AI),
value: {
case: "aiSetting",
value: create(InstanceSetting_AISettingSchema, {
providers: nextProviders.map(toProviderConfig),
transcription: nextTranscription,
}),
},
}),
errorContext,
});
};
const handleCreateProvider = () => {
setEditingProvider(newProvider());
@ -103,7 +171,7 @@ const AISection = () => {
setEditingProvider({ ...provider, apiKey: "" });
};
const handleSaveProvider = (provider: LocalAIProvider) => {
const handleSaveProvider = async (provider: LocalAIProvider) => {
const title = provider.title.trim();
const endpoint = provider.endpoint.trim();
@ -116,41 +184,47 @@ const AISection = () => {
return;
}
const normalizedProvider = {
...provider,
title,
endpoint,
};
setProviders((prev) => {
const exists = prev.some((item) => item.id === normalizedProvider.id);
if (!exists) {
return [...prev, normalizedProvider];
}
return prev.map((item) => (item.id === normalizedProvider.id ? normalizedProvider : item));
});
const normalizedProvider = { ...provider, title, endpoint };
const exists = providers.some((item) => item.id === normalizedProvider.id);
const nextProviders = exists
? providers.map((item) => (item.id === normalizedProvider.id ? normalizedProvider : item))
: [...providers, normalizedProvider];
const ok = await persistAISetting(nextProviders, originalSetting.transcription, "Update AI provider");
if (!ok) return;
setProviders(nextProviders);
setEditingProvider(undefined);
};
const handleDeleteProvider = () => {
const handleDeleteProvider = async () => {
if (!deleteTarget) return;
setProviders((prev) => prev.filter((provider) => provider.id !== deleteTarget.id));
const target = deleteTarget;
const nextProviders = providers.filter((provider) => provider.id !== target.id);
// If the persisted transcription references the deleted provider, the
// server would reject the save (provider_id must reference an existing
// provider). Send a cleared transcription in that case.
const persistedTranscription = originalSetting.transcription;
const nextTranscription =
persistedTranscription && persistedTranscription.providerId === target.id
? create(InstanceSetting_TranscriptionConfigSchema, {})
: persistedTranscription;
const ok = await persistAISetting(nextProviders, nextTranscription, "Delete AI provider");
if (!ok) return;
setProviders(nextProviders);
if (transcription.providerId === target.id) {
setTranscription((prev) => ({ ...prev, providerId: "" }));
}
setDeleteTarget(undefined);
};
const handleSaveSetting = async () => {
await saveInstanceSetting({
key: InstanceSetting_Key.AI,
setting: create(InstanceSettingSchema, {
name: buildInstanceSettingName(InstanceSetting_Key.AI),
value: {
case: "aiSetting",
value: create(InstanceSetting_AISettingSchema, {
providers: providers.map(toProviderConfig),
}),
},
}),
errorContext: "Update AI providers",
});
const handleSaveTranscription = async () => {
if (transcription.providerId && !transcriptionProviderRef) {
toast.error(t("setting.ai.transcription-empty-providers"));
return;
}
await persistAISetting(providers, toTranscriptionConfig(transcription), "Update transcription");
};
return (
@ -183,7 +257,7 @@ const AISection = () => {
</div>
</SettingPanel>
<SettingGroup title={t("setting.ai.providers")} description={t("setting.ai.description")}>
<SettingGroup title={t("setting.ai.integrations-title")} description={t("setting.ai.integrations-description")}>
<SettingTable
columns={[
{
@ -242,11 +316,23 @@ const AISection = () => {
/>
</SettingGroup>
<div className="w-full flex justify-end">
<Button disabled={!hasChanges} onClick={handleSaveSetting}>
{t("common.save")}
</Button>
</div>
<SettingGroup
title={t("setting.ai.transcription-title")}
description={t("setting.ai.transcription-description")}
showSeparator
actions={
<Button disabled={!transcriptionHasChanges} onClick={handleSaveTranscription}>
{t("common.save")}
</Button>
}
>
<TranscriptionForm
providers={providers}
transcription={transcription}
onChange={setTranscription}
referencedProvider={transcriptionProviderRef}
/>
</SettingGroup>
<AIProviderDialog
provider={editingProvider}
@ -267,6 +353,98 @@ const AISection = () => {
);
};
interface TranscriptionFormProps {
providers: LocalAIProvider[];
transcription: LocalTranscription;
referencedProvider: LocalAIProvider | undefined;
onChange: (next: LocalTranscription) => void;
}
const TranscriptionForm = ({ providers, transcription, referencedProvider, onChange }: TranscriptionFormProps) => {
const t = useTranslate();
const noProviders = providers.length === 0;
const update = (partial: Partial<LocalTranscription>) => {
onChange({ ...transcription, ...partial });
};
const placeholderForProvider = (provider: LocalAIProvider | undefined) => {
if (!provider) return "";
return provider.type === InstanceSetting_AIProviderType.GEMINI
? t("setting.ai.transcription-model-placeholder-gemini")
: t("setting.ai.transcription-model-placeholder-openai");
};
return (
<div className="grid grid-cols-1 sm:grid-cols-2 gap-3 max-w-3xl">
<div className="flex flex-col gap-1.5 sm:col-span-2">
<Label>{t("setting.ai.transcription-provider")}</Label>
<Select
value={transcription.providerId || "__none__"}
onValueChange={(value) => update({ providerId: value === "__none__" ? "" : value })}
disabled={noProviders}
>
<SelectTrigger className="w-full">
<SelectValue />
</SelectTrigger>
<SelectContent>
<SelectItem value="__none__">{t("setting.ai.transcription-no-provider")}</SelectItem>
{providers.map((provider) => (
<SelectItem key={provider.id} value={provider.id}>
{provider.title || provider.id}
</SelectItem>
))}
</SelectContent>
</Select>
{noProviders && <p className="text-xs text-muted-foreground">{t("setting.ai.transcription-empty-providers")}</p>}
{referencedProvider && !referencedProvider.apiKeySet && (
<p className="text-xs text-destructive">{t("setting.ai.transcription-warning-no-key")}</p>
)}
{referencedProvider?.type === InstanceSetting_AIProviderType.GEMINI && (
<p className="text-xs text-muted-foreground">{t("setting.ai.transcription-warning-gemini-webm")}</p>
)}
</div>
<div className="flex flex-col gap-1.5 sm:col-span-2">
<Label>{t("setting.ai.transcription-model")}</Label>
<Input
value={transcription.model}
onChange={(e) => update({ model: e.target.value })}
placeholder={placeholderForProvider(referencedProvider)}
disabled={!transcription.providerId}
maxLength={256}
/>
<p className="text-xs text-muted-foreground">{t("setting.ai.transcription-model-help")}</p>
</div>
<div className="flex flex-col gap-1.5">
<Label>{t("setting.ai.transcription-language")}</Label>
<Input
value={transcription.language}
onChange={(e) => update({ language: e.target.value })}
placeholder={t("setting.ai.transcription-language-placeholder")}
disabled={!transcription.providerId}
maxLength={32}
/>
<p className="text-xs text-muted-foreground">{t("setting.ai.transcription-language-help")}</p>
</div>
<div className="flex flex-col gap-1.5 sm:col-span-2">
<Label>{t("setting.ai.transcription-prompt")}</Label>
<Textarea
value={transcription.prompt}
onChange={(e) => update({ prompt: e.target.value })}
placeholder={t("setting.ai.transcription-prompt-placeholder")}
rows={3}
disabled={!transcription.providerId}
maxLength={4096}
/>
<p className="text-xs text-muted-foreground">{t("setting.ai.transcription-prompt-help")}</p>
</div>
</div>
);
};
interface AIProviderDialogProps {
provider?: LocalAIProvider;
onOpenChange: (open: boolean) => void;

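Deleting a provider has to work with the server's "absence == keep" rule. Omitting `transcription` would carry the stale `provider_id` forward and fail validation. So when the persisted setting references the deleted provider, the page sends an explicit empty config. Otherwise it resends the persisted value, never the in-progress draft. The following minimal Go sketch shows that decision. The names (`planDelete`, `Provider`, `Transcription`) are hypothetical, and the simplified types stand in for the TypeScript state.

```go
package main

import "fmt"

// Hypothetical simplified types; the settings page works with LocalAIProvider
// and InstanceSetting_TranscriptionConfig.
type Provider struct{ ID string }

type Transcription struct {
	ProviderID, Model, Language, Prompt string
}

// planDelete returns the providers and transcription config to persist when
// deleting targetID. A config that references the deleted provider is replaced
// by an explicit empty one (nil would mean "keep" to the server).
func planDelete(providers []Provider, persisted *Transcription, targetID string) ([]Provider, *Transcription) {
	next := make([]Provider, 0, len(providers))
	for _, p := range providers {
		if p.ID != targetID {
			next = append(next, p)
		}
	}
	if persisted != nil && persisted.ProviderID == targetID {
		return next, &Transcription{}
	}
	return next, persisted
}

func main() {
	providers := []Provider{{ID: "openai-main"}, {ID: "groq"}}
	persisted := &Transcription{ProviderID: "groq", Model: "whisper-large-v3-turbo"}

	next, cfg := planDelete(providers, persisted, "groq")
	fmt.Println(len(next), cfg.ProviderID == "", cfg.Model == "") // prints: 1 true true

	_, cfg = planDelete(providers, persisted, "openai-main")
	fmt.Println(cfg.ProviderID) // prints: groq
}
```

Because the empty config replaces the whole message, the persisted model, language and prompt are cleared along with `provider_id`. An admin who re-adds the provider has to enter them again.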
@ -427,13 +427,32 @@
"edit-provider": "Edit provider",
"endpoint": "Endpoint",
"endpoint-hint": "Leave empty to use the official provider endpoint.",
"integrations-description": "Provider keys are supplied by the instance owner and used by server-side AI features.",
"integrations-title": "AI integrations",
"keep-api-key": "Leave blank to keep the existing key",
"label": "AI",
"no-providers": "No AI providers configured.",
"provider-title": "Provider name",
"provider-title-required": "Provider name is required.",
"provider-type": "Provider type",
"providers": "Providers"
"providers": "Providers",
"transcription-description": "Speech-to-text settings used when recording audio in the memo composer.",
"transcription-empty-providers": "Add an AI integration first to enable transcription.",
"transcription-language-help": "ISO 639-1 language code (e.g. en, de, zh). Leave empty to auto-detect.",
"transcription-language-placeholder": "auto-detect",
"transcription-language": "Default language",
"transcription-model-help": "Free text. Use the provider's model identifier — e.g. whisper-1, gpt-4o-transcribe, whisper-large-v3-turbo.",
"transcription-model-placeholder-gemini": "gemini-2.5-flash",
"transcription-model-placeholder-openai": "whisper-1",
"transcription-model": "Model",
"transcription-no-provider": "None — transcription disabled",
"transcription-prompt-help": "Improves spelling of proper nouns and jargon. Whisper only uses the last 224 tokens of the prompt.",
"transcription-prompt-placeholder": "Names: Alice, Bob. Glossary: kubernetes, OAuth.",
"transcription-prompt": "Prompt hints",
"transcription-provider": "Provider",
"transcription-title": "Transcription",
"transcription-warning-gemini-webm": "Gemini does not accept browser-recorded audio/webm. For in-editor recording, use an OpenAI-compatible provider.",
"transcription-warning-no-key": "The selected provider has no API key set. Edit the integration above to add one."
},
"instance": {
"access-description": "Control sign-up, authentication, profile editing, and calendar defaults for this instance.",

@ -13,30 +13,16 @@ import type { Message } from "@bufbuild/protobuf";
* Describes the file api/v1/ai_service.proto.
*/
export const file_api_v1_ai_service: GenFile = /*@__PURE__*/
fileDesc("ChdhcGkvdjEvYWlfc2VydmljZS5wcm90bxIMbWVtb3MuYXBpLnYxIpsBChFUcmFuc2NyaWJlUmVxdWVzdBIYCgtwcm92aWRlcl9pZBgBIAEoCUID4EECEjYKBmNvbmZpZxgCIAEoCzIhLm1lbW9zLmFwaS52MS5UcmFuc2NyaXB0aW9uQ29uZmlnQgPgQQISNAoFYXVkaW8YAyABKAsyIC5tZW1vcy5hcGkudjEuVHJhbnNjcmlwdGlvbkF1ZGlvQgPgQQIiQQoTVHJhbnNjcmlwdGlvbkNvbmZpZxITCgZwcm9tcHQYASABKAlCA+BBARIVCghsYW5ndWFnZRgCIAEoCUID4EEBIncKElRyYW5zY3JpcHRpb25BdWRpbxIWCgdjb250ZW50GAEgASgMQgPgQQRIABINCgN1cmkYAiABKAlIABIVCghmaWxlbmFtZRgDIAEoCUID4EEBEhkKDGNvbnRlbnRfdHlwZRgEIAEoCUID4EEBQggKBnNvdXJjZSIiChJUcmFuc2NyaWJlUmVzcG9uc2USDAoEdGV4dBgBIAEoCTKaAQoJQUlTZXJ2aWNlEowBCgpUcmFuc2NyaWJlEh8ubWVtb3MuYXBpLnYxLlRyYW5zY3JpYmVSZXF1ZXN0GiAubWVtb3MuYXBpLnYxLlRyYW5zY3JpYmVSZXNwb25zZSI72kEYcHJvdmlkZXJfaWQsY29uZmlnLGF1ZGlvgtPkkwIaOgEqIhUvYXBpL3YxL2FpOnRyYW5zY3JpYmVCpgEKEGNvbS5tZW1vcy5hcGkudjFCDkFpU2VydmljZVByb3RvUAFaMGdpdGh1Yi5jb20vdXNlbWVtb3MvbWVtb3MvcHJvdG8vZ2VuL2FwaS92MTthcGl2MaICA01BWKoCDE1lbW9zLkFwaS5WMcoCDE1lbW9zXEFwaVxWMeICGE1lbW9zXEFwaVxWMVxHUEJNZXRhZGF0YeoCDk1lbW9zOjpBcGk6OlYxYgZwcm90bzM", [file_google_api_annotations, file_google_api_client, file_google_api_field_behavior]);
fileDesc("ChdhcGkvdjEvYWlfc2VydmljZS5wcm90bxIMbWVtb3MuYXBpLnYxIkkKEVRyYW5zY3JpYmVSZXF1ZXN0EjQKBWF1ZGlvGAEgASgLMiAubWVtb3MuYXBpLnYxLlRyYW5zY3JpcHRpb25BdWRpb0ID4EECIncKElRyYW5zY3JpcHRpb25BdWRpbxIWCgdjb250ZW50GAEgASgMQgPgQQRIABINCgN1cmkYAiABKAlIABIVCghmaWxlbmFtZRgDIAEoCUID4EEBEhkKDGNvbnRlbnRfdHlwZRgEIAEoCUID4EEBQggKBnNvdXJjZSIiChJUcmFuc2NyaWJlUmVzcG9uc2USDAoEdGV4dBgBIAEoCTKGAQoJQUlTZXJ2aWNlEnkKClRyYW5zY3JpYmUSHy5tZW1vcy5hcGkudjEuVHJhbnNjcmliZVJlcXVlc3QaIC5tZW1vcy5hcGkudjEuVHJhbnNjcmliZVJlc3BvbnNlIijaQQVhdWRpb4LT5JMCGjoBKiIVL2FwaS92MS9haTp0cmFuc2NyaWJlQqYBChBjb20ubWVtb3MuYXBpLnYxQg5BaVNlcnZpY2VQcm90b1ABWjBnaXRodWIuY29tL3VzZW1lbW9zL21lbW9zL3Byb3RvL2dlbi9hcGkvdjE7YXBpdjGiAgNNQViqAgxNZW1vcy5BcGkuVjHKAgxNZW1vc1xBcGlcVjHiAhhNZW1vc1xBcGlcVjFcR1BCTWV0YWRhdGHqAg5NZW1vczo6QXBpOjpWMWIGcHJvdG8z", [file_google_api_annotations, file_google_api_client, file_google_api_field_behavior]);
/**
* @generated from message memos.api.v1.TranscribeRequest
*/
export type TranscribeRequest = Message<"memos.api.v1.TranscribeRequest"> & {
/**
* Required. The instance AI provider ID to use.
*
* @generated from field: string provider_id = 1;
*/
providerId: string;
/**
* Required. Transcription options.
*
* @generated from field: memos.api.v1.TranscriptionConfig config = 2;
*/
config?: TranscriptionConfig | undefined;
/**
* Required. Audio input.
*
* @generated from field: memos.api.v1.TranscriptionAudio audio = 3;
* @generated from field: memos.api.v1.TranscriptionAudio audio = 1;
*/
audio?: TranscriptionAudio | undefined;
};
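The regenerated `TranscribeRequest` drops `providerId` and `config` and renumbers `audio` from field 3 to field 1, so the client now sends only the audio. Provider, model, prompt, and language presumably come from the instance's transcription settings on the server. The sketch below mirrors the generated shapes locally so it runs standalone; this is an assumption of the example, and real frontend code would import the generated types and call `create(TranscribeRequestSchema, ...)`. It uses protobuf-es's `{ case, value }` representation for the `source` oneof.

```typescript
// Local mirrors of the generated protobuf-es types (assumption: illustration only;
// the real messages also carry a $typeName brand added by create()).
type TranscriptionAudioSource =
  | { case: "content"; value: Uint8Array }
  | { case: "uri"; value: string }
  | { case: undefined; value?: undefined };

interface TranscriptionAudio {
  source: TranscriptionAudioSource;
  filename: string;
  contentType: string;
}

// After this change the request has a single field: audio = 1.
interface TranscribeRequest {
  audio?: TranscriptionAudio;
}

// Wraps recorded bytes as inline content; no providerId or config is sent.
function buildTranscribeRequest(
  bytes: Uint8Array,
  contentType: string,
  filename = "recording.webm",
): TranscribeRequest {
  return {
    audio: {
      source: { case: "content", value: bytes },
      filename,
      contentType,
    },
  };
}
```

Because protobuf matches fields by number on the wire, binary-encoded requests from clients built against the old descriptor (where field 1 was `provider_id`) will not decode as before.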
@@ -48,32 +34,6 @@ export type TranscribeRequest = Message<"memos.api.v1.TranscribeRequest"> & {
export const TranscribeRequestSchema: GenMessage<TranscribeRequest> = /*@__PURE__*/
messageDesc(file_api_v1_ai_service, 0);
/**
* @generated from message memos.api.v1.TranscriptionConfig
*/
export type TranscriptionConfig = Message<"memos.api.v1.TranscriptionConfig"> & {
/**
* Optional. A prompt to improve transcription quality.
*
* @generated from field: string prompt = 1;
*/
prompt: string;
/**
* Optional. The language of the input audio.
*
* @generated from field: string language = 2;
*/
language: string;
};
/**
* Describes the message memos.api.v1.TranscriptionConfig.
* Use `create(TranscriptionConfigSchema)` to create a new message.
*/
export const TranscriptionConfigSchema: GenMessage<TranscriptionConfig> = /*@__PURE__*/
messageDesc(file_api_v1_ai_service, 1);
/**
* @generated from message memos.api.v1.TranscriptionAudio
*/
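With the per-request `TranscriptionConfig` (`prompt`, `language`) removed, those values move to instance-level settings. When the engine is an OpenAI-compatible `/v1/audio/transcriptions` endpoint, they travel as multipart form fields alongside `file` and `model`. The backend is Go (`internal/ai`), so the TypeScript below only illustrates the request body. It assumes a hypothetical `TranscriptionSettings` shape that is not the actual `InstanceAISetting` schema.

```typescript
// Hypothetical instance-level transcription settings (field names are assumptions).
interface TranscriptionSettings {
  model: string; // e.g. "whisper-1" or Groq's "whisper-large-v3-turbo"
  prompt: string; // spelling hints; Whisper only considers the final ~224 tokens
  language: string; // ISO-639-1 code such as "en"; empty means auto-detect
}

// Builds the multipart body for an OpenAI-compatible transcription request.
function buildWhisperForm(
  audio: Blob,
  filename: string,
  settings: TranscriptionSettings,
): FormData {
  const form = new FormData();
  // The Blob keeps its own MIME type (e.g. audio/webm from MediaRecorder).
  form.append("file", audio, filename);
  form.append("model", settings.model);
  // Optional fields are omitted when blank so the endpoint applies its defaults.
  const prompt = settings.prompt.trim();
  if (prompt !== "") {
    form.append("prompt", prompt);
  }
  const language = settings.language.trim();
  if (language !== "") {
    form.append("language", language);
  }
  return form;
}
```

Leaving blank fields out, rather than sending empty strings, lets a Whisper endpoint fall back to automatic language detection.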
@@ -119,7 +79,7 @@ export type TranscriptionAudio = Message<"memos.api.v1.TranscriptionAudio"> & {
* Use `create(TranscriptionAudioSchema)` to create a new message.
*/
export const TranscriptionAudioSchema: GenMessage<TranscriptionAudio> = /*@__PURE__*/
messageDesc(file_api_v1_ai_service, 2);
messageDesc(file_api_v1_ai_service, 1);
/**
* @generated from message memos.api.v1.TranscribeResponse
@@ -138,7 +98,7 @@ export type TranscribeResponse = Message<"memos.api.v1.TranscribeResponse"> & {
* Use `create(TranscribeResponseSchema)` to create a new message.
*/
export const TranscribeResponseSchema: GenMessage<TranscribeResponse> = /*@__PURE__*/
messageDesc(file_api_v1_ai_service, 3);
messageDesc(file_api_v1_ai_service, 2);
/**
* @generated from service memos.api.v1.AIService

File diff suppressed because one or more lines are too long