Google GenAI Processors：重新定义实时AI开发架构

文｜祝融

编辑｜郭嘉

<——最重要的是订阅“鲁班模锤”——>

构建复杂的AI应用，特别是处理多模态输入并需要实时响应的应用，经常感觉像是在拼装复杂拼图：需要将不同的数据处理步骤、异步API调用和自定义逻辑拼接在一起。随着复杂性的增长，这可能导致脆弱、难以维护的代码。2025年7月，Google DeepMind发布了GenAI Processors，这是一个专为解决这些技术挑战而设计的开源Python库。

核心架构：ProcessorParts

GenAI Processors的核心创新在于其ProcessorParts数据结构。每个ProcessorPart可以被视为标准化的数据部分（例如，音频块、文本转录、图像帧），它们携带相关的元数据在管道中流动。这种设计有几个关键技术优势：

结构化数据载荷

class ProcessorPart:    content: ProcessorContent  # 实际数据载荷    metadata: Dict[str, Any]   # 元数据字典    mime_type: str            # MIME类型标识    timestamp: float          # 时间戳    sequence_id: str          # 序列标识符

异步流处理能力库提供了用于分割、连接和合并ProcessorParts异步流的实用工具。这意味着数据可以在不阻塞主线程的情况下连续处理：

async def process_stream(input_stream: AsyncIterator[ProcessorPart]) -> AsyncIterator[ProcessorPart]:    async for part in input_stream:        # 处理每个部分        processed_part = await transform_part(part)        yield processed_part

双向流控制与传统的单向数据流不同，GenAI Processors支持双向流控制，允许下游处理器向上游发送反馈信息

class BidirectionalProcessor:    async def process(self, input_stream, feedback_stream):        # 同时处理输入和反馈        async for input_part, feedback_part in zip(input_stream, feedback_stream):            result = await self.handle_with_feedback(input_part, feedback_part)            yield result

Processor接口：统一的处理抽象

每个Processor都实现了标准接口，这提供了强大的组合能力：

class Processor(ABC):    @abstractmethod    async def process(self, input_stream: AsyncIterator[ProcessorPart]) -> AsyncIterator[ProcessorPart]:        pass    def __call__(self, input_stream):        return self.process(input_stream)

这种设计允许复杂的处理链：

# 处理链组合audio_processor = AudioTranscriber()text_processor = TextAnalyzer()response_generator = ResponseGenerator()# 链式处理async def process_audio_input(audio_stream):    transcribed = audio_processor(audio_stream)    analyzed = text_processor(transcribed)    responses = response_generator(analyzed)    return responses

技术实现细节

GenAI Processors库需要Python 3.10+。这个版本要求确保了对现代异步特性的完全支持：pip install genai-processors

核心模块core/目录包含一组基本处理器，可以在你自己的应用程序中使用。它包括大多数实时应用程序所需的通用构建块。

AudioProcessor: 处理音频数据的专用处理器
TextProcessor: 文本处理和分析
ImageProcessor: 图像和视频帧处理
ModelProcessor: 与AI模型交互的处理器
StreamSplitter: 将单一流分割为多个并行流
StreamMerger: 合并多个流为单一输出
FilterProcessor: 基于条件过滤数据
TransformProcessor: 数据格式转换

该库提供了与Google Gemini API的现成连接器，包括同步的基于文本的调用和用于流式应用的Gemini Live API。

1. 同步文本处理from genai_processors.models import GeminiTextProcessortext_processor = GeminiTextProcessor(    model_name="gemini-pro",    api_key="your-api-key",    temperature=0.7,    max_tokens=1000)async def process_text_query(query: str):    input_part = ProcessorPart(        content=TextContent(query),        metadata={"user_id": "123", "session_id": "abc"}    )    async for response_part in text_processor(async_iter([input_part])):        return response_part.content.text2. Live API流式处理from genai_processors.models import GeminiLiveProcessorlive_processor = GeminiLiveProcessor(    model_name="gemini-live",    api_key="your-api-key",    streaming=True,    real_time_factor=1.0)async def handle_live_audio(audio_stream):    async for audio_chunk in audio_stream:        input_part = ProcessorPart(            content=AudioContent(audio_chunk),            metadata={"format": "wav", "sample_rate": 16000}        )        async for response in live_processor(async_iter([input_part])):            if response.content.type == "audio":                yield response.content.audio_data            elif response.content.type == "text":                print(f"Transcription: {response.content.text}")

GenAI Processors的异步设计带来了几个关键的性能优势：

1. 非阻塞I/O处理传统的同步处理在等待API响应时会阻塞整个线程。GenAI Processors通过异步设计避免了这个问题：

class AsyncModelProcessor:    async def process_batch(self, inputs: List[ProcessorPart]):        # 并发处理多个输入        tasks = [self.process_single(input_part) for input_part in inputs]        results = await asyncio.gather(*tasks)        return results    async def process_single(self, input_part: ProcessorPart):        # 异步API调用        async with aiohttp.ClientSession() as session:            response = await session.post(self.api_endpoint, json=input_part.to_dict())            return ProcessorPart.from_response(await response.json())

2. 只需几行代码即可使用 Gemini Live API 轻松构建能够实时处理音频和视频流的“Live Agent”。在以下示例中，使用 + 运算符组合输入源和处理步骤，从而创建清晰的数据流

from genai_processors.core import audio_io, live_model, video# Input processor: combines camera streams and audio streamsinput_processor = video.VideoIn() + audio_io.PyAudioIn(...)# Output processor: plays the audio parts. Handles interruptions and pauses# audio output when the user is speaking.play_output = audio_io.PyAudioOut(...)# Gemini Live API processorlive_processor = live_model.LiveProcessor(...)# Compose the agent: mic+camera -> Gemini Live API -> play audiolive_processor = live_model.LiveProcessor(...)live_agent = input_processor + live_processor + play_outputasync for part in live_agent(streams.endless_stream()):  # Process the output parts (e.g., print transcription, model output, metadata)  print(part)

具体应用场景

对于需要处理大量数据场景，GenAI Processors提供优化的批处理能力：

class BatchProcessor:    def __init__(self, batch_size: int = 32, max_concurrency: int = 10):        self.batch_size = batch_size        self.semaphore = asyncio.Semaphore(max_concurrency)    async def process_batch(self, input_stream):        batch = []        async for item in input_stream:            batch.append(item)            if len(batch) >= self.batch_size:                async with self.semaphore:                    results = await self.process_batch_items(batch)                    for result in results:                        yield result                batch = []        if batch:            async with self.semaphore:                results = await self.process_batch_items(batch)                for result in results:                    yield result

GenAI Processors实现了智能的背压控制机制和内存管理和资源清理，当然它也支持自定义Processor。创建自定义处理器的典型步骤包括创建Processor或PartProcessor，

class CustomAudioProcessor(Processor):    def __init__(self, model_path: str, config: Dict[str, Any]):        self.model = load_model(model_path)        self.config = config    async def process(self, input_stream: AsyncIterator[ProcessorPart]) -> AsyncIterator[ProcessorPart]:        async for part in input_stream:            # 验证输入类型            if not isinstance(part.content, AudioContent):                raise ValueError(f"Expected AudioContent, got {type(part.content)}")            audio_data = await self.preprocess_audio(part.content.audio_data)            result = await self.model.predict(audio_data)            # 创建输出ProcessorPart            output_part = ProcessorPart(                content=TextContent(result.transcription),                metadata={                    **part.metadata,                    'confidence': result.confidence,                    'processing_time': result.processing_time                }            )            yield output_part    async def preprocess_audio(self, audio_data: bytes) -> np.ndarray:        # 音频预处理逻辑        audio_array = np.frombuffer(audio_data, dtype=np.int16)        # 标准化        audio_array = audio_array.astype(np.float32) / 32768.0        # 重采样到目标频率        if self.config.get('target_sample_rate'):            audio_array = resample(audio_array, self.config['target_sample_rate'])        return audio_arrayPartProcessor的高级用法对于需要更细粒度控制的场景，可以使用PartProcessor：class AdvancedPartProcessor(PartProcessor):    async def process_part(self, part: ProcessorPart) -> AsyncIterator[ProcessorPart]:        # 检查是否需要分割大型数据        if part.content.size > self.max_chunk_size:            # 分割为较小的块            chunks = await self.split_content(part.content)            for i, chunk in enumerate(chunks):                chunk_part = ProcessorPart(                    content=chunk,                    metadata={                        **part.metadata,                        'chunk_index': i,                        'total_chunks': len(chunks)                    }                )                processed_chunk = await self.process_chunk(chunk_part)                yield processed_chunk        else:            # 直接处理小数据            result = await self.process_single_part(part)            yield result

与现有技术的深度对比，相比Apache Kafka Streams而言，它是AI原生设计，专门为AI工作负载设计，内置了对多模态数据和AI模型的支持，Kafka Streams需要额外的适配层来处理AI特定的数据类型

GenAI Processors通过其创新的ProcessorParts流式架构和统一的Processor接口，为AI应用开发提供了一个强大而灵活的基础设施。这些”模型处理器”抽象了批处理、上下文管理和流式I/O的复杂性，使得交互式系统的快速原型开发成为可能。

Google GenAI Processors：重新定义实时AI开发架构

更多专栏请看：

{{userData.name}}已认证

Google GenAI Processors：重新定义实时AI开发架构

🧠 解码大语言模型的记忆力：上下文长度的前世今生

🧠 大模型到底厉害在哪？看完这篇你就懂了！