invoke_model_with_response_stream

Operation

invoke_model_with_response_stream async

invoke_model_with_response_stream(input: InvokeModelWithResponseStreamInput, plugins: list[Plugin] | None = None) -> OutputEventStream[ResponseStream, InvokeModelWithResponseStreamOutput]

Invoke the specified Amazon Bedrock model to run inference using the prompt and inference parameters provided in the request body. The response is returned in a stream.

To see if a model supports streaming, call GetFoundationModel and check the responseStreamingSupported field in the response.

Note

The CLI doesn't support streaming operations in Amazon Bedrock, including InvokeModelWithResponseStream.

For example code, see Invoke model with streaming code example in the Amazon Bedrock User Guide.

This operation requires permissions to perform the bedrock:InvokeModelWithResponseStream action.

Warning

To deny all inference access to resources that you specify in the modelId field, you need to deny access to the bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream actions. Doing this also denies access to the resource through the Converse API actions (Converse and ConverseStream). For more information, see Deny access for inference on specific models.

For troubleshooting some of the common errors you might encounter when using the InvokeModelWithResponseStream API, see Troubleshooting Amazon Bedrock API Error Codes in the Amazon Bedrock User Guide.

Parameters:

input (InvokeModelWithResponseStreamInput), required
    An instance of InvokeModelWithResponseStreamInput.

plugins (list[Plugin] | None), default None
    A list of callables that modify the configuration dynamically. Changes made by these plugins only apply for the duration of the operation execution and will not affect any other operation invocations.
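
In practice, a plugin is just a callable that receives the per-operation copy of the client configuration and mutates it. A minimal sketch, assuming only the config attribute names that appear in the operation source below (the concrete Config type is not shown on this page):

import logging

logger = logging.getLogger(__name__)


def log_retry_strategy(config) -> None:
    # The operation deep-copies the client config before applying plugins,
    # so anything done here affects only this single invocation.
    logger.debug("Invoking with retry strategy: %r", config.retry_strategy)


# Passed per call, for example:
# stream = await client.invoke_model_with_response_stream(request, plugins=[log_retry_strategy])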

Returns:

OutputEventStream[ResponseStream, InvokeModelWithResponseStreamOutput]
    An OutputEventStream for server-to-client streaming of ResponseStream events with an initial InvokeModelWithResponseStreamOutput response.
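
A minimal end-to-end sketch follows. The method signature and the InvokeModelWithResponseStreamInput fields are taken from this page; the client class name, its import path, and its no-argument construction are assumptions, as are the example model ID and request body format:

import asyncio
import json

from aws_sdk_bedrock_runtime.client import BedrockRuntimeClient  # assumed import path and class name
from aws_sdk_bedrock_runtime.models import InvokeModelWithResponseStreamInput


async def main() -> None:
    client = BedrockRuntimeClient()  # assumed default construction; real usage may require explicit config

    request = InvokeModelWithResponseStreamInput(
        model_id="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
        content_type="application/json",
        accept="application/json",
        body=json.dumps(
            {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": "Write a haiku about streams."}],
            }
        ).encode("utf-8"),
    )

    # Returns an OutputEventStream of ResponseStream events; see the Output
    # section below for a consumption sketch.
    stream = await client.invoke_model_with_response_stream(request)


asyncio.run(main())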

Source code in src/aws_sdk_bedrock_runtime/client.py
async def invoke_model_with_response_stream(
    self,
    input: InvokeModelWithResponseStreamInput,
    plugins: list[Plugin] | None = None,
) -> OutputEventStream[ResponseStream, InvokeModelWithResponseStreamOutput]:
    """Invoke the specified Amazon Bedrock model to run inference using the
    prompt and inference parameters provided in the request body. The
    response is returned in a stream.

    To see if a model supports streaming, call
    [GetFoundationModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetFoundationModel.html)
    and check the `responseStreamingSupported` field in the response.

    Note:
        The CLI doesn't support streaming operations in Amazon Bedrock,
        including `InvokeModelWithResponseStream`.

    For example code, see *Invoke model with streaming code example* in the
    *Amazon Bedrock User Guide*.

    This operation requires permissions to perform the
    `bedrock:InvokeModelWithResponseStream` action.

    Warning:
        To deny all inference access to resources that you specify in the
        modelId field, you need to deny access to the `bedrock:InvokeModel` and
        `bedrock:InvokeModelWithResponseStream` actions. Doing this also denies
        access to the resource through the Converse API actions
        ([Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html)
        and
        [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html)).
        For more information see [Deny access for inference on specific
        models](https://docs.aws.amazon.com/bedrock/latest/userguide/security_iam_id-based-policy-examples.html#security_iam_id-based-policy-examples-deny-inference).

    For troubleshooting some of the common errors you might encounter when
    using the `InvokeModelWithResponseStream` API, see [Troubleshooting
    Amazon Bedrock API Error
    Codes](https://docs.aws.amazon.com/bedrock/latest/userguide/troubleshooting-api-error-codes.html)
    in the Amazon Bedrock User Guide

    Args:
        input:
            An instance of `InvokeModelWithResponseStreamInput`.
        plugins:
            A list of callables that modify the configuration dynamically.
            Changes made by these plugins only apply for the duration of the
            operation execution and will not affect any other operation
            invocations.

    Returns:
        An `OutputEventStream` for server-to-client streaming of `ResponseStream`
            events with initial `InvokeModelWithResponseStreamOutput` response.
    """
    operation_plugins: list[Plugin] = []
    if plugins:
        operation_plugins.extend(plugins)
    config = deepcopy(self._config)
    for plugin in operation_plugins:
        plugin(config)
    if config.protocol is None or config.transport is None:
        raise ExpectationNotMetError(
            "protocol and transport MUST be set on the config to make calls."
        )
    pipeline = RequestPipeline(protocol=config.protocol, transport=config.transport)
    call = ClientCall(
        input=input,
        operation=INVOKE_MODEL_WITH_RESPONSE_STREAM,
        context=TypedProperties({"config": config}),
        interceptor=InterceptorChain(config.interceptors),
        auth_scheme_resolver=config.auth_scheme_resolver,
        supported_auth_schemes=config.auth_schemes,
        endpoint_resolver=config.endpoint_resolver,
        retry_strategy=config.retry_strategy,
    )

    return await pipeline.output_stream(
        call, ResponseStream, _ResponseStreamDeserializer().deserialize
    )

Input

InvokeModelWithResponseStreamInput dataclass

Dataclass for InvokeModelWithResponseStreamInput structure.

Source code in src/aws_sdk_bedrock_runtime/models.py
@dataclass(kw_only=True)
class InvokeModelWithResponseStreamInput:
    """Dataclass for InvokeModelWithResponseStreamInput structure."""

    body: bytes | None = field(repr=False, default=None)
    """The prompt and inference parameters in the format specified in the
    `contentType` in the header. You must provide the body in JSON format.
    To see the format and content of the request and response bodies for
    different models, refer to [Inference
    parameters](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html).
    For more information, see [Run
    inference](https://docs.aws.amazon.com/bedrock/latest/userguide/api-methods-run.html)
    in the Bedrock User Guide.
    """

    content_type: str | None = None
    """The MIME type of the input data in the request. You must specify
    `application/json`.
    """

    accept: str | None = None
    """The desired MIME type of the inference body in the response. The default
    value is `application/json`.
    """

    model_id: str | None = None
    """The unique identifier of the model to invoke to run inference.

    The `modelId` to provide depends on the type of model or throughput that
    you use:

    - If you use a base model, specify the model ID or its ARN. For a list
      of model IDs for base models, see [Amazon Bedrock base model IDs
      (on-demand
      throughput)](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html#model-ids-arns)
      in the Amazon Bedrock User Guide.

    - If you use an inference profile, specify the inference profile ID or
      its ARN. For a list of inference profile IDs, see [Supported Regions
      and models for cross-region
      inference](https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference-support.html)
      in the Amazon Bedrock User Guide.

    - If you use a provisioned model, specify the ARN of the Provisioned
      Throughput. For more information, see [Run inference using a
      Provisioned
      Throughput](https://docs.aws.amazon.com/bedrock/latest/userguide/prov-thru-use.html)
      in the Amazon Bedrock User Guide.

    - If you use a custom model, specify the ARN of the custom model
      deployment (for on-demand inference) or the ARN of your provisioned
      model (for Provisioned Throughput). For more information, see [Use a
      custom model in Amazon
      Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-use.html)
      in the Amazon Bedrock User Guide.

    - If you use an [imported
      model](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-import-model.html),
      specify the ARN of the imported model. You can get the model ARN from
      a successful call to
      [CreateModelImportJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateModelImportJob.html)
      or from the Imported models page in the Amazon Bedrock console.
    """

    trace: str | None = None
    """Specifies whether to enable or disable the Bedrock trace. If enabled,
    you can see the full Bedrock trace.
    """

    guardrail_identifier: str | None = None
    """The unique identifier of the guardrail that you want to use. If you
    don't provide a value, no guardrail is applied to the invocation.

    An error is thrown in the following situations.

    - You don't provide a guardrail identifier but you specify the
      `amazon-bedrock-guardrailConfig` field in the request body.

    - You enable the guardrail but the `contentType` isn't
      `application/json`.

    - You provide a guardrail identifier, but `guardrailVersion` isn't
      specified.
    """

    guardrail_version: str | None = None
    """The version number for the guardrail. The value can also be `DRAFT`."""

    performance_config_latency: str = "standard"
    """Model performance settings for the request."""

    def serialize(self, serializer: ShapeSerializer):
        serializer.write_struct(_SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT, self)

    def serialize_members(self, serializer: ShapeSerializer):
        if self.body is not None:
            serializer.write_blob(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members["body"],
                self.body,
            )

        if self.content_type is not None:
            serializer.write_string(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members["contentType"],
                self.content_type,
            )

        if self.accept is not None:
            serializer.write_string(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members["accept"],
                self.accept,
            )

        if self.model_id is not None:
            serializer.write_string(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members["modelId"],
                self.model_id,
            )

        if self.trace is not None:
            serializer.write_string(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members["trace"],
                self.trace,
            )

        if self.guardrail_identifier is not None:
            serializer.write_string(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                    "guardrailIdentifier"
                ],
                self.guardrail_identifier,
            )

        if self.guardrail_version is not None:
            serializer.write_string(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                    "guardrailVersion"
                ],
                self.guardrail_version,
            )

        if self.performance_config_latency is not None:
            serializer.write_string(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                    "performanceConfigLatency"
                ],
                self.performance_config_latency,
            )

    @classmethod
    def deserialize(cls, deserializer: ShapeDeserializer) -> Self:
        return cls(**cls.deserialize_kwargs(deserializer))

    @classmethod
    def deserialize_kwargs(cls, deserializer: ShapeDeserializer) -> dict[str, Any]:
        kwargs: dict[str, Any] = {}

        def _consumer(schema: Schema, de: ShapeDeserializer) -> None:
            match schema.expect_member_index():
                case 0:
                    kwargs["body"] = de.read_blob(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members["body"]
                    )

                case 1:
                    kwargs["content_type"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                            "contentType"
                        ]
                    )

                case 2:
                    kwargs["accept"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                            "accept"
                        ]
                    )

                case 3:
                    kwargs["model_id"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                            "modelId"
                        ]
                    )

                case 4:
                    kwargs["trace"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members["trace"]
                    )

                case 5:
                    kwargs["guardrail_identifier"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                            "guardrailIdentifier"
                        ]
                    )

                case 6:
                    kwargs["guardrail_version"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                            "guardrailVersion"
                        ]
                    )

                case 7:
                    kwargs["performance_config_latency"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                            "performanceConfigLatency"
                        ]
                    )

                case _:
                    logger.debug("Unexpected member schema: %s", schema)

        deserializer.read_struct(
            _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT, consumer=_consumer
        )
        return kwargs

Attributes

accept class-attribute instance-attribute
accept: str | None = None

The desired MIME type of the inference body in the response. The default value is application/json.

body class-attribute instance-attribute
body: bytes | None = field(repr=False, default=None)

The prompt and inference parameters in the format specified in the contentType in the header. You must provide the body in JSON format. To see the format and content of the request and response bodies for different models, refer to Inference parameters. For more information, see Run inference in the Bedrock User Guide.

content_type class-attribute instance-attribute
content_type: str | None = None

The MIME type of the input data in the request. You must specify application/json.

guardrail_identifier class-attribute instance-attribute
guardrail_identifier: str | None = None

The unique identifier of the guardrail that you want to use. If you don't provide a value, no guardrail is applied to the invocation.

An error is thrown in the following situations.

  • You don't provide a guardrail identifier but you specify the amazon-bedrock-guardrailConfig field in the request body.

  • You enable the guardrail but the contentType isn't application/json.

  • You provide a guardrail identifier, but guardrailVersion isn't specified.

guardrail_version class-attribute instance-attribute
guardrail_version: str | None = None

The version number for the guardrail. The value can also be DRAFT.

model_id class-attribute instance-attribute
model_id: str | None = None

The unique identifier of the model to invoke to run inference.

The modelId to provide depends on the type of model or throughput that you use:

  • If you use a base model, specify the model ID or its ARN. For a list of model IDs for base models, see Amazon Bedrock base model IDs (on-demand throughput) in the Amazon Bedrock User Guide.

  • If you use an inference profile, specify the inference profile ID or its ARN. For a list of inference profile IDs, see Supported Regions and models for cross-region inference in the Amazon Bedrock User Guide.

  • If you use a provisioned model, specify the ARN of the Provisioned Throughput. For more information, see Run inference using a Provisioned Throughput in the Amazon Bedrock User Guide.

  • If you use a custom model, specify the ARN of the custom model deployment (for on-demand inference) or the ARN of your provisioned model (for Provisioned Throughput). For more information, see Use a custom model in Amazon Bedrock in the Amazon Bedrock User Guide.

  • If you use an imported model, specify the ARN of the imported model. You can get the model ARN from a successful call to CreateModelImportJob or from the Imported models page in the Amazon Bedrock console.

performance_config_latency class-attribute instance-attribute
performance_config_latency: str = 'standard'

Model performance settings for the request.

trace class-attribute instance-attribute
trace: str | None = None

Specifies whether to enable or disable the Bedrock trace. If enabled, you can see the full Bedrock trace.
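
As the guardrail notes above state, a guardrail is applied only when both guardrail_identifier and guardrail_version are provided and the contentType is application/json. A hedged sketch of a guardrail-enabled request (the model ID, guardrail ARN, and body format are placeholder examples):

import json

from aws_sdk_bedrock_runtime.models import InvokeModelWithResponseStreamInput

guarded_request = InvokeModelWithResponseStreamInput(
    model_id="amazon.titan-text-express-v1",  # example base model ID
    content_type="application/json",  # must be application/json when a guardrail is enabled
    accept="application/json",
    body=json.dumps({"inputText": "Tell me about rainbows."}).encode("utf-8"),
    guardrail_identifier="arn:aws:bedrock:us-east-1:111122223333:guardrail/example",  # placeholder ARN
    guardrail_version="DRAFT",  # required whenever guardrail_identifier is set
)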

Output

This operation returns an OutputEventStream for server-to-client streaming.
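
Continuing the client sketch from the operation section above, consumption might look like the following; the await_output() method and events attribute on OutputEventStream are assumed names for illustration and are not documented on this page:

# Hypothetical consumption sketch; attribute/method names on the returned
# OutputEventStream are assumptions.
stream = await client.invoke_model_with_response_stream(request)

initial = await stream.await_output()  # assumed: the initial InvokeModelWithResponseStreamOutput
print(initial.content_type)

async for event in stream.events:  # assumed: async iteration over ResponseStream events
    # Each event is a ResponseStream member, e.g. a chunk of streamed model output.
    print(event)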

Event Stream Structure

Output Event Type

ResponseStream

Initial Response Structure

InvokeModelWithResponseStreamOutput dataclass

Dataclass for InvokeModelWithResponseStreamOutput structure.

Source code in src/aws_sdk_bedrock_runtime/models.py
@dataclass(kw_only=True)
class InvokeModelWithResponseStreamOutput:
    """Dataclass for InvokeModelWithResponseStreamOutput structure."""

    content_type: str
    """The MIME type of the inference result."""

    performance_config_latency: str | None = None
    """Model performance settings for the request."""

    def serialize(self, serializer: ShapeSerializer):
        serializer.write_struct(_SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_OUTPUT, self)

    def serialize_members(self, serializer: ShapeSerializer):
        serializer.write_string(
            _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_OUTPUT.members["contentType"],
            self.content_type,
        )
        if self.performance_config_latency is not None:
            serializer.write_string(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_OUTPUT.members[
                    "performanceConfigLatency"
                ],
                self.performance_config_latency,
            )

    @classmethod
    def deserialize(cls, deserializer: ShapeDeserializer) -> Self:
        return cls(**cls.deserialize_kwargs(deserializer))

    @classmethod
    def deserialize_kwargs(cls, deserializer: ShapeDeserializer) -> dict[str, Any]:
        kwargs: dict[str, Any] = {}

        def _consumer(schema: Schema, de: ShapeDeserializer) -> None:
            match schema.expect_member_index():
                case 1:
                    kwargs["content_type"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_OUTPUT.members[
                            "contentType"
                        ]
                    )

                case 2:
                    kwargs["performance_config_latency"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_OUTPUT.members[
                            "performanceConfigLatency"
                        ]
                    )

                case _:
                    logger.debug("Unexpected member schema: %s", schema)

        deserializer.read_struct(
            _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_OUTPUT, consumer=_consumer
        )
        return kwargs

Attributes

content_type instance-attribute
content_type: str

The MIME type of the inference result.

performance_config_latency class-attribute instance-attribute
performance_config_latency: str | None = None

Model performance settings for the request.

Errors