invoke_model_with_response_stream

Operation

invoke_model_with_response_stream async

invoke_model_with_response_stream(input: InvokeModelWithResponseStreamInput, plugins: list[Plugin] | None = None) -> OutputEventStream[ResponseStream, InvokeModelWithResponseStreamOutput]

Invoke the specified Amazon Bedrock model to run inference using the prompt and inference parameters provided in the request body. The response is returned in a stream.

To see if a model supports streaming, call GetFoundationModel and check the responseStreamingSupported field in the response.

Note

The CLI doesn't support streaming operations in Amazon Bedrock, including InvokeModelWithResponseStream.

For example code, see Invoke model with streaming code example in the Amazon Bedrock User Guide.

This operation requires permissions to perform the bedrock:InvokeModelWithResponseStream action.

Warning

To deny all inference access to resources that you specify in the modelId field, you need to deny access to the bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream actions. Doing this also denies access to the resource through the Converse API actions (Converse and ConverseStream). For more information, see Deny access for inference on specific models.

For troubleshooting some of the common errors you might encounter when using the InvokeModelWithResponseStream API, see Troubleshooting Amazon Bedrock API Error Codes in the Amazon Bedrock User Guide.

Parameters:

input (InvokeModelWithResponseStreamInput), required
    An instance of InvokeModelWithResponseStreamInput.

plugins (list[Plugin] | None), default None
    A list of callables that modify the configuration dynamically. Changes made by these plugins only apply for the duration of the operation execution and will not affect any other operation invocations.
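
In practice, a plugin is just a callable that receives the per-operation copy of the client configuration and mutates it. A minimal sketch, assuming only the config attribute names that appear in the operation source below (the concrete Config type is not shown on this page):

import logging

logger = logging.getLogger(__name__)


def log_retry_strategy(config) -> None:
    # The operation deep-copies the client config before applying plugins,
    # so anything done here affects only this single invocation.
    logger.debug("Invoking with retry strategy: %r", config.retry_strategy)


# Passed per call, for example:
# stream = await client.invoke_model_with_response_stream(request, plugins=[log_retry_strategy])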

Returns:

OutputEventStream[ResponseStream, InvokeModelWithResponseStreamOutput]
    An OutputEventStream for server-to-client streaming of ResponseStream events with an initial InvokeModelWithResponseStreamOutput response.
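
A minimal end-to-end sketch follows. The method signature and the InvokeModelWithResponseStreamInput fields are taken from this page; the client class name, its import path, and its no-argument construction are assumptions, as are the example model ID and request body format:

import asyncio
import json

from aws_sdk_bedrock_runtime.client import BedrockRuntimeClient  # assumed import path and class name
from aws_sdk_bedrock_runtime.models import InvokeModelWithResponseStreamInput


async def main() -> None:
    client = BedrockRuntimeClient()  # assumed default construction; real usage may require explicit config

    request = InvokeModelWithResponseStreamInput(
        model_id="anthropic.claude-3-haiku-20240307-v1:0",  # example model ID
        content_type="application/json",
        accept="application/json",
        body=json.dumps(
            {
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": "Write a haiku about streams."}],
            }
        ).encode("utf-8"),
    )

    # Returns an OutputEventStream of ResponseStream events; see the Output
    # section below for a consumption sketch.
    stream = await client.invoke_model_with_response_stream(request)


asyncio.run(main())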

Source code in src/aws_sdk_bedrock_runtime/client.py
async def invoke_model_with_response_stream(
    self,
    input: InvokeModelWithResponseStreamInput,
    plugins: list[Plugin] | None = None,
) -> OutputEventStream[ResponseStream, InvokeModelWithResponseStreamOutput]:
    """Invoke the specified Amazon Bedrock model to run inference using the
    prompt and inference parameters provided in the request body. The
    response is returned in a stream.

    To see if a model supports streaming, call
    [GetFoundationModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetFoundationModel.html)
    and check the `responseStreamingSupported` field in the response.

    Note:
        The CLI doesn't support streaming operations in Amazon Bedrock,
        including `InvokeModelWithResponseStream`.

    For example code, see *Invoke model with streaming code example* in the
    *Amazon Bedrock User Guide*.

    This operation requires permissions to perform the
    `bedrock:InvokeModelWithResponseStream` action.

    Warning:
        To deny all inference access to resources that you specify in the
        modelId field, you need to deny access to the `bedrock:InvokeModel` and
        `bedrock:InvokeModelWithResponseStream` actions. Doing this also denies
        access to the resource through the Converse API actions
        ([Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html)
        and
        [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html)).
        For more information see [Deny access for inference on specific
        models](https://docs.aws.amazon.com/bedrock/latest/userguide/security_iam_id-based-policy-examples.html#security_iam_id-based-policy-examples-deny-inference).

    For troubleshooting some of the common errors you might encounter when
    using the `InvokeModelWithResponseStream` API, see [Troubleshooting
    Amazon Bedrock API Error
    Codes](https://docs.aws.amazon.com/bedrock/latest/userguide/troubleshooting-api-error-codes.html)
    in the Amazon Bedrock User Guide

    Args:
        input:
            An instance of `InvokeModelWithResponseStreamInput`.
        plugins:
            A list of callables that modify the configuration dynamically.
            Changes made by these plugins only apply for the duration of the
            operation execution and will not affect any other operation
            invocations.

    Returns:
        An `OutputEventStream` for server-to-client streaming of `ResponseStream`
            events with initial `InvokeModelWithResponseStreamOutput` response.
    """
    operation_plugins: list[Plugin] = []
    if plugins:
        operation_plugins.extend(plugins)
    config = deepcopy(self._config)
    for plugin in operation_plugins:
        plugin(config)
    if config.protocol is None or config.transport is None:
        raise ExpectationNotMetError(
            "protocol and transport MUST be set on the config to make calls."
        )
    pipeline = RequestPipeline(protocol=config.protocol, transport=config.transport)
    call = ClientCall(
        input=input,
        operation=INVOKE_MODEL_WITH_RESPONSE_STREAM,
        context=TypedProperties({"config": config}),
        interceptor=InterceptorChain(config.interceptors),
        auth_scheme_resolver=config.auth_scheme_resolver,
        supported_auth_schemes=config.auth_schemes,
        endpoint_resolver=config.endpoint_resolver,
        retry_strategy=config.retry_strategy,
    )

    return await pipeline.output_stream(
        call, ResponseStream, _ResponseStreamDeserializer().deserialize
    )

Input

InvokeModelWithResponseStreamInput dataclass

Dataclass for InvokeModelWithResponseStreamInput structure.

Source code in src/aws_sdk_bedrock_runtime/models.py
@dataclass(kw_only=True)
class InvokeModelWithResponseStreamInput:
    """Dataclass for InvokeModelWithResponseStreamInput structure."""

    body: bytes | None = field(repr=False, default=None)
    """The prompt and inference parameters in the format specified in the
    `contentType` in the header. You must provide the body in JSON format.
    To see the format and content of the request and response bodies for
    different models, refer to [Inference
    parameters](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html).
    For more information, see [Run
    inference](https://docs.aws.amazon.com/bedrock/latest/userguide/api-methods-run.html)
    in the Bedrock User Guide.
    """

    content_type: str | None = None
    """The MIME type of the input data in the request. You must specify
    `application/json`.
    """

    accept: str | None = None
    """The desired MIME type of the inference body in the response. The default
    value is `application/json`.
    """

    model_id: str | None = None
    """The unique identifier of the model to invoke to run inference.

    The `modelId` to provide depends on the type of model or throughput that
    you use:

    - If you use a base model, specify the model ID or its ARN. For a list
      of model IDs for base models, see [Amazon Bedrock base model IDs
      (on-demand
      throughput)](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html#model-ids-arns)
      in the Amazon Bedrock User Guide.

    - If you use an inference profile, specify the inference profile ID or
      its ARN. For a list of inference profile IDs, see [Supported Regions
      and models for cross-region
      inference](https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference-support.html)
      in the Amazon Bedrock User Guide.

    - If you use a provisioned model, specify the ARN of the Provisioned
      Throughput. For more information, see [Run inference using a
      Provisioned
      Throughput](https://docs.aws.amazon.com/bedrock/latest/userguide/prov-thru-use.html)
      in the Amazon Bedrock User Guide.

    - If you use a custom model, specify the ARN of the custom model
      deployment (for on-demand inference) or the ARN of your provisioned
      model (for Provisioned Throughput). For more information, see [Use a
      custom model in Amazon
      Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-use.html)
      in the Amazon Bedrock User Guide.

    - If you use an [imported
      model](https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-import-model.html),
      specify the ARN of the imported model. You can get the model ARN from
      a successful call to
      [CreateModelImportJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateModelImportJob.html)
      or from the Imported models page in the Amazon Bedrock console.
    """

    trace: str | None = None
    """Specifies whether to enable or disable the Bedrock trace. If enabled,
    you can see the full Bedrock trace.
    """

    guardrail_identifier: str | None = None
    """The unique identifier of the guardrail that you want to use. If you
    don't provide a value, no guardrail is applied to the invocation.

    An error is thrown in the following situations.

    - You don't provide a guardrail identifier but you specify the
      `amazon-bedrock-guardrailConfig` field in the request body.

    - You enable the guardrail but the `contentType` isn't
      `application/json`.

    - You provide a guardrail identifier, but `guardrailVersion` isn't
      specified.
    """

    guardrail_version: str | None = None
    """The version number for the guardrail. The value can also be `DRAFT`."""

    performance_config_latency: str = "standard"
    """Model performance settings for the request."""

    def serialize(self, serializer: ShapeSerializer):
        serializer.write_struct(_SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT, self)

    def serialize_members(self, serializer: ShapeSerializer):
        if self.body is not None:
            serializer.write_blob(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members["body"],
                self.body,
            )

        if self.content_type is not None:
            serializer.write_string(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members["contentType"],
                self.content_type,
            )

        if self.accept is not None:
            serializer.write_string(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members["accept"],
                self.accept,
            )

        if self.model_id is not None:
            serializer.write_string(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members["modelId"],
                self.model_id,
            )

        if self.trace is not None:
            serializer.write_string(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members["trace"],
                self.trace,
            )

        if self.guardrail_identifier is not None:
            serializer.write_string(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                    "guardrailIdentifier"
                ],
                self.guardrail_identifier,
            )

        if self.guardrail_version is not None:
            serializer.write_string(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                    "guardrailVersion"
                ],
                self.guardrail_version,
            )

        if self.performance_config_latency is not None:
            serializer.write_string(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                    "performanceConfigLatency"
                ],
                self.performance_config_latency,
            )

    @classmethod
    def deserialize(cls, deserializer: ShapeDeserializer) -> Self:
        return cls(**cls.deserialize_kwargs(deserializer))

    @classmethod
    def deserialize_kwargs(cls, deserializer: ShapeDeserializer) -> dict[str, Any]:
        kwargs: dict[str, Any] = {}

        def _consumer(schema: Schema, de: ShapeDeserializer) -> None:
            match schema.expect_member_index():
                case 0:
                    kwargs["body"] = de.read_blob(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members["body"]
                    )

                case 1:
                    kwargs["content_type"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                            "contentType"
                        ]
                    )

                case 2:
                    kwargs["accept"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                            "accept"
                        ]
                    )

                case 3:
                    kwargs["model_id"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                            "modelId"
                        ]
                    )

                case 4:
                    kwargs["trace"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members["trace"]
                    )

                case 5:
                    kwargs["guardrail_identifier"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                            "guardrailIdentifier"
                        ]
                    )

                case 6:
                    kwargs["guardrail_version"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                            "guardrailVersion"
                        ]
                    )

                case 7:
                    kwargs["performance_config_latency"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT.members[
                            "performanceConfigLatency"
                        ]
                    )

                case _:
                    logger.debug("Unexpected member schema: %s", schema)

        deserializer.read_struct(
            _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_INPUT, consumer=_consumer
        )
        return kwargs

Attributes

accept class-attribute instance-attribute
accept: str | None = None

The desired MIME type of the inference body in the response. The default value is application/json.

body class-attribute instance-attribute
body: bytes | None = field(repr=False, default=None)

The prompt and inference parameters in the format specified in the contentType in the header. You must provide the body in JSON format. To see the format and content of the request and response bodies for different models, refer to Inference parameters. For more information, see Run inference in the Bedrock User Guide.

content_type class-attribute instance-attribute
content_type: str | None = None

The MIME type of the input data in the request. You must specify application/json.

guardrail_identifier class-attribute instance-attribute
guardrail_identifier: str | None = None

The unique identifier of the guardrail that you want to use. If you don't provide a value, no guardrail is applied to the invocation.

An error is thrown in the following situations.

  • You don't provide a guardrail identifier but you specify the amazon-bedrock-guardrailConfig field in the request body.

  • You enable the guardrail but the contentType isn't application/json.

  • You provide a guardrail identifier, but guardrailVersion isn't specified.

guardrail_version class-attribute instance-attribute
guardrail_version: str | None = None

The version number for the guardrail. The value can also be DRAFT.

model_id class-attribute instance-attribute
model_id: str | None = None

The unique identifier of the model to invoke to run inference.

The modelId to provide depends on the type of model or throughput that you use:

  • If you use a base model, specify the model ID or its ARN. For a list of model IDs for base models, see Amazon Bedrock base model IDs (on-demand throughput) in the Amazon Bedrock User Guide.

  • If you use an inference profile, specify the inference profile ID or its ARN. For a list of inference profile IDs, see Supported Regions and models for cross-region inference in the Amazon Bedrock User Guide.

  • If you use a provisioned model, specify the ARN of the Provisioned Throughput. For more information, see Run inference using a Provisioned Throughput in the Amazon Bedrock User Guide.

  • If you use a custom model, specify the ARN of the custom model deployment (for on-demand inference) or the ARN of your provisioned model (for Provisioned Throughput). For more information, see Use a custom model in Amazon Bedrock in the Amazon Bedrock User Guide.

  • If you use an imported model, specify the ARN of the imported model. You can get the model ARN from a successful call to CreateModelImportJob or from the Imported models page in the Amazon Bedrock console.

performance_config_latency class-attribute instance-attribute
performance_config_latency: str = 'standard'

Model performance settings for the request.

trace class-attribute instance-attribute
trace: str | None = None

Specifies whether to enable or disable the Bedrock trace. If enabled, you can see the full Bedrock trace.
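
As the guardrail notes above state, a guardrail is applied only when both guardrail_identifier and guardrail_version are provided and the contentType is application/json. A hedged sketch of a guardrail-enabled request (the model ID, guardrail ARN, and body format are placeholder examples):

import json

from aws_sdk_bedrock_runtime.models import InvokeModelWithResponseStreamInput

guarded_request = InvokeModelWithResponseStreamInput(
    model_id="amazon.titan-text-express-v1",  # example base model ID
    content_type="application/json",  # must be application/json when a guardrail is enabled
    accept="application/json",
    body=json.dumps({"inputText": "Tell me about rainbows."}).encode("utf-8"),
    guardrail_identifier="arn:aws:bedrock:us-east-1:111122223333:guardrail/example",  # placeholder ARN
    guardrail_version="DRAFT",  # required whenever guardrail_identifier is set
)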

Output

This operation returns an OutputEventStream for server-to-client streaming.
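
Continuing the client sketch from the operation section above, consumption might look like the following; the await_output() method and events attribute on OutputEventStream are assumed names for illustration and are not documented on this page:

# Hypothetical consumption sketch; attribute/method names on the returned
# OutputEventStream are assumptions.
stream = await client.invoke_model_with_response_stream(request)

initial = await stream.await_output()  # assumed: the initial InvokeModelWithResponseStreamOutput
print(initial.content_type)

async for event in stream.events:  # assumed: async iteration over ResponseStream events
    # Each event is a ResponseStream member, e.g. a chunk of streamed model output.
    print(event)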

Event Stream Structure

Output Event Type

ResponseStream

Initial Response Structure

InvokeModelWithResponseStreamOutput dataclass

Dataclass for InvokeModelWithResponseStreamOutput structure.

Source code in src/aws_sdk_bedrock_runtime/models.py
@dataclass(kw_only=True)
class InvokeModelWithResponseStreamOutput:
    """Dataclass for InvokeModelWithResponseStreamOutput structure."""

    content_type: str
    """The MIME type of the inference result."""

    performance_config_latency: str | None = None
    """Model performance settings for the request."""

    def serialize(self, serializer: ShapeSerializer):
        serializer.write_struct(_SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_OUTPUT, self)

    def serialize_members(self, serializer: ShapeSerializer):
        serializer.write_string(
            _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_OUTPUT.members["contentType"],
            self.content_type,
        )
        if self.performance_config_latency is not None:
            serializer.write_string(
                _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_OUTPUT.members[
                    "performanceConfigLatency"
                ],
                self.performance_config_latency,
            )

    @classmethod
    def deserialize(cls, deserializer: ShapeDeserializer) -> Self:
        return cls(**cls.deserialize_kwargs(deserializer))

    @classmethod
    def deserialize_kwargs(cls, deserializer: ShapeDeserializer) -> dict[str, Any]:
        kwargs: dict[str, Any] = {}

        def _consumer(schema: Schema, de: ShapeDeserializer) -> None:
            match schema.expect_member_index():
                case 1:
                    kwargs["content_type"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_OUTPUT.members[
                            "contentType"
                        ]
                    )

                case 2:
                    kwargs["performance_config_latency"] = de.read_string(
                        _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_OUTPUT.members[
                            "performanceConfigLatency"
                        ]
                    )

                case _:
                    logger.debug("Unexpected member schema: %s", schema)

        deserializer.read_struct(
            _SCHEMA_INVOKE_MODEL_WITH_RESPONSE_STREAM_OUTPUT, consumer=_consumer
        )
        return kwargs

Attributes

content_type instance-attribute
content_type: str

The MIME type of the inference result.

performance_config_latency class-attribute instance-attribute
performance_config_latency: str | None = None

Model performance settings for the request.

Errors