In this video, we'll talk about streaming. In the context of LLMs, streaming refers to the process of delivering the response as a continuous stream of data instead of sending the entire response at once. This allows the user to receive the response piece by piece as it is generated, which can improve the user experience and reduce the perceived latency.

I am asking Copilot to write a rock song about the moon and the raven. As you can observe, Copilot is streaming the answer: not delivering the entire response at once, but piece by piece.

Let's get back to coding and assign the same task to the LLM.

I am importing ChatOpenAI and creating the LLM object. The prompt will be "Write a rock song about the moon and the raven", and I print llm.invoke(prompt).content (see the first sketch after the transcript). By default, the model will return the response only after completing the entire generation process. Let's run the code. It is running.

Let's enable streaming. The model will send you pieces of the response as they are completed, rather than waiting until the entire response is finished. To enable streaming, simply call llm.stream with the prompt as an argument.

To observe the streaming, I will do the following: for each chunk in llm.stream(prompt), print chunk.content with end set to the empty string and flush set to True (see the second sketch after the transcript). I am running the code. You can see how it is streaming the response, sending it in pieces.

If you want token-by-token streaming, this requires native support from the LLM provider.

That's it. In this video, you learned about streaming: what it is and how to implement it in LangChain.
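First sketch: the blocking call described in the transcript. This is a minimal sketch, not the video's exact code; the langchain-openai import path and the "gpt-4o-mini" model name are my assumptions, since the transcript names neither. It assumes an OPENAI_API_KEY is set in the environment.

from langchain_openai import ChatOpenAI

# Model name is an assumption; the API key is read from the environment.
llm = ChatOpenAI(model="gpt-4o-mini")

prompt = "Write a rock song about the moon and the raven."

# invoke() blocks until the entire generation is finished, then returns
# a single AIMessage; its .content attribute holds the generated text.
print(llm.invoke(prompt).content)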
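Second sketch: the streaming version, under the same assumptions as above. llm.stream(prompt) returns an iterator that yields message chunks as they arrive, and the print arguments make the stream visible in the terminal.

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # model name is an assumption
prompt = "Write a rock song about the moon and the raven."

# stream() yields message chunks as the provider emits them,
# instead of waiting for the full completion.
for chunk in llm.stream(prompt):
    # end="" prints the chunks back to back without newlines;
    # flush=True pushes each piece to the terminal immediately,
    # which is what makes the streaming visible.
    print(chunk.content, end="", flush=True)
print()  # terminate the streamed output with a newline

On the transcript's closing point: if the underlying provider has no native token-level streaming, LangChain's stream() falls back to yielding the full response as a single chunk, so the loop above still runs but you won't see token-by-token output.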