In this video, we'll talk about streaming. In the context of LLMs, streaming refers to the process of delivering the response as a continuous stream of data instead of sending the entire response at once. This allows the user to receive the response piece by piece as it is generated, which can improve the user experience and reduce the perceived latency.

I am asking Copilot to write a rock song about the moon and the raven. As you can observe, Copilot is streaming the answer: not delivering the entire response at once, but piece by piece.

Let's get back to coding and assign the same task to the LLM.

I am importing ChatOpenAI and creating the LLM object. The prompt will be "Write a rock song about the moon and the raven", and I print llm.invoke(prompt).content (see the first sketch after the transcript). By default, the model will return the response only after completing the entire generation process. Let's run the code. It is running.

Let's enable streaming. The model will send you pieces of the response as they are completed, rather than waiting until the entire response is finished. To enable streaming, simply call llm.stream with the prompt as an argument.

To observe the streaming, I will do the following: for each chunk in llm.stream(prompt), print chunk.content with end set to the empty string and flush set to True (see the second sketch after the transcript). I am running the code. You can see how it is streaming the response, sending it in pieces.

If you want token-by-token streaming, this requires native support from the LLM provider.

That's it. In this video, you learned about streaming: what it is and how to implement it in LangChain.
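First sketch: the blocking call described in the transcript. This is a minimal sketch, not the video's exact code; the langchain-openai import path and the "gpt-4o-mini" model name are my assumptions, since the transcript names neither. It assumes an OPENAI_API_KEY is set in the environment.

from langchain_openai import ChatOpenAI

# Model name is an assumption; the API key is read from the environment.
llm = ChatOpenAI(model="gpt-4o-mini")

prompt = "Write a rock song about the moon and the raven."

# invoke() blocks until the entire generation is finished, then returns
# a single AIMessage; its .content attribute holds the generated text.
print(llm.invoke(prompt).content)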
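Second sketch: the streaming version, under the same assumptions as above. llm.stream(prompt) returns an iterator that yields message chunks as they arrive, and the print arguments make the stream visible in the terminal.

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")  # model name is an assumption
prompt = "Write a rock song about the moon and the raven."

# stream() yields message chunks as the provider emits them,
# instead of waiting for the full completion.
for chunk in llm.stream(prompt):
    # end="" prints the chunks back to back without newlines;
    # flush=True pushes each piece to the terminal immediately,
    # which is what makes the streaming visible.
    print(chunk.content, end="", flush=True)
print()  # terminate the streamed output with a newline

On the transcript's closing point: if the underlying provider has no native token-level streaming, LangChain's stream() falls back to yielding the full response as a single chunk, so the loop above still runs but you won't see token-by-token output.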