In this project, we will build a question-answering system using the OPL stack. OPL stands for OpenAI, Pinecone, and LangChain.

GPT models are great at answering questions, but only on topics they have been trained on. What if you want GPT to answer questions about topics it hasn't been trained on? For example, about recent events after September 2021 for GPT-3.5 or GPT-4, or about your non-public documents.

GPT models can learn new knowledge in two ways: fine-tuning on a training set, or model inputs.

Fine-tuning on a training set is the most natural way to teach the model knowledge, but it can be time-consuming and expensive. It also builds long-term memory, which is not always necessary.

Model inputs means inserting the knowledge into an input message. For example, we can send an entire book or PDF document to the model as an input message, and then we can start asking questions on topics found in the input message. This is a good way to build short-term memory for the model.

When we have a large corpus of text, it can be difficult to use model inputs, because each model is limited to a maximum number of tokens, which in most cases is around 4,000.
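To get a feel for why a large document blows past that limit, here is a minimal sketch using the common rule of thumb of roughly 4 characters per token. This heuristic and the page size are assumptions for illustration; a real tokenizer such as OpenAI's tiktoken would give exact counts.

```python
# Rough illustration of why a large document exceeds a ~4,000-token context
# window. The ~4 chars/token ratio and 3,000 chars/page are approximations.

def estimate_tokens(text: str) -> int:
    """Approximate token count using the ~4 characters-per-token heuristic."""
    return len(text) // 4

# A hypothetical 500-page document at ~3,000 characters per page:
pages = 500
chars_per_page = 3_000
total_tokens = estimate_tokens("x" * (pages * chars_per_page))

print(total_tokens)          # roughly 375,000 tokens
print(total_tokens > 4_000)  # True: far beyond a 4,000-token limit
```

Even if the real ratio is off by a factor of two, the document is still two orders of magnitude over the limit, which is why we need a different approach.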
We cannot simply send the text from a 500-page document to the model, because this would exceed the maximum number of tokens that the model supports.

The recommended approach is to use model inputs with embeddings-based search. Embeddings are simple to implement and work especially well with questions.

The pipeline is as follows.

First, we prepare the search data, and we do that once per document. This means: load the data into LangChain documents; split the documents into short, mostly self-contained sections called chunks; embed the chunks into numeric vectors using an embedding model such as OpenAI's text-embedding-ada-002; and save the chunks and their embeddings to a vector database such as Pinecone, Chroma, Milvus, or Qdrant.

After we've prepared the document, we can do the search. Given a user query, generate an embedding for the question using the same embedding model. Then, using the question's embedding and the chunk embeddings from the first step, rank the vectors by similarity to the question's embedding, using cosine similarity or Euclidean distance. The nearest vectors represent chunks similar to the question.

The third step is to ask.
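The ranking step above can be sketched in a few lines of plain Python. The 3-dimensional vectors below are toy values for illustration; a real embedding model such as text-embedding-ada-002 returns 1,536-dimensional vectors, and a vector database performs this search at scale.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_chunks(question_embedding: list[float],
                chunk_embeddings: list[list[float]]) -> list[int]:
    """Return chunk indices ordered from most to least similar to the question."""
    scores = [cosine_similarity(question_embedding, e) for e in chunk_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy embeddings: chunks 0 and 2 point in nearly the same direction as the question.
chunk_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
question_embedding = [1.0, 0.1, 0.0]
print(rank_chunks(question_embedding, chunk_embeddings))  # [2, 0, 1]
```

The top-ranked indices identify the chunks whose text we will hand to the model in the next step.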
You insert the question and the most relevant chunks obtained in the search step, here, into a message to a GPT model, and the GPT model will return an answer.

In this project, we'll build a complete question-answering application on custom data that follows the pipeline I've just described. This technique is also called retrieval augmentation, because we retrieve relevant information from an external knowledge base and give that information to our LLM. The external knowledge base is our window into the world beyond the LLM's training data.

Because practice is more valuable than a thousand words, let me show you the final version of the application.

I have loaded, chunked, and embedded this PDF file on ReAct. This document was published in March 2023 and was not included in the GPT-4 training data. By the way, in this context, ReAct has nothing to do with the JavaScript framework used to build frontends. ReAct is a general paradigm that combines reasoning and acting to enable language models to solve various language reasoning and decision-making tasks.

Now, I want to ask questions about this document. It's like having someone in front of me who knows everything about ReAct and can respond to any question.
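The "ask" step amounts to assembling one message from the question and the retrieved chunks. The template wording and the model name below are illustrative assumptions, not the exact ones used in the project.

```python
# Hypothetical sketch of the "ask" step: build a single prompt that contains
# the retrieved chunks as context, followed by the user's question.

def build_prompt(question: str, relevant_chunks: list[str]) -> str:
    """Insert the question and the most relevant chunks into one message."""
    context = "\n\n".join(relevant_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "What is ReAct prompting?",
    ["ReAct combines reasoning traces and actions...",
     "Chain-of-thought prompting elicits intermediate reasoning steps..."],
)
print(prompt)

# This prompt would then be sent as the user message to a chat model,
# e.g. GPT-3.5 or GPT-4, which returns the grounded answer.
```

Because the model only sees the chunks we pass in, the quality of the answer depends directly on how well the retrieval step ranked the chunks.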
The first question is: what is reasoning and acting in LLMs?

And I've got the answer.

I'm asking the second question: what is chain of thought? All these questions are from this document. And I've got a pretty good answer.

And the last question: what is ReAct prompting?

Very well. I've got good answers to all my questions.

Of course, instead of ReAct, we can load a document about anything: a medical pathology, a treatment plan, legal documents, accounting documents, and more.

OK. We have a lot of work ahead, so let's get started building this application step by step.