In this project, we will build a question-answering system using the OPL stack. OPL stands for OpenAI, Pinecone, and LangChain.

GPT models are great at answering questions, but only on topics they have been trained on. What if you want GPT to answer questions about topics it hasn't been trained on? For example, about recent events after September 2021 for GPT-3.5 or GPT-4, or about your non-public documents.

GPT models can learn new knowledge in two ways: fine-tuning on a training set, or model inputs.

Fine-tuning on a training set is the most natural way to teach the model knowledge, but it can be time-consuming and expensive. It also builds long-term memory, which is not always necessary.

Model inputs means inserting the knowledge into an input message. For example, we can send an entire book or PDF document to the model as an input message, and then we can start asking questions on topics found in the input message. This is a good way to build short-term memory for the model.

When we have a large corpus of text, it can be difficult to use model inputs, because each model is limited to a maximum number of tokens, which in most cases is around 4,000.
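To get a feel for why a large document blows past that limit, here is a minimal sketch using the common rule of thumb of roughly 4 characters per token. This heuristic and the page size are assumptions for illustration; a real tokenizer such as OpenAI's tiktoken would give exact counts.

```python
# Rough illustration of why a large document exceeds a ~4,000-token context
# window. The ~4 chars/token ratio and 3,000 chars/page are approximations.

def estimate_tokens(text: str) -> int:
    """Approximate token count using the ~4 characters-per-token heuristic."""
    return len(text) // 4

# A hypothetical 500-page document at ~3,000 characters per page:
pages = 500
chars_per_page = 3_000
total_tokens = estimate_tokens("x" * (pages * chars_per_page))

print(total_tokens)          # roughly 375,000 tokens
print(total_tokens > 4_000)  # True: far beyond a 4,000-token limit
```

Even if the real ratio is off by a factor of two, the document is still two orders of magnitude over the limit, which is why we need a different approach.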
We cannot simply send the text from a 500-page document to the model, because this would exceed the maximum number of tokens that the model supports.

The recommended approach is to use model inputs with embeddings-based search. Embeddings are simple to implement and work especially well with questions.

The pipeline is as follows.

First, we prepare the search data, and we do that once per document. This means: load the data into LangChain documents; split the documents into short, mostly self-contained sections called chunks; embed the chunks into numeric vectors using an embedding model such as OpenAI's text-embedding-ada-002; and save the chunks and their embeddings to a vector database such as Pinecone, Chroma, Milvus, or Qdrant.

After we've prepared the document, we can do the search. Given a user query, generate an embedding for the question using the same embedding model. Then, using the question's embedding and the chunk embeddings from the first step, rank the vectors by similarity to the question's embedding, using cosine similarity or Euclidean distance. The nearest vectors represent chunks similar to the question.

The third step is to ask.
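The ranking step above can be sketched in a few lines of plain Python. The 3-dimensional vectors below are toy values for illustration; a real embedding model such as text-embedding-ada-002 returns 1,536-dimensional vectors, and a vector database performs this search at scale.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_chunks(question_embedding: list[float],
                chunk_embeddings: list[list[float]]) -> list[int]:
    """Return chunk indices ordered from most to least similar to the question."""
    scores = [cosine_similarity(question_embedding, e) for e in chunk_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy embeddings: chunks 0 and 2 point in nearly the same direction as the question.
chunk_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
question_embedding = [1.0, 0.1, 0.0]
print(rank_chunks(question_embedding, chunk_embeddings))  # [2, 0, 1]
```

The top-ranked indices identify the chunks whose text we will hand to the model in the next step.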
You insert the question and the most relevant chunks obtained in the search step, here, into a message to a GPT model, and the GPT model will return an answer.

In this project, we'll build a complete question-answering application on custom data that follows the pipeline I've just described. This technique is also called retrieval augmentation, because we retrieve relevant information from an external knowledge base and give that information to our LLM. The external knowledge base is our window into the world beyond the LLM's training data.

Because practice is more valuable than a thousand words, let me show you the final version of the application.

I have loaded, chunked, and embedded this PDF file on ReAct. This document was published in March 2023 and was not included in the GPT-4 training data. By the way, in this context, ReAct has nothing to do with the JavaScript framework used to build frontends. ReAct is a general paradigm that combines reasoning and acting to enable language models to solve various language reasoning and decision-making tasks.

Now, I want to ask questions about this document. It's like having someone in front of me who knows everything about ReAct and can respond to any question.
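The "ask" step amounts to assembling one message from the question and the retrieved chunks. The template wording and the model name below are illustrative assumptions, not the exact ones used in the project.

```python
# Hypothetical sketch of the "ask" step: build a single prompt that contains
# the retrieved chunks as context, followed by the user's question.

def build_prompt(question: str, relevant_chunks: list[str]) -> str:
    """Insert the question and the most relevant chunks into one message."""
    context = "\n\n".join(relevant_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_prompt(
    "What is ReAct prompting?",
    ["ReAct combines reasoning traces and actions...",
     "Chain-of-thought prompting elicits intermediate reasoning steps..."],
)
print(prompt)

# This prompt would then be sent as the user message to a chat model,
# e.g. GPT-3.5 or GPT-4, which returns the grounded answer.
```

Because the model only sees the chunks we pass in, the quality of the answer depends directly on how well the retrieval step ranked the chunks.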
The first question is: what is reasoning and acting in LLMs?

And I've got the answer.

I'm asking the second question: what is chain of thought? All these questions are from this document. And I've got a pretty good answer.

And the last question: what is ReAct prompting?

Very well. I've got good answers to all my questions.

Of course, instead of ReAct, we can load a document about anything: a medical pathology, a treatment plan, legal documents, accounting documents, and more.

OK. We have a lot of work ahead, so let's get started building this application step by step.