1 00:00:00,620 --> 00:00:01,580 Hey, Cloud Gurus, 2 00:00:01,580 --> 00:00:03,100 welcome to our second lesson 3 00:00:03,100 --> 00:00:05,100 on meeting the tools of the trade. 4 00:00:05,100 --> 00:00:07,393 This time, focused on Transact-SQL. 5 00:00:09,810 --> 00:00:10,643 In this lesson, 6 00:00:10,643 --> 00:00:13,220 we're going to talk about T-SQL uses, 7 00:00:13,220 --> 00:00:15,840 we're going to go over some code examples, 8 00:00:15,840 --> 00:00:18,480 and wrap everything up with a review. 9 00:00:18,480 --> 00:00:21,240 And yes, this is a super short lesson 10 00:00:21,240 --> 00:00:23,800 not because Transact-SQL is not important, 11 00:00:23,800 --> 00:00:26,660 in fact, it is a very powerful tool, 12 00:00:26,660 --> 00:00:27,550 but this is something 13 00:00:27,550 --> 00:00:29,620 you should be pretty familiar with already 14 00:00:29,620 --> 00:00:31,370 as a data professional. 15 00:00:31,370 --> 00:00:32,680 This lesson focuses more 16 00:00:32,680 --> 00:00:35,150 on how it's used in Azure data engineering, 17 00:00:35,150 --> 00:00:38,150 as well as how it might pop up on the exam. 18 00:00:38,150 --> 00:00:41,700 If you don't have a strong background in Transact-SQL, 19 00:00:41,700 --> 00:00:44,790 feel free to check out our SQL Deep Dive. 20 00:00:44,790 --> 00:00:48,550 That's focused on ANSI SQL, button-based in Azure, 21 00:00:48,550 --> 00:00:51,200 and will help you become familiar with the language. 22 00:00:53,860 --> 00:00:56,170 Starting out with T-SQL uses, 23 00:00:56,170 --> 00:01:00,720 T-SQL is how people most commonly refer to Transact-SQL. 24 00:01:00,720 --> 00:01:02,410 This is a powerful language 25 00:01:02,410 --> 00:01:04,460 that is used in a variety of scenarios 26 00:01:04,460 --> 00:01:07,280 to move and transform data. 27 00:01:07,280 --> 00:01:09,870 This can be done either within ETL, 28 00:01:09,870 --> 00:01:12,780 or extract, transform, and load software, 29 00:01:12,780 --> 00:01:14,683 or as standalone scripts. 30 00:01:16,250 --> 00:01:18,700 It can be utilized within Azure Machine Learning 31 00:01:18,700 --> 00:01:21,770 using the Apply SQL Transformation module, 32 00:01:21,770 --> 00:01:24,230 and oftentimes, people don't think of T-SQL 33 00:01:24,230 --> 00:01:26,910 in relation to things like machine learning, 34 00:01:26,910 --> 00:01:29,470 but it's really used across every data field: 35 00:01:29,470 --> 00:01:33,530 data science, data engineering, machine learning-- 36 00:01:33,530 --> 00:01:36,000 it's used in a little bit of everything. 37 00:01:36,000 --> 00:01:37,460 Particular to this module 38 00:01:37,460 --> 00:01:41,250 that's actually running SQLite rather than true T-SQL, 39 00:01:41,250 --> 00:01:43,360 but it can take up to 3 data sets 40 00:01:43,360 --> 00:01:46,973 and has very similar syntax due to the ANSI SQL standard. 41 00:01:47,960 --> 00:01:50,330 Some ways you might see T-SQL being used 42 00:01:50,330 --> 00:01:54,483 is to create tables for results or to save datasets, 43 00:01:55,540 --> 00:01:58,360 performing custom transformations on data types, 44 00:01:58,360 --> 00:02:00,320 or creating aggregates. 45 00:02:00,320 --> 00:02:02,700 For instance, if you have a string column 46 00:02:02,700 --> 00:02:06,080 but need to cast that into an int data type, 47 00:02:06,080 --> 00:02:07,653 T SQL can be used for that. 48 00:02:09,510 --> 00:02:12,100 You can also filter or alter the data 49 00:02:12,100 --> 00:02:14,413 and return the query results as a table. 50 00:02:16,900 --> 00:02:18,910 Let's go over some common code examples 51 00:02:18,910 --> 00:02:21,420 of how you might see T-SQL used. 52 00:02:21,420 --> 00:02:23,950 First, a simple select, 53 00:02:23,950 --> 00:02:28,950 or selecting recency, frequency, monetary, time, and class 54 00:02:29,540 --> 00:02:31,680 from a dataset called T1, 55 00:02:31,680 --> 00:02:34,840 or time is between 3 and 20. 56 00:02:34,840 --> 00:02:38,060 And this is against a blood donation dataset, 57 00:02:38,060 --> 00:02:40,400 so we're simply selecting the fields we want 58 00:02:40,400 --> 00:02:42,050 from the time range that we want. 59 00:02:44,090 --> 00:02:48,190 Another example might be a more complex form of select 60 00:02:48,190 --> 00:02:50,030 called a join. 61 00:02:50,030 --> 00:02:53,310 In this example, we're just selecting the distinct place IDs 62 00:02:53,310 --> 00:02:55,650 from one dataset: T2. 63 00:02:55,650 --> 00:02:58,740 We're also selecting the name, city, state, 64 00:02:58,740 --> 00:03:01,650 price, and entree from there, 65 00:03:01,650 --> 00:03:06,650 as well as the average rating renamed as average rating. 66 00:03:06,940 --> 00:03:10,330 We're pulling from 2 different datasets, T1, and T2, 67 00:03:10,330 --> 00:03:14,080 and rejoining on the place ID column. 68 00:03:14,080 --> 00:03:15,540 And so we're going to pull the matches 69 00:03:15,540 --> 00:03:17,190 from both of those data sets 70 00:03:17,190 --> 00:03:19,970 and then group them together by the place ID column 71 00:03:19,970 --> 00:03:22,160 in the second dataset. 72 00:03:22,160 --> 00:03:23,010 All that to say 73 00:03:23,010 --> 00:03:25,400 is joining 2 tables to create a dataset 74 00:03:25,400 --> 00:03:28,330 that combines the specified restaurant features 75 00:03:28,330 --> 00:03:31,610 with average ratings for each restaurant. 76 00:03:31,610 --> 00:03:33,960 This is a common way you will see T-SQL used, 77 00:03:33,960 --> 00:03:35,700 especially in data engineering, 78 00:03:35,700 --> 00:03:37,670 pulling multiple data sets together 79 00:03:37,670 --> 00:03:40,540 so that we can either visualize that information 80 00:03:40,540 --> 00:03:42,793 or further transform and massage it. 81 00:03:44,490 --> 00:03:48,290 Another example would be using aggregates. 82 00:03:48,290 --> 00:03:49,860 Here, we're returning a dataset 83 00:03:49,860 --> 00:03:51,920 containing the restaurant ID, 84 00:03:51,920 --> 00:03:54,650 along with the average rating for the restaurant. 85 00:03:54,650 --> 00:03:57,430 You can see we're selecting the distinct place ID 86 00:03:57,430 --> 00:04:00,230 and then aggregating the ratings into an average 87 00:04:00,230 --> 00:04:02,750 and grouping by that place ID. 88 00:04:02,750 --> 00:04:05,550 This is very helpful for summarizing information, 89 00:04:05,550 --> 00:04:08,313 especially as you're creating high-level dashboards. 90 00:04:10,370 --> 00:04:13,030 Another example would be of casting. 91 00:04:13,030 --> 00:04:15,610 Going back to our original select, 92 00:04:15,610 --> 00:04:17,860 this time we're taking time 93 00:04:17,860 --> 00:04:21,610 and casting it as a date-time type. 94 00:04:21,610 --> 00:04:24,220 Dates can be stored in a variety of formats, 95 00:04:24,220 --> 00:04:26,960 including timestamp, and sometimes in order 96 00:04:26,960 --> 00:04:29,240 to match data together from different sources, 97 00:04:29,240 --> 00:04:30,780 you have to change the data type 98 00:04:30,780 --> 00:04:33,030 so that they work well together. 99 00:04:33,030 --> 00:04:35,160 In this instance, we're casting time 100 00:04:35,160 --> 00:04:37,963 and making sure it's of that date-time data type. 101 00:04:40,290 --> 00:04:41,240 By way of review, 102 00:04:41,240 --> 00:04:44,520 Transacts SQL can be used in a variety of situations 103 00:04:44,520 --> 00:04:46,280 to create structures, query 104 00:04:46,280 --> 00:04:49,383 or transform data, and save datasets. 105 00:04:50,900 --> 00:04:53,050 You can perform common sequel operations 106 00:04:53,050 --> 00:04:54,480 in Azure Machine Learning 107 00:04:54,480 --> 00:04:57,563 using the Apply SQL Transformation module. 108 00:04:58,910 --> 00:05:02,410 And that's it for our super quick look at Transact-SQL. 109 00:05:02,410 --> 00:05:04,120 Remember that you don't need to be an expert 110 00:05:04,120 --> 00:05:05,930 in this language for the exam. 111 00:05:05,930 --> 00:05:07,920 Simply be able to recognize queries 112 00:05:07,920 --> 00:05:09,450 and their general structure, 113 00:05:09,450 --> 00:05:11,960 and maybe be able to fill in the missing pieces 114 00:05:11,960 --> 00:05:14,360 based on a few different choices given to you, 115 00:05:14,360 --> 00:05:16,410 but don't underestimate this tool 116 00:05:16,410 --> 00:05:20,340 or its value in your day-to-day data engineering work. 117 00:05:20,340 --> 00:05:21,630 The better you know this language, 118 00:05:21,630 --> 00:05:23,160 the better it will serve you. 119 00:05:23,160 --> 00:05:26,060 And as I said, can be used in a variety of tasks 120 00:05:26,060 --> 00:05:28,680 across all data disciplines, 121 00:05:28,680 --> 00:05:30,200 but that's it for now. 122 00:05:30,200 --> 00:05:31,170 When you're ready, 123 00:05:31,170 --> 00:05:32,700 let's go onto meeting another item 124 00:05:32,700 --> 00:05:34,390 you can place in your tool belt. 125 00:05:34,390 --> 00:05:35,423 I'll see you there.