layout:true
Advanced Analysis with SQL - NYC Parks and Recreation
-- class:center, middle ![img-center-50](images/datapolitan.png) # Advanced Analysis with SQL
City of New York Parks and Recreation - - - ## Instructors: Eric Brelsford and Richard Dunks ## Teaching Assistant: Sarah Kontos ### Follow along at: http://bit.ly/advanced-analysis-sql #### See the code at: http://bit.ly/advanced-analysis-sql-code --- class:center,middle # Welcome ??? + Facilitators and TA introduce themselves, establishing mastery but also creating openings for vulnerability and awareness --- # A Few Ground Rules ??? + Facilitators establish the intention we have for the culture of the classroom -- + Step up, step back -- + One mic -- + Be curious and ask questions -- + Assume noble regard and positive intent -- + Respect multiple perspectives -- + Listen to comprehend, not to respond -- + Be present (phone, email, social media, etc.) --- # Introduce Yourself to Your Neighbor + Who are you? + Where do you work? + What has been the proudest moment in your job? + What's your favorite national or state park? ??? + An opportunity for participants to get to know each other and settle into the class + We bring them into a positive space related to their work + Facilitator reminds them the data isn't the end but the means to achieve the actions that make us feel good about our jobs --- # Housekeeping ??? + Facilitator sets expectations with the students + Establishes the "contract" for the class -- + We’ll have one 15 minute break in the morning -- + We’ll have an hour for lunch -- + We’ll have a 15 minute break in the afternoon -- + Class will start promptly after breaks -- + Feel free to use the bathroom if you need during class -- + Please take any phone conversations into the hall to not disrupt the class --- # What to Expect Today -- + Introduction to databases -- + Introduction to SQL -- + Hands-on experience writing and using SQL -- + A chance to explore SQL and perform some analysis using it --- class:center,middle # Let's Get Started --- class:center,middle # A Quick Recap from
[Introduction to Data Analytics](http://training.datapolitan.com/parks-intro-analytics/#1) ??? + Facilitator concludes the walk-through to shift to a discussion of data and analysis --- # The Value of Data ??? + Facilitator brings back discussion of the value of data, prompting participants to describe the value of data in their job + Optionally, facilitator can ask the value of the data just aggregated and how it might be useful to decisionmakers -- + Data tells a story about something that's happened -- + Can describe what happened directly or indirectly -- ![img-center-80](images/parks-inspections-visitors.png) ??? + Facilitator helps participants see the direct measurements of service requests while indirectly measuring when Park incidents actually happen --- class: center,middle # What are some of the data challenges you face? ??? + Facilitator holds discussion with participants about the data issues they face on a regular basis + Facilitator prompts participants to think about their data problems as we go through the data techniques today --- exclude:true class:middle > Facts do not "speak for themselves." They speak for or against competing theories. Facts divorced from theory or visions are mere isolated curiosities. ## -Thomas Sowell *A Conflict of Visions* --- # What is Analysis? -- >“Analysis is simply the pursuit of understanding, usually through detailed inspection or comparison” ## - [Carter Hewgley](https://www.linkedin.com/in/carterhewgley), Senior Advisor for Family & Homeless Services, Department of Human Services, District of Columbia ??? + Facilitator prompts participants to think about analysis in the context of their work and what it is they are trying to accomplish --- class:center,middle # A Quick Recap from
Data Analysis with Excel I and II --- # 5 Data Analytics Tasks ??? + Facilitator introduces the 5 main tasks of analysis + Facilitator reminds participants we've talked about these in previous courses + We're going to look at the first three in SQL -- 1. Filtering -- 1. Sorting -- 1. Aggregating -- 1. Transforming -- 1. Visualizing --- class:center,middle # Importing data into Excel with SQL ??? + Walk through importing data with SQL into Excel with a simple query --- # Importing data from a database ![img-center-100](images/power_query.png) + Click on the **Power Query** ribbon --- # Importing data from a database ![img-center-40](images/power_query_sql_server.png) + Power Query > From Database > From SQL Server Database --- # Importing data from a database ![img-center-100](images/server_connect.png) + Enter `data.nycdpr.parks.nycnet` in the **Server** field + Enter `DWH` in the **Database** field --- # Encryption Warning ![img-center-100](images/encryp.jpg) ## Just select "OK" ### [For more information](https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/sql-server-encryption?view=sql-server-2017) --- # Importing data from a database ![img-center-70](images/server_navigator.jpg) + View the accessible databases --- # Importing data from a database ![img-center-70](images/server_navigator.jpg) + Find `tbl_ref_calendar` and click **Load** to load it --- # Importing data from a database ![img-center-100](images/tbl_ref_calendar_load.jpg) --- class:center,middle # What did we just do? --- class:center,middle # What is a database? --- # What is a database? -- + A software system that manages storing and retrieving data -- + The software for a database often runs on a server, not the computer you're using -- + There are many different database systems, today we'll be using Microsoft's SQL Server -- + A database stores data in *tables* --- # Tables -- + How data is stored in a database -- + Similar to a worksheet in Excel -- + Organized in *rows* and *columns* -- ![img-left-50](images/table_row.png) -- ![img-right-50](images/table_column.png) --- class:center,middle # Let's do that again using SQL --- # `SELECT` -- + How you specify which columns to get from a table -- + Start with `SELECT *`, which gets all the columns -- + Separate column names with commas (`,`) --- # For example: ![img-right-30](images/cal_query_crop_box.png) ```sql SELECT * FROM tbl_ref_calendar ``` -- + Get all of the columns from the `tbl_ref_calendar` table --- # For example: ![img-right-30](images/cal_query2_crop.png) ```sql SELECT ref_date, fiscal_year FROM tbl_ref_calendar ``` -- + Only get the `ref_date` and `fiscal_year` columns from the `tbl_ref_calendar` table -- ![img-center-45](images/cal_query2_result_crop.png) --- # `FROM` -- + How you specify the table you are querying -- ```sql FROM tbl_ref_calendar ``` --- class:center,middle # Why use a database? ??? + Facilitator walks through benefits of using a database: single source of truth, always the latest data --- # Why use a database? -- + Single source of truth -- + Don't have to worry about moving and sharing files -- + Can be faster and more consistent than Excel files --- class:center,middle # Why use SQL? ??? + Facilitator walks through benefits of using SQL: you need to in order to communicate with a database, but also it can be self documenting: "here's the query I used" rather than "I clicked here, then there, then picked this option..." in Excel. --- # SQL -- + The language databases understand -- + If you want to use a database, you will need at least a little SQL -- + It makes queries replicable—you can run the query again and be sure you will get the same results -- + ...and someone can use your query and be sure they will get the same results, too! --- # Editing a query -- + You don't have to create a new query each time you want to change the SQL -- + If you are experimenting with a query, we recommend *editing* the SQL instead --- # Editing a query ![img-center-100](images/query-edit-1.jpg) + Double click the **Source** --- # Editing a query ![img-center-60](images/query-edit-2.jpg) + Edit the SQL as usual --- # Your Turn -- + We talked about using `SELECT` to pick which columns to pull from a table: -- ```sql SELECT ref_date, fiscal_year FROM tbl_ref_calendar ``` -- + Write a SQL query to get only the `ref_date`, `fiscal_day`, `fiscal_week`, `fiscal_qtr` columns from `tbl_ref_calendar` -- + Run the query in Excel --- # Data Analytics Tasks: Filtering ??? + Facilitator leads brief discussion of filtering, soliciting examples from participants on applicability to their own work -- + Show a preview of rows -- + Or only show rows that contain some value -- + Can filter by multiple values -- + Can filter by values in multiple columns --- # Daily Tasks -- + Labor booking for routine litter and cleaning -- + Approximately 78,000 tasks logged a month (~936,000 a year) -- + Nearly 3.9 million tasks logged as of Spring 2019 -- + Can't work with all this data at once --- # Previewing rows -- + You can use `SELECT TOP` to get the first set of rows -- ```sql SELECT TOP 5 * FROM tbl_dailytasks ``` -- + Select the top 5 rows from `tbl_dailytasks` -- + Without *sorting* the data this could be any 5 rows—we'll talk about sorting soon -- + Can use this to view the column names and the values in the table -- + For more information on TOP, [check out this tutorial](http://www.sqlservertutorial.net/sql-server-basics/sql-server-select-top/) --- # Filtering by column values ![img-center-100](images/filter_slide83.png) ??? + Facilitator reminds students about filtering in Excel --- # In SQL we use `WHERE` -- + Lets you limit which *rows* are selected from the database -- + You define a condition that is either true or false, and every row is evaluated using that ??? + Look back at filters in Excel + Do something similar in a WHERE + `AND` and `OR` + Watch out for precedence with `AND` and `OR` + `NULL` vs empty strings --- # `WHERE` ```sql SELECT TOP 10 * FROM tbl_dailytasks WHERE graffiti = 1 ``` -- + Select the top 10 daily tasks rows where graffiti was present -- + Note that we are using `TOP` along with `WHERE` here so we don't load too much data --- # But `graffiti` is `TRUE` or `FALSE` ![img-center-80](images/graffiti-bit.jpg) -- + We'll be talking about data types soon, but this column is stored as a `BIT`, which is `1` or `0` -- + This is a common way to store values that are true or false -- + In this case, Excel is displaying the values at `TRUE` and `FALSE`, but in the database they are `1` or `0` -- + Learn more about [BIT here](http://www.sqlservertutorial.net/sql-server-basics/sql-server-bit/) --- # `WHERE` ```sql SELECT TOP 10 * FROM tbl_dailytasks WHERE nhours > 1 ``` -- + Select the top 10 daily tasks rows where `nhours` is greater than 1 --- # `WHERE` operators + `=` -- - equal to -- + `>` -- - greater than -- + `>=` -- - greater than or equal to -- + `<` -- - less than -- + `<=` -- - less than or equal to -- + `!=` -- and `<>` -- - not equal to -- + Learn more in [the SQL Server documentation](https://docs.microsoft.com/en-us/sql/t-sql/language-elements/comparison-operators-transact-sql?view=sql-server-2017) --- # Your Turn + Create an SQL query with a `WHERE` clause + Continue using the `tbl_dailytasks` table + Try writing a condition on a column with numbers in it + Make sure you limit the results with `TOP` + If you have time, try writing a condition on a column with text in it --- class:center,middle # What if you want to combine conditions? --- # `WHERE` ```sql SELECT TOP 10 * FROM tbl_dailytasks WHERE graffiti = 1 AND nhours > 1 ``` -- + Select the daily tasks rows where graffiti was present *and* the number of hours taken was over 1 --- # `WHERE` ```sql SELECT TOP 10 * FROM tbl_dailytasks WHERE omppropid = 'X045-01' OR omppropid = 'X045-02' ``` -- + Select the daily tasks rows for `omppropid` equal to either `X045-01` or `X045-02` --- # `WHERE` ```sql SELECT TOP 10 * FROM tbl_dailytasks WHERE NOT (omppropid = 'X045-01' OR omppropid = 'X045-02') ``` -- + What do you think this does? -- + Why would we use parentheses here? -- + Learn more about [SQL's order of evaluation](http://www.peachpit.com/articles/article.aspx?p=1276352&seqNum=6) and [propositional logic](https://www.tutorialspoint.com/discrete_mathematics/discrete_mathematics_propositional_logic.htm) --- # Your Turn + Create an SQL query with a `WHERE` clause + Continue using the `tbl_dailytasks` table + Start with one condition, then try adding conditions using `AND` / `OR` --- # Documentation -- + It never hurts to keep track of your work -- + Can help you remember the steps you took -- + Can help you find errors in your methods -- + In the future, can help you reconstruct your work and adapt it to solve different problems -- + Colleagues and your future self will find it helpful --- class:center,middle # How should we document SQL work? --- # Documentation with SQL -- + If you're working in Excel, you could use a separate worksheet where you track the steps you have taken -- ![img-center-100](images/document_ex1_rev_box.png) --- # Documentation with SQL + If you're not working in Excel, you could keep a simple text document in Notepad where you keep track of the queries you're using -- + Either way, be sure to track the queries you're using and the date that you accessed the data -- + And if you make changes to the data after getting it from the database, track those steps too -- + The queries we're talking about today are documented in your [today's handout](workbook.pdf)—you can take notes there, too --- class:center,middle # WRAP-UP ![img-center-100](images/implementation_guide.png) --- class:center,middle # 15 MIN BREAK --- # Data types -- + Every column in a database table has a type -- + A column's type determines what you can do with that column -- + For example, you can do math on numbers but not on text -- + The database will refuse to store data that doesn't match the column's type -- + Unless you're an administrator, you most likely will not be changing the column's type ??? + Take a look back at data types in Excel + Mention that data types are more rigid in a database + Integers, numeric, datetime, strings --- # Common numeric data types -- + `int` -- - a number with no digits after the decimal point, like `5` or `100349` -- + `numeric` -- - a number with digits after the decimal point, like `5.01` or `100.349` -- + `bit` -- - a number that can only be `0` or `1`; as we mentioned earlier, this is commonly used for boolean (true or false) values --- # Common text data types -- + `varchar` -- - text with a maximum length, like `Queens` -- + `text` -- - text with no maximum length, like an open comment field --- # Other common data types -- + `date` -- - a specific date, like `2002-10-17` -- + `datetime` -- - a specific date and time, like `2002-10-17 06:13:00` -- + Learn more in the [SQL Server documentation](https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql?view=sql-server-2017) --- # Why do data types matter? -- + Databases and SQL are less flexible than Excel here -- + Knowing the data type of a column tells you what to expect in that column -- + For example: `date` columns will never include time -- + Some functions only work on certain data types --- # What data type should zip codes be stored as? -- + `numeric`? -- + `varchar`? -- + How do you handle a zip code like **08904**? -- + or **11368-3398**? --- # Working with Data Types -- + Can get the information on types from the `INFORMATION_SCHEMA` table: -- ```sql SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME='tbl_dailytasks' ``` -- + If you need a column to be a different data type, you might find [`CAST` or `CONVERT` useful](https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql?view=sql-server-2017) --- # Your Turn ??? + Quick demonstration of why data types will matter + Instructor could show how `YEAR()` does not work with just any column -- + There are functions that work only with specific data types -- + For example, `YEAR()` only works on columns that are `date`s -- + See other date and time functions in the [SQL Server documentation](https://docs.microsoft.com/en-us/sql/t-sql/functions/date-and-time-data-types-and-functions-transact-sql?view=sql-server-2017#DateandTimeFunctions) -- + Try selecting the top 10 rows from `tbl_dailytasks` with the columns `date_worked` and `YEAR(date_worked)` --- # Formulas -- + You can do arithmetic and combine values in the ways you're used to in Excel -- + You can also get a modified version of a value ??? + Take a look back at arithmetic in Excel + You can do arithmetic in SQL, too + Order of operations is similar, using parentheses as needed + Show equivalent in SQL to something relatively complicated in Excel + Introduce `AS` for setting the output column name --- # Using formulas with numbers -- + You can use the usual mathematical operators here: `+`, `-`, `/`, `*` -- + For example, you can multiply a column by a number -- + `nhours * 60` -- would find the number of minutes -- ```sql SELECT TOP 10 nhours * 60 FROM tbl_dailytasks ``` --- # Using functions with text -- + Remember the `LEFT` function in Excel? -- ![img-center-80](images/leftfunctionaction.png) -- + It extracts part of a cell's value --- # Using functions with text -- + We can do the same thing with SQL -- + `LEFT` extracts part of a string -- + The format is `LEFT(column, length)` -- + `LEFT(omppropid, 1)` would select the first letter from the `omppropid` column -- ```sql SELECT TOP 10 LEFT(omppropid, 1) FROM tbl_dailytasks ``` -- + Find more functions in the [SQL Server Documentation](https://docs.microsoft.com/en-us/sql/t-sql/functions/string-functions-transact-sql?view=sql-server-2017) --- # Your Turn + Look at the documentation for [mathematical functions](https://docs.microsoft.com/en-us/sql/t-sql/functions/mathematical-functions-transact-sql?view=sql-server-2017) and [string functions](https://docs.microsoft.com/en-us/sql/t-sql/functions/string-functions-transact-sql?view=sql-server-2017) + Write an SQL query that uses one of the functions that you found --- # Expanding boro names ## Remember this from Excel II? ![img-center-100](images/excel-nestedifs.png) --- # Expanding boro names ## You can do something similar in SQL! -- ```sql SELECT TOP 10 CASE WHEN LEFT(omppropid, 1) = 'B' THEN 'Brooklyn' WHEN LEFT(omppropid, 1) = 'M' THEN 'Manhattan' WHEN LEFT(omppropid, 1) = 'X' THEN 'Bronx' WHEN LEFT(omppropid, 1) = 'R' THEN 'Staten Island' WHEN LEFT(omppropid, 1) = 'Q' THEN 'Queens' END AS boro FROM tbl_dailytasks WHERE LEFT(omppropid, 1) IS NOT NULL ``` --- # Data Analytics Tasks: Sorting ??? + Facilitator discusses briefly the value of sorting, soliciting examples from participants of when they've sorted data for their work -- + Reorganize rows in a dataset based on the values in a column -- + It's possible to sort on multiple columns --- # Sorting in SQL -- + Use `ORDER BY` after `WHERE`, followed by the columns you want to sort the rows by -- ```sql SELECT TOP 10 * FROM tbl_dailytasks ORDER BY date_worked ``` -- + Want to get the most recently worked tasks? Do a *descending* sort using `DESC` -- ```sql SELECT TOP 10 * FROM tbl_dailytasks ORDER BY date_worked DESC ``` --- # Sorting in SQL -- + Sort by a list of columns using `,` as a separator -- ```sql SELECT TOP 10 * FROM tbl_dailytasks ORDER BY sector, date_worked DESC ``` Sorts by `sector` first, then within each sector sorts the rows by `date_worked` --- # Your Turn -- + Use `ORDER BY` to sort `tbl_dailytasks` using another column or two -- + Experiment with descending sort (`DESC`) --- # Data Analytics Tasks: Aggregating Data ??? + Facilitator reminds participants we've already done this work with the available data + Facilitator prompts participants to think of other ways they aggregate data for insights -- + Trends only become clear in aggregate -- + Often where you discover the "so what" -- + Aggregating data meaningfully can be tricky -- ![img-center-70](images/parks-inspections-visitors.png) --- # Aggregation in Excel -- + Remember pivot tables? -- + ...and `COUNTIF` and `SUMIF`? -- ![img-center-80](images/countiffunctionaction.png) -- + We're going to work through something similar using SQL --- # Aggregation in SQL -- + You often don't want or need every single row from a database -- + For example, if you want to count all of the rows matching some condition you don't want to load all of them in Excel, then count them -- + When you can, *make the database do the work* ??? + Look back at COUNTIF/SUMIF pivot tables in Excel + Do something similar using a combination of GROUP BY and WHERE --- # `GROUP BY` -- + You can add `GROUP BY` after a `WHERE` in a query to group by a certain column -- + For example if you want to summarize data by `sector`, then you would use `GROUP BY sector` --- # Aggregation functions -- + `GROUP BY` isn't useful by itself -- + You will usually want to aggregate some other columns -- + Aggregation functions let you do this --- # Aggregation functions -- + `COUNT` -- - count the number of values, usually used as `COUNT(*)` -- + `SUM` -- - sum the values, for example `SUM(nhours)` -- + `AVG` -- - average the values, such as `AVG(nhours)` -- + `MIN` -- - find the smallest value, such as `MIN(nhours)` -- + `MAX` -- - find the largest value, such as `MAX(nhours)` --- # Aggregation functions -- + Putting it all together, we can use `SUM(nhours)` and `GROUP BY sector` -- ```sql SELECT sector, SUM(nhours) FROM tbl_dailytasks GROUP BY sector ``` -- + You usually want to include the column you grouped by in the `SELECT` so you can see which group is being talked about --- # Your Turn + Aggregate the data again + Use the `tbl_dailytasks` table + In your SQL, you should use `GROUP BY` and one of the aggregation functions on the previous slide + Feel free to experiment --- # What We've Covered So Far -- + Getting data from a database in Excel -- + Choosing which *columns* we want from a table using SQL -- + Choosing which *rows* we want from a table using SQL -- + Aggregating data using SQL -- + ...anything else? --- class:center,middle # LUNCH --- class:center, middle # WELCOME BACK! --- class:center, middle # What questions came up over lunch? --- # Recap -- + `SELECT` -- - picks columns from your table -- + `FROM` -- - picks a table -- + `WHERE` -- - decides which rows are selected -- + `GROUP BY` -- - aggregates data based on a column --- class:center, middle # Mind if we complicate things a little more? --- class:center, middle # Specifically, what if you need to look at data from *two or more* tables? --- class:center, middle # Then you might need a *join* --- # Joins -- + Allow you to bring in columns from mulitple database tables at once -- + Require some common column between the tables in order to determine which values are associated with which -- + Are a little tricky to write -- ...but are very powerful --- # Joins in Excel -- ## Remember `VLOOKUP`? -- ![img-center-90](images/vlookupfunction.png) --- # Joins in Excel ![img-center-100](images/vlookupfunctionaction.png) --- # Joins in SQL -- + As with `VLOOKUP`, you will need a common column between two tables -- + For the join to work, you need the values in these columns to match *exactly* and have the same data type --- # Joins in SQL -- + There are numerous types of joins, but we'll work with a `LEFT` join today -- + `LEFT` joins keep all of the rows from the *left* table, combined with the *right* table where there are matches -- + The format of a join is: `LEFT JOIN <table_name> ON <condition>` -- + For example, `LEFT JOIN tbl_PIP_AllSites AS sites ON pip.[Prop ID] = sites.[Prop ID]` --- class:center,middle # Let's use a join to add park names to the PIP Inspection data --- # Joins in SQL ??? + Instructor walks through named tables, this is the first we're seeing them -- ## The full example is: ```sql SELECT pip.*, sites.[Prop Name], sites.[Site Name] FROM tbl_PIP_InspectionMain AS pip LEFT JOIN tbl_PIP_AllSites AS sites ON pip.[Prop ID] = sites.[Prop ID] ``` -- + What is familiar here? + What is unfamiliar? --- # Your Turn + Try running the join example we worked through together: ```sql SELECT pip.*, sites.[Prop Name], sites.[Site Name] FROM tbl_PIP_InspectionMain AS pip LEFT JOIN tbl_PIP_AllSites AS sites ON pip.[Prop ID] = sites.[Prop ID] ``` + Add more columns from `tbl_PIP_AllSites` to the `SELECT` columns --- # Other kinds of joins -- + Sometimes a `LEFT` join won't do quite what you need -- + There are other types of joins that you may need to use, such as `RIGHT` and `INNER` -- + Different joins treat matches and mismatches between the joined tables differently -- + For example, `INNER` joins will only get results where there is a match between both tables -- + Learn more at [A Visual Explanation of SQL Joins](https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/) --- # Left join ![img-center-60](images/joins-venn-outer-left.png) ## Keeps all records in the left table with matching entries (if any) from the right table --- # Inner join ![img-center-60](images/joins-venn-inner.png) ## Keeps only records that match in the left table and right table --- # Right join ![img-center-60](images/joins-venn-outer-right.png) ## Keeps all records in the right table with matching entries (if any) from the left table --- ![img-center-50](images/joins-julia-evans.jpeg) #### Image Source: [Julia Evans on Twitter](https://twitter.com/b0rk/status/1177642768484712449) --- class:center,middle # WRAP-UP --- class:center,middle # 15 MIN BREAK --- # Final Exercise -- + Consider the SQL we just talked about -- + Think about some questions you could ask of the PIP and Daily Tasks data (or some other database tables you have access to) -- + Think about data visualizations you could do to answer the questions -- + You may find *process mapping* and ideation useful as you come up with ideas --- # Process Mapping ![img-center-100](images/Makebreakfast.gif) .caption[By Scottsm1991 (Own work) [CC BY-SA 3.0](http://creativecommons.org/licenses/by-sa/3.0), via Wikimedia Commons] --- # Final Exercise -- + Create a new Excel workbook -- + Create a worksheet in the workbook that will contain your documentation -- + Try to write SQL queries for each of the questions that you established earlier, making a new worksheet for each -- + Make a data visualization with at least one of these worksheets -- + [Click to submit your work](https://script.google.com/a/macros/datapolitan.com/s/AKfycbwMcE5pcJwZHdbSCN_2epwaXnRLSRPaLbHCAhxaZJ79UXaRpQ_l/exec) --- class:center, middle # Presentations --- class:center, middle # Wrap-up --- # What We've Covered -- + Introduction to databases -- + Introduction to SQL -- + Using SQL in Excel to gather and analyze data -- + Did we miss anything? --- # Optimizing queries -- + What if you try to load data and it takes too long? -- + Try to minimize the number of *rows* and *columns* you are pulling out of the database -- + Remember you can use `SELECT` to limit the columns -- + And you can use `WHERE` to limit the rows -- + If you only need a summary of the data, consider using aggregation (`GROUP BY`) ??? + Sometimes you can write inefficient queries + Common pitfalls, how to recognize a slow query + Common ways to speed things up (such only selecting what you need to select) + Mention indexes? --- # SSMS ![img-center-100](images/ssms_query.png) --- # SSMS ![img-center-80](images/ssms_query_1.png) --- # SSMS + Now that you know some SQL, you may be interested in SSMS (SQL Server Management Studio) -- + SSMS is a free tool for querying databases -- + You might test an SQL query in SSMS, then use it in Excel -- + If you don’t already have it installed, email the IT Help Desk at
Parks-IT.HelpDesk@parks.nyc.gov
and mention that you attended the Advanced Data Analysis with SQL class --- # Resources ??? + Facilitator provides key resources for continued learning -- + [SQL: One of the Most Valuable Skills](http://www.craigkerstiens.com/2019/02/12/sql-most-valuable-skill/) -- - more about why you might want to learn SQL -- + [Microsoft SQL Documentation](https://docs.microsoft.com/en-us/sql/?view=sql-server-2017) -- - Includes anything you might want to know about SQL Server -- + [SQL Zoo](https://sqlzoo.net/wiki/Main_Page) -- - Interactive SQL tutorial -- + [SQL Server Management Studio Intro](https://www.youtube.com/watch?v=moCgS2kfxnU) -- - Video overview -- + [SQL Server Management Studio tutorials](https://docs.microsoft.com/en-us/sql/ssms/tutorials/tutorial-sql-server-management-studio?view=sql-server-2017) -- - Microsoft's tutorials for SSMS -- + [Copy of today's handout](workbook.pdf) -- + [Datapolitan training classes](http://training.datapolitan.com/) --- # .center[Contact Information] ## Eric Brelsford
Email: ebrelsford[at]gmail[dot]com
Twitter:
@ebrelsford
## Richard Dunks
Email: richard[at]datapolitan[dot]com
Website:
http://www.datapolitan.com
Twitter:
@datapolitan
--- class:center, middle # THANK YOU! # Please take a moment to
provide your feedback on class today
## And checkout [the dashboard of your responses](https://datastudio.google.com/s/kSGuvoygzjg)