{ "cells": [ { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The sql extension is already loaded. To reload it, use:\n", " %reload_ext sql\n", "Done.\n", "Done.\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%load_ext sql\n", "%sql sqlite:///complaint.db\n", "import time\n", "%sql drop index if exists helpful_index;\n", "%sql analyze" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Activity 12: Indexes\n", "===========\n", "\n", "Let's play with the [consumer complaint database](https://catalog.data.gov/dataset/consumer-complaint-database) from data.gov" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", "
count(*)
471871
" ], "text/plain": [ "[(471871,)]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql select count(*) from complaints;" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Date_receivedProductSubproductIssueSubissueConsumer_narrativeCompany_public_responseCompanyStateZIP_codeSubmitted_viaDate_sent_to_companyCompany_responseTimely_responseConsumer_disputedComplaint_ID
10/28/2015Debt collectionOther (phone, health club, etc.)Cont'd attempts collect debt not owedDebt was paidCompany believes it acted appropriately as authorized by contract or lawFirst National Collection Bureau, Inc.CO810XXWeb10/28/2015Closed with explanationYes1629130
10/28/2015Credit reportingIncorrect information on credit reportPublic recordCompany chooses not to provide a public responseFNIS (Fidelity National Information Services, Inc.)MI48334Web10/28/2015Closed with explanationYes1628785
10/28/2015Money transfersInternational money transferFraud or scamNavy FCUVA221XXWeb10/28/2015In progressYes1629655
10/28/2015Debt collectionCont'd attempts collect debt not owedDebt is not mineEOS Holdings, Inc.FL321XXWeb10/28/2015Closed with explanationYesYes1629306
10/28/2015Debt collectionMedicalCont'd attempts collect debt not owedDebt was paidTransworld Systems Inc.CA961XXWeb10/28/2015In progressYes1629407
" ], "text/plain": [ "[(u'10/28/2015', u'Debt collection', u'Other (phone, health club, etc.)', u\"Cont'd attempts collect debt not owed\", u'Debt was paid', u'', u'Company believes it acted appropriately as authorized by contract or law', u'First National Collection Bureau, Inc.', u'CO', u'810XX', u'Web', u'10/28/2015', u'Closed with explanation', u'Yes', u'', u'1629130'),\n", " (u'10/28/2015', u'Credit reporting', u'', u'Incorrect information on credit report', u'Public record', u'', u'Company chooses not to provide a public response', u'FNIS (Fidelity National Information Services, Inc.)', u'MI', u'48334', u'Web', u'10/28/2015', u'Closed with explanation', u'Yes', u'', u'1628785'),\n", " (u'10/28/2015', u'Money transfers', u'International money transfer', u'Fraud or scam', u'', u'', u'', u'Navy FCU', u'VA', u'221XX', u'Web', u'10/28/2015', u'In progress', u'Yes', u'', u'1629655'),\n", " (u'10/28/2015', u'Debt collection', u'', u\"Cont'd attempts collect debt not owed\", u'Debt is not mine', u'', u'', u'EOS Holdings, Inc.', u'FL', u'321XX', u'Web', u'10/28/2015', u'Closed with explanation', u'Yes', u'Yes', u'1629306'),\n", " (u'10/28/2015', u'Debt collection', u'Medical', u\"Cont'd attempts collect debt not owed\", u'Debt was paid', u'', u'', u'Transworld Systems Inc.', u'CA', u'961XX', u'Web', u'10/28/2015', u'In progress', u'Yes', u'', u'1629407')]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql select * from complaints limit 5;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1: Query without an index\n", "\n", "First, let's start off by writing a query to find the **counts of the top 5 Product, State pairs** in the complaints database (return the product and state as well as the count). Use the single-line syntax for simple timing so we can see how long the query takes:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n", "CPU times: user 791 ms, sys: 78.3 ms, total: 869 ms\n", "Wall time: 888 ms\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ProductStatec
MortgageCA29434
MortgageFL18659
Debt collectionCA11066
MortgageNY10002
Credit reportingCA9402
" ], "text/plain": [ "[(u'Mortgage', u'CA', 29434),\n", " (u'Mortgage', u'FL', 18659),\n", " (u'Debt collection', u'CA', 11066),\n", " (u'Mortgage', u'NY', 10002),\n", " (u'Credit reporting', u'CA', 9402)]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time %sql SELECT product, state, count(*) AS c FROM complaints GROUP BY product, state ORDER BY c DESC LIMIT 5;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exerise 2: Single search key index\n", "\n", "Now create a _single-key_ index such that the above query is faster! The syntax to create an index in SQL is:\n", "> DROP INDEX IF EXISTS index_name;\n", "> CREATE INDEX index_name ON table(attributes);" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n", "Done.\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "DROP INDEX IF EXISTS state_index;\n", "CREATE INDEX state_index ON complaints(state);" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n", "CPU times: user 791 ms, sys: 71.9 ms, total: 862 ms\n", "Wall time: 879 ms\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ProductStatec
MortgageCA29434
MortgageFL18659
Debt collectionCA11066
MortgageNY10002
Credit reportingCA9402
" ], "text/plain": [ "[(u'Mortgage', u'CA', 29434),\n", " (u'Mortgage', u'FL', 18659),\n", " (u'Debt collection', u'CA', 11066),\n", " (u'Mortgage', u'NY', 10002),\n", " (u'Credit reporting', u'CA', 9402)]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time %sql SELECT product, state, count(*) AS c FROM complaints GROUP BY product, state ORDER BY c DESC LIMIT 5;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3\n", "\n", "Now, create a _covering_ index for the query and then see how long it takes to run!" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n", "Done.\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "DROP INDEX IF EXISTS state_product_index;\n", "CREATE INDEX state_product_index ON complaints(state, product);" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n", "CPU times: user 110 ms, sys: 6.58 ms, total: 117 ms\n", "Wall time: 116 ms\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ProductStatec
MortgageCA29434
MortgageFL18659
Debt collectionCA11066
MortgageNY10002
Credit reportingCA9402
" ], "text/plain": [ "[(u'Mortgage', u'CA', 29434),\n", " (u'Mortgage', u'FL', 18659),\n", " (u'Debt collection', u'CA', 11066),\n", " (u'Mortgage', u'NY', 10002),\n", " (u'Credit reporting', u'CA', 9402)]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time %sql SELECT product, state, count(*) AS c FROM complaints GROUP BY product, state ORDER BY c DESC LIMIT 5;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3(b):\n", "\n", "Use EXPLAIN to see if sqlite used/recognized your covering index. EXPLAIN is an operator that tells SQL to explain its query plan... we'll look into this in more depth later. For now, the syntax is:\n", "> EXPLAIN QUERY PLAN your_query_here;" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
selectidorderfromdetail
000SCAN TABLE complaints USING COVERING INDEX state_product_index
000USE TEMP B-TREE FOR ORDER BY
" ], "text/plain": [ "[(0, 0, 0, u'SCAN TABLE complaints USING COVERING INDEX state_product_index'),\n", " (0, 0, 0, u'USE TEMP B-TREE FOR ORDER BY')]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "EXPLAIN QUERY PLAN\n", "SELECT product, state, count(*) AS c\n", "FROM complaints\n", "GROUP BY product, state ORDER BY c DESC LIMIT 5;" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }