{
"cells": [
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The sql extension is already loaded. To reload it, use:\n",
" %reload_ext sql\n",
"Done.\n",
"Done.\n"
]
},
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%load_ext sql\n",
"%sql sqlite:///complaint.db\n",
"import time\n",
"%sql drop index if exists helpful_index;\n",
"%sql analyze"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Activity 12: Indexes\n",
"===========\n",
"\n",
"Let's play with the [consumer complaint database](https://catalog.data.gov/dataset/consumer-complaint-database) from data.gov"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n"
]
},
{
"data": {
"text/html": [
"
\n",
" \n",
" count(*) | \n",
"
\n",
" \n",
" 471871 | \n",
"
\n",
"
"
],
"text/plain": [
"[(471871,)]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql select count(*) from complaints;"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" Date_received | \n",
" Product | \n",
" Subproduct | \n",
" Issue | \n",
" Subissue | \n",
" Consumer_narrative | \n",
" Company_public_response | \n",
" Company | \n",
" State | \n",
" ZIP_code | \n",
" Submitted_via | \n",
" Date_sent_to_company | \n",
" Company_response | \n",
" Timely_response | \n",
" Consumer_disputed | \n",
" Complaint_ID | \n",
"
\n",
" \n",
" 10/28/2015 | \n",
" Debt collection | \n",
" Other (phone, health club, etc.) | \n",
" Cont'd attempts collect debt not owed | \n",
" Debt was paid | \n",
" | \n",
" Company believes it acted appropriately as authorized by contract or law | \n",
" First National Collection Bureau, Inc. | \n",
" CO | \n",
" 810XX | \n",
" Web | \n",
" 10/28/2015 | \n",
" Closed with explanation | \n",
" Yes | \n",
" | \n",
" 1629130 | \n",
"
\n",
" \n",
" 10/28/2015 | \n",
" Credit reporting | \n",
" | \n",
" Incorrect information on credit report | \n",
" Public record | \n",
" | \n",
" Company chooses not to provide a public response | \n",
" FNIS (Fidelity National Information Services, Inc.) | \n",
" MI | \n",
" 48334 | \n",
" Web | \n",
" 10/28/2015 | \n",
" Closed with explanation | \n",
" Yes | \n",
" | \n",
" 1628785 | \n",
"
\n",
" \n",
" 10/28/2015 | \n",
" Money transfers | \n",
" International money transfer | \n",
" Fraud or scam | \n",
" | \n",
" | \n",
" | \n",
" Navy FCU | \n",
" VA | \n",
" 221XX | \n",
" Web | \n",
" 10/28/2015 | \n",
" In progress | \n",
" Yes | \n",
" | \n",
" 1629655 | \n",
"
\n",
" \n",
" 10/28/2015 | \n",
" Debt collection | \n",
" | \n",
" Cont'd attempts collect debt not owed | \n",
" Debt is not mine | \n",
" | \n",
" | \n",
" EOS Holdings, Inc. | \n",
" FL | \n",
" 321XX | \n",
" Web | \n",
" 10/28/2015 | \n",
" Closed with explanation | \n",
" Yes | \n",
" Yes | \n",
" 1629306 | \n",
"
\n",
" \n",
" 10/28/2015 | \n",
" Debt collection | \n",
" Medical | \n",
" Cont'd attempts collect debt not owed | \n",
" Debt was paid | \n",
" | \n",
" | \n",
" Transworld Systems Inc. | \n",
" CA | \n",
" 961XX | \n",
" Web | \n",
" 10/28/2015 | \n",
" In progress | \n",
" Yes | \n",
" | \n",
" 1629407 | \n",
"
\n",
"
"
],
"text/plain": [
"[(u'10/28/2015', u'Debt collection', u'Other (phone, health club, etc.)', u\"Cont'd attempts collect debt not owed\", u'Debt was paid', u'', u'Company believes it acted appropriately as authorized by contract or law', u'First National Collection Bureau, Inc.', u'CO', u'810XX', u'Web', u'10/28/2015', u'Closed with explanation', u'Yes', u'', u'1629130'),\n",
" (u'10/28/2015', u'Credit reporting', u'', u'Incorrect information on credit report', u'Public record', u'', u'Company chooses not to provide a public response', u'FNIS (Fidelity National Information Services, Inc.)', u'MI', u'48334', u'Web', u'10/28/2015', u'Closed with explanation', u'Yes', u'', u'1628785'),\n",
" (u'10/28/2015', u'Money transfers', u'International money transfer', u'Fraud or scam', u'', u'', u'', u'Navy FCU', u'VA', u'221XX', u'Web', u'10/28/2015', u'In progress', u'Yes', u'', u'1629655'),\n",
" (u'10/28/2015', u'Debt collection', u'', u\"Cont'd attempts collect debt not owed\", u'Debt is not mine', u'', u'', u'EOS Holdings, Inc.', u'FL', u'321XX', u'Web', u'10/28/2015', u'Closed with explanation', u'Yes', u'Yes', u'1629306'),\n",
" (u'10/28/2015', u'Debt collection', u'Medical', u\"Cont'd attempts collect debt not owed\", u'Debt was paid', u'', u'', u'Transworld Systems Inc.', u'CA', u'961XX', u'Web', u'10/28/2015', u'In progress', u'Yes', u'', u'1629407')]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql select * from complaints limit 5;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise 1: Query without an index\n",
"\n",
"First, let's start off by writing a query to find the **counts of the top 5 Product, State pairs** in the complaints database (return the product and state as well as the count). Use the single-line syntax for simple timing so we can see how long the query takes:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n",
"CPU times: user 791 ms, sys: 78.3 ms, total: 869 ms\n",
"Wall time: 888 ms\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" Product | \n",
" State | \n",
" c | \n",
"
\n",
" \n",
" Mortgage | \n",
" CA | \n",
" 29434 | \n",
"
\n",
" \n",
" Mortgage | \n",
" FL | \n",
" 18659 | \n",
"
\n",
" \n",
" Debt collection | \n",
" CA | \n",
" 11066 | \n",
"
\n",
" \n",
" Mortgage | \n",
" NY | \n",
" 10002 | \n",
"
\n",
" \n",
" Credit reporting | \n",
" CA | \n",
" 9402 | \n",
"
\n",
"
"
],
"text/plain": [
"[(u'Mortgage', u'CA', 29434),\n",
" (u'Mortgage', u'FL', 18659),\n",
" (u'Debt collection', u'CA', 11066),\n",
" (u'Mortgage', u'NY', 10002),\n",
" (u'Credit reporting', u'CA', 9402)]"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time %sql SELECT product, state, count(*) AS c FROM complaints GROUP BY product, state ORDER BY c DESC LIMIT 5;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exerise 2: Single search key index\n",
"\n",
"Now create a _single-key_ index such that the above query is faster! The syntax to create an index in SQL is:\n",
"> DROP INDEX IF EXISTS index_name;\n",
"> CREATE INDEX index_name ON table(attributes);"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n",
"Done.\n"
]
},
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"DROP INDEX IF EXISTS state_index;\n",
"CREATE INDEX state_index ON complaints(state);"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n",
"CPU times: user 791 ms, sys: 71.9 ms, total: 862 ms\n",
"Wall time: 879 ms\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" Product | \n",
" State | \n",
" c | \n",
"
\n",
" \n",
" Mortgage | \n",
" CA | \n",
" 29434 | \n",
"
\n",
" \n",
" Mortgage | \n",
" FL | \n",
" 18659 | \n",
"
\n",
" \n",
" Debt collection | \n",
" CA | \n",
" 11066 | \n",
"
\n",
" \n",
" Mortgage | \n",
" NY | \n",
" 10002 | \n",
"
\n",
" \n",
" Credit reporting | \n",
" CA | \n",
" 9402 | \n",
"
\n",
"
"
],
"text/plain": [
"[(u'Mortgage', u'CA', 29434),\n",
" (u'Mortgage', u'FL', 18659),\n",
" (u'Debt collection', u'CA', 11066),\n",
" (u'Mortgage', u'NY', 10002),\n",
" (u'Credit reporting', u'CA', 9402)]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time %sql SELECT product, state, count(*) AS c FROM complaints GROUP BY product, state ORDER BY c DESC LIMIT 5;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise 3\n",
"\n",
"Now, create a _covering_ index for the query and then see how long it takes to run!"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n",
"Done.\n"
]
},
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"DROP INDEX IF EXISTS state_product_index;\n",
"CREATE INDEX state_product_index ON complaints(state, product);"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n",
"CPU times: user 110 ms, sys: 6.58 ms, total: 117 ms\n",
"Wall time: 116 ms\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" Product | \n",
" State | \n",
" c | \n",
"
\n",
" \n",
" Mortgage | \n",
" CA | \n",
" 29434 | \n",
"
\n",
" \n",
" Mortgage | \n",
" FL | \n",
" 18659 | \n",
"
\n",
" \n",
" Debt collection | \n",
" CA | \n",
" 11066 | \n",
"
\n",
" \n",
" Mortgage | \n",
" NY | \n",
" 10002 | \n",
"
\n",
" \n",
" Credit reporting | \n",
" CA | \n",
" 9402 | \n",
"
\n",
"
"
],
"text/plain": [
"[(u'Mortgage', u'CA', 29434),\n",
" (u'Mortgage', u'FL', 18659),\n",
" (u'Debt collection', u'CA', 11066),\n",
" (u'Mortgage', u'NY', 10002),\n",
" (u'Credit reporting', u'CA', 9402)]"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time %sql SELECT product, state, count(*) AS c FROM complaints GROUP BY product, state ORDER BY c DESC LIMIT 5;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise 3(b):\n",
"\n",
"Use EXPLAIN to see if sqlite used/recognized your covering index. EXPLAIN is an operator that tells SQL to explain its query plan... we'll look into this in more depth later. For now, the syntax is:\n",
"> EXPLAIN QUERY PLAN your_query_here;"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" selectid | \n",
" order | \n",
" from | \n",
" detail | \n",
"
\n",
" \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" SCAN TABLE complaints USING COVERING INDEX state_product_index | \n",
"
\n",
" \n",
" 0 | \n",
" 0 | \n",
" 0 | \n",
" USE TEMP B-TREE FOR ORDER BY | \n",
"
\n",
"
"
],
"text/plain": [
"[(0, 0, 0, u'SCAN TABLE complaints USING COVERING INDEX state_product_index'),\n",
" (0, 0, 0, u'USE TEMP B-TREE FOR ORDER BY')]"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"EXPLAIN QUERY PLAN\n",
"SELECT product, state, count(*) AS c\n",
"FROM complaints\n",
"GROUP BY product, state ORDER BY c DESC LIMIT 5;"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}