CoCalc Logo Icon
StoreFeaturesDocsShareSupportNewsAboutSign UpSign In

Real-time collaboration for Jupyter Notebooks, Linux Terminals, LaTeX, VS Code, R IDE, and more,
all in one place. Commercial Alternative to JupyterHub.

| Download
Views: 175
Image: ubuntu2204
Kernel: Python 3 (Colab)

Exploratory Data Anaysis with SQL

I was tasked with discovering information by performing SQL inquiries, these were prompted queries to demonstrate proficiency in running SQL queries and subqueries.

%load_ext sql
import csv, sqlite3 con = sqlite3.connect("my_data1.db") cur = con.cursor()
%sql sqlite:///my_data1.db
import pandas as pd df = pd.read_csv("https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DS0321EN-SkillsNetwork/labs/module_2/data/Spacex.csv") df.to_sql("SPACEXTBL", con, if_exists='replace', index=False,method="multi")
101

Removing any blank rows from the table

%sql create table SPACEXTABLE as select * from SPACEXTBL where Date is not null
* sqlite:///my_data1.db Done.
[]

First I looked for the names of the unique launch sites in the space mission

%sql select DISTINCT "Launch_Site" from SPACEXTABLE
* sqlite:///my_data1.db Done.

Next I looked for records where launch sites begin with the string 'CCA'

%sql select * from SPACEXTABLE where "Launch_Site" LIKE 'CCA%' limit 5
* sqlite:///my_data1.db Done.

Next I looked for the total payload mass carried by boosters launched by NASA (CRS)

%sql select SUM(PAYLOAD_MASS__KG_) from SPACEXTABLE where "Customer" like 'NASA (CRS)%'
* sqlite:///my_data1.db Done.

Next I looked for the average payload mass carried by booster version F9 v1.1

%sql select AVG(PAYLOAD_MASS__KG_) from SPACEXTABLE where "Booster_Version" LIKE 'F9 v1.1%'
* sqlite:///my_data1.db Done.

Looking for the date when the first succesful landing outcome in ground pad was acheived.

%sql select MIN("Date") from SPACEXTABLE where "Landing_Outcome"='Success (ground pad)'
* sqlite:///my_data1.db Done.

Looking for the total number of successful and failure mission outcomes

%sql select "Mission_Outcome", count("Mission_Outcome") from SPACEXTABLE GROUP BY "Mission_Outcome"
* sqlite:///my_data1.db Done.

Looking for the records which will display the month names, failure landing_outcomes in drone ship ,booster versions, launch_site for the months in year 2015.

%sql SELECT CASE SUBSTR("Date", 6, 2)WHEN '01' THEN 'January'WHEN '02' THEN 'February'WHEN '03' THEN 'March'WHEN '04' THEN 'April'WHEN '05' THEN 'May'WHEN '06' THEN 'June'WHEN '07' THEN 'July'WHEN '08' THEN 'August'WHEN '09' THEN 'September'WHEN '10' THEN 'October'WHEN '11' THEN 'November'WHEN '12' THEN 'December'ELSE 'Unknown' END AS month,"Landing_Outcome"='Failure (drone ship)',"Booster_Version","Launch_Site"FROM SPACEXTABLE WHERE SUBSTR("Date", 0, 5) = '2015';
* sqlite:///my_data1.db Done.

Here I attempted to rank the landing outcomes by multiple criterion, and within a specific date range.

%sql select "Landing_Outcome",count("Landing_Outcome") from SPACEXTABLE where "Date" between 20100604 and 20170320 group by "Landing_Outcome" order by 2 desc
* sqlite:///my_data1.db Done.