How to speak to databases
Databases are the backbone of most of the digital world, powering everything from websites and mobile applications to software programs. And in order to communicate with them, you need to speak their native tongue, which is SQL.
SQL, or Structured Query Language, is a standard language used to communicate with relational databases. If you want to retrieve, add, delete or edit information on a database, the easiest way to do it is through SQL. If you have no idea what a database is, you can visualize it for now as a spreadsheet (Excel file) : a collection of rows and columns. Altough we will see later that there are differences between these 2 systems, but it’s better to simplify things than complicate them, especially when you’re just starting.
Although SQL is not the only language in the market, especially with the rise of Big Data technologies, but it’s still the dominant and most popular way to communicate with databases :
As the name suggests, this languages uses queries in order to communicate. A query is simply a request to you ask the database. Queries are divided into two major categories: data definition language (DDL) and data manipulation language (DML).
First things first
Let’s focus first on the DML queries since these are the ones used 90% of the time when handling data, and most of those times you’ll be using this one magic query : SELECT.
Here is the anatomy of a SELECT query, through the most common commands tested during SQL job interviews :
- `SELECT` — used to select specific columns from a table
- `FROM` — used to specify the table that contains the columns you are SELECT’ing
- `WHERE` — used to specify which rows to pick
- `GROUP BY` — used to group rows with similar values together
- `HAVING` — used to specify which groups to include, that were formed by the GROUP BY clause.
- `ORDER BY` — used to order the rows in the result set, either in ascending or descending order
- `LIMIT` — used to limit the number of rows returned
Meet the joins
One of the most useful features of SQL is its ability to join data from different sources. Merging two data sets can be accomplished through JOINS. A JOIN is a SQL instruction in the FROM clause of your query that is used to identify the tables you are querying and how they should be combined.
There different types of joins, here are the most common ones :
- An Outer join combines the columns from all tables on one or more common dimension when possible, and includes all data from all tables.
- An Inner join only includes data for the columns that share the same values in the common N column(s)
- A Left join returns all the values from an Inner join, plus all the other values of the Left table. You can also use a right Join if you want to keep the values of the right table.
L’Union fait la force
Another useful way to combine data in SQL is through UNION.
A Union will stack tables that have the exact same structure (same columns) on top of each other resulting in new rows.
In the next post we can check out some other useful SQL functionalities.
Friendly reminder, learning programming is 10% tutorials and 90% actually writing code, so if you want to start manipulating data using SQL, head to w3schools, there you will find interactive exercices to practice talking with machines.