SQL Queries You Need to Know in Health Informatics
In this blog post, I am going to go over some common SQL queries that are used in the health informatics field. Experience writing SQL queries is a common skill listed on many health informatics roles because knowing how to "slice and dice" raw, unstructured into actionable insights can be a large part of the role. Healthcare data is oftentimes messy and can come from multiple sources: it can be entered in manually, it can be sourced from the EHR/EMR, it can come from labs, the possibilities are endless! Nevertheless it may be your role to make sense of all this data and tell the story of the patient's medical history.
SQL Joins
SQL joins are one of the first commands that you learn when programming in SQL. Some of the important syntax you will want to know from your SQL queries is how to join different tables, which houses your healthcare data, using left, right, inner and outer joins.
In the below image I have two example tables, Table 1 has some patient demographic information such as the patient's name and gender and Table 2 indicates if the patient has a positive or negative COVID-19 test. We often will want to join information from matching rows in one table together with another table so we can view the data as one large table and do further analysis on this single table (e.g. counting the number of rows in this single table).
To "combine" data from Table 1 and Table 2 into one table which contains both Patient Demographic and Patient Health Condition information, we need to use the join commands and have a common column which is shared between the tables we are joining. In this case, the common column or key between Patient Demographics and Patient Health Condition which we will join on, is the Patient Health Card Number column.
With the Left Join command when you join Table 1 and Table 2 together on the Patient Health Card Number columns in each table, you will end up a table with just two rows containing Joaquin and Joanna's records because a Left Join joins all the rows in the left table and all the matching rows in the right table and Billy, Jillian and Chad are not present in the left table which is Table 1.
Similarly with the Right Join when you join Table 1 and Table 2 together you will have three nulls with your Right Join because a Right Join joins all the rows in the right table and all the matching tables in the left table which are are just Joaquin and Joanna's records.
The other types of Joins that are common with SQL queries are Inner Joins which returns only matching records which are both in Table 1 and Table 2 and a Full Join which is usually used for specific business cases where you return records both in Table 1 and Table 2 and will output nulls when there is no matching between tables.
Once you have joined your tables together to show the relevant data you want to query, that's when one can start getting into using aggregations such as COUNT, DISTINCT, SUM, AVERAGE and much more, to output certain calculations.
For instance, one query you might write:
SELECT COUNT(Patient Health Card Number. Patient Health Condition)
FROM Patient Demographics
LEFT JOIN Patient Health Condition
ON Patient Demographics. Patient Health Card Number = Patient Health Condition. Patient Health Card Number
The result of this query counts the number of patients who have matching records between Table 1 and Table 2. In this toy example, this can be eye-balled but when you have hundreds or thousands of patients being added daily to your database, writing these quick queries are essential for getting quick insights from your data.
Wildcard statements
Wildcard statements can allow you to filter the data by selecting specific conditions. For instance, if you only wanted to count the number of patients who have a first name which starts with "J" the query would look like this:
SELECT COUNT(Patient Health Card Number. Patient Health Condition)
FROM Patient Demographics
LEFT JOIN Patient Health Condition
ON Patient Demographics. Patient Health Card Number = Patient Health Condition. Patient Health Card Number
WHERE Patient Name LIKE 'J%'
Group By
Another important function when learning SQL is to learn how to use "Group By" statements, as the name suggests, this function groups rows by certain criteria. For instance you could group your patients by, have they been diagnosed with a positive or negative test for COVID-19.
SELECT COUNT(Patient Health Card Number. Patient Health Condition)
FROM Patient Demographics
LEFT JOIN Patient Health Condition
ON Patient Demographics. Patient Health Card Number = Patient Health Condition. Patient Health Card Number
WHERE Patient Name LIKE 'J%'
GROUP BY Patient Health Condition
Order By
Order by allows you to order the dataset in the table either by ascending or descending order, in this below example Joanna would appear in the table before Joaquin:
SELECT COUNT(Patient Health Card Number. Patient Health Condition)
FROM Patient Demographics
LEFT JOIN Patient Health Condition
ON Patient Demographics. Patient Health Card Number = Patient Health Condition. Patient Health Card Number
WHERE Patient Name LIKE 'J%'
GROUP BY Patient Health Condition
ORDER BY Patient Name ASC
These are just a few SQL functions that can be used to "slice and dice" the data into a format that can then be integrated into a tool such as R, Python or Power BI for further analysis and visualization. Be on the lookout for more posts on tips on how to visualize healthcare data! Comment down below, what SQL functions do you use most often in your day-to-day?