Resolving ehrQL errors
ehrQL error messages
If an error is found in your dataset definition, ehrQL will stop running and give you an error message. ehrQL error messages are shown as a Python error report, known as a "traceback".
The error messages are from Python because ehrQL runs in Python.
These error messages can be confusing to read, but they also give you lots of information to use to debug and fix your dataset definition.
Example error message
Let's look at an example of an error report:
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 7, in <module>
dataset._age = age
^^^^^^^^^^^^^
AttributeError: Variable names must start with a letter, and contain only alphanumeric characters and underscores (you defined a variable '_age')
- The traceback tells you which code caused the error. It shows both the filename and where in the file the error occurred.
- There is an error message at the end. The error message shows what kind of error occurred (here, an AttributeError), followed by details of what the problem is.
How to use this page
Structure of this page
For each error, there is:
- a simple code example that causes the error
- the error details
- the simple code example modified to fix the error or guidance on how to resolve the error
Finding an error on this page
If you encounter an error while working with ehrQL, this page may help you.
Because it includes code examples and error reports, this is a long page.
Here are some tips on narrowing down the search:
Using the table of contents
Skim the table of contents in the navigation bar on the right-hand side of this page to see if any of the general descriptions of errors match what you are trying to do.
Using your browser's "Find text in page" feature
Use your browser's "Find text in page" feature to search for parts of the error report. Let's look at the example given above again:
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 7, in <module>
dataset._age = age
^^^^^^^^^^^^^
AttributeError: Variable names must start with a letter, and contain only alphanumeric characters and underscores (you defined a variable '_age')
The first part of this traceback depends on the specific code that has been written here. It shows:
- the name of the file: dataset_definition.py, stored in the analysis directory
- the line number in the file causing the error: line 7
- the line of code causing the error
All of these will vary depending on the code being run. They are useful for pointing you to where your error is.
However, they are less useful to search for on this page,
because this part of the error report changes from one dataset definition to another.
What stays more constant is the final error message.
Searching this page for parts of that line,
for example, AttributeError or Variable names must start with a letter,
may show you the relevant error.
This page covers many of the common ehrQL errors you may see,
but it is not an exhaustive list.
Notice that even the error message may contain references to your specific code.
In this example, that is the part that reads:
you defined a variable '_age'.
Can you find the part of this page that explains this error?
Python syntax errors
These can occur because Python has its own syntactic rules that ehrQL code must also adhere to.
Code indentation error
Python has particular rules about indentation. If a dataset definition contains indentation errors, the error message will tell you about them. For example, there is an indentation error in the following dataset definition.
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
dataset.age = patients.age_on("2023-01-01")
    dataset.define_population(dataset.age > 16)  # This line has incorrect indentation.
Run the dataset definition with:
opensafely exec ehrql:v1 generate-dataset analysis/dataset_definition.py
Error
Error loading file 'analysis/dataset_definition.py':
File "/workspace/analysis/dataset_definition.py", line 6
dataset.define_population(dataset.age > 16)
IndentationError: unexpected indent
The error message tells us that there is an indentation error, and also the line that the error occurred on.
Fixed dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
dataset.age = patients.age_on("2023-01-01")
dataset.define_population(dataset.age > 16) # This line now has correct indentation.
Forbidden feature names
Python has constraints on allowed variable names, which also apply to the names of dataset features.
For example, a name (age!) that contains a non-alphanumeric character is invalid:
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
dataset.age! = patients.age_on("2023-01-01") # age! is an invalid feature name.
Run the dataset definition with:
opensafely exec ehrql:v1 generate-dataset analysis/dataset_definition.py
Error
Error loading file 'analysis/dataset_definition.py':
File "/workspace/analysis/dataset_definition.py", line 5
dataset.age! = patients.age_on("2023-01-01") # age! is an invalid feature name.
^
SyntaxError: invalid syntax
Fixed dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
dataset.age = patients.age_on("2023-01-01") # We have changed the invalid feature name, "age!", to a valid one, "age".
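Whether a name can appear after `dataset.` at all is decided by Python's own syntax rules. This happens before any ehrQL code runs. As a quick check, you can ask Python directly using the built-in `str.isidentifier()` and the standard-library `keyword` module. The helper name `is_python_attribute_name` below is hypothetical and is not part of ehrQL.

```python
import keyword


def is_python_attribute_name(name: str) -> bool:
    """Return True if `name` can be written as `dataset.<name> = ...` without a SyntaxError.

    Hypothetical helper (not part of ehrQL): it checks only Python's syntax rules,
    not ehrQL's own restrictions on feature names.
    """
    # isidentifier() rejects characters such as "!" and names starting with a digit.
    # Keywords such as "class" are valid identifiers but still cannot follow a dot.
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["age", "age!", "age_2023", "2023_age", "class"]:
    print(f"{candidate!r}: {is_python_attribute_name(candidate)}")
```

A name that passes this check can still be rejected by ehrQL itself, as the following sections show.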
Common ehrQL errors
These errors are specific to ehrQL, rather than Python.
Forgetting to set a population
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
dataset.age = patients.age_on("2023-01-01")
Run the dataset definition with:
opensafely exec ehrql:v1 generate-dataset analysis/dataset_definition.py
Error
A population has not been defined; define one with define_population()
Fixed dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
dataset.age = patients.age_on("2023-01-01")
dataset.define_population(dataset.age > 16) # Here we have now defined a population for the dataset.
Invalid feature name: population is a reserved name
There are a few constraints on feature names in ehrQL.
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
dataset.population = patients.age_on("2023-01-01") > 16
Run the dataset definition with:
opensafely exec ehrql:v1 generate-dataset analysis/dataset_definition.py
Error
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 6, in <module>
dataset.population = patients.age_on("2023-01-01") > 16
^^^^^^^^^^^^^^^^^^
AttributeError: Cannot set variable 'population'; use define_population() instead
Fixed dataset definition
Define the population with define_population() instead:
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
dataset.define_population(patients.age_on("2023-01-01") > 16)
Or rename the feature, if it is required as a separate output:
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
dataset.over_16 = patients.age_on("2023-01-01") > 16
Invalid feature name: variables is a reserved name
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
dataset.variables = patients.age_on("2023-01-01") > 16
...
Run the dataset definition with:
opensafely exec ehrql:v1 generate-dataset analysis/dataset_definition.py
Error
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 5, in <module>
dataset.variables = patients.age_on("2023-01-01") > 16
^^^^^^^^^^^^^^^^^
AttributeError: 'variables' is not an allowed variable name
Fixed dataset definition
Rename the feature to something other than variables.
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
dataset.age_greater_than_16 = patients.age_on("2023-01-01") > 16
...
Invalid feature name: feature names must not start with underscores
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2023-01-01")
dataset.define_population(age > 16)
dataset._age = age
Run the dataset definition with:
opensafely exec ehrql:v1 generate-dataset analysis/dataset_definition.py
Error
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 7, in <module>
dataset._age = age
^^^^^^^^^^^^^
AttributeError: Variable names must start with a letter, and contain only alphanumeric characters and underscores (you defined a variable '_age')
Fixed dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2023-01-01")
dataset.define_population(age > 16)
dataset.age = age  # _age feature renamed to remove the leading underscore.
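The error message above states ehrQL's naming rule: a variable name must start with a letter and contain only alphanumeric characters and underscores. The earlier sections also show that population and variables are reserved. The hypothetical helper `feature_name_problem` below encodes those rules so you can check candidate names before running a dataset definition. It is not ehrQL's code, and it assumes that "letter" and "alphanumeric" mean ASCII letters and digits.

```python
import re
from typing import Optional

# Assumption: ehrQL's rule is read as ASCII letters, digits and underscores only.
FEATURE_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

# Names reported as reserved by the errors documented on this page.
RESERVED_NAMES = {"population", "variables"}


def feature_name_problem(name: str) -> Optional[str]:
    """Describe why `name` would be rejected, or return None if it looks valid.

    Hypothetical helper that mirrors the error messages on this page.
    """
    if name in RESERVED_NAMES:
        return f"'{name}' is reserved"
    if not FEATURE_NAME_PATTERN.fullmatch(name):
        return (
            "must start with a letter, and contain only alphanumeric "
            f"characters and underscores (got '{name}')"
        )
    return None


for name in ["_age", "age", "population", "over_16"]:
    print(name, "->", feature_name_problem(name) or "OK")
```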
Re-defining a feature
In the following dataset definition, dataset.age is first defined as age and then defined again as age1.
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2000-01-01")
age1 = patients.age_on("2023-01-01")
dataset.define_population(age > 16)
dataset.age = age
dataset.age = age1
Run the dataset definition with:
opensafely exec ehrql:v1 generate-dataset analysis/dataset_definition.py
Error
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 9, in <module>
dataset.age = age1
^^^^^^^^^^^
AttributeError: 'age' is already set and cannot be reassigned
Fixed dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2000-01-01")
age1 = patients.age_on("2023-01-01")
dataset.define_population(age > 16)
dataset.age = age
dataset.age1 = age1 # The second age feature now has a unique name on the dataset
Undefined features
A feature must be set on the dataset before it can be referenced. In the following dataset definition, age is defined as a standalone variable, but dataset.age is referenced without ever being set:
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2000-01-01")
dataset.define_population(age > 16)
dataset.age
Run the dataset definition with:
opensafely exec ehrql:v1 generate-dataset analysis/dataset_definition.py
Error
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 8, in <module>
dataset.age
AttributeError: Variable 'age' has not been defined
Fixed dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2000-01-01")
dataset.define_population(age > 16)
dataset.age = age # dataset.age is now defined
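This error and the re-definition error in the previous section are both AttributeErrors. They are raised at the moment an attribute is read or set on dataset. Below is a toy sketch of how an object can enforce those two rules with Python's `__setattr__` and `__getattr__` hooks. `ToyDataset` is hypothetical and only illustrates the idea. It is not ehrQL's implementation.

```python
class ToyDataset:
    """Hypothetical stand-in for an ehrQL dataset that enforces two of its rules."""

    def __init__(self):
        # Bypass our own __setattr__ to create the internal store of features.
        object.__setattr__(self, "_features", {})

    def __setattr__(self, name, value):
        # Rule 1: a feature can be assigned only once.
        if name in self._features:
            raise AttributeError(f"'{name}' is already set and cannot be reassigned")
        self._features[name] = value

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails, which is the case
        # for every feature, because features live in _features, not in __dict__.
        features = object.__getattribute__(self, "_features")
        if name in features:
            return features[name]
        # Rule 2: reading a feature that was never assigned is an error.
        raise AttributeError(f"Variable '{name}' has not been defined")


dataset = ToyDataset()
dataset.age = 42
print(dataset.age)

try:
    dataset.age = 43
except AttributeError as error:
    print("AttributeError:", error)

try:
    dataset.sex
except AttributeError as error:
    print("AttributeError:", error)
```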
Trying to set a feature that has more than one row per patient
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.tpp import practice_registrations
dataset = create_dataset()
dataset.registered_on = practice_registrations.start_date
The practice_registrations table contains multiple rows per patient.
Error
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 5, in <module>
dataset.registered_on = practice_registrations.start_date
^^^^^^^^^^^^^^^^^^^^^
TypeError: Invalid variable 'registered_on'. Dataset variables must return one row per patient
Fixed dataset definition
To return the latest registered_on date, first sort the practice registrations table, find the
last registration for each patient, and then get the start date.
from ehrql import create_dataset
from ehrql.tables.tpp import practice_registrations
dataset = create_dataset()
latest_registration_per_patient = practice_registrations.sort_by(practice_registrations.start_date).last_for_patient()
dataset.registered_on = latest_registration_per_patient.start_date
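Conceptually, sort_by(...).last_for_patient() keeps, for each patient, the one row that sorts last. The plain-Python sketch below reproduces that idea on a hypothetical list of registration records, using made-up patient IDs and dates. It shows why the result has exactly one row per patient, and why you still need to pick a single column from that row. This is only an analogy: ehrQL does not run queries this way.

```python
from datetime import date

# Hypothetical rows: several registrations can belong to the same patient.
registrations = [
    {"patient_id": 1, "start_date": date(2010, 3, 1)},
    {"patient_id": 1, "start_date": date(2018, 7, 15)},
    {"patient_id": 2, "start_date": date(2015, 1, 9)},
]


def last_for_patient(rows, sort_key):
    """Return one row per patient: the row that sorts last by `sort_key`."""
    latest = {}
    # After sorting, later rows overwrite earlier ones for the same patient.
    for row in sorted(rows, key=sort_key):
        latest[row["patient_id"]] = row
    return latest


# Still whole rows: one per patient.
latest_registration = last_for_patient(registrations, lambda row: row["start_date"])
# Selecting one column from each row gives a single value per patient.
registered_on = {pid: row["start_date"] for pid, row in latest_registration.items()}
print(registered_on)
```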
Trying to set a feature to a row rather than a value
In the following dataset definition, we have reduced the practice registrations table to one row per patient, but we have not selected a single value for the feature:
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.tpp import practice_registrations
dataset = create_dataset()
dataset.registered_on = practice_registrations.sort_by(practice_registrations.start_date).last_for_patient()
Error
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 5, in <module>
dataset.registered_on = practice_registrations.sort_by(practice_registrations.start_date).last_for_patient()
^^^^^^^^^^^^^^^^^^^^^
TypeError: Invalid variable 'registered_on'. Dataset variables must be values not whole rows
Fix the dataset definition by setting the feature to a single value, in this case, start_date.
Fixed dataset definition
from ehrql import create_dataset
from ehrql.tables.tpp import practice_registrations
dataset = create_dataset()
latest_registration_per_patient = practice_registrations.sort_by(practice_registrations.start_date).last_for_patient()
dataset.registered_on = latest_registration_per_patient.start_date
Type errors in ehrQL expressions
Many ehrQL comparisons require the values being compared to be of the same type.
In the following dataset definition, age is an integer, but in the last line we
try to define the population by comparing age to the string "10".
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2023-01-01")
dataset.define_population(age >= "10")
Error
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 6, in <module>
dataset.define_population(age >= "10")
^^^^^^^^^^^
ehrql.query_model.nodes.TypeValidationError: GE.rhs requires 'ehrql.query_model.nodes.Series[int]' but got 'ehrql.query_model.nodes.Series[str]'
Fixed dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2023-01-01")
dataset.define_population(age >= 10) # age is now being compared to the integer 10
Invalid keywords "and", "or", "not"
In normal Python, logical operations can be performed using the keywords and, or and not. In ehrQL
these are prohibited and will raise an error.
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2023-01-01")
dataset.define_population((age >= 16) and (age <= 80))
Error
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 6, in <module>
dataset.define_population((age >= 16) and (age <= 80))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: The keywords 'and', 'or', and 'not' cannot be used with ehrQL, please use the operators '&', '|' and '~' instead.
(You will also see this error if you try use a chained comparison, such as 'a < b < c'.)
Fixed dataset definition
As described in the error message, use the operator & instead:
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2023-01-01")
dataset.define_population((age >= 16) & (age <= 80))
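Why do the keywords fail? Python lets a class overload the operators &, | and ~, but and, or and not cannot be overloaded. They work by asking for an operand's truth value, which calls its `__bool__` method. An ehrQL condition describes a query rather than a single True/False value. The toy class below is hypothetical and is not ehrQL's code. It shows the mechanism: & builds a new expression, while and triggers `__bool__`, which the class refuses in the same spirit as the TypeError above.

```python
class ToyCondition:
    """Hypothetical stand-in for an ehrQL boolean series: it records an expression, not a value."""

    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        # `a & b` calls a.__and__(b), so we can return a new combined expression.
        return ToyCondition(f"({self.text}) & ({other.text})")

    def __or__(self, other):
        return ToyCondition(f"({self.text}) | ({other.text})")

    def __invert__(self):
        return ToyCondition(f"~({self.text})")

    def __bool__(self):
        # `a and b`, `a or b` and `not a` need a truth value, which this object does not have.
        raise TypeError("use the operators '&', '|' and '~' instead of 'and', 'or' and 'not'")


over_16 = ToyCondition("age >= 16")
under_80 = ToyCondition("age <= 80")

print((over_16 & under_80).text)

try:
    over_16 and under_80  # Python asks over_16 for True/False first
except TypeError as error:
    print("TypeError:", error)
```

Also note that & binds more tightly than >= and <=. That is why each comparison in the fixed dataset definition is wrapped in parentheses.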
Chaining comparisons
Chained comparisons are not allowed in ehrQL.
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2023-01-01")
dataset.define_population(16 < age <= 80)
Error
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 6, in <module>
dataset.define_population(16 < age <= 80)
^^^^^^^^^^^^^^
TypeError: The keywords 'and', 'or', and 'not' cannot be used with ehrQL, please use the operators '&', '|' and '~' instead.
(You will also see this error if you try use a chained comparison, such as 'a < b < c'.)
Fixed dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2023-01-01")
dataset.define_population((age > 16) & (age <= 80))  # Equivalent to 16 < age <= 80
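Python evaluates a chained comparison such as 16 < age <= 80 as (16 < age) and (age <= 80), evaluating age only once. Because of that hidden and, a chain runs into the same restriction as the keywords in the previous section. The hypothetical toy classes below log each special method Python calls, which makes the hidden `__bool__` call visible.

```python
calls = []


class LoggingValue:
    """Hypothetical toy that records which comparison hooks Python calls."""

    def __gt__(self, other):
        calls.append(f"__gt__({other})")
        return LoggingResult()

    def __le__(self, other):
        calls.append(f"__le__({other})")
        return LoggingResult()


class LoggingResult:
    def __bool__(self):
        # The hidden `and` in a chained comparison asks the first result for True/False.
        calls.append("__bool__")
        return True


age = LoggingValue()
# 16 < age: int cannot compare itself with LoggingValue, so Python calls the
# reflected age.__gt__(16); then it truth-tests that result before evaluating age <= 80.
result = 16 < age <= 80
print(calls)
```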
Trying to perform arithmetic operations with an integer column and a float constant
In the following dataset definition, age is an integer, so we cannot subtract a float from it.
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2023-01-01")
dataset.age_minus_5 = age - 5.5
Error
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 6, in <module>
dataset.age_minus_5 = age - 5.5
~~~~^~~~~
ehrql.query_model.nodes.TypeValidationError: Subtract.rhs requires 'ehrql.query_model.nodes.Series[int]' but got 'ehrql.query_model.nodes.Series[float]'
Fixed dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2023-01-01")
dataset.age_minus_5 = age - 5
Calculate a date difference without specifying return units
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
dataset.age_in_may = "2023-05-01" - patients.date_of_birth
Error
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 5, in <module>
dataset.age_in_may = "2023-05-01" - patients.date_of_birth
^^^^^^^^^^^^^^^^^^
TypeError: Invalid variable 'age_in_may'. Dataset variables must be values not whole rows
To fix this error, specify the units of the date difference that you want in the feature:
Fixed dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
dataset.age_in_may = ("2023-05-01" - patients.date_of_birth).years
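Plain Python behaves similarly. Subtracting one date from another gives a duration (a `datetime.timedelta`) rather than a number in units of your choosing, and `timedelta` has no years attribute because years vary in length. The sketch below uses a hypothetical helper, `whole_years_between`, and only the standard library to count whole elapsed years. It illustrates what a "difference in years" involves; it is not a specification of how ehrQL computes `.years`.

```python
from datetime import date


def whole_years_between(start: date, end: date) -> int:
    """Count complete years from `start` to `end` (hypothetical helper, not ehrQL's code)."""
    years = end.year - start.year
    # Subtract one if the anniversary has not yet been reached in the end year.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


date_of_birth = date(1990, 6, 15)
difference = date(2023, 5, 1) - date_of_birth
print(type(difference).__name__, difference.days)  # a timedelta, measured in days
print(whole_years_between(date_of_birth, date(2023, 5, 1)))
```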
Trying to subtract/add constants to dates
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
dataset.date_at_age_16 = patients.date_of_birth + 16
Error
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 5, in <module>
dataset.date_at_age_16 = patients.date_of_birth + 16
~~~~~~~~~~~~~~~~~~~~~~~^~~~
TypeError: unsupported operand type(s) for +: 'DatePatientSeries' and 'int'
ehrQL cannot add an integer to a date: it needs to know what unit of time is being added (days, months or years).
Fixed dataset definition
from ehrql import create_dataset, years
from ehrql.tables.core import patients
dataset = create_dataset()
dataset.date_at_age_16 = patients.date_of_birth + years(16)
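Python's standard datetime module draws the same line. `date + 16` raises TypeError, and `timedelta` offers fixed-length units such as days and weeks but no months or years. The hypothetical helper `add_years` below shows one way to add whole years. Its handling of 29 February, which falls back to 28 February, is an assumption of this sketch; ehrQL's years() may handle that case differently.

```python
from datetime import date, timedelta


def add_years(start: date, n: int) -> date:
    """Return the same calendar day `n` years later (hypothetical helper).

    Assumption: 29 February maps to 28 February when the target year is not a leap year.
    """
    try:
        return start.replace(year=start.year + n)
    except ValueError:
        # For target years in range, only 29 February can fail here.
        return start.replace(year=start.year + n, day=28)


date_of_birth = date(2004, 2, 29)

try:
    date_of_birth + 16  # plain Python also needs to know the unit
except TypeError as error:
    print("TypeError:", error)

print(date_of_birth + timedelta(days=16))  # days have a fixed length
print(add_years(date_of_birth, 16))  # 2020 is a leap year, so 29 February exists
print(add_years(date_of_birth, 17))  # 2021 is not, so this falls back to 28 February
```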
Incorrectly referencing a table column
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.core import clinical_events
dataset = create_dataset()
first_event = clinical_events.sort_by(date).first_for_patient()
dataset.event_date = first_event.date
Error
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 5, in <module>
first_event = clinical_events.sort_by(date).first_for_patient()
^^^^
NameError: name 'date' is not defined
Fixed dataset definition
Columns must be referenced as attributes of their table:
from ehrql import create_dataset
from ehrql.tables.core import clinical_events
dataset = create_dataset()
first_event = clinical_events.sort_by(clinical_events.date).first_for_patient()
dataset.event_date = first_event.date
Specifying a default for case which is a different type to the values
In the following dataset definition, two age groups are defined as integers (1 and 2). A default value (for patients who don't fall into one of the categories) is defined as "unknown". This is an error: any default value given for a case statement must be of the same type as the values (or None).
Failing dataset definition
from ehrql import create_dataset, case, when
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2023-01-01")
dataset.age_group = case(
when(age < 10).then(1),
when(age > 80).then(2),
otherwise="unknown",
)
Error
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 7, in <module>
dataset.age_group = case(
^^^^^
ehrql.query_model.nodes.TypeValidationError: Case.default requires 'ehrql.query_model.nodes.Series[int] | None' but got 'ehrql.query_model.nodes.Series[str]'
Fixed dataset definition
from ehrql import create_dataset, case, when
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2023-01-01")
dataset.age_group = case(
when(age < 10).then(1),
when(age > 80).then(2),
otherwise=0,
)
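The rule behind this error is that the then() values and the otherwise default of a single case must all share one type, with None also allowed as the default. The hypothetical function `case_result_type` below spells that rule out in plain Python. It is not ehrQL's implementation, but it can help when reasoning about a larger case expression. If you want a label such as "unknown", the other way to fix the error is to make every value a string.

```python
from typing import Any, Optional


def case_result_type(then_values: list, otherwise: Optional[Any] = None) -> type:
    """Return the shared type of a case's values, or raise TypeError (hypothetical helper)."""
    # type() is used rather than isinstance() so that, for example, True is not treated as an int.
    value_types = {type(value) for value in then_values}
    if len(value_types) != 1:
        names = sorted(t.__name__ for t in value_types)
        raise TypeError(f"then() values must share one type, found: {names}")
    (result_type,) = value_types
    # The default may be None, or else a value of the same type as the then() values.
    if otherwise is not None and type(otherwise) is not result_type:
        raise TypeError(
            f"default must be {result_type.__name__} or None, got {type(otherwise).__name__}"
        )
    return result_type


print(case_result_type([1, 2], otherwise=0).__name__)
print(case_result_type(["child", "older adult"], otherwise="unknown").__name__)

try:
    case_result_type([1, 2], otherwise="unknown")
except TypeError as error:
    print("TypeError:", error)
```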
Using is_in without a container
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2023-01-01")
dataset.age_30 = age.is_in(30)
Error
Traceback (most recent call last):
File "/workspace/analysis/dataset_definition.py", line 7, in <module>
dataset.age_30 = age.is_in(30)
^^^^^^^^^^^^^
ehrql.query_model.nodes.TypeValidationError: In.rhs requires 'ehrql.query_model.nodes.Series[collections.abc.Set[int]]' but got 'ehrql.query_model.nodes.Series[int]'
This is also an error:
dataset.age_30_or_40 = age.is_in(30, 40)
Fixed dataset definition
The values passed to is_in must be wrapped in a Python container: a set, list or tuple.
All of the following features defined with is_in are valid.
from ehrql import create_dataset
from ehrql.tables.core import patients
dataset = create_dataset()
age = patients.age_on("2023-01-01")
dataset.age_30_list = age.is_in([30])
dataset.age_30_or_40_set = age.is_in({30, 40})
dataset.age_30_or_40_tuple = age.is_in((30, 40))
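is_in expects one argument, a collection of values, and tests membership in it much as Python's in operator does. A bare 30 gives it nothing to look inside, and is_in(30, 40) passes two separate arguments where one collection is expected. The hypothetical function below is a plain-Python analogue with the same single-collection shape, not ehrQL's code. Both mistakes fail in plain Python too.

```python
def is_in(value, values):
    """Hypothetical plain-Python analogue of is_in: `values` must be a collection."""
    # set() needs an iterable, so a bare integer such as 30 is rejected.
    return value in set(values)


age = 30
print(is_in(age, [30]))      # list
print(is_in(age, {30, 40}))  # set
print(is_in(age, (30, 40)))  # tuple

for bad_call in (lambda: is_in(age, 30), lambda: is_in(age, 30, 40)):
    try:
        bad_call()
    except TypeError as error:
        print("TypeError:", error)
```

One Python detail to watch for with tuples: (30) is just the integer 30, while a one-element tuple is written (30,).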
Permissions error during local development
If your definition file uses tables or features that require special permission,
you will need to use claim_permissions to run the code locally.
If you have not done this, ehrQL will raise an error that tells you exactly what code you need to add.
See the permissions documentation for more information.
Failing dataset definition
from ehrql import create_dataset
from ehrql.tables.tpp import patients, appointments
dataset = create_dataset()
dataset.define_population(patients.exists_for_patient())
dataset.start_date = (
appointments.sort_by(appointments.start_date).last_for_patient().start_date
)
Error
EHRQLPermissionError: Some of the tables or features you are using require special permission to use with real
patient data. The permissions needed are:
* appointments: required for access to the `tpp.appointments` table
You can continue to work on your code using dummy data by "claiming" the required permisions:
from ehrql import claim_permissions
claim_permissions("appointments")
Note that you will only be able to run your code against real data if you actually have these
permissions assigned by the OpenSAFELY team. For more information see:
https://docs.opensafely.org/ehrql/reference/language/#permissions
Fixed dataset definition
from ehrql import claim_permissions, create_dataset
from ehrql.tables.tpp import patients, appointments
claim_permissions("appointments")
dataset = create_dataset()
dataset.define_population(patients.exists_for_patient())
dataset.start_date = (
appointments.sort_by(appointments.start_date).last_for_patient().start_date
)
Permissions error on the backend
If your definition file uses tables or features that require special permission, your project must be assigned the appropriate permissions by the OpenSAFELY team.
If your project is missing the necessary permissions, ehrQL will raise an error.
See the permissions documentation for more information.
Failing dataset definition
Note that this dataset definition will run fine locally.
from ehrql import claim_permissions, create_dataset
from ehrql.tables.tpp import patients, appointments
claim_permissions("appointments")
dataset = create_dataset()
dataset.define_population(patients.exists_for_patient())
dataset.start_date = (
appointments.sort_by(appointments.start_date).last_for_patient().start_date
)
Error
This error message is shown in the log output in Airlock.
The job error message (available on OpenSAFELY Jobs) simply states "You do not have the required permissions for the ehrQL you are trying to run".
EHRQLPermissionError: You do not currently have all the permissions needed for this action.
Missing permissions are:
* appointments: required for access to the `tpp.appointments` table
If you think this is a mistake and that you should have these permissions please contact OpenSAFELY support.
How to resolve the error
You should contact OpenSAFELY support if you have not been granted the necessary permissions for your project.