Boolean arrays
import numpy as np
Remember the problem of the onsets and reaction times.
We had the task of calculating the onset times of trials, given a file of trial inter-stimulus intervals, and response times.
import nipraxis
# Fetch the file.
stim_fname = nipraxis.fetch_file('24719.f3_beh_CHYM.csv')
# Show the filename.
stim_fname
Downloading file '24719.f3_beh_CHYM.csv' from 'https://raw.githubusercontent.com/nipraxis/nipraxis-data/0.5/24719.f3_beh_CHYM.csv' to '/home/runner/.cache/nipraxis/0.5'.
'/home/runner/.cache/nipraxis/0.5/24719.f3_beh_CHYM.csv'
We got the data using the Pandas library:
# Get the Pandas module, rename as "pd"
import pandas as pd
# Read the data file into a data frame.
data = pd.read_csv(stim_fname)
# Show the result
data
 | response | response_time | trial_ISI | trial_shape |
---|---|---|---|---|
0 | NaN | 0 | 2000 | red_star |
1 | NaN | 0 | 1000 | red_circle |
2 | NaN | 0 | 2500 | green_triangle |
3 | NaN | 0 | 1500 | yellow_square |
4 | NaN | 0 | 1500 | blue_circle |
... | ... | ... | ... | ... |
315 | space | 294 | 1000 | red_square |
316 | NaN | 0 | 2500 | green_circle |
317 | NaN | 0 | 1000 | green_star |
318 | space | 471 | 1000 | red_circle |
319 | NaN | 0 | 1000 | blue_circle |
320 rows × 4 columns
There is one row for each trial. The columns we are interested in are:
response_time: the reaction time for the response (milliseconds after the stimulus, 0 if no response).
trial_ISI: the time between the previous stimulus and this one (the Interstimulus Interval). For the first stimulus this is the time from the start of the experimental software.
response_times = np.array(data['response_time'])
trial_isis = np.array(data['trial_ISI'])
We then calculated the onset times of each trial relative to the start of the scanning run. The scanning run started 4000 milliseconds before the experimental software.
exp_onsets = np.cumsum(trial_isis)
scanner_onsets = exp_onsets + 4000
scanner_onsets[:15]
array([ 6000, 7000, 9500, 11000, 12500, 14500, 17000, 18500, 20500,
21500, 22500, 24000, 25500, 27500, 29000])
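To make the cumulative sum step concrete, here is a minimal sketch using only the first three trial_ISI values from the table above. The toy_ variable names are illustrative and are not part of the original analysis.

```python
import numpy as np

# The first three trial_ISI values from the table (milliseconds).
toy_isis = np.array([2000, 1000, 2500])
# Each onset is the sum of all intervals up to and including that trial.
toy_exp_onsets = np.cumsum(toy_isis)
# The scanning run started 4000 ms before the experimental software.
toy_scanner_onsets = toy_exp_onsets + 4000
print(toy_exp_onsets)      # [2000 3000 5500]
print(toy_scanner_onsets)  # [6000 7000 9500]
```

The last line of output matches the first three values of scanner_onsets shown above.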
We then wanted to calculate the onset times of each response, relative to the scanner start. The response times for each trial are relative to the start of the trial, so we can add the response times to the trial onset times.
# Add the two arrays element by element; they have the same shape.
scanner_response_onsets = scanner_onsets + response_times
scanner_response_onsets[:15]
array([ 6000, 7000, 9500, 11000, 12500, 14500, 17000, 18500, 20500,
21500, 22927, 24000, 25500, 27869, 29000])
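As a small illustration of adding two arrays with the same shape, the sketch below uses the onsets and response times for trials 10 through 13, copied from the outputs shown in this section. The toy_ names are illustrative only.

```python
import numpy as np

# Scanner onsets for trials 10 through 13 (from the output above).
toy_onsets = np.array([22500, 24000, 25500, 27500])
# Response times for the same trials; 0 means no response.
toy_rts = np.array([427, 0, 0, 369])
# Addition pairs up elements at matching positions.
toy_response_onsets = toy_onsets + toy_rts
print(toy_response_onsets)  # [22927 24000 25500 27869]
```

Where the response time is 0, the result is just the trial onset, which is why we need a way to pick out the trials that had a response.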
Boolean arrays
As you remember, many of the response time values are 0, indicating no response:
first_15_rts = response_times[:15]
first_15_rts
array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 427, 0, 0,
369, 0])
We would like to select the response onsets corresponding to the response_times that are not 0.
We can use Boolean arrays to do this.
This is just a taster of selecting with Boolean arrays. See Boolean indexing for more.
Boolean arrays are arrays that contain values that are one of the two Boolean values True or False.
Remember Boolean values, and Comparison operators from Brisk introduction to Python. We can use comparison operators on arrays to create Boolean arrays.
Let’s start by looking at the first 15 reaction times:
first_15_rts
array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 427, 0, 0,
369, 0])
Remember that comparisons are operators that give answers to a comparison question. This is how comparisons work on individual values:
first_15_rts[0] > 0
False
What do you think will happen if we do the comparison on the whole array, like this?
first_15_rts > 0
You have seen how Numpy works when adding a single number to an array — it takes this to mean that you want to add that number to every element in the array.
Comparisons work the same way:
first_15_rts_not_zero = first_15_rts > 0
first_15_rts_not_zero
array([False, False, False, False, False, False, False, False, False,
False, True, False, False, True, False])
This is the result of asking the comparison question > 0 of every element in the array.
So the values that end up in the first_15_rts_not_zero array come from these comparisons:
print('Position 0:', first_15_rts[0] > 0)
print('Position 1:', first_15_rts[1] > 0)
print(' ... and so on, up to ...')
print('Position 13:', first_15_rts[13] > 0)
print('Position 14:', first_15_rts[14] > 0)
Position 0: False
Position 1: False
... and so on, up to ...
Position 13: True
Position 14: False
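One way to convince yourself of this is to do the comparisons one element at a time and check that they match the whole-array comparison. This is only a sketch for checking your understanding. The names whole_array and one_by_one are made up for this example.

```python
import numpy as np

# The first 15 reaction times, copied from the output above.
first_15_rts = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 427, 0, 0, 369, 0])
# Comparison on the whole array at once.
whole_array = first_15_rts > 0
# The same comparison question, asked of each element in turn.
one_by_one = np.array([value > 0 for value in first_15_rts])
# True if every position agrees.
print(np.all(whole_array == one_by_one))  # True
```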
Here is the equivalent array for all the reaction times:
rts_not_zero = response_times > 0
# Show the first 50 values.
rts_not_zero[:50]
array([False, False, False, False, False, False, False, False, False,
False, True, False, False, True, False, True, False, True,
False, False, False, False, False, True, True, True, False,
False, False, True, False, True, False, False, False, False,
False, False, True, False, False, False, False, True, True,
True, False, False, True, False])
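As a quick check, a Boolean array made by a comparison has the same shape as the array we compared, and its dtype is bool. The sketch below uses a short made-up array, toy_rts, rather than the real data.

```python
import numpy as np

# A short, made-up array of reaction times.
toy_rts = np.array([0, 427, 0, 369])
toy_not_zero = toy_rts > 0
# The elements are Boolean values.
print(toy_not_zero.dtype)  # bool
# There is one True or False for each element of the original array.
print(toy_not_zero.shape == toy_rts.shape)  # True
```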
We will soon see that we can use these arrays to select elements from other arrays.
Specifically, if we put a Boolean array like rts_not_zero between the square brackets for another array, that will have the effect of selecting the elements at positions where rts_not_zero has True, and throwing away elements where rts_not_zero has False.
For example, rushing ahead, we can select the values in response_times corresponding to reaction times greater than zero with:
response_times[rts_not_zero]
array([427, 369, 337, 308, 375, 478, 300, 321, 306, 370, 372, 372, 342,
382, 318, 371, 450, 442, 452, 381, 351, 394, 353, 380, 387, 341,
366, 432, 415, 406, 466, 364, 462, 477, 374, 384, 375, 432, 369,
354, 455, 338, 364, 376, 345, 299, 308, 381, 375, 438, 325, 371,
377, 314, 297, 349, 607, 391, 334, 385, 386, 369, 394, 361, 480,
375, 368, 325, 410, 328, 412, 379, 383, 376, 351, 361, 403, 411,
447, 590, 328, 376, 410, 433, 370, 353, 353, 386, 417, 315, 298,
370, 365, 439, 391, 351, 464, 357, 328, 294, 471])
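The same kind of selection works on any array with a matching shape, which is how we could pick out the response onsets for trials with a response. Here is a minimal sketch using the values for trials 10 through 13 from the outputs above. The toy_ names are illustrative and this is not necessarily how the full analysis proceeds.

```python
import numpy as np

# Response times and response onsets for trials 10 through 13.
toy_rts = np.array([427, 0, 0, 369])
toy_response_onsets = np.array([22927, 24000, 25500, 27869])
# True where there was a response.
toy_not_zero = toy_rts > 0
# Keep only the onsets at positions where toy_not_zero has True.
toy_selected = toy_response_onsets[toy_not_zero]
print(toy_selected)  # [22927 27869]
```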