Boolean arrays

import numpy as np

Remember the problem of the onsets and reaction times.

We had the task of calculating the onset times of trials, given a file of trial inter-stimulus intervals, and response times.

import nipraxis

# Fetch the file.
stim_fname = nipraxis.fetch_file('24719.f3_beh_CHYM.csv')
# Show the filename.
stim_fname
Downloading file '24719.f3_beh_CHYM.csv' from 'https://raw.githubusercontent.com/nipraxis/nipraxis-data/0.5/24719.f3_beh_CHYM.csv' to '/home/runner/.cache/nipraxis/0.5'.
'/home/runner/.cache/nipraxis/0.5/24719.f3_beh_CHYM.csv'

We got the data using the Pandas library:

# Get the Pandas module, rename as "pd"
import pandas as pd

# Read the data file into a data frame.
data = pd.read_csv(stim_fname)
# Show the result
data
response response_time trial_ISI trial_shape
0 NaN 0 2000 red_star
1 NaN 0 1000 red_circle
2 NaN 0 2500 green_triangle
3 NaN 0 1500 yellow_square
4 NaN 0 1500 blue_circle
... ... ... ... ...
315 space 294 1000 red_square
316 NaN 0 2500 green_circle
317 NaN 0 1000 green_star
318 space 471 1000 red_circle
319 NaN 0 1000 blue_circle

320 rows × 4 columns

There is one row for each trial. The columns we are interested in are:

  • response_time — the reaction time for their response (milliseconds after the stimulus, 0 if no response)

  • trial_ISI — the time between the previous stimulus and this one (the Interstimulus Interval). For the first stimulus this is the time from the start of the experimental software.

response_times = np.array(data['response_time'])
trial_isis = np.array(data['trial_ISI'])

We then calculated the onset times of each trial relative to the start of the scanning run. The scanning run started 4000 milliseconds before the experimental software.

exp_onsets = np.cumsum(trial_isis)
scanner_onsets = exp_onsets + 4000
scanner_onsets[:15]
array([ 6000,  7000,  9500, 11000, 12500, 14500, 17000, 18500, 20500,
       21500, 22500, 24000, 25500, 27500, 29000])
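To see what the cumulative sum and the 4000 millisecond offset are doing, here is a small sketch that uses only the first five trial_ISI values from the table above. The names first_5_isis, first_5_exp_onsets and first_5_onsets are ours, for illustration only.

```python
import numpy as np

# First five trial_ISI values from the data frame above (milliseconds).
first_5_isis = np.array([2000, 1000, 2500, 1500, 1500])
# Running total: onset of each trial relative to the start of the software.
first_5_exp_onsets = np.cumsum(first_5_isis)
# The scanner started 4000 ms before the software, so shift every onset.
first_5_onsets = first_5_exp_onsets + 4000
print(first_5_onsets)
```

The five values that result should match the first five values of scanner_onsets shown above.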

We then wanted to calculate the onset times of each response, relative to the scanner start. The response times for each trial are relative to the start of the trial, so we can add the response times to the trial onset times.

# Add the two arrays element by element; they have the same shape.
scanner_response_onsets = scanner_onsets + response_times
scanner_response_onsets[:15]
array([ 6000,  7000,  9500, 11000, 12500, 14500, 17000, 18500, 20500,
       21500, 22927, 24000, 25500, 27869, 29000])

Boolean arrays

As you remember, many of the response time values are 0, indicating no response:

first_15_rts = response_times[:15]
first_15_rts
array([  0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 427,   0,   0,
       369,   0])

We would like to select the response onsets corresponding to the non-zero response times.

We can use Boolean arrays to do this.

This is just a taster of selecting with Boolean arrays. See Boolean indexing for more.

Boolean arrays are arrays that contain values that are one of the two Boolean values True or False.
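As a minimal sketch, we can build a small Boolean array by hand and check what kind of values it holds. The name some_bools is ours, for illustration only.

```python
import numpy as np

# An array made directly from Python True and False values.
some_bools = np.array([True, False, True])
# The dtype tells us what kind of value the array holds.
print(some_bools.dtype)
```

The dtype of this array is bool, meaning that every element is either True or False.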

Remember Boolean values and Comparison operators from Brisk introduction to Python. We can use comparison operators on arrays to create Boolean arrays.

Let’s start by looking at the first 15 reaction times:

first_15_rts
array([  0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 427,   0,   0,
       369,   0])

Remember that comparisons are operators that give answers to a comparison question. This is how comparisons work on individual values:

first_15_rts[0] > 0
False

What do you think will happen if we do the comparison on the whole array, like this?

first_15_rts > 0

You have seen how Numpy works when adding a single number to an array — it takes this to mean that you want to add that number to every element in the array.

Comparisons work the same way:

first_15_rts_not_zero = first_15_rts > 0
first_15_rts_not_zero
array([False, False, False, False, False, False, False, False, False,
       False,  True, False, False,  True, False])

This is the result of asking the comparison question > 0 of every element in the array.
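The other comparison operators work on arrays in the same way. The sketch below retypes the first 15 reaction times so that it can run on its own; the names is_zero, not_zero and over_400 are ours, and the threshold of 400 is an arbitrary value chosen for illustration.

```python
import numpy as np

# The first 15 reaction times, copied from the output above.
first_15_rts = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 427, 0, 0, 369, 0])

# True where the reaction time is exactly 0 (no response).
is_zero = first_15_rts == 0
# True where the reaction time is not 0.
not_zero = first_15_rts != 0
# True where the reaction time is greater than 400 milliseconds.
over_400 = first_15_rts > 400

print(is_zero)
print(not_zero)
print(over_400)
```

Because reaction times are never negative here, not_zero has the same values as first_15_rts > 0.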

So the values that end up in the first_15_rts_not_zero array come from these comparisons:

print('Position 0:', first_15_rts[0] > 0)
print('Position 1:', first_15_rts[1] > 0)
print(' ... and so on, up to ...')
print('Position 13:', first_15_rts[13] > 0)
print('Position 14:', first_15_rts[14] > 0)
Position 0: False
Position 1: False
 ... and so on, up to ...
Position 13: True
Position 14: False
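One way to convince yourself of this is to do the comparisons one at a time in a loop, and check that you get the same values as the array comparison gives. This is only a sketch for checking our understanding; the name by_loop is ours, and in practice the array comparison is shorter and faster.

```python
import numpy as np

# The first 15 reaction times, copied from the output above.
first_15_rts = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 427, 0, 0, 369, 0])

# Do the comparison for each element in turn, collecting the answers.
by_loop = []
for rt in first_15_rts:
    by_loop.append(rt > 0)
by_loop = np.array(by_loop)

# Do the comparison on the whole array in one go.
by_array = first_15_rts > 0

# True if the two arrays have the same shape and the same values.
print(np.array_equal(by_loop, by_array))
```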

Here is the equivalent array for all the reaction times:

rts_not_zero = response_times > 0
# Show the first 50 values.
rts_not_zero[:50]
array([False, False, False, False, False, False, False, False, False,
       False,  True, False, False,  True, False,  True, False,  True,
       False, False, False, False, False,  True,  True,  True, False,
       False, False,  True, False,  True, False, False, False, False,
       False, False,  True, False, False, False, False,  True,  True,
        True, False, False,  True, False])
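A Boolean array also gives a quick way to count how many elements passed the comparison. The sketch below uses the first 15 reaction times so that it can run on its own; the name n_responses is ours. np.count_nonzero counts the True values, and np.sum gives the same answer because True counts as 1 and False as 0 when summing.

```python
import numpy as np

# The first 15 reaction times, copied from the output above.
first_15_rts = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 427, 0, 0, 369, 0])
first_15_rts_not_zero = first_15_rts > 0

# Count the True values: the number of trials with a response.
n_responses = np.count_nonzero(first_15_rts_not_zero)
print(n_responses)
# Summing gives the same count, treating True as 1 and False as 0.
print(np.sum(first_15_rts_not_zero))
```

There are two responses in the first 15 trials, at positions 10 and 13.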

We will soon see that we can use these arrays to select elements from other arrays.

Specifically, if we put a Boolean array like rts_not_zero between square brackets for another array, that will have the effect of selecting the elements at positions where rts_not_zero has True, and throwing away elements where rts_not_zero has False.

For example, rushing ahead, we can select the values in response_times corresponding to reaction times greater than zero with:

response_times[rts_not_zero]
array([427, 369, 337, 308, 375, 478, 300, 321, 306, 370, 372, 372, 342,
       382, 318, 371, 450, 442, 452, 381, 351, 394, 353, 380, 387, 341,
       366, 432, 415, 406, 466, 364, 462, 477, 374, 384, 375, 432, 369,
       354, 455, 338, 364, 376, 345, 299, 308, 381, 375, 438, 325, 371,
       377, 314, 297, 349, 607, 391, 334, 385, 386, 369, 394, 361, 480,
       375, 368, 325, 410, 328, 412, 379, 383, 376, 351, 361, 403, 411,
       447, 590, 328, 376, 410, 433, 370, 353, 353, 386, 417, 315, 298,
       370, 365, 439, 391, 351, 464, 357, 328, 294, 471])
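The same Boolean array can select from any other array of the same length. That gives us what we were after earlier: the response onsets for the trials where there was a response. The sketch below retypes the first 15 values of scanner_onsets and response_times from the outputs above so that it can run on its own; the first_15_ names are ours.

```python
import numpy as np

# First 15 trial onsets and reaction times, copied from the outputs above.
first_15_onsets = np.array([6000, 7000, 9500, 11000, 12500, 14500, 17000,
                            18500, 20500, 21500, 22500, 24000, 25500, 27500,
                            29000])
first_15_rts = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 427, 0, 0, 369, 0])

# Response onsets relative to the scanner start, for every trial.
first_15_response_onsets = first_15_onsets + first_15_rts
# True for trials where there was a response.
first_15_rts_not_zero = first_15_rts > 0
# Keep only the response onsets where there was a response.
first_15_real_onsets = first_15_response_onsets[first_15_rts_not_zero]
print(first_15_real_onsets)
```

Only two of the first 15 trials had a response, so the result has two values: 22927 and 27869.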