Where does Python look for modules?

Let’s say we have written a Python module and saved it as a_module.py, in a directory called code.

We would normally do this with a text editor, but, for illustration, here we write out the module file using the Jupyter / IPython %%file magic command:

%%file code/a_module.py
""" This is a_module
"""

def a_func():
    return 99

print('Finished importing a_module.py')
Writing code/a_module.py

We also have a script called a_script.py in a directory called scripts:

%%file scripts/a_script.py
""" This is a_script
"""

import a_module

print('Result of a_func is:', a_module.a_func())
Writing scripts/a_script.py

At the moment, a_script.py will fail with:

run scripts/a_script.py
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File ~/work/textbook/textbook/scripts/a_script.py:4
      1 """ This is a_script
      2 """
----> 4 import a_module
      6 print('Result of a_func is:', a_module.a_func())

ModuleNotFoundError: No module named 'a_module'

Above we ran the script within the Python process of the notebook, but we can also run the script in the terminal. Here we use the %%bash cell magic at the top of the cell to run the cell’s commands in a Bash shell. This works on Linux or Mac, but may not work on Windows.

Notice that running the script this way gives the same error, for the same reason:

%%bash
python3 scripts/a_script.py
Traceback (most recent call last):
  File "/home/runner/work/textbook/textbook/scripts/a_script.py", line 4, in <module>
    import a_module
ModuleNotFoundError: No module named 'a_module'
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
Cell In [4], line 1
----> 1 get_ipython().run_cell_magic('bash', '', 'python3 scripts/a_script.py\n')

File /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/IPython/core/interactiveshell.py:2362, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2360 with self.builtin_trap:
   2361     args = (magic_arg_s, cell)
-> 2362     result = fn(*args, **kwargs)
   2363 return result

File /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/IPython/core/magics/script.py:153, in ScriptMagics._make_script_magic.<locals>.named_script_magic(line, cell)
    151 else:
    152     line = script
--> 153 return self.shebang(line, cell)

File /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/IPython/core/magics/script.py:305, in ScriptMagics.shebang(self, line, cell)
    300 if args.raise_error and p.returncode != 0:
    301     # If we get here and p.returncode is still None, we must have
    302     # killed it but not yet seen its return code. We don't wait for it,
    303     # in case it's stuck in uninterruptible sleep. -9 = SIGKILL
    304     rc = p.returncode or -9
--> 305     raise CalledProcessError(rc, cell)

CalledProcessError: Command 'b'python3 scripts/a_script.py\n'' returned non-zero exit status 1.

When Python hits the line import a_module, it tries to find a package or a module called a_module. A package is a directory containing modules, but we will only consider modules for now. A module is a file with a matching extension, such as .py. So, Python is looking for a file a_module.py, and not finding it.

We will see the same effect at the interactive Python console, or in Jupyter or IPython:

import a_module
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In [5], line 1
----> 1 import a_module

ModuleNotFoundError: No module named 'a_module'
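
If you want to check whether Python can find a module without triggering the error, one option is the standard library function importlib.util.find_spec, which returns None when a top-level module cannot be located. This is a minimal sketch; the helper name can_import and the missing module name are made up for illustration:

```python
import importlib.util


def can_import(name):
    """Return True if the import system can locate top-level module `name`."""
    return importlib.util.find_spec(name) is not None


# `json` is in the standard library, so Python can find it.
print(can_import('json'))
# Hypothetical name that should not exist on any system.
print(can_import('hypothetical_missing_module_xyz'))
```

The first call should print True and the second False, without raising ModuleNotFoundError.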

Python looks for modules in ‘sys.path’

Python has a simple algorithm for finding a module with a given name, such as a_module. It looks for a file called a_module.py in the directories listed in the variable sys.path.

import sys

# Show sys.path
sys.path
['/home/runner/work/textbook/textbook',
 '/opt/hostedtoolcache/Python/3.9.14/x64/lib/python39.zip',
 '/opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9',
 '/opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/lib-dynload',
 '',
 '/opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages']

The a_module.py file is in the code directory, and this directory is not in the sys.path list.
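
The search described above can be sketched in a few lines. This is a deliberate simplification, not Python's real import machinery: it only looks for plain .py files, and ignores packages, compiled extension modules, built-in modules, zip archives and import hooks. The function name find_module_file is an assumption made for this illustration:

```python
import sys
from pathlib import Path


def find_module_file(name, search_path=None):
    """Return the first `<name>.py` file found on `search_path`, or None.

    Simplified sketch: only considers plain .py files.
    """
    if search_path is None:
        search_path = sys.path
    for entry in search_path:
        # An empty string entry stands for the current working directory.
        candidate = Path(entry or '.') / f'{name}.py'
        if candidate.is_file():
            return candidate
    return None


# With the directory layout above, this finds nothing until the
# `code` directory is in the list being searched.
print(find_module_file('a_module'))
print(find_module_file('a_module', sys.path + ['code']))
```

The directories are checked in order, and the first match wins.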

sys.path is just a Python list, like any other:

type(sys.path)
list

That means we can make the import work in our notebook, by appending the code directory to the sys.path list:

sys.path.append('code')

# Now the import will work
import a_module
Finished importing a_module.py
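
The append above puts code at the end of the list. Python searches the sys.path directories in order, so the position matters when the same module name exists in more than one directory. Here is a minimal sketch using two temporary directories and a hypothetical module name, hypo_order_demo, neither of which is part of the example above:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Two throwaway directories, each with a module of the same (made-up) name.
first_dir = Path(tempfile.mkdtemp())
second_dir = Path(tempfile.mkdtemp())
(first_dir / 'hypo_order_demo.py').write_text("WHERE = 'first'\n")
(second_dir / 'hypo_order_demo.py').write_text("WHERE = 'second'\n")

# One directory goes at the end of the list, the other at the start.
sys.path.append(str(second_dir))
sys.path.insert(0, str(first_dir))
# Make sure the import system notices the newly written files.
importlib.invalidate_caches()

import hypo_order_demo

# The directory at the start of sys.path is searched first.
print(hypo_order_demo.WHERE)
```

This should print first, because Python stops at the first directory containing a matching file.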

There are various ways of making sure a directory is always on the Python sys.path list when you run Python.

One of them is to make the module part of an installable package, and install it (see: making a Python package), but we don’t cover that here.

Now that we have imported the module into this Python process, and the code directory is on this process’s sys.path, the import will also work in the script when we execute it within the same process:

run scripts/a_script.py
Result of a_func is: 99
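
Notice that the output above does not repeat the line “Finished importing a_module.py”. Python keeps modules it has already imported in the sys.modules dictionary, and a repeated import reuses the cached module instead of running the file again. A small sketch, using the standard library json module as a stand-in for a_module:

```python
import sys

# First import: Python runs the module file (unless it is already cached).
import json
cached = sys.modules['json']

# A later import returns the same cached module object.
import json as json_again
print(cached is json_again)
```

This should print True: both names refer to the same module object.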

However, if we run the script in a new terminal, we still get the error. The script now runs in a fresh Python process, which has its own sys.path without the code directory, and which has not imported a_module:

%%bash
python3 scripts/a_script.py
Traceback (most recent call last):
  File "/home/runner/work/textbook/textbook/scripts/a_script.py", line 4, in <module>
    import a_module
ModuleNotFoundError: No module named 'a_module'
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
Cell In [10], line 1
----> 1 get_ipython().run_cell_magic('bash', '', 'python3 scripts/a_script.py\n')

File /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/IPython/core/interactiveshell.py:2362, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2360 with self.builtin_trap:
   2361     args = (magic_arg_s, cell)
-> 2362     result = fn(*args, **kwargs)
   2363 return result

File /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/IPython/core/magics/script.py:153, in ScriptMagics._make_script_magic.<locals>.named_script_magic(line, cell)
    151 else:
    152     line = script
--> 153 return self.shebang(line, cell)

File /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/IPython/core/magics/script.py:305, in ScriptMagics.shebang(self, line, cell)
    300 if args.raise_error and p.returncode != 0:
    301     # If we get here and p.returncode is still None, we must have
    302     # killed it but not yet seen its return code. We don't wait for it,
    303     # in case it's stuck in uninterruptible sleep. -9 = SIGKILL
    304     rc = p.returncode or -9
--> 305     raise CalledProcessError(rc, cell)

CalledProcessError: Command 'b'python3 scripts/a_script.py\n'' returned non-zero exit status 1.
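
To see that a change to sys.path stays within one Python process, we can ask a new process what is on its own list. This is a minimal sketch; the directory name hypothetical_marker_dir is made up and serves only as a marker:

```python
import subprocess
import sys

# Made-up directory name, used only as a marker for this demonstration.
marker = 'hypothetical_marker_dir'
sys.path.append(marker)

# Ask a brand-new Python process whether the marker is on *its* sys.path.
check = f'import sys; print({marker!r} in sys.path)'
result = subprocess.run([sys.executable, '-c', check],
                        capture_output=True, text=True)

print('In this process:', marker in sys.path)
print('In a new process:', result.stdout.strip())
```

The current process reports True and the new process reports False, because each Python process builds its sys.path afresh when it starts.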

As a crude solution to the problem above, you can do what we’ve done here, and put the directory containing the module into the Python sys.path list, at the top of the files that need it:

%%file scripts/a_script.py
""" This is a_script

We've made sure a_module is on the Python path this time.
"""

import sys
sys.path.append('code')

import a_module

print('Result of a_func is:', a_module.a_func())
Overwriting scripts/a_script.py

Then:

%%bash
python3 scripts/a_script.py
Finished importing a_module.py
Result of a_func is: 99

The simple append above will only work when running the script from a directory containing the code subdirectory. For example, here we are running a few commands in the terminal, to show that the script fails if we run it from another directory:

%%bash
mkdir another_dir
cd another_dir
# Run the script, but from the new directory.
python3 ../scripts/a_script.py
Traceback (most recent call last):
  File "/home/runner/work/textbook/textbook/another_dir/../scripts/a_script.py", line 9, in <module>
    import a_module
ModuleNotFoundError: No module named 'a_module'
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
Cell In [13], line 1
----> 1 get_ipython().run_cell_magic('bash', '', 'mkdir another_dir\ncd another_dir\n# Run the script, but from the new directory.\npython3 ../scripts/a_script.py\n')

File /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/IPython/core/interactiveshell.py:2362, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2360 with self.builtin_trap:
   2361     args = (magic_arg_s, cell)
-> 2362     result = fn(*args, **kwargs)
   2363 return result

File /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/IPython/core/magics/script.py:153, in ScriptMagics._make_script_magic.<locals>.named_script_magic(line, cell)
    151 else:
    152     line = script
--> 153 return self.shebang(line, cell)

File /opt/hostedtoolcache/Python/3.9.14/x64/lib/python3.9/site-packages/IPython/core/magics/script.py:305, in ScriptMagics.shebang(self, line, cell)
    300 if args.raise_error and p.returncode != 0:
    301     # If we get here and p.returncode is still None, we must have
    302     # killed it but not yet seen its return code. We don't wait for it,
    303     # in case it's stuck in uninterruptible sleep. -9 = SIGKILL
    304     rc = p.returncode or -9
--> 305     raise CalledProcessError(rc, cell)

CalledProcessError: Command 'b'mkdir another_dir\ncd another_dir\n# Run the script, but from the new directory.\npython3 ../scripts/a_script.py\n'' returned non-zero exit status 1.

This is because the directory code that we specified is a relative path, and therefore Python looks for the code directory in the current working directory.
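
The effect of the working directory on a relative path can be shown directly. In this sketch we resolve the relative path code to an absolute path twice, once from the current directory and once from a temporary directory (the temporary directory is an assumption of the example, standing in for another_dir):

```python
import os
import tempfile
from pathlib import Path

original_cwd = Path.cwd()
# Absolute location that 'code' refers to from the current directory.
from_here = Path('code').resolve()

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)
    try:
        # The same relative path now points somewhere else.
        from_elsewhere = Path('code').resolve()
    finally:
        # Always return to where we started.
        os.chdir(original_cwd)

print(from_here)
print(from_elsewhere)
print(from_here == from_elsewhere)
```

The two absolute paths differ, so the final line prints False; the same string 'code' in sys.path means a different directory depending on where Python is run.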

To make the hack work when running the code from any directory, you could use some path manipulation on the __file__ variable, which holds the path of the script file:

%%file scripts/a_script.py
""" This is a_script

Another more general way of making sure the code directory is on the Python
path.
"""

from pathlib import Path

# Directory containing this script.
MY_DIRECTORY = Path(__file__).parent
# Code directory is in the directory above the one containing the script.
CODE_DIRECTORY = MY_DIRECTORY / '..' / 'code'
print('code directory is', str(CODE_DIRECTORY))

# Put this directory on the path.
# sys.path expects strings, not Path objects.
import sys
sys.path.append(str(CODE_DIRECTORY))

import a_module

print('Result of a_func is:', a_module.a_func())
Overwriting scripts/a_script.py

Now the module import works from this directory, or from another_dir:

%%bash
# Running from this directory
python3 scripts/a_script.py
code directory is /home/runner/work/textbook/textbook/scripts/../code
Finished importing a_module.py
Result of a_func is: 99

%%bash
# From another_dir
cd another_dir
python3 ../scripts/a_script.py
code directory is /home/runner/work/textbook/textbook/another_dir/../scripts/../code
Finished importing a_module.py
Result of a_func is: 99
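
The code directory printed above still contains the .. segments. If you prefer a clean absolute path, one possible variation is to resolve the script path first and then step up with .parent. This is a sketch only; the helper name code_dir_for is an assumption, and it takes the script path as an argument so that it can be tried outside a script (inside a_script.py you would pass __file__):

```python
from pathlib import Path


def code_dir_for(script_path):
    """Return the absolute `code` directory that sits beside the script's directory."""
    # resolve() makes the path absolute and removes any `..` segments.
    script_dir = Path(script_path).resolve().parent
    return script_dir.parent / 'code'


# Both ways of writing the script path give the same code directory.
print(code_dir_for('scripts/a_script.py'))
print(code_dir_for('another_dir/../scripts/a_script.py'))
```

As in the script above, the result needs converting with str() before it is appended to sys.path.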