Python read/write/seek operations under the hood

Tals Published in 2017-12-07 15:08:29Z

While creating a character device on a Linux system, I interacted with it using Python and its basic file operations.

After several crashes, I started printing debug messages and noticed strange behavior: Python seems to "optimize" file operations in an odd way.

Let's look at an example. Here is some basic code and the output of an interaction:

Kernel module

// Several includes and kernel module initialization

static ssize_t dev_read(struct file *filep, char *buffer, size_t len, long long *offset){
    // Log the request exactly as it arrives from user space (%zu is the size_t format)
    printk(KERN_INFO "[DEBUGGER] - dev_read with len: %zu, offset: 0x%llx.\n", len, *offset);
    return len;
}

static ssize_t dev_write(struct file *filep, const char *buffer, size_t len, long long *offset){
    printk(KERN_INFO "[DEBUGGER] - dev_write with len: %zu, offset: 0x%llx.\n", len, *offset);
    return len;
}

static long long dev_llseek(struct file *filep, long long offset, int orig){
    printk(KERN_INFO "[DEBUGGER] - dev_llseek with offset: 0x%llx, orig: %d\n", offset, orig);
    return offset;
}

static int dev_release(struct inode *inodep, struct file *filep){
    return 0; // Success
}

static int dev_open(struct inode *inodep, struct file *filep){
    return 0; // Success
}

static struct file_operations fops = {
   .open = dev_open,
   .read = dev_read,
   .write = dev_write,
   .release = dev_release,
   .llseek = dev_llseek,
};

int init_module(void){
   // Code to create character device
   return 0;
}

void cleanup_module(void){
   // Code to delete character device
}


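Python code

with open("/dev/chardevice", "r+b") as character:
   character.seek(1)
   character.read(4)
   character.seek(0x7f123456)
   character.read(20)
   character.write("\xff" * 4)

Output

# seek(1)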
[DEBUGGER] - dev_llseek with offset: 0x0, orig: 0
[DEBUGGER] - dev_read with len: 1, offset: 0x0.
[DEBUGGER] - dev_llseek with offset: 0x1, orig: 0
# read(4)
[DEBUGGER] - dev_read with len: 4, offset: 0x0.
# seek(0x7f123456)
[DEBUGGER] - dev_llseek with offset: 0x7f123000, orig: 0
[DEBUGGER] - dev_read with len: 1110, offset: 0x0.
# read(20)
[DEBUGGER] - dev_read with len: 4096, offset: 0x0.
# write("\xff" * 4)
[DEBUGGER] - dev_write with len: 4, offset: 0x0.

Clearly, the basic file operations do not translate directly into the same operations on the device. The clearest examples are the seek to 0x7f123000 instead of 0x7f123456, and the read of 4096 bytes when only 20 bytes were requested.
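The numbers in the log are consistent with a buffered layer that seeks to a block boundary and then reads forward to the requested position. The sketch below only reproduces that arithmetic. The 4096-byte block size is an assumption inferred from the log, and `split_seek` is a hypothetical helper, not a function from Python or the C library.

```python
BLOCK_SIZE = 4096  # assumption: inferred from the 4096-byte read in the log


def split_seek(target, block_size=BLOCK_SIZE):
    """Split an absolute offset into a block-aligned seek and a remainder.

    Hypothetical helper: it mirrors what the log suggests, not the
    actual buffering implementation.
    """
    aligned = target - (target % block_size)  # offset that reaches llseek
    remainder = target - aligned              # bytes read to reach target
    return aligned, remainder


aligned, remainder = split_seek(0x7f123456)
print(hex(aligned), remainder)  # 0x7f123000 1110
```

The remainder of 1110 bytes (0x456) matches the dev_read length logged right after the seek to 0x7f123000, and `split_seek(1)` gives `(0, 1)`, which matches the seek to 0x0 followed by a 1-byte read.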

This raises the following questions:

  • Why is this a feature?
  • What optimization does it achieve, as most of it doesn't look like a good "next-operation" prediction?
  • Is it documented anywhere, to know what to expect when programming a read/write functionality beforehand?
  • Beyond pure interest in this area, I still want to use Python for easier access. Is there any way to disable this optimization and force Python to behave like C code executing these operations?


Scott Mermelstein Reply to 2017-12-07 18:47:44Z

In Python 2, file objects are actually wrappers around C's FILE* objects, so they are buffered streams (Python 3's io module implements its own buffering, with a similar effect). Because of buffering, Python's file operations do not translate into syscalls with the same parameters. Instead, they try to reduce request time, both for the current operation and for future ones.
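To see the effect without a kernel module, here is a minimal Python 3 sketch. `LoggingRaw` is a hypothetical stand-in for the character device that records the size of every raw read, and the 4096-byte buffer size is chosen only to mirror the log in the question.

```python
import io


class LoggingRaw(io.RawIOBase):
    """Hypothetical stand-in for the device: logs each raw read size."""

    def __init__(self):
        super().__init__()
        self.requests = []  # sizes the buffered layer actually asked for

    def readable(self):
        return True

    def readinto(self, b):
        self.requests.append(len(b))
        b[:] = b"\x00" * len(b)  # pretend the device always has data
        return len(b)


raw = LoggingRaw()
buffered = io.BufferedReader(raw, buffer_size=4096)
data = buffered.read(20)  # the caller asks for 20 bytes only
print(len(data), raw.requests)
```

On CPython the caller gets its 20 bytes, while the raw stream should see a request for the whole buffer, the same pattern as the 4096-byte dev_read logged for read(20).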

The open() function accepts a buffering parameter as its third argument. Passing 0 should disable buffering, so Python passes every request on the file directly to the underlying system:

open("/dev/chardevice", "r+b", 0)
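A short way to check this on Python 3, sketched with a temporary regular file standing in for /dev/chardevice (an assumption, so the example runs anywhere). Note that Python 3 only allows buffering=0 in binary mode, and that writes take bytes rather than str.

```python
import io
import os
import tempfile

# Assumption: a temporary regular file stands in for /dev/chardevice.
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 64)
os.close(fd)

with open(path, "r+b", buffering=0) as raw:
    # With buffering=0 in binary mode, Python 3 returns the raw FileIO
    # object, with no buffered layer in between.
    is_raw = isinstance(raw, io.FileIO)
    raw.seek(1)
    chunk = raw.read(4)        # one read request for exactly 4 bytes
    raw.write(b"\xff" * 4)     # bytes, not str, in Python 3
    position = raw.tell()

os.remove(path)
print(is_raw, len(chunk), position)  # True 4 9
```

With the real device, the kernel log should then show the same lengths and offsets that the Python code requested.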

progmatico Reply to 2017-12-07 15:22:01Z

I am not sure that is the case here, but I think this has to do with the time penalty of reading one byte being the same as reading a whole sector. So why not always read whole sectors from disk? (Or maybe you cannot even ask the underlying system to read fewer bytes than the sector size.)
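On a related note, the Python 3 documentation for open() says the default buffer size is chosen with a heuristic based on the underlying device's block size, falling back on io.DEFAULT_BUFFER_SIZE. A rough way to inspect that value, assuming a temporary file as the target since st_blksize is not available on every platform:

```python
import io
import os
import tempfile

with tempfile.NamedTemporaryFile() as tmp:
    info = os.stat(tmp.name)
    # st_blksize is missing on some platforms, hence the fallback.
    preferred = getattr(info, "st_blksize", 0) or io.DEFAULT_BUFFER_SIZE

print(preferred)
```

The printed value depends on the platform and filesystem, so it is not shown here.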
