user934 Published on April 20, 2018, 10:38 am

I have a .wav audio file in HDFS. When the file is stored on the local file system, I can read it with librosa.load from the librosa library, but I cannot use the same function when the file is in HDFS.

I tried using the HdfsCLI library to read the .wav file as follows:

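import struct
import numpy as np
from hdfs import InsecureClient

# host_port, user and filepath_in_hdfs are placeholders defined elsewhere
client = InsecureClient(host_port, user)

with client.read(filepath_in_hdfs) as f:
    # Unpack the first 20 bytes of the file as little-endian fields
    meta = struct.unpack('<iHHIIHH', f.read(20))
    # Interpret the remaining bytes as unsigned 8-bit integers
    audio = np.frombuffer(f.read(), dtype=np.uint8)
print(audio)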

This gives the following output:

[1, 0, 1,...., 247, 27, 248]

When I use librosa.load on the same file stored locally, the output is:

[0, 0, 0,....., 0, 0, 0]

Are both outputs correct? If not, is there another way to load such files so that the result matches what librosa.load returns?
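
One likely reason the two outputs differ: np.frombuffer with dtype=np.uint8 returns the raw bytes of the file, whereas librosa.load decodes the samples and returns floating-point values (by default also resampled to 22050 Hz and mixed down to mono). The following is a minimal sketch of decoding WAV bytes in memory, assuming 16-bit PCM data and using only the standard library. The names decode_wav_bytes and make_demo_wav are hypothetical helpers, and the small demo WAV stands in for the bytes read from HDFS.

```python
import io
import struct
import wave


def decode_wav_bytes(raw_bytes):
    """Decode 16-bit PCM WAV bytes into floats in [-1.0, 1.0).

    Hypothetical helper: assumes uncompressed 16-bit PCM and does not resample.
    """
    with wave.open(io.BytesIO(raw_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    # Each sample is a little-endian signed 16-bit integer.
    count = len(frames) // 2
    ints = struct.unpack("<%dh" % count, frames)
    # Scale to floats, similar in spirit to what audio loaders return.
    samples = [v / 32768.0 for v in ints]
    return samples, sample_rate, n_channels


def make_demo_wav(values, sample_rate=8000):
    """Build a small mono 16-bit WAV in memory (stand-in for the HDFS file)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(values), *values))
    return buf.getvalue()


raw = make_demo_wav([0, 0, 16384, -16384])
samples, rate, channels = decode_wav_bytes(raw)
print(samples)
```

With the HDFS client from the question, raw would instead be the result of f.read() inside the client.read(filepath_in_hdfs) block. The decoded values would still differ from librosa.load unless the sample rate and channel handling are matched as well.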

