Scrapy dynamic CSV pipeline not reading object

user1833, published on May 22, 2018, 4:36 am

So I'm taking data from a CSV, running Scrapy to find some data, then adding that data as a few final fields on an otherwise similar CSV (the only difference is a cleaned-up header row). The thing is, I will run it with different CSVs containing different data, so I need the pipeline to be dynamic so I don't have to create a new scraper for every CSV.

So I got it all working, down to the pipeline. I like the pipeline because I can check for duplicates and so on prior to writing. I open and read the same CSV and modify the headers exactly as I do in the spider, but for some reason, when I go to populate the row that is to be written in process_item(), it doesn't find the value. I tried many iterations and cannot seem to figure this out.

class CSVWriterPipeline(object):

    headers = []
    with open(csv_input_location) as csv_input:
        reader = csv.reader(csv_input, delimiter=",")
        headers = next(reader)
        headers = [header.lower().strip().replace(' ', '_') for header in headers]
        headers.append('found_item')

    def __init__(self):
        self.csvwriter = csv.writer(open('items.csv', 'w', newline=''))

    def open_spider(self, spider):
        #writes the header
        self.csvwriter.writerow(self.headers)

    def process_item(self, item, spider):
        new_row = [new_row.append(item._values[field]) for field in self.headers] #doesn't find the value from the item
        self.csvwriter.writerow(new_row)
        return item

However, if I write the new_row list out explicitly, like so: new_row = [item._values['header_title1'], item._values['header_title2'], item._values['found_item']], it works. I don't want to have to change it for every CSV. Any help?

Thank you in advance
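
One possible way to build the row dynamically, offered as a sketch rather than a confirmed fix: in the comprehension above, new_row is referenced before it has been assigned, and list.append() returns None anyway, so the comprehension cannot produce the row. Collecting the values directly avoids both problems. The helper names normalize_headers and build_row and the sample data are hypothetical, and a plain dict stands in for the Scrapy item so the example runs without Scrapy installed.

```python
import csv
import io


def normalize_headers(raw_headers):
    """Lower-case, strip, and underscore the headers, as the post does."""
    return [h.lower().strip().replace(' ', '_') for h in raw_headers]


def build_row(item, headers, default=''):
    """Return the item's values in header order.

    `item` can be any mapping; a plain dict stands in for a scrapy.Item here.
    Fields missing from the item fall back to `default`.
    """
    return [item.get(field, default) for field in headers]


# Hypothetical sample data standing in for the input CSV header and a scraped item.
headers = normalize_headers(['Header Title1', ' Header Title2 ']) + ['found_item']
item = {'header_title1': 'a', 'header_title2': 'b', 'found_item': 'yes'}

# Write to an in-memory buffer instead of items.csv to keep the sketch self-contained.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(headers)
writer.writerow(build_row(item, headers))
output = buffer.getvalue()
```

Inside the pipeline, process_item() could then call self.csvwriter.writerow(build_row(item, self.headers)). This assumes the item supports dict-style .get(), which scrapy.Item does; fields missing from an item are written as empty strings instead of raising KeyError.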
