Home Having trouble while fetching data from PYTHON script using AWS Athena-Boto3
Reply: 0

Having trouble while fetching data from PYTHON script using AWS Athena-Boto3

user3336 Published in April 19, 2018, 9:53 am

I am trying to query the dataset present in s3 bucket, using Athena query via python script with help of boto3 functions.

I am using start_query_execution() to run my query. this is being executed perfectly, next to get results in my python script, so that I get access to the result of the query I am using the function get_query_results().

Now if I run these two function separately(one script after another) I get the data which is an output of Athena query. I want them to be written in a single script - something like, fetch data from s3 and start manipulating the output of query using python code.

Since the query is asyn in nature, i am using pool technique, where it waits till the Athena query is executed. But if i run the below codethe, the status show is running for the query.

I think I am doing some silly mistake as if I run them separately I get desired output. In short, I want to query the data present in s3 using Athena, then do some processing on this fetched data in python script, hence this approach. Please help

Here is the sample code

#!/usr/bin/env python3

import boto3
import time
from functools import partial
from multiprocessing.dummy import Pool
pool = Pool(processes=1)

# def async_function(name):
#     time.sleep(1)
#     return name
# def callback_function(name, age):
#     print(name, age)

def run_query(query, database, s3_output):
    client = boto3.client('athena')
    response = client.start_query_execution(
            'Database': database
            'OutputLocation': s3_output,
    print('Execution ID: ' + response['QueryExecutionId'])
    return response
def show_res(res, q):
    client = boto3.client('athena')
    print("Executing query: %s" % (q))
    print('Execution ID: ' + res['QueryExecutionId'])
    # response = client.stop_query_execution(
    #     QueryExecutionId=res['QueryExecutionId']
    # )
    response = client.get_query_results(
        # QueryExecutionId='f3642735-d9d9-4246-ade4-7453eaed0717'
    print("Executing query: %s" % (q))
    print('Execution ID: ' + res['QueryExecutionId'])
    print('rRespone:'.join(str(x) for x in response['ResultSet']['Rows']));
    return response

# for age, name in enumerate(['jack', 'jill', 'james']):
#     new_callback_function = partial(callback_function, age=age)
#     pool.apply_async(
#         async_function,
#         args=[name],
#         callback=new_callback_function
#     )

#Athena configuration
s3_input = 's3://dummy/'
s3_ouput = 's3://dummy/results/'
database = 'dummy'
table = 'dummy'

#Query definitions
query_1 = "SELECT * FROM %s.%s where sex = 'F';" % (database, table)
query_2 = "SELECT * FROM %s.%s where age > 30;" % (database, table)
#Execute all queries
queries = [ query_1 ]
for q in queries:
    print("Executing query: %s" % (q))
    new_callback_function = partial(show_res, q=q)
        args=[q, database, s3_ouput],

You need to login account before you can post.

About| Privacy statement| Terms of Service| Advertising| Contact us| Help| Sitemap|
Processed in 0.509432 second(s) , Gzip On .

© 2016 Powered by mzan.com design MATCHINFO