Monday, 12 February 2018

How to sheet properly

Does your app interact with Google Sheets? Consider offloading work to Google Apps Script code deployed as a web app instead of using the Google Sheets API.

Memory Usage

I have a Python Google App Engine (GAE) application which writes a single row of ~10 columns to a Google Spreadsheet. 

Sheet(y) API

Initially, I used the Sheets API, which would normally involve creating a JSON file of credentials for your app to read in order to authorize its writing of values to your sheet. In GAE, one can simplify this by using oauth2client.client.GoogleCredentials.get_application_default(), but one still relies on googleapiclient.discovery to build a "service" and use it to handle sheet interaction. This solution would often crash my GAE instance by exceeding its allocated 128MB of RAM. I used google.appengine.api.runtime.runtime.memory_usage() to profile memory usage (more detailed Python memory profilers themselves use too much memory). It turns out that using the Google Sheets API with the Python client libraries used ~80MB of RAM!! How can that be?! Isn't it just making a web request? I used the official Quickstart as a guide. Here's what my code was like:

import httplib2
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

SCOPES = 'https://www.googleapis.com/auth/spreadsheets'
CLIENT_SECRET_FILE = 'client_secret.json'
APPLICATION_NAME = 'Google Sheets API Python Quickstart'
GOOGLE_SHEET='asdf2134qwer7890'


def get_credentials():
    credentials = GoogleCredentials.get_application_default()
    return credentials.create_scoped(SCOPES)


def write_data_to_sheet():
    """ Gathers data from a DB or wherever and makes a row to write to the sheet
    """
    credentials = get_credentials()
    http = credentials.authorize(httplib2.Http())
    discoveryUrl = ('https://sheets.googleapis.com/$discovery/rest?'
                    'version=v4')
    service = discovery.build('sheets', 'v4', http=http,
                              discoveryServiceUrl=discoveryUrl,
                              cache_discovery=False)

    # One row of values: the outer list holds rows, the inner list holds cells.
    values = [['some', 'set', 'of', 'values', 'to', 'write', 'to', 'a', 'sheet']]
    service.spreadsheets().values().append(
        spreadsheetId=GOOGLE_SHEET, range='MySheet!A:I',
        valueInputOption='RAW', insertDataOption='INSERT_ROWS',
        body={'values': values}).execute()

Srsly, 80MB for <30 lines of code.
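For anyone wanting to reproduce that kind of measurement, here is a minimal sketch of the before/after sampling idea. The helper name `memory_delta_mb` and its `read_mb` parameter are my own inventions for illustration; on GAE the reader would presumably wrap the runtime memory_usage() call mentioned above, but the exact accessor is an assumption, so it only appears in a comment.

```python
def memory_delta_mb(func, read_mb):
    """Run func() and report how much memory it appeared to add.

    read_mb is any zero-argument callable returning current memory use in MB.
    On GAE this might be something like
        lambda: runtime.memory_usage().current()
    (an assumption; check the runtime API in your SDK version).
    """
    before = read_mb()   # sample before the work starts
    result = func()      # the code under test, e.g. the sheet write
    after = read_mb()    # sample again once it has finished
    return result, after - before
```

Something like `memory_delta_mb(write_data_to_sheet, read_mb)` would then give a rough per-call figure. It is only as accurate as the sampler, and garbage collection can blur the number.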

Sheet Script

Ever use a Google Form to input rows into a sheet? Turns out, that's just a Web App script built into your sheet which handles POST and GET web requests. What's more, we can write our own handlers for those request methods. Martin Hawksey wrote a gist which I've slightly modified (it supports specifying which sheet row is your header row) and forked here.

To use this in-sheet web service with my Python GAE app, I just needed to use the `requests` module to send a simple POST to my sheet web app URL. I re-profiled memory usage and found that this method uses ~800kB of memory - that's 1% of what the Sheets API used! Even that 800kB dropped to 0 on later requests while the app stayed running, so it was likely the one-time cost of importing, say, the `requests` module, which hadn't been used yet.
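The POST described above might look roughly like the sketch below. It assumes the in-sheet script matches POST parameter names against the header row, as the gist-style handler does; the function names, the `post` parameter, and the example column names are all hypothetical. `requests` is imported lazily so its import cost is only paid on first use.

```python
def build_payload(header, values):
    """Pair each header-row column name with the value for that column."""
    if len(header) != len(values):
        raise ValueError('expected %d values, got %d'
                         % (len(header), len(values)))
    return dict(zip(header, values))


def post_row(url, payload, post=None, timeout=10):
    """Send one row to the sheet's web app URL; return the HTTP status code.

    post defaults to requests.post; passing a stand-in callable makes the
    function easy to exercise without touching the network.
    """
    if post is None:
        import requests  # lazy import: loaded only when a real POST is sent
        post = requests.post
    response = post(url, data=payload, timeout=timeout)  # form-encoded body
    response.raise_for_status()  # surface 4xx/5xx instead of failing silently
    return response.status_code
```

Usage would be along the lines of `post_row(WEB_APP_URL, build_payload(['Timestamp', 'Name'], ['2018-02-12', 'example']))`, where `WEB_APP_URL` is the deployment URL of your own script and the column names are placeholders for whatever your header row contains.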


Speed


Another perk of this new solution was speed. Using the Sheets API took 3-5s for the whole data-gathering and writing process to complete. After switching to the Google Apps Script web app built into my sheet along with the Python `requests` module, that time dropped to ~250ms.
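Timings like these can be gathered with a few lines of wall-clock measurement. This is a sketch only; `timed_ms` is a hypothetical helper name, and `time.time` is used simply because it is available on both Python 2 and 3.

```python
import time


def timed_ms(func, clock=time.time):
    """Run func() and return (result, elapsed wall-clock milliseconds)."""
    start = clock()
    result = func()
    elapsed_ms = (clock() - start) * 1000.0  # seconds to milliseconds
    return result, elapsed_ms
```

Wrapping the whole gather-and-write call in `timed_ms` and logging the second value over several requests gives a fairer comparison than a single run, since the first request on a fresh instance also pays for imports.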

The only downside I can see is that the sheet's web app endpoint is public, so anyone who gets hold of the cryptic web app URL could theoretically post rows to your sheet. If this security hole is too big for you, then you may need to buy some more RAM.