Monday, 12 February 2018

How to sheet properly

Does your app interact with Google Sheets?  Consider offloading the work to Google Apps Script code deployed as a web app instead of using the Google Sheets API.

Memory Usage

I have a Python Google App Engine (GAE) application which writes a single row of ~10 columns to a Google Spreadsheet. 

Sheet(y) API

Initially, I used the Sheets API, which would normally involve making a JSON file with credentials for your app to read so that it is authorized to write values to your sheet.  In GAE, one can simplify this by using oauth2client.client.GoogleCredentials.get_application_default(), but one still relies on googleapiclient.discovery to build a "service" and use it to handle sheet interaction.  This solution would often crash my GAE instance by exceeding the allocated 128MB of RAM.  I used google.appengine.api.runtime.runtime.memory_usage() to profile memory usage (more detailed Python memory profiling tools use too much memory themselves).  It turns out using the Google Sheets API with the Python client libraries used ~80MB of RAM!!  How can that be?!  Isn't it just making a web request?  I used the official Quickstart as a guide.  Here's what my code was like:

import httplib2
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

SCOPES = 'https://www.googleapis.com/auth/spreadsheets'
CLIENT_SECRET_FILE = 'client_secret.json'
APPLICATION_NAME = 'Google Sheets API Python Quickstart'
GOOGLE_SHEET='asdf2134qwer7890'


def get_credentials():
    credentials = GoogleCredentials.get_application_default()
    return credentials.create_scoped(SCOPES)


def write_data_to_sheet():
    """ Gathers data from a DB or wherever and makes a row to write to the sheet
    """
    credentials = get_credentials()
    http = credentials.authorize(httplib2.Http())
    discoveryUrl = ('https://sheets.googleapis.com/$discovery/rest?'
                    'version=v4')
    service = discovery.build('sheets', 'v4', http=http,
                              discoveryServiceUrl=discoveryUrl,
                              cache_discovery=False)

    values = [ ['some', 'set', 'of', 'values', 'to', 'write', 'to', 'a', 'sheet' ] ]
    service.spreadsheets().values().append(
        spreadsheetId=GOOGLE_SHEET, range='MySheet!A:I',
        valueInputOption='RAW', body={'values': values}, insertDataOption='INSERT_ROWS').execute()

Srsly, 80MB.  For <30 lines of code.
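A rough sketch of that kind of before/after memory measurement, not the profiling code actually used here. The helper names `peak_rss` and `memory_delta` are hypothetical, and the standard library's `resource` module stands in for GAE's memory_usage() call so the sketch can run outside App Engine. On GAE, one would presumably pass in a reader that wraps the runtime call instead.

```python
import resource


def peak_rss():
    """Peak resident set size of this process so far.

    Units are platform dependent (kilobytes on Linux, bytes on macOS).
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def memory_delta(func, read_memory=peak_rss):
    """Call func() and return (result, reading_after - reading_before).

    read_memory is any zero-argument callable returning a number, so a
    GAE-specific reader can be swapped in (an assumption, not shown here).
    """
    before = read_memory()
    result = func()
    after = read_memory()
    return result, after - before


if __name__ == '__main__':
    # Example: how much does building a large list raise the peak?
    _, grown = memory_delta(lambda: list(range(1000000)))
    print('peak RSS grew by', grown)
```

Because the default reader reports a high-water mark, the delta only shows growth; it never goes negative when memory is freed.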

Sheet Script

Ever use a Google Form to input rows into a sheet?  Turns out, that's just a Web App script built into your sheet which handles POST and GET web requests.  What's more, we can write our own handlers for those request methods.  Martin Hawksey wrote a gist which I've slightly modified (supports specifying which sheet row is your header row) and forked here.

To use this in-sheet web service with my Python GAE app, I just needed to use the `requests` module to send a simple POST to my sheet web app URL.  I re-profiled memory usage to find this method uses ~800kB of memory - that's 1% of what the Sheets API used!  Even this 800kB dropped to 0 on later requests since the app stayed running; it was likely due to importing, say, the requests module which hadn't been used yet.
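For illustration, a minimal sketch of what such a POST might look like. It assumes the sheet's script maps form fields to columns by header name; the URL, the helper names `build_row` and `post_row`, and the column names are all hypothetical placeholders, not the code from my app.

```python
# Hypothetical placeholder: the real URL is shown when the script is
# deployed as a web app.
WEB_APP_URL = 'https://script.google.com/macros/s/<DEPLOYMENT ID>/exec'


def build_row(columns, values):
    """Pair header names with values to form the POST form fields."""
    if len(columns) != len(values):
        raise ValueError('need exactly one value per column')
    return dict(zip(columns, values))


def post_row(row, url=WEB_APP_URL, post=None, timeout=10):
    """Send one row to the sheet web app as a form-encoded POST.

    `post` defaults to requests.post; it is a parameter only so the
    function can be exercised without touching the network.
    """
    if post is None:
        import requests  # imported on first use, as in the memory note above
        post = requests.post
    response = post(url, data=row, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of ignoring them
    return response


if __name__ == '__main__':
    # Column names here are made up; they must match the sheet's header row.
    post_row(build_row(['name', 'score'], ['ada', 42]))
```

Keeping the row-building separate from the request makes it easy to check the payload without hitting the sheet.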


Speed


Another perk with this new solution was speed.  Using the Sheets API took 3-5s for the whole data-gathering and writing process to complete.  After switching to the Google Apps Script web app built into my sheet along with the Python `requests` module, that time dropped to ~250ms.

The only downside I can see is that a sheet web app web request is public, so anyone could theoretically post rows to your sheet using the cryptic web app URL.  If this security hole is too big for you, then you may need to buy some more RAM.

Thursday, 28 September 2017

Building is so GAE

Summary

Learn to use Google Container Builder to deploy to Google App Engine instead.   It can use the same triggers for when to build, but then uses Cloud Builders to execute `gcloud` commands to deploy.


cloudbuild.yaml

The file contents are painfully simple with only a single step:
steps:
# Deploy to GAE: https://cloud.google.com/appengine/docs/standard/python/tools/uploadinganapp
- name: 'gcr.io/cloud-builders/gcloud'
  args: ['app', 'deploy', '--project', '<YOUR PROJECT ID>', '--version', '<SOME VERSION>']

Put these contents into a file called "cloudbuild.yaml" in your project somewhere.  If you only have 1 "service" being provided, then the project root makes sense.  Otherwise, you can put a different cloudbuild.yaml file in each Service folder so that it will deploy different Versions depending  on what git repo branch triggers the build.

Lastly, set up a build trigger to kick off a build when code is pushed to your project's repo.

IAM Permissions

The catch, though, is to make sure Container Builder has permissions to modify your Google App Engine app.

Go to the "IAM & admin" page on the Google Cloud Console and search for "cloudbuild", which will show you which service account is being used with Container Builder.

Change the service account's roles to include App Engine Deployer and also App Engine Service Admin.


Friday, 12 August 2016

Stacked Chips

Intel Edison is the size of a cracker with 2 CPUs, solid state storage and a gig of RAM.  Each measures 3.5 x 2.5 x 0.4 cm and uses <1W.  Compared to an HP Proliant server at 68.20 x 44.80 x 4.32 cm, one could layer some Edisons 19 x 17 x 10, all sandwiched together in the same space, making for 3230 units, or 6460 CPUs, 3.2TB RAM and ~13TB of storage space, completely modular and distributed.  Of course, a Poweredge would only use 500W while this setup would use over 3kW.  But imagine even 500 Edisons to keep wattage in check: that's still 2TB storage, 1000 CPU cores and 500GB RAM.

Edison Beowulf  http://lcamtuf.coredump.cx/edison_fuzz/

To compare more fairly, a Xeon chip of a Proliant server would be about 53kMIPS, and at 615MIPS per Edison, 500 Edisons come to 307kMIPS.  That's right: 6x the processing power...and 4x the cost.
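The back-of-envelope numbers above can be checked with a short sketch. The dimensions, CPU count, RAM and MIPS figures come from this post; the 4GB of storage per Edison is an assumption inferred from the ~13TB total, and the function names are made up for illustration.

```python
# Dimensions in cm, as quoted in the post.
SERVER_CM = (68.20, 44.80, 4.32)
EDISON_CM = (3.5, 2.5, 0.4)

CPUS_PER_EDISON = 2
RAM_GB_PER_EDISON = 1
STORAGE_GB_PER_EDISON = 4  # assumption: inferred from ~13TB across 3230 units
MIPS_PER_EDISON = 615
XEON_MIPS = 53000


def units_per_axis(container, unit):
    """Whole units that fit along each axis, with no rotation or gaps."""
    return tuple(int(c // u) for c, u in zip(container, unit))


def cluster_totals(units):
    """Aggregate resources for a cluster of `units` Edisons."""
    return {
        'cpus': units * CPUS_PER_EDISON,
        'ram_gb': units * RAM_GB_PER_EDISON,
        'storage_gb': units * STORAGE_GB_PER_EDISON,
        'mips': units * MIPS_PER_EDISON,
    }


if __name__ == '__main__':
    x, y, z = units_per_axis(SERVER_CM, EDISON_CM)
    print('grid:', (x, y, z), 'units:', x * y * z)
    print('full box:', cluster_totals(x * y * z))
    small = cluster_totals(500)
    print('500 units:', small)
    print('speedup vs Xeon: %.1fx' % (small['mips'] / float(XEON_MIPS)))
```

This ignores cabling, power delivery and cooling, so the packing figure is an upper bound rather than a build plan.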

The glory here is that unlike other SoCs, the Edison isn't ARM, but x86, meaning you can run a lot more linux applications (or Docker containers?) than one could with ARM.  The performance seems to be a lot better, but that may be because of OS optimizations since x86 linux has been around and under development longer.  The idea here is: the modular hardware makes it easy to distribute storage and processing power, and replace bad $39 units as they burn out with minimal impact to the overall cluster.  Provided tasks and services can be written for a distributed system, it would be quite the cheap setup (and silent as it has no fans).


Friday, 13 July 2012

Chisel Server

I once saw a cheap ($170) new laser printer at Staples (HP LaserJet P1102w) which connected to a home wifi and shared itself with everyone in the house.  At first this reminded me of a former room mate, but then got me to thinking about building a print server from an old PC I had lying around.  Had I used the same hours mowing lawns, I'm sure I would have come out ahead.  However, to help others through the process (both the decision making process and the technical process), I'm documenting my journey here.

Cast of Characters

The PC

  • Pentium 1, 170MHz
  • 64MB RAM
  • 3.2GB HDD

The Printer

Samsung ML-1740 - a black & white laser printer with lackluster Mac OS X support

The Scanner

HP SCSI ScanJet 5p - pretty sure it works...

The WiFi

Buffalo USB AirStation G54 (WLI2-USB2-G52).  As a backup, I also have a PCI WiFi card somewhere.

The Goals

  1. Install a basic (text-only?) linux-based OS
  2. Use CUPS to share printer over network
  3. Use SANE to share scanner over network
  4. Get USB WiFi working to get server into another room

Needless to say, this didn't play out into an efficient story.  Prior to starting I removed the sound card and modem (both ISA) from the PC and added in a PCI Network card.  Though installing from a CD, most linux distros play nicest when they have an internet connection during setup.  Ideally, I would have used the WiFi device for this, but USB WiFi networking has historically been troublesome.

The OS

Ubuntu is my first choice only because I have experience with it and it's well documented.  I started by grabbing the latest 12.04 Server CD because:
  1. Latest = Greatest
  2. Server = text only
I was wrong on both counts.  Ubuntu dropped support for older CPUs like mine after 10.04 (trying to install 11.x or 12.x resulted in errors about missing instruction sets).  When I tried 10.04 Server, it took forever, sitting for tens of hours installing (hung at "Storing language...").  It turns out even though I'm creating a print "server", I shouldn't use the Server CD because it has a lot of other server packages I don't need.

In researching other options, I came up with the list below.

Minimal

I couldn't get this to install as it would hang on screens showing no progress bar for hours - sometimes overnight.  It appears that this installer is minimal in that the CD has minimal data on it (and therefore needs an internet connection to download a LOT of data during installation) rather than simply installing a bare-bones minimal OS.  After about 8 tries at this I gave up (even using the manual/expert installer didn't help much).

DSL

Other users reported getting both CUPS and SANE working with Damn Small Linux.  I got it installed without any problems (has a GUI even!).  However, the deal breakers were:
  1. It wouldn't enable APT so there was no way for me to add the CUPS and SANE packages (perhaps DSL is stale?  It should use Debian repositories, but trying to enable APT resulted in 404 Not Found errors)
  2. It's not made to be installed to HDD, but rather run from a flash drive etc.  It recommends a normal Debian (or Bonsai) installation.  
Since it wasn't going smoothly and since I preferred to stay in charted territory (Ubuntu) I gave up on this option.

Network Print Server

I didn't end up trying this option because of these limits stated in the documentation:
  1. No USB printers - only Parallel Port (LPT) printers.  My printer could work if I bought a parallel port cable, but that's $15 I could put towards the new HP.
  2. No WiFi support
  3. No way of adding SANE for scanner sharing

Alternate

After reading some Ubuntu Help, it became clear that the Alternate installation is more supported (and more highly recommended) than the Minimal installation.  The help also noted that the command line system installation is quite different than Ubuntu Server.  So, I grabbed the CD and went to town.  In the time it took me to write to this point, it's 71% done.  

...18 hours later..."Configuring apt"...downloading package 13 of 28....same screen for at least 12 hours...hard drive clicking away...HP network LaserJet soundin' pretty good...

...24 more hours later...still "Configuring apt".  Further reading suggests a networking bug sometimes will make this step hang, so I restarted installation sans networking (but with the NIC still installed).


The winning combination was to do an Expert Install with the network enabled.  In the problems above where it would stall on "Configuring apt", the solution was to hit Enter and thus select the "cancel" option.  It would then move on to the next item and possibly succeed.  Eventually the OS got installed.  In my case, I failed to set up a default user.  I corrected this by booting into recovery mode (holding Shift just after BIOS POST so GRUB shows some boot options) then dropping into a Root shell.  I could then make a user (useradd), add him to the "sudo" group, make sure /etc/sudoers looked right (use visudo) and then reboot to log in as my fresh new user.

The CUPS

To speed things up, I uncommented the "cdrom" lines in /etc/apt/sources.list and popped in the Alternate CD (don't forget to tell BIOS not to boot off the CD anymore, and also don't forget to mount it correctly: sudo mount /dev/cdrom /cdrom).  I then did a simple sudo apt-get install cups which installed flawlessly.  


I could use console commands to configure my printer, but I decided to get the CUPS web interface rolling first (No need to install Apache/httpd - CUPS serves its own pages).  Here are steps to make that happen:
  1. Add user to lpadmin group:  sudo usermod -aG lpadmin username
  2. If you want to use another computer to reach the web interface, then...
    1. sudo vi /etc/cups/cupsd.conf
    2. Comment out the two "Listen" lines and add "Listen *:631"
    3. In <Location /> section, add line "Allow 192.168.0.0/24" (or whatever network...)
    4. Do the same in <Location /admin> section
    5. Write file and quit :wq
    6. Restart CUPS daemon sudo /etc/init.d/cups restart
Now just go to the web interface in your browser (e.g. http://192.168.0.2:631), click Administration, Add Printer, click the one that makes sense, share it, and BAM!  Done!  In my case, the Samsung showed up out of the box - no drivers needed.  BTW, it showed up when plugged into the USB port - no parallel cable required!  I think this made for easier printer detection.  However, from the CUPS web interface, I couldn't successfully print a test page - not done!

To try and get things to work [FAIL], I first installed the Samsung drivers:
  1. Download: wget http://downloadcenter.samsung.com/content/DR/200707/20070720165704156_UnifiedLinuxDriver.tar.gz
  2. Unzip: sudo tar xzf ./20070720165704156_UnifiedLinuxDriver.tar.gz 
  3. Install: sudo ./cdroot/autorun
That didn't help much.  So I followed this guide and ultimately got it to print the test page with the Splix drivers:
apt-get install splix


But network printing didn't work =(  Hours (days) later, I'm not sure what's wrong - pretty sure it's all in the cupsd.conf file, but I give up.  Just bought a used HP on Amazon for $90 with shipping.

The SANE

Didn't try...

The WiFi

Since the Buffalo doesn't have linux support out of the box, step one is to dive into these WiFi docs and hacks.