Week 42

Author: malm

Self-referential data visualisation IV: MongoDB and Flask


Over the last three weeks I’ve developed a simple Python-based data visualisation pipeline which collects some basic sizing stats about my 2015 blog posts by web-scraping them and then employs a range of approaches for generating multi-line graphs from this data. BeautifulSoup, pandas, nvd3 and bokeh, among other libraries, are used in the process. This week I finish the job of constructing all the elements of the original inspiration for this work, the pipeline described by Kyran Dale at PyCon UK. The two elements still missing from where we ended up last week are: i) a MongoDB database for storing processed JSON, and ii) a Flask web app for serving the visualisations. With these in place, the pipeline will comprise these six stages:

# 1. Get the web pages                = requests
# 2. Parse data into raw JSON         = BeautifulSoup
# 3. Examine and clean the JSON       = pandas
# 4. Save the cleaned JSON to db      = pymongo
# 5. Visualise the cleaned data in db = nvd3, bokeh etc
# 6. Serve the visualisation          = flask

pymongo

In order to get started quickly with MongoDB, I set up a free account with MongoDB-as-a-service provider mongolab.com and then configured a dbuser and dbpassword combination in order to create a database called importdigest.  This can all be done through the mongolab GUI:

mongolab

Next I created a MongolabClient class that connects to the importdigest database in the cloud and creates a MongoDB collection called blog for holding our JSON data. The code below shows the initial outline of this class along with methods to insert, bulk delete and read records from the collection. The pymongo library can be installed with a single pip install, even on Windows:

import sys

from pymongo import MongoClient, InsertOne, DeleteOne, ReplaceOne

class MongolabClient(object):
    def __init__(self):
        self.db = ''
        try:
            self.dbuser = readFromFile('.dbuser')
            self.dbpasswd = readFromFile('.dbpasswd')
            self.endpoint = readFromFile('.dbendpoint')
        except (IOError, OSError):
            # Any unreadable credentials file aborts with a non-zero status.
            print("** Missing credentials files! **")
            sys.exit(1)
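        self.db = MongoClient(self.endpoint)
        # Database.authenticate() is PyMongo 3.x API (removed in PyMongo 4,
        # where credentials are passed to MongoClient instead).
        scheme = 'SCRAM-SHA-1'
        auth = self.db.importdigest.authenticate(\
          self.dbuser,self.dbpasswd,mechanism=scheme)
        assert(auth)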
        self.collection = self.db.importdigest.blog

    def __del__(self):
        if self.db:
            self.db.close()

    def insertCollection(self,df):
        recs = self.convertDataframeToRecords(df)
        self.collection.insert_many(recs)

    def getCollection(self):
        recs = []
        for rec in self.collection.find():
            recs.append(rec)
        return recs

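    def deleteCollection(self):
        # Queue one delete per stored record, keyed on its week number.
        requests = []
        for rec in self.collection.find():
            wk = rec.get('week')
            requests.append(DeleteOne({'week':wk}))
        # bulk_write raises InvalidOperation when given an empty list.
        if requests:
            self.collection.bulk_write(requests)
        assert(self.collection.count() == 0)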
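
The constructor above leans on a readFromFile helper that isn't shown in this listing. A minimal sketch of what such a helper might look like, assuming each credentials file holds a single line of text. The function body and the example value are assumptions for illustration, not the code in blog5.py:

```python
import os
import tempfile


def readFromFile(path):
    """Return the stripped contents of a small text file.

    Hypothetical stand-in for the helper used by MongolabClient; raises
    IOError/OSError if the file is missing.
    """
    with open(path, 'r') as f:
        return f.read().strip()


# Usage: write a throwaway credentials file and read it back.
tmpdir = tempfile.mkdtemp()
userfile = os.path.join(tmpdir, '.dbuser')
with open(userfile, 'w') as f:
    f.write('exampleuser\n')  # made-up value

dbuser = readFromFile(userfile)
print(dbuser)
```

Keeping the credentials in separate files means they never need to appear in the source itself.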

Another method I developed called upsertCollection allows the existing collection records to be inserted and/or updated in place if they already exist.  At this point we have a full pipeline in place and we’re ready to try serving the data stored in the collection during processing.  I’ll use Flask to do that.  If you’re interested in looking at the full pipeline source, I have released a new source file called blog5.py which contains full working code for it all including the latest iteration of the MongoDB support outlined above.
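
Since upsertCollection isn't listed here, the following is only a sketch of the idea rather than the code in blog5.py. It assumes each record carries a unique 'week' key (as deleteCollection suggests) and relies on pymongo's Collection.replace_one with upsert=True, which replaces a matching document or inserts the record when nothing matches. The names upsert_records, FakeCollection and the 'words' field are hypothetical; the fake only mimics that one call so the example runs without a database:

```python
def upsert_records(collection, records, key='week'):
    """Insert each record, or replace the stored one with the same key."""
    for rec in records:
        # With a real pymongo collection this replaces the document matching
        # the filter, or inserts rec if no document matches.
        collection.replace_one({key: rec[key]}, rec, upsert=True)


class FakeCollection(object):
    """In-memory stand-in mimicking replace_one(filter, doc, upsert=True)."""

    def __init__(self):
        self.docs = []

    def replace_one(self, query, replacement, upsert=False):
        for i, doc in enumerate(self.docs):
            if all(doc.get(k) == v for k, v in query.items()):
                self.docs[i] = replacement
                return
        if upsert:
            self.docs.append(replacement)


blog = FakeCollection()
upsert_records(blog, [{'week': 41, 'words': 1200}, {'week': 42, 'words': 900}])
upsert_records(blog, [{'week': 42, 'words': 950}])  # updates week 42 in place
print(len(blog.docs))         # 2
print(blog.docs[1]['words'])  # 950
```

For larger collections the same effect can be had in one round trip by building a list of ReplaceOne(..., upsert=True) requests and passing it to bulk_write, which is presumably why ReplaceOne is imported in the listing above.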

Flask

Flask is a web microframework for Python and a great starting point for setting up a simple no-frills web server, which is what we need to serve up our generated visualisations. Normally you’d do this by rendering Jinja2 templates, with CSS for styling and perhaps some fancy JavaScript on top. The approach I used is much more basic and involves zero additional files beyond an app.py script along these lines:

from flask import Flask
import blog5

app = Flask(__name__,static_url_path='')

client = blog5.MongolabClient()
recs = client.getCollection(stripId=True)
df = client.convertRecordsToDataframe(recs)

@app.route('/')
def index():
    html = """
<!DOCTYPE HTML>
<HTML>
<BODY>
<h3>Visualisations</h3>
<ol>
<li><a href="/dimple/">Dimple</a></li>
</ol>
</BODY>
</HTML>"""
    return html

@app.route('/dimple/')
def dimple():
    htmlfile = 'blog_dimple.html'
    title = 'malm.teqy.net blog stats dimple.js'
    # width and height are not defined in this excerpt; placeholder values.
    width, height = 800, 600
    blog5.dimpleVisualisation(df,htmlfile,width,height,title)
    with open(htmlfile,'r') as f:
        html = f.read()
    return html

if __name__ == '__main__':
    app.run(debug=True)
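
The script calls getCollection(stripId=True), a keyword the class listing earlier doesn't show. Presumably it drops the _id field that MongoDB adds to every stored document, since pymongo returns it as an ObjectId that the standard json module can't serialise and that would otherwise turn up as an extra DataFrame column. A sketch of that step, with strip_ids as a hypothetical helper name and made-up record values:

```python
def strip_ids(records):
    """Return copies of the records without MongoDB's '_id' field."""
    return [{k: v for k, v in rec.items() if k != '_id'} for rec in records]


# Plain strings stand in for the ObjectId values pymongo would return.
stored = [
    {'_id': 'abc123', 'week': 41, 'words': 1200},
    {'_id': 'def456', 'week': 42, 'words': 900},
]
clean = strip_ids(stored)
print(clean)
```

The originals are left untouched, so the same records can still be matched against the collection afterwards.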

The web server instance is brought up using a simple command line invocation thus:

$ python app.py

At this point navigating to http://localhost:5000/ should yield a list of links to the generated visualisations. The full app.py file I created is also available for inspection in case anyone finds it useful. Before you look, however, it’s important to note the following serious limitations of the Flask web app implementation, as befits an MVP:

1. No styling/CSS
2. Improper approach to rendering in eschewing templates
3. No test code.  In a pro context, you want to be using py.test
4. Awkward dependency on visualisation code in blog5.py
5. Most importantly of all, it only works on my machine
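
Items 2 and 3 need not take much code to address. A minimal sketch, separate from the real app.py: it renders the index through Jinja2 using Flask's render_template_string and checks the result with Flask's built-in test client, in a function that py.test would discover by name. The VISUALISATIONS list and the template text are assumptions for illustration:

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Hypothetical registry of (label, url) pairs for the index page.
VISUALISATIONS = [('Dimple', '/dimple/')]

INDEX_TEMPLATE = """<!DOCTYPE html>
<html>
<body>
<h3>Visualisations</h3>
<ol>
{% for label, url in links %}<li><a href="{{ url }}">{{ label }}</a></li>
{% endfor %}</ol>
</body>
</html>"""


@app.route('/')
def index():
    # Jinja2 fills in the list instead of a hand-written HTML string.
    return render_template_string(INDEX_TEMPLATE, links=VISUALISATIONS)


def test_index_lists_dimple():
    # Flask's test client issues requests without starting a server.
    response = app.test_client().get('/')
    assert response.status_code == 200
    assert b'<a href="/dimple/">Dimple</a>' in response.data


# Called directly here; under py.test the function is picked up automatically.
test_index_lists_dimple()
```

In a real project the template would more likely live in a templates/ directory and be rendered with render_template, but the inline string keeps the sketch to a single file.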

Hopefully the code remains useful in outlining the overall structure and approach taken to this problem.  Note that my work to date only covers the “dev” part of the devops picture at present! Next week we’ll look at how to automatically deploy this setup to the cloud using Docker – the “ops” part of devops.

Devices and Manufacturers

  • UK smartphone startup WileyFox launched with a splash a few weeks ago.  They appear to have already secured distribution for their premium model, the dual-SIM 4G Storm, through Amazon Prime. The Storm packs 32GB, a Gorilla Glass display, MicroSD and is unlocked as well.   At a sub-£200 price point, it could well end up being a bit of a Christmas sleeper hit in the UK.

WileyFox

Scores

  • One has to take these sorts of benchmarks with a pinch of salt. The graph above has an exaggerated scale and whether the average punter is able to really discern a meaningful difference is open to question.

Google and Android

What’s happening, in other words, is that even the smartphone – which, however you look at it, is a fantastically intricate device – is being commoditised, reduced to a low-margin product that is stamped out by the billion. This is the iron law of electronics manufacturing: there’s no money in hardware.

  • Blackberry certainly seem to be majoring on security with their forthcoming Priv smartphone.  Engadget outline the BB10 security features Blackberry are bringing to their first Android product which positions the device in the cipherphone space.


  • The Verge review Android Marshmallow and suggest that Now on Tap is the signature feature in a platform that might be better named “Google OS” in future. It’s a good summary of the direction that Google are taking Android, which involves the exercise of ever greater control over the look and feel and key service features of the platform in order to remain competitive with iOS.

  • Meanwhile LifeHacker cover five of the “coolest Marshmallow features that Google didn’t announce”:
    • Powerful new app backup tool (if developers use it)
    • You can manually add or rearrange quick settings tiles
    • Show your battery % and hide icons in the Status Bar
    • Swipe left on Lock Screen to open voice search not Dialler
    • Your SD Card Will (Finally!) Be Treated As Internal Storage

The Coolest Android Marshmallow Features That Google Didn't Announce

  • Charles Arthur outlines the evidence to support his claim that “50% of users do zero Google searches per day on mobile”  in spite of all the efforts by Google outlined above to push their service offering front of mind on Android.  These include the prominent positioning of a Google search bar widget on the home screen of Google-endorsed Android smartphones.  Arthur’s assessment of the daily search distribution is as follows:

Google searches on desktop

  • As for the reason why, it’s because mobile usage is centred on apps:

“Mobile search is a real problem for Google: people don’t do it nearly as much as you suspect it would like. But there’s no obvious way of changing that behaviour while users are so addicted to apps on their phones – and there’s no sign of that changing any time soon, no matter whether news organisations wish people would use mobile sites instead.”

  • Most worryingly of all for Google, the long term trend is to lower CPC (cost-per-click) which again raises a question-mark over the longer-term strategy for Android:

Google paid clicks, cost-per-click and product

Apps and Services

The fact that a company the size of Facebook can’t optimize energy consumption of their iOS app is simply ridiculous. If they can but don’t want to (because of processes they want to run in the background, constant notifications, etc.) – well, that’s even worse.

Security

`picardfacepalm.jpg`
No words found in the dictionary, so a secure passphrase is out
Oh wait but that was already out because no spaces
Err, wait, so no dictionary words, and no special characters. 1w0nD3r what people will do for their p455w0rd5 now?
i hate you, chase. love, the editor.

  • And no password can protect you from an attack as audacious as the man-in-the-middle one launched to comprehensively defeat chip-and-PIN cards. The ‘team’ behind this appear to have ‘leveraged’ a research paper written by Ross Anderson’s team in Cambridge.  X-rays reveal a fraudulent chip soldered onto the original (see below) to allow any PIN to be input and approved.  If they’d been a bit smarter about how they leveraged the exploit, they could have lived off the proceeds for years.   Instead it seems at least one of the team kept going back to the same cashpoints around the same time.   Guess they didn’t watch The Imitation Game.

An artificially colored x-ray image showing the FUN chip and the green stolen credit card chip soldered to it.

The Internet of Things and Cloud Computing

  • Little wonder that Ansible, a 50-person company that develops a free open source IT automation stack and has only been going for a couple of years, has just been acquired for a staggering $100m by Red Hat. The Ansible stack is some 34-68k SLOC depending on how you cut it. So, in a very real sense, it constitutes some of the most financially valuable Python code ever developed, at least in terms of the price that Red Hat is willing to pay for it: a mind-boggling $1,500/SLOC even at the higher line count, and nearer $3,000/SLOC at the lower one. Surely that’s sufficient incentive for any lingering non-coders to dive into Python right away.
  • Nicholas Tollervey, who MC’d the Education session at PyCon UK last month, announced MicroPython support for the BBC micro:bit and published a fascinating blog post about the genesis and evolution of the port. Significant kudos is due to the BBC for supporting this work, which is arguably the modern analogue of the legendary BBC Micro released 35 years ago:

from today the BBC have agreed that we can continue our work in the open and outside the restrictions of the NDA. The micro:bit related parts of MicroPython have been released under the MIT license and can be found at this GitHub repository.

  • Anyone who used a BBC Micro will recognise the positive formative experience highlighted below:

Micropython

  • Never one to miss a movement, The Register are jumping on the devops bandwagon hosting a conference of their own next year in London. They just announced a call for papers which says all the right things:

The emphasis will be hands-on introductions, conceptual decisions, live demos, or comparisons of different techniques. Presentations that are based solely on promoting a product will not be allowed.

Space

The light from this strange world was seen to dim from 15 to 22 percent at irregular intervals.

Cars

Software Tools

  • InfoQ preview Atlassian’s Jira 7 platform, which spins three products out of the existing monolith that “can be purchased separately, but still installed together to provide a unified JIRA instance for an entire organization”. It’s an important development because it positions Jira as an essential enterprise tool:

Development and collaboration software vendor Atlassian released version 7 of its project tracking application JIRA as three new standalone products: JIRA Core, JIRA Software and JIRA Service Desk. All three products are built atop a common platform to better serve non-technical business teams, development teams as well as IT and other service teams with an edition tailored to each team’s needs.


  • Pro tip from twit.tv that leaves you unsure about whether it’s tongue in cheek – enter recipient names last when composing emails.

Leadership and Hiring

By seeking, sensing, and sharing, everyone in an organization can become part of a learning organism, listening at different frequencies, scanning the horizon, recognizing patterns and making better decisions on an informed basis.

I didn’t hate people until I became a sysadmin.

  • One can sympathise to a degree after reading this excoriating take on “the great tech recruiter infestation”. If you’ve ever had a run-in with ill-advised cold callers who know nothing about your business needs, fishing with questions about your PSL arrangements and open vacancies, then you’ll be familiar with the rogues’ gallery of chancers and bounders laid out here. Even so, the language is startlingly agricultural. There are good recruiters out there but there are also a lot of teflon-coated miscreants:

Their bullshit-filled emails go straight in the trash.  My LinkedIn profile carries a huge “recruiters, go away!” sign.  About half of them actually observe it.  The rest are pro-thick, so certain of their own brilliance that the instructions don’t apply.

Hailstones

Startups

  • This FirstRound profile of Nerdwallet outlines how CEO Tim Chen sought to position the company for aligned engineering growth. It all depends on having the right ‘tent poles’ holding the business up:

Before he could build out Nerdwallet’s ranks, Chen set out to make a few senior-level hires that would not only help him steer the ship, but attract the right type of attention and respect from other prospective candidates. He needed marquee names, what he calls “tent poles,” the people who could keep the multi-ring circus running and set standards and expectations for everyone else.

Orwell, Truth and Inequality

The world’s new “Ministry of Truth”, Google believes that screening and censoring information requested by its users will help avoid “websites full of misinformation” from showing up at the top of the search list. Known as the “Knowledge Vault,” the novel algorithm is described by The New American as “an automated and super-charged version of Google’s manually compiled fact database called Knowledge Graph.”

It’s unclear exactly what Google plans to do with this new technology, if anything at all. Still, even the possibility of a search engine that evaluates truth is a pretty incredible breakthrough. And it definitely gives new meaning to the phrase “let me Google that for you.”

When researchers used magnetic energy to shut down the brain’s threat perception, nearly a third of patients were more tolerant to immigrants. More said they didn’t believe in God.

  • Staying with the theme of Big Brother, this Atlantic piece entitled “If You’re Not Paranoid, You’re Crazy” asks if it is possible to live “free lives” any more given the extent and depth of digital surveillance we are being subjected to by governments and corporations. Cue a picture of the NSA Utah Data Centre which requires 2 million gallons of water a day to keep it cool:

“On an Internet built on the assumption that every contribution is equally valid, harassers are just as valuable as their victims. But as the harasser flames his victim into silence, he becomes more valuable than his target  …  to a social media company’s “cold bottom line, a troll calling women names all day gets more advertising hits. He is a devoted user.”

Culture and Society

Mr Xi given guard of honour

  • If you grew up in the East Midlands in the 1970s, you’ll need no reminding of the original Special One, Brian Clough. The Guardian take an entertaining ride through the legendary exploits of his double European Cup winning champions Nottingham Forest, reminding us there are many paths to excellence and they don’t all involve authoritarian control:

This was the team that prepared for a European Cup semi-final against the great Ajax side by walking round the red-light district of Amsterdam, settling down for some late-night beers in a bar full of potheads and sex tourists.
