Category Archives: All About Software Technology

An Android Board Game: Sweet Spots

Back in 2013, one of my planned to-do items was to explore programming on the Android platform. As always, the best way to learn a programming platform is to write code on it. At the time I was hooked on an interesting board game called Alberi (based on a mathematical puzzle designed by Giorgio Dendi), so I decided to develop a board game using the same game logic as a programming exercise. I had planned to blog about it, but got buried in a different project soon after finishing the code.

This was the first Android application I had ever developed. It turned out to be a little more complex than initially thought for a first programming exercise on an unfamiliar platform. Nevertheless, it was fun to develop a game I enjoyed playing. Android is Java-based, so the learning curve wasn’t too steep for me, and the Android SDK comes with a lot of sample code that can be borrowed.

The game is pretty simple. For a given game, the square board is composed of N rows x N columns of square cells. The entire board is also divided into N contiguous colored zones. The goal is to distribute a number of treasure chests over the board according to the following rules (sketched in code after the list):

  1. Each row must have exactly 1 treasure chest
  2. Each column must have exactly 1 treasure chest
  3. Each zone must have exactly 1 treasure chest
  4. Treasure chests cannot be adjacent to each other horizontally, vertically or diagonally
  5. There is also a variant with 2 treasure chests (per row/column/zone) at larger board sizes
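
To make the rules concrete, here is a minimal sketch of the validation checks. It is written in JavaScript for brevity (the app itself is in Java), assuming the zones are given as an N x N array of zone ids 0..N-1 and the chests as a list of {row, col} positions:

    // Illustrative sketch of the rule checks, not the app's actual code
    function isValidSolution(zones, chests) {
      var n = zones.length;
      var rows = [], cols = [], zoneHits = [];
      for (var i = 0; i < chests.length; i++) {
        var c = chests[i];
        rows[c.row] = (rows[c.row] || 0) + 1;
        cols[c.col] = (cols[c.col] || 0) + 1;
        var z = zones[c.row][c.col];
        zoneHits[z] = (zoneHits[z] || 0) + 1;
        // Rule 4: no two chests may touch, even diagonally
        for (var j = i + 1; j < chests.length; j++) {
          if (Math.abs(c.row - chests[j].row) <= 1 &&
              Math.abs(c.col - chests[j].col) <= 1) return false;
        }
      }
      // Rules 1-3: exactly one chest per row, column and zone
      for (var k = 0; k < n; k++) {
        if (rows[k] !== 1 || cols[k] !== 1 || zoneHits[k] !== 1) return false;
      }
      return true;
    }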

Here’s a screenshot of the board game (N = 6):

SweetSpots screenshot

Publishing the game app on Google Play

Back then I didn’t publish the game on Google Play. I’ve decided to do it now just to try out the process. To do that, I needed to create a signed Android application package (APK) and then zipalign it, per Google’s publishing requirements. Eclipse and the Android SDK, along with the Android Debug Bridge (adb) for the Android emulator, were used for developing the app back in 2013. The Android OS version at the time was Jelly Bean, although the game still plays fine today on my Lollipop Android phone. The Eclipse version used for the development was Juno, and the Android SDK version was 17.0.0.

Just two years later, while the game app still runs fine as an unsigned APK on the current Android platform, it no longer builds properly on the latest Eclipse (Mars) and Android SDK (v24.0.2), giving the infamous “R cannot be resolved” error. There are lots of suggestions out there on how to solve the problem, such as fixing the resource XML or modifying the build path, but unfortunately none of them applied.

As Google is ending support for the Android Developer Tools (ADT) in Eclipse literally by the end of the month, leaving the IntelliJ-based Android Studio as the de facto IDE for future Android app development, I thought I would give it a shot. Android Studio appears to be a great IDE and importing the Eclipse project into it was effortless. It even nicely correlates dependencies and organizes multiple related projects into one. But then a stubborn adb connection problem blocked me from moving forward, so I moved back to Eclipse. Finally, after experimenting with mixing older versions of the Android SDK build tools and platform tools, I managed to build the app successfully. Here’s the published game at Google Play.

From Tic-tac-toe to SweetSpots

The Android SDK version I used comes with a bunch of sample applications along with working code. Among them is a Tic-tac-toe game, which I decided would serve well as the codebase for the board game. I named the game Sweet Spots.

Following the code structure of the Tic-tac-toe sample application, there are two interdependent projects for Sweet Spots: SweetSpotsMain and SweetSpotsLib, each with its own manifest file (AndroidManifest.xml). The file system structure of the source code and resource files is simple:

/path-to-app/SweetSpotsMain/
    AndroidManifest.xml
    res/
        drawable/
            icon.png
        layout/
            main.xml
        layout-land/
            main.xml
        values/
            strings.xml
    src/
        com/
            genuine/
                android/
                    sweetspots/
                        MainActivity.java
/path-to-app/SweetSpotsLib/
    AndroidManifest.xml
    res/
        drawable/
            (board content images)
        layout/
            lib_game.xml
        layout-land/
            lib_game.xml
        values/
            strings.xml
    src/
        com/
            genuine/
                android/
                    sweetspots/
                        library/
                            GameActivity.java
                            GameView.java

MainActivity (extends Activity)

The main Java class in SweetSpotsMain is MainActivity, whose onCreate() sets up buttons for games of various board sizes. In the original code repurposed from the Tic-tac-toe app, each game button had its own OnClickListener defining an onClick() that called startGame(), which launched GameActivity via startActivity() with an Intent targeting the GameActivity class. This has been refactored so that the activity class itself implements OnClickListener: onCreate() registers each button with setOnClickListener(this), and the overridden onClick() dispatches the specific action for each button.

View source code of MainActivity.java in a separate browser tab.

GameActivity (extends Activity)

One of the main Java classes in SweetSpotsLib is GameActivity. It overrides a few standard lifecycle methods including onCreate(), onPause() and onResume(). GameActivity also implements a number of OnClickListeners for operational buttons such as Save, Restore and Confirm. The Save and Restore buttons temporarily save the current game state so it can be restored, say, after trying a few tentative moves. Clicking the Confirm button initiates validation of the game rules.

View source code of GameActivity.java in a separate browser tab.

GameView (extends View)

The other main Java class in SweetSpotsLib is GameView, which defines and maintains the view of the game board in accordance with the activities. Much of the game logic is defined within standard View callbacks, including onDraw(), onMeasure(), onSizeChanged(), onTouchEvent(), onSaveInstanceState() and onRestoreInstanceState().

GameView also contains the interface ICellListener with the abstract method onCellSelected(), which is implemented in GameActivity. The method currently does nothing there, but control logic could be added to it if desired.

View source code of GameView.java in a separate browser tab.

Resource files

Images and layouts (portrait/landscape) are stored under the res/ subdirectory. Much of the key parametric data (e.g. board size) is also stored there, in res/values/strings.xml. Since this was primarily a programming exercise on a mobile platform, visual design/UI wasn’t given much effort. Images used in the board game were assembled using GIMP from public-domain sources.

Complete source code for the Android board game is at: https://github.com/oel/sweetspots

How were the games created?

These games were created using a separate Java application that, for each game, automatically generates random zones on the board and searches for a solution via trial and error. I’ll talk about that application in a separate blog post when I find time. One caveat about automatic game creation is that the solution is generally not unique. A game with a unique solution would allow some interesting game-playing logic to be more effective in solving it. One way to create a unique solution would be to manually reshape the zones of a generated game. A rough sketch of the trial-and-error step follows.
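
Again in JavaScript for brevity (the actual generator is a Java application), a brute-force solver can place one chest per row and backtrack on conflicts; counting all solutions instead of returning the first would also reveal whether a game is unique:

    // Illustrative backtracking solver: one chest per row, rules 2-4
    // checked incrementally; returns a solution or null
    function solve(zones, chests, row) {
      var n = zones.length;
      if (row === n) return chests.slice(); // all rows filled: a solution
      for (var col = 0; col < n; col++) {
        if (compatible(zones, chests, row, col)) {
          chests.push({ row: row, col: col });
          var found = solve(zones, chests, row + 1);
          if (found) return found;
          chests.pop(); // backtrack
        }
      }
      return null;
    }

    function compatible(zones, chests, row, col) {
      for (var i = 0; i < chests.length; i++) {
        var c = chests[i];
        if (c.col === col) return false;                           // rule 2
        if (zones[c.row][c.col] === zones[row][col]) return false; // rule 3
        if (Math.abs(c.row - row) <= 1 &&
            Math.abs(c.col - col) <= 1) return false;              // rule 4
      }
      return true;
    }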

Adopting Node.js In The Core Tech Stack

At DwellAware, a startup company I was with recently, I was tasked with building a web-centric application with a backend for comprehensive data analytics in the residential real estate space. Nevertheless, this post is not about the startup venture. It’s about Node.js, the technology stack chosen to power the application. Programming platforms considered at the beginning of the venture included Scala/Play, PHP/Laravel, Python/Twisted, Ruby/Sinatra and JavaScript/Node.js.

Nor is it a blog post comparing programming platforms. I’m going to simply state that Node.js was picked mainly for a few reasons:

  1. its lean-and-mean minimalist design principle is in line with how I would like to run things in general,
  2. its event-driven, non-blocking-I/O architecture is well suited for contemporary high-concurrency web-centric applications, and,
  3. it keeps the entire web application on a single programming language, since contemporary client-side features are heavily and ubiquitously implemented in JavaScript anyway.

Is adopting Node.js a justifiable risk?

In fact, that was the original title of this blog post. I was going to write up the research behind adopting Node.js as the tech stack for the core web application back in 2013. The write-up never grew to more than a few bullet points and was soon buried deep down the priority to-do list.

JavaScript has been used on the client side of web applications for a long time. Handling non-blocking events triggered by human activities in a web browser is one thing; dealing with split-second server events and I/O activities on the server side in a non-blocking fashion is a little different. Still, Node.js’s underlying event-driven, non-blocking architecture does help flatten the learning curve for JavaScript developers.

Although new Node modules emerged almost daily trying to address just about anything in any problem space one could think of, not many of them proved to be very useful, let alone production-grade. That was two years ago. Admittedly, a lot has changed over the past couple of years, and Node has definitely become more mature every day. By most standards, though, Node.js is still a relatively young technology.

Anyway, let’s rewind back to Fall 2013.

Built on Google’s V8 JavaScript engine, Node is a JavaScript-based server platform designed to efficiently run I/O-intensive server applications. For a long time, JavaScript was considered a client-side-only technology; Node.js has made it a serious contender on the server side. The fact that prominent software companies such as Microsoft, eBay and LinkedIn adopted Node.js in some of their products/services was more or less a testimonial. While hype about seemingly arbitrary technologies has always been a phenomenon in Silicon Valley, I wouldn’t characterize the recent rise of JavaScript and Node as mere hype.

Node.js modules

Node by itself is just a barebones server, hence picking suitable modules was one of the upfront tasks. A core module that became an essential part of Node’s middleware landscape is Connect, which provides chaining of middleware functions on top of Node’s http module. ExpressJS further equips Node with rich web app features on top of Connect. To take advantage of multi-core/processor server configurations, Node offers the method child_process.fork() for spawning worker processes that can communicate with their parent via built-in IPC (inter-process communication).
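
As a minimal sketch of the fork/IPC mechanism (worker.js is a hypothetical module name), the parent forks one worker per CPU core and exchanges messages with it:

    // master.js - fork one worker per CPU core and talk to it over IPC
    var fork = require('child_process').fork;
    var numCPUs = require('os').cpus().length;

    for (var i = 0; i < numCPUs; i++) {
      var worker = fork(__dirname + '/worker.js');
      worker.on('message', function (msg) {
        console.log('from worker:', msg);
      });
      worker.send({ cmd: 'start', id: i });
    }

    // worker.js - receive commands from the parent and reply via IPC
    process.on('message', function (msg) {
      process.send({ id: msg.id, status: 'ready' });
    });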

For the build tool, we started out with Grunt and later shifted to Gulp, partly for speed thanks to Gulp’s streaming approach, though we were happy with Grunt as well. Express uses Jade as its default templating engine. We didn’t like its performance, so we evaluated a couple of alternative templating engines, including doT.js and Swig, and were shocked to see performance gains of an order of magnitude. We promptly switched to Swig (with doT.js a close second).
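
For reference, hooking Swig into Express took only a few lines; here’s a sketch based on the Swig 1.x API of the era (paths and names illustrative):

    var express = require('express');
    var swig = require('swig');
    var app = express();

    app.engine('html', swig.renderFile); // register Swig as the view engine
    app.set('view engine', 'html');
    app.set('views', __dirname + '/views');

    app.get('/', function (req, res) {
      res.render('index', { title: 'DwellAware' }); // renders views/index.html
    });

    app.listen(3000);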

For the test framework, we used Mocha.js with the assertion library Chai.js, which supports the BDD (behavior-driven development) assertion style.
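
A typical spec in the BDD style reads almost like prose; here’s a self-contained, hypothetical example runnable under Mocha:

    // test/badge.spec.js - Mocha with Chai's BDD "expect" style
    var expect = require('chai').expect;

    // Hypothetical function under test
    function scoreToBadge(score) {
      return score >= 90 ? 'Safe Neighborhood' : null;
    }

    describe('scoreToBadge', function () {
      it('awards a badge when the score clears the threshold', function () {
        expect(scoreToBadge(92)).to.equal('Safe Neighborhood');
      });
      it('returns null for low scores', function () {
        expect(scoreToBadge(40)).to.be.null;
      });
    });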

Data persistence, caching, content delivery, etc

A key part of our product offering is data intelligence, so databases for both OLTP and warehousing were critical components of the technology stack. MongoDB has been a default database choice for many Node.js applications for good reasons; the emerging MEAN (MongoDB-ExpressJS-AngularJS-Node.js) stack hints at the popularity of the Node-MongoDB combo. So Mongo was definitely considered. After careful deliberation, though, we decided to go with MySQL. One consideration was that it wouldn’t be too hard to hire a DBA/devops engineer with MySQL experience, given its popularity. Both Node.js and MongoDB were relatively new products and we didn’t have in-house MongoDB expertise at the time, so taming one beast (Node in this case) at a time was the preferred route.

There weren’t many Node-MySQL modules out there, though we managed to adopt a simple MySQL module that also provided basic connection pooling. Later on, owing to the superior geospatial functionality of PostGIS in the PostgreSQL ecosystem, we migrated from MySQL to PostgreSQL. Thanks to the vast Node.js module repository, Node-PostgreSQL modules with connection pooling were readily available. To cache frequently referenced application data, we used Redis as a centralized cache store.
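
The resulting pattern was a classic cache-aside lookup. Here’s a hedged sketch in the callback style of the era’s node-postgres and node-redis modules (connection string, table and key names are all hypothetical):

    // Check Redis first, fall back to PostgreSQL, then populate the cache
    var pg = require('pg');
    var redis = require('redis').createClient();
    var conString = 'postgres://user:pass@localhost/dwellaware'; // placeholder

    function getProperty(id, callback) {
      redis.get('property:' + id, function (err, cached) {
        if (cached) return callback(null, JSON.parse(cached));
        pg.connect(conString, function (err, client, done) {
          if (err) return callback(err);
          client.query('SELECT * FROM properties WHERE id = $1', [id],
            function (err, result) {
              done(); // return the connection to the pool
              if (err) return callback(err);
              var row = result.rows[0];
              redis.setex('property:' + id, 300, JSON.stringify(row)); // 5-min TTL
              callback(null, row);
            });
        });
      });
    }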

Besides dynamic content rendered by the application, our web presence also involved a lot of static content of various types, including images and certain client-side application data. To serve static web content, we reviewed a few typical approaches, including a proxy web server and a content delivery network (CDN). On the proxy server front, Nginx has been on the rise, overtaking Apache to become the most popular HTTP server, and its minimalist design is reminiscent of Node’s. We load-tested static content on Node, which turned out to be a rather efficient static content server, and decided a proxy server wasn’t necessary, at least in the immediate term. For the CDN, we used Amazon’s CloudFront.
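
Part of what made the no-proxy call easy is that serving static assets straight from Node is a one-liner with Express’s bundled middleware (directory name illustrative):

    // Serve everything under ./public (images, css, client js) directly
    var express = require('express');
    var app = express();

    app.use(express.static(__dirname + '/public'));
    app.listen(8080);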

Score calculation & image processing

Part of the core value proposition of the product was to come up with objective scores for individual residential real estate properties and neighborhoods, so as to help users make intelligent choices in buying/selling their homes. As described in a previous blog post, a lot of data science work in a wide spectrum of areas (cost analysis, crime, schools, comfort, noise, etc) went into generating the scores.

Based on the computed scores, we then derived badges for qualified real estate properties in different areas (e.g. “Low Energy Bills”, “Safe Neighborhood”, “Top Rated Elementary School”). The badges were embedded in selected photos of individual real estate properties, which could then be fed back into the listings distribution cycle by resubmitting them to the associated MLSes, if the real estate agents/brokers chose to.

All the necessary score calculation and image processing for badges was done in the backend on a Python platform with PostgreSQL databases. Python Tornado servers, with basic caching, served as the data service API from which Node.js consumed data for presentation.

Here’s a screenshot of the Dwelling Page for a given real estate property, showing its DwellScore:

DwellAware DwellScore

Geospatial maps & search

For geographical maps and search, the Google Maps API was used extensively from within Node.js. We geocoded all real estate property addresses in advance using the API, as part of the backend data processing routine, so as to take advantage of Google’s superior search capability.
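
For illustration, a geocoding call boils down to a REST request against the Google Maps Geocoding endpoint (API key assumed; the sample address is arbitrary):

    // Geocode an address via the Google Maps Geocoding REST endpoint
    var https = require('https');

    function geocode(address, callback) {
      var url = 'https://maps.googleapis.com/maps/api/geocode/json' +
                '?address=' + encodeURIComponent(address) +
                '&key=' + process.env.GOOGLE_MAPS_API_KEY;
      https.get(url, function (res) {
        var body = '';
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () {
          var results = JSON.parse(body).results;
          if (!results.length) return callback(new Error('no match'));
          callback(null, results[0].geometry.location); // { lat, lng }
        });
      }).on('error', callback);
    }

    geocode('600 Harrison St, San Francisco, CA', function (err, loc) {
      if (!err) console.log(loc.lat, loc.lng);
    });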

To supplement the already robust Google Maps search from within Node.js and better utilize our own geospatial data content, we experimented with an Elasticsearch module, which comes with an n-gram lexical analyzer for fuzzy-match search. The test results were promising. An advantage of using such an autonomous search system is that it doesn’t directly tax the Node.js server or the PostgreSQL database (e.g. pg_trgm) as traffic load increases.
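
The gist of the n-gram approach is an index analyzer that chops text into short grams so partial or misspelled queries can still match. Here’s a hedged sketch of the kind of index settings we experimented with (ES 1.x-era mapping; all names and gram sizes illustrative):

    // Index settings: an ngram analyzer for fuzzy, partial-match address search
    var indexSettings = {
      settings: {
        analysis: {
          filter: {
            address_ngram: { type: 'ngram', min_gram: 2, max_gram: 4 }
          },
          analyzer: {
            address_analyzer: {
              type: 'custom',
              tokenizer: 'standard',
              filter: ['lowercase', 'address_ngram']
            }
          }
        }
      },
      mappings: {
        property: {
          properties: {
            address: { type: 'string', analyzer: 'address_analyzer' }
          }
        }
      }
    };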

Below is a screenshot of the Search Page centered on San Diego:

DwellAware Search

Fast-forward to the present

As mentioned earlier, Node.js has evolved quite a bit over the past couple of years — the rather significant feature/performance improvements from v0.10 to v0.12, the next LTS (long-term support) release incorporating the latest V8 JavaScript engine and ES6 ECMA features, the fork-off to io.js which later merged back into Node, …, all sound promising and exciting.

In conclusion, given the evident progress of Node’s development, I’d say it’s now hardly a risk to adopt Node for building general web-centric applications, provided that your engineering team possesses sufficiently strong JavaScript skills. It wasn’t a difficult decision for me two years ago to pick Node as the core technology stack, and it would be an even easier one today.

For more screenshots of the website, click here.

Yet Another Startup Venture

It has been a while since I published my last blog post. Over the past couple of years, I was busy working with a small team of entrepreneurs on a startup, DwellAware, in the residential real estate space. What we set out to build was a contemporary web application offering objective ratings for individual real estate properties, derived from a wide spectrum of data sources.

Throughout the course of the startup venture, we maintained a skeletal staff including the no-fear CEO, the product czar, a UX designer, a couple of web app/backend engineers, a data scientist, and the engineering head (myself). The office was located in the SoMa district of San Francisco. Competing for top talent in the SF Bay Area is always a challenge, but we were thrilled to have had some of the best talent forming the foundation team.

MVP and product-market fit

Our initial focus was to build a minimum viable product (MVP) and go through rapid iterations to achieve product-market fit. To maximize the velocity of our MVP iterations, we started out with a selected region, San Diego County. We listened to user feedback regularly and iterated continuously in accordance with it. The feedback was diligently acquired through interviews with people in local coffee shops, online usability testing, as well as website activity analytics.

Eventually we arrived at a refined release and started to expand geographically, from a single county to all of California. Waiting in the processing queue, ready to be deployed, were a number of states including Florida, Texas, New York and Illinois. The goal was to cover the 120+ million properties nationwide. We scaled the technology infrastructure as we expanded geographically and had all the key technology components in place.

Sadly, we couldn’t quite make it to the finish line and had to wind down the operation. In hindsight, perhaps there were mistakes made at both the strategic and tactical levels that led to the disappointing result and would warrant some hard analysis. But that isn’t what this blog post is about. For now, I would simply like to share some of the technological considerations and decisions made during the course of the venture.

DwellScore and HoodScore

A significant portion of the engineering work lay within the data science domain. In order to create an objective scoring system for individual properties across the nation that factors in hidden-cost analysis (e.g. commute, maintenance), we exhausted various data sources, from public census databases and open-source projects to commercial data providers, to establish a comprehensive data warehouse.

To help real estate agents/brokers promote their listings, we derived badges (e.g. “Safe Neighborhood”, “Low Traffic Street”, “Top Rated High School”) and blended them into listing photos for qualified real estate properties in accordance with the calculated scores. The agents/brokers were free to circulate selected badged photos by resubmitting them to the associated MLSes for ongoing distribution.

One of the challenges was to validate and consolidate incomplete, and sometimes inaccurate, data from various sources that were oftentimes incompatible with one another. Even data acquired under expensive license terms was often found to be erroneous or incomplete. We got to the point where we were going to define our own nationwide neighborhood dataset in the next upgrade.

Nevertheless, we were able to come up with our first-generation scores for individual properties (DwellScore) and neighborhoods (HoodScore), backed by some extensive data science work that aggregates sub-scores in areas of cost analysis, crime rate, school districts, neighborhood lifestyle and economics. Among the sub-scores was a comfort score with a number of unique ingredients, including noise. To come up with the noise ranking alone, we had to comb through data and heat maps related to aircraft, railroad and road traffic, all from different sources.

The fact that a number of technology partners were interested in acquiring our data science work at the end of the venture does speak to its quality and comprehensiveness.

NLP & computer vision

Real estate listings have long been known for their lack of completeness and accuracy. There are hundreds of MLSes administered using disparate data management systems and possibly over a million real estate brokers/agents in the nation. As a result, listings data not only needs to be up-to-date, but should also be systematically validated in order to be trustworthy.

We experimented with using NLP (natural language processing) to help validate listings data by extracting and interpreting data of interest from the latest free-form text entered by agents. In addition, we worked with a computer vision company to process massive volumes of listing images via pattern recognition and machine learning. Certain characteristics of individual property listings, such as curb appeal, the ratio of actual living area to lot size, and the existence of power lines, could be identified through computer vision.

Technology stack

We adopted Node.js as the core tech stack for our web-centric application. Python was used as the backend/data-mining platform for data processing tasks such as importing real estate listings from MLSes, as well as for data science number crunching. In addition, we developed data service APIs for internal consumption using Tornado servers, shielding Node.js from having to handle data processing routines.

MySQL was initially chosen as the database management system for OLTP data storage and data warehousing. Python has a rich set of libraries for geospatial/GIS (geographic information system) work, which constituted a significant portion of our core development; on the database front, however, it didn’t take long for us to hit the limits of the geospatial capability offered by MySQL’s latest stable release. PostgreSQL equipped with PostGIS has apparently been the de facto database choice for most geospatialists in recent years. Understanding that a database transition was going to cost us non-trivial effort, it was one of those uncompromisable actions we had to take. Switching the database platform was made easier by SQLAlchemy providing the ORM (object-relational mapping) abstraction layer on Python.

Geospatial search

The Google Maps API has great features for maps, street views and geocoded address search, but there were still cases where a separate custom search solution could complement the search functionality. PostgreSQL has a trigram module (pg_trgm) which maintains trigram-based indexes over text columns for similarity search. That helps add some crude natural-language capability to the search functionality, useful for more user-friendly geographical search (e.g. for property addresses).
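
For illustration, with the pg_trgm extension enabled and a trigram GIN index on the address column, a fuzzy match is a plain SQL query; here’s a hedged sketch issued from node-postgres (connection string and table name hypothetical):

    // Fuzzy address lookup via pg_trgm: '%' is the similarity operator
    // (requires CREATE EXTENSION pg_trgm and a gin_trgm_ops index)
    var pg = require('pg');
    var conString = 'postgres://user:pass@localhost/dwellaware'; // placeholder

    var sql = "SELECT address, similarity(address, $1) AS score " +
              "  FROM properties " +
              " WHERE address % $1 " +
              " ORDER BY score DESC LIMIT 10";

    pg.connect(conString, function (err, client, done) {
      if (err) throw err;
      client.query(sql, ['123 Main St, San Diego'], function (err, result) {
        done(); // return the pooled connection
        if (err) throw err;
        console.log(result.rows); // best fuzzy matches first
      });
    });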

While Postgres’ trigram module is a viable tool, it directly taxes the database and could impact performance as the database volume continues to grow. To scale search independently from the database, we picked Elasticsearch. Elasticsearch comes with a comprehensive set of functions for robust text search (partial match, fuzzy match, human language, synonym support, etc) via its underlying n-gram lexical analyzer. In addition, it has basic functions for geolocation, supporting complex shapes in GeoJSON format. In brief, Elasticsearch appeared to fit our search requirements well.
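
For example, a text match can be combined with a geo_distance filter to restrict hits to properties within a radius of a map point; a sketch in the ES 1.x “filtered” query style of the era (index and field names hypothetical):

    // Query body: fuzzy text match constrained to 5km around downtown San Diego
    var body = {
      query: {
        filtered: {
          query: { match: { address: 'main st san diego' } },
          filter: {
            geo_distance: {
              distance: '5km',
              location: { lat: 32.7157, lon: -117.1611 }
            }
          }
        }
      }
    };
    // e.g. POST /properties/_search with the body above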

Cloud computing platform

We picked Amazon AWS as our hosting and cloud computing platform, so using its CloudFront as the CDN was a logical step. Other readily available AWS services also offered useful tools in various areas. On the operations front, Route 53 is a DNS service with some competitive edges over other existing services; for instance, it supports CNAME-like alias records for the base (apex) domain name, which many big-name DNS hosting services don’t. Amazon’s Elastic Load Balancer (ELB) also makes load-balancing setup easy and allows centralized digital certificate setup. With a wildcard digital certificate for a base domain name and a security policy that permits terminating SSL/TLS at the load balancer, a secure website setup could be made really simple.

Security-wise, AWS now offers a rather high degree of flexibility for role-based security policies and security group setup. On the database side, Amazon’s RDS provides a data persistence solution that shields one from having to build and maintain individual relational database servers. When evaluating AWS in a prior startup venture, I had a lot of reservations about its readiness to provide production-grade infrastructure. I must say that it has improved a great deal since.

A fun run

Although the venture lasted just slightly over two years, it was a fun run. We fostered a culture of transparency and best-ideas-win. We also embraced risk-taking and fast learning on many fronts, including adopting and picking up bleeding-edge technologies not entirely within our comfort zone. Below are a couple of pictures taken on the day the first production application was launched, back in the summer of 2014:

The crowd in the engineering room

Launching the first web site