Big Data SaaS For HAN Devices

At one of the startups I co-founded in recent years, I was responsible for building a SaaS (Software-as-a-Service) platform to manage a rapidly growing set of HAN (Home Area Network) devices across multiple geographies. It’s an interesting product belonging to the world of IoT (Internet of Things), a buzzword that wasn’t popular at all back in 2007. Building such a product required a lot of adventurous exploration and R&D effort from me and my team, especially back then when SaaS and HAN were generally perceived as two completely segregated worlds. The company is EcoFactor and is in the energy/cleantech space.

Our goal was to optimize residential home energy, particularly in the largely overlooked HVAC (heating, ventilation, and air conditioning) area. We were after the consumer market and chose to leverage channel partners in various verticals including HVAC service companies, broadband service providers, energy retailers, to reach mass customers. Main focus was twofold: energy efficiency and energy load shaping. Energy efficiency is all about saving energy while not significantly compromising comfort, and energy load shaping primarily targets utility companies who have vast interest in reducing spikes in energy load during usage peak-time.

Home energy efficiency

Energy efficiency implementation requires intelligence derived from mass real-world data and delivered by optimization algorithms. Proper execution of such optimization isn’t trivial. It involves deep learning of HVAC usage pattern in a given home, analysis of the building envelope (i.e. how well-insulated the building is), the users’ thermostat control activities, etc. All that information is deduced from the raw thermal data spit out by the thermostats, without needing to ask the users a single question. And execution is in the form of programmatic refinement through learning over time as well as interactive adjustment in accordance with feedback from ad-hoc activities.

Obviously, local weather condition and forecast information is another crucial input data source for executing the energy efficiency strategy. Besides temperature, other information such as solar/radiation condition and humidity are also important parameters. There are quite a lot of commercial weather datafeed services available for subscription, though one can also acquire raw data for U.S. directly from NCDC (National Climatic Data Center).

Energy load shaping

Many utilities offer demand response programs, often with incentives, to curtail energy consumption during usage peak-time (e.g. late afternoon on a hot Summer day). Load reduction in a conventional demand response program inevitably causes discomfort experienced by the home occupants, leading to high opt-out rate that beats the very purpose of the program. Since the “thermal signature” of individual homes is readily available from the vast thermal data being collected around the clock, it didn’t take too much effort to come up with suitable load shaping strategy, including pre-conditioning, for each home to take care of the comfort factor while conducting a demand response program. Utility companies loved the result.

HAN devices

The product functionality described so far seems to suggest that: a) some nonexistent complicated device communications protocol is needed, and, b) in-house hardware/firmware engineering effort is needed. Fortunately, there were already some WPAN (Wireless Personal Area Network) protocols, such as ZigBee/IEEE 802.15.4, Z-Wave, 6loWPAN (and other wireless protocols such as WiFi/IEEE 802.11.x), although implementations were still in experimentation at the time I started researching into that space.

We wanted to stay in the software space (more specifically, SaaS) and focus more on delivering business intelligence out of the collected data, hence we would do everything we could to keep our product hardware- and protocol-agnostic. Instead of trying to delve into the hardware engineering world ourselves, we sought and adopted strategic partnership with suitable hardware vendors and worked collaboratively with them to build quality hardware to match our functionality requirement.

Back in 2007, the WPAN-based devices available on the market were too immature to be used even for field trials, so we started out with some IP-based thermostats each of which equipped with a stripped-down HTTP server. Along with the manufacturer’s REST-based device access service, we had our first-generation 2-way communicating thermostats for proof-of-concept work. Field trials were conducted simultaneously in both Texas and Australia so as to capture both Summer and Winter data at the same time. The trials were a success. In particular, the trial result answered the few key hypotheses that were the backbone of our value proposition.

WPAN vs WiFi

To prepare ourselves for large-scale deployment, a low-cost barebone programmable thermostat one can find in a local hardware store like Home Depot is what we were going after as the base hardware. The remaining requirement would be to equip it with a low-cost chip that can communicate over some industry-standard protocol. An IP-based thermostat requiring running ethernet cable inside a house is out of question for both deployment cost and cosmetic reasons which we learned a great deal from our field trials. In essence, we only considered thermostats communicating over wireless protocols such as WPAN or WiFi.

Next, WPAN won over WiFi because of the relatively less work required for messing with the broadband network in individual homes and the low-power specs that works better for battery-powered thermostats. Finally, ZigBee became our choice for the first mass deployment because of its relatively robust application profiles tailored for energy efficiency and home automation. Another reason is that it was going to be the protocol SmartMeters would use, and communicating with SmartMeters for energy consumption information was in our product roadmap.

ZigBee forms a low-power wireless mesh network in which nodes relay communications. At 250 kbit/s, it isn’t a high-speed protocol and can operate in the 2.4GHz frequency band. It’s built on top of IEEE 802.15.4 and is equipped with industry-standard public-key cryptography security. Within a ZigBee network, a ZigBee gateway device typically serves as the network coordinator device, responsible for enforcing the security policy in the ZigBee network and enrollment of joining devices. It connects via ethernet cable or WiFi to a broadband router on one end and communicates wirelessly with the ZigBee devices in the home. The gateway device in essence is the conduit to the associated HAN devices. Broadband internet connectivity is how these HAN devices communicate with our SaaS platform in the cloud. This means that we only target homes with broadband internet service.

The SaaS platform

Our very first SaaS prototype system prior to VC funding was built on a LAMP platform using first-generation algorithms co-developed by a small group of physicists from academia. We later rebuilt the production version on the Java platform using a suite of open-source application servers and frameworks supplemented with algorithms written in Python. Heavy R&D of optimization strategy and machine learning algorithms were being performed by a dedicated taskforce and integrated into the “brain” of the SaaS platform. A suite of selected open-source software including application servers and frameworks were adopted along with tools for development, build automation, source control, integration and QA.

Relational databases were set up initially to persist acquired data from the HAN devices in homes across the nation (and beyond). The data acquisition architecture was later revamped to use HBase as a fast data dumping persistent store to accommodate the rapidly growing around-the-clock data stream. Only selected data sets were funneled to the relational databases for application logics requiring more complex CRUD (create, read, update and delete) operations. Demanding Big Data mining, aggregation and analytics tasks were performed on Hadoop/HDFS clusters.

Under the software-focused principle, our SaaS applications do not directly handle low-level communications with the gateway and thermostat devices. The selected gateway vendor provides its PaaS (Platform-as-a-Service) which takes care of M2M (machine to machine) hardware communications and exposes a set of APIs for basic device operations. The platform also maintains bidirectional communications with the gateway devices by means of periodic phone-home from devices and UDP source port keep-alive (a.k.a. hole-punching, for inbound communications through the firewall in a typical broadband router). Such separation of work allows us to focus on the high-level application logics and business intelligence. It also allows us to more easily extend our service to multiple hardware vendors.


Obviously I can’t get into any specifics of the algorithms which represents collective intellectual work developed and scrutinized by domain experts since the very beginning of the startup venture. It suffices to say that they constitute the brain of the SaaS application. Besides information garnered from historical data, the execution also takes into account of interactive feedback from the users (e.g. ad-hoc manual overrides of temperature settings on the thermostat via the up/down buttons or a mobile app for thermostat control) and modifies existing optimization strategy accordingly.

Lots of modeling and in-depth learning of real-world data were performed in the areas of thermal energy exchange in a building, HVAC run-time, thermostat temperature cycles, etc. A team of quants with strong background in Physics and numerical analysis were assembled to just focus on the relevant work. Besides custom optimization algorithms, machine learning algorithms including clustering analysis (e.g. k-Means Clustering) were employed for various kinds of tasks such as fault detection.

A good portion of the algorithmic programming work was done on the Python platform primarily for its abundance of contemporary Math libraries (SciPy, NumPy, etc). Other useful tools include R for programmatic statistical analysis and Matlab/Octave for modeling. For good reasons, the quant team is the group demanding the most computing resource from the Hadoop platform. And Hadoop’s streaming API makes it possible to maintain a hybrid of Java and Python applications. A Hadoop/HDFS cluster was used to accommodate all the massive data aggregation operations. On the other hand, a relational database with its schema optimized for the quant programs was built to handle real-time data processing, while a long-term solution using HBase was being developed.

Putting everything together

Although elastic cloud service such as Amazon’s EC2 has been hot and great for marketing, our around-the-clock data acquisition model consists of a predictable volume and steady stream rate. So the cloud’s elasticity wouldn’t benefit us much, but it’s useful for development work and benchmarking.

Another factor is security, which is one of the most critical requirements in operating an energy management business. A malicious attack that simultaneously switches on 100,000 A/Cs in a metropolitan region on a hot Summer day could easily bring down the entire grid. Cloud computing service tends to come with less flexible security measure, whereas one can more easily implement enhanced security in a conventional hosting environment, and co-located hosting would offer the highest flexibility in that regard. Thus a decision was made.

That pretty much covers all the key ingredients of this interesting product that brings together the disparate SaaS and HAN worlds at a Big Data scale. All in all, it was a rather ambitious endeavor on both the business and technology fronts, certainly not without challenges – which I’ll talk about perhaps in a separate post some other time.

2 thoughts on “Big Data SaaS For HAN Devices

  1. Pingback: Challenges Of Big Data + SaaS + HAN | Genuine Blog

  2. Pingback: Internet-of-Things And Akka Actors | Genuine Blog

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">

Current month ye@r day *