Text Mining With Akka Streams

Reactive Systems, whose core characteristics are declared in the Reactive Manifesto, have started to emerge in recent years as message-driven systems that emphasize scalability, responsiveness and resilience. It’s pretty clear from the requirements that a system can’t be simply made Reactive. Rather, it should be built from the architectural level to be Reactive.

Akka’s actor systems, which rely on asynchronous message-passing among lightweight loosely-coupled actors, serve a great run-time platform for building Reactive Systems on the JVM (Java Virtual Machine). I posted a few blogs along with sample code about Akka actors in the past. This time I’m going to talk about something different but closely related.

Reactive Streams

While bearing a similar name, Reactive Streams is a separate initiative that mandates its implementations to be capable of processing stream data asynchronously and at the same time automatically regulating the stream flows in a non-blocking fashion.

Akka Streams, built on top of Akka actor systems, is an implementation of Reactive Streams. Equipped with the back-pressure functionality, it eliminates the need of manually buffering stream flows or custom-building stream buffering mechanism to avoid buffer overflow problems.

Extracting n-grams from text

In text mining, n-grams are useful data in the area of NLP (natural language processing). In this blog post, I’ll illustrate extracting n-grams from a stream of text messages using Akka Streams with Scala as the programming language.

First thing first, let’s create an object with methods for generating random text content:

Source code: TextMessage.scala

Some minimal effort has been made to generate random clauses of likely pronounceable fake words along with punctuations. To make it a little more flexible, lengths of individual words and clauses would be supplied as parameters.

Next, create another object with text processing methods responsible for extracting n-grams from input text, with n being an input parameter. Using Scala’s sliding(size, step) iterator method with size n and step default to 1, a new iterator of sliding window view is generated to produce the wanted n-grams.

Source code: TextProcessor.scala

Now that the text processing tools are in place, we can focus on building the main streaming application in which Akka Streams plays the key role.

First, make sure we have the necessary library dependencies included in build.sbt:

Source code: build.sbt

As Akka Streams is relatively new development work, more recent Akka versions (2.4.9 or higher) should be used.

Let’s start with a simple stream for this text mining application:

Source code: NgramStream_v01.scala

As shown in the source code, constructing a simple stream like this is just defining and chaining together the text-generating source, the text-processing flow and the text-display sink as follows:

Graph DSL

Akka Streams provides a Graph DSL (domain-specific language) that helps build the topology of stream flows using predefined fan-in/fan-out functions.

What Graph DSL does is somewhat similar to how Apache Storm‘s TopologyBuilder pieces together its spouts (i.e. stream sources), bolts (i.e. stream processors) and stream grouping/partitioning functions, as illustrated in a previous blog about HBase streaming.


Now, let’s branch off the stream using Graph DSL to illustrate how the integral back-pressure feature is at play.

Source code: NgramStream_v02.scala

Streaming to a file should be significantly slower than streaming to the console. To make the difference more noticeable, a delay is deliberately added to streaming each line of text in the file sink.

Running the application and you will notice that the console display is slowed down. It’s the result of the upstream data flow being regulated to accommodate the relatively slow file I/O outlet even though the other console outlet is able to consume relatively faster – all that being conducted in a non-blocking fashion.

Graph DSL create() methods

To build a streaming topology using Graph DSL, you’ll need to use one of the create() methods defined within trait GraphApply, which is extended by object GraphDSL. Here are the signatures of the create() methods:

Note that the sbt-boilerplate template language is needed to interpret the create() method being used in the application that takes multiple stream components as input parameters.

Materialized values

In Akka Streams, materializing a constructed stream is the step of actually running the stream with the necessary resources. To run the stream, the implicitly passed factory method ActorMaterializer() is required to allocate the resources for stream execution. That includes starting up the underlying Akka actors to process the stream.

Every processing stage of the stream can produce a materialized value. By default, using the via(flow) and to(sink) functions, the materialized value of the left-most stage will be preserved. As in the following example, for graph1, the materialized value of the source is preserved:

To allow one to selectively capture the materialized values of the specific stream components, Akka Streams provides functions viaMat(flow) and toMat(sink) along with a combiner function, Keep. As shown in the above example, for graph2, the materialized value of the flow is preserved, whereas for graph3, materialized values for both the flow and sink are preserved.

Back to our fileSink function as listed below, toMat(fileIOSink)(Keep.right) instructs Akka Streams to keep the materialized value of the fileIOSink as a Future value of type IOResult:

Using Graph DSL, as seen earlier in the signature of the create() method, one can select what materialized value is to be preserved by specifying the associated stream components accordingly as the curried parameters:

In our case, we want the materialized value of fileSink, thus the curried parameters should look like this:

Defining the stream graph

Akka Streams provides a number of functions for fan-out (e.g. Broadcast, Balance) and fan-in (e.g. Merge, Concat). In our example, we want a simple topology with a single text source and the same n-gram generator flow branching off to two sinks in parallel:

Adding a message counter

Let’s further expand our n-gram extraction application to include displaying a count. A simple count-flow is created to map each message string into numeric 1, and a count-sink to sum up all these 1′s streamed to the sink. Adding them as the third flow and sink to the existing stream topology yields something similar to the following:

Source code: NgramStream_v03.scala

Full source code of the application is at GitHub.

Final thoughts

Having used Apache Storm, I see it a rather different beast compared with Akka Streams. A full comparison between the two would obviously be an extensive exercise by itself, but it suffices to say that both are great platforms for streaming applications.

Perhaps one of the biggest differences between the two is that Storm provides granular message delivery options (at most / at least / exactly once, guaranteed message delivery) whereas Akka Streams by design questions the premise of reliable messaging on distributed systems. For instance, if guaranteed message delivery is a requirement, Akka Streams would probably not be the best choice.

Back-pressure has recently been added to Storm’s v.1.0.x built-in feature list, so there is indeed some flavor of reactiveness in it. Aside from message delivery options, choosing between the two technologies might be a decision basing more on other factors such as engineering staff’s expertise, concurrency model preference, etc.

Outside of the turf of typical streaming systems, Akka Streams also plays a key role as the underlying platform for an emerging service stack. Viewed as the next-generation of Spray.io, Akka HTTP is built on top of Akka Streams. Designed for building HTTP-based integration layers, Akka HTTP provides versatile streaming-oriented HTTP routing and request/response transformation mechanism. Under the hood, Akka Streams’ back-pressure functionality regulates data streaming between the server and the remote client, consequently conserving memory utilization on the server.

PostgreSQL Table Partitioning

With the ever growing demand for data science work in recent years, PostgreSQL has gained superb popularity especially in areas where extensive geospatial/GIS (geographic information system) functionality is needed. In a previous startup venture, MySQL was initially adopted and I went through the trouble of migrating to PostgreSQL mainly because of the sophisticated geospatial features PostGIS offers.

PostgreSQL offers a lot of goodies, although it does have a few things that I wish were done differently. Most notable to me is that while its SELECT statement supports SQL-92 Standard’s JOIN syntax, its UPDATE statement would not. For instance, the following UPDATE statement would not work in PostgreSQL:

Partial indexing

Nevertheless, for general performance and scalability, PostgreSQL remains one of the top candidates with proven track record in the world of open source RDBMS. In scaling up a PostgreSQL database, there is a wide variety of approaches. Suitable indexing are probably some of the first strategies to be looked into. Aside from planning out proper column orders in indexes that are optimal for the frequently used queries, there are also a couple of indexing features that help scaling.

Partial indexing allows an index to be built over a subset of a table based on a conditional expression. For instance:

In the case of a table with large amount of rows, this feature could make an otherwise gigantic index much smaller, thus more efficient for queries against the selectively indexed data.

Scaling up with table partitioning

However, when a table grows to certain volume, say, beyond a couple of hundreds of million rows, and if periodically archiving off data from the table isn’t an option, it would still be a problem even with applicable indexing strategy. In many cases, it might be necessary to do something directly with the table structure and table partitioning is often a good solution.

There are a few approaches to partition a PostgreSQL table. Among them, partitioning by means of table inheritance is perhaps the most popular approach. A master table will be created as a template that defines the table structure. This master table will be empty whereas a number of child tables inherited from this master table will actually host the data.

The partitioning is based on a partition key which can be a column or a combination of columns. In some common use cases, the partition keys are often date-time related. For instance, a partition key could be defined in a table to partition all sales orders by months with constraint like the following:

order_date >= ’2016-12-01 00:00:00′ AND order_date < ’2017-01-01 00:00:00′

Other common cases include partitioning geographically, etc.

A table partitioning example

When I was with a real estate startup building an application that involves over 100 millions nationwide properties, each with multiple attributes of interest, table partitioning was employed to address the demanding data volume. Below is a simplified example of how the property sale transaction table was partitioned to maintain a billion rows of data.

First, create the master table which will serve as the template for the table structure.

Next, create child tables inheriting from the master table for the individual states. For simplicity, I only set up 24 states for performance evaluation.

Nothing magical so far, until a suitable trigger for propagating insert is put in place. The trigger essentially redirects insert requests against the master table to the corresponding child tables.

Let’s test inserting data into the partitioned tables via the trigger:

A Python program for data import

Now that the master table and its child tables are functionally in place, we’re going to populate them with large-scale data for testing. First, write a simple program using Python (or any other programming/scripting language) as follows to generate simulated data in a tab-delimited file for data import:

Run the Python program to generate up to 1 billion rows of property sale data. Given the rather huge output, make sure the generated file is on a storage device with plenty of space. Since it’s going to take some time to finish the task, it would better be run in the background, perhaps along with mail notification, like the following:

Next, load data from the generated infile into the partitioned tables using psql. In case there are indexes created for the partitioned tables, it would generally be much more efficient to first drop them and recreate them after loading the data, like in the following:

Query with Constraint Exclusion

Prior to querying the tables, make sure the query optimization parameter, constraint_exclusion, is enabled.

With constraint exclusion enabled, the query planner will be smart enough to examine query constraints to exclude scanning of those partitioned tables that don’t match the constraints. Unfortunately, though, if the constraints involve matching against non-constants like the NOW() function, the query planner won’t have enough information to filter out unwanted partitions hence won’t be able to take advantage of the optimization.

Final notes

With a suitable partitioning scheme applied to a big table, query performance can be improved by an order of magnitude. As illustrated in the above case, the entire partitioning scheme centers around the key column used for partitioning, hence it’s critical to properly plan out which key column (or combination of columns) to partition. Number of partitions should also be carefully thought out, as too few partitions might not help whereas too many partitions would create too much overhead.

Relational Database Redemption

Relational databases, such as PostgreSQL and Oracle, can be traced back to the 80′s when they became a dominant type of data management systems. Their prominence was further secured by the ANSI standardization of the domain specific language called SQL (Structured Query Language). Since then, RDBMS (relational database management system) has been the de facto component most data-centric applications would be architecturally centered around.

What happened to relational Databases?

It’s a little troubling, though, over the past 10-15 years, I’ve witnessed relational databases being sidelined from the core functionality requirement review or architectural design in many software engineering projects that involve data-centric applications. In particular, other kinds of databases would often be favored for no good reasons. And when relational databases are part of the core technology stack, thorough data model design would often be skipped and using of SQL would often be avoided in cases where it would be highly efficient.

So, why have relational databases been treated with such noticeably less preference or seriousness? I believe a couple of causes have led to the phenomenon.

Object-oriented data persistence architecture

First, there was a shift in application architecture in late 90′s when object-oriented programming began to increasingly dominate in the computing world. In particular, the backend data persistence component of object-oriented applications began to take over the heavy lifting of the database CRUD (create/read/update/delete) operations which used to reside within the database tier via SQL or procedural language PL/SQL.

Java EJB (enterprise Java bean), which was aimed to emulate data persistence and query functionality among other things, took the object-oriented programming world by storm. ORM (object-relational mapping) then further helped keep software engineers completely inside the Object world. Realizing that the initial EJB specifications were over-engineered, it later evolved into JPA (Java Persistence API) which also incorporates ORM functionality. All that doesn’t eliminate the need of relational databases, but engineering design focus has since been pulled away from the database tier and SQL has been treated as if it was irrelevant.

NoSQL databases

Then, in late 00′s came column-oriented NoSQL databases like HBase and Cassandra, which were designed to primarily handle large-scale datasets. Designed to run on scalable distributed computing platforms, these databases are great for handling Big Data at the scale that conventional relational databases would have a hard time to perform well.

Meanwhile, document-based NoSQL databases like MongoDB also emerged and have increasingly been adopted as part of the core technology stack by software engineers. These NoSQL databases have all of a sudden stole the spotlight in the database world. Relational databases were further perceptually “demoted” and SQL wouldn’t look right without a negation prefix.

Object-oriented data persistence versus SQL, PL/SQL

Just to be clear, I’m not against having the data persistence layer of the application handle the business logic of data manipulations and queries within the Object world. In fact, I think it makes perfect sense to keep data access business logic within the application tier using the same object-oriented programming paradigm, shielding software engineers from having to directly deal with things in the disparate SQL world.

Another huge benefit of using the object-oriented data persistence is that it takes advantage of any scaling mechanism provided by the application servers (especially for those on distributed computing platforms), rather than, say, relying everything on database-resident PL/SQL procedures that don’t scale well.

What I’m against, though, is that proper design and usage best practices are skipped when a relational database is used, hallucinating that the ORM would just magically handle all the data manipulations/queries of a blob of poorly structured data. In addition, while ORMs can automatically generate SQLs for a relatively simple data model, they aren’t good at coming up with optimal efficient SQLs for many sophisticated models in the real world.

NoSQL databases versus Relational databases

Another clarification point I thought I should raise is that – I love both SQL-based relational and NoSQL databases, and have adopted them as core parts of different systems in the past. I believe they have their own sweet spots as well as drawbacks, and should be adopted in accordance with the specific need in data persistence and consumption.

I’ve seen some engineering organizations flocking to the NoSQL world for valid reasons, and others just for looking cool. I’ve also seen in a couple of occasions that companies decided to roll back from a NoSQL platform to using relational databases to better address their database transaction need after realizing that their increasing data volume demand can actually be handled fine with a properly designed relational database system.

In general, if your database need leans towards data warehousing and the projected data volume is huge, NoSQL is probably a great choice; otherwise, sticking to using relational databases might be the best deal. It all boils down to specific business requirement, and these days it’s also common that both database types are simultaneously adopted to complement each other. As to what’s considered huge, I would say it warrants a NoSQL database solution when one or more tables need to house 100′s of millions or more rows of data.

Why do relational databases still matter?

The answer to whether relational databases still matter is a decisive yes:

  1. Real-world need of relational data models — A good portion of structured and inter-related data in the real world is still best represented by relational data models. While column-oriented databases excel in handling very large datasets, they aren’t designed for modeling relational data entities.
  2. Transactional CRUD operations — Partly due to NoSQL database’s fundamental design, data often need to be stored in denormalized form for performance, and that makes transactional operations difficult. On the contrary, relational database is a much more suitable model for transactional CRUD operations that many types of applications require. That, coupled with the standard SQL language for transactional CRUD makes the role of relational databases not easily replaceable.
  3. Bulk data manipulations — Besides proven a versatile powerful tool in handling transactional CRUD, SQL also excels in manipulating data in bulk without compromise in atomicity. While PL/SQL isn’t suitable for all kinds of data manipulation tasks, when used with caution it provides procedural functionality in bulk data processing or complex ETL (extract-transform-load).
  4. Improved server hardware — Improvement in server processing power and low cost of memory and storage in recent years have helped make relational databases cope with the increasing demand of high data volume. On top of that, prominent database systems are equipped with robust data sharding and clustering features that also decidedly help in scalability. Relational databases with 10′s or even 100′s of million rows of data in a table aren’t uncommon these days.

Missing skills from today’s software architects

In recent years, I’ve encountered quite a few senior software engineers/architects with advanced programming skills but poor relational data modeling/SQL knowledge. With their computing backgound I believe many of these engineers could pick up the essential knowledge without too much effort. (That being said, I should add that while commanding the relational database fundamentals is rather trivial, becoming a database guru does require some decent effort.) It’s primarily the lack of drive to sharpen their skills in the specific domain that has led to the said phenomenon.

The task of database design still largely falls on the shoulders of the software architect. Most database administrators can configure database systems and fine-tune queries at the operational level to ensure the databases are optimally run, but few possess business requirement knowledge or, in many cases, skills for database design. Suitable database design and data modeling requires intimate knowledge and understanding of business logic of the entire application that is normally in the software architect’s arena.

Even in the NoSQL world of column-oriented databases, I’ve noticed that database design skills are also largely missing. Part of NoSQL database’s signature is that data columns don’t have to be well-defined upfront and can be added later as needed. Because of that, many software architects tend to think that they have the liberty to bypass proper schema design upfront. The truth is that NoSQL databases do need proper schema design as well. For instance, in HBase, due to the by-design limitation of indexing, one needs to carefully lay out upfront what the row key is comprised of and what column families will be maintained.

Old and monolithic?

Aside from causes related to the disruptive technologies described above, some misconceptions that associate relational databases with obsolete technology or monolithic design have also helped contribute to the unwarranted negative attitude towards RDBMS.

Old != Obsolete — Relational database technology is old. Fundamentally it hasn’t changed since decades ago, whereas new computing and data persistence technology buzzwords keep popping up left and right non-stopped. Given so many emerging technologies that one wants to learn all at once, old RDBMS often gets placed at the bottom of the queue. In any case, if a technology is old but continues to excel within its domain, it isn’t obsolete.

RDBMS != Monolith — Contemporary software architects have been advocating against monolithic design. In recent years, more and more applications have been designed and built as microservices with isolated autonomous services and data locality. That’s all great stuff in the ever-evolving software engineering landscape, but when people automatically categorize an application with a high-volume relational database a monolithic system, that’s a flawed assumption.

Bottom line, as long as much of the data in the real world is still best represented in relational data models, RDBMS will have its place in the computing world.

Startup Culture 2.0

Startup has been a household term since early/mid 90′s when the World Wide Web began to take the world by storm. Triggered by the first graphical browser Mosaic available on all popular platforms, the blossoming of the Web all of a sudden opened up all sorts of business opportunities attracting entrepreneurs to try capitalize the newly born eye-catching medium.

A historically well known place for technology entrepreneurship, the Silicon Valley (or more precisely, San Francisco Bay Area) became an even hotter spot for entrepreneurs to swamp in. Many of these entrepreneurs were young energetic college graduates (or drop-outs) in some science/engineering disciplines who were well equipped to quickly learn and apply new things in the computing area. They generally had a fast-paced work style with a can-do spirit. Along with the youthful work-hard play-hard attitude, the so-called startup culture was born. Sun Microsystems was probably a great representative of companies embracing the very culture back in the dot-com days.

So, that was a brief history, admittedly unofficial, of the uprising of startup culture 1.0.

Setting up an open-space engineering room

Setting up an engineering workspace

Version 2.0

This isn’t another ex-dot-commer glorifying the good old startup culture in the dot-com days that later degenerated into a less commendable version observed today. I’m simply describing the gradual and, to some extent, subtle changes to the so-called startup culture I’ve observed over the years.

Heading into startup culture 2.0, besides an emphasis of fast-paced and can-do, along came a number of phenomenons including long hours, open-space and the Agile “movement”. Let’s dive a bit deeper into these phenomenons.

Long hours

In 1.0, we saw a lot of enthusiastic technologists voluntarily working long hours in the office. What’s different from the past is that long-hours is now often an “involuntary” phenomenon. The mindset that software engineers should be working long hours is so predominant that company management are often obsessed with the picture of all the techies diligently coding and brainstorming in the office. By pushing for long hours in the office, management are in essence commodifying software engineering work to become some hourly-paid kind of mechanical work. In most cases, it’s an unintentional act, but in others it’s often an act of distrust.

The fact is that serious software engineering requires serious brain-work. One can only deliver a limited number of hours of work on any given day in a productive manner. In other words, the actual amount of quality work would not increase beyond a few hours of serious brain-work. If you force the engineers to pull long hours in the office beyond the normal productive limit, you’re not improving actual work done but are instead killing the possibility of them to have any stamina or incentive left to contribute bonus work when they aren’t in the office. Even if they manage to overdraft themselves to deliver some extra work in a productivity-depleted state, they most likely will need to replenish their energy by taking time off the following day which would otherwise be productive time.

Flexibility in work hours and locale

Back in 1.0, general Internet connection speed was slow. It was common for a good-sized company with nationwide offices to share a T1 line as their Internet backbone, whereas today many residential consumers use connections at an order of magnitude faster than a T1. So, in order to carry out productive work back then, people had to go to the office, thus oftentimes you could find software engineers literally camping in the office.

Given the vastly improved residential Internet infrastructure today, much of the engineering work that used to be doable only in the office in the past can be done at home. So, if a software engineer already has regular office presence, there is little to no reason to work long hours in the office. In fact, other than pre-scheduled group meetings and white-boarding sessions, engineers really don’t have to stay in the office all the time, especially for those who have to endure bad commute.

Open space

Open office space has been advocated since the middle of the 1.0 era. Common office cubes used to be 5-1/2 feet or taller in the old days. People later began to explore opening up a visually-cluttered office space by cutting down about a foot from the cube wall, allowing individuals to see each other when standing up while keeping some privacy when sitting down. Personally, I think that’s the optimal setting. But then in recent years, lots of startups adopted cubes with walls further lowered or completely removed, essentially enforcing a multicast communication protocol all the time for individuals. In addition, the vast open view would also ensure constant visual distractions.

Bear in mind software engineers need good chunks of solitary time to conduct their coding work. Such a multicast plus visually distracting environment isn’t going to provide a productive environment for them. It’s understandable for a cash-strapped early-stage startup to adopt a temporary economical seating solution like that, but I’ve seen many well-funded companies building out such workspace as their long-term work environment.


Agile software development has been very popular these days. Virtually every software engineering organization is practicing some sort of Agile process. There are already lots of insights about pros and cons of the Agile methodology out there, so I’m not going to get into it. It suffices to say I like certain core Agile practices like 2-week development sprint, daily 15-minute scrum, and continuous integration which I think are applicable to many software development projects. However, I have reservation in mechanically adopting everything advocated in the Agile methodology. Which aspects of the Agile process are to be adopted for an engineering organization should be evaluated, carried out and adjusted in accordance with engineers skillset, project type and operation environment. The primary goal should be for efficiency and productivity, not because Agile sounds cool.

Insecurity and distrust?

Underneath the phenomenons described above, I think there are some signs of insecurity and distrust when viewed from a certain angle.

For most startups, every penny counts and the high cost of keeping a team of software engineers is well known. Thus it’s understandable that management tend to be nervous about whether their engineering hires deliver work in a timely manner. Long hours in office would help make the management believe that things are moving along at above the average speed. Open space further helps overcome their seeing-is-believing insecurity. Frequent sprints and daily scrums help make them feel that things are constantly being cranked out and status visibility is at its maximum granularity.

If that’s the general sentiment felt by the software engineers, most likely they won’t be happy and the most competent ones will likely be the first to flee. Nor will the management be happy when they don’t see the expected high productivity and find it hard to retain top engineers. The end result is that despite all the rally of fun, energetic startup culture on the company’s hiring web page, you’ll hardly experience any fun or energy there.

What can be done better?


  1. Give your staff benefit of doubt — It’s hard to let go of the doubt about whether people are working as hard as expected, but pushing for long hours in the office and keeping them visually exposed at their desks only send a signal of distrust and insecurity. It’ll only backfire and may just result in some kind of conditional attendance that they superficially attend the office only when they know you’re around. I would also recommend making the work hours as flexible as possible. On working remote, a pre-agreed arrangement of telecommuting would go a long way for those who must endure horrible commute. People with enough self-respect would appreciate demonstrated trust from the management and it would make them enjoy their job more.
  2. Work healthy — Work hard and play hard is almost a synonym of startup culture that we believe fun is a critical aspect in a startup environment. Throwing in a ping pong or foosball table would not automatically spawn a fun environment. In building out a work environment, I would argue that “work healthy” perhaps should replace “fun” as the primary initiative. I think providing a healthy working environment will lead to happier staff, better productivity, and fun will come as a bonus. Common ways to achieve that include ergonomic office furniture, natural lighting, facility for exercise, workout subsidy programs, healthy snacks or even a room for taking naps. Speaking of naps, I think it’s worth serious consideration to embrace it as part of the culture. Evidently, a 15-30 minutes of nap after lunch can do magic in refreshing one’s mood and productivity for the remaining half of the day.
  3. Adopt Agile with agility — Take only what best suit your staff’s skillset, project type and operational environment. Stick to the primary goal of better productivity and efficiency, and add/remove Agile practices as needed. It’s also important that you regularly communicate with the staff for feedback and improvement, and make adaptive changes to the practices if necessary.
  4. Make meetings worth attending — Meetings are common and important activities in pretty much all businesses. But oftentimes meetings aren’t conducted in an efficient manner. One common cause is having too many agendas packed in a single session, resulting in a lengthy and exhaustive meeting. Another common cause is that the host hasn’t prepared well enough in advance. But in many cases when the host did prepare themselves, the meeting could still fail because the participants weren’t demanded ahead to also prepare for it. This is particularly common when the meeting requires input from participants. With unprepared participants joining a meeting, the host will likely have to spend much of the meeting time explaining the obvious during the more productive time of a meeting when participants still have a relatively fresh brain.
  5. Lead by example — Too often are we seeing management handing down a set of rules to the staff while condescendingly assuming the rules don’t apply to themselves. It’s almost certain such rules will at best be followed superficially. Another commonly seen phenomenon is that management rally to create a culture which conflicts in many ways with their actual belief and style. They do it just because they were told they must create a culture to run a startup operation, but they ignore the fact that culture can only be built and fostered with themselves genuinely being a part of it. It cannot be fabricated.


  1. Honor the honor system — There may be various reasons contributing to the commonly seen distrust by the management. Unsurprisingly, one of them comes directly from individuals who abuse some of the employee benefits meant to be used on discretion. Perhaps the most common case is claiming the need for work-from-home with made-up reasons or without actually putting in the hours to work. Well, you can’t blame people’s distrust in you unless you first honor the honor system . For instance, when you do work from home, stick to the actual meaning of work-from-home. Also, making your availability known to those who need to work closely with you would be helpful. One effective way, especially for a relatively small team, would be to have a shared group calendar account designated for showing up-to-date team members availability.
  2. Self discipline — Again, using work-from-home as an example, one needs to have the self-discipline to actually put in decent amount of hours to work. It’s easy to be distracted, for instance by your kids, when working at home, but it’s your own responsibility to make necessary arrangement to minimize any expected distractions. It’s also your obligation to make it clear to your teammates in advance when you will be unavailable for scheduled appointments or what-not.
  3. Reasonable work priority — When it comes to priority in life, nothing compares to one’s family. Not even your job, as you can change job but not your family. So, for unplanned urgent family matters, no one will complain if you drop everything to take care of them. However, that doesn’t necessarily justify you should frequently compromise your attendance to work for reasons like picking up your kids from school, unless that’s a pre-agreed work schedule. Bottom line, if your career matters to you, you shouldn’t place your job responsibility way down your priority list from general family routines.
  4. Active participation — Most software engineers hate meetings, feeling that they consume too much of their time which could otherwise be used for actually building the product. I think if the hosts do their job well in prepping for meetings and the participants also do their prep work with regard to the agendas, most meetings can benefit everyone without any feeling of time being wasted. Even as a participant, attending a meeting without any prep work carrying a feed-me mindset will likely feel tedious and boring, unless it’s just some kind of quick announcement-only meeting. When you’ve prepared before a meeting, chances are that you will be a lot more engaged in the discussion and be able to contribute valuable input. Such active participation stimulates collective creativity and fosters a culture of “best ideas win”.
  5. Keep upgrading yourself — This may sound off-topic, but keeping yourself abreast of knowledge and best-practices in the very area of your core job responsibilities does help shape the team culture. Constant self-improvement will naturally boost up one’s confidence in their own domain which, in turn, facilitates efficient knowledge exchange and stimulates healthy competition. All that helps promote a high-efficiency no-nonsense culture. The competitive aspect presents healthy challenge to individuals, as long as excessive egos don’t get in the way. On upgrading, between breath and depth I would always lean toward depth. These days it’s too easy to claim familiarity of all sorts of robust frameworks and libraries on the surface, but the most wanted technologists are often the ones who demonstrated in-depth knowledge, say, down to the code level of a software library.

Final thoughts

Startup culture 1.0 left us a signature work style many aspire to embrace. It has evolved over the years into a more contemporary 2.0 that better suits modern software development models in various changeable competitive spaces. But it’s important we don’t superficially take the surface values of all the hyped-up buzzwords and mechanically take them as gospel. The said culture should be selectively embraced, adopted and adjusted in accordance with the team’s strength and weaknesses, project type, etc. More importantly, whatever embraced culture should never be driven by distrust or insecurity.