Overview

Dota-Sight has many useful features, but two issues limit its growth and its potential to scale. Fortunately, a program built with node-webkit can help resolve them.

Currently, a user has to manually navigate to a folder and copy logs before our server can parse the player IDs and retrieve data from dotabuff.com. That manual step has kept the app from gaining greater traction. A second concern, which only matters once there is sufficient traffic, is that gathering player-specific data from dotabuff.com is actually a web scrape of the profile pages of 10 players. If the dotabuff servers blocked our server, Dota-Sight would stop working.

Dota-Sight Structure

My goal today is to implement a node-webkit program that accesses a specific file in the file system and monitors it for changes. On each change, it retrieves player-specific data from dotabuff on the client side, then passes the data to our Dota-Sight web app through URL parameters. Something like this:

Updated Dota-Sight Structure

Creating a native app that monitors the server_log.txt

I created a node-webkit package that performs the following tasks when run:

  1. On load, it suggests a default server_log.txt path based on your operating system (using the Node.js os module)
  2. The user chooses the file path
  3. If the suggested path is correct, it starts
  4. If not, the user picks the path through a file input form. Note that in a browser, the input would not return the real file path due to security concerns. (Chrome returns “C:\fakepath\text.xls”!) This only works because the app runs natively through node-webkit.
  5. The Node fs module watches the selected file.
  6. On each file change, it parses the log into player IDs on the client side and passes them to our web client.

To distribute a node-webkit app:

  1. Zip all the contents of the app, including a package.json file
  2. Rename the extension from .zip to .nw
  3. Tell users to download nwjs from nwjs.io
  4. Tell users to unzip nwjs into a folder and place your .nw file in the same folder
  5. Run the nw executable!
  6. Alternatively, it is possible to package a .exe file, but it requires a few more steps, and few people would ever download an executable for a program like Dota-Sight. Also, the .nw archive is 75 KB, while the .exe file is 40+ MB.

Next Steps

I attempted to use the request and cheerio libraries on the client side to pull data from www.dotabuff.com, and that worked. The problem was passing that information on to the front end. When I tried passing the data as a URL parameter, the long URI (the URLs carry a big JSON string) triggered an HTTP 414 error. Since I can’t change the server settings on the Heroku instance, and many browsers don’t support such long URLs anyway, I have to change my approach.

Approaches I could attempt include:

  1. Converting my web app into a client side app
  2. Posting the client pulled data onto the server, and then piping it to the web client based on a session token
  3. ??? - Investigate other technologies such as web components and chrome extensions
  4. TODO - if I could store the data in a local storage shared by the node-webkit app and Chrome, that would resolve the issue as well.

The drawback of approach #1 is that any future update to the app requires a manual download/installation by the user, and we all know how much we love manually updating files. #2 seems redundant in terms of network efficiency, but it may be the most viable option right now. For #3, I’ll need to investigate tools like web components and Chrome extensions. #4 I just thought of and will have to research further.

Last but not least

In a sense, Dota-Sight started from a two-day internal hackathon we had at Fullstack Academy. It actually contains the least modular code I’ve ever written. Before I try the above technologies to further improve Dota-Sight, I will refactor my working legacy code to adhere to JS best practices. I plan to use Require.js instead of Node.js’ CommonJS-style dependency management for this project. Another post for the before/after!


Overview

“Practical Node.js” is a well-written book by Azat Mardan. Chapter 1 covers the basics of Node.js and useful utilities. Chapter 2 introduces building Express apps, both with the built-in generator and common middleware and without the generator. Chapter 3 surveys TDD/BDD frameworks such as Mocha, Assert.js, and Expect.js. Chapter 4 covers server-side rendering with Jade and Handlebars. Chapter 5 is an intro to persistence with MongoDB and Mongoskin. Chapter 6 uses sessions and OAuth to authorize/authenticate users. Chapter 7 covers the Mongoose ORM library. Chapter 8 builds RESTful API servers with Express/Hapi. Chapter 9 discusses WebSocket, Socket.IO, and DerbyJS. Chapter 10 is about preparing apps for production. Chapter 11 covers deploying apps. Chapter 12 is about contributing to open source.

While a lot of the book was review for me personally, I think this is a wonderful book for people looking to learn more about full-stack JavaScript.

Mongoose, Mongoskin, and comparing libraries

Having familiarized myself with the Mongoose ORM library before spending much time with the native MongoDB driver and Mongoskin, I greatly appreciate the elegance the ORM brings. That said, when comparing similar libraries it is important to look past surface syntax. For example, in Mongoose I think of collection representations as Messages.findOne({}), while the raw MongoDB driver example was db.collection('messages').findOne({}). The first is clearly less verbose than the latter, but if for the latter we set an alias like var Messages = db.collection('messages'), the two libraries would reach a similar conciseness in this context.

Of course, Mongoose’s value as an ORM layer isn’t just concise syntax. NoSQL databases don’t have schemas themselves. By using an ORM layer in the Node.js instances, we shift database-side validations/functions to the Node.js instances, leveraging the scalability of Node.js apps and reducing the number of database/server instances needed to handle the same volume of transactions. This matters, as scaling the Z-axis of the scale cube is the most expensive.

Websockets, socket.io, and other frameworks

The chapter about real-time apps using native HTML5 WebSockets, Socket.IO, and frameworks such as DerbyJS was interesting. Socket.IO is a great library because it has multiple fallback methods for browser compatibility. I have to admit, I got hooked on the game agar.io for a few days. When I witnessed the Friday-night lag, all I could think about was how scalable the app was given decent FPS. How much computation is handled client side vs server side? Quick research suggested sending vector data (direction + speed, sent on change) is sufficient, versus actual location (which requires constant polling). I considered creating a clone of agar.io for fun, but a quick GitHub search for agar.io turned up many results, including a popular clone repository with many people working together to improve it. A specific open issue that interested me was the replacement of Socket.IO. There were discussions on which library/technology should replace it, and an article about optimizing WebSocket bandwidth came up.

The gist was that Socket.IO serializes messages to JSON, while libraries sending simple binary data could potentially halve the message length. When network bandwidth is the bottleneck, it makes sense to use socket technology with a smaller footprint. The consensus in the discussion thread seemed to be to try the ws library. I really loved reading people’s comments and thoughts on a technical challenge I was interested in as well.
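
A back-of-the-envelope illustration of that point. The message shape is made up for the sketch, not agar.io's actual protocol:

```javascript
// A hypothetical position update for one player.
const update = { x: 512, y: 384, dx: 3, dy: -1 };

// Socket.IO-style payload: event name plus JSON-encoded body.
const json = Buffer.from(JSON.stringify(['move', update]));

// Hand-packed binary frame: four 16-bit signed integers.
const bin = Buffer.alloc(8);
bin.writeInt16BE(update.x, 0);
bin.writeInt16BE(update.y, 2);
bin.writeInt16BE(update.dx, 4);
bin.writeInt16BE(update.dy, 6);

// The binary frame is several times smaller than the JSON payload.
console.log('json bytes:', json.length, 'binary bytes:', bin.length);
```

Multiply that difference by thousands of updates per second across thousands of clients and the bandwidth savings become significant.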

Frameworks like DerbyJS and Meteor.js have interesting built in capabilities, and Firebase is also nice for syncing databases. Many different tools for different cases.

Getting Node.js App Production Ready

This was a great and practical chapter:

  1. Setting environment variables on any Node process
  2. Conditional error handling in Express.js based on NODE_ENV
  3. How to share an in-memory store like Redis across different Node.js processes/servers
  4. Setting up Socket.IO for production
  5. Leveraging HipChat/Twilio/SendGrid etc. to send notifications on server errors
  6. Using the domain library to trace errors
  7. Multi-process scaling with Cluster and Cluster2 (eBay’s production library)
  8. Creating an API route that functions as a dashboard for the process
  9. Running different tasks with Grunt
  10. Continuous integration with cloud testing via Travis CI and Codeship
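
Item 2 can be sketched like this. The payload shape is my own; Express's default error handler behaves in a similar spirit:

```javascript
// Expose stack traces to developers, but hide them in production.
function errorPayload(err) {
  const payload = { error: err.message };
  if (process.env.NODE_ENV !== 'production') {
    payload.stack = err.stack; // debugging detail, never shown to end users
  }
  return payload;
}

// In Express this would typically live in error-handling middleware, e.g.:
// app.use((err, req, res, next) => res.status(500).json(errorPayload(err)));
```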

Conclusion

This was a wonderful book for building real-world scalable web apps, with best practices and scalability in mind. I really enjoyed the variety of topics covered, along with the up-to-date best practices for the respective best-in-class Node.js libraries. The other wonderful book I’m still reading, “Node.js Design Patterns”, has more in-depth examples and use cases, but is harder to digest at once due to the amount of theory and depth. With these two books, and the myriad of modules available in the npm ecosystem, an informed engineer can do much to leverage the advantages of Node.js.


Chapter 7- Scalability and Architectural Patterns

Summary- Reasons to scale include workload, availability, and tolerance to failures. The scale cube has an X-axis of cloning, a Y-axis of decomposing by service, and a Z-axis of splitting by data partition. The Z-axis is usually tackled after X and Y, through horizontal partitioning / sharding.

Cloning and Load Balancing

Since Node.js processes are single-threaded and have a memory limit of about 1.7 GB on 64-bit machines, Node.js engineers usually consider scalability early on.

A simple implementation of cloning starts with the built-in Node.js cluster module. Since Node.js 0.11.2, the default is a round-robin algorithm that assigns connections to workers one by one. Using plain old JavaScript, it is possible to increase availability through resiliency by restarting cluster workers upon errors/exits. We can also write custom functions that restart workers one by one for zero-downtime deploys, or that perform custom load balancing. By default, worker processes on the same machine share the same port.

Dealing with Stateful communications

When we have several application instances, how do we maintain stateful communication with a client? One method is to share state through a database or Redis/Memcached. Another is sticky load balancing: once a session starts, it is assigned to the same instance, determined by an algorithm (based on location, username, IP, etc.).

Alternatively, we can run multiple instances across several machines and use a reverse proxy to provide a single entry point for external communication. Popular reverse proxies such as Nginx not only act as a reverse proxy but can also support load balancing, sticky load balancing, routing to any available server regardless of language, URL rewrites, caching, and serving static files.

Load Balancing and Scaling

forever and pm2 are popular Node.js modules that keep Node.js processes alive by restarting them automatically. Once a service registry is implemented, load balancing between different microservices and dynamically scaling microservice instances become possible. Alternatively, peer-to-peer load balancing is sometimes used when we don’t need a reverse proxy to hide the complexity of the app and all server instances are known (such as internal systems where requirements are mostly static).

Application Architecture

A monolithic architecture is one where every module is part of the same codebase and runs as part of a single application. We can create a microservice architecture by breaking the app down into services, each connected to its own database instance (each service owns its data). If one service crashes, the other services can continue to work as normal. Also, if we decide to rebuild a service from scratch, it is much easier to do so for a microservice. If the services are accessed through a common protocol (like a RESTful web service), they can be agnostic to programming language, database, and infrastructure, allowing the best tool for each job.

Challenges of the above include higher complexity in integration, deployment, and code sharing. Deploying, scaling, and monitoring the separate services becomes more difficult. Many DevOps and cloud services help mitigate these problems, and Seneca and nscale are popular Node.js solutions specialized for this purpose.

Integration Patterns

API Proxy- a server that basically acts like a reverse proxy and a load balancer for different services.

API Orchestration Layer - an abstraction layer that takes generically modeled data elements and/or features and prepares them in a more specific way for a targeted developer or application. For example, a “completeCheckout” event in the orchestration layer could call the payment service, empty the cart, and update the product quantity. While this is easy to design and scale, it can also create an anti-pattern called the God Object, which knows and does too much.
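
A toy sketch of the “completeCheckout” example; the three service clients are hypothetical stubs standing in for real microservice calls:

```javascript
// Hypothetical service stubs; real ones would make network calls.
const payments = { charge: async (orderId) => `charged ${orderId}` };
const cart = { empty: async (orderId) => `emptied ${orderId}` };
const inventory = { decrement: async (orderId) => `updated ${orderId}` };

// The orchestration layer coordinates three services behind one high-level action.
async function completeCheckout(orderId) {
  const results = [];
  results.push(await payments.charge(orderId));   // take payment
  results.push(await cart.empty(orderId));        // clear the cart
  results.push(await inventory.decrement(orderId)); // adjust stock
  return results;
}
```

The God Object risk is visible even here: this one function must know about every downstream service and the order they run in.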

Message Broker - a publish/subscribe pattern where events in one service are emitted to the broker, and the broker then broadcasts each event to the other services, which perform the appropriate transactions. An interesting follow-up is how to handle load balancing with such event emitters, and that’s the topic of chapter 8.


At scotch.io I found an intro-to-Meteor.js tutorial that builds a Slack clone. There were some unclear instructions about where code was to be placed, but it was fairly easy to debug by cross-referencing the comments and the source code. Links to my repository and live site.

Syntax stuff aside, there were a few interesting differences compared to the MEAN stack:

  1. Meteor.js is both server-side and client-side code
  2. In Node I’m used to declaring file dependencies, while Meteor.js loads many files automatically (and alphabetically), making naming very important outside the special directories (client/, server/, public/, private/, test/)

Gems

Meteor.js has its own package ecosystem, similar to Ruby gems. After typing meteor add accounts-base accounts-password accounts-ui, it inserted a users collection in Mongo by default, and you just need to add the login template to show the login functionality.

Setting up GitHub OAuth was as simple as:

meteor add accounts-github

Configuration of github-oauth (yes, a GUI!):

configuring github settings

And that’s it!

Built-in Stuff

Reactivity- detecting changes in a data source, recomputing dependent values, and publishing them to the client side automatically

Minimongo- a local, in-browser version of MongoDB that works offline and syncs with the Mongo server

Latency Compensation- updates the UI without waiting for server confirmation, then adjusts the UI only if needed when the server responds

Auto pub-sub to all clients (fine for rapid prototyping, but locking this down with authentication is a very important step)

Simple deployment to Meteor.com through meteor deploy app-name.meteor.com

Conclusion

Disclaimer: Since I was just using Meteor.js through a tutorial designed to highlight its benefits, I can’t really say anything conclusive. Here’s what I experienced:

Pros of Meteor.js: built-in real-time syncing, and offline usage/latency compensation through Minimongo was cool. Easy integration with MongoDB, creating collections on the fly, and seeding the database.

Confusing parts of Meteor.js: it requires a lot of discipline in naming files for load order (compared to explicit dependency injection in AngularJS) and in differentiating between server-side, client-side, and shared code.

While I am more used to Angular’s dependency injection and relying on RESTful APIs as the only connection to the server, Meteor.js has a healthy ecosystem, and it really amazed me how quickly one could potentially prototype a real-time app with client/server-side database caching. I am interested, though, in also creating an Angular version of a Slack clone that offers reactivity, latency compensation, and pub-sub to get a feel for the difference.


Thoughts

tl;dr: This is a wonderful book so far. In my quest to further my knowledge of the MEAN stack, I chose a book called “Node.js Design Patterns” by Mario Casciaro. I have many books that talk about JavaScript behavior, but this book ties design patterns to JavaScript and Node.js very well. What “JavaScript Patterns” did for me for JavaScript, I feel this book does doubly so for Node.js.

Chapter 1- Node.js Design Fundamentals

Node.js was not designed to solve a scalability problem, but to solve threads sitting idle while waiting for I/O. Its strength is passing data to many clients. In a recent interview, Netflix mentioned that for the same function (I believe the Netflix front end), switching from Java to Node.js reduced their Amazon EC2 resource requirements by 75%. This makes sense: on the front end, you are probably just looking up (in a database or internal API) recommendations for a specific user. It also exemplifies using the right tool for the job.

  1. Since JavaScript relies by nature on the synchronous event demultiplexer mechanism (and the event loop), using synchronous code excessively would defeat many of the benefits of Node.js.
  2. The building blocks of Node.js are the reactor pattern, libuv (a C library that implements the event demultiplexer across all major platforms), the V8 JavaScript engine, node-core (the high-level Node.js API), and the bindings connecting libuv to JavaScript.
  3. Realization- callbacks (defined as “a function passed as an argument to another function B and invoked when B is done”) work naturally in Node.js due to closures in JavaScript. (Callbacks are called “continuation-passing style” in functional programming.)
  4. Throwing in async callbacks causes the exception to jump to the event loop –> use try-catch blocks carefully (page 29 for an example)
  5. Continuing after uncaught exceptions may leave the application in an inconsistent state, hence the importance of having solid tests in Node.js
  6. Node.js modules are built on the CommonJS module specification
  7. Observer pattern and event emitters vs callbacks

Chapter 2- Asynchronous Control Flow Patterns

In Chapter 2, the author starts with callback hell, with a particular focus on the readability of the code. As some say, 50-70% of engineering time is spent reading code, so writing readable code is very important. Instead of diving directly into promises and async libraries (ES6 promises and factories are discussed later), the author first refactors the same code using plain old ES5 JavaScript through best practices: reducing if/else nesting by returning callback(err) early and factoring out reusable pieces.