Andrija's Blog

Functional design

Read Multiple Files in node.js

I had a requirement to read multiple files in one go in a Node.js project.

Requirement:

  • Read UTF8 files asynchronously.
  • Notify when each file read is done.
  • Notify when reading all files is done.

I found a few examples on Stack Overflow.

In both cases the answer was to use either the async library or plain native Node.js functions.

I don’t like using external libraries and I try to use native/vanilla code as much as possible. If there is only one small part of an external/third-party library that I need, I’d rather write my own. For example, if I have to get a reference to a DOM element by id on a web page, I’d avoid using jQuery. If I had to heavily traverse the DOM, then I would definitely use jQuery.

So, I will skip the async library and use only native functions.

Here is the complete code that we will go through:

The latest version of the file is in the repository on Github: multiread.js

multiread.js
   (function () {
      "use strict";
  
      var fs = require('fs'),
          readline = require('readline'),
          filePathList = [], i, ii,
  
          toArray = function () { return Array.prototype.slice.call(arguments[0]); },
  
          rl = readline.createInterface({
              input: process.stdin,
              output: process.stdout
          });
  
      // validate call, must contain at least 3 arguments
      if (process.argv.length < 3) {
          console.log("Usage: node multiread.js [file paths to read]");
          process.exit(0);
      }
  
      // start from 3rd argument, add them to filePathList
      for(i = 2, ii = process.argv.length; i < ii; i++) {
          filePathList.push(getActualFilePath(process.argv[i]));
      }
  
      // check file path
      function getActualFilePath(filePath) {
          var relative;
  
          // check absolute path
          if(fs.existsSync(filePath)) {
              return filePath;
          }
  
          // get absolute path from relative path
          relative = [__dirname, filePath].join("/");
  
          // check relative path
          if(fs.existsSync(relative)) {
              return relative;
          }
  
          throw new Error("File " + filePath + " not found");
      }
  
      /**
      * @param execFunc {Function} Function that will be called for each argument set in @args array.
      * @param args {Object[]} Array where each item is an array of arguments used in each call.
      * @param eachCallback {Function} Callback after each execution.
      * @param callback {Function} Callback when all arguments are processed.
      */
      function batch(execFunc, args, eachCallback, callback) {
          var index = -1, cb, iterate, results = [];
  
          cb = function () {
              var callResult = toArray(arguments),
                  isLastItem = index == args.length - 1;
  
              // put results from callback to results list for later processing
              // results list is passed into final callback function
              results.push(callResult);
  
              // notify that current call is done
              eachCallback.apply(null, callResult);
  
              if(isLastItem) {
                  // if it is last item in args array, call final callback
                  callback(results);
              } else {
                  // continue iteration through args
                  iterate();
              }
          };
  
          iterate = function () {
              index++;
              var i = index,
                  argsArray = args[i] || [];
  
              // 'argsArray' collection was created in 'batchRead' method and it contains all arguments needed to invoke a function
              // here we are adding last argument in collection which is callback function 'cb' which is scoped inside parent function
              // inside 'cb' function iterate function will be called again until all arguments are not processed.
              argsArray.push(cb);
              execFunc.apply(this, argsArray);
          };
  
          // first iteration call
          iterate();
      }
  
      /**
      * @param files {string[]} File path list.
      * @param eachCallback {Function} Callback after each execution.
      * @param callback {Function} Callback when all arguments are processed.
      */
      function batchRead(files, eachCallback, callback) {
          var encoding = 'utf8',
              args = [];
  
          // build args array
          files.forEach(function(file) {
              args.push([file, encoding]);
          });
  
          batch(fs.readFile, args, eachCallback, callback);
      }
  
      batchRead(filePathList,
          // callback after each file read
          function(err, text) {
              console.log("File read done. Text: " + text);
          },
  
          // callback when everything is done
          function(result) {
              var insertTextArr = [];
  
              result.forEach(function(i) {
                  insertTextArr.push(i[1]);
              });
  
              console.log("");
              console.log("All:");
  
              console.log(insertTextArr.join("\n"));
          });
  
      // wait in console
      rl.question("", function () { rl.close(); });
  })();

TL;DR

batch function

The idea behind the batch function was to have a generic function that executes an array of asynchronous functions and waits for them to finish, something like what async does, as long as the function signature is function([parameters], callback).

Parameter description:

  1. execFunc – function that is supposed to be executed for each argument set. In this case execFunc is fs.readFile.
  2. args – array of ‘argument sets’ for each execution => an ‘argument set’ is an array of arguments without the callback. In this case, a set contains the file path and encoding.
  3. eachCallback – callback for each execFunc call.
  4. callback – callback when all is done.

Inside batch function there are two functions: cb and iterate.

The iterate function makes the actual call to execFunc. It appends the cb function to the argument collection at the last position so that it acts as the actual callback of execFunc. This way we control each callback of execFunc and notify the outside world when it is all done. This method also queues the execution of the asynchronous functions, i.e. makes them execute sequentially.

The cb function is the execFunc callback. Since the arguments of this function are the actual result of the execFunc execution, this is the place where we catch it.
At the batch function level, there is a results array that holds all the results returned from the execFunc executions. This array is passed as an argument to the final callback. The results array is populated here on each call.
Then we call the eachCallback function to notify the outside world that an execution of execFunc is done.
Finally, we check whether this was the last execFunc call. If yes, we call the final callback, sending all results as an argument. If no, we continue the iteration.
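To see the pattern in isolation, here is a minimal, self-contained sketch of batch with a stand-in function instead of fs.readFile. The stand-in calls its callback synchronously for brevity; the control flow through batch is the same either way:

```javascript
// Minimal sketch of the batch pattern with a stand-in for fs.readFile.
function batch(execFunc, args, eachCallback, callback) {
    var index = -1, results = [];

    function cb() {
        var callResult = Array.prototype.slice.call(arguments),
            isLastItem = index === args.length - 1;

        results.push(callResult);
        eachCallback.apply(null, callResult);

        if (isLastItem) {
            callback(results);
        } else {
            iterate();
        }
    }

    function iterate() {
        index++;
        var argsArray = (args[index] || []).slice();
        argsArray.push(cb); // cb acts as the callback of execFunc
        execFunc.apply(null, argsArray);
    }

    iterate();
}

// Stand-in with the (parameters..., callback) signature; fs.readFile
// is asynchronous, but the flow through batch is identical.
function fakeRead(path, encoding, done) {
    done(null, "contents of " + path);
}

var finalTexts;

batch(fakeRead, [["a.txt", "utf8"], ["b.txt", "utf8"]],
    function (err, text) { console.log("done:", text); },
    function (results) {
        finalTexts = results.map(function (r) { return r[1]; });
        console.log(finalTexts.join("\n"));
    });
```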

batchRead function

The batchRead function wraps a call to the generic batch function to simplify its use.

Parameter description:

  1. files – array of file paths to be read.
  2. eachCallback – callback for each fs.readFile call.
  3. callback – callback when all is done.

As the batch function accepts an array of argument arrays, we need to repack the file path array into a new array. In each of these inner arrays, we also add an additional argument: the 'utf8' encoding.
Then we call the batch function with the appropriate arguments.

batchRead call

At the end, we call batchRead. eachCallback should have the same signature as the callback you would pass when calling fs.readFile directly.

Summary

If you have to call the same function several times with different arguments, queue each call, and have a final callback, you can use the batch function. batchRead shows how the batch function can be utilized to read multiple files sequentially and notify the application when all is done.

HTH

Javascript Breakpoint

This post will describe how to set a breakpoint before any JavaScript method call, from the console.

TL;DR

Background #

To make a long story short, a year ago I was working on a big single-page web application with a huge JavaScript code base. Whenever I wanted to set a breakpoint in the editor/dev tools/Firebug, it took time to find the function, either because the source viewer took time to load (because of the huge JavaScript files) or because it took time to locate which file the function of interest was in.

So I started inserting the debugger; directive into my scripts when I needed to break. The problem was, when I wanted to remove the directive, I had to find it, remove it or comment it out, and reload the page; and since we were working on a SPA, I basically had to restart my debugging session. I also tried to make a global flag, and enable the debugger based on that flag:

var IS_DEBUGGER_ENABLED = false;
...
function breakpoint() {
  if(IS_DEBUGGER_ENABLED) {
    debugger;
  }
}

Whenever I wanted to turn the debugger on, I could just set in the console:

IS_DEBUGGER_ENABLED = true

The problem was I did not really like having this function call all over the code. Also, I wanted control over where the code would break, so I was again forced to go back and remove the breakpoint() calls.

Implementation #

The goal is to make a function that can insert a breakpoint before the call to any function we are interested in that is accessible from the global object.

The function needs to do the following:

  • Save a reference to the original function.
  • Override the original function with a wrapper.
  • Provide an option to restore the original function and remove the breakpoint.

Let’s say we have the following object structure:

var foo = {};

foo.bar = {};

foo.bar.func = function () {
  console.log('test');
};

The path to the func function is foo.bar.func. This is usually the case in huge JavaScript libraries such as Ext.js (Sencha), YUI etc.

To be able to set a breakpoint, I would need to place debugger; right before the foo.bar.func call.

So, I thought it would be nice to wrap the foo.bar.func function in a new function:

function wrapper() {
  debugger;
  foo.bar.func();
}

But the wrapper also needs to be assigned to foo.bar.func, because when the function is called it has to actually call the wrapper. And if the wrapper is set to foo.bar.func, it will override the original function, so we need to save a reference to the original function somewhere.

How do we save a reference to a function?
I could pass a direct reference to the function into the wrapper and assign it to a variable, but instead I am passing in the function name. The reason is that I need to access the function dynamically, and I also need a key to reference back to the original function. You will see what I mean further in the post. I might change this in the future and allow passing a direct reference to the function into the wrapper. As I have only the function name, the only way I found (at the moment) to call the function was by compiling a new Function object that calls it. You could easily call a global function by name through the window object (in a browser):

window['function_name'](parameters);

This works if the function is on the global object, but if the function is inside another object, it will not work. So, to get a reference to foo.bar.func:

var func_name = 'foo.bar.func';

// Compile a new Function that is returning a reference to function and execute it.
var func_reference = (new Function("return " + func_name + ";"))();

Now we have the original function reference assigned to a variable and we can override the function with the wrapper. The wrapper should look like this:

function wrapper() {
  // Convert arguments to array
  var args = Array.prototype.slice.call(arguments);

  // Break before original function call
  debugger;

  // Call a function.
  // 'this' should be the same as it would be in original function since wrapper replaced original function.
  // We will always return result from original function.
  return original_function.apply(this, args);
}

The override function:

var override = new Function("overrideFunc", funcName + " = overrideFunc;");
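To make the compiled override concrete, here is an illustrative sketch (the object names are mine). Note that the body of a Function created this way resolves names in the global scope, so the target object must be reachable globally (a top-level var in a browser is; the sketch attaches it explicitly):

```javascript
// Illustrative sketch: overriding a nested function via a compiled Function.
// Bodies created with new Function resolve names in the global scope,
// so attach the object there explicitly.
globalThis.foo = {
    bar: {
        func: function () { return "original"; }
    }
};

var funcName = "foo.bar.func";

// Compiles to: function (overrideFunc) { foo.bar.func = overrideFunc; }
var override = new Function("overrideFunc", funcName + " = overrideFunc;");

override(function () { return "wrapped"; });

console.log(foo.bar.func()); // "wrapped"
```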

“Break” function #

Finally, “break” function:

var __break = (function() {
  "use strict";

  // as we need to keep the original functions, we use a closure to store them here.
  var cache = {},
      cArray = function(args) { return Array.prototype.slice.call(args); },
      concat = function() { return cArray(arguments).join(""); };

  // funcName - name of the function; full name accessible from the global object.
  // removeBreakpoint - use it only when you want to remove a breakpoint; set it to any truthy value.
  return function(funcName, removeBreakpoint) {

      // get reference to original function
      var original = (new Function(concat("return ", funcName, ";")))(),
          // compile override function
          override = new Function("overrideFunc", concat(funcName , " = overrideFunc;"));

      // check if the removeBreakpoint flag is set
      if(!!removeBreakpoint) {
          // restore original function
          override(cache[funcName]);
          // remove from cached collection
          delete cache[funcName];
          return;
      }

      // if the function is already overridden, exit
      if(!!cache[funcName]) {
          return;
      }

      // cache original function
      cache[funcName] = original;

      // override original function
      override(function () {
          var args = cArray(arguments);
          debugger;
          return original.apply(this, args);
      });
  };
}());

Usage #

To set the breakpoint:

__break('foo.bar.func');

To remove the breakpoint:

__break('foo.bar.func', true);

Potentially, we could add this function to the console object:

if(!console.break) {
  console.break = __break;
}
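As a design note, the dotted name could also be resolved without compiling a new Function, by walking the object graph from a root object. A sketch (these helper names are mine, not part of break-js):

```javascript
// Hypothetical helpers (not part of break-js): resolve and assign a
// dotted name by walking the object graph from a root object.
function resolvePath(root, name) {
    return name.split(".").reduce(function (obj, key) {
        return obj[key];
    }, root);
}

function assignPath(root, name, value) {
    var keys = name.split("."),
        last = keys.pop(),
        owner = keys.reduce(function (obj, key) { return obj[key]; }, root);
    owner[last] = value;
}

var app = { util: { greet: function () { return "hi"; } } };

var original = resolvePath(app, "util.greet");
assignPath(app, "util.greet", function () { return "wrapped " + original(); });

console.log(app.util.greet()); // "wrapped hi"
```

This avoids compiling code from strings, at the cost of needing an explicit root object (window or globalThis) instead of letting the compiled body resolve the name itself.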

Repository is at: https://github.com/andrijac/break-js

HTH

Octopress Publishing Environment in the Cloud

I was looking for a blog platform:

  • Good support for developers (sharing code snippets and github gists).
  • Highly customizable layout – I don’t want to depend on the decision of some third party on whether they will add a feature that I need.
  • Ability to back up blog content.
  • Independence from the hosting platform.
  • Ability to recreate a blog anywhere anytime.
  • Keeping track of changes.

Octopress#

It sounds too good to be true, but you can find all of these features in Octopress.

octopress.org :

Octopress is a framework designed by Brandon Mathis for Jekyll, the blog aware static site generator powering Github Pages

Some features of Octopress mentioned here also apply to Jekyll.

Good support for developers

In Octopress, there is great support for embedding code in your blog.

Highly customizable layout

You can customize anything on the blog. A big advantage is that there is no admin user interface to get in the way of changing anything on the blog. Some people would consider this a disadvantage, but I think a highly generic interface with the ability to change anything on a website is very hard to build and maintain. If you want to add a feature, it requires a user interface to support it, which also means an additional user interface to maintain along with the feature. By not having any user interface and counting on the user to change the blog manually, the possibilities are endless.
Of course, this assumes that you know what you are doing, which means that Octopress is not for everyone.
Read more on this topic in The best interface is no interface.

Ability to back up blog content

You write Octopress articles in plain-text Markdown, so there is no database to back up and no special data store to extract data from; you just back up the files. Ever since Markdown became popular through Stackoverflow.com and Github, it has become more and more accepted in the developer community as a common formatting syntax for documentation.

Independence from the hosting platform

The final output of Octopress is static HTML-CSS-JavaScript, which means it can be hosted anywhere.

Ability to recreate a blog anywhere anytime

You do not depend on proprietary software or a closed platform to recreate the output of your blog; you can do that anytime you want.

Keeping track of changes

To start using Octopress, you fork it on Github, which means you immediately have a Git repository for it. Anything you change or publish can be committed to Git. Is there a better way to track changes in a blog? I do not think so.

Nitrous.IO#

I found out about Nitrous.IO through Joe Marini’s post Tools for Developing on ChromeOS. It seemed very interesting because it is a remote view of a VM in the browser, rendered in HTML. And as we know, once rendered in HTML, it can be displayed and worked with anywhere; no plugins or third-party software required. I gave it a try and created a Ruby box. What I got is a Linux VM running:

lsb_release -a
# Ubuntu 12.04.3 LTS.

Once I had set up the box, I went to the bonus page and connected all of my online identities to Nitrous.IO to get additional N2O (N2O is the Nitrous.IO currency for upgrading your box).

Octopress + Nitrous.IO#

I switch workstations and environments a lot. On the other hand, Octopress requires some things to be preinstalled in your local environment. I also switch OSs, so having to set up the environment on Windows every time can be time-consuming. I figured I could have an always-ready Octopress environment on Nitrous.IO. This way, Octopress is always available. Given the way cloud computing is progressing, I think that a personal workstation on a remote VM will become more and more common practice.

Setting up#

In summary

These are the steps that we will need to go through to set up the blog:

  • Register on Github
  • Register on Nitrous.IO
  • Change the Ruby version
  • Clone and setup Octopress

I have chosen to host the blog on Github, which means that the address will be [username].github.io.

I am hosting this blog on Github and I will describe here how to host it there. You do have other options available, and since the blog is generated as plain HTML-CSS-JavaScript, it can be hosted anywhere.

Register on Nitrous.IO and create a Ruby box. I am still not sure what the differences between the boxes are; for example, I have node.js installed on the Ruby box, so I am not sure what is special about the node.js box.

The box will be created in a few seconds, ready to roll. You will see the workspace folder in the folder tree on the side.

From here, I followed the Octopress setup.

Git should already be installed, but you can check:

git --version
# git version 1.8.4.3

Check the Ruby version:

ruby --version
# ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-linux]

The Ruby version required for Octopress is 1.9.3, so we need to change the Ruby version. I used rbenv:

git clone https://github.com/sstephenson/rbenv.git ~/.rbenv
echo 'export PATH="$HOME/.rbenv/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(rbenv init -)"' >> ~/.bashrc
git clone git://github.com/sstephenson/ruby-build.git ~/.rbenv/plugins/ruby-build
source ~/.bashrc

At this point, you might need to restart the console; opening a new console should do it. Install Ruby 1.9.3:

rbenv install 1.9.3-p0

Clone Octopress from Github. I created a folder inside the workspace folder:

git clone git://github.com/imathis/octopress.git ~/workspace/octopress
cd ~/workspace/octopress

Set the local folder to use Ruby 1.9.3 (your path should now be ~/workspace/octopress):

rbenv local 1.9.3-p0
rbenv rehash

Install dependencies:

gem install bundler
rbenv rehash
bundle install

Install the default Octopress theme:

rake install

Deploying to Github Pages #

As I said before, if you are planning to deploy to Github Pages, your address will be http://username.github.io. You need to create a repository called username.github.io. There are detailed instructions available; here I will give a summary of all the steps you need to take.

Run:

rake setup_github_pages

It will ask you for the repository URL, which is supposed to be https://github.com/username/username.github.io.git. Github recommends HTTPS over SSH.

Make sure you have your Github email and username configured:

git config --global user.email "your email"
git config --global user.name "username"

Next, push the current source to the source branch:

git add .
git commit -m "commit message"
git push origin source

Next, generate and deploy:

rake generate
rake deploy

To clarify, you will have two branches in the repository:

  • source branch holding the source of the blog, which you use for generation.
  • master branch holding the static generated content. Github Pages is configured so that anything that appears in the master branch of a repository called username.github.io will appear on your Github page.

rake generate generates the static content locally in the _deploy/ folder. rake deploy pushes the changes to the master branch.

At this point, you are done. It will take a few minutes for Github to publish the page for the first time; later pushes will show up much quicker.

Blogging#

Some quick notes:

  • Configuration file is _config.yml. You can configure all the global settings there, like the title, description, Github account, Disqus account etc.
  • Create a new post: rake new_post["post title"] – this will create a new markdown document in source/_posts.
  • Preview the blog:
rake generate   # Generates posts and pages into the public directory
rake watch      # Watches source/ and sass/ for changes and regenerates
rake preview    # Watches, and mounts a webserver at http://localhost:4000

You will be able to see your blog on port 4000. In Nitrous.IO, go to the menu Preview > Port 4000 and it will open a blog preview in a new tab.

  • To commit changes to Git, use the same git commands as in the initial commit.
  • Once you are ready to publish, use the same rake commands as you did initially.

HTH

Edit 2013-01-03:

It has come to my attention that rvm is already installed on the VM, and it can also be used to change the Ruby version in the environment. By using rvm, you can save disk space on the VM. See the documentation for using rvm.

Backup Existing Project to Cloud Source Control

This article explains how to back up an existing Visual Studio project to Bitbucket using Mercurial. It may just as well apply to any other type of project, but I might use some Visual Studio specifics in the article.

All you need for this is to download Mercurial.

Create repository#

I use Bitbucket because it has free private repositories. It supports both Mercurial and Git. I prefer to use Mercurial on Bitbucket because I have used it from the beginning and find it more intuitive. There is a great tutorial on how to use Mercurial by Joel Spolsky at hginit.com. I highly recommend starting there.

First, you need to create a repository on Bitbucket. When naming the repository, try not to use white space, because you might have trouble later on when using the command line. Make sure Mercurial is selected as the “Repository type”. Also, make sure “Access level: This is a private repository” is checked. You could create a public repository, but I assume you will be backing up code that you don’t want the public to see.

Clone repository#

Let’s say you have a folder structure like this:

~/
  |_MyProject
    |_DevelopBranch     

You want to back up all the code from DevelopBranch.

Go to the command prompt and change into the project folder: cd ~/MyProject

Execute hg clone https://username@bitbucket.org/username/repository-name to create the local repository.
This command will create a local folder containing the repository. Note: replace username with your username on Bitbucket and repository-name with your repository name. When I did this, my repository name was MyProject-Develop and I did not want my folder to be named like this; I wanted it to remain DevelopBranch. You can keep your folder name by adding the folder name at the end of the command: hg clone https://username@bitbucket.org/username/repository-name folder-name. You can create as many repositories as you want, so if you make a mistake, just delete the folder and try again.

If you try to create the repository in the same folder where your code already is, you will notice that the clone will not execute; it will say that the folder is not empty. The workaround is to temporarily rename the existing folder, execute the clone command, move the .hg folder into the original folder, delete the clone folder and rename the original folder back :) So, I renamed the folder to DevelopBranch2 and executed the clone command. Now the command was successful and I had a new folder DevelopBranch which contained only one folder inside: .hg. I copied the .hg folder to DevelopBranch2, deleted DevelopBranch and renamed DevelopBranch2 back to DevelopBranch. Simple, right? :)

Add files to repository#

Create a .hgignore file from https://gist.github.com/andrijac/4027502 and place it in the DevelopBranch folder. This will hide from source control the files that are usually compiled or output by the build process, very similar to how TFS ignores specific files in your source folder (like .dll, .exe, /obj etc.).
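For reference, a typical Visual Studio-oriented .hgignore in glob syntax might look something like this (an illustrative sketch, not the exact contents of the gist above):

```
syntax: glob

bin/
obj/
*.dll
*.exe
*.pdb
*.suo
*.user
packages/
```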

In the command line, go to the branch folder: cd ~/MyProject/DevelopBranch. All further commands will be executed from this path.

Run the command hg status in your DevelopBranch to see what Mercurial has detected for adding into source control. If the list is too big (it usually is when adding a big project), use hg status > output.txt && output.txt to see all the files. You will see that files have a ? status on the left side of the file path, indicating that the files were detected in the target folder but are not part of the repository yet.

Execute hg add to add all the files to the repository.

Now when you run hg status, you should get an A status on the left side of the file name, indicating that the file is added.

Next, commit the changes in repository:
hg commit -m "initial commit" -u username

Next, execute hg push, which will push (upload) the repository changes to the Bitbucket repository. You will be asked to enter a password for authentication. It might take some time to upload all the changes, depending on your project size.

Maintaining the future changes#

And you are done. The next time you want to upload new changes to the project:

hg status
hg addremove
hg commit -m "commit 2013-12-01" -u username
hg push

HTH