Git hangs while unpacking objects (Windows)

I’m not sure if it’s because we’re behind a proxy, the network has issues or my work laptop isn’t great, but for some reason git clones very often hang while unpacking objects.

remote: Counting objects: 21, done.
remote: Total 21 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (21/21), done.

There is a way to recover from this: Ctrl+C to exit the git command, then cd into the folder you cloned into and run

git fsck

notice: HEAD points to the unborn branch (master)
Checking object directories: 100% (256/256), done.
notice: No default references
dangling commit: 0a343894574c872348974a89347c387324324

The bit we’re interested in is the dangling commit; if we merge this commit manually, all will be fine:

git merge 0a343894574c872348974a89347c387324324

Job done: you should now have the completed clone.


Adventures with Spark, part one

For 18 months I’ve been working with Hadoop. Initially it was Hortonworks HDP on Windows, then Hortonworks HDP on CentOS, and for production we settled on Cloudera CDH5 on Red Hat. Recently we’ve been introduced to Spark, and subsequently Scala, which I am now in the process of skilling up on; the plan is to blog as I learn.

For the first entries I imagine it won’t be much more than the basic tutorial you could read elsewhere; however, the plan is to get more detailed as I learn more.

I can’t introduce Scala better than Scala School, so it’s worth taking a look at that.

I am going to use JetBrains IntelliJ IDEA for developing fuller applications, however for playing and learning you can download Spark for Hadoop in TAR format from the Spark Download Page and use the Spark shell.

For now I just extracted it to a folder in Downloads.
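
If you want to follow along, extracting the download is just the usual tar incantation (this assumes the Hadoop 2 package, which the paths below use):

$ cd ~/Downloads
$ tar -xzf spark-1.0.2-bin-hadoop2.tgz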

To start the Spark shell:

$ cd ~/Downloads/spark-1.0.2-bin-hadoop2/bin
$ ./spark-shell

One of the key parts of Spark is the SparkContext which, if you’ve done MapReduce, looks similar to the JobConf. The SparkContext holds all the required information about where to run the work, along with the application details shown in the Spark UI web page.
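
In a standalone application (the kind I’ll be using IntelliJ for) you have to create the context yourself. A minimal sketch, assuming a local run (the object and application names here are just placeholders):

import org.apache.spark.{SparkConf, SparkContext}

object MyFirstApp {
  def main(args: Array[String]) {
    // Where to run (locally, using every core) and what to call the app in the Spark UI
    val conf = new SparkConf()
      .setAppName("My First Spark App")
      .setMaster("local[*]")
    val sc = new SparkContext(conf)
    // ... use sc here, exactly as in the shell examples below ...
    sc.stop()
  }
}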

In the Spark shell you can use the ready-made SparkContext, sc:

scala> val sentence = "The quick brown fox jumps over the lazy dog"
scala> val words = sc.parallelize(sentence.split(" "))
scala> words.count() // should return 9
scala> words.filter(_.toLowerCase() == "the").count() // should return 2

All this is doing is creating a string, splitting it into words and creating a Spark RDD from it. We can use the action count() to find out how many words there are, and we can use filter() to create a new RDD with filtered results (in this case, filtered to the word ‘the’).
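
As a small taste of where this can go, the same RDD will feed the classic word count: map() pairs each word with a 1 and reduceByKey() sums the counts per word. A sketch carrying on from the shell session above (the output ordering may vary):

scala> val counts = words.map(w => (w.toLowerCase, 1)).reduceByKey(_ + _)
scala> counts.collect() // e.g. Array((the,2), (quick,1), (brown,1), ...)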


Update Wallpaper from Bing (OSX)

I’m not a huge fan of the Bing search engine; I’ve tried to use it, but I don’t like the format of the search results and I don’t think it’s particularly good at finding relevant results either.

I do like Bing wallpapers, though, and I use Bing Desktop on my Windows laptop to update my desktop to Bing’s daily wallpaper.

Now that I’ve moved to a Mac I still want to get the picture, but the application is Windows only, so the script below will do the job for you. I’ve set it to download to the user’s pictures folder, ~/Pictures/bing-wallpapers, using the current date for the filename.

import os
import json
import urllib2
from os.path import expanduser

# Ask Bing's image archive for today's picture metadata
response = urllib2.urlopen("http://www.bing.com/HPImageArchive.aspx?format=js&idx=0&n=1&mkt=en-US")
obj = json.load(response)

url = 'http://www.bing.com' + obj['images'][0]['urlbase'] + '_1920x1080.jpg'
name = obj['images'][0]['fullstartdate']

# Save into ~/Pictures/bing-wallpapers, creating the folder if it doesn't exist
folder = expanduser('~') + '/Pictures/bing-wallpapers/'
if not os.path.exists(folder):
    os.makedirs(folder)
path = folder + name + '.jpg'

print("Downloading %s to %s" % (url, path))
pic = urllib2.urlopen(url)
with open(path, 'wb') as f:  # binary mode for image data
    f.write(pic.read())
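
The script only downloads the picture. If you also want it applied straight away, an AppleScript one-liner along these lines should do it (I haven’t wired this into the script, the path is just a placeholder, and it assumes a single display):

osascript -e 'tell application "Finder" to set desktop picture to POSIX file "/path/to/wallpaper.jpg"'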

To run it on a schedule, set up a cron job with crontab -e and add the line below, which runs the script at 10am:

0 10 * * * python ~/Pictures/wallpaper.py


Creating environment variables from the command line (Windows)

I know that it is incredibly lazy and a non-problem, but I find it quite tedious in Windows 8 to go digging for the system environment variable GUI whenever I need to add or update something.

Generally I’m already in the command prompt, so I was keen to find a way to create them from there without having to go searching for the GUI each time.

Since Windows XP, setx has been available as an extra download, and more recently it’s been included in Windows out of the box. This is the command I wanted.

To create a persistent STORM_HOME environment variable, use the following command. The /M switch sets it as a system variable rather than the default user variable.

setx /M STORM_HOME d:\storm-latest
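
One thing to be aware of: setx writes the value away (to the registry) but doesn’t update the environment of the prompt you’re already sitting in. To use the variable immediately in the same session, set it the old-fashioned way as well:

set STORM_HOME=d:\storm-latest
echo %STORM_HOME%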

There are a number of other options; run setx /? to see them.


SFTP Connection Closed - password expired?

I’ve just had an interesting problem with an SFTP account that suddenly stopped working from a cron job. When the account was used directly from the bash prompt, it simply responded with Connection closed immediately.

Everything was set up correctly as far as I could see: authorized_keys and /etc/ssh/sshd_config both looked fine, but the account wouldn’t connect.
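
In hindsight, the quickest route to the cause would have been the server-side auth log, where sshd records why it rejected a login. The path varies by distro (/var/log/secure on Red Hat, /var/log/auth.log on Debian), e.g.:

tail -n 50 /var/log/secure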

As I don’t have the private key for the user that was connecting to the server, I created a new key pair and added the public key to the authorized_keys file in the user’s .ssh folder. Using the following command, I got a response that was no more helpful:

psftp logdrop@10.0.0.1 -i logdrop_pk.ppk

This gave the response:

FATAL: Received unexpected end-of-file from SFTP Server

After a couple more checks on the server and still no success, I decided to try

putty logdrop@10.0.0.1 -i logdrop_pk.ppk

This opened PuTTY, which connected but reported that the password for the user had expired and needed a new one. From here I was able to reset the password, then go onto the server and set

passwd -x 0 logdrop

to ensure that password aging was deactivated.
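
If you want to double-check the aging settings for an account, chage will list them:

chage -l logdrop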

So an interesting issue for a Friday afternoon.