Category Archives: General

OCaml Packaging: Down the Rabbit Hole

It probably doesn’t come as a surprise that I am a big fan of the OCaml language. I’m using it for most of my scientific research, as can be seen on GitHub. When I started out working with OCaml back in the old days, there was not much more than the OCaml compiler. This required you to write extensive and essentially write-only (as I like to call them) Makefiles like this one here.

Also, there was no good way of handling package dependencies, which required us at the time to use git submodules of some sort, without any real tooling support. That didn’t make distribution, dependency management or compilation any easier.

So when Martin Lange and I decided to give PGSolver a major overhaul recently, I decided to see whether time had caught up with OCaml. Boy, was I in for an adventure.

First, I wanted to figure out compilation. Instead of writing these extensive Makefiles, there is a tool called OCamlbuild that can do most of the compilation for you, in particular finding all required source files and compiling and including them in the right order. Now how do you install OCamlbuild? Looking at its GitHub page, it seems like you need another tool called OPAM in order to install OCamlbuild.

OPAM turns out to be a package manager for OCaml, so it is reasonable to install that.

Just using OCamlbuild seemed good enough for updating PGSolver, as I didn’t want to change too many things in one go. Since package managers often want your git repositories to be somehow in sync with the package registry, I didn’t wanna mess with the submodules just yet by turning them into packages.

However, there is one repository, Camldiets, that seems like the perfect test case for figuring out how to create and publish a package. It has only a single module and no dependencies of its own. Should be very straightforward to create a package for this, right?

Well, not so fast. There is also the so-called OASIS project that, according to its website, allows you to integrate a configure, build and install system into your OCaml projects. While this is vague enough that I’m not completely sure what it means exactly, it does sound awesome somehow, so what the heck.

So let’s set up an OASIS project from scratch, starting out with our Camldiets repository. After installing OASIS (via OPAM, of course), we let it generate a bunch of boilerplate files by calling oasis quickstart and answering a few questions. When it asks which plugins you want to install, you have the choice of META, StdFiles and DevFiles. Well, no idea, probably none, so let’s continue. It then asks whether – among other things – I want to create a library or a source repository. Well, no idea either; as long as my package is properly compiled and linked to a depending project, I don’t care. Let’s try library for now.

This creates an _oasis file that essentially contains all the information you just supplied during quickstart. Next, you’re supposed to call oasis setup, which generates a whole slew of files. Hard to say whether they should be gitignored or not, but I guess I’ll worry about that later. Then, you’re supposed to call ocaml setup.ml -configure, which seems to check all kinds of things. Moving on. Finally, we call ocaml setup.ml -build, which seems to use ocamlbuild to build our library. Whatever that means exactly.
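For the curious, the _oasis file for a one-module library like Camldiets ends up looking roughly like the following minimal sketch (the field values here are illustrative placeholders, not the exact contents of the real file):

    OASISFormat: 0.4
    Name:        camldiets
    Version:     0.2
    Synopsis:    Sets over ordered types based on discrete interval encoding trees
    Authors:     Jane Doe
    License:     BSD-3-clause
    BuildTools:  ocamlbuild

    Library "camldiets"
      Path:     src
      Modules:  Camldiets

Everything else (setup.ml, _tags, myocamlbuild.ml and friends) is derived from this single file.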

By now we have so many generated files in the repository that I’m starting to worry. Some of them must be ignored, I’m sure, particularly the _build folder that is being generated by ocamlbuild. I’m now including the “official” gitignore file for OCaml.

So what did I want to do again? Package everything up in OPAM, right. Does OPAM accept my OASIS-enriched repository? Unfortunately it doesn’t; it requires its own set of proprietary metadata. Great. Luckily, there is another tool called oasis2opam that is supposed to convert your freshly created OASIS project into an OPAM-compatible packaging. So we install oasis2opam via OPAM and run it via oasis2opam --local. It immediately complains about our OASIS configuration leaving much to be desired, so we shamefully add some more key-value pairs to it and run it again.

This results in a syntax error, and – as is common with OCaml – the error message is not particularly useful: it just tells us that there is a syntax error somewhere in the OASIS file. Sigh. Once this is cleared up after some googling, the OPAM files are generated.
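The generated opam file is another small piece of metadata sitting next to the _oasis one. For a package like this, it comes out roughly as follows (a sketch of the opam 1.x format with placeholder values; oasis2opam derives the details from _oasis):

    opam-version: "1.2"
    name: "camldiets"
    version: "0.2"
    maintainer: "jane.doe@example.com"
    build: [
      ["ocaml" "setup.ml" "-configure" "--prefix" prefix]
      ["ocaml" "setup.ml" "-build"]
    ]
    install: ["ocaml" "setup.ml" "-install"]
    remove: ["ocamlfind" "remove" "camldiets"]
    depends: [ "ocamlfind" {build} ]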

Let’s see if this at least works locally now. According to the docs, we can do this by calling opam pin add camldiets . -n. Now we want to include the package somewhere else. We change into an empty folder and try to install it via opam install camldiets --verbose, but not so fast: Cannot find file ‘src/META’ for findlib library Camldiets.

What? But why? Apparently, there is another tool called findlib that needs its own metadata describing the repository. Christ. Down the rabbit hole once again. Okay. So how do I get the META file? After doing some research, it seems like adding META to the Plugins section of the OASIS file does the trick. We go through the chain of generation commands again and try to install the package one more time, and now it seems to work. We can even include it in a local test project. Success!
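For reference, adding META to the Plugins section of the _oasis file (Plugins: META (0.4)) makes OASIS generate the findlib META file itself. The generated file is tiny, roughly like this sketch with illustrative values:

    version = "0.2"
    description = "Sets based on discrete interval encoding trees"
    archive(byte) = "camldiets.cma"
    archive(native) = "camldiets.cmxa"
    requires = ""

findlib uses it to locate the compiled archives and to resolve the dependencies of packages that build on ours.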

Now how do we get this package into OPAM? Obviously, we need another tool for this, opam-publish, that we install via OPAM. How deep can this hole be? Publishing repositories via OPAM seems to require a couple of steps for sure.

First, tag the repository on GitHub. Then call opam-publish prepare https://github.com/tcsprojects/camldiets/archive/v0.2.tar.gz, which generates an awkward subfolder called Camldiets.0.2. You’re then supposed to submit this folder by calling opam publish submit ./Camldiets.0.2, which in turn asks you for your full GitHub credentials. Yikes, that’s giving me the chills! According to most sources I was able to find online, this call usually fails anyway. And sure enough, it failed. Would have been too easy, wouldn’t it?

The manual way is to fork the full OPAM registry repository, add the generated folder to it and issue a pull request. Jeez. Okay. Once this is done, a variety of continuous integration tests kick in, and you’re left waiting for them to finish and for somebody from OPAM to have a look at it. Fingers crossed that it can be pulled in as is. I’ll update this post once the pull request has been processed.

Publishing and packaging things with OCaml seems much less straightforward than with most build-and-packaging systems for other languages, but I suppose that’s just the way it is. After all, this is definitely an improvement over the fugly Makefiles and submodules we were using a decade ago. Hopefully, updating packages will be less of a fuss than creating a new one from scratch.

Update: that pretty much worked.

Turning multi-factor authentication into multi-people authentication using a Slackbot

Once upon a time, I had a blog… oh, wait, I still have one.

I’m never really finding the time to write. Or, as Calvin put it so eloquently:

There is never enough time to do all the nothing you want.

So let’s get right to it.

Imagine you’re working with an engineering team and you need to distribute access for manipulating your deployment environment. For security reasons, you don’t want to give full access to everybody who needs some access.

But even limiting the access, e.g. by using IAM roles on AWS, doesn’t fully do the trick: in order for your team to do anything meaningful, they need reasonably strong permissions, and hence they are able to do reasonable damage.

What if we could at least limit the time they have access to the system? What if they at least need to ask every time they wanna access the system?

Modern authentication systems usually do not support this, but they do support something very similar: two-factor authentication. The idea is that, in order to sign into an account, you need two factors: usually the “traditional” account details like username and password, plus a so-called one-time token that is generated by another device such as a smartphone.

Let’s quickly look at how these one-time tokens are generated. In most cases, they are time-based one-time passwords. They are based on a unique string that is generated when the account is set up, usually presented as a QR code that is then scanned and saved by an application on a smartphone. The application then computes, given the unique string s and the current time t (truncated to intervals of typically 30 seconds), a deterministic function f(s,t) that usually returns a 6-digit number.

This 6-digit number then has to be provided alongside the traditional account details to sign in. Since the number is only valid for a minute or so, it cannot be reused later. Clever.
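To make this concrete, here is a minimal OCaml sketch of such a token function, in the spirit of RFC 6238 (this assumes the Cryptokit library for HMAC-SHA1 and is purely illustrative, not the bot’s actual code):

    (* Compute a 6-digit time-based one-time password from [secret],
       the raw (already base32-decoded) unique string.
       Requires the cryptokit and unix libraries. *)
    let totp ?(period = 30) secret =
      (* the time step: current Unix time, truncated to [period]-second intervals *)
      let counter = Int64.of_float (Unix.time () /. float_of_int period) in
      (* pack the counter into an 8-byte big-endian message *)
      let msg = Bytes.create 8 in
      for i = 0 to 7 do
        let byte =
          Int64.to_int (Int64.shift_right_logical counter (8 * (7 - i))) land 0xff
        in
        Bytes.set msg i (Char.chr byte)
      done;
      let mac =
        Cryptokit.hash_string (Cryptokit.MAC.hmac_sha1 secret) (Bytes.to_string msg)
      in
      (* dynamic truncation (RFC 4226): read 4 bytes at an offset given by the
         low nibble of the last MAC byte, mask the sign bit, keep 6 digits *)
      let offset = Char.code mac.[19] land 0x0f in
      let bin =
        ((Char.code mac.[offset] land 0x7f) lsl 24)
        lor (Char.code mac.[offset + 1] lsl 16)
        lor (Char.code mac.[offset + 2] lsl 8)
        lor Char.code mac.[offset + 3]
      in
      Printf.sprintf "%06d" (bin mod 1_000_000)

Both the phone app and the server run essentially this computation; they agree on the token because they share the secret and, roughly, the clock.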

So how can we turn this multi-factor authentication into multi-people authentication? Well, instead of storing the unique string on the team member’s device, we store it (symmetrically encrypted, if we want to) in a database. Now when team members want to sign in, they provide the traditional account details, and when asked for a one-time token, they query one of the administrators, who computes the time-based one-time password for them. Boom.

Now while this sounds cool in theory, how can we make sure that it is not an organizational pain in practice? By turning it into a Slack bot. Since your team is already on Slack, your team members can simply ask the Slack bot to generate a one-time token for them, and the Slack bot in turn asks the administrators whether to grant access to the token generation function for that specific account.

Well, here it is, open-sourced on GitHub. Let me know what you think.

And apologies for being “offline” on this blog for so long.

Right time, right phase

There are different activities that we do throughout the day, every day, that need to get done. Some of them we enjoy more, some of them we enjoy less.

The ones we enjoy less can sometimes turn into a vicious cycle of procrastination and guilt, ultimately resulting in spending even more time in a rather unproductive and unsatisfying manner. We have all been there, and there is no way to avoid it.

One of the things that I enjoy the least is going through emails and answering them. The same goes for phone calls. That makes me sound like a sociopath, but that’s not it. Emails, at some point, become a pain for everybody.

So how can we improve on our email-reply-to-procrastination ratio? I’m sure there are behavioural change systems that finally redirect your procrastination efforts to cleaning your kitchen in chapter 10, but I have found that for me, the ratio really depends on the time of the day. I am surprisingly efficient in going through my emails in the morning (where morning is defined as shortly after getting up) while I tend to suck at it the later it gets.

And this observation holds true for many other things as well. My takeaway: observe at what time of day you are your best self for the different things you need to do, and schedule your day around that.

BetaEdit

Victor Lingenthal and I have just launched a new toy project called BetaEdit that allows you to rapidly test web software online. It is based on our JavaScript framework BetaJS (for both client and server) and serves as a running example for it as well. While the current version of the software is really only a starting point, stay tuned for major updates to come soon. We’d love to hear your thoughts at feedback@betaedit.com.

All new announcements will be made on our BetaEdit blog.

People should not be afraid of their governments. But actually they couldn’t care less.

After all has been said and done over and over, it still took me a fair amount of time to realize what bothered me the most about the NSA affair. Have we been really surprised by what the NSA is doing? Not so much. Is Edward Snowden a hero? Probably, but hero is too much of a militaristic term for my taste. Should we be okay with being spied on? Of course not. Isn’t the data’s content much more important than the meta-data that the NSA is tracking? Not at all. But those who have nothing to hide have nothing to fear? Quite on the contrary.

The idea of spying massively on people is not new. Most classic dystopian stories like 1984 center around an omnipresent, all-seeing-eye kind of state. Even in our own recent history we had governments, like that of the German Democratic Republic in East Germany, that massively spied on their people, with severe consequences for major parts of the population. So not much about what we’re seeing today is a genuinely new phenomenon.

The graphic novel V for Vendetta also draws the picture of a dystopian state, and the main character, who tries to liberate the oppressed people, states at some point one of the most iconic sentences about dystopian states: “People should not be afraid of their governments. Governments should be afraid of their people.”

I always liked that sentence, and I still do. Now interestingly, while it should apply to the current situation, it doesn’t. It’s actually the other way around, even though the government is spying on the people.

Edward Snowden himself closed his famous interview with the Guardian with the following words, speaking about what he fears will come out of all of this:

The greatest fear that I have regarding the outcome for America of these disclosures is that nothing will change. People will see in the media all of these disclosures. They’ll know the lengths that the government is going to grant themselves powers unilaterally to create greater control over American society and global society. But they won’t be willing to take the risks necessary to stand up and fight to change things to force their representatives to actually take a stand in their interests.

But people are not only unwilling to stand up; they, by and large, couldn’t care less. To me, that’s the most interesting thing about the whole affair. And even I feel fairly detached about the matter. But wasn’t I idealistic once? Didn’t I swear to myself to be one of the first to stand up against any form of state-sanctioned oppression or state-sanctioned undermining of civil rights? And yet, here I am, shrugging my shoulders. Is this what getting old feels like? You give up on your ideals? Maybe. But even if so, sadly enough, this can only be half the truth.

Because I’m not the only one. There are millions of young people who must have had the same thoughts that I had at that age. But apparently they don’t care either. Is it the openness of the Internet that has transformed our innermost perception of privacy and of the importance of privacy? Do we first need to see with our own eyes how our governments might use our own data against us?

I am not trying to be apologetic here. I am just surprised by how little people care, including me.

GitHub

This is just a quick announcement that we have migrated all our projects like PGSolver, MLSolver, FADecider and so on to GitHub. This allows you to get the newest versions of our software easily and to contribute your own code to our codebase.

If you are an avid user of Subversion, you might want to read this tutorial, which explains Git from a Subversion point of view.

How are your genes?

Remember back in 2000 when the human genome was sequenced for the first time? Well, guess what: you can now get your own genome sequenced in three weeks. That might be a little exaggerated, but there are services that sequence parts of your DNA and test it for known diseases, known risks and other facts that you might want to know about. Or not. All it takes is a little bit of your spit.

Most critics say that you might not want to know about all your risks, particularly not about those that you can’t really manage. You’re at risk of having an aneurysm in your brain, but the preventive surgery is even riskier than just waiting and wishing? Would have been better if you hadn’t known about it in the first place, right?

Well, I can definitely see the critics’ point, but personally, I do want to know about all that. Even about the stuff that I can’t change. Call me a control-freak or whatever. But before you’re going to use such services, ask yourself if you really want to know that much about your biological setup.

Note that you should also take the results with a good grain of salt. They are based on studies and correlation statistics. Many of the studies might be biased, flawed or just too small to make really decisive projections about your health. The service I was using always cited the study or paper it was referring to, which helps you assess the legitimacy of the claimed results. And you should always remember that correlation does not imply causation. I can’t repeat that sentence often enough.

I used the service 23andMe, and it is quite astonishing what they found out about my genome.

They start by telling you stuff that you already know. I guess that should make you feel comfortable that they’re really using your DNA and not just making wild random guesses. So according to them, I have brown eyes, slightly curly hair and the ability to taste bitter flavors. Yep, three out of three points. Thumbs up.

They then tell you stuff that you might have always suspected but didn’t know for sure: I am likely lactose tolerant. Sounds about right, but I still don’t like cheese. And then: “Subjects who drank coffee consumed a slightly higher amount of coffee per day, on average.” Right you are again! (Although “slightly” is a very humble word to describe the amounts of coffee I drink per day.)

Then there is also the bad stuff: “If a smoker, likely to smoke more”. I suppose I’m lucky that I never started smoking. Let’s get back quickly to the good stuff: a strongly decreased likelihood of developing Alzheimer’s. I guess if I hadn’t forgotten what that was, I could be happy about it… They also tell you an awful lot about your blood groups, which might be helpful in case you need to get or give a blood transfusion in the future.

There are a couple of surveys that they encourage you to take once you’ve signed up for the service. They use their users’ answers to find new correlations with their users’ genomes, and some of their findings are already incorporated into the service. They found some genes, for instance, that are correlated with the photic sneeze reflex. They claim that I don’t have it, and they are right again.

Although hardcore quantified-self services like 23andMe are definitely not for everyone, I highly enjoyed using it. So how are your genes?


Launch of Ziggeo

We have just launched Ziggeo in open beta, a service that allows interviewers to screen candidates by video before actually meeting them, which should save interviewers a lot of time.

You can use the service for finding roommates, job candidates or even dates (if you’re so adventurous!). Check out our blog to get more ideas about what Ziggeo could be used for.

Since the service is in beta, we would love to hear your comments. Just go there, sign up (it’s free!), and try it out. You can either send us an email or use the Feedback button to let us know what you think.

Modern Living: Information Overload

We live in an era of abundant information; in particular, it is real-time information that competes for our attention. And our peers obviously know that all this information is immediately available to us, so we are supposed to stay informed. It has become common sense to react quickly to new information and to be aware of it.

There are several different kinds of digital information that we are flooded with on an hourly basis.

First, there are social networks like Facebook or Google+ that keep you up to date with your friends’ interactions, activities and thoughts. You might feel like you’re missing out on some important information, so you find yourself checking your wall every now and then. However, a lot of that time is spent reading through spam postings. Then there is the real-time service Twitter that feeds you the newest bits of information from people you follow. Again, you have to spend an awful amount of time reading through spam tweets.

Second, there are the classical news portals like the New York Times that provide you with serious news about politics, the economy, sports, arts, culture and all that. Since you’re supposed to stay informed, you regularly check the news sites throughout the day to see if something important has happened. Most of the time, however, you spend time reading some lurid or funny articles that you wouldn’t define as being of any importance to you. In fact, you would be happy if you hadn’t spent time reading them, but since you stumbled upon them, you felt the immediate urge to consume them.

Third, there are special-interest blogs and sites that feed you information about particular subjects. Usually, only a handful of articles are really worth a read from your point of view, but you still have to check all posts to see what deserves your attention. And again, you get distracted by posts of no particular importance to you.

There are a couple of problems related to the way we digest digital information today. First and foremost, there is the problem of scanning: most of the incoming information bits are not relevant to us. But we have to commit our attention to every single item – at least for a short amount of time – to make the decision whether it is of any relevance to us. In other words, we are filtering through the content.

Second, there is the problem of content aggregation: information bits from different sources that belong together are not grouped together. This entails the problems of invalidation, duplication and subsumption. We want to read information bits that belong to one context in one go. We do not want to read information that is no longer valid or that has been superseded by updated information. We do not want to read the same content twice. We do not want to read an article that is contained in another one (because then we read content twice again).

Third, there is the problem of temporal relevance. On the one hand, we need to decide whether an information piece needs our immediate attention or whether we can consume it at a later stage. On the other hand, when we are about to consume an information piece, we need to decide whether it is still relevant. If it’s not, there is no point in reading it, even if we decided to save it for later (but intentionally not that late) in the first place.

There are a couple of software solutions that try to solve the problem. Google News, for instance, aggregates news and presents them in a uniform way. At least for news, that solves the problem of duplication and aggregation. However, you never know whether the aggregation shows you real duplicates and you don’t really know whether one article is subsumed by another. There is also no way of invalidating news – there is only an implicit invalidation given by temporal distance and number of readers per temporal distance, but that isn’t necessarily the right way of invalidating news.

Other software systems like Flipboard or Prismatic ask you for your interests and then try to compile a personalized dashboard of information pieces published by news sites, blogs and your social networks that you might enjoy. However, the algorithms are not transparent to their users – sure, there is a lot of statistics going on, and association rule mining and all that, but none of it is transparent to the layman who just wants to consume information relevant to him – and they are therefore not successful in solving the problem of filtering accurately. They might filter out information that would have been relevant to you. Even more problematically, they filter out information types that are completely new to you, about which you couldn’t even say whether you’re interested in them or not. In other words, the discovery of new information is not possible with such systems in a satisfactory way.

The problem of temporal relevance is not handled in any useful way by any of these applications.

To my mind, a combination of both software and people could help to ease the problem of information overload. The software would be more of a framework like Wikipedia that allows people to commit their capacity to solve the problem of collective information overload.

First, the problem of filtering could be solved in several ways by involving people. Instead of letting software filter information streams for you, you could subscribe to filtered and aggregated streams that are published by people you trust. There could be a filtered and aggregated stream by Guy Kawasaki that features the important articles from the important tech blogs about the relevant new valley hot shots. There could be a filtered and aggregated stream by Kofi Annan on important articles on international policy. And so on. I would trust experts much more to filter information for me than any algorithm.

Second, the problem of content aggregation and the issues it entails could also be solved by people. They could mark posts as invalidated, as duplicates, as subsumed by other posts, and so on. They could even digest the most important information contained in a lengthy article. In many cases, I would rather subscribe to the digested version of economic news that just outlines an article’s facts about a company.

Third, the problem of temporal relevance can also easily be decided by the people. If there is an article about movie history, there is obviously almost no temporal relevance at all. If there is a feature on the upcoming soccer game, it is only relevant as long as the soccer game hasn’t happened.

All this can be decided by the people. A software system would give people the capabilities to act on information in such ways and would promote a peer-reviewed process like Wikipedia’s to prevent bias and vandalism. What would really happen is that a small fraction of people devotes their time to filtering, tagging, digesting and selecting information so that a large fraction of people can save time – because today, all these things are done in parallel by everybody. And that seems like a waste of time from a society’s point of view.

Modern Living: It’s about time

There is no doubt that we are lacking time. Even our youngest students are cutting down their sleep to the viable minimum [NY Mag]. The market for personal productivity tools is a booming Web 2.0 business. Self-help and self-organization books have ranked among the top 10 best-selling books for a couple of years straight now [Amazon]. Burnout syndrome, a form of depression that occurs when people feel persistently unable to fulfill all the demands that life puts on them, has become one of the most frequently diagnosed psychological illnesses.

Paradoxically, on the other hand, we are increasingly procrastinating. There are more people playing computer games, for longer, than ever before [PC Gamer]. Despite all the prophets predicting the death of television, it has never been as popular as it is today [LA Times]. People spend hours repeatedly checking their Facebook walls and all the links and videos that other people share with them. So it seems like there should be enough time after all. What is going on here?

First, I think there are two main reasons contributing to all kinds of procrastination. Procrastination gives us short-term incentives and almost instant gratification, be it of a social, entertainment or achievement kind. From an evolutionary point of view, our brain is geared towards short-term successes rather than going the extra mile for the long-term perspective. The other main reason might be related to fear: it seems favorable to postpone a stressful situation for a little while, when the situation might give you negative feedback or the impression that you need to apply yourself even more in order to succeed.

Second, the information overload of our contemporary societies is eating up a lot of time. News is broadcast almost instantaneously on many different channels, and you are supposed to stay informed. This also holds true for our fields of expertise. Tech people, for instance, have to read a whole bunch of tech blogs and articles, economists have to check every bit of news about macro- and microeconomic developments, scientists have to check all the new publications in their fields (and the publication rate is steadily increasing), and so on. Then there are your social feeds, hobbies and the brands that you love, such as favorite musicians, authors, directors, actors etc. Although it’s a blessing that all this information is available to us on the internet, it also takes a lot of time to digest all of it. And yet we often get the feeling that we’re under-informed.

Third, there are many more things to do compared to earlier times (I am mostly referring to the first and second world here), and all these things are competing for your attention. Both the digital revolution and the rural exodus contribute to that fact. As a consequence, you always feel like you’re missing out on something. This particularly holds true for your professional life. Now that we finally have the freedom to do almost anything (I know, I know, I am oversimplifying here), we are scared that the grass might be greener on the other side. We tend to think that we are making a mistake by sticking to a certain profession. There is also a number of studies concluding that an increased number of choices actually makes people more desperate than preselecting choices for them would [TED].

Fourth, it seems like we are trying to pack increasingly more duties, responsibilities and training into young people, while we are not leveraging the fact that people are getting older and older. By stretching the load to higher ages, we could disburden the young and middle-aged, involve the elderly and at the same time cope with the problem of paying their benefits. I think that in general, the phenomenon of delayed gratification is a trend in our cultures that might seem rational at first sight but overall is not beneficial to individuals’ mental well-being.

Fifth – and I think this is one of the most important reasons, at least in your professional life – since the beginning of globalization and free markets, we, as people, have increasingly been competing with each other all around the world. By simple game theory, it follows that if my competitor is working harder and longer, I should also boost my workload, unless I am okay with missing out on a job, a patent, an invention, etc. This trend will clearly continue, assuming, of course, that political systems won’t change in a substantial way. And this trend is obviously independent of technological advances.

I neither want to propose any solutions nor condemn any of the observed phenomena – I am struggling myself on a day-by-day basis with the feeling of not having enough time.

To give you my two cents, I think it helps to assume that you only have one life. I do not claim that you only have one life, and I do not want to digress into a discussion on whether religion, atheism or agnosticism makes sense or not – I just want to stress that assuming you only have one life really helps you make the right decisions. And if you are lucky enough to have the opportunity to do different things, don’t do something that you don’t like. You should try stuff to see whether you like it, but if you don’t, do not stick with it for whatever reason. It also shows you that indefinitely delayed gratification is not a good strategy to follow. I will write another blog post on that topic.

To my mind, our present-day societies are still fixed on anachronistic patterns of educating people, putting people to work and retiring older people. I am fairly sure that this will gradually, but in the end substantially, change within the next decades. I will cover this in another blog post.

As for the issue of information overload, I am very confident that technology will find a solution to that problem within the next years. I will write another blog post on that topic.

Finally, I don’t think that there is any viable solution to the problem of global competition. There are interesting concepts like basic income guarantee and so on, but even they don’t change the general pattern of being better off by doing more than your competitor. And I think that in general, liberal democratic systems have been proven to be quite motivating and fruitful for moving forward.

Speaking of time, do we all perceive time at the same rate? And if so, is this perceived rate of time an inherently human thing or do we just learn to synchronize our perception of time with our peers while we’re growing up?

And as a follow-up question: is this question rather a philosophical or a neurobiological one?