Slides From Database Talk at Drexel

I gave a lecture at Drexel this week on non-relational databases and “big data”. The slides are up. They are all new since last time; the world of NoSQL and Big Data has changed a whole lot in 2 years :)

Tuning JVM for a VM - Lessons Learned, Directly From VMware

Ben Corrie from VMware gave a talk on March 15, 2012 at the San Francisco Java Usergroup on tuning the JVM for a virtual machine. The event was filled to capacity, but fortunately you can find the video, slides, and a more detailed description of the talk below.

The number of Java workloads running on virtualized infrastructure has been increasing exponentially over the last few years. Advancements in processors and hypervisor technology now make virtualizing Java a compelling proposition. However, there are still best practice provisos and considerations, particularly in the area of JVM memory management.

This talk presents the innovations, practical insights, and lessons learned over the last year by a senior engineer from VMware who recently developed a Java “ballooning” solution called Elastic Memory for Java (EM4J).


Ben’s Slides:

Static Analysis of an Unknown Compression Format

I really enjoy reverse engineering stuff. I also really like playing video games. Sometimes, I get bored and start wondering how the video game I’m playing works internally. Last year, this led me to analyze Tales of Symphonia 2, a Wii RPG. This game uses a custom virtual machine with some really interesting features (including cooperative multithreading) to describe cutscenes, maps, etc. I became very interested in how this virtual machine worked, and wrote a (mostly) complete implementation of it in C++.
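Cooperative multithreading means each script thread runs until it voluntarily hands control back, rather than being preempted. A minimal sketch of the idea, using Python generators as "threads" and a round-robin scheduler; this is a generic illustration, not how the Tales of Symphonia VM actually works, and the `script` and `run_round_robin` names are hypothetical:

```python
from collections import deque
from typing import Deque, Generator, List

# Each "script thread" is a generator; `yield` marks the point where it
# voluntarily hands control back to the scheduler (cooperative, not preemptive).
ScriptThread = Generator[None, None, None]


def script(name: str, steps: int, log: List[str]) -> ScriptThread:
    """Hypothetical script that runs `steps` instructions, yielding after each."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # cooperative yield point


def run_round_robin(threads: List[ScriptThread]) -> None:
    """Resume each thread in turn until all of them have finished."""
    ready: Deque[ScriptThread] = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)          # run until the thread's next yield
            ready.append(thread)  # still alive: put it back in the queue
        except StopIteration:
            pass                  # thread finished; drop it


log: List[str] = []
run_round_robin([script("camera", 2, log), script("npc", 3, log)])
print(log)  # ['camera:0', 'npc:0', 'camera:1', 'npc:1', 'npc:2']
```

The interleaving only happens at `yield` points, which is what makes this cooperative: a script that never yields would starve every other one.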

However, I recently discovered that some other games also use this same virtual machine for their own scripts. This intrigued me, so I started analyzing scripts for these games and trying to find all the improvements between versions of the virtual machine. Three days ago, I started working on Tales of Vesperia (PS3) scripts, which seem to be compiled in the same format as the one I analyzed before. Unfortunately, every single file in the scripts directory seemed to be compressed with an unknown compression format, identified by the magic number “TLZC”.
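Spotting a format by its magic number is usually the first step in this kind of analysis. A minimal sketch, assuming only that the 4 ASCII bytes “TLZC” sit at offset 0 of each file; nothing about the rest of the header is assumed, and `find_tlzc_files` is a hypothetical helper:

```python
from pathlib import Path
from typing import List, Union

TLZC_MAGIC = b"TLZC"  # 4-byte magic number quoted in the post


def has_tlzc_magic(data: bytes) -> bool:
    """Return True if a buffer starts with the TLZC magic number."""
    return data[:4] == TLZC_MAGIC


def find_tlzc_files(directory: Union[str, Path]) -> List[str]:
    """Hypothetical helper: names of files in `directory` whose first 4 bytes are b"TLZC"."""
    matches: List[str] = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            with path.open("rb") as f:
                if has_tlzc_magic(f.read(4)):
                    matches.append(path.name)
    return matches
```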

Very cool reveng post.

Secret HotSpot Option Improving GC Pauses on Large Heaps

Gotta try this soon and see if it helps reduce GC pause times for my apps.

Chrome Addons Hacking – Bye Bye AdBlock Filters!

Continuing the Chrome extension hacking (see parts 1 and 2), this time I’d like to draw your attention to the oh-so-popular AdBlock extension. It has over a million users, is actively maintained, and is a great piece of software (heck, even I use it!). However, due to how Chrome extensions work in general, it is still relatively easy to bypass it and display some ads. Let me describe two distinct vulnerabilities I’ve discovered. Both are exploitable in the newest version, 2.5.22.

Govt. Agencies, Colleges Demand Applicants’ Facebook Passwords

In Maryland, job seekers applying to the state’s Department of Corrections have been asked during interviews to log into their accounts and let an interviewer watch while the potential employee clicks through wall posts, friends, photos and anything else that might be found behind the privacy wall.

Won’t be working for any of these places. *sigh*

24/192 Music Downloads Are Very Silly Indeed

Here’s an experiment anyone can do: Go get your Apple IR remote. The LED emits at 980nm, or about 306THz, in the near-IR spectrum. Relatively speaking, this is just outside of the visible range. Take the remote into the basement, or the darkest room in your house, in the middle of the night, with the lights off. Let your eyes adjust to the blackness.

Above: Apple IR remote photographed using a digital camera. Though the emitter is quite bright and the frequency emitted is not far past the red portion of the visible spectrum, it’s completely invisible to the eye.

Can you see the LED flash when you press a button [4]? No? Not even the tiniest amount? Try a few other IR remotes; most emit even closer to the visible band, around 310-320THz. You won’t be able to see them either, even though they would be blindingly, painfully bright if they were in the visible spectrum.

Above top: Frequency of an Apple IR remote emitter relative to the full visible spectrum.

Above bottom: Nyquist frequency of a 192kHz sample rate audio file relative to the full audible spectrum.


These near-IR LEDs emit at about 20% beyond the visible frequency limit. 192kHz audio extends to 96kHz, nearly 400% beyond the audible limit. Lest I be accused of comparing apples and oranges, auditory and visual perception drop off similarly toward the edges.
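The numbers above come from two simple formulas: frequency = c / wavelength for the LED, and Nyquist frequency = sample rate / 2 for the audio file. A quick check in Python, assuming the conventional 20kHz upper limit of hearing:

```python
SPEED_OF_LIGHT = 299_792_458  # metres per second
AUDIBLE_LIMIT_HZ = 20_000     # assumed conventional upper limit of hearing


def wavelength_to_thz(wavelength_nm: float) -> float:
    """Convert a wavelength in nanometres to a frequency in terahertz (f = c / wavelength)."""
    return SPEED_OF_LIGHT / (wavelength_nm * 1e-9) / 1e12


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent: half the rate."""
    return sample_rate_hz / 2


ir_thz = wavelength_to_thz(980)      # Apple IR remote LED
audio_nyquist = nyquist_hz(192_000)  # 192kHz sample rate

print(f"980nm is about {ir_thz:.0f}THz")  # 980nm is about 306THz
print(f"Nyquist = {audio_nyquist:.0f}Hz, "
      f"{audio_nyquist / AUDIBLE_LIMIT_HZ:.1f}x the audible limit")  # 96000Hz, 4.8x
```

96kHz is 4.8 times 20kHz, i.e. it reaches roughly 380% past the audible limit, versus the near-IR LED sitting only about 20% outside the visible band.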

SML – Scalable Machine Learning Course

SML: Scalable Machine Learning

Practical information

  • Volume: 3 hours per week (3 credits)

  • Time: Tuesday, 4-7pm (3 lectures in one block)

  • Location: 306 SODA

  • Instructor: Alex Smola (available 1-3pm Tuesdays in Evans 418)

  • TA: Dapo Omidiran

  • Grading Policy: Assignments (40%), Project (50%), Midterm project review (10%), Scribe (Bonus 5%)

  • Piazza discussion board


  • 2012-02-22 - Slides are online

  • 2012-02-22 - New assignments are live

  • 2012-02-22 - Videos for SVM (first three sets) are uploaded

  • 2012-02-22 - Video for Optimization is complete

  • 2012-02-05 - Slides for Streams and Optimization are uploaded

  • 2012-02-05 - Videos now have sound enabled

  • 2012-01-25 - Problem set 1 is uploaded

  • 2012-01-25 - Slides and videos are uploaded

  • 2012-01-25 - Project ideas and datasets are uploaded

  • 2012-01-19 - The graphical models tab has links to video lectures and tutorials on the subject (this is mainly for students who didn’t get to attend the class by Mike Jordan and Martin Wainwright).

  • 2012-01-18 - The systems slides are available now (follow the systems link)

  • 2012-01-18 - Updated project guidelines


Scalable Machine Learning occurs when Statistics, Systems, Machine Learning and Data Mining are combined into flexible, often nonparametric, and scalable techniques for analyzing large amounts of data at internet scale. This class aims to teach methods which are going to power the next generation of internet applications.

The class will cover systems and processing paradigms, an introduction to statistical analysis, algorithms for data streams, generalized linear methods (logistic models, support vector machines, etc.), large scale convex optimization, kernels, graphical models and inference algorithms such as sampling and variational approximations, and explore/exploit mechanisms. Applications include social recommender systems, real time analytics, spam filtering, topic models, and document analysis.

Prerequisites

  • Basic probability and statistics. Having attended a machine learning class would be a big plus but is not absolutely required. In particular, some knowledge of kernels and graphical models would be useful.

  • Basic linear algebra (matrices, vectors, eigenvalues). Knowing functional analysis would be great but not required.

  • Ability to write code that exceeds ‘Hello World’. Preferably beyond Matlab or R.

  • Basic knowledge of optimization. Having attended a convex optimization class would be great.

Page generated 2012-02-22 21:44:22 PST, by jemdoc.

Looks like some really awesome content in here.

NoSQL Data Modeling Techniques

NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. This aspect of NoSQL is well studied both in practice and in theory, because specific non-functional properties are often the main justification for NoSQL usage, and fundamental results on distributed systems, like the CAP theorem, apply well to NoSQL systems. At the same time, NoSQL data modeling is not so well studied and lacks the systematic theory found in relational databases. In this article I provide a short comparison of NoSQL system families from the data modeling point of view and digest several common modeling techniques.

To explore data modeling techniques, we have to start with a more or less systematic view of NoSQL data models that preferably reveals trends and interconnections. The following figure depicts an imaginary “evolution” of the major NoSQL system families, namely Key-Value stores, BigTable-style databases, Document databases, Full Text Search Engines, and Graph databases:
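To make the difference between the first three families concrete, here is a rough sketch of one hypothetical “user” record shaped the way each family tends to store it, using plain Python dicts as stand-ins. It illustrates the data models only; it is not any particular database’s API, and the record and field names are made up:

```python
from typing import Dict, List, Union

# Hypothetical "user" entity, used only for illustration.

# Key-Value store: an opaque value addressed by a single key.
# The store does not look inside the value.
key_value: Dict[str, bytes] = {
    "user:42": b'{"name": "Ada", "city": "London"}',
}

# BigTable-style: row key -> column family -> column qualifier -> value.
# Column names can themselves carry data (here: who the user follows).
bigtable_row: Dict[str, Dict[str, Dict[str, str]]] = {
    "user:42": {
        "profile": {"name": "Ada", "city": "London"},
        "follows": {"user:7": "", "user:9": ""},
    },
}

# Document database: a self-contained nested document whose fields
# the database can index and query.
document: Dict[str, Union[int, str, Dict[str, str], List[int]]] = {
    "_id": 42,
    "name": "Ada",
    "address": {"city": "London"},
    "follows": [7, 9],
}
```

Moving from left to right, the store understands more of the value’s structure, which is roughly the trend the “evolution” is meant to show.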

Node.js Is Bad Ass Rock Star Tech

See: my earlier post re: Node.js ;-)