I just finished reading Neal Stephenson's Cryptonomicon. It's a daring story with lots of characters and parallel plots, all revolving around cryptography, gold treasure hunting and second world war action. This in itself is not really compelling. What is, at least for me, is that this is a book by a geek, for geeks and about geeks. It's written by a geek because Stephenson has a way of playing language kung-fu that seems to scream "look at my wits/IQ" and goes off on weird tangents all the time. It appeals to the inherent multi-tasking addiction in your average nerd: you can ponder some off-hand remark or fancy idea thrown at you in a subordinate clause while continuing through the main thread. It's written for geeks - who else would want to read detailed descriptions of cryptosystems as part of a novel? But that's actually beside the point. The real reason is that geekdom is characterized so well in this book that you could actually use it as a definition. The leading characters are all more or less socially inept, almost to the point of autism or Asperger's, and at the same time math geniuses living in their own alternate realities. I laughed out loud at a scene at a dinner table where Randy, chronically forgetting all the unimportant details about the people around him, like - say - names, blurts into the discussion after enduring silently for a while and turns it into a heated debate. The sarcastic remark is that "normal" people flock into "consensus groups" soon after, because a real argument is too strenuous. And it's about geeks... well, just take the title for a hint. I really enjoyed it.
I wasn't in the mood for "real" work this afternoon so I decided to bug my girlfriend instead. She has developed a serious Sudoku addiction lately and solves them by the dozen. I thought it was a beautiful problem for a genetic algorithm to solve, so I gave it a shot. The whole thing took little over 3 hours to code, starting from scratch using only the STL and MFC. The result won't win any beauty prizes and is most certainly not the fastest or most intelligent solver out there - but it works. I've used the most basic of all GAs, implementing truncation selection for picking suitable parents. Children are created by using the first n blocks of one parent and the last 9 - n from the other. Mutation is achieved by simply swapping random fields in a block. The fitness function counts how often a digit appears in a row or a column; every deviation from one counts as an error. Sudoku is a pretty hard problem for a GA because it has many candidate solutions with the same error count, i.e. the same fitness. Such a multimodal error function needs a GA that is better at balancing exploration vs. exploitation than mine. Using truncation selection for inheritance is pretty bad, because it usually won't try enough approaches and easily gets stuck in local minima. I "solved" the problem by simply using a high chance for mutations and thus a high random factor. You can tell that the GA isn't very good at exploring because it is highly dependent on the initial parameters and population. Usually, if you choose the right GA for the problem at hand, GAs are pretty robust and don't change much with respect to initialization. As it is, the algorithm needs a couple of seconds and about five to ten thousand iterations to solve a moderately complex Sudoku. For the fiendish and extra hard problems it'll take minutes and several hundred thousand iterations. Anyways - that was fun ;-)
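For illustration, here is roughly what those operators look like. This is just a Python sketch (the real program was C++ with the STL, and this version ignores the fixed starting clues for brevity); the function names are made up:

```python
import random

def blocks_to_grid(blocks):
    """Convert a list of nine 3x3 blocks (each a flat list of 9 digits)
    into a 9x9 grid of rows."""
    grid = [[0] * 9 for _ in range(9)]
    for b, block in enumerate(blocks):
        top, left = (b // 3) * 3, (b % 3) * 3
        for i, v in enumerate(block):
            grid[top + i // 3][left + i % 3] = v
    return grid

def fitness(blocks):
    """Count errors: every deviation from 'each digit appears exactly
    once per row and per column' adds one. Blocks are kept valid by
    construction, so only rows and columns need checking."""
    grid = blocks_to_grid(blocks)
    cols = [list(c) for c in zip(*grid)]
    return sum(abs(line.count(d) - 1)
               for line in grid + cols
               for d in range(1, 10))

def crossover(a, b):
    """Child gets the first n blocks from one parent, the last 9 - n
    from the other."""
    n = random.randint(1, 8)
    return [blk[:] for blk in a[:n] + b[n:]]

def mutate(blocks, rate=0.3):
    """Swap two random fields inside a random block. A high rate makes
    up for the poor exploration of truncation selection."""
    if random.random() < rate:
        blk = random.choice(blocks)
        i, j = random.sample(range(9), 2)
        blk[i], blk[j] = blk[j], blk[i]
    return blocks
```

A solved grid has fitness 0, and any within-block swap of a perfect solution immediately produces row or column duplicates, which is exactly why the mutation operator is so disruptive.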
I have received a very well done eBay phishing spam mail that made it through both of my spam filters. I took the time to actually read the thing. It has a pretty long-winded explanation of why I owe eBay some money and goes on to say where I should transfer it and how much. The line stating the amount due contained a pretty funny error: Faelliger Gesamtbetrag ("total amount due"): ||%RND_FIRST_DIGIT50,68 %RND_FIRST_DIGIT ?! LOL! I owe someone a random amount of money ;-) I can only assume that the original spam mail was generated in a different language (most likely English) and a different character set. I guess the dollar symbol was supposed to be replaced by a euro one and the parser choked on that. And, btw, we put our currency symbol after the number, not in front of it anyways. On another note, I stumbled across some malware recently that infects your machine and goes on to encrypt random pieces of your data. It then pops up a dialog telling you your data is being held hostage and that you have to pay a certain amount to get the password to unlock it again. I found this a very creative idea and wonder if they actually do earn money with it? More original than just deleting and destroying stuff randomly, in any case. I guess with good open source encryption like http://www.truecrypt.org/ it would actually be quite simple to write such a trojan that would work very well indeed.
I have been busy at work recently building neural network ensembles for solving quite a complex prediction/classification problem. There are two major issues to solve for my particular problem: 1) How do you reduce the dimensionality of your input data to something manageable while still maintaining the relevant information? 2) How do you find a set of neural networks that each produces good results individually while making its prediction errors in disjoint areas of the input space, thus achieving maximal benefit from grouping? In my research for 1) I've stumbled across two majorly cool algorithms: Isomap and Locally Linear Embedding. Both try to solve the same problem: finding the inherent dimensionality of a dataset hidden in a high dimensional input space (finding an embedded manifold). As a simple example, you can view the 2D still photograph of a three dimensional object as a vector of pixels. This vector will have an insanely high number of components (i.e. dimensions) - your average digital camera produces millions. On the other hand, the real world object it describes usually has far fewer degrees of freedom, thus needing far fewer components to be described. The aforementioned papers both have very cool graphical examples of this. Unfortunately both algorithms suffer from the fact that you cannot train them on one dataset and subsequently apply them to another, unknown one. Real world problems are often of this nature though... Still, I think they are among the coolest ideas in recent years and definitely a worthwhile direction for more research. I also investigated Self Organizing Maps, but they suffer from the same problem as LLE and Isomap and additionally include quantization as a necessary step, thus corrupting your data more than necessary. In the end, I settled for boring old plain vanilla principal component analysis, which manages to simplify my data at least a little bit.
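To show why PCA won out: unlike Isomap and LLE, a fitted PCA projection can be stored and reapplied to unseen data. A minimal numpy sketch (hypothetical names, not my production code):

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X (samples x features) onto the first k
    principal components."""
    mean = X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:k]
    return (X - mean) @ components.T, mean, components

def pca_apply(X_new, mean, components):
    """Apply a previously fitted projection to unknown data -- exactly
    the property the manifold methods lack."""
    return (X_new - mean) @ components.T
```

The fitted `mean` and `components` are all you need to keep around; any new sample can be projected into the same reduced space later.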
For the second problem ( 2) above) the primary issues are how to determine network topology in the first place and how to compare networks against each other. I implemented a genetic algorithm growing thousands of networks, thus solving the network topology problem in a pretty brute force way. Comparing network quality is more difficult than it sounds though; the standard mean squared error usually used in training backpropagation nets isn't very useful for this. I ended up creating a Gini curve/coefficient for my alpha/beta errors (it's a two class classification problem) and, more importantly, scatterplots of the output neurons' activation. The latter plots are very interesting, creating funky curves for certain training configurations and transfer functions (radial basis functions are the most freaky). They also offer insight into the particular training samples a net makes errors on, thus allowing an optimal strategy for choosing a set of nets. Fun ;-)
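The Gini coefficient itself is simple to compute from raw classifier scores - it's just 2·AUC − 1, with the AUC obtained by rank statistics. A little Python sketch with invented names:

```python
def gini(scores, labels):
    """Gini coefficient for a two-class problem: 2 * AUC - 1, where the
    AUC is the fraction of (positive, negative) pairs the classifier
    ranks correctly (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    return 2.0 * auc - 1.0
```

A perfect ranking yields 1.0, random scoring 0.0, and a perfectly inverted ranking -1.0, which makes it handy for comparing nets against each other.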
Late at night I'm lying awake reading. My girlfriend curled up close to me, sound asleep. Her slow breathing and the soft rustle of pages turning the only sounds reaching me in my little island of light. I'm through with the book, thoughts wandering in comfortable idleness. Everything seems slow, silenced, numbed in happy melancholy. The book is over, the story ended. I start wondering about the author and read the cover flap: BuschnicK is. leaving. working. having. teaching. winning. publishing. living. writing. The author's description flap reduced to the verbs. Reduction. Time is a glacier.
One of the companies I work for has a contract for a project which is clearly military in nature. I refused to work on that project for moral reasons, as did many of my colleagues. Anyways, the ensuing discussions reminded me of a scene from an excellent movie, Good Will Hunting. Will, a young math genius, is being interviewed for a job at the NSA: Why shouldn't I work for the N.S.A.? That's a tough one, but I'll give it a shot. Say I'm working at N.S.A. Somebody puts a code on my desk, something nobody else can break. So I take a shot at it and maybe I break it. And I'm real happy with myself, 'cause I did my job well. But maybe that code was the location of some rebel army in North Africa or the Middle East. Once they have that location, they bomb the village where the rebels were hiding and fifteen hundred people I never had a problem with get killed. Now the politicians are sayin', "Send in the marines to secure the area" 'cause they don't give a shit. It won't be their kid over there, gettin' shot. Just like it wasn't them when their number was called, 'cause they were pullin' a tour in the National Guard. It'll be some guy from Southie takin' shrapnel in the ass. And he comes home to find that the plant he used to work at got exported to the country he just got back from. And the guy who put the shrapnel in his ass got his old job, 'cause he'll work for fifteen cents a day and no bathroom breaks. Meanwhile my buddy from Southie realizes the only reason he was over there was so we could install a government that would sell us oil at a good price. And of course the oil companies used the skirmish to scare up oil prices so they could turn a quick buck. A cute little ancillary benefit for them but it ain't helping my buddy at two-fifty a gallon.
And naturally they're takin' their sweet time bringin' the oil back, and maybe even took the liberty of hiring an alcoholic skipper who likes to drink martinis and play slalom with the icebergs, and it ain't too long 'til he hits one, spills the oil and kills all the sea life in the North Atlantic. So my buddy's out of work and he can't afford to drive, so he's got to walk to the job interviews, which sucks 'cause the shrapnel in his ass is givin' him chronic hemorrhoids. And meanwhile he's starvin' 'cause every time he tries to get a bite to eat the only blue plate special they're servin' is North Atlantic scrod with Quaker State. So what do I think? I'm holdin' out for somethin' better. Why not just shoot my buddy, take his job and give it to his sworn enemy, hike up gas prices, bomb a village, club a baby seal, hit the hash pipe and join the National Guard? I could be elected president.
I'm working on a side project of mine involving a webcam, a projector and shadows. Anyways, webcams only have very small lenses and thus a strong fisheye distortion effect towards the edges. I needed a way to get rid of that. I dug out my copy of the excellent open source OpenCV image library and found a nice little function to do just that for me. You can see the result in this post. The first image is the original, distorted one, the second is the undistorted version and the third shows the calibration pattern detection. The next step is getting rid of the trapezoid (keystone) distortion, which should be far easier since it only involves a projective transformation - linear in homogeneous coordinates - as opposed to the nonlinear radial lens distortion. Finding suitable reference points will be interesting though (or "pins", as such fix points are called in GIS applications).
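For the curious, the radial model that camera calibration fits looks roughly like this. A numpy sketch of the classic Brown distortion model - not the actual OpenCV code, and the coefficients k1/k2 would come out of the calibration step:

```python
import numpy as np

def distort_points(xy, k1, k2):
    """Brown radial model on normalized image coordinates: a point at
    radius r from the optical center ends up at r * (1 + k1*r^2 + k2*r^4).
    Undistortion samples the source image at these displaced positions
    for every target pixel, which is essentially how a remap table for
    a fisheye-free image is built."""
    xy = np.asarray(xy, dtype=float)
    r2 = (xy ** 2).sum(axis=1, keepdims=True)
    return xy * (1.0 + k1 * r2 + k2 * r2 ** 2)
```

Note how the displacement grows with the radius: the center is untouched while the edges move the most, which matches the "strong distortion towards the edges" seen in the webcam images.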
Notes on the human visual system and perceptual image analysis and segmentation

This is intended to become a loose collection of links and notes to gather my thoughts, ideas and resources on the topic in one place. Usually I have an unorganized mess of scribbles on various post-its, hyperlinks sitting in emails, browser favourites and text files. Also, this collection is intended to provoke feedback and get your neurons going - maybe we could spark some new ideas. Human perception, especially vision, is still an active research area and many things are yet unknown. On the other hand, it is of tremendous interest to decipher nature's tricks. With today's computing power and camera systems lots of highly useful and practical applications could be built - if only we had the right algorithms to deal with the data. One of the most important and relatively low level vision tasks is segmenting an image into its semantic parts and components. Humans do that tremendously efficiently and fast, without even conscious thought. Computers, on the other hand, are still relatively inept at this task and only perform well for certain small subsets of the general problem. So the first thing should be trying to understand how humans do image segmentation. Theories about the human visual system (some proven and true, others not) and ideas of how to emulate them algorithmically:

- Different light wavelengths provoke reactions of varying intensity on the retina. For example, we can distinguish different shades of green far more easily and in greater fidelity than any other color. Evolutionarily this makes perfect sense, considering that the great outdoors living habitat presents itself primarily in greens. Conclusion: model the colorspace accordingly, i.e. vary the dynamic range (bit depth) used for representing signal strength in certain wavelengths. In the simplest case this means having more bits available for representing green than for any other color.
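Funnily enough, existing graphics hardware already does a crude version of this: the common 16-bit RGB565 pixel format spends its one spare bit on green. A tiny sketch:

```python
def pack_rgb565(r, g, b):
    """Pack 8-bit channels into a 16-bit RGB565 pixel: 5 bits red,
    6 bits green, 5 bits blue -- green gets double the resolution of
    red and blue."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)
```

The same principle, taken further, would be the "model the colorspace accordingly" idea: allocate dynamic range per wavelength according to how finely we can actually distinguish it.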
- Humans perceive certain colors to be similar to each other while others appear dissimilar. Conclusion: model the colorspace accordingly, i.e. find a color space such that some mathematical metric represents human perception of color differences. One such colorspace is CIELAB, which has been devised with several hundred test candidates such that the Euclidean distance metric corresponds to perceived color similarity. It still has quite a lot of shortcomings though (see appendix). Problems: as opposed to this model, human color perception is not absolute, but spatially and contextually varying. Our perception of a color depends on the colors directly surrounding it and the number and type of colors visible simultaneously. It is easier to see color variation in a local color gradient than to count the number of different shades of a color randomly dotted all over an image. Also, we can only distinguish so many colors at once (a relatively small number).
- The human eye is not modelled on a regular grid. Rather, the receptors are irregularly spaced on a rounded surface (the inner surface of the eye). On the other hand, all artificial sensors adhere to a strict, regular and flat lattice. Question: does the irregularity of signal sampling in the human eye have any advantages over a fixed grid? Is it more robust against aliasing problems? Is it more robust against oversaturation with a certain spatial frequency? What purpose does it serve?
- Physical light intensity does not correspond 1 to 1 to perceived light intensity. Research indicates that perceived intensity approximately follows a logarithmic function of the physical, measured intensity. Computers already try to capture this effect, for example with the nonlinear gamma curve for display devices. When modelling or normalizing a color space this should be taken into account.
- Humans seem to be more sensitive to changes in brightness than in color.
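To make the CIELAB point concrete, here is a sketch of the textbook sRGB to L*a*b* conversion (D65 white point) with the Euclidean delta-E on top. The formulas are the standard ones; the function names are mine:

```python
import math

def srgb_to_lab(r, g, b):
    """Convert an sRGB triple (components in [0, 1]) to CIE L*a*b*
    under a D65 white point."""
    def lin(c):  # undo the sRGB gamma curve
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    rl, gl, bl = lin(r), lin(g), lin(b)
    # linear RGB -> XYZ using the sRGB primaries
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    def f(t):  # CIE nonlinearity, roughly a cube root
        return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def delta_e(c1, c2):
    """Perceptual color difference: plain Euclidean distance in Lab,
    which is exactly what the space was designed for."""
    return math.dist(srgb_to_lab(*c1), srgb_to_lab(*c2))
```

In RGB the Euclidean distance between two colors says very little about how different they look; in Lab it was fitted to do precisely that - with the caveats about context and surround listed above.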
The human eye has far more intensity receptors than color receptors. This indicates that a color space that models brightness or intensity separately from color would be desirable. Color could be represented with fewer bits than brightness. When building Gaussian or Laplacian image pyramids, brightness could be sampled with a smaller (higher frequency) filter kernel while color could be smoothed much more with a larger filter, producing a "perceptual image pyramid". BTW, why are all the usual color spaces 3 dimensional? There is nothing inherently 3 dimensional about color. Would it make sense to find another representation? (Update: there actually seems to be a theory saying color is at least 3 dimensional.)
- Humans only focus on a very small part of the whole visible field of view at a time. The spatial and color resolution of the eye falls off towards the edges. Color can only be perceived in the center of the field of view. Both of these facts would imply a non-linear, non-Cartesian coordinate system to represent images. The log-polar coordinate system might be one such choice, giving high spatial resolution at its center and falling off towards the edges. Thus one could model a sharp and crisp "center of attention" while suppressing noise and uninteresting detail in the surrounding area. This would only deal with the spatial issue though. A similar approach should be used for color, so that an image fades to gray towards the edges and color detail is reduced.
- The human eye and the associated nervous system are organized highly hierarchically. Processing and understanding of the received data happen at each level of the hierarchy. Some simple metrics and information seem to be gathered very early in the process, such as "it's dark/it's bright". Vertical and horizontal breakage lines are detected pretty early (probably because they are needed for navigating our three dimensional surroundings), as are foreground and background objects.
Foreground segmentation is a particularly interesting subject since a lot of information seems to be used here. Examples would be: is the object in motion; a tendency to place objects "on the ground", i.e. in the lower part of an image where gravity would have them; a light vs. dark object segmentation; stereoscopic 3D segmentation, i.e. how close is the object. The last part of the system is the actual pattern and heuristics matching happening in the visual cortex. Pattern matching means matching the brain's database of actually known objects to newly encountered ones. Heuristics matching means matching known physical truths to a probably unknown object. (This last point is especially interesting since it implies that things that "cannot be true" won't easily be perceived. Supposedly this is anecdotally true for the first sightings of Columbus' ships by the natives - they didn't know such constructs and thus had problems "seeing" them, although they were huge.) The hierarchical layout implies that an artificial perception system should probably be modelled the same way: basic receptors, followed by ever more complex and complicated feature detectors, followed by logic combining those features into semantic objects. Maybe some form of neural net with different types of neurons and different layers serving the individual detection tasks could be constructed for this.
- Humans have two eyes. Stereo vision is useful for 3D navigation, but is also used for detecting redundancies and suppressing noise in 2D images. Also, the focus area is in the field of view of both eyes, giving extra detail and resolution, while the edges of the image are only ever in the field of view of one eye. This again hints at a non-Cartesian image representation with detail fall-off towards the edges. Maybe sensors could be built or refitted to produce stereo vision images as well.
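Going back to the "perceptual image pyramid" idea from above: one level of it could be as simple as filtering the brightness plane with a small kernel and the color plane with a much larger one. A numpy sketch, using a box filter as a stand-in for the pyramid's Gaussian (names invented):

```python
import numpy as np

def box_blur(img, radius):
    """Separable box filter with edge padding; stands in for the
    pyramid's Gaussian smoothing kernel."""
    if radius == 0:
        return img
    k = np.ones(2 * radius + 1) / (2 * radius + 1)
    padded = np.pad(img, radius, mode='edge')
    tmp = np.apply_along_axis(lambda c: np.convolve(c, k, 'valid'), 0, padded)
    return np.apply_along_axis(lambda r: np.convolve(r, k, 'valid'), 1, tmp)

def perceptual_level(luma, chroma, luma_radius=1, chroma_radius=3):
    """One level of the 'perceptual pyramid': brightness keeps its high
    frequencies (small kernel) while color is smoothed far more
    aggressively (large kernel)."""
    return box_blur(luma, luma_radius), box_blur(chroma, chroma_radius)
```

The output keeps sharp intensity edges while the color channel is heavily averaged - roughly mirroring the eye's dense intensity receptors and sparse color receptors.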
- Similar to the center of attention mentioned earlier, humans seem to have something like locally dominant colors. We can only sense so many colors at once (a relatively small number), so we see certain areas as one homogeneous color even if they are not. Similarly, our ability to discern colors directly depends on how many different colors are visible at once. It is easier to spot a smooth color gradient than to discern shades of the same color in a chaotic patch of different colors. This could probably be modelled by a clustering algorithm (k-means, mean shift, self organizing maps, ...) in color space, finding the locally dominant colors.
- While computer image analysis usually only works on one image at a time, humans naturally blend several impressions of the same visual stimulus over time. I don't know whether the human eye has a concept of "frame rate/frames per second" or whether that would be constant for all receptors or different (it would be possible, for example, that the gray receptors have faster circuit times than the color receptors, thus giving light intensity a higher time resolution). All this is important in analyzing motion of course, but even for static images it makes sense to view them over time. A human seeing an image for some time will gradually notice more and more detail. Due to the imperfect nature of our sensors (eyes), still frames will have a certain amount of noise and inaccuracies in them. Interpolating over time will eliminate most of these. Also, it allows one to focus on different or more details once the basics have been analyzed. Putting all these factors together, it seems humans have a hierarchic system over time, reducing noise and redundancies and analyzing an increasing amount of detail.
- Texture. Texture is an exceedingly complicated and overloaded term. There is not even a standard way to define what texture means. Yet at the same time humans have no trouble whatsoever recognizing objects of the same texture. Why is that?
Texture describes some spatial relationship between colors and intensities, a pattern. It is not necessarily exactly repeating (the texture of a plant leaf is always the same, yet at the same time it is unique for each leaf and from one square centimetre to the next). It is not necessarily of uniform scale. It is not easy to describe in simple geometric terms (clouds). It does not necessarily appear uniform in color or intensity. Nevertheless, there must be at least some characteristics appearing constant and non-varying to the human observer, so he can group parts of an image by their texture. Texture is a hierarchical term again: viewed from a distance at coarse detail, a patch of lawn will appear uniformly textured; viewed in close detail, the individual blades of grass will appear totally chaotic. There are multitudes of approaches to capturing texture in mathematical terms, most of them deriving some form of statistical features from gray value intensities. There are Fourier descriptors, wavelet transforms, co-occurrence matrices, color clusterings, Bayesian predictors, neural networks, genetic algorithms, self organizing maps, active contours/snakes, ... Each one of these is a valid approach and good at capturing one aspect of texture. However, none of them is adequate for the general problem. To sum up: obviously the human visual system is a highly sophisticated, highly dynamic and hierarchically organized system. It seems difficult if not impossible to model all of its capabilities with only one algorithm. It seems more logical to start at the bottom of the hierarchy, with the simple or even trivial transformations and interpretations, and work the way up - ideally keeping the interfaces open such that at each step each algorithm is pluggable and replaceable by a different, possibly better one. Another very important preprocessing step is determining what the basic inputs to the whole system will be.
These inputs should be modelled as closely to the outputs of the receptors in the human eye as possible. I'm a big fan of emergent behaviour and believe a lot of nature's power comes from systems organized in such a way. Very simple basic building blocks in large quantities work together to achieve more than the sum of their parts. Examples of systems like these are all of a human's senses, each of which is built from tiny receptors, the human nervous system and brain and, on a larger scale, ant colonies. These systems exhibit a lot of very desirable properties: they scale well, they are robust to noise, they are robust to defects in individual agents, they do not have a single point of failure or a single mastermind, they are easily parallelizable and they are easy to construct (since each individual agent is trivial). Artificial examples of algorithms with these qualities would include neural nets, self organizing maps, autonomous agent systems (ants) and genetic algorithms. The idea is to build a hierarchy of lots of very simple building blocks and hope that something useful will evolve from that.
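To close the loop on the texture question from above: the co-occurrence matrices mentioned there are one of the simplest statistical texture descriptors to write down. A small Python sketch (gray values assumed already quantized to a few levels):

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Gray-level co-occurrence matrix: for every pixel pair separated
    by the offset (dx, dy), count how often gray level i co-occurs with
    gray level j, then normalize to probabilities."""
    img = np.asarray(img)
    m = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            m[img[y, x], img[y + dy, x + dx]] += 1
    return m / m.sum()

def contrast(p):
    """Haralick contrast, one classic statistic derived from the GLCM:
    large when neighboring pixels tend to differ a lot."""
    i, j = np.indices(p.shape)
    return ((i - j) ** 2 * p).sum()
```

A perfectly flat patch scores zero contrast while a fine checkerboard scores high - one small, quantifiable aspect of the otherwise slippery notion of texture.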
Ok, now I'll start working on my genetically evolving, self organizing, hierarchical neural network based on fuzzy logic maths - it will call you and tell you when it's done and ready to take over the world ;-)

Information about the eye and vision: http://webvision.med.utah.edu/
Irregular grid sampling: http://www.ee.surrey.ac.uk/Basic/
Perceptual image pyramids: http://www.cs.ucla.edu/~siavosh/iccv05.pdf
Log-polar image representation: http://users.isr.ist.utl.pt/~alex/Projects/TemplateTracking/logpolar.htm
Locally dominant color adaptive segmentation: http://www.ece.northwestern.edu/~pappas/papers/jqchen_tip05.pdf
Experimental determination of visual color and texture statistics for image segmentation: http://commnet.ece.northwestern.edu/~jqchen/docs/spie05.pdf
Human visual system foreground/background detection: http://www.apa.org/journals/releases/xge1312194.pdf
Stereo vision correspondence comparison: http://cat.middlebury.edu/stereo
Comparison and benchmarking datasets for different segmentation strategies: http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/segbench
Autonomous agents and pheromone maps for digital image segmentation (this one is just soo cool ;-) ): http://alfa.ist.utl.pt/~cvrm/staff/vramos/Vramos-WCLC05b.pdf
Problems with the CIE Lab color space: http://www.aim-dtp.net/aim/evaluation/cie_lab/index.htm

Plus a gazillion more in my bookmarks and littered all over the place. I hope I'll come around to organizing them.