Thursday, March 21, 2013

Interesting papers (with links)

Over the past two years or so, I've read a fair number of research papers (somewhere in the vicinity of 300+). Many of them (~150) were directly related to my honours project work, and were concerned with the various visualisation schemes developed for navigating and presenting large datasets (TL;DR commentary - there is a lot of stuff out there already. They've tried practically every idea you could think of, the next few which immediately come to mind leading on from those, and a few side tangents as well). However, there are also a number of other interesting results I've come across which you might find interesting too...

"Blur Aware Downsizing" (2011) - Image downsizing algorithm which amplifies the blurring of blurred areas when downsizing, so that perceptually, the image still shows the same blur characteristics (e.g. preserving DOF)
- Could this be an approach to the old preview vs render blur resolution problem in the Compositing Nodes?

    "TextRank" (2004) - Unsupervised algorithm for automatic keyword extraction from texts
    - The algorithm is based on the PageRank algorithm that Google famously uses as one of the factors in producing its search results. That is, it takes some entities (web pages, or in this case words) and determines how frequently they link to each other. Entities that are linked more frequently get greater weight. Then, once we've identified the most likely candidates, we can work out whether some of them are actually part of bi-grams (2 words denoting one entity, e.g. "credit card") or n-grams, instead of being limited to unigrams (1 word keywords).
    - However, beware if distribution constraints (especially file size) are a concern. A key contributing factor to the quality of this method (I suspect) is that they only build the word graph after passing the input text through a POS (Part of Speech) tagger, which basically identifies what "type of word" each word in a sentence is (e.g. noun, verb, adverb, adjective etc. - though you'll find that these typically get much more complicated). In the paper, it is claimed that they extract just the nouns and adjectives, as those "were found to give the best results" for interesting/useful keywords. So if going down this route, you'll firstly need to find a good quality POS tagger; but good quality POS taggers achieve this quality by having large trained models. Another warning is that there are quite a fair number of them out there ;)
    - As far as elegance goes, this method sounds quite nice. I like graph twiddling algorithms :)
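
To make the steps above a bit more concrete, here's a rough sketch of the general idea in Python. It is NOT the paper's implementation: the tiny stopword list is just a stand-in for the POS filter (so that no tagger model is needed), the edges are unweighted, and all the function names, the window size, and the iteration count are my own assumptions. The 0.85 damping factor is the value usually quoted for PageRank.

```python
import re
from collections import defaultdict

# Assumption: a tiny stopword list stands in for the POS filter. The paper
# keeps nouns and adjectives; here we merely drop some common function words.
STOPWORDS = frozenset(
    "a an and are as at be by for from in is it of on or that the this to was with".split()
)


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_graph(tokens, window=2):
    """Undirected co-occurrence graph: two candidate words are linked if
    they appear at most `window` positions apart in the token sequence."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        if word in STOPWORDS:
            continue
        for other in tokens[i + 1 : i + 1 + window]:
            if other not in STOPWORDS and other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def rank(graph, damping=0.85, iterations=50):
    """PageRank-style scoring: each word shares its score equally among its
    neighbours, repeated for a fixed number of iterations."""
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(graph[n]) for n in graph[word])
            for word in graph
        }
    return scores


def keyphrases(text, top_n=5, window=2):
    """Take the top_n ranked words, then merge runs of them that sit next to
    each other in the text into multi-word phrases (e.g. "credit card")."""
    tokens = tokenize(text)
    scores = rank(build_graph(tokens, window))
    top = set(sorted(scores, key=scores.get, reverse=True)[:top_n])

    phrases, current = set(), []
    for word in tokens + [""]:  # the empty sentinel flushes the final run
        if word in top:
            current.append(word)
        else:
            if current:
                phrases.add(" ".join(current))
            current = []
    # Best-scoring phrases first; ties broken alphabetically.
    return sorted(phrases, key=lambda p: (-max(scores[w] for w in p.split()), p))


sample = "Credit card fraud and credit card theft. The credit card company"
print(keyphrases(sample, top_n=2))  # prints ['credit card']
```

In the sample, "credit" and "card" end up with the most links, so they rank highest, and since they always sit next to each other they get merged into the single keyphrase "credit card". Swapping the stopword filter for a real POS tagger is where the file size issue mentioned above comes in.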

    I found this paper while looking for ways to implement an automatic topic extraction and clustering/clumping tool for deriving realtime insights (i.e. highlighting classes of popular responses) from short text snippets submitted by audience members during live presentations.

    "Gliimpse" (2012) - A quick in-place previewing technique which transitions between markup (e.g. LaTeX and HTML) and its visual results, combining the power/flexibility of text-based markup editing with the ability to quickly view the results while understanding how the markup relates to what is created.
    - The demo is seriously cool, and I can think of many times when working with LaTeX where this sort of capability would have come in handy
    - Where is my next gen text editor? :P

    Bret Victor's Work
    - While Bret Victor's stuff can't quite be classified as academic research papers, I thought I'd just include this here as well since it's pretty inspiring stuff (NOTE: I do have a draft post/ramblings about his stuff that I should get around to finishing up some time), and fits in nicely with Gliimpse.
    - In short, after watching his "Inventing on Principle" talk and reading a few of the other articles, I've been somewhat cured of some of the mathematics "allergies" I'd been increasingly plagued by following Year 13 + First Year Uni Maths courses *ahem*
