2 comments on “Science as Compression”

  1. Pingback: Response to Review of “Notes” by Peter McCluskey – Ozora Research

  2. [Replying to Dan Burfoot’s comments.]

    On lossless versus lossy compression, the book is clear about the advantage of lossless compression for objectively measuring a theory. I agree that lossless compression is the right method for comparing theories.

    My comment was an attempt to understand the practical problems of implementing CRM. For purposes other than measuring how good a theory is, I want to express the theory in a form that looks more like lossy compression than lossless compression. I’m unsure whether lossy compression is the right way to describe what I want here. Maybe I’m too confused to articulate what I do want. It’s something along the lines of expressing theories using traditional methods when we’re aiming for human comprehension of the theories, and having some standard toolkit to convert the theory into lossless compression when we want to measure the quality of the theory.
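    A minimal sketch of the measuring side of such a toolkit, assuming a theory can be written as a model that assigns a probability to each next observation. The ideal lossless code length is then the sum of -log2 p over the data, and the theory with the shorter total scores better. The symbols, the toy series, and both models below are hypothetical illustrations, not anything taken from the book.

```python
import math
from collections import Counter

ALPHABET = "UDF"  # hypothetical symbols: price went Up, Down, or stayed Flat


def uniform_bits(data):
    """Code length in bits when every symbol is assumed equally likely."""
    return len(data) * math.log2(len(ALPHABET))


def adaptive_bits(data):
    """Code length in bits under an adaptive frequency model (Laplace smoothing).

    Each symbol is predicted only from the symbols seen before it, so a
    decoder can rebuild the same predictions and no parameters need sending.
    """
    counts = Counter()
    total_bits = 0.0
    for seen, symbol in enumerate(data):
        p = (counts[symbol] + 1) / (seen + len(ALPHABET))
        total_bits += -math.log2(p)
        counts[symbol] += 1
    return total_bits


data = "UUDUUFUUDUUUDUUF"  # made-up toy series, not real market data
scores = {"uniform": uniform_bits(data), "adaptive": adaptive_bits(data)}
best = min(scores, key=scores.get)  # theory with the shortest code length
```

    The human-readable form of a theory can stay as it is; only this scoring step needs the theory restated as a predictor. An actual arithmetic coder would come within a few bits of these totals, so the sketch skips the encoding itself.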

    No, I didn’t conclude that CRM was inapplicable to stock market data. I believe that CRM offers some benefits for my work, but I’ve been procrastinating due to a combination of being busy with other tasks and being confused about how to apply CRM to my work.

    Some of that confusion is due to CRM feeling sufficiently strange that it takes a good deal of thought to reframe my thoughts around it. But I suspect most of the difficulty is due to factors related to markets.

    His comments about the stock market mostly imply that it’s futile to do scientific study of how to beat the market, not that CRM is the wrong approach.

    Yet if everyone gave up on finding inefficiencies in the stock market, then the market would become inefficient. The only equilibrium that is close to being stable is for a fair number of people to think they can find inefficiencies, while actual inefficiencies are hard enough to find that many people fail to do so.

    I’ve got gigabytes of data that includes stock prices, earnings / balance sheet numbers, descriptions of each company’s business, etc.

    I’ve also got evidence of a more anecdotal nature concerning financial fluctuations over much longer time periods, and covering a variety of countries.

    I’ve got hints about human nature (e.g. from the heuristics and biases literature) that guide my intuitions about which patterns are due to mistakes that are persistent and widespread.

    My databases are hardly pure noise. They contain lots of patterns about which companies share which similarities to other companies.

    If I simply focus on compressing my database via automated techniques, I’ll get lots of nearly useless knowledge: some of it due to overfitting, and lots of it due to obviousness (e.g. similarities between Wells Fargo and Bank of America). The results will also include some valuable ideas (e.g. companies grouped by similarities that I’ve overlooked).
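    One automated technique of this general kind is normalized compression distance (NCD), which treats two texts as similar when compressing them together costs little more than compressing one alone. The sketch below is only an illustration under stated assumptions: the company names and descriptions are invented, zlib is a crude stand-in for a serious compressor, and nothing here is claimed to be the method the comment has in mind.

```python
import zlib
from itertools import combinations


def csize(text):
    """Compressed size of a text in bytes, using zlib at maximum level."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(a, b):
    """Normalized compression distance: near 0 for similar texts, near 1 for unrelated ones."""
    ca, cb = csize(a), csize(b)
    cab = csize(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)


# Hypothetical business descriptions, invented for this example.
descriptions = {
    "BankA": "Retail and commercial bank offering mortgages, deposits, "
             "credit cards and consumer loans across the United States.",
    "BankB": "Commercial and retail bank offering deposits, mortgages, "
             "credit cards and consumer loans to United States customers.",
    "MinerC": "Gold and copper mining company operating open pit mines "
              "and smelters in Chile, Peru and Western Australia.",
}

# Rank every pair of companies from most to least similar.
pairs = sorted(
    combinations(descriptions, 2),
    key=lambda p: ncd(descriptions[p[0]], descriptions[p[1]]),
)
closest = pairs[0]
```

    On this toy input the two banks come out as the closest pair, which is exactly the obvious kind of result; the hope is that the same ranking over a large database occasionally surfaces a pairing that was not obvious.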

    Those underappreciated similarities may help me create new abstractions. That won’t directly lead me to better predictions. But at the very least it will focus more of my attention on patterns I find interesting. I’m currently spending too much time manually looking for needles in haystacks; with sufficiently good abstractions, I could automate some of that via software that makes educated guesses about which sections of the haystack are most promising.

    CRM has focused my attention more clearly on that goal, but still leaves some important domain-specific challenges. I’m still unclear on how much of that I’ll need to do manually, and how much I can use standard feature extraction tools.

    That’s likely to be a much smaller paradigm shift for me than the difference between math and physics, but still a real shift in my focus.
