The stock market reflects demand and supply. However complex a security pricing model, it is only as good as its ability to understand these core market forces. And, from its accuracy, identify opportunities to exploit demand and supply inefficiencies. Basic economics.
The psychological equivalent to demand and supply is greed and fear. More base human forces that manifest in market behavior. From Jesse Livermore to Michael Saylor and everyone in between. Greed and fear drive markets entirely.
For as long as markets have been observed, humans have tried to measure greed and fear. This measurement is best classified as sentiment. Sentiment is everywhere. Political approval and disapproval polls, product reviews, and call center responses are all sentiment. The prose of this Substack can be converted into sentiment.
While sentiment is not exclusive to financial markets, it is perhaps most obviously prevalent there, where it is measured in many forms:
The VIX (CBOE Volatility Index)
CNN Fear and Greed Index
High Low Sentiment Ratio
NYSE Bullish Percentage
Consumer Confidence
The Buffett Indicator
NAAIM Survey
StockTwits
Language is sentiment. Which means everything is sentiment. Including, of course, social media and the great bird application, Twitter.
Like earnings, sentiment is almost universally relevant. This is why we identified sentiment as an important input when we started building portfolio analytics. I wish I could say we started with PsychSignal, but we didn’t. Many of the mistakes we’ve made building Point Focal have been about failing to properly choose what not to do. And so it was when we started building a natural language processing (NLP) sentiment model to create analytics that would capture the most fundamental underpinning of market forces. Greed. And fear.
Three years ago, while teaching in Northeastern’s master’s of analytics program, I had a couple of capstone students helping Point Focal. Another Focus Signal post will likely explain why I was teaching at Northeastern. But I was. And one of the many benefits of the experience has been creating a rotational internship program with the best students from the capstone. By “best students”, I do not mean highest GPA. I mean the most ambitious and creative students who understand apprenticeship value. In my experience, this profile has not been correlated with the highest GPA.
Enter Zhoujun. A smart, creative capstone student who decided the benefits of working with Point Focal were greater than its costs. Zhoujun helped us build a simple model that could process Twitter content associated with stock symbols using R sentiment packages.
Many great models are simple. But not all simple models are great.
Our model was like a single blade of grass in an endless field of sentiment. Despite the limiting content constraints of the Twitter development environment, we were overwhelmed by tweet volume. The sentiment models we used classified language into emotions - none of which were greed or fear. They were more like the happy / sad faces of on-line surveys no one completes.
I recall Zhoujun trying to decode Apple product sentiment with the model and thinking that despite its tiny scale, we were hypothesizing that our model might explain iPhone sales. If a sample of Apple tweets could be converted into emotions that produced meaningful information through an off-the-R-Studio-shelf sentiment model, it could explain anything. And a model that explains anything explains nothing.
Weeks into our effort it was clear the challenge was beyond us. We lacked the financial capital the raw data required, the natural language processing expertise the model required, and the technology infrastructure the data operation required. Not least of all, we lacked the human capital the execution would require.
Otherwise, we were ready to roll.
All we needed was zero cost, post-NLP-processed data, a near-zero cost tech stack, and automation instead of humans. It is moments like these when many rational actors might stop. But solving problems is also rational. And this was a problem to solve.
Enter PsychSignal. I found PsychSignal online and remember thinking they were mature enough to produce the content we needed and small enough to be approachable. Indeed, they seemed to have a few employees and a few customers - one of whom evidently funded the operation. Today, Crunchbase reports PsychSignal has raised $230K and has 3 employees, with the following company summary:
PsychSignal quantifies the real world psychology of the investment crowd by listening to social media conversations. They provide the most granular real time financial sentiment APIs in the world. They do so under the leadership of a team of experts in psychology, engineering, and data science, plus a few legendary traders from Wall Street.
I dropped a note into the PsychSignal website explaining why Point Focal wanted to access their data.
“We believe our product would benefit from enabling PMs and traders to view PsychSignal data… We are unlikely to have the capital required to subscribe to your data… However, I don’t think we need the fire-hose version of your data to create value for our users. A fairly limited set would likely be sufficient.”
James Crane-Baker responded immediately. He was receptive to engaging and simply asked what we could contribute to help defray resource allocation costs while allowing us time to get to market. In the end, what we and their other customers could contribute would not be enough. But it was enough to begin, and for that I remain grateful to James and those like him who are willing to engage with early stage, bootstrapping founders.
PsychSignal covered thousands of global securities. Many more than we needed with our U.S. equities scope. They offered real time intraday content and a single, overnight file. There is a huge difference in complexity between processing streaming data and processing static data. We started with the daily static files from PsychSignal. Choosing not to start with streaming data was one of our better PsychSignal decisions.
We parsed the daily data into more than 13,000 individual security files. Then we migrated a few years of history and created a process to append daily data from the API to our stock files. A separate process produced a single, filtered, merged file of PsychSignal data for the roughly 3,500 stocks in our beta universe.
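The split, append, and merge steps described above can be sketched roughly as follows. This is a minimal illustration, not the actual Point Focal pipeline: in-memory dictionaries stand in for the per-security files, and the row fields (`symbol`, `date`, `bullish`, `bearish`) and function names are assumptions made for the example.

```python
from collections import defaultdict


def split_by_symbol(daily_rows):
    """Group one day's rows by ticker symbol."""
    by_symbol = defaultdict(list)
    for row in daily_rows:
        by_symbol[row["symbol"]].append(row)
    return dict(by_symbol)


def append_daily(store, daily_rows):
    """Append a day's rows to each symbol's history, skipping dates already stored."""
    for symbol, rows in split_by_symbol(daily_rows).items():
        history = store.setdefault(symbol, [])
        seen = {r["date"] for r in history}
        for row in rows:
            if row["date"] not in seen:
                history.append(row)
                seen.add(row["date"])
    return store


def merge_universe(store, universe):
    """Build one filtered, merged table for the symbols in the universe."""
    merged = []
    for symbol in universe:
        merged.extend(store.get(symbol, []))
    merged.sort(key=lambda r: (r["date"], r["symbol"]))
    return merged


# Illustrative data only (hypothetical values).
store = {}
day_one = [
    {"symbol": "AAPL", "date": "2019-01-02", "bullish": 1.8, "bearish": 0.6},
    {"symbol": "TSLA", "date": "2019-01-02", "bullish": 2.1, "bearish": 1.9},
    {"symbol": "XYZ", "date": "2019-01-02", "bullish": 0.4, "bearish": 0.2},
]
append_daily(store, day_one)
append_daily(store, day_one)  # re-running the same day adds nothing
beta_file = merge_universe(store, {"AAPL", "TSLA"})
```

Skipping dates already stored means a rerun after a failed load does not duplicate history, which matters when a process appends to thousands of files every day.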
PsychSignal had a feed from Twitter sourced by cashtags ($TSLA) from which they produced the following fields: Bullish Intensity, Bearish Intensity, Bull-Minus-Bear, Bull-Scored Messages, Bear-Scored Messages, Total Scanned Messages, and Bull-Bear-Message-Ratio.
We loaded a few million rows of PsychSignal data joined with reference and pricing data into Spotfire. Finally, we had sentiment analytics.
There was a lot we wanted to analyze with our new PsychSignal data. Our primary goal was to understand the effect of sentiment on prices - analyzing correlation and its materiality or spuriousness. But beyond an unlikely golden-sentiment-bullet that might help forecast future prices, we were interested in what else we could learn from the flow of social emotions.
We wanted to understand the relationship between extreme sentiment and future volatility, regardless of price direction. We wanted to know over what time frames sentiment signals are most valuable. We wanted to design sentiment model parameters and generate back-tested results with historical performance attribution. We wanted to reveal currently actionable signals. And we wanted to know if creating and rebalancing long / short baskets of positive / negative sentiment stocks could outperform a benchmark. We were excited. But first we wanted to do something we thought should be simple. Color stock prices by intensity of sentiment.
A typical line chart in visualization software is designed to plot lines based on continuous variables over time. It also allows the lines to be colored by a separate categorical value. Consider 2 lines plotted where Y is performance, X is time and each line is colored by a category, portfolio.
Or 6 lines in the same space where lines are colored by symbol.
This design solves most line chart use cases.
Now consider a line in the same space and our desire to color the line by a different dimension - a continuous scale of sentiment rather than a categorical scale of symbol or sector. This is far more complex and not the use case for which visualization software is designed. It is what project managers call a corner case. Smart product owners tend not to allocate lots of development resources to corner cases.
So we bastardized the scatter plot in Spotfire. Instead of drawing the price of the stock as a line, we drew price as a scattering of plots. Then we made the plots very small and connected them so that they appeared as a line. Finally, we colored the plots that made up our line by sentiment.
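Outside of Spotfire, the same trick comes down to giving every price observation its own color from a continuous scale. The sketch below is one hedged way to do that in plain Python. The red-grey-green scale, the score range of -1 to 1, and the function names are all assumptions for illustration, not the scale we used in the product.

```python
def sentiment_to_rgb(score, lo=-1.0, hi=1.0):
    """Map a sentiment score to an RGB tuple on a red-grey-green diverging scale."""
    mid = (lo + hi) / 2.0
    half = (hi - lo) / 2.0
    # Normalize to [-1, 1] and clamp anything outside the expected range.
    t = max(-1.0, min(1.0, (score - mid) / half))
    neutral = (0.6, 0.6, 0.6)
    target = (0.0, 0.6, 0.0) if t >= 0 else (0.8, 0.0, 0.0)
    weight = abs(t)
    return tuple(round(n + (c - n) * weight, 3) for n, c in zip(neutral, target))


def colored_points(dates, prices, sentiment):
    """One small marker per observation: (date, price, rgb)."""
    return [(d, p, sentiment_to_rgb(s)) for d, p, s in zip(dates, prices, sentiment)]


# Illustrative data only (hypothetical values).
points = colored_points(
    ["2019-01-02", "2019-01-03", "2019-01-04"],
    [157.9, 142.2, 148.3],
    [0.4, -0.9, 0.1],
)
```

Any plotting tool that accepts a color per marker can draw these points small and close together, so the eye reads them as a single line shaded by sentiment.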
Now we could see greed and fear. And for the moment, it was thrilling to visualize the sentiment our social media world was ascribing to companies.
Afterwards, our technology lead informed me I was using the visualization software incorrectly. I explained that the software was not designed correctly. Of course, both of us were right. Risk is introduced when using software in a manner for which it is not designed. But with risk comes reward. And extending software to do something beyond its original design can lead to great advances.
While our sentiment views lacked maturity, they increased the number of alternative data sets in our product from two to three. We had off-exchange activity, we had Estimize earnings, and now we had PsychSignal sentiment. The off-exchange activity was the first data we brought into Point Focal and therefore had the most mature analytics. PsychSignal was last of our initial three and therefore had the least mature analytics. But three feels more like a product than two. Heck, with three we were a self-described alternative data platform.
As we explained to prospects, narrow, deep analytics are superior to wide, shallow analytics. This is fundamentally true and an important premise on which we build. We had the narrow part down. We were working on the deep part.
Our sentiment views evolved. They became more complex and then simpler. The demo response to our sentiment analytics was heightened interest. People had not seen their portfolios colored by social sentiment. But interesting analytics are only necessary, not sufficient. People do not pay for interesting. They pay for compelling. And the bar for compelling analytics in asset management is improving performance and risk management decisions.
Eventually we overlaid moving averages and Bollinger bands onto the sentiment-colored price history. And we created parameters that could be configured to identify moments in time when pricing and sentiment information converged in a meaningful manner.
But what is meaningful? Our curiosity was based on whether extreme price and sentiment conditions created a propensity for stocks to extend or reverse direction. Momentum and mean-reversion. We populated greed and fear flags over scenarios isolated by model parameters.
With psychological labels applied to our visuals, stock behavior could be observed before and after the labels. Thus, from a set of configurable price and sentiment parameters, we could analyze volatility and performance within the context of social sentiment. We calculated metrics that quantified performance from the signals produced by the parameters.
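To make the idea concrete, here is a small sketch of how configurable price and sentiment parameters could produce greed and fear flags, followed by a simple measure of what happened next. It is an assumption-laden illustration rather than our production model: the rule (price outside a Bollinger band while sentiment is beyond a threshold), the default parameter values, and the function names are all hypothetical.

```python
from statistics import mean, pstdev


def bollinger(prices, window=20, width=2.0):
    """Return (lower, middle, upper) per day; None until the window fills."""
    bands = []
    for i in range(len(prices)):
        if i + 1 < window:
            bands.append(None)
            continue
        chunk = prices[i + 1 - window : i + 1]
        mid = mean(chunk)
        sd = pstdev(chunk)
        bands.append((mid - width * sd, mid, mid + width * sd))
    return bands


def flag_extremes(prices, sentiment, window=20, width=2.0, threshold=0.5):
    """Label days where price and sentiment are both extreme in the same direction."""
    flags = []
    for price, score, band in zip(prices, sentiment, bollinger(prices, window, width)):
        if band is None:
            flags.append(None)
        elif price > band[2] and score >= threshold:
            flags.append("greed")
        elif price < band[0] and score <= -threshold:
            flags.append("fear")
        else:
            flags.append(None)
    return flags


def forward_returns(prices, flags, horizon=5):
    """Average return over `horizon` days following each flag type."""
    buckets = {"greed": [], "fear": []}
    for i, flag in enumerate(flags):
        if flag and i + horizon < len(prices):
            buckets[flag].append(prices[i + horizon] / prices[i] - 1.0)
    return {k: (mean(v) if v else None) for k, v in buckets.items()}


# Illustrative data only (hypothetical values).
prices = [100, 100, 100, 100, 100, 110, 105, 100]
sentiment = [0, 0, 0, 0, 0, 0.9, 0, 0]
flags = flag_extremes(prices, sentiment, window=5, width=1.0, threshold=0.5)
summary = forward_returns(prices, flags, horizon=2)
```

A negative average return after greed flags would hint at mean-reversion, a positive one at momentum. With so few observations neither reading means anything, which is why signal counts belong next to any such metric.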
This was the beginning of creating a discretionary framework from which to analyze sentiment impact on stocks and portfolios. The combination of outputs that could be produced from the model was large. There was so much to do.
“Everyone has a plan ‘til they get punched in the mouth.” - Mike Tyson
On the morning of April 14th, 2020 I received an email from James at PsychSignal with the subject line, “Data Subscription Termination Notice”.
“Unfortunately due to increasing upstream data costs associated with producing this data, we have decided to terminate the live data feed, effective immediately.”
After a brief email exchange with James, I confirmed we had a problem. All the sentiment views and functionality in Point Focal no longer had a data feed. The greed and fear we had worked so hard to capture in our portfolio analytics were gone.
Beyond the obvious pain this inflicted on our morale and the lost sunk cost of our sentiment efforts, it also forced us to drop everything we were doing and figure out how to save our sentiment.
This is the opportunity cost of time that can kill startups. Projects across infrastructure, data analysis, strategic partnerships, outreach, and everything else we were doing stopped to focus on sentiment. The thought of our demos going from 3 alt-data sets to 2 was demoralizing.
We began to consider alternative sentiment data sources and realized that IEX Cloud, the new financial data provider whose API we used for market and reference data had brought StockTwits sentiment data into their platform. We had seen the IEX Cloud API mature technically and expand its content over the years, but we had not realized sentiment was available. The API endpoint contained daily scores and activity which we could, at least in theory, use to replace our now legacy PsychSignal feed.
IEX Cloud knew Point Focal. Probably because we had been there from the start. We were using their API before they productized it. I’m confident we were early enough to have provided helpful product input. Over time we got to know their team and shared our own product with them. We published a joint case study and even convinced them to let Charlie and me ring the IEX bell.
Of course, it’s not actually a bell. It’s a button which “closes” the market. It’s a great gimmick because everyone knows it’s a gimmick and because IEX uses it to demonstrate how milliseconds matter. No one in practice ever presses the button at exactly 16:00 EST because the press is measured with incredible precision. This closing ceremony is messaging at its best. Reminding the world that IEX is designed to neutralize speed advantages sought after by high frequency trading operations. See Michael Lewis’s Flash Boys for the spellbinding details.
So when I wrote Josh Blackburn, head of IEX Cloud, to ask for help, he at least knew who we were. I explained that PsychSignal was immediately terminating their data feed and we wanted to rapidly convert to StockTwits through their API. I also asked for help organizing the historical data required to completely replace PsychSignal. Despite not knowing whether the signal in StockTwits was better, worse, or indistinguishable from PsychSignal, we knew we could not have sentiment history that began with one source and ended with another.
Josh responded in eleven minutes with “We can get this done for you today”. In less than 24 hours, IEX Cloud delivered a flat file of sentiment data to us that kept our product from regressing. We backed up PsychSignal data and stripped it out of our product. When we loaded StockTwits data we didn’t even change the data pipeline. We morphed it into the PsychSignal stream with unchanged column headings that tricked our product into thinking StockTwits was PsychSignal (to anthropomorphize technology).
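The morphing step amounts to a thin adapter that renames the new source's columns to the headings the product already expects. The sketch below shows the general shape. Every column name on both sides is hypothetical, since neither feed's exact headers appear in this post, and filling unmapped legacy columns with `None` is an assumption about how gaps might be handled.

```python
# Hypothetical headings the downstream product already expects.
LEGACY_COLUMNS = [
    "SYMBOL",
    "DATE",
    "BULLISH_INTENSITY",
    "BEARISH_INTENSITY",
    "BULL_MINUS_BEAR",
    "TOTAL_SCANNED_MESSAGES",
]

# Hypothetical mapping from the replacement feed's fields to legacy headings.
RENAME = {
    "symbol": "SYMBOL",
    "date": "DATE",
    "sentiment": "BULL_MINUS_BEAR",
    "totalScores": "TOTAL_SCANNED_MESSAGES",
}


def to_legacy_schema(row):
    """Rename known fields and emit every legacy column, using None where no source exists."""
    renamed = {RENAME[k]: v for k, v in row.items() if k in RENAME}
    return {col: renamed.get(col) for col in LEGACY_COLUMNS}


# Illustrative row only (hypothetical values).
new_row = {"symbol": "TSLA", "date": "2020-04-15", "sentiment": 0.12, "totalScores": 840}
legacy_row = to_legacy_schema(new_row)
```

Keeping the adapter at the edge of the pipeline leaves everything downstream untouched, at the price of column headings that no longer describe where the data came from.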
When this occurred in April of 2020, our product was still in Beta. And with a minimal amount of pain we were back in business. Beta user experiences and demos were largely uninterrupted. Having the support of a larger partner with more resources than Point Focal was wonderful. It’s why in our case study that would be published one week to the day after this event, we wrote:
“IEX Cloudʼs responsive support makes us feel more like a partner than a client – they are invested in our success.”
Abraham Thomas explains in The Economics of Data Businesses:
A common failure mode is to build a business on top of somebody else’s data. If you depend on a single upstream source for your data inputs, they can simply raise prices until they capture all of the economics of your product. That’s a losing proposition. So you should try to build your own primary data asset, or work with multiple upstream providers such that you’re not at the mercy of any single one.
Indeed, PsychSignal could raise prices, but they could also terminate their feed! Both situations are undesirable. We are doubling our upstream data providers this quarter from 3 to 6, one of which has their own aggregated content. Between this data diversification and the maturing IEX Cloud offering, we are becoming less susceptible to the failure mode identified by Thomas, who goes on to say:
You should also try to add proprietary value of your own, lest either your suppliers or your customers encroach and disintermediate you. A sufficiently large transformation of your source data is tantamount to creating a new data product of your own.
This is critical for Point Focal. Across data sets, and specifically with sentiment, we are overlaying analytics onto the raw data to transform the original content into something new. This IP layer creates usable insight from raw information and is how we are creating a new data product of our own.
Sourcing and managing alternative data, producing signals with quantified performance for custom portfolios delivered with visual analytics, natural language narratives, and automated reporting… this is our new data product.
With our sentiment disaster averted, we began to analyze StockTwits sentiment performance relative to PsychSignal. And we returned our focus to our beta program. Then, three weeks later on May 20th, we received another sentiment email, this time from IEX Cloud:
We’re reaching out to let you know about upcoming changes regarding our social sentiment data provided by Stocktwits. On June 1, 2020, Stocktwits data will be moving off our Core Data offering.
We had just saved our platform from losing sentiment. Now we were losing sentiment again.
One thing starting a business teaches is when everything is a surprise, nothing is a surprise. While it may not be possible to expect the unexpected, it does seem possible to become desensitized to surprises. To become unsurprised.
Another startup lesson is understanding that the game is to continuously solve problems until you generate revenue. If you cannot solve an important problem, or if you run out of time (money) while solving important problems, the game is over. Having our sentiment feed shut off twice in two months was just another problem to solve.
I spent the next month being rejected by StockTwits and RavenPack.
I figured if we couldn’t source StockTwits data from IEX Cloud, maybe we could obtain it directly from StockTwits. In fact, seeing how StockTwits was using their own sentiment data, I thought we could probably help them too. No one at StockTwits would respond to me - even after Linking-In with me.
Something about the nature of StockTwits made me not want to chase them. While retail / social sentiment could be a valuable alt-data input, it felt like a niche of a niche. Even their name, StockTwits, turned me off¹. Of course, it’s easy to feel turned off by a firm that won’t engage with you.
I met RavenPack at a Tabb Forum fintech event in 2019. RavenPack was an established company out of Spain with a set of NLP analytics across text, news, insider transactions, and corporate events. After an exchange with their product expert from the event, I met with their Head of Strategy who reported to their CEO.
RavenPack appreciated what we were doing and thought they could help us. But they very nicely informed me that without significant upfront capital from Point Focal, they would not engage with us. They had tried revenue share deals with small firms before and determined that the risk-reward, cost-benefit equation for RavenPack was not favorable. In fact, they expressed that through no fault of our own, their appetite to engage with us was impacted by their prior engagements with other small firms. Understandable, but frustrating.
Most startups fail. But we were fighting to become unlike most startups.
It is this risk-reward equation for established firms that prompted me to write about Estimize’s willingness to engage with Point Focal in Solving an Earnings Challenge:
From experience, I know there are many firms unwilling to work with a bootstrapped startup. This is both understandable and exhausting. I appreciate the cost-benefit analysis of such decisions from established firms. However, I also believe if there is minimal cost, low risk, and asymmetric upside to the established firm, there is value in supporting aspiring businesses. Sometimes filters should be applied to cost-benefit analyses. Leigh’s willingness to work with Point Focal is something I am excited to pay forward to another firm in the future.
I don’t know where I discovered Brain. I don’t even think I was looking for sentiment when I saw Brain. But it was June 11th, 2020, nine days after my RavenPack meeting, when I sent Brain’s CEO, Francesco Cricchio, a LinkedIn message introducing myself, our sentiment predicament, and the nature of our bootstrapped fight for survival.
Francesco responded immediately and was happy to explore an engagement. Brain was based in Europe and had a variety of NLP alternative data sets, including single-stock sentiment. After a couple of calls with Francesco and Brain’s Head of Research, Matteo Campellone, we were not only back in the sentiment business, we had a new partner with complementary capabilities genuinely aligned with our model.
This felt different and was exciting.
As I discussed on the 12th episode of Signals, Split Personality, it’s not only the things you get in life that shape your path, it’s also the things you don’t get in life. We did not get PsychSignal, StockTwits, or RavenPack.
Brain is a young firm based in Milan, with wonderful Italian accents. Like Point Focal they are small, entrepreneurial, and ambitious. Unlike Point Focal, they have deep expertise at the intersection of natural language processing and data science. Beyond global individual stock sentiment, they have a growing suite of alternative data and quantitative capabilities that includes global market sentiment, sentiment volatility, thematic baskets, 10-K, 10-Q, and corporate earnings call language metrics, machine learning stock rankings, and a programmatically accessible back-testing engine. Brain was more than a sentiment feed!
Brain enabled us fast. In a matter of weeks, we replaced our sentiment data pipeline for the second time.
The best partnerships come when natural synergies exist. Showing Brain their content in a way they had not seen it before demonstrated our capabilities. Brain’s NLP data expertise and our ability to visualize their sentiment created an end-product more valuable than what either of us could produce individually. The complementary nature of our offerings was promising.
Brain’s sentiment is sourced from news rather than social media, lending itself more towards signal than noise and creating a more institutional feel to our analytics. Brain has educated our team on how to interpret sentiment scores, how to align the timing of news sentiment with stock performance, and how to objectively analyze the predictive value of sentiment information.
This last point is crucial. As we quantify the statistical meaningfulness of stock sentiment and performance, we are learning that the sheer volume of strongly scored news sentiment - regardless of whether it is positive or negative - can indicate future stock volatility. Further, the volatility of broad market sentiment has recently preceded significant risk-on, risk-off inflection points for U.S. equities.
Our Brain sentiment content will quickly grow from individual stock to portfolio level analytics. Intraday broad market sentiment and sentiment volatility will follow, creating a completely new, quantitative view of sentiment behavior. Then, we will do something more ambitious.
As we integrate more Brain content we will create a comprehensive suite of sentiment and NLP analytics. Coupling these analytics with the Brain backtesting engine will effectively productize sentiment and NLP quantitative models. Pushing timely alt-data driven signals into our insight platform creates a novel offering. A joint capability that can help a broad swath of the asset management community.
The best part will be producing performance attribution reports that demonstrate that Brain and Point Focal turn sentiment analytics from nice-to-have to must-have.
Our original vision of producing a unique sentiment lens on portfolios has evolved into something greater. With the right partner we’ve learned that sentiment is the product of an NLP capability. And using that capability across a broader set of language content creates analytic depth. It is depth that differentiates our analytics.
As a startup focused on alternative data and portfolio analytics, content is king. Again from Solving an Earnings Challenge, I wrote that content selection is paramount:
Selecting data is a critical input to building portfolio analytics. The solution’s entire worth is data dependent.
As data selection is to portfolio analytics, partners are to businesses. The most important function of any company is talent acquisition, an HR term for deciding who you work with. This is true of internal employees and external partners. Companies do not have thoughts, feelings, or sentiment. Their people do. And so with whom you build your business determines your sentiment. And your outcome follows.
We feel very fortunate to have a strong and growing ecosystem of partners working with Point Focal. We are connected by a shared sentiment. And just as an analytics stack can be built by uniquely connecting technology components to create a system more valuable than its parts, so too can a product distribution stack be built by uniquely connecting complementary businesses into a network more valuable than their individual companies.
Our partnership with Brain epitomizes this alignment. And it is the result of solving one sentiment challenge after another until you have a sentiment solution.
I wrote about error correction in Pub-Sub Fit. Here its process has produced sentiment knowledge:
To understand sentiment analytics, understand sentiment production.
To understand sentiment production, understand its NLP engine.
Understanding the engine leads to other uses (leverage and scale).
Understanding other uses creates NLP analytic depth.
This has led us to a set of NLP analytics spanning single-stock, portfolio, and market sentiment, 10-K, 10-Q, and earnings calls language metrics, and smart thematic baskets built on processed language. Quantifying this content produces new ideas, signals, and insight for anyone who wants to understand how greed and fear can impact performance and risk.
Our sentiment path turns language into data into language². News stories into sentiment scores into quantitative research. Words to numbers to words.
It’s a long way from our first attempt to interpret the mood of a tweet.
And surely the product of our own greed and fear.
¹ “A slight aesthetic thing I believe in very strongly is the names of companies are often very predictive of future failure or success. PayPal was a very friendly name. It was the friend that helps you pay. Napster was a bad name. It was the music sharing site. You nap some music, you nap a kid. That sounds like a bad thing to be doing.” Peter Thiel, on company names.
² For the geeks: ABC > NLP > 123 > NLG* > DEF
* Natural Language Generation