Machine Learning Economist
Federal Reserve Bank of Philadelphia
10 N Independence Mall W Philadelphia, PA 19106
PEAD.txt: Post-Earnings Announcement Drift Using Text (with Pierre Liang, Bryan Routledge and Madeline Scanlon) Link
Forthcoming at JFQA
We construct a new numerical measure of earnings announcement surprises, standardized unexpected earnings call text (SUE.txt), that does not explicitly incorporate the reported earnings value. SUE.txt generates a text-based post-earnings announcement drift (PEAD.txt) larger than the classic PEAD and can be used to create a profitable trading strategy. Leveraging the prediction model underlying SUE.txt, we propose new tools to study the news content of text: paragraph impact and a paragraph classification scheme based on the business curriculum. With these tools, we document many asymmetries in the distribution of news across content types, demonstrating that earnings calls contain a wide range of news about firms and their environment.
One Threshold Doesn't Fit All: Tailoring Machine Learning Predictions of Consumer Default for Lower-Income Areas Link
Modeling advances create credit scores that predict default better overall, but raise concerns about their effect on protected groups. Focusing on low- and moderate-income (LMI) areas, we use an approach from the Fairness in Machine Learning literature — fairness constraints via group-specific prediction thresholds — and show that gaps in true positive rates (% of non-defaulters identified by the model as such) can be significantly reduced if separate thresholds can be chosen for non-LMI and LMI tracts. However, the reduction isn’t free as more defaulters are classified as good risks, potentially affecting both consumers’ welfare and lenders’ profits. The trade-offs become more favorable if the introduction of fairness constraints is paired with the introduction of more sophisticated models, suggesting a way forward. Overall, our results highlight the potential benefits of explicitly considering sensitive attributes in the design of loan approval policies and the potential benefits of output-based approaches to fairness in lending.
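The group-specific threshold idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the scores, labels, common cutoff of 0.5, and the helper names `rate_above` and `threshold_for_tpr` are all hypothetical, and the score is assumed to be a predicted probability of non-default (label 1 = non-defaulter, the "positive" class).

```python
import math

def rate_above(scores, labels, threshold, label):
    """Share of applicants with the given true label whose score clears the threshold."""
    group = [s for s, y in zip(scores, labels) if y == label]
    return sum(s >= threshold for s in group) / len(group)

def threshold_for_tpr(scores, labels, target):
    """Highest threshold at which the true positive rate is at least `target`."""
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    # Number of true non-defaulters that must be approved (small tolerance for float error).
    k = max(1, math.ceil(target * len(positives) - 1e-9))
    return positives[k - 1]

# Hypothetical toy data: predicted probability of non-default and true outcome.
non_lmi_scores = [0.95, 0.90, 0.85, 0.80, 0.45, 0.40, 0.30, 0.20]
non_lmi_labels = [1, 1, 1, 1, 1, 0, 0, 0]
lmi_scores = [0.80, 0.70, 0.55, 0.45, 0.35, 0.48, 0.38, 0.10]
lmi_labels = [1, 1, 1, 1, 1, 0, 0, 0]

common = 0.5  # assumed single approval cutoff applied to both groups

non_lmi_tpr = rate_above(non_lmi_scores, non_lmi_labels, common, 1)
lmi_tpr_common = rate_above(lmi_scores, lmi_labels, common, 1)
lmi_fpr_common = rate_above(lmi_scores, lmi_labels, common, 0)

# Tailor the LMI cutoff so that its true positive rate matches the non-LMI rate.
lmi_threshold = threshold_for_tpr(lmi_scores, lmi_labels, non_lmi_tpr)
lmi_tpr_tailored = rate_above(lmi_scores, lmi_labels, lmi_threshold, 1)
lmi_fpr_tailored = rate_above(lmi_scores, lmi_labels, lmi_threshold, 0)

print("TPR gap, common cutoff:", non_lmi_tpr - lmi_tpr_common)
print("TPR gap, tailored cutoff:", non_lmi_tpr - lmi_tpr_tailored)
print("LMI share of defaulters approved:", lmi_fpr_common, "->", lmi_fpr_tailored)
```

On this toy data the tailored cutoff closes the true positive rate gap, while the share of LMI defaulters classified as good risks rises, which mirrors the trade-off the abstract describes.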
Corporate Disclosure: Facts or Opinions? (with Shimon Kogan)
A large body of literature documents the link between textual communication (e.g., news articles, earnings calls) and firm fundamentals, either through pre-defined “sentiment” dictionaries or through machine learning approaches. Surprisingly, little is known about why textual communication matters. In this paper, we take a step in that direction by developing a new methodology to automatically classify statements into objective (“facts”) and subjective (“opinions”) and apply it to transcripts of earnings calls. The large-scale estimation suggests several novel results: (1) facts and opinions are both prominent parts of corporate disclosure, taking up roughly equal parts, (2) higher prevalence of opinions is associated with investor disagreement, (3) anomaly returns are realized around the disclosure of opinions rather than facts, and (4) facts have a much stronger correlation with contemporaneous financial performance, but facts and opinions have an equally strong association with financial results for the next quarter.
The Language of Earnings Announcements Link
This study quantifies and characterizes the information content of earnings announcement language via a statistical model of language that extracts the latent factors most associated with absolute returns around the time earnings announcements are released. The language of earnings announcements explains 11% of the variation in absolute announcement returns out-of-sample. That is comparable to the explanatory power of standard numerical variables. Using the latent factors to recover the features that are important, we show that the information content depends on what is mentioned, how it is mentioned, and where in a document it is mentioned. Findings show that earnings components are more important than bottom line net income. Sentiment and forward-lookingness amplify the information content of all themes, and information content is more concentrated at the beginnings of texts.
Bank Credit Supply and Shadow Mortgage Lending SSRN Page
With a novel application of a simple supply-demand decomposition methodology to residential mortgage markets, I analyze the role of bank credit supply, shadow lenders' own supply, and local demand in lending growth by shadow mortgage companies. I show that shadow lending grew faster in counties exposed to increases in bank credit supply. At the same time, shadow firms' own supply shocks explain more variation in shadow lending growth than bank supply shocks. These results suggest that shadow lenders have operational advantages over banks, but are also connected to them, perhaps via warehouse lines of credit.
Work in Progress
Structural Model of Narrative Disclosure (with Pierre Liang and Bryan Routledge)
We go beyond statistical models of language and analyze earnings press releases in the strategic reporting context. We combine a deep neural net that generates earnings announcement texts with a utility model in which the manager maximizes the short-term price impact of language subject to constraints. In counterfactual analyses, we examine how the language changes when we modify the weight of the price maximization objective relative to the constraints.
Analyst-specific Use of Public Data
We study analyst-specific processing of publicly available data as a source of analyst disagreement. In our model, analysts form earnings forecasts based on publicly available data. Analysts following the same firm can arrive at different estimates because they put different weights on different kinds of signals. Estimating analyst-specific models allows us to classify analysts, for example, into "macro followers" and "stock market followers." The goal is to examine the relationship between analysts' use of public data and forecast accuracy, as well as market reactions to forecasts of analysts employing different models.
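One way to picture analyst-specific weights is a toy example in which each analyst's forecast is a linear combination of two public signals, and the weights are recovered by least squares. This is only a sketch under strong assumptions: the linear form, the two signals, the noise-free data, and the names `fit_two_signal_weights` and `classify` are hypothetical and are not taken from the project.

```python
def fit_two_signal_weights(macro, market, forecasts):
    """Least-squares weights (no intercept) on a macro signal and a market signal."""
    a = sum(m * m for m in macro)
    b = sum(m * s for m, s in zip(macro, market))
    d = sum(s * s for s in market)
    p = sum(m * f for m, f in zip(macro, forecasts))
    q = sum(s * f for s, f in zip(market, forecasts))
    det = a * d - b * b
    if det == 0:
        raise ValueError("signals are collinear; weights are not identified")
    # Closed-form solution of the 2x2 normal equations.
    return (d * p - b * q) / det, (a * q - b * p) / det

def classify(w_macro, w_market):
    """Label an analyst by the signal that carries the larger absolute weight."""
    return "macro follower" if abs(w_macro) >= abs(w_market) else "stock market follower"

# Hypothetical public signals observed over five periods.
macro = [1.0, 2.0, 0.5, -1.0, 0.0]
market = [0.5, -1.0, 2.0, 1.0, -0.5]

# Hypothetical forecasts from two analysts who weight the same signals differently.
analyst_a = [0.8 * m + 0.2 * s for m, s in zip(macro, market)]
analyst_b = [0.1 * m + 0.9 * s for m, s in zip(macro, market)]

weights_a = fit_two_signal_weights(macro, market, analyst_a)
weights_b = fit_two_signal_weights(macro, market, analyst_b)

print("Analyst A:", weights_a, classify(*weights_a))
print("Analyst B:", weights_b, classify(*weights_b))
```

Both analysts see identical data, yet their forecasts differ because of the weights, which is the source of disagreement the abstract points to.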
Identifying Non-GAAP reporting in earnings announcement text
We develop a Natural Language Processing pipeline for identifying Non-GAAP reporting in earnings announcement text. First, we identify the mentions of specific non-GAAP items such as "adjusted earnings" or "pro forma revenues." Second, we identify the mentions of their closest GAAP counterparts. Third, we identify the exclusions or additions used to calculate the non-GAAP number. Our system will be used to create a comprehensive dataset. That dataset will increase the potential scope of non-GAAP reporting research by expanding the dataset size and granularity. Possible avenues include studies of the prevalence of various non-GAAP measures and their emphasis relative to the GAAP measures.
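As a rough illustration of the first step only (spotting mentions of non-GAAP items), a simple pattern-matching baseline might look like the sketch below. The modifier and metric lists, the pattern, and the function name `find_non_gaap_mentions` are hypothetical; the actual pipeline is not described in enough detail here to reproduce, and the second and third steps are not covered.

```python
import re

# Hypothetical, deliberately short word lists; a real system would need far more coverage.
MODIFIERS = r"(?:adjusted|pro[ -]forma|non-GAAP)"
METRICS = r"(?:earnings per share|earnings|net income|revenues?|EBITDA|EPS)"
PATTERN = re.compile(rf"\b{MODIFIERS}\s+{METRICS}\b", re.IGNORECASE)

def find_non_gaap_mentions(text):
    """Return (lowercased phrase, character offset) for each candidate non-GAAP mention."""
    return [(m.group(0).lower(), m.start()) for m in PATTERN.finditer(text)]

sample = ("Adjusted earnings rose 5%, while pro forma revenues were flat. "
          "GAAP net income fell.")
mentions = find_non_gaap_mentions(sample)

for phrase, offset in mentions:
    print(offset, phrase)
```

A keyword baseline like this misses paraphrases and cannot link a non-GAAP item to its GAAP counterpart, which is presumably why a fuller pipeline is needed.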