# Research

## Working Papers

**Distilling Data from Large Language Models: An Application to Research Productivity Measurement**

(with Maya M. Durvasula and Sabri Eyuboglu)

Last updated: September, 2024

[ Abstract | arXiv | Data ]

We develop a method for assigning high-quality labels to unstructured text. The method is based on fine-tuning an open-source language model with data extracted from a proprietary large language model. We apply this method to construct a census of published clinical trials. We revisit a literature that contends that pharmaceutical research productivity is declining, based on measured increases in the quantity of clinical trials, which outpace trends in output. In our data, the quantity and composition of trials have been stable since 2010. Previous measurements are an artifact of biases driven by shifts in the composition of other forms of research.

**Simultaneous Inference for Local Structural Parameters with Random Forests**

(with Vasilis Syrgkanis)

Last updated: September, 2024

[ Abstract | arXiv ]

We construct simultaneous confidence intervals for solutions to conditional moment equations. The intervals are built around a class of nonparametric regression algorithms based on subsampled kernels. This class encompasses various forms of subsampled random forest regression, including Generalized Random Forests (Athey et al., 2019). Although simultaneous validity is often desirable in practice (for example, for fine-grained characterization of treatment effect heterogeneity), only confidence intervals that confer pointwise guarantees were previously available. Our work closes this gap. As a by-product, we obtain several new order-explicit results on the concentration and normal approximation of high-dimensional U-statistics.

**Randomization Inference: Theory and Applications**

(with Joseph P. Romano and Azeem M. Shaikh)

Last updated: June, 2024

[ Abstract | arXiv ]

We review approaches to statistical inference based on randomization. Permutation tests are treated as an important special case. Under a certain group invariance property, referred to as the "randomization hypothesis," randomization tests achieve exact control of the Type I error rate in finite samples. Although this unequivocal precision is very appealing, the range of problems that satisfy the randomization hypothesis is somewhat limited. We show that randomization tests are often asymptotically, or approximately, valid and efficient in settings that deviate from the conditions required for finite-sample error control. When randomization tests fail to offer even asymptotic Type I error control, their asymptotic validity may be restored by constructing an asymptotically pivotal test statistic. Randomization tests can then provide exact error control for tests of highly structured hypotheses with good performance in a wider class of problems. We give a detailed overview of several prominent applications of randomization tests, including two-sample permutation tests, regression, and conformal inference.
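
As a rough illustration of the two-sample permutation test mentioned in this abstract, the sketch below computes a Monte Carlo p-value for a difference in means. It is not taken from the paper: the function name `permutation_p_value`, the choice of test statistic, and the defaults for `n_permutations` and `seed` are all assumptions made for this example.

```python
import random
from statistics import mean


def permutation_p_value(x, y, n_permutations=2000, seed=0):
    """Monte Carlo two-sided permutation p-value for a difference in means.

    Hypothetical helper for illustration; the statistic and defaults are
    assumptions, not the paper's recommendations.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n_x = len(x)
    observed = abs(mean(x) - mean(y))

    exceed = 0
    for _ in range(n_permutations):
        # Reassign group labels uniformly at random by shuffling the pooled data.
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if stat >= observed:
            exceed += 1

    # Adding one to numerator and denominator counts the observed assignment,
    # which keeps the Monte Carlo p-value valid under exchangeability.
    return (exceed + 1) / (n_permutations + 1)
```

If the two samples are exchangeable under the null hypothesis, rejecting when this p-value is at most the nominal level controls the Type I error rate in finite samples. When the samples differ in ways other than the mean (unequal variances, for example), the unstudentized difference in means used here is the kind of statistic for which an asymptotically pivotal alternative may be needed.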

**Reproducible Aggregation of Sample-Split Statistics**

(with Joseph P. Romano)

Revise and Resubmit, *American Economic Review*

Last updated: November, 2023

[ Abstract | arXiv ]

Statistical inference is often simplified by sample-splitting. This simplification comes at the cost of the introduction of randomness not native to the data. We propose a simple procedure for sequentially aggregating statistics constructed with multiple splits of the same sample. The user specifies a bound and a nominal error rate. If the procedure is implemented twice on the same data, the nominal error rate approximates the chance that the results differ by more than the bound. We analyze the accuracy of the nominal error rate and illustrate the application of the procedure to several widely applied statistical methods.

## Journal Articles

**Semiparametric Estimation of Long-Term Treatment Effects**

(with Jiafeng Chen)

*Journal of Econometrics*. 237 (2). December 2023.

[ Abstract | arXiv | Software ]

Long-term outcomes of experimental evaluations are necessarily observed after long delays. We develop semiparametric methods for combining the short-term outcomes of experiments with observational measurements of short-term and long-term outcomes, in order to estimate long-term treatment effects. We characterize semiparametric efficiency bounds for various instances of this problem. These calculations facilitate the construction of several estimators. We analyze the finite-sample performance of these estimators with a simulation calibrated to data from an evaluation of the long-term effects of a poverty alleviation program.

**Confidence Intervals for Seroprevalence**

(with Thomas J. DiCiccio, Joseph P. Romano, and Azeem M. Shaikh)

*Statistical Science*. 37 (3). August 2022.

[ Abstract | arXiv ]

This paper concerns the construction of confidence intervals in standard seroprevalence surveys. In particular, we discuss methods for constructing confidence intervals for the proportion of individuals in a population infected with a disease using a sample of antibody test results and measurements of the test's false positive and false negative rates. We begin by documenting erratic behavior in the coverage probabilities of standard Wald and percentile bootstrap intervals when applied to this problem. We then consider two alternative sets of intervals constructed with test inversion. The first set of intervals are approximate, using either asymptotic or bootstrap approximation to the finite-sample distribution of a chosen test statistic. We consider several choices of test statistic, including maximum likelihood estimators and generalized likelihood ratio statistics. We show with simulation that, at empirically relevant parameter values and sample sizes, the coverage probabilities for these intervals are close to their nominal level and are approximately equi-tailed. The second set of intervals are shown to contain the true parameter value with probability at least equal to the nominal level, but can be conservative in finite samples.
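
For orientation, the standard identity linking the proportion of positive tests to the infected proportion can be written as follows. The symbols are assumptions introduced here, not the paper's notation: $\pi$ is the prevalence, $\alpha$ the false positive rate, $\beta$ the false negative rate, and $q$ the probability that a randomly sampled individual tests positive.

$$
q = \pi (1 - \beta) + (1 - \pi)\,\alpha
\quad\Longrightarrow\quad
\pi = \frac{q - \alpha}{1 - \alpha - \beta},
\qquad \text{provided } \alpha + \beta < 1 .
$$

Replacing $q$, $\alpha$, and $\beta$ with sample estimates gives a plug-in point estimate of $\pi$, which can fall outside $[0, 1]$ when $q$ is close to $\alpha$.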

**Uncertainty in the Hot Hand Fallacy: Detecting Streaky Alternatives to Random Bernoulli Sequences**

(with Joseph P. Romano)

*The Review of Economic Studies*. Featured Article. 89 (2). March 2022.

[ Abstract | arXiv | Online Appendix ]

We study a class of permutation tests of the randomness of a collection of Bernoulli sequences and their application to analyses of the human tendency to perceive streaks of consecutive successes as overly representative of positive dependence (the hot hand fallacy). In particular, we study permutation tests of the null hypothesis of randomness (i.e., that trials are i.i.d.) based on test statistics that compare the proportion of successes that directly follow k consecutive successes with either the overall proportion of successes or the proportion of successes that directly follow k consecutive failures. We characterize the asymptotic distributions of these test statistics and their permutation distributions under randomness, under a set of general stationary processes, and under a class of Markov chain alternatives, which allow us to derive their local asymptotic power. The results are applied to evaluate the empirical support for the hot hand fallacy provided by four controlled basketball shooting experiments. We establish that substantially larger data sets are required to derive an informative measurement of the deviation from randomness in basketball shooting. In one experiment, for which we were able to obtain data, multiple testing procedures reveal that one shooter exhibits a shooting pattern significantly inconsistent with randomness, supplying strong evidence that basketball shooting is not random for all shooters all of the time. However, we find that the evidence against randomness in this experiment is limited to this shooter. Our results provide a mathematical and statistical foundation for the design and validation of experiments that directly compare deviations from randomness with human beliefs about deviations from randomness, and thereby constitute a direct test of the hot hand fallacy.
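
The first test statistic described in this abstract, and a permutation test built on it, might be sketched as below for a single 0/1 sequence. This is an illustrative sketch rather than the paper's implementation: the function names, the one-sided alternative, and the conservative handling of permutations in which the statistic is undefined are all assumptions.

```python
import random


def streak_difference(shots, k):
    """Proportion of successes directly after k consecutive successes,
    minus the overall proportion of successes. Assumes k >= 1.

    Returns None when no shot follows k consecutive successes.
    """
    followers = [
        shots[i] for i in range(k, len(shots)) if all(shots[i - k:i])
    ]
    if not followers:
        return None
    return sum(followers) / len(followers) - sum(shots) / len(shots)


def streak_permutation_p_value(shots, k, n_permutations=2000, seed=0):
    """One-sided Monte Carlo permutation p-value against streaky alternatives.

    Hypothetical helper; the defaults are assumptions for this example.
    """
    observed = streak_difference(shots, k)
    if observed is None:
        raise ValueError("statistic is undefined for the observed sequence")

    rng = random.Random(seed)
    shuffled = list(shots)
    exceed = 0
    for _ in range(n_permutations):
        # Under the i.i.d. null, every ordering of the sequence is equally likely.
        rng.shuffle(shuffled)
        stat = streak_difference(shuffled, k)
        # Assumed convention: an undefined statistic counts as at least as
        # extreme as the observed one, which can only enlarge the p-value.
        if stat is None or stat >= observed:
            exceed += 1
    return (exceed + 1) / (n_permutations + 1)
```

For example, `streak_difference([1, 1, 1, 0], 2)` compares the two shots that follow a pair of successes (one made, one missed) with the overall success rate of 0.75.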