Embedding regression: models for context-specific description and inference
By Pedro L. Rodriguez, New York University, Arthur Spirling, New York University, and Brandon M. Stewart, Princeton University
Social scientists commonly seek to make statements about how word use varies across circumstances, including time, partisan identity, or some other document-level covariate. For example, researchers might wish to know how Republicans and Democrats diverge in their understanding of the term “immigration.” Building on the success of pretrained language models, we introduce the à la carte on text (conText) embedding regression model for this purpose. This fast and simple method produces valid vector representations of how words are used, and thus what words “mean,” in different contexts. We show that it outperforms slower, more complicated alternatives and works well even with very few documents. The model also allows for hypothesis testing and statements about statistical significance. We demonstrate that it can be used for a broad range of important tasks, including understanding US polarization, historical legislative development, and sentiment detection. We provide open-source software for fitting the model.