Job postings data have become one of the most widely used sources of real-time labor market information. Researchers, policymakers, and workforce agencies increasingly rely on them to understand employer demand for skills and develop occupational projections and taxonomies. They do so, in part, because vendors have successfully marketed them as providing pinpoint-accurate real-time data.
But do those data live up to their promise?
Not entirely. Job ads contain considerable noise, much of it introduced at the source, when the ad is written. They also require researchers to convert unstructured text into reliable, structured data, which is no easy task. But they are nevertheless useful, especially when researchers are given access to unprocessed, transparent data.
Signal vs. Noise
The noise in job ads takes many forms, but perhaps the most glaring is inconsistency in the types of data that employers choose to include. Many job ads are missing key information about required skills and education. A recent analysis found that a typical job ad contains fewer than 200 words arguably related to skills, out of roughly 1,000 words on average, while some ads are virtually bare and contain little or no information on skills. A more rigorous analysis found that few ads contain useful salary information, and the Georgetown University Center on Education and the Workforce (CEW) previously found that only about 40% contain information about education level.
The sources of noise begin with the ads’ creation. When an HR department creates a job ad, they intend for it to solicit applications, not generate labor market information. Even so, we know of no analysis that explores the degree to which job ads lead to successful hires. And an HR department may be subject to various organizational mandates – as well as legal requirements, which may differ substantially across state lines – that have little connection to the actual job. In addition, some firms post job openings even when there is no actual job to fill (“résumé harvesting”) or when the preferred candidate has been predetermined, while others hire without ever posting open positions. Using data from the U.S. Bureau of Labor Statistics’ Job Openings and Labor Turnover Survey (JOLTS), Steven Davis and colleagues found that “establishments reporting zero vacancies at month’s end” make up more than 40% “of all hires in the next month.” This suggests that hiring depends heavily on passive recruiting or nontraditional channels rather than on job postings.
Ultimately, vendors need to translate job ads into usable data. Most vendors are not entirely transparent about how, or how well, they accomplish this; for instance, they do not publicize their error rates. Nonetheless, job ads do contain useful signals. In a 2014 report on online job postings, CEW compared job ads to JOLTS data and concluded that the two series move together, indicating that job ads can be a good indicator of demand. The same report found that the most reliable data fields within job ads were 70 to 80% accurate.
The public needs better information about the accuracy of job ad–based labor market information, especially as it grows in importance. The Bureau of Labor Statistics should provide guidance on the accuracy of this information. Job postings data could provide answers to many questions, but if the data elements derived from job ads have low reliability, any combination of these elements will be even less reliable.
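The compounding effect can be illustrated with simple arithmetic. The sketch below assumes that errors in different fields are independent, which is a simplifying assumption and not something the underlying reports establish; the function name `joint_accuracy` is hypothetical. Under that assumption, the chance that every field in a combination is correct is the product of the individual accuracies.

```python
from math import prod


def joint_accuracy(field_accuracies):
    """Probability that all fields are correct, ASSUMING independent errors."""
    return prod(field_accuracies)


# Two fields, each at the 70% to 80% accuracy reported for the most reliable fields
low = joint_accuracy([0.70, 0.70])
high = joint_accuracy([0.80, 0.80])
# Three fields, each at 80%
three = joint_accuracy([0.80, 0.80, 0.80])

print(f"two fields combined: {low:.2f} to {high:.2f}")    # 0.49 to 0.64
print(f"three fields combined at 0.80 each: {three:.3f}")  # 0.512
```

If errors are positively correlated across fields, the combined accuracy will be higher than this product, but it can never exceed the accuracy of the least reliable field.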
From Text to Data: Giving Job Postings a Second Life in Research
The harsh reality for researchers and analysts is that job postings, like other sources of non-survey research data, were never intended to make their way into datasets used for research. They are written to attract applicants and comply with organizational and legal norms, with little thought given to providing standardized, research-ready, or machine-readable descriptions of jobs. As a result, postings vary widely in structure, terminology, and level of detail. Titles are inconsistent, requirements are embedded in free text, and an advertisement may reflect HR conventions rather than the day-to-day reality of work. For those experienced in working with these data, this lack of structure is not an anomaly but a defining feature.
When handled thoughtfully, job postings can offer something truly valuable to those looking for granular insights into labor demand and the nature of work. They provide timely signals about employer preferences, emerging skills, and changing credential requirements at a scale and level of detail that few other data sources can match. Used for trend analysis, comparison across regions or occupations, policy development, or building career pathways, they can meaningfully inform research and decision-making. While researchers have an obligation to ensure that published research based on job postings data does not gloss over limitations, there is a compelling argument that these advantages outweigh the limitations noted previously.
The value of job postings is strengthened when combined with more traditional survey-based sources of labor market information. To fully realize that value requires clear standards and openness. Researchers need access to unprocessed, transparent data as a source of truth, along with clear assumptions and tools that identify measurement error rather than obscuring it. The National Labor Exchange (NLx) Research Hub deserves special mention in this regard as a resource that provides affordable access to a large corpus of job postings data built on TRUST (Traceable, Reliable, Usable, Supervised, and Transparent) for research use. We can unlock the power of job postings not by pretending they are flawless, but by being honest about the nature of the data and leveraging modern methodologies to increase their utility.
Challenges with Establishing a “Ground Truth” for Job Postings
Gaining access to job postings from a dataset like the NLx Research Hub is an essential first step in the research process, but researchers also need to make sense of what at first will feel like drinking from a firehose of unstructured data. They need a standard set of tools to add structure to their data and to extract insights from it using standardized codes for tasks, skills, and occupations. The open-source Job Ad Analysis Toolkit at Loyola University Chicago, developed in conjunction with the Technical University of Munich, addresses this issue using job postings from the NLx Research Hub. Given any job posting as an input, these tools produce standard and structured labels for tasks, skills, occupations, and more.
The challenge? The community of workforce and labor market researchers and analysts lacks public benchmarks, agreed-upon frameworks, and prior work for assessing the ground truth of labor market information tools or products. This working paper addresses validity directly. For example, to test the accuracy of tools that map job titles to occupational codes, researchers built a dataset of 65,645 pairs of job titles and occupation codes from public employer filings from 2008 to 2024. They developed a specialized model, TitleMatch, to quickly code job titles to occupations by embedding a job title and finding the most semantically similar entry on O*NET’s standard list of sample and alternate job titles. They then tested TitleMatch against three other public models: SOCcer, NIOCCS, and SOCkit.
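The general idea of matching a title to its nearest reference entry can be sketched in a few lines. This is not the TitleMatch implementation: the toy `embed` function below uses character trigram counts as a stand-in for a learned semantic embedding, the three-entry reference list is invented for illustration and is not the O*NET title list, and all function names are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical reference list of (title, SOC code) pairs; NOT the O*NET list.
REFERENCE_TITLES = [
    ("Software Developer", "15-1252"),
    ("Registered Nurse", "29-1141"),
    ("Accountant", "13-2011"),
]


def embed(title, n=3):
    """Toy embedding: character n-gram counts (stand-in for a learned model)."""
    text = f" {title.lower().strip()} "
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_title(title, reference=REFERENCE_TITLES):
    """Return the (reference title, code) pair most similar to the input title."""
    query = embed(title)
    return max(reference, key=lambda entry: cosine(query, embed(entry[0])))


print(match_title("Sr. Software Developer II"))  # ('Software Developer', '15-1252')
print(match_title("Staff Accountant"))           # ('Accountant', '13-2011')
```

A nearest-neighbor design like this always returns some code, even for a title that resembles nothing on the reference list, which is one reason benchmark testing against known title and code pairs matters.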
Across all models, accuracy at the detailed six-digit level ranges between 30% and 49%. While TitleMatch consistently achieved high scores, the broader takeaway is that, despite being common practice, coding occupations from job titles alone is not enough. Because titles are ambiguous, future tools that want to improve beyond the roughly 70% accuracy ceiling of title-based classification will need to analyze the actual tasks and context of a posting.
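Accuracy figures for occupation coding depend on the level of the SOC hierarchy at which a match is counted. The sketch below shows one way to score predictions at the major-group (two-digit), minor-group (three-digit), broad (five-digit), and detailed (six-digit) levels. The prediction pairs are invented for illustration and are not results from the working paper; the function names are hypothetical.

```python
def truncate_soc(code, digits):
    """Keep the first `digits` digits of a SOC code such as '15-1252'."""
    return code.replace("-", "")[:digits]


def accuracy_at_level(pairs, digits):
    """Share of (predicted, true) pairs that agree on the first `digits` digits."""
    hits = sum(
        truncate_soc(predicted, digits) == truncate_soc(true, digits)
        for predicted, true in pairs
    )
    return hits / len(pairs)


# Hypothetical (predicted, true) pairs, invented for illustration only
PAIRS = [
    ("15-1252", "15-1252"),  # exact six-digit match
    ("15-1251", "15-1252"),  # agrees through five digits
    ("15-1211", "15-1252"),  # agrees through three digits
    ("15-2031", "15-1252"),  # agrees on the two-digit major group only
    ("13-2011", "15-1252"),  # wrong major group
]

for label, digits in [("major", 2), ("minor", 3), ("broad", 5), ("detailed", 6)]:
    print(f"{label:<8} ({digits}-digit): {accuracy_at_level(PAIRS, digits):.0%}")
# major 80%, minor 60%, broad 40%, detailed 20%
```

Because any six-digit match is also a match at every coarser level, accuracy can only stay the same or rise as codes are truncated, so a reported accuracy figure is hard to interpret without the level at which it was scored.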
Technical exercises such as these speak to the need for a more open approach to workforce data and for more rigorous research. Even with noisy extraction, error can be addressed once identified. And when data are aggregated using standard procedures, consumers can know the reliability of any estimates produced. Investments need to be made to bring meaning and useful information out of the truly vast corpus of unstructured data that exists in job posting repositories, and we would certainly benefit from many more open-source tools. It would take significant effort to code job postings more accurately to occupations and to the actual location of work (especially as hybrid and remote work become a permanent feature of the labor market), and then to match them to standardized lists of employers and required credentials. By publishing benchmarks and tools that enable common assessments of accuracy, we can move toward an ecosystem where researchers and policymakers can evaluate the trustworthiness of data. The Job Ad Analysis Toolkit (JAAT) and validation work suggest a path forward to make labor market information capable of capturing the dynamism, diversity, and details of the workforce itself.
Conclusion
Job postings have real potential for researchers interested in a wide range of labor market phenomena such as employer demand for skills and credentials. They present a view of the labor market that is inherently biased by the interests of the HR departments and hiring managers who create them, but the insights they provide have no real substitute in other data sources. The research community and the many philanthropic funders working to ensure that labor market information serves the public good should demand more transparency from data vendors, invest in validation studies comparing posted requirements to actual hires, and adopt open-source tools like JAAT that make their methods and error rates transparent.
However, researchers should not be afraid to use job ads as data, so long as they are realistic about what they are and what they are not and are careful to ensure that job ads are the right option for answering their specific research questions.
