A Web-Based Approach to Measure Skill Mismatches and Skills Profiles for a Developing Country:. Jeisson Arley Cárdenas Rubio
have used and derived conclusions from job portal data without considering in detail the possible biases and limitations of this information (e.g. Backhaus 2004; Kureková, Beblavy, and Thum 2016; Kennan et al. 2008). Like any other source of data, information from job portals has biases and limitations. For instance, because internet users are a selective group, and because of other data quality issues, job portals are unlikely to be representative of the whole economy or of a specific sector, and they might not reflect real trends in labour demand. The lack of debate concerning data validity has affected the credibility of job portals as a consistent and useful resource for labour market analysis.
A conceptual and methodological framework is required in order to use vacancy data and to properly address issues such as skill mismatches. Therefore, this book seeks a better understanding of how new sources such as job portals can be used to analyse the labour market, and skill mismatches in particular, in a developing country such as Colombia. This study responds to the need to develop a more efficient way to collect and analyse information about labour demand and skills in order to identify potential skill shortages. This kind of work supports the design of national skills strategies, while enhancing the capacity of governments to develop public policies to tackle current skill mismatches (Cedefop 2012a).
To this end, this book is structured as follows: Chapter 2 discusses the concepts and theoretical framework used in this document to analyse the labour market based on the information found on online job portals. First, this chapter introduces basic conceptual and statistical definitions for labour demand (e.g. job vacancies) and labour supply (e.g. unemployed and employed workers). Second, given that a considerable share of the population in Colombia works in irregular market conditions, this chapter discusses what is understood in the academic literature by informality. Furthermore, the concept of skills and different ways to measure them for economic analysis are examined. Subsequently, the previously mentioned definitions are used to describe the dynamics of the labour market and its main outcomes, such as unemployment and wages, under the assumption of perfect competition (e.g. assuming that companies and workers are perfectly informed about the quality and the price of “labour”). Nevertheless, the assumptions of perfect competition are unrealistic given that workers are usually not perfectly aware of employer skill requirements; similarly, this model is not an appropriate theoretical framework for economies such as Colombia (Garibaldi 2006). Based on a model with imperfect information (which seems more appropriate to describe Colombian labour market outcomes), Chapter 2 explains how skill mismatches can arise, as well as their consequences for informality and unemployment rates (Bosworth, Dawkins, and Stromback 1996; Reich, Gordon, and Edwards 1973; Stiglitz et al. 2013). This framework highlights that information failures might be one of the leading causes of high unemployment and informality rates. Thus, actions to decrease these information failures (such as the use of job portals) will considerably improve people’s employability.
Chapter 3 presents evidence that skill shortages, unemployment, and informality are common phenomena in Colombia (DANE 2017a; ManpowerGroup, n.d.; Arango and Hamann 2013). Moreover, it outlines how the government, as well as education and training providers, face severe difficulties in tackling these issues due to the lack of a proper system to identify skills in demand and possible skill shortages (González-Velosa and Rosas-Shady 2016). First, the chapter describes the main characteristics of the Colombian labour market, such as unemployment and informality, and their evolution during the last two decades. In addition, it provides a general description of the socio-economic characteristics of the labour force and (based on the little information available) of the labour demand. Second, it provides evidence of a high incidence of skill shortages in Colombia and discusses their possible implications for labour market outcomes. It is argued that workers, education and training providers, as well as the government can do little to address these issues given the lack of proper information to monitor and identify employer requirements and possible skill shortages at the occupational level. Subsequently, the chapter presents an overview of the Colombian labour market focused on unemployment, informality, and skill shortages, and highlights the need for detailed information to adequately address these issues.
In Chapter 4, the concept of Big Data is introduced, and its advantages and limitations for labour market analysis are outlined. Moreover, this chapter explains why traditional statistical methods, such as household or sectoral surveys, encounter difficulties in providing detailed information about the labour market. First, it defines Big Data according to three properties: volume, variety, and velocity (Laney 2001). Then, it discusses the problems of traditional statistical methods, such as sample or survey design, that constrain labour market analysis in terms of occupations and skills (Kureková, Beblavy, and Thum 2014; Reimsbach-Kounatze 2015). Given these information gaps, the potential use of Big Data sources to complement labour market analysis is discussed, with a special focus on job portals and their possible application to tackle skill shortages. Subsequently, this chapter explains the limitations and caveats to be considered when online vacancy data are used for economic analysis. Furthermore, it emphasises the features that distinguish this book from other ongoing studies.
Once the conceptual framework and the need for information and analysis to address skill shortages are established, Chapters 5 and 6 present a comprehensive methodology to systematically collect and standardise vacancy information from job portals. Chapter 5 describes the information that can be collected from Colombian job portals. Then, it proposes selection criteria based on the volume of information on each job portal and on each website’s quality and traffic ranking, in order to identify the most important and reliable job portals for an analysis of the labour demand in Colombia. Subsequently, Chapter 5 describes the methodology (web scraping) and the challenges involved in automatically and rapidly collecting a massive number of online job vacancies. The chapter also explains the methods that can be used to homogenise variables such as education and experience and to consolidate information from job portals into a single database.
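To make the idea of homogenising variables more concrete, the following is a minimal sketch, in Python, of how free-text education and experience requirements might be mapped to common categories. It is not the procedure used in Chapter 5: the function names, the regular expressions, and the category labels are all hypothetical and serve only as an illustration.

```python
import re

# Hypothetical patterns mapping free-text phrases to harmonised education
# levels, ordered from highest to lowest level. Illustrative only.
EDUCATION_PATTERNS = [
    (r"\b(doctorado|phd|doctorate)\b", "doctorate"),
    (r"\b(maestr[ií]a|master'?s?)\b", "master"),
    (r"\b(profesional|universitari[oa]|bachelor'?s?)\b", "undergraduate"),
    (r"\b(t[eé]cnic[oa]|tecn[oó]log[oa]|technician)\b", "technical"),
    (r"\b(bachiller|secundaria|high school)\b", "secondary"),
]


def harmonise_education(text):
    """Return the first harmonised level whose pattern matches, else None."""
    lowered = text.lower()
    for pattern, level in EDUCATION_PATTERNS:
        if re.search(pattern, lowered):
            return level
    return None


def experience_in_months(text):
    """Convert phrases such as '2 años' or '6 meses' into months, else None."""
    match = re.search(r"(\d+)\s*(años?|years?|mes(?:es)?|months?)", text.lower())
    if match is None:
        return None
    amount, unit = int(match.group(1)), match.group(2)
    # Units starting with 'a' (años) or 'y' (years) are expressed in years.
    return amount * 12 if unit.startswith(("a", "y")) else amount
```

For instance, under these assumed patterns `harmonise_education("Título profesional en ingeniería")` yields `"undergraduate"`, and `experience_in_months("Mínimo 2 años de experiencia")` yields `24`. A real implementation would need a much richer set of patterns and a rule for vacancies that mention several levels.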
Next, Chapter 6 illustrates the methods and challenges involved in standardising two of the most relevant variables for the economic analysis of the labour market: skills and occupations. Furthermore, this chapter examines the issues of duplication and missing values, which are some of the main concerns when analysing information from job portals. First, the chapter develops a method to automatically identify skill patterns in job vacancy descriptions based on international skill descriptors and text mining. Second, it proposes and applies a novel mixed-method approach (software classifiers and machine learning algorithms) to properly classify job titles into occupations. Third, as an employer might advertise the same job many times on the same job portal or on different job portals, the chapter identifies and minimises the issue of duplication. It also explains how missing values were imputed for the “educational requirement” and “wage offered” variables (which are relevant to test the validity of the vacancy database and to analyse labour demand) by using predictors such as occupation, city, and experience requirements. As a result of the above methods, a Colombian vacancy database is generated in Chapter 6 to be tested and analysed to address skill shortage issues.
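As a rough illustration of two of the steps just described, the sketch below tags skills in a vacancy description by matching it against a small dictionary of phrases, and removes repeated advertisements using a normalised key. It is a simplified stand-in and not the method developed in Chapter 6: the dictionary `SKILL_DESCRIPTORS`, the choice of title, company, and city as the duplicate key, and all function names are assumptions made for this example.

```python
import re
import unicodedata

# Hypothetical skill dictionary. The book relies on international skill
# descriptors, whose actual content is not reproduced here.
SKILL_DESCRIPTORS = {
    "customer service": ["customer service", "servicio al cliente"],
    "spreadsheets": ["excel", "hojas de calculo"],
    "teamwork": ["teamwork", "trabajo en equipo"],
}


def normalise(text):
    """Lowercase, strip accents, and collapse repeated whitespace."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", no_accents).strip()


def find_skills(description):
    """Return the set of skills whose phrases appear in the description."""
    text = normalise(description)
    found = set()
    for skill, phrases in SKILL_DESCRIPTORS.items():
        if any(re.search(r"\b" + re.escape(p) + r"\b", text) for p in phrases):
            found.add(skill)
    return found


def drop_duplicates(vacancies):
    """Keep the first vacancy for each normalised (title, company, city) key."""
    seen = set()
    unique = []
    for vacancy in vacancies:
        key = tuple(normalise(vacancy[f]) for f in ("title", "company", "city"))
        if key not in seen:
            seen.add(key)
            unique.append(vacancy)
    return unique
```

Exact matching on a normalised key only catches near-identical advertisements. Vacancies reposted with reworded titles, or published on different portals with different company names, would require fuzzier comparison, which is outside the scope of this sketch.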
Subsequently, a comprehensive descriptive analysis of the Colombian labour demand is conducted in Chapter 7. First, the analysis describes the job portals selected to build the aforementioned vacancy database, as well as their geographic distribution. Second, it provides a detailed descriptive analysis of the labour demand for skills in Colombia, covering education, occupational structure, potential new occupations, and skills and experience requirements. This description reveals characteristics of the labour demand that were unknown prior to this study. Third, this chapter examines the most notable labour demand trends by occupation: those with higher demand, those with a higher