A Web-Based Approach to Measure Skill Mismatches and Skills Profiles for a Developing Country
Jeisson Arley Cárdenas Rubio
ISIC International Standard Industrial Classification of All Economic Activities
ISO International Organization for Standardization
IT Information Technology
LASSO Least Absolute Shrinkage and Selection Operator
LEFM Local Economy Forecasting Model
LFS Labour Force Survey
LTDA Limitada (limited liability company)
MAC Migration Advisory Committee
MEN Ministerio de Educación Nacional de Colombia
N&E New and Emerging (Occupations)
NIF Normas de Información Financiera (Financial Reporting Standards)
NIIF Normas Internacionales de Información Financiera (International Financial Reporting Standards)
NOS National Occupational Standards
NQF National Qualifications Framework
OECD Organisation for Economic Co-operation and Development
OEI Organización de Estados Iberoamericanos
OLS Ordinary Least Squares
O*NET Occupational Information Network
ONS Office for National Statistics
OSP Occupational Skills Profiles
OVATE Skills Online Vacancy Analysis Tool
PHP Hypertext Preprocessor
PIAAC Programme for the International Assessment of Adult Competencies
PISA Programme for International Student Assessment
RSPO Roundtable on Sustainable Palm Oil
RUES Registro Único Empresarial (Unified Business Registry)
SENA Servicio Nacional de Aprendizaje
SEO Search Engine Optimization
SIC Standard Industrial Classification
SMEs Small and Medium-Sized Enterprises
SMMLV Salario mínimo mensual legal vigente (current legal monthly minimum wage)
SNIES Sistema Nacional de Información de Educación Superior
SNPP Sub-National Population Projections
SOC Standard Occupational Classification
SQA Software Quality Assurance/Advisor
SQL Structured Query Language
SST System Support Team
SSTA Gestión en seguridad, salud en el trabajo y ambiente (occupational safety, health, and environmental management)
STEP Skills Measurement Program
SVM Support Vector Machine
TAT Store-to-store (from the Spanish acronym for tienda a tienda)
TVET Technical and Vocational Education and Training
UAESPE Unidad Administrativa Especial del Servicio Público de Empleo
UK United Kingdom
UKCES UK Commission for Employment and Skills
US United States
VET Vocational Education and Training
XML Extensible Markup Language
This book studies how, and to what extent, a web-based system to monitor skills and skill mismatches could be developed for Colombia based on information from job portals. More specifically, it seeks to answer two questions: 1) How can information from job portals be used to inform policy recommendations? And, with a view to addressing two of the major labour market problems in Colombia, namely high unemployment and high informality, 2) to what extent can information from job portals (unsatisfied demand) and national household surveys (labour supply) be used together to provide insights into skill mismatch issues in a developing economy?
Consequently, this book investigates the challenges, advantages, and limitations of collecting information from job portals and proposes a framework to test the validity of this information for economic analysis. It conducts an innovative labour market analysis and develops indicators based on updated and robust labour demand (job portal) and labour supply (household survey) information to tackle skill mismatches, thus extending the use of novel sources of information to areas not yet explored in the existing labour economics literature.
In doing so, this study makes conceptual, methodological, and empirical contributions to the ongoing debate in economics about the use of information from job portals for labour demand analysis. The main conceptual contribution is to demonstrate that Big Data, both as a concept and as a source of information (in this case, job portals), can provide consistent results to guide public policy (see Chapters 7 to 9). This document also demonstrates that, with the proper techniques, information from job portals can meet the conceptual requirements for high-quality data for labour market analysis (see Chapters 4 and 10).
The main methodological contribution is the development of a detailed framework and set of methods to collect, clean, and organise vacancy data (e.g. web scraping and occupation and skill identification), which makes it possible to test and analyse this source of information for consistent labour market insights. Specifically, this book contributes to the methodology of processing information from job portals for public policy advice by: 1) discussing different criteria (volume, website quality, and traffic ranking) for selecting the most relevant and trustworthy job portals from which to collect vacancy information (Chapter 5); 2) providing a detailed explanation of Big Data techniques (web scraping) and the challenges they pose for automatically collecting job advertisements from job portals (Chapter 5); 3) applying mixed-methods approaches (text mining, word-based matching methods, etc.) to standardise information collected from different job portals into a single database for statistical analysis (Chapter 6); 4) implementing and extending a mixed-methods approach (stop words, stemming, extensions of a machine learning algorithm, etc.) to identify skills and occupations in online job advertisements (Chapter 6); and 5), importantly, using this extended mixed-methods approach (e.g. a skills dictionary to identify skill patterns) to find new or specific skills and occupations in the Colombian labour market that would otherwise be difficult to identify by other means, such as household surveys (Chapter 6).
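The dictionary-based part of items 4) and 5) can be illustrated with a minimal Python sketch. It is not the book's implementation: the skills dictionary (`SKILLS_DICTIONARY`), the tiny stop-word list, and the crude suffix-stripping `stem` function are all hypothetical stand-ins (vacancies from Colombian job portals would call for a full Spanish stop-word list and a Spanish stemmer such as Snowball), and the machine learning extensions mentioned above are not reproduced. Both the advert and every dictionary phrase are normalised in the same way, and a skill is recorded when a normalised phrase appears as a contiguous run of tokens in the advert.

```python
import re

# Hypothetical stop-word list (illustrative only; Colombian adverts would
# need a full Spanish stop-word list).
STOP_WORDS = {"a", "and", "the", "of", "in", "with", "for", "to", "experience"}

# Hypothetical skills dictionary: skill label -> phrases that signal the skill.
SKILLS_DICTIONARY = {
    "customer service": ["customer service", "customer care"],
    "data analysis": ["data analysis", "analyse data"],
    "sql": ["sql"],
}


def stem(token: str) -> str:
    """Crude suffix stripping, a stand-in for a real stemmer such as Snowball."""
    for suffix in ("ing", "es", "s", "e"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def normalise(text: str) -> list[str]:
    """Lower-case, tokenise, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z0-9áéíóúñü]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def contains_sequence(tokens: list[str], phrase: list[str]) -> bool:
    """True if `phrase` occurs as a contiguous run inside `tokens`."""
    n = len(phrase)
    if n == 0:
        return False
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def identify_skills(advert: str) -> list[str]:
    """Return the sorted dictionary skills whose phrases appear in the advert."""
    tokens = normalise(advert)
    found = set()
    for skill, phrases in SKILLS_DICTIONARY.items():
        if any(contains_sequence(tokens, normalise(p)) for p in phrases):
            found.add(skill)
    return sorted(found)


print(identify_skills("We need someone with experience analysing data and SQL"))
# -> ['data analysis', 'sql']
```

Because stemming maps "analysing" and "analyse" to the same token, a single dictionary phrase covers several surface forms; extending the dictionary with newly observed phrases is what allows new or specific skills to be picked up.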
Moreover, the book proposes an n-gram-based method to reduce duplication (because information is collected from different job portals, some job advertisements may appear more than once) and a LASSO-based method to impute missing values such as education and wages (Chapter 6). Consequently, by implementing and extending novel mixed methods, this document 6) improves data collection and sheds light on the methodological adjustments needed to collect and organise information from job portals.
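The general idea behind n-gram-based deduplication can be sketched in Python. This shows one common variant and is an assumption, not necessarily the exact procedure described in Chapter 6: each advert is turned into a set of word trigrams, pairs of adverts are compared with the Jaccard similarity of those sets, and pairs at or above a threshold (0.8 here, chosen purely for illustration) are treated as duplicates, keeping the first occurrence. The function names and the example adverts are hypothetical.

```python
import re
from itertools import combinations


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of word n-grams (shingles) in a lower-cased advert."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < n:
        # Short adverts: use the whole token sequence as a single shingle.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(ads: list[str], n: int = 3,
                    threshold: float = 0.8) -> list[tuple[int, int]]:
    """Index pairs (i, j), i < j, whose n-gram Jaccard similarity >= threshold."""
    shingles = [word_ngrams(ad, n) for ad in ads]
    return [
        (i, j)
        for i, j in combinations(range(len(ads)), 2)
        if jaccard(shingles[i], shingles[j]) >= threshold
    ]


def deduplicate(ads: list[str], n: int = 3, threshold: float = 0.8) -> list[str]:
    """Keep each advert unless it duplicates an earlier one."""
    drop = {j for _, j in find_duplicates(ads, n, threshold)}
    return [ad for k, ad in enumerate(ads) if k not in drop]


# Hypothetical adverts: the second repeats the first with one extra word.
ADS = [
    "Sales assistant needed in Bogotá, full-time contract, two years of experience",
    "Sales assistant needed in Bogotá, full time contract, two years of experience required",
    "Accountant required in Medellín with knowledge of NIIF",
]

print(find_duplicates(ADS))   # [(0, 1)]
print(len(deduplicate(ADS)))  # 2
```

The first two adverts share 10 of 11 distinct trigrams (similarity of about 0.91), so the second is dropped. Comparing every pair grows quadratically with the number of adverts; for a multi-year vacancy database, techniques such as MinHash with locality-sensitive hashing are commonly used to restrict comparisons to likely candidates.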
Using these methods, a vacancy database was consolidated for the period from January 1, 2016 to December 31, 2018 (Chapter 7). In addition, this document makes a further methodological contribution by 7) proposing a framework to evaluate the internal (consistency) and external (representativeness) validity of this vacancy database. To test internal validity, a statistical