1) Data Sources & Joins
University Staff Directories
- Scrape official Accounting/Finance staff pages across G8 universities.
- Collected fields: researcher name, job title, profile URL, field (Accounting/Finance).
Journal Quality Lists (ABDC/JQL)
- Load ABDC/JQL files (e.g., 2010/2016/2022).
- Primary join on ISSN (preferred).
- Fallback to fuzzy title matching when ISSN is missing/ambiguous.
Journal Impact Factor (Clarivate JIF)
- Where ISSN exists in JCR, attach JIF and 5-year JIF to the journal.
- Journals without JIF are excluded from JIF-based averages.
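The ISSN join can be sketched as a pair of dictionary lookups. This is an illustration only: the function name `attach_metadata`, the lookup tables `abdc_by_issn` and `jcr_by_issn`, and the keys `issn`, `abdc_rank`, `jif`, and `jif_5yr` are assumptions, not names taken from the actual pipeline. The fuzzy-title fallback is not shown here.

```python
def attach_metadata(journals, abdc_by_issn, jcr_by_issn):
    """Attach ABDC rank and JIF fields to each journal via its ISSN.

    All key names are hypothetical; missing values stay None so that
    later averages can skip them instead of treating them as zero.
    """
    enriched = []
    for journal in journals:
        issn = journal.get("issn")
        jcr = jcr_by_issn.get(issn, {})
        enriched.append({
            **journal,
            "abdc_rank": abdc_by_issn.get(issn),  # None when the journal is unranked
            "jif": jcr.get("jif"),                # None when the journal is not in JCR
            "jif_5yr": jcr.get("jif_5yr"),
        })
    return enriched
```

Keeping a missing JIF as `None` rather than `0` is what lets JIF-based averages exclude such journals.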
Database Linking (journal matching pass)
- Match Publications.journal_name → Journals using fuzzywuzzy process.extractOne with threshold 95.
- force=True: reset existing journal_id before re-matching.
- university="all" | "<name>": limit matching scope.
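A minimal sketch of such a matching pass, under stated assumptions. To stay dependency-free it uses the standard library's `difflib.SequenceMatcher` as a stand-in scorer, so its scores will not be identical to those of fuzzywuzzy's `process.extractOne`. The function name `link_publications` and the dict keys (`journal_name`, `journal_id`, `university`, `id`, `title`) are hypothetical.

```python
from difflib import SequenceMatcher

THRESHOLD = 95  # same cut-off as the matching pass described above


def score(a, b):
    """Return a 0-100 similarity score (difflib stand-in, not fuzzywuzzy)."""
    return 100 * SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def link_publications(publications, journals, force=False, university="all"):
    """Set journal_id on in-scope publications; return how many were linked."""
    linked = 0
    for pub in publications:
        if university != "all" and pub.get("university") != university:
            continue  # outside the requested scope
        if force:
            pub["journal_id"] = None  # reset before re-matching
        if pub.get("journal_id") is not None or not pub.get("journal_name"):
            continue  # already linked, or nothing to match on
        best_id, best_score = None, 0.0
        for journal in journals:
            s = score(pub["journal_name"], journal["title"])
            if s > best_score:
                best_id, best_score = journal["id"], s
        if best_score >= THRESHOLD:
            pub["journal_id"] = best_id
            linked += 1
    return linked
```

A call such as `link_publications(pubs, journals, force=True, university="<name>")` would re-match only that university's publications and leave all others untouched.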
2) Standardisation Pipeline (Exactly How We Clean & Normalize)
2.1 Required fields & null handling
- Required: Title (index 0), Type (2), Researcher Name (5), Profile URL (6). Missing → row rejected.
- Empty → None for: Year (1), Journal Name (3), Article URL (4).
2.2 Type cleanup
- Trim a trailing › from publication type strings where present.
2.3 Year coercion
- If Year is numeric, convert to integer.
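Steps 2.1 to 2.3 can be sketched as one row-cleaning function. This is an illustrative sketch, not the pipeline's actual code: the function name `clean_row` is hypothetical, rows are assumed to be lists using the field indices given in 2.1, whitespace stripping is an added assumption, and "numeric" is interpreted here as a string of digits.

```python
REQUIRED = (0, 2, 5, 6)  # Title, Type, Researcher Name, Profile URL
NULLABLE = (1, 3, 4)     # Year, Journal Name, Article URL


def clean_row(row):
    """Return a cleaned copy of the row, or None if a required field is missing."""
    # Assumption: surrounding whitespace is not meaningful.
    row = [v.strip() if isinstance(v, str) else v for v in row]

    # 2.1: reject rows with a missing required field.
    if any(row[i] in (None, "") for i in REQUIRED):
        return None

    # 2.1: empty optional fields become None.
    for i in NULLABLE:
        if row[i] == "":
            row[i] = None

    # 2.2: trim one trailing "›" from the publication type.
    if row[2].endswith("›"):
        row[2] = row[2][:-1].rstrip()

    # 2.3: coerce an all-digit year string to int; leave anything else as-is.
    if isinstance(row[1], str) and row[1].isdigit():
        row[1] = int(row[1])
    return row
```

Returning `None` for a rejected row lets the caller filter rejected rows with a simple `if cleaned is not None` check.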
2.4 Role blacklist (exclusions)
- If job title contains any of: Education-Focused, Education Focused, Education Focussed, Teaching-Focused, Teaching Focused, Teaching Focussed → set job title to "Exclude".
- Rationale: these positions are not comparable to research-active roles for ranking.
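The blacklist check amounts to a substring test. A minimal sketch follows; the function name `apply_blacklist` is hypothetical, and the case-insensitive comparison is an assumption, since the source does not state how case is handled.

```python
BLACKLIST = (
    "Education-Focused", "Education Focused", "Education Focussed",
    "Teaching-Focused", "Teaching Focused", "Teaching Focussed",
)


def apply_blacklist(job_title):
    """Return "Exclude" if the title contains a blacklisted keyword, else the title."""
    if job_title:
        lowered = job_title.lower()  # assumption: matching ignores case
        if any(keyword.lower() in lowered for keyword in BLACKLIST):
            return "Exclude"
    return job_title
```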
2.5 Title normalization (canonical roles)
Normalize raw titles to canonical forms. We search the Job Title first and fall back to the Researcher Name only when no role is found in the title.
| Raw / Variant | Canonical |
| Associate Lecturer | Associate Lecturer |
| Lecturer (A) | Associate Lecturer |
| Lecturer | Lecturer |
| Fellow | Fellow |
| Senior Lecturer | Senior Lecturer |
| Senior Fellow | Senior Fellow |
| Associate Professor / Associate Prof / AsPr | Associate Professor |
| Professor / Prof | Professor |
| Professorial Fellow | Professorial Fellow |
| Professor Emeritus / Emeritus Professor / Emeritus | Professor Emeritus |
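One way to apply the table above is a longest-variant-first search, so that "Associate Lecturer" is tried before "Lecturer" and "Professorial Fellow" before "Professor". The sketch below is an assumption about how the lookup could work, not the pipeline's actual code: the name `canonical_role`, the whole-word matching, and the case-insensitive search are all illustrative choices.

```python
import re

VARIANTS = {
    "Associate Lecturer": "Associate Lecturer",
    "Lecturer (A)": "Associate Lecturer",
    "Lecturer": "Lecturer",
    "Fellow": "Fellow",
    "Senior Lecturer": "Senior Lecturer",
    "Senior Fellow": "Senior Fellow",
    "Associate Professor": "Associate Professor",
    "Associate Prof": "Associate Professor",
    "AsPr": "Associate Professor",
    "Professor": "Professor",
    "Prof": "Professor",
    "Professorial Fellow": "Professorial Fellow",
    "Professor Emeritus": "Professor Emeritus",
    "Emeritus Professor": "Professor Emeritus",
    "Emeritus": "Professor Emeritus",
}

# Longest variants first; lookarounds enforce whole-word matches and,
# unlike \b, also work for variants ending in ")" such as "Lecturer (A)".
PATTERNS = [
    (re.compile(r"(?<!\w)" + re.escape(raw) + r"(?!\w)", re.IGNORECASE), canonical)
    for raw, canonical in sorted(VARIANTS.items(), key=lambda kv: -len(kv[0]))
]


def canonical_role(job_title, researcher_name=None):
    """Search the job title first, then the researcher name; None if no role found."""
    for text in (job_title, researcher_name):
        if not text:
            continue
        for pattern, canonical in PATTERNS:
            if pattern.search(text):
                return canonical
    return None
```

With this ordering, a title containing several variants resolves to the longest one, which is a design choice of the sketch rather than a documented rule.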
2.6 Name cleaning
- Strip prefixes from Researcher Name: Dr., Ms., Mr., Mrs., Prof., Associate Professor, Professor, Scientia Professor, Professor Scientia, Emeritus Professor, Professor Emeritus, Emeritus, and short codes EmPr, AsPr.
- Result example: “Prof. Jane Q. Smith” → “Jane Q. Smith”.
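Prefix stripping can be sketched with a single anchored regular expression applied until nothing more is removed. The function name `clean_name` is hypothetical, and the sketch assumes prefixes are matched exactly as listed above (case-sensitive, with the trailing period where shown).

```python
import re

PREFIXES = [
    "Dr.", "Ms.", "Mr.", "Mrs.", "Prof.", "Associate Professor", "Professor",
    "Scientia Professor", "Professor Scientia", "Emeritus Professor",
    "Professor Emeritus", "Emeritus", "EmPr", "AsPr",
]

# Longest prefixes first so "Professor Emeritus" wins over "Professor".
_PREFIX_RE = re.compile(
    r"^\s*(?:"
    + "|".join(re.escape(p) for p in sorted(PREFIXES, key=len, reverse=True))
    + r")(?!\w)\s*"
)


def clean_name(name):
    """Strip leading title prefixes repeatedly until none remain."""
    previous = None
    while name != previous:
        previous = name
        name = _PREFIX_RE.sub("", name, count=1)
    return name.strip()
```

The loop handles stacked prefixes such as "AsPr Dr. Ann Lee", and the `(?!\w)` guard keeps a name like "Emeritusson" from being truncated.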
2.7 Academic level mapping (A–E)
| Canonical Title | Level |
| Associate Lecturer | A |
| Lecturer | B |
| Fellow | B |
| Senior Lecturer | C |
| Senior Fellow | C |
| Associate Professor | D |
| Professor | E |
| Professorial Fellow | E |
| Professor Emeritus | E |
| Exclude | — |
Note: If role is None or unrecognized, the level is set to None.
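As a sketch, the mapping reduces to a dictionary lookup that returns None for anything not listed. The names `LEVELS` and `academic_level` are hypothetical.

```python
LEVELS = {
    "Associate Lecturer": "A",
    "Lecturer": "B",
    "Fellow": "B",
    "Senior Lecturer": "C",
    "Senior Fellow": "C",
    "Associate Professor": "D",
    "Professor": "E",
    "Professorial Fellow": "E",
    "Professor Emeritus": "E",
}


def academic_level(role):
    """Return the A-E level for a canonical role, or None if there is none."""
    # "Exclude", None, and unrecognized roles all fall through to None.
    return LEVELS.get(role)
```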
3) What Gets Written to the Database
- After standardization, the academic level is inserted into each row at index 9.
- Skip researchers whose job title equals "Exclude".
- Researcher uniqueness: (Name, Profile URL). If found, update job title / level / field when changed.
- Publication uniqueness: (Title, Researcher). New publications are inserted and linked.
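The write rules above can be illustrated with in-memory dictionaries standing in for the database tables. Everything here is an assumption made for illustration: the function name `upsert`, the record keys, and the use of plain dicts instead of the real schema.

```python
def upsert(record, researchers, publications):
    """Apply one standardized record to the in-memory stores.

    Returns "skipped", "existing", or "inserted" for the publication.
    """
    if record["job_title"] == "Exclude":
        return "skipped"

    # Researcher uniqueness: (Name, Profile URL).
    key = (record["name"], record["profile_url"])
    current = {k: record[k] for k in ("job_title", "level", "field")}
    if researchers.get(key) != current:
        researchers[key] = current  # insert, or update changed attributes

    # Publication uniqueness: (Title, Researcher).
    pub_key = (record["title"], key)
    if pub_key in publications:
        return "existing"
    publications[pub_key] = {
        "year": record.get("year"),
        "journal_name": record.get("journal_name"),
    }
    return "inserted"
```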
4) Ranking Metrics (Exactly How They're Computed)
- Total Publications: count of journal articles retrieved (after cleaning).
- A*/A-Ranked Publications: count of publications whose journals map to ABDC A* or A.
- Average JIF: mean of JIF values across only those articles with valid JIF.
Avg JIF = (Σ JIF_i) / N, where N = number of articles with a valid JIF.
- Average 5-Year JIF: same as above, using 5-year JIF.
- Average Citation Percentile: mean of available OpenAlex citation percentiles for a researcher’s publications.
Notes: JIF and 5-year JIF are journal-level stats joined via ISSN → JCR. If no JIF exists, JIF-based averages are undefined (not zero).
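The metric definitions above can be sketched as a single function over a researcher's publications. The function name `researcher_metrics` and the dict keys (`abdc_rank`, `jif`, `jif_5yr`, `citation_percentile`) are assumptions for illustration; averages with no valid inputs are returned as None to reflect "undefined (not zero)".

```python
from statistics import mean


def _avg(values):
    """Mean of the non-missing values, or None when there are none."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def researcher_metrics(publications):
    """Compute the ranking metrics for one researcher's cleaned publications."""
    return {
        "total_publications": len(publications),
        "a_star_a_publications": sum(
            1 for p in publications if p.get("abdc_rank") in ("A*", "A")
        ),
        "avg_jif": _avg(p.get("jif") for p in publications),
        "avg_jif_5yr": _avg(p.get("jif_5yr") for p in publications),
        "avg_citation_percentile": _avg(
            p.get("citation_percentile") for p in publications
        ),
    }
```

Each average uses its own N, so a publication with a JIF but no citation percentile still counts toward the JIF average.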
5) Known Limitations (Transparency for Users)
- University data dependency: institutional directories may be incomplete/out-of-date.
- Name/affiliation matching noise: can produce over-/under-counting, especially for common names.
- Journal matching: fuzzy matching (threshold 95) can mis-map near-duplicates; ISSN is preferred but not always present.
- Coverage gaps: some journals lack ABDC ranks or JIF, leading to missing metrics.
- Metric bias: the total-publications count favors volume; averages can be skewed by outliers; citation norms vary by field/year.
6) Worked Example (End-to-End)
Assume a researcher has 5 articles:
- Four with JIFs: 7.5, 5.0, 2.5, 1.2; ABDC: A*, A, A, B.
- One in a journal with no JIF and no ABDC entry.
Totals: total publications = 5; A*/A publications = 3.
Avg JIF: (7.5 + 5.0 + 2.5 + 1.2) / 4 = 4.05
Avg 5-year JIF: computed the same way, using 5-year values only.
Avg Citation Percentile: mean of all available publication percentiles.
7) Complete Lists (for Auditing)
Accepted canonical roles
- Associate Lecturer
- Lecturer
- Fellow
- Senior Lecturer
- Senior Fellow
- Associate Professor
- Professor
- Professorial Fellow
- Professor Emeritus
Raw variants recognized (mapped to above)
- Associate Lecturer, Lecturer (A), Lecturer, Fellow, Senior Lecturer, Senior Fellow
- Associate Professor, Associate Prof, AsPr
- Professor, Prof, Professorial Fellow
- Professor Emeritus, Emeritus Professor, Emeritus
Excluded keywords (blacklist)
- Education-Focused, Education Focused, Education Focussed
- Teaching-Focused, Teaching Focused, Teaching Focussed