Analyzing the Brand Marketing Conversation on LinkedIn with Python and Natural Language Processing

Introduction

In response to my comment wishing we didn’t have to reinvent the wheel on brand marketing, Clark Barron stated that his quirky take on ads as “legal cyberattacks” helped him go from zero to a million impressions over a three-month period. Entertaining takes on marketing usually catch my eye, since it’s a profession that still requires creativity at its core no matter how much we strip it all down to numbers.

But even the most creative ventures can fail without the right intel to back them up. And there’s obviously more to it than clever similes. To get a real sense of how brand marketing is discussed on LinkedIn, I needed to analyze the content beyond just a cursory read.

LinkedIn is one of the most valuable sources of real-time industry discussion, but keeping track of high-value conversations is difficult. I identified 11 industry leaders who frequently post about brand marketing and decided to analyze the last 10-12 posts that aligned with the topic. Since I’ve used Python and natural language processing (NLP) in the past for content analysis and SEO, I didn’t need to start from scratch.

I combined scraping, keyword extraction, and topic modeling to analyze the content, and used zero-shot classification to categorize posts into: Self-Promotion, Contrarian, Thought Leadership, Industry Insights, Educational, and Motivational.

The zero-shot classification results are masked for a few reasons:

• Sampling bias (since not all posts were included)
• Inadequate sample size
• ModernBERT might not fully capture industry-specific nuance
• Zero-shot classification can be ambiguous, and misclassifications happen

    The authors have been offered a copy of the unmasked results.

    PART I: Methodology

Rather read about its application instead?

    Skip to Part II: Who Cares?

Scraping Framework: Why I Didn’t Automate Everything

I’ve built several scrapers in Python over time that can extract content from blogs, dynamic and static websites, and other sources. It’s a simple process, requires only a basic knowledge of Python, and is an easy way to collate content from multiple sources. With LinkedIn, I usually avoid coloring outside the lines. I extracted the URLs to all posts manually and automated everything else: extracting content, spotting important themes, and organizing the insights for deeper study.

    Extracting Post Content

    Apart from the actual text, you can easily grab:

    • Author name
    • Post timestamp
    • Total reactions and comments
• Word count (calculated at the time of extraction)

All of this without annoying LinkedIn’s compliance team.
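To keep those fields organized for the later NLP steps, here is a minimal sketch of the kind of record you could store per post. The dataclass, field names, and sample values are illustrative assumptions, not my exact scraper output:

```python
from dataclasses import dataclass, asdict
import pandas as pd

@dataclass
class LinkedInPost:
    """One scraped post; fields mirror what was extracted above."""
    author: str
    timestamp: str      # date captured from the post
    reactions: int
    comments: int
    text: str

    @property
    def word_count(self) -> int:
        # Word count calculated at the time of extraction, as noted above
        return len(self.text.split())

# Hypothetical usage: collect posts into a DataFrame for the downstream analysis
posts = [LinkedInPost("Jane Doe", "2024-11-02", 412, 37, "Brand is a moat, not a logo...")]
df = pd.DataFrame([{**asdict(p), "word_count": p.word_count} for p in posts])
print(df.head())
```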

    Keyword Extraction with KeyBERT + KeyLLM

Each post was analyzed using KeyBERT’s KeyLLM function, which brings in a ChatGPT model at the final step. This extracted the 5-6 most relevant keywords per post, summarizing the core themes without requiring manual review. (I do go through the output to make sure the model didn’t hallucinate.)
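For reference, a minimal sketch of wiring KeyLLM to an OpenAI chat model, following KeyBERT’s documented pattern; the API key, model name, and sample post are placeholders rather than my exact setup:

```python
import openai
from keybert.llm import OpenAI
from keybert import KeyLLM

# Wrap a ChatGPT model so KeyLLM can call it for keyword extraction
client = openai.OpenAI(api_key="YOUR_API_KEY")          # placeholder key
llm = OpenAI(client, model="gpt-3.5-turbo", chat=True)  # model name is an assumption
kw_model = KeyLLM(llm)

# `post_texts` stands in for the list of scraped LinkedIn posts
post_texts = ["Brand is not a logo, it's the sum of every customer interaction..."]
keywords = kw_model.extract_keywords(post_texts)
print(keywords)  # one list of keywords per post
```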

    Topic Modeling with BERTopic

    Posts were grouped into clusters of similar topics using BERTopic, revealing patterns across different authors. Raw topic labels from BERTopic are usually abstract—running an API request to ChatGPT can help refine these topic names and make them more human-friendly.
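A minimal BERTopic sketch along those lines, using BERTopic’s OpenAI representation model to produce friendlier topic names; the parameters and the `post_texts` corpus are illustrative assumptions:

```python
import openai
from bertopic import BERTopic
from bertopic.representation import OpenAI as OpenAIRep

# Optional: let ChatGPT turn raw keyword-based labels into human-friendly topic names
client = openai.OpenAI(api_key="YOUR_API_KEY")  # placeholder key
representation_model = OpenAIRep(client, model="gpt-3.5-turbo", chat=True)

# `post_texts` is the list of scraped LinkedIn posts
topic_model = BERTopic(
    min_topic_size=3,                        # small corpus, so allow small clusters
    representation_model=representation_model,
)
topics, probs = topic_model.fit_transform(post_texts)
print(topic_model.get_topic_info())          # one row per discovered topic
```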

    The clustered bar chart shows the frequency with which an author posts on a particular topic. While all authors post about marketing, you will notice some stay focused on one topic, while others have varying degrees of diversity within the topics they cover.

    I used graph-based pattern recognition and multi-document summarization to cross-validate what topic modeling and zero-shot classification revealed because it’s always fun to second guess myself. But, everything lined up (woohoo!) and since I don’t want to add more technical details to this post, I’ll leave that bit out.

    Zero-Shot Classification

Individually categorizing LinkedIn posts to analyze content trends is impractical at scale, and manually coding qualitative data was never my favorite assignment in grad school. Zero-shot classification solves this by allowing us to classify each post without the need for pre-labeled training data. Moritz Laurer’s model (ModernBERT-large-zeroshot-v2.0) was fast, efficient, and more accurate than the other models I tested. By defining Self-Promotion, Thought Leadership, Contrarian, Industry Insights, Educational, and Motivational as categories, I could analyze how different authors structure their content and which narratives they emphasize.

    How ModernBERT Assigns Content Categories

ModernBERT’s multi-label classification capabilities allow posts to be assigned to multiple categories with varying confidence scores. This allows for a more nuanced treatment of complex writing, as an author may convey both Thought Leadership and Industry Insights in the same post.
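Here is a minimal sketch of that multi-label setup using the Hugging Face zero-shot pipeline; the hub ID is written as I understand it appears on Hugging Face, and the sample post is invented:

```python
from transformers import pipeline

# Zero-shot classifier; hub ID assumed to be MoritzLaurer/ModernBERT-large-zeroshot-v2.0
classifier = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/ModernBERT-large-zeroshot-v2.0",
)

labels = ["Self-Promotion", "Thought Leadership", "Contrarian",
          "Industry Insights", "Educational", "Motivational"]

post = "Most brand teams chase impressions. Here's why that's a mistake..."
# multi_label=True scores each category independently instead of forcing one winner
result = classifier(post, candidate_labels=labels, multi_label=True)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")
```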

Authors have diverse objectives: some post to establish credibility, while others engage in audience discussions or challenge industry norms. ModernBERT quantified these tendencies effectively. The category scores provide a clear profile of each author’s content strategy and a snapshot of the conversation around brand marketing on LinkedIn.

    How diverse is an author’s content?

    Shannon Entropy is a concept from information theory that helps quantify how evenly content is distributed across different themes. A higher entropy score means an author covers a wider range of topics in a balanced way, while a lower entropy score suggests their content is more focused on just one or two themes.

    For example, an author who writes about industry insights, thought leadership, and education fairly equally will have high entropy—their content is diverse. Meanwhile, an author who consistently posts about self-promotion and nothing else will have low entropy, meaning their content is more specialized.

    By analyzing entropy, we can go beyond just knowing what authors talk about and start understanding how varied their messaging truly is.
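For the curious, the calculation itself is small. A sketch using the standard Shannon entropy formula over an author’s category proportions (the example distributions are made up):

```python
import numpy as np

def shannon_entropy(proportions):
    """H = -sum(p * log2(p)) over an author's theme proportions, in bits."""
    p = np.asarray(proportions, dtype=float)
    p = p[p > 0] / p.sum()              # normalize and drop empty themes
    return float(-(p * np.log2(p)).sum())

# Diverse author: content spread evenly across three themes
print(shannon_entropy([0.34, 0.33, 0.33]))   # ~1.58 bits (high entropy)

# Focused author: almost everything is self-promotion
print(shannon_entropy([0.90, 0.05, 0.05]))   # ~0.57 bits (low entropy)
```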

    Analyzing structured data will always reveal more than simply doom-scrolling on LinkedIn. What I’ve learned from this so far is enough for some actionable insights but not enough to be the basis for a full-fledged marketing campaign.

    Analyses I Decided Against

• Correlation between word count and engagement levels – LinkedIn already gives you this data, and it’s arbitrary at best unless it includes a large sample of authors across several industries and disciplines.
• Time series analysis for individual authors – useful for tracking trends, identifying growth factors, and more. There wasn’t enough data, the posts were arbitrarily chosen, and authors weren’t tracked across the same time frame.
• Common keywords or content gap analysis – I wasn’t analyzing this from an SEO perspective, though it might matter for other aspects of content strategy.
• Sentiment analysis of comments, compared across categories – this would be an entire project on its own because of the scope. Scraping LinkedIn without logging in does not give you all the comments, so I wouldn’t be capturing the entire conversation. Conceptually, this would reveal several insights to build strategy, but it requires more effort.

    PART II: Who Cares?

    Market Entry: Identifying Trends and Positioning Opportunities

    Imagine a consumer electronics company that plans to introduce a range of sustainable household appliances. NLP-driven content analysis could guide innovation and product marketing strategy. Apart from researching trade journals and consumer reports, analyzing what competitors are talking about can give insight into the direction in which those companies are moving.

    Are competitor posts about carbon-neutral initiatives gaining traction and increased engagement? Product marketing teams can use that knowledge to create messaging around repairability, eco-friendly materials, and supply chain transparency, to match established audience interests. (Assuming you’re also building a product that matches the criteria)

    Competitive Intelligence: Revealing Patterns of Messaging Across Rivals

There’s a lot of content out there telling B2B SaaS companies what they should and shouldn’t do. Messaging, building the right narratives, and focusing on brand are great when you consider overall strategy, but they still require market intelligence to succeed. Since vanity metrics can’t offer much in terms of actionable insight, a deeper, more granular level of research is needed.

For example, if NLP-based research shows that most competitors emphasize “AI-powered automation” whereas few discuss “ease of integration” or “team adoption” in their messaging, you now have a narrative around which you can differentiate. The marketing team can position their product around seamless onboarding and speedy implementation, bridging a messaging gap that competitors have missed.

    Scaling This Approach: From Insights to Long-Term Strategy

Anyone with a holistic approach to brand development and growth could tell you that these are not one-time or short-term measures. It’s not just a snapshot in time; it’s tracking how everything evolves over time and using that intelligence to make strategic, data-driven decisions.

    Tracking Concept Lifecycles: From Niche to Mainstream

    Not all marketing trends emerge overnight. Some start as isolated discussions among a handful of experts before becoming widely adopted industry norms. By tracking LinkedIn posts over time, we can observe:

    • Which topics gain momentum and how long it takes for them to reach a critical mass of discussions.
    • How industries cross-pollinate ideas (e.g., sustainability trends in fashion influencing consumer electronics).
    • When companies should enter the conversation—too early, and it’s unproven; too late, and it’s already saturated.

    For example, “brand-led growth” was a niche term discussed by a few thought leaders before becoming a serious alternative to product-led growth in SaaS. By mapping how often certain phrases appear in brand marketing discussions, we can anticipate which narratives will dominate industry conversations next.

    Predictive Modeling for Trend Anticipation

    With enough historical data, we can move beyond reactive analysis and start predicting marketing trends.

    • Topic decay rates: How quickly does interest in a topic fade after peak engagement?
    • Sustained growth signals: Which discussions see consistent, month-over-month increases rather than just viral spikes?
    • Emerging vs. cyclical trends: Some topics (e.g., “Black Friday marketing strategies”) are seasonal, while others (e.g., “AI-generated content”) indicate new long-term shifts.

    A predictive approach can help marketing teams allocate resources before a trend peaks, ensuring they own the conversation rather than reacting too late.
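As a rough sketch of what those signals might look like once you have monthly mention counts for a topic (the numbers and the series itself are invented for illustration):

```python
import pandas as pd

# Hypothetical monthly mention counts for one topic, e.g. "brand-led growth"
mentions = pd.Series(
    [4, 6, 9, 15, 22, 30, 28, 25],
    index=pd.period_range("2024-01", periods=8, freq="M"),
)

mom_growth = mentions.pct_change()              # month-over-month growth signal
peak = mentions.idxmax()                        # month of peak engagement
decay = mentions[mentions.index > peak].pct_change().mean()  # crude post-peak decay rate

print(f"Peak month: {peak}")
print(f"Average post-peak decay: {decay:.1%} per month")
print(f"Months with positive growth: {(mom_growth > 0).sum()} of {len(mentions) - 1}")
```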

    Building a Unified Brand Health Score Across Digital Channels

Among the things we’ve been screaming into the void, “stop chasing vanity metrics” has probably been shouted the most. Impressions, likes, and engagement rates offer surface-level validation, and marketing teams need to leave that high-school mentality behind. You want to see how brand marketing affects growth? Do more than just standard NPS studies. Try to get deeper brand positioning insights. How about developing a brand health score using a mix of qualitative and quantitative signals from LinkedIn, blogs, and other content sources?

    Metrics that could contribute to a more meaningful brand health score

    Narrative Consistency Score

    • How well does a company’s messaging align across LinkedIn, blog posts, press releases, and website content?
    • Do competitors own a topic that a brand has neglected?

    Topic Authority Score

    • How often does a company’s content match the leading discussions in the industry?
    • Is the company setting trends, reinforcing existing ones, or lagging behind?

    Competitor Differentiation Index

    • How linguistically unique is a company’s messaging compared to competitors?
    • Does the brand’s content overlap too much with industry norms, or does it establish distinctive positioning?

    Brand Sentiment & Trust Score

    • Aggregated from comment sentiment analysis on LinkedIn, reviews, and industry blogs.
    • Tracks neutral vs. positive/negative sentiment shifts over time rather than just engagement spikes.

    Cross-Industry Influence Score

    • Measures how often a company or its key employees are mentioned outside of its primary industry (e.g., a SaaS brand being referenced in e-commerce or fintech discussions).
    • High cross-industry influence signals thought leadership potential beyond just their niche.

    By merging these qualitative insights with quantitative tracking, companies can develop a more holistic, long-term brand intelligence system that guides strategic decisions far beyond LinkedIn content performance.
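As a toy sketch, the sub-scores could be rolled into a single composite once each is normalized to a common scale; the weights and values below are arbitrary placeholders, not a recommended calibration:

```python
# Hypothetical, already-normalized sub-scores (0-100) for one brand
sub_scores = {
    "narrative_consistency": 72,
    "topic_authority": 64,
    "competitor_differentiation": 58,
    "sentiment_trust": 81,
    "cross_industry_influence": 35,
}

# Arbitrary weights; in practice these would be tuned to the brand's priorities
weights = {
    "narrative_consistency": 0.25,
    "topic_authority": 0.25,
    "competitor_differentiation": 0.20,
    "sentiment_trust": 0.20,
    "cross_industry_influence": 0.10,
}

brand_health = sum(sub_scores[k] * weights[k] for k in sub_scores)
print(f"Composite brand health score: {brand_health:.1f} / 100")
```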

    More importantly, it requires you to actually do some work. Easy to say “brand marketing doesn’t work” when you measure all the wrong things and then claim it has no impact on revenue.

    Conclusion

    Analyzing LinkedIn content at scale reveals more than just engagement trends—it uncovers the underlying structure of how industry voices communicate, influence, and differentiate themselves. By combining automated content classification with structured data analysis, we move beyond intuition and surface measurable patterns in brand marketing conversations.


    Glossary

    Zero-Shot Classification – A machine learning technique that allows a model to classify text without prior training on labeled examples. The model determines how well a given text fits predefined categories based on its understanding of language.

    ModernBERT – A new model series that is a Pareto improvement over BERT and its younger siblings across both speed and accuracy. This model takes dozens of advances from recent years of work on large language models (LLMs), and applies them to a BERT-style model, including updates to the architecture and the training process.

    Multi-Label Classification – A classification method where a single text can belong to multiple categories simultaneously, rather than being forced into just one.

    NLP (Natural Language Processing) – A field of artificial intelligence that enables machines to understand, interpret, and generate human language. Used here for analyzing text from LinkedIn posts.

    Topic Modeling – An NLP technique that identifies recurring themes and patterns in text data, helping group content based on shared ideas.

    Content Classification – The process of assigning structured labels to unstructured text (e.g., categorizing a LinkedIn post as “Thought Leadership” or “Self-Promotion”).

    Sumit Jagdale

    growing brands,  crossing chasms and other tomfoolery

    The greatest trick the ad industry pulled was to convince challenger brands that performance marketing was the only path to success. I write about how to build brands that stand the test of time, recession, and every shiny new thing. I'm building something amazing over at LOTH.AL and at flybyXR. Follow me on LinkedIn or sign up for the newsletter.
