Sentiment Analysis of Stock Portfolios Using Financial News Data and Natural Language Processing

Author: Daniel Requejo.

Daniel is a seasoned computer engineer with over 27 years of experience in IT consulting across diverse countries, including Spain, the USA, Ireland, Australia, Malta, and Portugal. Over the past 15 years, he has taught himself the stock market, applying his analytical skills and technical background to build profitable investment models. He is the manager of CARL El Capitan.

Daniel has collaborated with various institutions and hedge funds in the development of trading algorithms and the refinement of investment strategies.

Abstract

This paper investigates the application of natural language processing (NLP) and sentiment analysis in the context of stock market analysis and portfolio management. Financial news articles related to a predefined portfolio of stocks across various industries are extracted using the Polygon.io API. The sentiment of these articles is analyzed using FinBERT, a pre-trained NLP model fine-tuned for financial text, which classifies the news into positive, negative, or neutral sentiment. The primary objective of this study is to demonstrate the practical application of sentiment analysis in ranking stocks based on news sentiment, with the potential for future use in portfolio management.

The sentiment data is visualized through a treemap, which highlights the distribution of sentiment across different sectors, industries, and individual stocks. This representation offers an intuitive method for understanding the broader sentiment landscape within the portfolio. The methodology provides a novel approach to transforming unstructured financial news into actionable insights, with potential applications in algorithmic trading and investment decision-making.

Although the framework currently focuses on sentiment analysis and visualization, it is limited by the availability of financial news data over a short time period. Future work could address this limitation by incorporating a more extensive historical dataset to enable backtesting of sentiment-driven investment strategies and to analyze long-term sentiment trends. By applying sentiment analysis to long-term news data, it will be possible to evaluate the effectiveness of these strategies and explore their potential in creating more robust, data-driven portfolio management approaches.

Introduction

The rapid evolution of financial markets has led to an increased reliance on data-driven decision-making tools. Traditionally, investors have depended on historical price data and technical indicators to inform their trading strategies. However, with the explosion of unstructured data sources such as financial news, social media, and corporate announcements, there is a growing interest in leveraging sentiment analysis to extract actionable insights from this wealth of information. Sentiment analysis, a subfield of natural language processing (NLP), involves evaluating and quantifying the emotions expressed in textual data, such as news articles or tweets, to better understand market sentiment.

In the financial world, market sentiment plays a crucial role in influencing stock prices and investor behavior. Positive or negative sentiment expressed in news articles can drive market movements and impact the performance of individual stocks or entire sectors. By analyzing sentiment, traders and portfolio managers can gauge the broader emotional tone surrounding a company, industry, or market and incorporate this data into their investment strategies. Sentiment analysis offers a novel way to assess unstructured data and supplement traditional financial analysis methods with insights derived from qualitative information.

This study investigates the practical application of sentiment analysis in the context of stock portfolio management. Using the Polygon.io API, financial news articles related to a predefined portfolio of stocks across various industries are extracted. Sentiment analysis is then applied to this data using FinBERT, a pre-trained NLP model specifically designed for financial text classification. The objective of the study is to demonstrate how sentiment analysis can be employed to rank stocks based on the sentiment of news articles and visually represent the sentiment distribution within a portfolio using a treemap.

One key feature of this framework is its ability to transform unstructured financial news into quantifiable insights that can support decision-making. Visualizing sentiment using a treemap allows investors to quickly understand the overall sentiment landscape across sectors, industries, and individual stocks. This approach not only provides an intuitive view of market sentiment but also demonstrates the potential for using sentiment analysis as a tool in portfolio management. However, this study is limited by the availability of news data for only a short time period. Despite this limitation, the framework can be expanded in future research by incorporating historical news data for backtesting sentiment-driven strategies, analyzing long-term sentiment trends, and evaluating the effectiveness of these strategies in various market conditions.

Methodology

This study leverages sentiment analysis of financial news data to rank stocks across various sectors, providing an intuitive visualization of sentiment distribution within a stock portfolio. The methodology is divided into four key steps: (1) stock selection across multiple sectors, (2) financial news data extraction via the Polygon.io API, (3) sentiment analysis using FinBERT, and (4) visualization of the results through a treemap representation.

1. Stock Selection

A diverse portfolio of 34 stocks was selected from various sectors, representing key industries in the global economy. The selection was made to ensure comprehensive coverage of multiple sectors and industries, allowing for the analysis of sentiment across a broad range of market activities. The selected stocks are listed below, along with their ticker symbols and associated sectors:

2. Financial News Data Extraction

The extraction of financial news was performed using the Polygon.io API, a service that provides real-time access to financial news related to specific stock tickers. For each stock in the portfolio, the API was queried using the Python requests library to collect the most recent news articles, including headlines, publication dates, and brief content summaries.

The extraction process involved querying the API with each stock's ticker symbol and gathering the latest news articles related to the stock. Below is an example of the Python code used for this extraction process:

  1. Import the requests library to handle the HTTP request for retrieving the news articles:

    import requests

  2. Define the API key and specify the stock ticker:

    api_key = 'your_api_key'
    ticker = 'AAPL'

  3. Construct the API URL to request the news data for the specified ticker:

    base_url = 'https://api.polygon.io/v2/reference/news'
    params = {'ticker': ticker, 'limit': 10, 'apiKey': api_key}

  4. Send the HTTP request to the API, retrieve the response, and raise an error if the request failed:

    response = requests.get(base_url, params=params, timeout=30)
    response.raise_for_status()

  5. Parse the JSON response to extract the news data:

    news_data = response.json().get('results', [])
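The steps above stop at the raw list of results. A minimal sketch of how such results might be flattened into the Ticker, Date, Time, Title, and Content rows used in the next section is given below. The helper name `parse_articles` and the sample record are hypothetical, and the field names `published_utc`, `title`, and `description` are assumed to match the Polygon.io news response; this is not necessarily the parsing code used in the study.

```python
from datetime import datetime


def parse_articles(ticker, results):
    """Flatten Polygon-style news results into [Ticker, Date, Time, Title, Content] rows."""
    rows = []
    for article in results:
        published = article.get('published_utc')
        if not published:
            continue  # skip records without a timestamp
        # Assumed format: ISO 8601 with a trailing 'Z', e.g. '2024-09-12T14:30:00Z'
        stamp = datetime.strptime(published, '%Y-%m-%dT%H:%M:%SZ')
        rows.append([
            ticker,
            stamp.date().isoformat(),
            stamp.time().isoformat(),
            article.get('title', ''),
            article.get('description', ''),
        ])
    return rows


# Hypothetical record for illustration only; not real API output.
sample_results = [{
    'published_utc': '2024-09-12T14:30:00Z',
    'title': 'Example headline',
    'description': 'Example summary text.',
}]
parsed_news = parse_articles('AAPL', sample_results)
```

Calling the helper once per ticker and concatenating the returned lists would give a single list of rows covering the whole portfolio.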

3. Sentiment Analysis with FinBERT

The sentiment analysis for this study was performed using FinBERT, a specialized NLP model designed for financial text. FinBERT is a variant of the Bidirectional Encoder Representations from Transformers (BERT) model, which is widely used for various NLP tasks due to its ability to understand the context and relationships between words in a sentence. Unlike general-purpose sentiment analysis models, FinBERT has been fine-tuned on financial data, making it more adept at interpreting the specific language used in financial news articles and reports.

Each news article extracted via the Polygon.io API was processed by FinBERT to classify its sentiment as positive, negative, or neutral. This classification was based on the content of the news article, and the model also provided a confidence score indicating how likely it considered the chosen classification to be correct.
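To make the confidence score concrete, the sketch below shows how a classifier of this kind typically turns its raw outputs (logits) into a label and a score: a softmax converts the logits into probabilities, and the highest probability is reported alongside its label. The function name `classify`, the label order, and the logits are assumptions for illustration; the actual label mapping should be read from the loaded model's configuration.

```python
import math

# Assumed label order for illustration; check model.config.id2label for the real mapping.
LABELS = ['Neutral', 'Positive', 'Negative']


def classify(logits):
    """Turn raw class logits into a (label, confidence) pair via softmax."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits, not output from the actual model.
label, score = classify([0.2, 2.1, -1.0])
```

Read this way, the score is the model's own probability for the predicted class, not an independent measure of correctness.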

Example of FinBERT usage:

  1. First, the data containing news articles is structured into a DataFrame with columns such as Ticker, Date, Time, Title, and Content:

    import pandas as pd

    columns = ['Ticker', 'Date', 'Time', 'Title', 'Content']
    parsed_and_scored_news = pd.DataFrame(parsed_news, columns=columns)

  2. Next, the required libraries and models are imported, and FinBERT, along with the tokenizer, is loaded using pre-trained models:

    from transformers import pipeline, BertTokenizer, BertForSequenceClassification
    tokenizer = BertTokenizer.from_pretrained('yiyanghkust/finbert-tone')
    model = BertForSequenceClassification.from_pretrained('yiyanghkust/finbert-tone')
    finbert_sentiment = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

  3. The polarity scores (positive, negative, or neutral) are obtained by applying FinBERT to the Content of each news article:

    parsed_and_scored_news['sentiment'] = parsed_and_scored_news['Content'].apply(lambda x: finbert_sentiment(x)[0])

  4. The sentiment label and score are then extracted and added as new columns in the DataFrame:

    parsed_and_scored_news['label'] = parsed_and_scored_news['sentiment'].apply(lambda x: x['label'])
    parsed_and_scored_news['score'] = parsed_and_scored_news['sentiment'].apply(lambda x: x['score'])
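The label and score columns describe individual articles, while ranking stocks requires one number per ticker. The source does not state how this aggregation was done, so the following is only one possible convention, sketched in plain Python: a positive label contributes its score, a negative label subtracts it, neutral counts as zero, and the signed values are averaged per ticker. The names `SIGN` and `rank_by_sentiment` and the sample records are hypothetical.

```python
from collections import defaultdict

# Assumed convention: positive adds its confidence, negative subtracts it, neutral counts as 0.
SIGN = {'positive': 1.0, 'negative': -1.0, 'neutral': 0.0}


def rank_by_sentiment(records):
    """Average signed scores per ticker and sort from most to least positive."""
    signed = defaultdict(list)
    for ticker, label, score in records:
        signed[ticker].append(SIGN[label.lower()] * score)
    averages = {t: sum(v) / len(v) for t, v in signed.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical (ticker, label, score) triples for illustration only.
records = [
    ('AAPL', 'Positive', 0.90),
    ('AAPL', 'Neutral', 0.80),
    ('PYPL', 'Negative', 0.70),
    ('PYPL', 'Positive', 0.50),
]
ranking = rank_by_sentiment(records)
```

Other conventions, such as weighting recent articles more heavily or ignoring neutral articles entirely, would produce different rankings, so the choice should be stated with any results.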

For example, consider a news article on PayPal:

"PayPal, a financial technology company, has been underperforming the S&P 500 for the past three years. However, under the new CEO, the company has secured impressive partner

FinBERT works by tokenizing the text input, breaking it down into smaller components (tokens) that represent words or subword pieces. These tokens are then fed into a multi-layer transformer architecture that uses self-attention to weigh the relationships between tokens within the broader context of the article. A classification layer on top of the transformer then produces a sentiment label that reflects the overall tone of the news article.
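As a rough illustration of the tokenization step, BERT-style tokenizers split each word into the longest pieces found in a fixed vocabulary, marking continuation pieces with a `##` prefix. The sketch below implements that greedy longest-match idea on a toy vocabulary; the vocabulary and the function name `wordpiece` are hypothetical and much simpler than the tokenizer shipped with FinBERT.

```python
def wordpiece(word, vocab, unk='[UNK]'):
    """Greedy longest-match-first split of one word into vocabulary pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = '##' + candidate  # continuation pieces carry a '##' prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched: the whole word maps to the unknown token
        pieces.append(piece)
        start = end
    return pieces


# Toy vocabulary for illustration only.
toy_vocab = {'under', '##perform', '##ing', 'pay', '##pal'}
tokens = wordpiece('underperforming', toy_vocab)
```

With this toy vocabulary, `underperforming` is split into `under`, `##perform`, and `##ing`, while a word with no matching pieces falls back to the unknown token.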

4. Model Selection in Sentiment Analysis

Several general-purpose NLP models could have been used for this task, such as BERT, DistilBERT, and RoBERTa. These models are effective for general text classification tasks but lack the financial domain-specific training required for accurately classifying financial news sentiment. While these models could be fine-tuned on financial datasets, this would require significant additional training.

In contrast, FinBERT is a specialized model already fine-tuned on financial texts, making it more efficient and accurate for this study. FinancialBERT and FinRoBERTa, alternative models specifically designed for financial tasks, could have been viable options as well. However, FinBERT was selected due to its proven track record in financial sentiment analysis and its ease of integration into the current workflow.

5. Visualization using Treemap

Once the sentiment analysis was completed, the results were visualized using a treemap, which provides a hierarchical representation of sentiment across sectors, industries, and individual stocks. Each stock is represented as a rectangle, where the size of the rectangle corresponds to its value in the portfolio, and the color reflects the sentiment score (positive, neutral, or negative).

Below is an example of how the treemap was created:

  1. Import the Plotly Express library for creating the treemap visualization:

    import plotly.express as px

  2. Define the path for grouping the data by Sector, Industry, and Stock:

    path = ['Sector', 'Industry', 'Stock']

  3. Set the values that determine the size of the rectangles in the treemap:

    values = 'Stock Value'

  4. Assign the color based on the Sentiment Score, using a color scale ranging from red (negative sentiment) to green (positive sentiment):

    color = 'Sentiment Score'
    color_continuous_scale = 'RdYlGn'

  5. Create the treemap using Plotly Express:

    fig = px.treemap(df, path=path, values=values, color=color, color_continuous_scale=color_continuous_scale)

  6. Display the treemap to visualize the sentiment across the portfolio:

    fig.show()
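The treemap call above expects a table with one row per stock and the columns Sector, Industry, Stock, Stock Value, and Sentiment Score. A minimal sketch of assembling such rows from per-ticker scores and portfolio metadata follows. The helper name `build_treemap_rows`, the metadata, the values, and the scores are all hypothetical placeholders, not the data used in this study.

```python
# Hypothetical metadata and scores for illustration; not the study's data.
metadata = {
    'AAPL': {'Sector': 'Technology', 'Industry': 'Consumer Electronics', 'Stock Value': 100.0},
    'PYPL': {'Sector': 'Financials', 'Industry': 'Payments', 'Stock Value': 50.0},
}
scores = {'AAPL': 0.45, 'PYPL': -0.10}


def build_treemap_rows(metadata, scores):
    """Join portfolio metadata with per-ticker sentiment scores, one dict per stock."""
    rows = []
    for ticker, info in metadata.items():
        if ticker not in scores:
            continue  # no news was scored for this ticker
        row = {'Stock': ticker, 'Sentiment Score': scores[ticker]}
        row.update(info)
        rows.append(row)
    return rows


rows = build_treemap_rows(metadata, scores)
```

Passing `pd.DataFrame(rows)` as `df` would then supply the columns the treemap needs. If the sentiment score is signed, adding `color_continuous_midpoint=0` to the `px.treemap` call would centre the color scale on neutral sentiment.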

Results

The sentiment analysis conducted using FinBERT was visualized through a treemap, providing an intuitive way to represent sentiment distribution across different sectors and stocks.

This visualization allows investors to quickly identify areas of positive or negative sentiment across sectors and industries.

The treemap helps in interpreting the overall sentiment landscape, making it a useful tool for portfolio analysis and decision-making.

The purpose of this visualization is to demonstrate how sentiment analysis can be mapped onto a portfolio to highlight sentiment trends in various industries. Each rectangle in the treemap corresponds to a stock, with its size indicating the stock's value in the portfolio and its color reflecting the sentiment score (green for positive, yellow for neutral, and red for negative sentiment).

This approach effectively shows how sentiment data can be used to visually rank and organize stocks in a portfolio, based on their sentiment scores derived from financial news. While this demonstration uses a relatively small dataset over a short timeframe, the method can be extended to much larger datasets and longer time periods, facilitating deeper analysis of sentiment trends.

The treemap visualization provides clear benefits:

  • Hierarchical Representation: It allows for sector and industry-based grouping of stocks, making it easy to spot which sectors exhibit stronger or weaker sentiment.
  • Sentiment-Driven Insights: The color gradient offers an at-a-glance understanding of market sentiment based on recent news.
  • Potential for Scaling: This framework can be scaled to accommodate historical datasets, enabling backtesting and a more robust analysis of sentiment-driven strategies.

Below is a screenshot of the treemap generated in this study, which visually illustrates how sentiment can be distributed across a portfolio:

Conclusion

This paper demonstrates a novel methodology for applying sentiment analysis to financial news and visualizing the results using a treemap. The treemap provides a clear and intuitive way to understand the sentiment distribution across a portfolio, enabling investors and portfolio managers to quickly identify areas of positive or negative sentiment across sectors and industries.

While the analysis in this paper uses a small dataset of recent news articles, the same methodology can be extended to larger datasets that include historical news. Such expansion would enable backtesting of sentiment-driven strategies and provide more robust insights into the role of sentiment in stock performance over time. Additionally, integrating real-time sentiment with historical trends could help create more advanced predictive models for portfolio management and decision-making.

Future research should focus on scaling the dataset, incorporating different sources of financial sentiment, and exploring how sentiment analysis can be applied in algorithmic trading and long-term investment strategies. By refining the sentiment model and using more extensive datasets, this framework could become a valuable tool for enhancing portfolio management through data-driven insights.

References

  1. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
    https://arxiv.org/abs/1810.04805
  2. Araci, D. (2019). FinBERT: Financial Sentiment Analysis with Pre-trained Language Models.
    https://arxiv.org/abs/1908.10063
  3. Polygon.io API Documentation (2023). Polygon.io - Real-time Market Data and Financial News API.
    https://polygon.io/docs/stocks/get_v2_reference_news
  4. Plotly: Python Treemaps (2023). Plotly Express - Treemap in Python.
    https://plotly.com/python/treemaps/
  5. Nassirtoussi, A. K., Aghabozorgi, S., Wah, T. Y., & Ngo, D. C. L. (2014). Text mining for market prediction: A systematic review.
    http://www.romisatriawahono.net/lecture/rm/survey/information%20retrieval/Nassirtoussi%20-%20Text%20Mining%20for%20Market%20Prediction%20-%202014.pdf
