AI Trading Strategies Course 2: Data Wrangling in Mojo

Explore the differences between Mojo and Python, focusing on filtering lists and iterating with loops. This guide highlights practical examples, including the impact of Mojo's lack of list comprehensions and how iteration behavior differs.


Introduction


This article follows up on our previous exploration of the AI Trading Strategies Course using Mojo, where we introduced Mojo as a programming language and focused on setting it up. Here, we give an overview of Course 2, a firehose of information about classical machine learning models (unsupervised learning, regression, classification, and reinforcement learning), and then dive into its first section, which covers acquiring and cleaning data. While the course relies on Python for these tasks, we attempt to reproduce the results in Mojo, exploring its capabilities and limitations along the way.


Facing Our First Set of Challenges

Eager to start, I borrowed the trick from the Matrix Multiplication Notebook: I copied the course code verbatim and added a %%python marker at the start of each code cell.

This ensures the cell runs as Python, even when the notebook uses the Max kernel.

First, we add the Yahoo Finance module (yfinance) to the project environment, and it couldn't be easier. Use magic:

Shell

$ magic add yfinance
    

After getting the Python code running, I noticed a slight difference: since I am running the notebook in Codium, the value of the last expression in a cell isn't rendered automatically, as it is when running the notebook in a browser. It's not a big deal, but something to keep in mind.

The next step was to see how much of a Python superset Mojo already is. Removing the %%python marker was the obvious thing to try, and to my surprise, it worked, except for a strange error in the output:

This looks like a borrow checker error.

If that error rings a bell, you might have been writing Rust. It sounds like a borrow checker error, hinting that making the variable immutable might resolve the issue. This led to the day's first lesson: there is no let keyword in Mojo anymore. It was removed, but unfortunately, many internet resources still mention it.

🚧
Immutable variable declarations (let) have been removed from Mojo; hence, Mojo/MAX 24.5 does not support them, and let is no longer a language keyword.

After rearranging the code, the next idea was to try the official way to import Python libraries into Mojo, but the error persisted.

So I took Jupyter out of the equation and wrote the code directly to a 🔥 file. There, the compiler rejects plain Python import statements. Chances are, Jupyter preprocesses the code before handing it to Mojo, and the modified code sneaks past the Mojo compiler.

In the 🔥 file, the code that imports the Yahoo Finance library the official way executes without issues, so the error seen in the notebook smells like a bug in the interaction between Jupyter and the Mojo kernel.

While I was writing this article, MAX 24.6 was released. I checked whether the issues I had found persisted, and they did. I also discovered a bigger problem: the Jupyter kernel does not start because it tries to launch with the nightly configuration. I reported the bug and will post a solution in a future article. For the moment, I will stay with MAX 24.5.

Hitting Some Problems

Given that Mojo is at an early stage of development, some issues were to be expected. I already knew that comprehension syntax is not yet supported. More surprising were some effects of the type system, which, at least for now, make code behave in ways that are unintuitive for someone with Python experience.

One big surprise is that Mojo's print function does not automatically call the __repr__() dunder method:

Mojo

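from python import Python

fn get_sp500_tickers() raises -> List[String]:
    # Use pandas through Mojo's Python interop to read the Wikipedia table
    pd = Python.import_module("pandas")
    sp500_tickers = pd.read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")[0]['Symbol']
    var filtered_tickers: List[String] = List[String]()
    # No list comprehensions yet, so filter with an explicit loop
    for ticker in sp500_tickers:
        if ".B" not in str(ticker):
            filtered_tickers.append(str(ticker))
    return filtered_tickers

fn main() raises:
    sp500_tickers = get_sp500_tickers()
    print(sp500_tickers)  # rejected by the compiler: print cannot handle List[String]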
    

The above code results in a type error:

Print does not know how to print lists.

But Mojo's List has a method that converts it to a printable string:

Mojo

    print(sp500_tickers.__repr__())
    
It prints if we manually call __repr__()
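
For comparison, here is a minimal Python sketch of the same filtering step written as a list comprehension, the syntax Mojo does not support yet. The sample tickers are a small hard-coded stand-in for the Wikipedia table (an assumption for illustration), and filter_tickers is a hypothetical helper name, not part of the course code. It also shows that Python's print renders a list directly.

```python
# Hypothetical sample standing in for the 'Symbol' column of the Wikipedia table.
SAMPLE_TICKERS = ["MMM", "AOS", "BRK.B", "ABT", "BF.B"]


def filter_tickers(tickers):
    """Drop tickers containing '.B', mirroring the explicit Mojo loop."""
    return [ticker for ticker in tickers if ".B" not in ticker]


filtered = filter_tickers(SAMPLE_TICKERS)
print(filtered)        # Python's print renders a list without extra help
print(repr(filtered))  # for a list, this produces the same text
```

The comprehension and the Mojo loop express the same filter; only the syntax and the printing behavior differ.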

Another surprise was that looping over a collection using the in operator is not equivalent to using a range and indexed access:

Mojo

fn main() raises:
    # Code removed for expediency

    print("\n\n")
    print("Using range:")
    for i in range(10):
        ticker = sp500_tickers[i]  # indexed access returns a String
        print(ticker)

    print("\n\n")
    print("Using in:")
    for ticker in sp500_tickers[:10]:
        print(str(ticker))  # the iterator yields a Pointer, not a String
    
    

The root of the problem is that indexed access returns a String, while the iterator returns a Pointer[String, (muttoimm "anonymous")], which has no __repr__() implementation. The str conversion therefore returns the memory address instead of the ticker text (dereferencing the pointer with ticker[] yields the String itself):

Unexpected iteration differences
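
The expectation that both loops behave the same comes from Python, where indexing and iteration hand back the very same objects. A minimal Python sketch, using a hypothetical hard-coded ticker list in place of the downloaded one:

```python
# Hypothetical sample data; the article's list comes from Wikipedia instead.
tickers = ["MMM", "AOS", "ABT", "ABBV", "ACN"]

# Loop 1: a range plus indexed access.
by_index = []
for i in range(3):
    by_index.append(tickers[i])

# Loop 2: direct iteration over a slice.
by_iteration = []
for ticker in tickers[:3]:
    by_iteration.append(ticker)

# Both loops yield plain str objects with identical contents.
print(by_index == by_iteration)
print(all(type(t) is str for t in by_iteration))
```

In Python there is no pointer wrapper around the iterated element, which is why the Mojo behavior is surprising for someone coming from Python.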

Accepting Defeat

The course depends heavily on a Dataframe library, specifically Pandas, and at the time of writing there is no native Dataframe library for Mojo. That makes the original idea of using Mojo for the course impractical and a waste of time, since Mojo would do little more than call Python.

With that in mind, I won't pursue this idea anymore, at least not while Mojo has no Dataframe library.

Maybe the lesson here is to try porting a Dataframe library to Mojo instead, or perhaps to implement Arrow in Mojo.


Conclusion

Mojo has a lot of potential, but its ecosystem needs to mature before it becomes practical in some scenarios. We discovered that a Dataframe library is a key missing piece for applying Mojo to data analysis tasks.

