Synopsis: This blog explores solutions for overcoming the adoption barriers in Agtech, particularly for underserved smallholder farmers. It outlines Cropin's innovative response: ‘akṣara (Akshara),’ an open-source Micro Language Model. Here, we discover the potential of akṣara to dismantle knowledge barriers and empower smallholder farmers to adopt Climate Smart Agriculture and Regenerative Agricultural practices and thrive. It further discusses this innovation and explains how akṣara, tailored for the agriculture domain, is more factually relevant than globally available general Large Language Models.
Agriculture, the world's oldest occupation, is experiencing remarkable progress through technological advancements. While technology adoption already demonstrates the potential for increased farm productivity and resource efficiency in some regions, a vast disparity exists between regions.
A 2022 McKinsey study surveying over 5,500 farmers across Asia, Europe, North America, and South America revealed a stark divide in agtech adoption rates. While over 60% of European and North American farmers leverage agtech, adoption in Asia remains at a mere 9%.
Source: Agtech - Breaking down the farmer adoption dilemma, McKinsey
A turning point: akṣara (Akshara) is now here!
This disparity highlights a critical barrier: the knowledge gap faced by smallholder farmers in the Global South. The launch of akṣara (Akshara), an open-source Micro Language Model (µ-LM) purpose-built for agriculture by Cropin, a pioneer in agri-intelligence, marks a turning point.
Cropin built akṣara on Mistral's instruction-fine-tuned generative text model. It is designed to democratize knowledge and empower smallholder farmers, who produce roughly 35% of the world's food, to adapt and thrive.
What is an open-source large language model?
A language model is a probability distribution over words or word sequences, used to predict the next word given the text that precedes it. Large Language Models (LLMs) take this a step further, leveraging massive datasets and computationally intensive training. These models can be used for text generation (GenAI), classification, translation, and more. Open-source LLMs unlock even greater potential by publicly sharing their underlying code and the blueprints of their training methodology. This openness fosters collaboration, innovation, and faster progress in AI adoption.
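To make the "probability distribution over words" idea concrete, here is a minimal sketch of the simplest possible language model: a bigram model that estimates the probability of the next word from counts of word pairs. The tiny corpus and the function names are illustrative assumptions, not part of akṣara or any Cropin dataset; real LLMs learn far richer distributions with neural networks.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count word pairs and normalize them into P(next word | previous word)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    model = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        model[prev] = {word: n / total for word, n in followers.items()}
    return model


def predict_next(model, word):
    """Return the most probable next word, or None if the word was never seen."""
    dist = model.get(word.lower())
    if not dist:
        return None
    return max(dist, key=dist.get)


# Hypothetical toy corpus, used only to illustrate the counting.
corpus = [
    "rice needs standing water",
    "rice needs nitrogen",
    "rice needs standing water early",
    "maize needs well drained soil",
]
model = train_bigram(corpus)
print(model["needs"])                # distribution over words that follow "needs"
print(predict_next(model, "needs"))  # most probable next word
```

In this toy corpus "standing" follows "needs" in two of four cases, so it receives probability 0.5 and is the predicted next word.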
However, developing these very large models requires massive investments in data, storage, and computing. Their size can be immense (tens of gigabytes), even with recent trends exploring hybrid architectures that combine different approaches to achieve the desired performance. Training them requires significant computational resources, making them expensive both to train and to run for inference. Reportedly, training Mistral Large cost close to $22 million, while developing GPT-4 cost over $100 million.
Alternatively, Micro Language Models (µ-LMs) can be trained on specific, more targeted datasets intended for narrower tasks, offering a more practical and frugal solution for specific applications.
What is akṣara (Akshara)?
Developed by Cropin on Mistral's foundation model and hosted on Hugging Face, 'akṣara' is a frugal and scalable µ-LM. It is a transformer-based text generation model fine-tuned on top of Mistral-7B-v0.1, a state-of-the-art 7-billion-parameter open-source LLM. Recognizing the environmental impact of running LLMs, Cropin has compressed akṣara into a 4-bit format. Even so, it scores about 40% higher than GPT-4 Turbo on Cropin's internal test dataset, as measured by the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) scoring algorithm. The model thus maintains accuracy and efficiency while minimizing resource consumption.
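For readers unfamiliar with ROUGE, the sketch below shows the idea behind its simplest variant, ROUGE-1, which measures unigram overlap between a generated answer and a reference answer. It is a simplification under stated assumptions: whitespace tokenization, no stemming, and a single reference. The post does not say which ROUGE variant or tooling was used for the internal evaluation, and the example sentences here are hypothetical.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Simplified ROUGE-1: clipped unigram overlap between candidate and reference."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per word (clipping).
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    if overlap == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical sentences, chosen only to show the arithmetic.
reference = "apply urea in three split doses"
candidate = "apply urea in two split doses"
scores = rouge_1(candidate, reference)
print(scores)
```

Five of the six reference words appear in the candidate, so recall, precision, and F1 all come out at 5/6. A higher score means the generated text shares more wording with the reference; it does not by itself verify that the answer is factually correct.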
'akṣara' has been fine-tuned on a dataset of more than 5,000 high-quality prompt-response pairs specific to agriculture, generated through a semi-automated process. As the model expands to cover more crops, geographic locations, and use cases, the training dataset is expected to grow.
Designed as a question-answering system, akṣara leverages verified agricultural data for accuracy. To keep its answers faithful to the question, the model is grounded using techniques like RAG (Retrieval-Augmented Generation), which cross-references an authoritative knowledge base from subject matter experts.
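The retrieval step of RAG can be sketched in a few lines: find the knowledge-base passage most relevant to the question, then place it in the prompt so the model answers from that context. The sketch below is an illustration only. It uses plain keyword overlap as a stand-in for the embedding-based similarity search that production systems typically use, the passages are placeholders rather than real agronomic advice, and the names `retrieve` and `build_prompt` are hypothetical, not akṣara's API.

```python
import re

# Placeholder passages standing in for an expert-verified knowledge base.
KNOWLEDGE_BASE = [
    "Placeholder passage A: nursery preparation and sowing window for rice.",
    "Placeholder passage B: irrigation scheduling for wheat.",
    "Placeholder passage C: pest scouting for cotton.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, top_k=1):
    """Rank passages by the number of words they share with the question."""
    q_words = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q_words & tokenize(p)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, passages):
    """Assemble a grounded prompt: retrieved context first, then the question."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


question = "When should I schedule irrigation for wheat?"
top_passages = retrieve(question, KNOWLEDGE_BASE)
prompt = build_prompt(question, top_passages)
print(prompt)
```

The resulting prompt would then be passed to the language model. Because the answer is drawn from retrieved, verified text rather than from the model's memory alone, the response is easier to keep consistent with the knowledge base.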
We have incorporated responsible AI principles to guard against the spread of misinformation and harmful content. The Cropin AI Labs team's discussions with Google Research's Responsible AI team helped guide the model's design process and add the necessary guardrails for its usage. This helps ensure:
- alignment of akṣara with key responsible AI principles
- reduction of the biases common in general LLMs
- promotion of AI use to empower sustainable agricultural practices (biological controls, companion planting, soil & water conservation, etc.)
- equitable distribution of its benefits across farming communities in the Global South
Model Architecture
Steps used to incorporate Climate Smart Agriculture (CSA) knowledge into akṣara:
- Compile the package of practices (PoP) for the crops of interest.
- Generate Question & Answer (Q&A) pairs using OpenAI’s GPT-4-Turbo and the handcrafted Package of Practices dataset.
- Use the generated Q&A pairs to train the Mistral 7B-Instruct model at 16-bit.
- Fine-tune the model with the QLoRA (Quantized Low-Rank Adaptation) technique.
- Compress to 4-bit precision (the compressed model uses only about one-third of the GPU memory otherwise required).
- Use RAG for question answering to ensure consistent and accurate responses drawn on verified agri-data from relevant repositories.
- RAG retrieves external knowledge from the supplied context and the Q&A base, giving the model extra information with which to answer user prompts.
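Two of the steps above, QLoRA fine-tuning and 4-bit compression, are frugal for reasons that simple arithmetic can show. The sketch below estimates the memory needed to hold the weights of a 7-billion-parameter model at 16-bit and 4-bit precision, and counts the trainable parameters a low-rank adapter adds to a single weight matrix. The matrix size of 4096 x 4096 and the rank of 8 are illustrative assumptions, not akṣara's published training configuration, and the calculation covers weights only.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory needed to store the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def lora_trainable_params(d_in, d_out, rank):
    """Low-rank adaptation trains two small matrices: (d_in x rank) and (rank x d_out)."""
    return rank * (d_in + d_out)


NUM_PARAMS = 7_000_000_000  # 7-billion-parameter base model

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
int4_gb = weight_memory_gb(NUM_PARAMS, 4)
print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f" 4-bit weights: {int4_gb:.1f} GB")

# Assumed example: one 4096 x 4096 weight matrix adapted with rank 8.
full_matrix = 4096 * 4096
adapter = lora_trainable_params(4096, 4096, rank=8)
print(f"Full matrix parameters: {full_matrix:,}")
print(f"Adapter parameters:     {adapter:,} ({adapter / full_matrix:.2%} of the full matrix)")
```

On weights alone, moving from 16-bit to 4-bit cuts storage to one-quarter (14.0 GB to 3.5 GB). The one-third figure quoted in the steps presumably also reflects memory that this sketch ignores, such as quantization metadata and activations, though that reading is an assumption. The adapter count shows why low-rank fine-tuning is cheap: under the assumed sizes, well under 1% of a matrix's parameters need to be trained.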
How can Micro Language Models help smallholder farmers?
The agricultural landscape faces unprecedented challenges as climate change disrupts traditional practices, rendering age-old knowledge increasingly inadequate. Unpredictable weather patterns, including erratic rainfall, scorching heat waves, and intensified pest outbreaks, threaten farms, reducing yield, productivity, and profitability. These complex challenges underscore the critical need for Climate-Smart Agriculture (CSA) practices, yet smallholder farmers often lack access to this knowledge. Micro Language Models are a cost-effective way to bring this knowledge to farmers.
Built with the vision of creating the world's first AI-powered digital agronomist, 'akṣara' will equip human agriculture experts with answers to precise and complex questions on regenerative agriculture and CSA practices. In response to simple prompts in a Q&A format, akṣara will provide contextual, accurate, and actionable insights.
akṣara (Akshara) brings knowledge to your fingertips for sustainable farming
By harnessing the power of GenAI, 'akṣara' offers on-demand knowledge through a user-friendly interface. It delivers clear answers throughout the crop cycle, from sowing to harvest. Beyond simple answers, akṣara offers insights into best practices, crop health, disease prevention, and more to enable CSA.
akṣara's comprehensive advisories are contextualized to the crop and its conditions. Whether you seek guidance on input usage for crops like rice or maize in specific agro-climatic conditions, one of thousands of climate-smart agricultural advisories, or answers to other agriculture-specific questions, akṣara is the go-to choice.
Example of a query and response from 'akṣara'.
Crops and regions akṣara (Akshara) will cover
akṣara 1.0 covers nine crops (rice, wheat, maize, sorghum, barley, cotton, sugarcane, soybean, and millet) as cultivated in India, Bangladesh, Nepal, Pakistan, and Sri Lanka. It will soon expand to other crops and regions. Currently available in English, akṣara will initially support agronomists, agricultural scientists, field staff, and extension workers. We plan to extend the service to farmers and gradually add multilingual support.
Sharpened by niche expertise, akṣara (Akshara) is better than globally available general LLMs
Unlike general-purpose LLMs, akṣara leverages Cropin's extensive proprietary crop knowledge base, encompassing over 500 crops and 10,000 varieties. Targeted training with niche agri-domain datasets ensures that akṣara delivers superior, fact-based advisories tailored to specific crops and situations. Furthermore, akṣara relies solely on verified agricultural data to provide users with the most accurate and actionable insights.
When responding to queries, akṣara includes relevant, factually grounded numerical data wherever applicable. For example, a question about pesticide use will yield a response that names specific chemicals and quantities.
Limitations
akṣara 1.0 is presently trained on datasets from the Indian subcontinent. Responses are therefore expected to be most accurate for this region and less specific for other geographies.
Collaborate with open-source akṣara to multiply the impact
We believe in the collaborative spirit of open-source technology to foster innovation through continuous improvement in agriculture. This conviction fueled the development of ‘akṣara’ as an open-source enterprise-grade µ-LM with the potential to put the power of AI directly into the hands of the farmers.
The power of open-source: We believe in the power of open-source to accelerate the adoption of AI in agriculture, so we developed akṣara as a frugal and affordable enterprise-grade model that combines niche expertise with exceptional accuracy. This means akṣara is freely available for anyone to use.
Transformative impact: akṣara empowers stakeholders with cost-effective development, deployment, and distribution of GenAI models in agriculture. Train akṣara with your data for region-specific crops and share it publicly for the greater good. Enterprises can provide us with data to train akṣara on all their crops across regions. They can then disseminate this knowledge through their apps.
Empowering farmers: Collaborate with us and leverage AI to transform agriculture. akṣara empowers farmers to embrace climate-smart agriculture and regenerative practices, ensuring farming emerges as a profitable and sustainable occupation in the future.
‘akṣara,’ the first purpose-built open-source µ-LM for climate-smart agriculture, is released under the Apache 2.0 License with no restrictions, giving full freedom to any downstream user of our work. We aspire to remove the barriers of language, resources, and literacy for every farmer with akṣara. More hands and brains can accelerate this goal of democratizing access to technology, so we invite you to collaborate. Here is how different stakeholders can leverage akṣara to solve diverse challenges in agriculture.
Join the movement! Access akṣara on Hugging Face
Collaborate, train on your datasets, and share feedback and outcomes with this community as we double down on making AI accessible to the global farming community.
Conclusion:
AI has the potential to revolutionize agriculture, but its adoption is hindered by challenges such as limited access to large-scale structured data, expertise, and infrastructure. This is where open-source models like akṣara play a crucial role. They are not only valuable tools for farmers but also catalysts for innovation, fostering collaboration and continuous improvement.
By reaffirming its leadership in AI for food systems, Cropin demonstrates the power and potential of open-source models like akṣara in driving innovation and making AI more accessible and beneficial for agriculture.