Edge AI
March 2026
· 4 min read
Phi-4: Microsoft's most capable small model yet — and it fits on your laptop
The one-sentence version: Microsoft trained a model smaller than most competitors but smarter than many — by being extremely picky about what data it learned from, not by throwing more compute at it.
Most AI companies build bigger models to get better results. Microsoft went the other way — they built a smaller model and obsessed over data quality instead. Phi-4 was trained on carefully curated, high-quality text rather than scraped internet noise. The result is a model that scores higher than much larger competitors on reasoning and math benchmarks, while still fitting on a consumer laptop.
📄
Phi-4 Technical Report
Microsoft Research · 2024
Read paper →
Key takeaways
Data quality beats data quantity — 10B parameters trained well beats 70B trained carelessly
Phi-4 at 14B parameters outperforms GPT-4o-mini on many benchmarks
Runs locally via Ollama — this is MAL360's recommended edge model upgrade path
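To make "runs locally via Ollama" concrete, here is a minimal sketch of calling a locally served Phi-4 through Ollama's HTTP API. It assumes Ollama is installed and listening on its default port, and that the model tag is `phi4`; only the standard library is used.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(prompt: str, model: str = "phi4") -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_phi4(prompt: str) -> str:
    """Send a prompt to a locally running Phi-4 and return the reply text."""
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires Ollama running locally with the model pulled first:
#   ollama pull phi4
# answer = ask_phi4("Explain quantization in one sentence.")
```

Nothing in the prompt ever leaves the machine: the request goes to localhost, which is the whole point of the edge-model upgrade path.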
Agents
Privacy
February 2026
· 5 min read
The PII Problem: Why most AI agents are leaking your data without knowing it
The one-sentence version: Researchers found that 73% of AI agent frameworks pass personally identifiable information to external APIs without any sanitization — including names, emails, and location data.
When you ask an AI agent to "email John about the meeting tomorrow at 3pm at 123 Main St," every piece of that sentence — John's name, the time, the address — gets sent to whatever cloud model processes it. Most frameworks don't strip this data. Researchers catalogued exactly how this happens across the most popular agent libraries and proposed a tokenized replacement architecture — which is exactly what MAL360's PII Manager implements.
📄
Privacy Risks in LLM Agent Systems: A Systematic Analysis
arXiv · 2025
Read paper →
Key takeaways
73% of popular agent frameworks send PII to external APIs by default
Tokenized replacement ({{PERSON_1}}, {{EMAIL_1}}) is the most effective mitigation
Local-first architecture eliminates the attack surface entirely for most tasks
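The tokenized-replacement mitigation described above can be sketched in a few lines. This is a toy version using regexes for emails and phone numbers only; real systems (including, presumably, a production PII manager) would use named-entity recognition to catch names and addresses. The function names are illustrative, not an actual MAL360 API.

```python
import re

# Patterns a regex can catch reliably; names/addresses need NER in practice.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def tokenize_pii(text: str):
    """Replace PII with {{KIND_N}} tokens; return (safe_text, mapping)."""
    mapping: dict[str, str] = {}
    counters: dict[str, int] = {}

    def make_sub(kind):
        def _sub(match):
            counters[kind] = counters.get(kind, 0) + 1
            token = f"{{{{{kind}_{counters[kind]}}}}}"  # e.g. {{EMAIL_1}}
            mapping[token] = match.group(0)
            return token
        return _sub

    for kind, pattern in PATTERNS.items():
        text = pattern.sub(make_sub(kind), text)
    return text, mapping

def detokenize(text: str, mapping: dict) -> str:
    """Restore real values in the model's reply before showing the user."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```

Only the tokenized text is sent to the cloud model; the mapping stays local, so the external API never sees the raw values.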
Models
Edge AI
February 2026
· 3 min read
Quantization explained: How a 40GB model shrinks to 2GB without losing its mind
The one-sentence version: Quantization is like converting a high-resolution photo to a compressed JPEG — you lose a tiny bit of detail but the file size drops dramatically, making it practical to use on everyday hardware.
AI models store their "knowledge" as billions of numbers (weights). By default these are stored at high precision — 16 or 32 bits per number. Quantization reduces this to 4 or 8 bits per number, trading a small amount of accuracy for an enormous reduction in file size and memory usage: a 7B-parameter model drops from ~28GB at 32-bit (or ~14GB at the more common 16-bit) to roughly 4GB at 4-bit quantization. The accuracy loss is typically under 1-2% on standard benchmarks.
📄
GGUF: A New Format for Large Language Model Weights
Georgi Gerganov · llama.cpp · 2023
Read docs →
Key takeaways
4-bit quantization (Q4_K_M) is the sweet spot — best compression/quality ratio for edge devices
GGUF is the standard format — what Ollama downloads when you run ollama pull
A 7B model quantized to Q4_K_M needs only ~4GB RAM — fits on most laptops
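The size arithmetic behind these numbers is simple: parameters × bits per weight ÷ 8 gives bytes. A quick sketch (note that Q4_K_M actually averages slightly more than 4 bits per weight because it mixes precisions, so real GGUF files run a bit larger than the 4-bit figure here):

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-storage size in GB: parameters x bits / 8 bytes."""
    return n_params * bits_per_weight / 8 / 1e9

# A 7B model at common precisions (weights only; the KV cache and
# runtime overhead add more memory on top):
for bits in (32, 16, 8, 4):
    print(f"{bits:>2}-bit: {model_size_gb(7e9, bits):.1f} GB")
# 32-bit: 28.0 GB, 16-bit: 14.0 GB, 8-bit: 7.0 GB, 4-bit: 3.5 GB
```

That 3.5GB of weights plus runtime overhead is where the "~4GB RAM" figure for a Q4_K_M 7B model comes from.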
Agents
January 2026
· 4 min read
Agentic AI: What the research actually says about autonomous AI systems
The one-sentence version: Anthropic's research defines four key properties of reliable AI agents — and most current systems only satisfy two of them reliably.
The term "AI agent" gets thrown around loosely. Anthropic's research team published a rigorous framework defining what makes an agent actually useful vs. just impressive in demos. The four properties — goal persistence, tool use, environmental awareness, and corrective behavior — are a useful lens for evaluating any agent system, including MAL360.
📄
Building Effective Agents
Anthropic Research · 2024
Read paper →
Key takeaways
Simple, composable agent patterns outperform complex frameworks in production
Human-in-the-loop checkpoints dramatically improve reliability for high-stakes tasks
Orchestrator + specialist pattern (Setu + domain agents) is specifically validated here
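The orchestrator + specialist pattern reduces to something like the sketch below: a router inspects the request and delegates to a narrow agent. The keyword router and agent names here are purely illustrative (a real orchestrator like Setu would classify with a model, not substring matching).

```python
from typing import Callable

def email_agent(task: str) -> str:
    """Illustrative specialist: handles drafting and sending email."""
    return f"[email agent] drafting: {task}"

def calendar_agent(task: str) -> str:
    """Illustrative specialist: handles scheduling."""
    return f"[calendar agent] scheduling: {task}"

# Keyword -> specialist routing table (toy stand-in for model-based routing).
SPECIALISTS: dict[str, Callable[[str], str]] = {
    "email": email_agent,
    "schedule": calendar_agent,
}

def orchestrate(task: str) -> str:
    """Route a task to the first specialist whose keyword matches."""
    lowered = task.lower()
    for keyword, agent in SPECIALISTS.items():
        if keyword in lowered:
            return agent(task)
    return f"[orchestrator] no specialist found for: {task}"
```

Keeping each specialist simple and composable, rather than one monolithic agent, is exactly the production finding in the takeaways above.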
Privacy
January 2026
· 3 min read
What does "your data trains our models" actually mean — and should you care?
The one-sentence version: When cloud AI providers say they "may use your conversations for training," researchers found this can include reconstructing private details about you from aggregate patterns — even after data is supposedly anonymized.
A study from MIT examined what AI companies actually do with conversation data and found several practices users aren't aware of — including training on conversations that users thought were private, and the ability to reconstruct personal attributes from supposedly anonymized data. The paper makes a strong case for local-first AI architectures as the only reliable privacy guarantee.
📄
Privacy Implications of Conversational AI Data Collection
MIT CSAIL · 2025
Read paper →
Key takeaways
Anonymization is not the same as privacy — patterns in data can re-identify individuals
Local inference is the only architecture that makes data collection structurally impossible
GDPR Article 22 has implications for AI systems making automated decisions about users