Meta’s Rare Selloff Deepens After Court Losses, AI Delays And Metaverse’s Decline
Back-to-back landmark court losses deepened the decline in the Facebook parent’s stock.
It’s Official: House Rejects DHS Funding Plan—Shutdown Continues
House conservatives opposed the Senate deal passed early Friday because it didn’t include funding for immigration enforcement.
IndexCache, a new sparse attention optimizer, delivers 1.82x faster inference on long-context AI models
Processing 200,000 tokens through a large language model is expensive and slow: the longer the context, the faster the costs spiral. Researchers at Tsinghua University and Z.ai have built a technique called IndexCache that cuts up to 75% of the redundant computation in sparse attention models, delivering up to 1.82x faster time-to-first-token and 1.48x faster generation throughput at that context length.

The technique applies to models using the DeepSeek Sparse Attention architecture, including the latest DeepSeek and GLM families. It can help enterprises provide faster user experiences for production-scale, long-context models, a capability already demonstrated in preliminary tests on the 744-billion-parameter GLM-5 model.

The DSA bottleneck

Large language models rely on the self-attention mechanism, in which the model computes the relationship between every token in its context and all preceding tokens to predict the next one.

However, self-attention has a severe limitation: its computational complexity scales quadratically with sequence length. For applications requiring extended context windows (e.g., large document processing, multi-step agentic workflows, or long chain-of-thought reasoning), this quadratic scaling leads to sluggish inference and significant compute and memory costs.

Sparse attention offers a principled solution to this scaling problem. Instead of calculating the relationship between every token and all preceding ones, sparse attention has each query select and attend to only the most relevant subset of tokens.

DeepSeek Sparse Attention (DSA) is a highly efficient implementation of this concept, first introduced in DeepSeek-V3.2. To determine which tokens matter most, DSA introduces a lightweight “lightning indexer” module at every layer of the model. This indexer scores all preceding tokens and selects a small subset for the core attention mechanism to process.
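Conceptually, the indexer’s job can be sketched in a few lines of Python. This is a toy illustration only; the function name, the dot-product scorer, and the data shapes are assumptions for exposition, not the paper’s actual implementation:

```python
def lightning_index(query, keys, k):
    """Toy sketch of a DSA-style lightning indexer: score every preceding
    token's key against the current query with a cheap dot product, then
    keep only the top-k token positions for core attention to attend to."""
    scores = [sum(q * kv for q, kv in zip(query, key)) for key in keys]
    # Rank positions by score and keep the k highest-scoring ones.
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])

# Toy example: 6 cached token keys, keep the 2 most relevant to the query.
keys = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.2, 0.8), (0.5, 0.5), (-1.0, 0.0)]
query = (1.0, 0.0)
selected = lightning_index(query, keys, k=2)
print(sorted(selected))  # [0, 1]
```

Core attention then runs only over the selected positions, which is what turns the main attention cost from quadratic to linear in context length.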
By doing this, DSA slashes the heavy core attention computation from quadratic to linear, dramatically speeding up the model while preserving output quality.

But the researchers identified a lingering flaw: the DSA indexer itself still operates at quadratic complexity at every layer. Even though the indexer is computationally cheaper than the main attention process, as context lengths grow, the time the model spends running these indexers skyrockets. This severely slows down the model, especially during the initial “prefill” stage, when the prompt is first processed.

Caching attention with IndexCache

To solve the indexer bottleneck, the research team discovered a crucial characteristic of how DSA models process data: the subset of important tokens an indexer selects remains remarkably stable as data moves through consecutive transformer layers. Empirical tests on DSA models revealed that adjacent layers share between 70% and 100% of their selected tokens.

To capitalize on this cross-layer redundancy, the researchers developed IndexCache. The technique partitions the model’s layers into two categories. A small number of full (F) layers retain their indexers, actively scoring tokens and caching the indices of the most important ones. The remaining layers become shared (S) layers, which perform no indexing and reuse the cached indices from the nearest preceding F layer.

During inference, the model simply checks the layer type. If it reaches an F layer, it computes and caches fresh indices; if it is an S layer, it skips the computation and copies the cached indices.

A wide range of optimization techniques address the attention bottleneck by compressing the KV cache, where computed attention values are stored. Instead of shrinking the memory footprint like standard KV cache compression, IndexCache attacks the compute bottleneck. “IndexCache is not a traditional KV cache compression or sharing technique,” Yushi Bai, co-author of the paper, told VentureBeat.
“It eliminates this redundancy by reusing indices across layers, thereby reducing computation rather than just memory footprint. It is complementary to existing approaches and can be combined with them.”

The researchers developed two deployment approaches for IndexCache. (Note that IndexCache applies only to models that use the DSA architecture, such as the latest DeepSeek models and the latest family of GLM models.)

For developers working with off-the-shelf DSA models, where retraining is unfeasible or too expensive, they created a training-free method relying on a “greedy layer selection” algorithm. By running a small calibration dataset through the model, this algorithm automatically determines the optimal placement of F and S layers without any weight updates. Empirical evidence shows that the greedy algorithm can safely remove 75% of the indexers while matching the downstream performance of the original model.

For teams pre-training or heavily fine-tuning their own foundation models, the researchers propose a training-aware version that optimizes the network parameters to natively support cross-layer sharing. This approach introduces a “multi-layer distillation loss” during training, which forces each retained indexer to learn to select a consensus subset of tokens that will be highly relevant to all the subsequent layers it serves.

Real-world speedups on production models

To test the impact of IndexCache, the researchers applied it to the 30-billion-parameter GLM-4.7 Flash model and compared it against the standard baseline. At a 200K context length, removing 75% of the indexers cut prefill latency from 19.5 seconds to 10.7 seconds, delivering a 1.82x speedup.
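The F/S control flow described above can be sketched as a short loop. This is an illustrative toy, not the released code; the function names and the stand-in `compute_indices` scorer are assumptions:

```python
def run_layers(layer_types, compute_indices):
    """Toy sketch of IndexCache's control flow: full (F) layers run their
    indexer and cache the selected token indices; shared (S) layers skip
    the quadratic indexer entirely and reuse the cache from the nearest
    preceding F layer."""
    cached = None
    indexer_calls = 0
    per_layer = []
    for layer_id, kind in enumerate(layer_types):
        if kind == "F":
            cached = compute_indices(layer_id)  # fresh indices, then cache
            indexer_calls += 1
        per_layer.append(cached)                # S layers copy the cache
    return per_layer, indexer_calls

# Keeping 1 indexer in every 4 layers, i.e. removing 75% as in the paper.
types = ["F", "S", "S", "S"] * 2
indices, calls = run_layers(types, lambda layer_id: {layer_id, layer_id + 1})
print(calls)  # 2 indexer runs across 8 layers
```

The savings come from the three S layers in each group paying only a cache lookup instead of a quadratic scoring pass.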
The researchers note these speedups are expected to grow at even longer contexts.

During the decoding phase, when the model generates its response, IndexCache boosted per-request throughput from 58 tokens per second to 86 tokens per second at the 200K context mark, a 1.48x speedup. When the server’s memory is fully saturated with requests, total decode throughput jumped by up to 51%.

For enterprise teams, these efficiency gains translate directly into cost savings. “In terms of ROI, IndexCache provides consistent benefits across scenarios, but the gains are most noticeable in long-context workloads such as RAG, document analysis, and agentic pipelines,” Bai said. “In these cases, we observe at least an approximate 20% reduction in deployment cost and similar improvements in user-perceived latency.” He added that for very short-context tasks, the benefits hover around 5%.

Remarkably, these efficiency gains did not compromise reasoning capabilities. Using the training-free approach to eliminate 75% of the indexers, the 30B model matched the original baseline’s average score on long-context benchmarks, scoring 49.9 against the original’s 50.2. On the highly complex AIME 2025 math reasoning benchmark, the optimized model actually outperformed the baseline, scoring 92.6 compared to 91.0.

The team also ran preliminary experiments on the production-scale 744-billion-parameter GLM-5 model. Eliminating 75% of its indexers with the training-free method yielded at least a 1.3x speedup on contexts over 100K tokens, while the model maintained a nearly identical average quality on long-context tasks.

Getting IndexCache into production

For development teams wanting to implement the training-free approach today, the process is straightforward but requires careful setup.
While the greedy search algorithm automatically finds the optimal layer configuration, the quality of that configuration depends on the data it processes. “We recommend using domain-specific data as a calibration set so that the discovered layer-sharing pattern aligns with real workloads,” Bai said.

Once calibrated, the optimization is highly accessible for production environments. Open-source patches are already available on GitHub for major serving engines. “Integration is relatively straightforward — developers can apply the patch to existing inference stacks, such as vLLM or SGLang, and enable IndexCache with minimal configuration changes,” Bai said.

While IndexCache provides an immediate fix for today’s compute bottlenecks, its underlying philosophy points to a broader shift in how the AI industry will approach model design. “Future foundation models will likely be architected with downstream inference constraints in mind from the beginning,” Bai concluded. “This means designs that are not only scalable in terms of model size, but also optimized for real-world throughput and latency, rather than treating these as post-hoc concerns.”
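For illustration, a calibration-driven greedy layer selection might look roughly like the sketch below. Everything here is an assumption for exposition: the `evaluate` calibration score, the toy per-layer utilities, and the rule that layer 0 stays full all stand in for details the paper’s actual algorithm may handle differently.

```python
def greedy_layer_selection(num_layers, evaluate, keep):
    """Toy sketch of training-free greedy selection: start with an indexer
    in every layer, then repeatedly demote to shared (S) the layer whose
    indexer removal hurts the calibration score the least, until only
    `keep` full (F) layers remain."""
    f_layers = set(range(num_layers))
    while len(f_layers) > keep:
        # Layer 0 has no preceding F layer to borrow indices from, so it stays F.
        candidates = [layer for layer in f_layers if layer != 0]
        best = max(candidates, key=lambda layer: evaluate(f_layers - {layer}))
        f_layers.remove(best)
    return sorted(f_layers)

# Toy calibration score: assume each layer's indexer has a known utility and
# a configuration scores the total utility of its retained indexers.
utility = [1.0, 0.2, 0.9, 0.1, 0.3, 0.8, 0.1, 0.2]

def score(f_layers):
    return sum(utility[i] for i in f_layers)

# Keep 2 of 8 indexers, i.e. remove 75% as in the paper's setup.
print(greedy_layer_selection(8, score, keep=2))  # [0, 2]
```

In practice the calibration score would come from running domain-specific data through the model, which is why Bai recommends calibrating on data that matches real workloads.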
Friday, March 27. Russia’s War On Ukraine: News And Information From Ukraine
Dispatches from Ukraine. Day 1,493.
Iran War Will Not Be A ‘Prolonged Conflict,’ Rubio Says
The U.S. began its latest strikes against Iran nearly one month ago.
How a conflict can highlight long-term investment opportunities
Also in Weekend Reads: Tech stocks, tips on managing your IRA, how to select a chatbot and advice from the Moneyist
How to make Roth IRAs work for you at any age
From a teen’s first summer job to preparing your accounts for your heirs, you need a smart investment strategy.
Amazon is selling a 26-drawer organizer for only $31 with over 40,000 5-star ratings
TheStreet aims to feature only the best products and services. If you buy something via one of our links, we may earn a commission.

Why we love this deal

If you’re a DIYer searching aimlessly for a single screw or a crafter hunting for a particular item at a messy workstation, the search can significantly delay your project. Without proper organization, it’s almost impressive how much impact one small piece can have. But whether you’re dealing with junk drawers packed to the brim or a disorganized tabletop, there’s a storage solution that might turn a mess of small parts into an organizer’s dream come true: a multidrawer organizer.

The Akro-Mils 26-Drawer Organizer has both small and large drawers that offer extra storage for small pieces, whether you’re a DIYer, crafter, or hobbyist. The 26-drawer organizer is on sale for just $31, which is 23% off its regular price of $40, and it’s a space-saver you can use in your garage, craft room, home office, and beyond. However, this is a limited-time deal during Amazon’s Big Spring Sale; once it’s fully claimed, it’s back to full price.

Akro-Mils 26-Drawer Organizer, $31 (was $40) at Amazon
Shop at Amazon

Why do shoppers love it?

The secret to a long-lasting organization system is finding each item a home and sticking to it. This organizer can certainly help, with 26 drawers of various sizes to work with. It has six larger drawers, three on each side, each measuring 4.25 inches long by 4.88 inches wide by 2 inches high. The remaining 20 drawers are smaller and sit in the middle of the organizer, each measuring 1.69 inches long by 5.25 inches wide by 1.5 inches high. Each drawer has a clear bin that not only gives you a perfect spot to store small items but also lets you see them clearly. That makes it far easier to find the items you need than digging through a junk drawer for one small piece. The drawers also have pulls for easy access and rear-stop tabs that prevent them from sliding all the way out and spilling.

A major bonus of this particular organizer is that it actually comes with dividers. Many similar options are compatible with dividers, but you need to purchase them separately. Hidden in the back panel are dividers you can pop out and use as needed. Additional noteworthy features include its stackability and the fact that you can mount it on a wall. Both make this organizer a space-saver, not to mention its overall small footprint of approximately 20 inches long, 6.38 inches wide, and 10.25 inches high.

Related: Amazon is selling a 36-drawer organizer for just $37 with over 12,000 5-star ratings

Rugged and impact-resistant, the organizer is a clear winner for DIYers who want to store items like screws, nails, and bolts. However, it’s also effective at organizing small pieces in almost any setting, from a classroom to a craft room to a bathroom. It can store art or office supplies in a classroom, crafting essentials in a craft room, or Lego pieces and building materials for collectors and hobbyists.
The 26-drawer organizer also works well for a comprehensive at-home first aid kit, as well as a jewelry organizer for anyone with a big collection.

Pros and cons of the Akro-Mils 26-Drawer Organizer

Pros

Various drawer sizes: There are six large drawers measuring 4.25 inches long by 4.88 inches wide by 2 inches high, and 20 small drawers measuring 1.69 inches long by 5.25 inches wide by 1.5 inches high.

Comes with dividers: A lot of similar organizers can accommodate dividers but don’t come with them. This option has removable dividers built into the back panel that you can pop out and use if needed.

Versatile uses: The organizational possibilities are endless. It’s best for organizing small parts and can store DIY materials, crafting supplies, Legos, makeup, medical supplies, and more.

Cons

Drawers could be bigger: Some shoppers said they wished the drawers were wider to accommodate longer small objects like pencils and crayons.

Limited number of dividers: While it’s a major bonus that the organizer comes with dividers, only a limited number are included. Since there aren’t enough for all of the drawers, you’ll have to buy more if needed.

Amazon shoppers praise the organizer for how efficient and easy it is to use, whether in a garage or a bedroom. One shopper said it’s “perfect for garage organization,” noting it was ready to use out of the box with no assembly required.
They added that it’s lightweight yet sturdy, and great “for storing small parts and miscellaneous items that tend to pile up in a garage.” Another reviewer who uses it as a jewelry organizer said they’ve relied on it for three years to organize rings and earrings.

Shop more deals

Iris USA 26-Drawer Organizer, $28 (was $33) at Amazon

Akro-Mils 6-Pack of AkroBins Plastic Storage Bins, $15 (was $20) at Amazon

Craftsman 9-Drawer Organizer, $20 at Amazon

The Akro-Mils 26-Drawer Organizer offers just the right amount of storage for small pieces, making finding what you need significantly easier and more convenient. It’s on sale for just $31 for a limited time during Amazon’s Big Spring Sale.
TSA chaos unexpectedly gives one company a huge lift
Travel disruptions tied to the partial government shutdown and funding shortfalls at the Transportation Security Administration (TSA) have escalated just as spring break travel ramps up. The result has been widespread delays at airports across the U.S.

Airport wait times have reached the highest levels in the agency’s history, with some lines exceeding four hours, according to testimony from TSA Deputy Administrator Ha Nguyen McNeill before Congress on March 25. Even the roughly 20 million Americans the TSA says have PreCheck credentials have not been immune, TSA data show. In recent weeks, airlines and airports have advised passengers to show up at least two or three hours before a flight.

Clear Secure, a publicly traded company offering expedited airport screening, is benefiting from the partial government shutdown, as TSA staffing shortages have contributed to hourslong lines at airports across the U.S. Shares of Clear Secure rose 11% earlier this week and are up roughly 60% over the past month, according to reporting by The Wall Street Journal.

Even though the U.S. Senate agreed early Friday, March 27, to fund the Department of Homeland Security, which in turn will bring TSA workers back to work, there will likely be a lag as operations return to normal. People with Clear access are likely to keep seeing shorter checkpoint wait times.

What is Clear expedited airport screening?

Clear speeds up the airport security process by using biometric identifiers, including fingerprints or iris scans, instead of traditional ID checks. After enrolling, travelers can use dedicated Clear lanes at many major airports (it’s not typically found at small regional airports), where their identity is verified instantly at a kiosk and they are escorted to the front of the security line.

Unlike TSA PreCheck, which speeds up the physical screening process (allowing people to keep shoes and belts on and leave laptops and tablets in bags), Clear lets travelers skip the ID check portion.
Clear typically costs about $189 per year, making it especially useful for frequent flyers or people who travel during busy times of year.

Related: What flyers need to know about huge TSA airport delays

While Clear has benefited from the most recent government shutdown, the company’s reaction has been muted. “We hope a resolution comes soon,” Clear’s Executive Vice President of Aviation Kyle McLaughlin said in a statement to The Wall Street Journal. “We are working hard to support all our stakeholders, including airlines, airports, the TSA, and most importantly, American travelers who deserve better.” The company also reportedly donated $200,000 worth of gas and grocery cards to TSA workers.

Clear’s business model

Clear was founded in 2003, filed for bankruptcy in 2009, and was relaunched in 2012 under new ownership. The revamped company has since grown under the leadership of CEO Caryn Seidman-Becker, who played a key role in rebuilding and expanding the business.

Clear operates on a subscription model and also partners with airports, airlines, and credit card companies. Over time, the company has expanded beyond airports into stadiums, event venues, and broader identity verification services. The company went public in 2021 on the New York Stock Exchange under the ticker symbol “YOU.”

Clear’s recent financial growth

According to company filings and disclosures:

2024: Quarterly revenue grew about 20-25% year over year in multiple quarters.

2025 (full year): Revenue was approximately $900.8 million (+16.9% YoY).

2026: Clear stock is up 50% year to date, according to The Wall Street Journal.

Membership growth

Total users reached about 38 million (up approximately 31% YoY), but this total includes free users and people who use Clear at stadiums and venues other than airports. Clear has more than 7 million CLEAR+ members who use the service for air travel, according to SEC filings.

“I think it remains to be seen if there’s appetite for paid add-ons, especially since many customers are getting CLEAR+ for free. However, maybe there’s an appeal in the so-called ‘luxury travel’ business for a wealthy enough person to skip the ‘skipping line,’ or have somebody carry your bags for you. We might be surprised by the success of this ‘add-on’ product,” wrote Noah Weidner of TheStreet Pro.
Clear is free with some American Express cards

Several American Express cards offer statement credits that cover Clear memberships. These include:

The Platinum Card from American Express

The Business Platinum Card from American Express

American Express Green Card

Hilton Honors American Express Aspire Card

Centurion® (Black Card)

Clear vs. TSA PreCheck

If you’re a frequent traveler looking for the most efficient way to get through the airport, you might want to enroll in both Clear and TSA PreCheck. Here is how the two compare.

Clear

Speeds up the ID check at security using a fingerprint or iris scan

Gets passengers directly to the front of the screening line

Cost: approximately $189/year (some Amex cards cover the fee)

Available at 60+ U.S. airports and some venues, including sports stadiums

Paid airport members: approximately 7 million

TSA PreCheck

Following the Sept. 11 attacks, TSA security rules tightened significantly, shaping today’s screening process, including rules around what passengers can and cannot pack in carry-ons. The rules continue to change often, such as the one involving portable batteries. Still, TSA PreCheck can be a timesaver. Some facts about PreCheck:

Typically provides shorter lines at TSA checkpoints

Allows passengers to keep shoes and belts on and laptops in bags

Cost: $85 for 5 years (about $17 per year)

Related: An unfunded TSA is leaving air travelers stranded in hours-long security lines