AI can process information impressively, but can it truly understand, learn, or think?
 
   
   
  Frischlich, L., Olsson, H., Roy, A. et al. The complexity of misinformation extends beyond virus and warfare analogies. npj Complex 2, 29 (2025). https://doi.org/10.1038/s44260-025-00053-z.
 
   
  Fine-tuning large language models via reinforcement learning is computationally expensive, but researchers found a way to streamline the process.
What’s new: Qinsi Wang and colleagues at UC Berkeley and Duke University developed GAIN-RL, a method that accelerates reinforcement learning fine-tuning by selecting training examples automatically based on the model’s own internal signals, specifically the angles between vector representations of tokens. The code is available on GitHub.
Key insight: The cosine similarity between a model’s vector representations of input tokens governs the magnitude of gradient updates during training. Specifically, the sum of the cosine similarities among the token representations that enter the model’s classification layer, called the angle concentration, determines how large those updates are: Examples with higher angle concentration produce larger gradient updates. The size of a gradient update in turn determines how effective a training example is: The larger the update, the more the model learns. Prioritizing the most effective examples before moving on to less effective ones therefore improves training efficiency while adding little preprocessing overhead.
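A minimal sketch of the ranking idea, assuming that angle concentration can be approximated as the sum of pairwise cosine similarities between an example’s token vectors and that examples are sorted once, in descending order. The functions `angle_concentration` and `curriculum_order` and the toy vectors are hypothetical illustrations, not GAIN-RL’s API; the released code on GitHub may compute and schedule the signal differently.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def angle_concentration(tokens: Sequence[Vector]) -> float:
    """Hypothetical proxy: sum of pairwise cosine similarities among the token
    representations that would enter the classification layer."""
    total = 0.0
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens)):
            total += cosine(tokens[i], tokens[j])
    return total


def curriculum_order(examples: Sequence[Sequence[Vector]]) -> List[int]:
    """Indices of examples sorted from highest to lowest angle concentration,
    so examples expected to produce larger gradient updates come first."""
    scores = [angle_concentration(ex) for ex in examples]
    return sorted(range(len(examples)), key=lambda i: scores[i], reverse=True)


# Toy 2-D "token vectors" for three examples (made-up values).
examples = [
    [[1.0, 0.0], [0.0, 1.0]],   # orthogonal tokens
    [[1.0, 0.0], [1.0, 0.1]],   # nearly aligned tokens
    [[1.0, 0.0], [-1.0, 0.0]],  # opposite tokens
]
print(curriculum_order(examples))  # → [1, 0, 2]
```

A single descending sort is the simplest way to put high-concentration examples first; a practical schedule could instead shift gradually toward lower-concentration examples as training proceeds.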
 
Epilepsy is characterized by recurrent seizures and neurological consequences that may be associated with impaired myelin and glial integrity. Environmental neurotoxicants such as cypermethrin (CPM) may heighten these impairments, worsening seizure outcomes. This study investigates the effects of CPM on pentylenetetrazole (PTZ)-induced seizures and of vitamin E (Vit E) and valproate (VAP) co-interventions on myelin and glial integrity.
Histochemical and immunohistochemical analyses for hematoxylin and eosin (H&E), myelin basic protein (MBP), ionized calcium-binding adaptor molecule 1 (IBA1), glial fibrillary acidic protein (GFAP), and oligodendrocyte transcription factor 2 (OLIG-2) were conducted on cerebral white matter and corpus callosum tissues. Stained-cell density and immunoreactivity were quantified with ImageJ and compared with one-way analysis of variance (ANOVA).
Immunohistochemistry revealed that cypermethrin exposure in rats with PTZ-induced seizures led to marked neuronal, oligodendroglial, and myelin loss, accompanied by substantial glial activation in both cerebral white matter and corpus callosum. Co-intervention with VAP and Vit E, especially in combination, substantially reduced microglial activation and reactive astrogliosis, thereby preventing oligodendrocyte and neuronal loss and preserving myelin in both cerebral white matter and the corpus callosum.
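The one-way ANOVA applied to the ImageJ measurements compares between-group variance with within-group variance. A minimal sketch of the F statistic is below; the function name `one_way_anova_f` and the three groups of numbers are made-up illustrations, not data from the study.

```python
from typing import Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F = MS_between / MS_within, where
      MS_between = sum_g n_g * (mean_g - grand_mean)^2 / (k - 1)
      MS_within  = sum_g sum_i (x_gi - mean_g)^2 / (N - k)
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Spread of the group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Spread of individual values around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical cell densities for three treatment groups (illustrative only).
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(one_way_anova_f(groups))  # → (27.0, 2, 6)
```

In practice, `scipy.stats.f_oneway` returns the same F statistic together with its p-value.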
 
   
   
The (changing) belief distribution over possible environmental states may be represented in ventromedial prefrontal cortex (vmPFC). Several lines of evidence point to a general function of this brain region in maintaining a compact internal model of the environment (ie state belief) by extracting information across individual experiences to guide goal-directed behavior, whether that model encodes the value of different choice options (Levy and Glimcher 2012; Averbeck and O’Doherty 2022; Klein-Flügge et al. 2022), cognitive maps (Boorman et al. 2021; Klein-Flügge et al. 2022; Schuck et al. 2016; Wilson et al. 2014), or schemas (Gilboa and Marlatte 2017; Bein and Niv 2025). Studies employing probabilistic learning tasks further show that neural activity in vmPFC also reflects uncertainty about external states, a signal that has been linked to adaptive exploration behavior and learning-rate adjustments (Karlsson et al. 2012; McGuire et al. 2014; Starkweather et al. 2018; Domenech et al. 2020; Trudel et al. 2021). Notably, Karlsson et al. (2012) found that trial-to-trial neural population spiking variability in the medial PFC of mice peaked around transitions from exploitation to exploration periods following changes in reward structure, when state uncertainty is highest, which may reflect more variable belief states. While ours is the first study to link human brain signal variability to belief precision, a previous study by Grady and Garrett (2018) observed increased BOLD signal variability while subjects performed externally- versus internally-oriented tasks, an effect spanning the vmPFC and other nodes of the canonical default mode network (DMN; Yeo et al. 2011). Since learning an abstract world model reflects a shift towards an internal cognitive mode, we tentatively expected brain signal variability compression over the course of learning to be (partly) expressed in the vmPFC.
We assume that uncertainty-related neural dynamics unfold on a fast temporal scale, as suggested by electrophysiological evidence in humans and nonhuman animals (Berkes et al. 2011; Palva et al. 2011; Rouhinen et al. 2013; Honkanen et al. 2015; Orbán et al. 2016; Grundy et al. 2019). However, within-trial dynamics should also affect neural variability across independent learning trials (see Fig. 1). A more variable system should have a higher probability of being in a different state every time it is (sparsely) sampled. Conversely, when a system is in a less stochastic state, within-trial variance is expected to decrease, which in turn yields lower across-trial variance. This argument aligns with work by Orbán et al. (2016), who showed that a computational model of the sampling account of sensory uncertainty captures the empirically observed across-trial variability of neural population responses in primary visual cortex. For human research, this means that neuroimaging methods with slower sampling rates, such as functional MRI (fMRI), may be able to approximate within-trial neural variability from the variability observed across trials. Indeed, most previous fMRI studies reporting within-region, within-subject modulation of brain signal variability by task demand exclusively employed block designs, in which the measured variability necessarily stems mainly from between- rather than within-trial fluctuations (Garrett et al. 2013; Grady and Garrett 2014; Garrett et al. 2015; Armbruster-Genç et al. 2016).
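A toy simulation can make the sparse-sampling argument concrete. It assumes, purely for illustration, that the latent neural state fluctuates as independent Gaussian noise within each trial and that a slow method captures a single time point per trial; the function `simulate_sparse_sampling` and its parameters are hypothetical and not part of the study’s analysis.

```python
import random
import statistics
from typing import Tuple


def simulate_sparse_sampling(within_sd: float, n_trials: int = 500,
                             n_timepoints: int = 50, sample_at: int = 25,
                             seed: int = 0) -> Tuple[float, float]:
    """Return (mean within-trial SD, across-trial SD of one sample per trial)."""
    rng = random.Random(seed)
    within_sds = []
    sparse_samples = []
    for _ in range(n_trials):
        # Fast within-trial fluctuations (assumed i.i.d. Gaussian around zero).
        trial = [rng.gauss(0.0, within_sd) for _ in range(n_timepoints)]
        within_sds.append(statistics.stdev(trial))
        # A slow method (e.g. fMRI) "sees" only one time point per trial.
        sparse_samples.append(trial[sample_at])
    return statistics.mean(within_sds), statistics.stdev(sparse_samples)


# A more stochastic state (SD 2.0) versus a less stochastic one (SD 0.5).
for sd in (2.0, 0.5):
    within, across = simulate_sparse_sampling(sd)
    print(f"within-trial SD ~ {within:.2f}, across-trial SD ~ {across:.2f}")
```

Under these assumptions the across-trial SD of the sparse samples tracks the within-trial SD, so reducing within-trial stochasticity also reduces across-trial variance.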
In the current study, we acquired fMRI while participants performed a “marble task”. In this task, participants had to learn the probability of drawing a blue marble from an unseen jar (ie urn) based on five samples (ie draws from the urn with replacement). In a Bayesian inference framework, the jar’s marble ratio can be considered a latent state that participants must infer. We hypothesized that (i) across-trial variability in the BOLD response (SDBOLD) would compress over the sampling period, mirroring the reduction in state uncertainty, and that (ii) subjects with greater SDBOLD compression would show smaller errors in estimating the jars’ marble ratios, as an index of more efficient belief updating. A secondary aim of the current study was to directly compare the effect of uncertainty on SDBOLD with a more standard General Linear Modeling (GLM) approach, which looks for correlations between average BOLD activity and uncertainty. This links our findings directly to previous investigations of neural uncertainty correlates, which disregarded the magnitude of BOLD variability (Huettel et al. 2005; Grinband et al. 2006; Behrens et al. 2007; Bach et al. 2011; Bach and Dolan 2012; Badre et al. 2012; Vilares et al. 2012; Payzan-LeNestour et al. 2013; McGuire et al. 2014; Michael et al. 2015; Meyniel and Dehaene 2017; Nassar et al. 2019; Meyniel 2020; Tomov et al. 2020; Trudel et al. 2021; Walker et al. 2023). We hypothesized (iii) that SDBOLD would uniquely predict inference accuracy, beyond these standard neural uncertainty correlates.
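How state uncertainty shrinks over the five draws can be illustrated with a Beta-Bernoulli observer. This is a minimal sketch assuming a uniform Beta(1, 1) prior over the blue-marble proportion; the function names and the draw sequence are hypothetical, and the study’s own Bayesian model may differ.

```python
import math
from typing import List, Tuple


def beta_summary(a: float, b: float) -> Tuple[float, float]:
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


def belief_trajectory(draws: List[int], a0: float = 1.0,
                      b0: float = 1.0) -> List[Tuple[float, float]]:
    """Posterior (mean, SD) of the blue-marble proportion before and after each draw.

    draws: 1 = blue marble, 0 = other colour. Because draws are with replacement,
    each one is a Bernoulli observation and the Beta prior stays conjugate.
    a0, b0: prior pseudo-counts; Beta(1, 1), a uniform prior, is an assumption.
    """
    a, b = a0, b0
    trajectory = [beta_summary(a, b)]
    for draw in draws:
        if draw == 1:
            a += 1
        else:
            b += 1
        trajectory.append(beta_summary(a, b))
    return trajectory


# Hypothetical sequence of five draws: blue, blue, other, blue, other.
for step, (mean, sd) in enumerate(belief_trajectory([1, 1, 0, 1, 0])):
    print(f"after {step} draws: mean={mean:.3f}, sd={sd:.3f}")
```

In this sequence the posterior SD falls from about 0.289 to about 0.175 over the five draws, but it rises slightly after the third draw (from about 0.194 to 0.200) because that draw contradicts the first two; uncertainty shrinks on average rather than at every step.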
 
  